Thu Dec 4 12:23:51 CET 2014

Shrinking stage3

Today I wanted to fetch a new stage3, and I was quite disappointed: Sooo big. 200MB for a stage3?
So I started cutting things up and filed some bugs, and the results are nice:
Starting point: stage3-amd64-20141127
Original: 207758877 bytes / 199M
Shrunk-1: 165755734 bytes / 159M
Shrunk-2: 161092189 bytes / 154M
Shrunk-3: 112635928 bytes / 108M
Shrunk-4: 109833196 bytes / 105M
What changed?
First step: Remove two of three python implementations. I think having only python2 is enough for a start - that's a bit over 150MB unpacked just gone.
Changes needed: PYTHON_TARGETS and USE="-python3" during build
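In make.conf terms that's roughly the following - a sketch, and the exact PYTHON_TARGETS value depends on what the snapshot ships (python2_7 is my assumption here):
PYTHON_TARGETS="python2_7"
USE="-python3"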

Second: Build pkgconfig with USE="internal-glib". This removes the dependency on glib and its mad ball of dependencies for another ~40MB on-disk.
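Since that's just a USE flag on one package, a package.use entry should be all it takes - a sketch, with the standard Gentoo paths:
# /etc/portage/package.use/pkgconfig
dev-util/pkgconfig internal-glib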

Repacking that gives Shrunk-1.
Then I noticed that Python installs its test suite unconditionally. That's not needed at runtime and barely needed at build time, so removing it saves another 28MB. That's Shrunk-2.
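Until the ebuild is fixed, one blunt workaround is INSTALL_MASK in make.conf, which keeps matching paths out of the installed image - a sketch, with the paths guessed from where python2.7 drops its test suites:
INSTALL_MASK="/usr/lib*/python2.7/test /usr/lib*/python2.7/*/test /usr/lib*/python2.7/*/tests"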

Shrunk-3 is Shrunk-1 compressed with xz instead of bzip2; Shrunk-4 is Shrunk-2 with xz instead of bzip2.
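The recompression itself is nothing special - something like this, using the snapshot name from above:
bzcat stage3-amd64-20141127.tar.bz2 | xz -9 > stage3-amd64-20141127.tar.xz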

So there, within a few minutes I've halved the size of the tarball by pruning unneeded bits ...

Posted by Patrick | Permalink

Sun Nov 23 09:26:01 CET 2014

DBus, FreeDesktop, and lots of madness

For various reasons I've spent a bit of time dissecting how dbus is supposed to work. It's a rather funny game, but it is confusing to see people trying to use these things. It starts out quite hilariously; the official documentation (version 0.25) says:
The D-Bus protocol is frozen (only compatible extensions are allowed) as of November 8, 2006. However, this specification could still use a fair bit of work to make interoperable reimplementation possible without reference to the D-Bus reference implementation. Thus, this specification is not marked 1.0
Ahem. After over a decade (first release in Sept. 2003!!) people still haven't documented what it actually does.
Alrighty then!
So we start reading:
"D-Bus is low-overhead because it uses a binary protocol"
then
"Immediately after connecting to the server, the client must send a single nul byte."
followed by
"A nul byte in any context other than the initial byte is an error; the protocol is ASCII-only."
Mmmh. Hmmm. Whut?
So anyway, let's not get confused ... we continue:
The string-like types are basic types with a variable length. The value of any string-like type is conceptually 0 or more Unicode codepoints encoded in UTF-8, none of which may be U+0000. The UTF-8 text must be validated strictly: in particular, it must not contain overlong sequences or codepoints above U+10FFFF. Since D-Bus Specification version 0.21, in accordance with Unicode Corrigendum #9, the "noncharacters" U+FDD0..U+FDEF, U+nFFFE and U+nFFFF are allowed in UTF-8 strings (but note that older versions of D-Bus rejected these noncharacters).
So there seems to be some confusion about what things like "binary" mean, and UTF-8 seems to be quite challenging too, but no worry: at least we are endian-proof!
A block of bytes has an associated byte order. The byte order has to be discovered in some way; for D-Bus messages, the byte order is part of the message header as described in the section called “Message Format”. For now, assume that the byte order is known to be either little endian or big endian.
Hmm? Why not just define network byte order and be happy? Well ... we're even smarterer:
The signature of the header is: "yyyyuua(yv)"
Ok, so ...
1st BYTE Endianness flag; ASCII 'l' for little-endian or ASCII 'B' for big-endian. Both header and body are in this endianness.
We actually waste a BYTE on each message to encode endianness, because ... uhm ... we run on ... have been ... I don't get it. And why a full byte (with a random ASCII mnemonic) instead of a bit? The whole 32bit bytemash at the beginning of the header could be collapsed into 8 bits if it were designed with any care. Of course performance will look silly if you use an ASCII protocol over the wire with generous padding ... so let's kdbus, because performance. What the? Just reading this "spec" makes me want to get more drunk.

Here's a radical idea: Fix the wire protocol to be more sane, then fix the configuration format to not be hilarious XML madness (which the spec says is bad, so what were the authors not thinking?)
But enough about this idiocy, let's go up the stack one layer. The freedesktop wiki uses gdbus output as an API dump (would be boring if it were an obvious format), so we have a look at it:
      @org.freedesktop.systemd1.Privileged("true")
      UnlockSessions();
Looking through the man page there's no documentation of what a line beginning with "@" means. Because it's obvious!!1
So we read through the gdbus source code and start crying (thanks glib, I really needed to be reminded that there are worse coders than me). And finally we can correlate it with "Annotations".

Back to the spec:
Method, interface, property, and signal elements may have "annotations", which are generic key/value pairs of metadata. They are similar conceptually to Java's annotations and C# attributes.
I have no idea what that means, but I guess as the name implies it's just a textual hint for the developer. Or not? Great to see a specification not define its own terms.

So whatever I read into this fuzzy text is most likely very wrong, and someone should define these items in the spec in a way that can be understood before you have understood the spec. Less tautological circularity and all that ...
Let's assume then that we can ignore annotations for now ... here's our next funny:
The output of gdbus is not stable. If you were to write a dbus listener based on the documentation, well, the order of functions is random, so it's very hard to compare the wiki API dump 'documentation' with your output.
Oh great ... sigh. Grumble. Let's just do fuzzy matching then. Kinda looks similar, so that must be good enough.
(Radical thought: Shouldn't a specification be a bit less ambiguous and maybe more precise?)

Anyway. Ahem. Let's just assume we figure out a way to interact with dbus that is tolerable. Now we need to figure out what the dbus calls are supposed to do. Just for fun, we read the logind 'documentation':
      @org.freedesktop.systemd1.Privileged("true")
      CreateSession(in  u arg_0,
                    in  u arg_1,
                    in  s arg_2,
                    in  s arg_3,
                    in  s arg_4,
                    in  s arg_5,
                    in  u arg_6,
                    in  s arg_7,
                    in  s arg_8,
                    in  b arg_9,
                    in  s arg_10,
                    in  s arg_11,
                    in  a(sv) arg_12,
                    out s arg_13,
                    out o arg_14,
                    out s arg_15,
                    out h arg_16,
                    out u arg_17,
                    out s arg_18,
                    out u arg_19,
                    out b arg_20);
with the details being:
CreateSession() and ReleaseSession() may be used to open or close login sessions. These calls should never be invoked directly by clients. Creating/closing sessions is exclusively the job of PAM and its pam_systemd module.
*cough*
*nudge nudge wink wink*
Zero of the twenty-one (!) parameters are defined in the 'documentation', and the same document says that it's an internal function that accidentally ended up in the public API (instead of, like, being, ah, a private API in a different namespace?)
Since it's actively undefined, and not to be used, a valid action on calling it would be to shut down the machine.

Dogbammit. What kind of code barf is this? Oh well. Let's try to figure out the other functions -
LockSession() asks the session with the specified ID to activate the screen lock.
And then we look at the source code to learn that it actually just calls:
session_send_lock(session, streq(sd_bus_message_get_member(message), "LockSession"));
Which calls a function somewhere else which then does:
        return sd_bus_emit_signal(
                        s->manager->bus,
                        p,
                        "org.freedesktop.login1.Session",
                        lock ? "Lock" : "Unlock",
                        NULL);
So in the end it internally sends a dbus message to a different part of itself, and that sends a dbus signal that "everyone" is supposed to listen to.
And the documentation doesn't define what is supposed to happen; instead it speaks in uselessly general terms.
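(If you want to see that signal for yourself, dbus-monitor can subscribe to it - a sketch; the match rule syntax is dbus-monitor's own, not anything taken from the logind docs:
dbus-monitor --system "type='signal',interface='org.freedesktop.login1.Session',member='Lock'"
Presumably whatever screen locker you run is expected to listen for exactly this and then lock ... somehow.)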

      PowerOff(in  b arg_0);
      Reboot(in  b arg_0);
      Suspend(in  b arg_0);
      Hibernate(in  b arg_0);
      HybridSleep(in  b arg_0);
Here we have a new API call for each flavour of power management. And there's this stuff:
The main purpose of these calls is that they enforce PolicyKit policy
And I have absolutely no idea about the mechanism(s) involved. Do I need to query PK myself? How does the dbus API know? Oh well, just read more code, and interpret it how you think it might work. Must be good enough.
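My best guess - and it is a guess, not something the documentation spells out - is that logind asks PolicyKit itself when one of these methods is called, and the boolean parameter decides whether interactive authentication is allowed. If you want to poke at a policy manually, pkcheck can do that; the action name here is my assumption based on the policy files logind installs:
pkcheck --action-id org.freedesktop.login1.power-off --process $$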

While this exercise has been quite educational in many ways, I am surprised that this undocumented, early-alpha-quality code base is used for anything serious. Many concepts are either not defined at all, or defined only by the behaviour of the implementation. The APIs are ad hoc without any obvious structure, partially redundant (what's the difference between Terminate and Kill?), and not documented in a way that allows a reimplementation.
If this is the future I'll stay firmly stuck in the past ...

Posted by Patrick | Permalink

Sat Nov 15 05:14:02 CET 2014

Having fun with networking

Since the last minor upgrade my notebook has been misbehaving in funny ways.
I presumed that it was NetworkManager being itself, but ... this is even more fun. To quote from the manpage:
If the hostname is currently blank, (null) or localhost, or force_hostname is YES or TRUE or 1 then dhcpcd sets the hostname to the one supplied by the DHCP server.
Guess what. Now my hostname is 192.168.0.7, I mean 192.168.0.192.168.0.7, err...
And as a bonus this even breaks X in funny ways so that starting new apps becomes impossible. The fix?
Now the hostname is set to "localhorst". Because that's the name of the machine!111 (It doesn't have an explicit name, so localhost used to be ok)
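(For the record, dhcpcd can simply be told to leave the hostname alone - a sketch, assuming the stock config location:
# /etc/dhcpcd.conf
nohook hostname
plus putting a real name into /etc/conf.d/hostname so the "currently blank" case never triggers.)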

Posted by Patrick | Permalink

Sun Nov 9 12:06:17 CET 2014

gentooJoin 2004/04/11

How time flies!
gentooJoin: 2004/04/11

Now I feel ooold

Posted by Patrick | Permalink

Wed Nov 5 09:38:47 CET 2014

Just a simple webapp, they said ...

The complexity of modern software is quite insanely insane. I just realized ...
Writing a small webapp with flask, I've had to deal with the following technologies/languages:
  • System package manager, in this case portage
  • SQL DBs, both SQLite (local testing) and PostgreSQL (production)
  • python/flask, the core of this webapp
  • jinja2, the template language usually used with it
  • HTML, because the templates don't just appear magically
  • CSS (mostly hidden in Bootstrap) to make it look sane
  • JavaScript, because dynamic shizzle
  • (flask-)sqlalchemy, ORMs are easier than writing SQL by hand when you're in a hurry
  • alembic, for DB migrations and updates
  • git, because version control
So that's about a dozen things that each would take years to master. And for a 'small' project there's not much time to learn them deeply, so we staple together what we can, learning as we go along ...

And there's an insane amount of context switching going on: you go from mangling CSS to rewriting SQL in the span of a few minutes. It's an impressive polyglot marathon, but how is this supposed to produce sustainable, high-quality results?

And then I go home in the evening and play around with OpenCL and such things. Learning never ends - but how are we going to build things that last for more than 6 months? Too many moving parts, too much change, and never enough time to really understand what we're doing :)

Posted by Patrick | Permalink

Sun Sep 21 13:59:16 CEST 2014

bcache

My "sacrificial box", a machine reserved for any experimentation that can break stuff, has had annoyingly slow IO for a while now. I've had 3 old SATA harddisks (250GB) in a RAID5 (because I don't trust them to survive), and recently I got a cheap 64GB SSD that has become the new rootfs initially.

The performance difference between the SATA disks and the SSD is quite amazing, and the difference to a proper SSD is amazing again. Just for fun: the 3-disk RAID5 writes random data at about 1.5MB/s, the crap SSD manages ~60MB/s, and a proper SSD (e.g. Intel) easily hits over 200MB/s. So while this is not great hardware it's excellent for demonstrating performance hacks.

Recent-ish kernels finally have bcache included, so I decided to see if I can make use of it. Since creating new bcache devices is destructive, I copied all data away, reformatted the relevant partitions and then set up bcache. The SSD is now a 20GB rootfs plus a 40GB cache partition; the RAID5 stays as it is, but gets reformatted as a bcache backing device.
In code:
wipefs -a /dev/md0 # actually erase the old signatures so they don't confuse bcache (plain wipefs only lists them)
make-bcache -C /dev/sda2 -B /dev/md0 --writeback --cache_replacement_policy=lru # -C: cache device (SSD), -B: backing device (RAID5)
mkfs.xfs /dev/bcache0 # no longer using md0 directly!
Now performance is still quite meh. What's the problem? Oh ... we still need to attach the SSD cache device to the backing device!
ls /sys/fs/bcache/
45088921-4709-4d30-a54d-d5a963edf018  register  register_quiet
That's the UUID we need, so:
echo 45088921-4709-4d30-a54d-d5a963edf018 > /sys/block/bcache0/bcache/attach
and dmesg says:
[  549.076506] bcache: bch_cached_dev_attach() Caching md0 as bcache0 on set 45088921-4709-4d30-a54d-d5a963edf018
Tadaah!
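A quick sanity check that the attach actually took - these are standard bcache sysfs attributes, assuming the device really is bcache0:
cat /sys/block/bcache0/bcache/state       # should now be "clean" or "dirty", not "no cache"
cat /sys/block/bcache0/bcache/cache_mode  # the active mode is shown in brackets, e.g. [writeback]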

So what about performance? Well ... without any proper benchmarks, just copying the data back I see very different behaviour. iotop shows writes happening at ~40MB/s, but as the network isn't that fast (100Mbit switch) it's only writing every ~5sec for a second.
Unpacking chromium is now CPU-limited and doesn't cause a minute-long IO storm. Responsiveness while copying data is excellent.

The write speed for random IO is a lot higher, reaching maybe 2/3 of what the SSD manages natively - but now I have 1TB of storage at that speed. For a $25 upgrade that's quite amazing.

Another interesting thing is that bcache is chunking up IO, so the harddisks are no longer making an angry purring noise with random IO, instead it's a strange chirping as they only write a few larger chunks every second. It even reduces the noise level?! Neato.

First impression: This is definitely worth setting up for new machines that require good IO performance; the only downside for me is that you need more hardware and thus a slightly bigger budget. But the speedup is "very large" even with a cheap-crap SSD that doesn't even go that fast ...

Edit: ioping, for comparison:
native sata disks:
32 requests completed in 32.8 s, 34 iops, 136.5 KiB/s
min/avg/max/mdev = 194 us / 29.3 ms / 225.6 ms / 46.4 ms

bcache-enhanced, while writing quite a bit of data:
36 requests completed in 35.9 s, 488 iops, 1.9 MiB/s
min/avg/max/mdev = 193 us / 2.0 ms / 4.4 ms / 1.2 ms


Definitely awesome!

Posted by Patrick | Permalink

Fri Sep 5 08:41:43 CEST 2014

32bit Madness

This week I ran into a funny issue doing backups with rsync:
rsnapshot/weekly.3/server/storage/lost/of/subdirectories/some-stupid.file => rsnapshot/daily.0/server/storage/lost/of/subdirectories/some-stupid.file
ERROR: out of memory in make_file [generator]
rsync error: error allocating core memory buffers (code 22) at util.c(117) [generator=3.0.9]
rsync error: received SIGUSR1 (code 19) at main.c(1298) [receiver=3.0.9]
rsync: connection unexpectedly closed (2168136360 bytes received so far) [sender]
rsync error: error allocating core memory buffers (code 22) at io.c(605) [sender=3.0.9]
Oopsiedaisy, rsync ran out of memory. But ... this machine has 8GB RAM, plus 32GB swap?!
So I re-ran this and started observing, and BAM, it fails again. With ~4GB RAM free.

4GB you say, eh? That smells of ... 2^32 ...
For doing the copying I was using sysrescuecd, and then it became obvious to me: All binaries are of course 32bit!

So now I'm doing a horrible hack of "linux64 chroot /mnt/server" so that I have a 64bit environment that doesn't randomly run out of address space. Plus I filed 3 new bugs for the Gentoo livecd, which fails to appreciate USB and other things.
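For reference, the hack in full - a sketch, assuming the server's rootfs is already mounted at /mnt/server; the bind mounts are just the usual chroot housekeeping:
mount -t proc proc /mnt/server/proc
mount --rbind /sys /mnt/server/sys
mount --rbind /dev /mnt/server/dev
linux64 chroot /mnt/server /bin/bash   # linux64 (setarch) makes the chroot report x86_64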
Who would have thought that a 16TB partition can make rsync stumble over address space limits ...

Posted by Patrick | Permalink

Wed Sep 3 08:25:27 CEST 2014

AMD HSA

With the release of the "Kaveri" APUs AMD has brought out some quite intriguing technology. The idea of the "APU" is a blend of CPU and GPU, which AMD calls "HSA" - Heterogeneous System Architecture.
What does this mean for us? In theory, once software catches up, it'll be a lot easier to use GPU-acceleration (e.g. OpenCL) within normal applications.

One big advantage seems to be that CPU and GPU share the system memory, so with the right drivers you should be able to do zero-copy GPU processing. No more host-to-GPU copy and other waste of time.

Until recently there wasn't any driver support to take advantage of that. The good news: as of a week or two ago there is. Still very alpha, but ... at last, drivers!

On the kernel side there's the kfd driver, which piggybacks on radeon. It's available in a slightly very patched kernel from AMD. During bootup it looks like this:
[    1.651992] [drm] radeon kernel modesetting enabled.
[    1.657248] kfd kfd: Initialized module
[    1.657254] Found CRAT image with size=1440
[    1.657257] Parsing CRAT table with 1 nodes
[    1.657258] Found CU entry in CRAT table with proximity_domain=0 caps=0
[    1.657260] CU CPU: cores=4 id_base=16
[    1.657261] Found CU entry in CRAT table with proximity_domain=0 caps=0
[    1.657262] CU GPU: simds=32 id_base=-2147483648
[    1.657263] Found memory entry in CRAT table with proximity_domain=0
[    1.657264] Found memory entry in CRAT table with proximity_domain=0
[    1.657265] Found memory entry in CRAT table with proximity_domain=0
[    1.657266] Found memory entry in CRAT table with proximity_domain=0
[    1.657267] Found cache entry in CRAT table with processor_id=16
[    1.657268] Found cache entry in CRAT table with processor_id=16
[    1.657269] Found cache entry in CRAT table with processor_id=16
[    1.657270] Found cache entry in CRAT table with processor_id=17
[    1.657271] Found cache entry in CRAT table with processor_id=18
[    1.657272] Found cache entry in CRAT table with processor_id=18
[    1.657273] Found cache entry in CRAT table with processor_id=18
[    1.657274] Found cache entry in CRAT table with processor_id=19
[    1.657274] Found TLB entry in CRAT table (not processing)
[    1.657275] Found TLB entry in CRAT table (not processing)
[    1.657276] Found TLB entry in CRAT table (not processing)
[    1.657276] Found TLB entry in CRAT table (not processing)
[    1.657277] Found TLB entry in CRAT table (not processing)
[    1.657278] Found TLB entry in CRAT table (not processing)
[    1.657278] Found TLB entry in CRAT table (not processing)
[    1.657279] Found TLB entry in CRAT table (not processing)
[    1.657279] Found TLB entry in CRAT table (not processing)
[    1.657280] Found TLB entry in CRAT table (not processing)
[    1.657286] Creating topology SYSFS entries
[    1.657316] Finished initializing topology ret=0
[    1.663173] [drm] initializing kernel modesetting (KAVERI 0x1002:0x1313 0x1002:0x0123).
[    1.663204] [drm] register mmio base: 0xFEB00000
[    1.663206] [drm] register mmio size: 262144
[    1.663210] [drm] doorbell mmio base: 0xD0000000
[    1.663211] [drm] doorbell mmio size: 8388608
[    1.663280] ATOM BIOS: 113
[    1.663357] radeon 0000:00:01.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
[    1.663359] radeon 0000:00:01.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
[    1.663360] [drm] Detected VRAM RAM=1024M, BAR=256M
[    1.663361] [drm] RAM width 128bits DDR
[    1.663471] [TTM] Zone  kernel: Available graphics memory: 7671900 kiB
[    1.663472] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[    1.663473] [TTM] Initializing pool allocator
[    1.663477] [TTM] Initializing DMA pool allocator
[    1.663496] [drm] radeon: 1024M of VRAM memory ready
[    1.663497] [drm] radeon: 1024M of GTT memory ready.
[    1.663516] [drm] Loading KAVERI Microcode
[    1.667303] [drm] Internal thermal controller without fan control
[    1.668401] [drm] radeon: dpm initialized
[    1.669403] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    1.685757] [drm] PCIE GART of 1024M enabled (table at 0x0000000000277000).
[    1.685894] radeon 0000:00:01.0: WB enabled
[    1.685905] radeon 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880429c5bc00
[    1.685908] radeon 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffff880429c5bc04
[    1.685910] radeon 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffff880429c5bc08
[    1.685912] radeon 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff880429c5bc0c
[    1.685914] radeon 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xffff880429c5bc10
[    1.686373] radeon 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xffffc90012236c98
[    1.686375] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.686376] [drm] Driver supports precise vblank timestamp query.
[    1.686406] radeon 0000:00:01.0: irq 83 for MSI/MSI-X
[    1.686418] radeon 0000:00:01.0: radeon: using MSI.
[    1.686441] [drm] radeon: irq initialized.
[    1.689611] [drm] ring test on 0 succeeded in 3 usecs
[    1.689699] [drm] ring test on 1 succeeded in 2 usecs
[    1.689712] [drm] ring test on 2 succeeded in 2 usecs
[    1.689849] [drm] ring test on 3 succeeded in 2 usecs
[    1.689856] [drm] ring test on 4 succeeded in 2 usecs
[    1.711523] tsc: Refined TSC clocksource calibration: 3393.828 MHz
[    1.746010] [drm] ring test on 5 succeeded in 1 usecs
[    1.766115] [drm] UVD initialized successfully.
[    1.767829] [drm] ib test on ring 0 succeeded in 0 usecs
[    2.268252] [drm] ib test on ring 1 succeeded in 0 usecs
[    2.712891] Switched to clocksource tsc
[    2.768698] [drm] ib test on ring 2 succeeded in 0 usecs
[    2.768819] [drm] ib test on ring 3 succeeded in 0 usecs
[    2.768870] [drm] ib test on ring 4 succeeded in 0 usecs
[    2.791599] [drm] ib test on ring 5 succeeded
[    2.812675] [drm] Radeon Display Connectors
[    2.812677] [drm] Connector 0:
[    2.812679] [drm]   DVI-D-1
[    2.812680] [drm]   HPD3
[    2.812682] [drm]   DDC: 0x6550 0x6550 0x6554 0x6554 0x6558 0x6558 0x655c 0x655c
[    2.812683] [drm]   Encoders:
[    2.812684] [drm]     DFP2: INTERNAL_UNIPHY2
[    2.812685] [drm] Connector 1:
[    2.812686] [drm]   HDMI-A-1
[    2.812687] [drm]   HPD1
[    2.812688] [drm]   DDC: 0x6530 0x6530 0x6534 0x6534 0x6538 0x6538 0x653c 0x653c
[    2.812689] [drm]   Encoders:
[    2.812690] [drm]     DFP1: INTERNAL_UNIPHY
[    2.812691] [drm] Connector 2:
[    2.812692] [drm]   VGA-1
[    2.812693] [drm]   HPD2
[    2.812695] [drm]   DDC: 0x6540 0x6540 0x6544 0x6544 0x6548 0x6548 0x654c 0x654c
[    2.812695] [drm]   Encoders:
[    2.812696] [drm]     CRT1: INTERNAL_UNIPHY3
[    2.812697] [drm]     CRT1: NUTMEG
[    2.924144] [drm] fb mappable at 0xC1488000
[    2.924147] [drm] vram apper at 0xC0000000
[    2.924149] [drm] size 9216000
[    2.924150] [drm] fb depth is 24
[    2.924151] [drm]    pitch is 7680
[    2.924428] fbcon: radeondrmfb (fb0) is primary device
[    2.994293] Console: switching to colour frame buffer device 240x75
[    2.999979] radeon 0000:00:01.0: fb0: radeondrmfb frame buffer device
[    2.999981] radeon 0000:00:01.0: registered panic notifier
[    3.008270] ACPI Error: [\_SB_.ALIB] Namespace lookup failure, AE_NOT_FOUND (20131218/psargs-359)
[    3.008275] ACPI Error: Method parse/execution failed [\_SB_.PCI0.VGA_.ATC0] (Node ffff88042f04f028), AE_NOT_FOUND (20131218/psparse-536)
[    3.008282] ACPI Error: Method parse/execution failed [\_SB_.PCI0.VGA_.ATCS] (Node ffff88042f04f000), AE_NOT_FOUND (20131218/psparse-536)
[    3.509149] kfd: kernel_queue sync_with_hw timeout expired 500
[    3.509151] kfd: wptr: 8 rptr: 0
[    3.509243] kfd kfd: added device (1002:1313)
[    3.509248] [drm] Initialized radeon 2.37.0 20080528 for 0000:00:01.0 on minor 0
It is recommended to add udev rules:
# cat /etc/udev/rules.d/kfd.rules 
KERNEL=="kfd", MODE="0666"
(this might not be the best way to do it, but we're just here to test if things work at all ...)
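To make the rule take effect without rebooting, a plain udev reload should do - standard udevadm, nothing kfd-specific:
udevadm control --reload-rules && udevadm trigger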

AMD has provided a small shell script to test if things work:
# ./kfd_check_installation.sh 

Kaveri detected:............................Yes
Kaveri type supported:......................Yes
Radeon module is loaded:....................Yes
KFD module is loaded:.......................Yes
AMD IOMMU V2 module is loaded:..............Yes
KFD device exists:..........................Yes
KFD device has correct permissions:.........Yes
Valid GPU ID is detected:...................Yes

Can run HSA.................................YES
So that's a good start. Then you need some support libs ... which I've ebuildized in the most horrible ways
These ebuilds can be found here

Since there's at least one binary file with undeclared license and some other inconsistencies I cannot recommend installing these packages right now.
And of course I hope that AMD will release the source code of these libraries ...

There's an example "vector_copy" program included; it mostly works, but appears to go into an infinite loop. Output looks like this:
# ./vector_copy 
Initializing the hsa runtime succeeded.
Calling hsa_iterate_agents succeeded.
Checking if the GPU device is non-zero succeeded.
Querying the device name succeeded.
The device name is Spectre.
Querying the device maximum queue size succeeded.
The maximum queue size is 131072.
Creating the queue succeeded.
Creating the brig module from vector_copy.brig succeeded.
Creating the hsa program succeeded.
Adding the brig module to the program succeeded.
Finding the symbol offset for the kernel succeeded.
Finalizing the program succeeded.
Querying the kernel descriptor address succeeded.
Creating a HSA signal succeeded.
Registering argument memory for input parameter succeeded.
Registering argument memory for output parameter succeeded.
Finding a kernarg memory region succeeded.
Allocating kernel argument memory buffer succeeded.
Registering the argument buffer succeeded.
Dispatching the kernel succeeded.
^C
Big thanks to AMD for giving us geeks some new toys to work with, and I hope it becomes a reliable and efficient platform to do some epic numbercrunching :)

Posted by Patrick | Permalink

Thu Aug 7 04:45:16 CEST 2014

googlecode.com, or no tarballs for you

I'm almost amused, see this bug

So when I fetched it earlier, the tarball had a size of 207200 bytes.
Most of Europe apparently gets a tarball of size 207135 bytes.
When I download it again now, I get a tarball of size 206989 bytes.

So I have to assume that googlecode now follows githerp in their tradition of being useless for code hosting. Is it really that hard to generate a consistent tarball once, and then mirror it?
Maybe I should build my own codehosting just to understand why this is apparently impossible ...

Posted by Patrick | Permalink

Mon Jul 14 08:39:32 CEST 2014

Biggest ebuilds in-tree

Random data point: there are only about 10 packages with ebuilds over 600 lines.

Sorted by lines, duplicate entries per-package removed, these are the biggest ones:
828 dev-lang/ghc/ghc-7.6.3-r1.ebuild
817 dev-lang/php/php-5.3.28-r3.ebuild
750 net-nds/openldap/openldap-2.4.38-r2.ebuild
664 www-client/chromium/chromium-36.0.1985.67.ebuild
658 games-rpg/nwn-data/nwn-data-1.29-r5.ebuild
654 www-servers/nginx/nginx-1.4.7.ebuild
654 media-video/mplayer/mplayer-1.1.1-r1.ebuild
644 dev-vcs/git/git-9999-r3.ebuild
621 x11-drivers/ati-drivers/ati-drivers-13.4.ebuild
617 sys-freebsd/freebsd-lib/freebsd-lib-9.1-r11.ebuild
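The list can be reproduced with nothing fancier than counting lines - a sketch, assuming the tree lives in /usr/portage and ignoring the manual per-package de-duplication:
find /usr/portage -name '*.ebuild' -exec wc -l {} + | grep -v ' total$' | sort -rn | head -n 15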

Posted by Patrick | Permalink