Sat Oct 29 22:50:40 CEST 2011

Bleeding edge, in retrospect

Found this gem while cleaning up:
shiny...

Posted by Patrick | Permalink

Sat Oct 22 18:11:25 CEST 2011

Booting Gentoo - from init to console

I've spent some time with OpenRC and sysvinit trying to understand a few things (for example how to integrate CGroups support), and along the way I've learned some details about the boot process that are not that well documented.
So why not document them for posterity ...

In the beginning, through some magic, the kernel is booted. How that happens is another story and not our concern at the moment. At some point the kernel has initialized, figured out where the rootfs is (for example through the root= kernel parameter), mounted it ... and now what?
At this point we need to start userspace process #1, traditionally known as "init". The kernel has a hardcoded list of fallbacks and tries /sbin/init first, but it's easy enough to override that with the init= kernel parameter if you want something else to run (like /bin/bash to get just a rescue shell).
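To make the override concrete, here's a small sketch that reads a kernel command line the way you might when debugging a boot. The helper name init_from_cmdline is made up for illustration. It assumes the last init= wins and otherwise reports /sbin/init, the first of the kernel's built-in candidates (the kernel then falls back to /etc/init, /bin/init and /bin/sh).

```shell
#!/bin/sh
# Hypothetical helper: report which program the kernel was asked to run as
# PID 1, given a kernel command line string. Without init= it reports
# /sbin/init, the first entry of the kernel's built-in list.
init_from_cmdline() {
    result=/sbin/init
    for param in $1; do
        case "$param" in
            init=*) result=${param#init=} ;;   # a later init= overrides an earlier one
        esac
    done
    printf '%s\n' "$result"
}

# On a running system the command line is in /proc/cmdline:
#   init_from_cmdline "$(cat /proc/cmdline)"
init_from_cmdline "root=/dev/sda3 ro"             # prints /sbin/init
init_from_cmdline "root=/dev/sda3 init=/bin/bash" # prints /bin/bash
```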
init comes from the sysvinit package and is small enough to be read in an afternoon. There are some surprisingly elegant bits in it, but it's still just doing one job well - starting the rest of the userland processes. It takes its info from /etc/inittab, which just lists the different runlevels and what to start in each. In the case of Gentoo this is a bit unusual, as it mostly just calls "rc", which is part of the OpenRC package, with a parameter like "rc single". That parameter is the name of the runlevel - "default" by default. We have sane defaults!
Init is also told to change runlevels; this is usually done through the "init" or "telinit" commands.

Now OpenRC needs to figure out what to start. Before the "default" runlevel is started we need to start the "sysinit" and "boot" runlevels (for details have a look at /etc/runlevels). This starts a few things like udev and mounts local filesystems, then starts all the daemons you requested. The bookkeeping for that (what has started, is starting, has failed etc.) can be found in /lib/rc/init.d/ - just another simple directory with self-explanatory filenames. And running "rc" without arguments will just try to get us back to the current runlevel's defaults - start what is stopped, and stop everything not defined in /etc/runlevels. Running "rc" in cron is a nice way to keep things like sshd running even through accidents like "killall sshd" :)
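The "start what is stopped" comparison can be pictured as a plain directory diff. This is a hypothetical helper, not code from OpenRC, and it assumes both directories hold one entry per service name (as /etc/runlevels/default and /lib/rc/init.d/started do on this setup).

```shell
#!/bin/sh
# Hypothetical helper, not part of OpenRC: list services that a runlevel
# directory asks for but that have no entry in the "started" state directory.
missing_services() {
    runlevel_dir=$1
    started_dir=$2
    for svc in "$runlevel_dir"/*; do
        # skip the literal pattern when the directory is empty
        [ -e "$svc" ] || [ -L "$svc" ] || continue
        name=${svc##*/}
        # -L as well, so dangling symlinks still count as present
        [ -e "$started_dir/$name" ] || [ -L "$started_dir/$name" ] || printf '%s\n' "$name"
    done
}

# Example on a live system:
#   missing_services /etc/runlevels/default /lib/rc/init.d/started
```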

How OpenRC figures out the dependencies is quite "magic", but if you trace it you find runscript (an executable) running runscript.sh with a parameter like "depend", which sources the init script and just outputs what its depend() function declares (read /lib/rc/sh/runscript.sh to get an idea, or if you get bored, read the source of runscript). That information is cached in /lib/rc/init.d/deptree to avoid re-sourcing the init scripts, as this is a "slow" process (maybe 50 ms per init script, but with 100 scripts that's still 5 seconds lost just parsing init scripts instead of starting things).
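The trick of "source the script and let it describe itself" fits in a few lines. This is a simplified sketch of the idea, not OpenRC's actual runscript.sh: the dependency keywords (need, use, after, before, provide) become functions that just print, and the script's depend() is called in a subshell so nothing leaks out. Running this once per script is exactly the cost the deptree cache avoids.

```shell
#!/bin/sh
# Simplified sketch, not OpenRC's code: print the dependencies an init
# script declares. Pass a path containing a slash, since "." searches PATH
# otherwise. Note that sourcing executes any top-level code in the script.
print_deps() (
    need()    { echo "need $*"; }
    use()     { echo "use $*"; }
    after()   { echo "after $*"; }
    before()  { echo "before $*"; }
    provide() { echo "provide $*"; }
    depend()  { :; }   # default when the script defines no depend()
    . "$1"
    depend
)

# Example: print_deps /etc/init.d/sshd
```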

So OpenRC starts everything from /etc/runlevels and is done; it returns control to init, which now notices that it has a few lines like this in its config (/etc/inittab):
# TERMINALS
c1:12345:respawn:/sbin/agetty -c 38400 tty1 linux
So what it does now is very simple - it runs agetty, which opens and configures the terminal (the virtual console tty1 here) and starts a login program (in this case /bin/login, the default). This asks us for username and password (another interesting story for a different time), and once that succeeds it runs the login shell specified for that user.
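Each inittab entry is just four colon-separated fields: id, runlevels, action and process. A small illustrative parser (the function name is made up) splits the c1 line above into those fields and skips comments and blank lines.

```shell
#!/bin/sh
# Illustrative parser for inittab entries of the form
#   id:runlevels:action:process
# The last field keeps any further colons, since read assigns the rest of
# the line to the final variable.
parse_inittab() {
    while IFS=: read -r id levels action process; do
        case "$id" in
            ''|'#'*) continue ;;   # blank line or comment
        esac
        printf 'id=%s runlevels=%s action=%s process=%s\n' \
            "$id" "$levels" "$action" "$process"
    done
}

echo 'c1:12345:respawn:/sbin/agetty -c 38400 tty1 linux' | parse_inittab
# prints: id=c1 runlevels=12345 action=respawn process=/sbin/agetty -c 38400 tty1 linux
```

"respawn" is what makes init start a new agetty whenever the previous one (or the login shell it became) exits.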
And here we are, booted up and ready to serve our human overlords ;)

Posted by Patrick | Permalink

Wed Oct 19 15:11:02 CEST 2011

OpenRC, agetty and terminal blanking

Some people might have noticed a naughty interaction between one of the recent sys-apps/util-linux updates and OpenRC.
The symptoms are described in bug 381401 - what you usually notice is that after boot the console blanks / resets and all you see is the boring login prompt.

The cause is a small change in agetty defaults, the manpage now mentions:
       -c, --noreset
              Don't reset terminal cflags (control modes). See termios(3) for more details.
So, if you are bothered by this change you need to edit /etc/inittab, which defines where and how the agetty processes are started. Change:
c1:12345:respawn:/sbin/agetty 38400 tty1 linux
to
c1:12345:respawn:/sbin/agetty -c 38400 tty1 linux
After the next reboot you should have the old behaviour back. And maybe we can convince Vapier that this would be a sane default, so we don't have to change it on every system.
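If you have several consoles (c1 to c6) the edit is easy to script. This is a hypothetical helper: it assumes your lines look exactly like the default "/sbin/agetty 38400 ..." form and only touches those, so lines that already have -c are left alone.

```shell
#!/bin/sh
# Hypothetical helper: add -c to agetty lines that lack it. Reads inittab
# text on stdin, writes the result to stdout; running it twice is harmless
# because lines with "-c" no longer match the pattern.
add_noreset() {
    sed 's|/sbin/agetty 38400 |/sbin/agetty -c 38400 |'
}

# Review the result before replacing the real file, e.g.:
#   add_noreset < /etc/inittab > /tmp/inittab.new
#   diff -u /etc/inittab /tmp/inittab.new
```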

Even without that change you can still read the boot messages in /var/log/rc.log - by default /etc/rc.conf has rc_logger="YES" set, so that should "just work"(tm).

Posted by Patrick | Permalink

Sun Oct 16 17:19:36 CEST 2011

CGroups support for OpenRC

During a train ride I spent some time implementing a prototype of CGroups support for OpenRC - the big awesome SysTemD feature that people seem to want the most.
Here's the most amusing bit:
$ git diff 3ad849c5d6a24ef66152004eb3149d2cff973b1c..082c04e0a1c31115417af9fd348ae83ee8ecc397 --stat
 sh/init.sh.Linux.in |   10 +++++++++
 sh/runscript.sh.in  |   54 +++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 2 deletions(-)
So I've managed to stay below my initial estimate of "under 100 lines of shell" - I don't quite get what the big deal is. Of course the initial patch is a bit rough, there are some things that just don't feel clean to me - but it works.

To explain the patch a bit - sh/init.sh.Linux.in:
+# Setup Kernel Support for CGroups
+if grep -qs cgroup /proc/filesystems; then
+       ebegin "Mounting CGroup filesystem"
+       mkdir -p /dev/cgroup
+       mount -n -t cgroup -o nodev,noexec,nosuid \
+               OpenRC-CGroup /dev/cgroup
+       eend $?
+fi
We need to mount a CGroup pseudo-filesystem somewhere, and I figured we'd better do it as early as possible (although putting it in the procfs or sysfs init scripts wouldn't be wrong per se). So we mount a cgroupfs named "OpenRC-CGroup" - that's just a nice marker so you can see who did it. So much for setup.
Now when a service starts we just need to take the current process (conveniently the runscript.sh process that runs the init script, and thus the parent of whatever it starts) and push it into its own group. Although, maybe, it makes sense to let some init scripts share a group, right?
+# Attach to CGroup - dir existing is enough for us
+if [ -d /dev/cgroup ]; then
+       # use the svcname unless overridden in conf.d
+       SVC_CGROUP=${CGROUP:-$RC_SVCNAME}
+       mkdir -p /dev/cgroup/${SVC_CGROUP}
+       # now attach self to cgroup - any children of this process will inherit this
+       echo $$ > /dev/cgroup/${SVC_CGROUP}/tasks
+       # TODO: set res limits from conf.d
+fi
Look, ma, no hands!
We allow users to set CGROUP in conf.d files, but if it's not set we just use RC_SVCNAME, which is the name of the init script usually. Then we create a subdirectory (but we don't care if it already exists) and throw ourselves into it.
Tadaah!
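The naming rule is the whole configuration story, so here it is isolated. The function cgroup_for and the group name "web" are illustrative; in the patch the values come from the environment rather than arguments. Setting the same CGROUP in two conf.d files (say /etc/conf.d/apache2 and /etc/conf.d/memcached) would put both services into one group, which also means the cleanup rmdir only succeeds once both are gone.

```shell
#!/bin/sh
# The selection rule from the patch, wrapped in a function for testing:
# an empty or unset override falls back to the service name.
cgroup_for() {
    svcname=$1
    override=$2
    printf '/dev/cgroup/%s\n' "${override:-$svcname}"
}

cgroup_for apache2 ""     # prints /dev/cgroup/apache2
cgroup_for apache2 web    # prints /dev/cgroup/web
```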

There's one little downside, we create directories but don't remove them. So let's add a little cleanup at the end of runscript.sh:
+# CGroup cleanup
+if [ -d /dev/cgroup ]; then
+       # use the svcname unless overridden in conf.d
+       SVC_CGROUP=${CGROUP:-$RC_SVCNAME}
+       # reattach ourself to root cgroup
+       echo $$ > /dev/cgroup/tasks
+       # remove cgroup if empty, will fail if any task attached
+       rmdir /dev/cgroup/${SVC_CGROUP} 2>/dev/null
+fi
+# need to exit cleanly
+exit 0
The exit 0 is a bit naughty, but I think it's the right way to terminate. Otherwise the rmdir might give us a bad return value which makes it look like things fail even when they don't ...
There's a neat trick in it - we are still in that cgroup ourselves, so we first move ourselves back to the root cgroup - otherwise the cgroup would never be empty. Duh! :)

And here's a more, let's say, controversial part. It's exquisitely rude, so I guess this will have to be adapted for public use - but it works too well. See, there's this leeeetle problem that OpenRC tends not to kill processes all that well. So if, for example, the apache init script doesn't manage to kill all indians you may have a problem - for example re-starting might fail because there's still the last mohican protecting port 80.
Since we know that the processes end up in a specific cgroup we can just take all of them, put them against the wall and shoot them dead. This has some potential side-effects, for example if there are multiple init scripts using one cgroup or if there's a service where a process is expected to survive (although that's funkay). Anyway ...
+# Kill everything in the CGroup with maximum prejudice until it is dead
+# This is very naughty if multiple init scripts were to share a CGroup ...
+terminate()
+{
+        if ! [ -d /dev/cgroup ]; then
+                return 0
+        else
+                SVC_CGROUP=${CGROUP:-$RC_SVCNAME}
+                # we want to survive and thus must not be killed dead
+                echo $$ > /dev/cgroup/tasks
+                for signal in TERM KILL; do
+                        for i in `cat /dev/cgroup/${SVC_CGROUP}/tasks`; do
+                                kill -s $signal $i
+                        done
+                done
+        fi
+        # if anyone survived here we could try task_freezer; SIGTERM; unfreeze
+        # to be sure everyone really got the message that they are dead
+}
This works well even for uncooperative init scripts, but might not always be what you want. But at least it kills everyone dead :)
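Since terminate() sends KILL right after TERM, well-behaved daemons get no time to clean up. A gentler variant gives them a grace period first. This is a sketch, not part of the patch: the name terminate_gently and the 3-second default are assumptions, and the tasks file is re-read after the wait because in a real cgroup it then lists only the survivors, including children forked in the meantime.

```shell
#!/bin/sh
# Variant sketch with a grace period: TERM everything listed in a tasks
# file, wait, then KILL whatever is still alive. Call it only after moving
# ourselves out of the cgroup (echo $$ > /dev/cgroup/tasks).
terminate_gently() {
    tasks_file=$1
    grace=${2:-3}
    pids=$(cat "$tasks_file")
    [ -n "$pids" ] || return 0          # nothing to kill
    for pid in $pids; do
        kill -s TERM "$pid" 2>/dev/null || :
    done
    sleep "$grace"
    # re-read the list; kill -0 only checks that the process still exists
    for pid in $(cat "$tasks_file"); do
        if kill -0 "$pid" 2>/dev/null; then
            kill -s KILL "$pid" 2>/dev/null || :
        fi
    done
}

# Example: terminate_gently /dev/cgroup/${SVC_CGROUP}/tasks 3
```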
And there are some funky resource limits possible - I'm still trying to figure out how to make good use of them, but ...
+       # cpuset support try 1 (plain [ -n ], since [ -v ] is a bashism)
+       if [ -n "${CGROUP_CPUS}" ] && [ -e /dev/cgroup/${SVC_CGROUP}/cpuset.cpus ]; then
+               echo $CGROUP_CPUS > /dev/cgroup/${SVC_CGROUP}/cpuset.cpus
+       fi
... this nails everything from one init script to a cpuset, so you can hammer apache onto CPUs 1-7 and leave CPU 0 free for "everything else", if you want. It also allows memory limits and a few other advanced gadgets, but I haven't played with that yet.
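Extending the cpuset idea to memory could look like the sketch below. CGROUP_CPUS comes from the patch, while CGROUP_MEMORY is a hypothetical conf.d variable; the file names are those of the cgroup v1 cpuset and memory controllers. Limits should be written before the process attaches itself, and on cgroup v1 a cpuset generally also needs cpuset.mems set before tasks can join it.

```shell
#!/bin/sh
# Sketch: write optional limits into a cgroup directory if the matching
# control files exist (i.e. the controller is mounted there).
apply_limits() {
    dir=$1
    if [ -n "${CGROUP_CPUS}" ] && [ -e "$dir/cpuset.cpus" ]; then
        echo "$CGROUP_CPUS" > "$dir/cpuset.cpus"            # e.g. 1-7
    fi
    if [ -n "${CGROUP_MEMORY}" ] && [ -e "$dir/memory.limit_in_bytes" ]; then
        echo "$CGROUP_MEMORY" > "$dir/memory.limit_in_bytes" # e.g. 512M
    fi
}

# Example: apply_limits /dev/cgroup/${SVC_CGROUP}
```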

And so in one afternoon we managed to grow a pretty decent bonus feature ...

Posted by Patrick | Permalink

Mon Aug 15 14:09:39 CEST 2011

Fighting Stupid

The mission for the day: Activate a new email account and XMPP/Jabber

Sounds easy, right? So ... let's find an XMPP client. ayttm ? Sounds nice, but ... wow ...
If it were a waiter in a restaurant it'd ask you what you want to order, then empty a glass of water over your head and bring you some raw broccoli.

Absolutely no will to live, and ignores users.
Uhm, pidgin? Certain entities recommended that. One does wonder why, because it's a deafmute that communicates by throwing things at you. Username? Pah, who needs that! And why didn't you give me a password when I didn't ask? Either way, XML Parsing Error, so I can't communicate with XMPP anyway. And you're ugly!

Good thing emerge -C works so well. Now, some other entities mentioned telepathy, and the KDE telepathy bits even do something. I'm unsure if that's really the expected mode of operation, but it seems to have connected to ... something.

Next mission: Setting up an email account in Thunderbird. Wow. Omg. It has gotten even more pessimized since I tried the last time ...

First of all it tries to guess the communication parameters. foo@bar.com ? Mailserver must be mail.bar.com ... or grmpflzrgh.bar.com ... or any of the other 35 wrong things it tries to connect to. SMTP? Let's do the guessing game once again. Hint: On the left side there's a button "manual configuration" which aborts this guessing game. Then you can add your own entries ...
Only to notice that some idiotic moron made it impossible to *use* your own settings, because you get a misleading popup that claims that you can continue with these untested settings (wooooh, I'm scaaared!), but ... the button? It's greyed out. What the bleep. In a movie it would be hilarious, but ... WTF. SRSLY. URTARD.

Now then, let's go back, hit the mandatory "test settings" button (why is there a "continue button" ... oh nevermind), then it changes the settings to something wrong, you hit continue, wait until fetching emails fails, then go to settings and change it to the correct values again.

Which makes no difference because it still can't connect, so my plans to get rid of ThunderFail have been immensely accelerated.

About 2h wasted again with software that doesn't have a will to live, but at least I get some good ideas for policies and testing ...

Posted by Patrick | Permalink

Sat Aug 13 11:00:20 CEST 2011

Apt-gentoo? Gentoo-apt! Hah!

I was marginally amused when I saw that some funny person had written an apt-gentoo wrapper that, err, scrolls build logs around slowly.

I guess that's a Debian user's interpretation of how package building works. It would have been a lot more amusing to turn Debian into a proper build-from-source environment, but ... that's actually hard.

So in return I'll mention how to badly emulate apt on gentoo:
*) Create or use a binhost - this is pretty easy to set up, often FEATURES="buildpkg" is all you need to configure on the server side
*) Make the packages available, for example through http: cd /usr/portage/packages; python -m SimpleHTTPServer 80
*) set the PORTAGE_BINHOST variable to point at your newly created server
*) add "-G" to EMERGE_DEFAULT_OPTS
*) Tadaah. Now you have a proper binary Gentoo that behaves almost, but not completely unlike apt.
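In make.conf terms the steps above boil down to a few variables (make.conf is sourced as shell). The host name is a placeholder, and the file lived at /etc/make.conf at the time (newer systems use /etc/portage/make.conf). -G is emerge's --getbinpkgonly; -g would fall back to building from source when no binary package is available.

```shell
# On the build server: keep a binary package of everything that gets built
FEATURES="buildpkg"

# On each client: fetch from the server, and only use binary packages
PORTAGE_BINHOST="http://binhost.example.org/"
EMERGE_DEFAULT_OPTS="-G"
```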

Next lesson: How to badly emulate system startup with OpenRC? ;)
(badly because it actually works, which is not part of the requirements)

Posted by Patrick | Permalink

Thu Jul 28 23:07:08 CEST 2011

Reboot

After spending quite a bit more time than expected in Berlin I've finally returned "home". Having access to more than a suitcase of stuff can be convenient (although it appears to be an optional luxury now).

Due to some unfortunate hardware failures the fastest working CPU I have locally is a single-core Athlon64. It's quite fascinating to see how bloated everything has become, what used to be a droolworthy CPU not so long ago is now the lower end for a simple desktop. Especially memory consumption is insane - Thunderbird easily absorbs 5GB RAM if you push it a bit. Firefox seems to grow at a rate of ~200MB/day and needs to be regularly restarted. So much sadness.

I'm slowly catching up with Gentoo things, and it appears that I've found two new minions to recruit - just as I had realized that I have some time and motivation. I like it. More people means less work per person, so less burnout, fewer abandoned packages and so on. More better happy.

Thanks to the work of sochotnicky our Ohloh Statistics have finally been updated. There are some interesting results - for example the number of committers has been roughly constant for the last 2-3 years, which means that recruiting is at least absorbing the normal attrition. But I think we need to do one better and get things growing ... how else are we supposed to keep everything in good shape?
Which also makes me think about bugs and how to squish them most effectively. There are so many bugs open that I find it hard to get an overview of what is "most urgent" or what are trivial bugs that might just take 5 minutes of work to fix. So we should definitely revive the BugDays and make it easier for people to get involved and provide us with fixes. Right now I have no motivation (and not enough hardware) to do any tinderboxing, as I can easily divert all the processing power I have into bugfix testing. And that's going to make people happier, on average, than finding even more bugs ;) (although we need to improve on both ends - and we need more metrics so we know where we are and where we are going).

And on it goes, the infinite hamster wheel of progress - who wants to help?

Posted by Patrick | Permalink

Fri Jul 1 19:44:26 CEST 2011

MDMA

The Monitoring-Driven Master Administration


For software development we have Test-Driven Development (TDD), unit tests and all that machinery. The goal of all those methods is to catch errors, ideally before they can hurt anyone.
Detecting problems early saves you lots of time and frustration and makes changing and improving things easier.

For admins, we now have monitoring-driven administration:

(1) Set up monitoring. Watch it fail and notify you

(2) Set up service

(3) Watch all monitoring switch to greenlight

There are some simple rules to be followed:
No service can be deployed without monitoring. If there is any critical warning from the monitoring it needs to be fixed. If there are warnings they should be fixed, either by tackling the problems or increasing the monitoring threshold.
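To make the threshold rule concrete, here is a minimal sketch of a disk-usage check in the style of a Nagios plugin, which reports its state through the exit code (0 OK, 1 WARNING, 2 CRITICAL). The function name and the 80%/90% defaults are illustrative. For step (1) you can deliberately pass thresholds you know are exceeded and watch it go red; and note that it checks whatever host it runs on, which is exactly the "running locally instead of remotely" trap.

```shell
#!/bin/sh
# Minimal Nagios-style check: exit code 0 = OK, 1 = WARNING, 2 = CRITICAL.
check_disk_usage() {
    mountpoint=$1
    warn=${2:-80}
    crit=${3:-90}
    # df -P prints one unwrapped line per filesystem; field 5 is "Use%"
    used=$(df -P "$mountpoint" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    if [ "$used" -ge "$crit" ]; then
        echo "CRITICAL - $mountpoint at ${used}%"
        return 2
    elif [ "$used" -ge "$warn" ]; then
        echo "WARNING - $mountpoint at ${used}%"
        return 1
    fi
    echo "OK - $mountpoint at ${used}%"
    return 0
}

# Example: check_disk_usage /var 85 95; echo "exit code: $?"
```

Raising the thresholds is then a one-argument change, which is the "fix the monitoring" half of the rule.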

The default state is all green - no warnings, no errors (except during the test/integration phase of new services). Any warnings or errors should trigger you into fix-this-stuff mode.

Rationale:
When you enable a service and the monitoring is green from the very start, you never find out whether you are monitoring the right bits. Maybe the check for free disk space is running locally instead of remotely? It will always look good, even if the actual service is in a failed state.

Having any warnings means that something is in a state you consider not-good. Either fix the service or the monitoring thresholds.

Results:
You'll be able to sleep a lot better if you get your daily status email and you know that everything is working fine. Then you can focus on improving things instead of playing infinite fireman.

If this sounds like stating the obvious, well, most good ideas are ...

Posted by Patrick | Permalink

Thu May 26 18:53:56 CEST 2011

The Gentoo Newsletter (or why we don't have one right now)

One of the things I've been wanting to work on was a Gentoo Newsletter.
We used to have a Weekly Newsletter, then people got upset that the editor was an editor or something, then it was on life support for a while until it became the Monthly Newsletter, and after not even a year that just ceased to be. So for roughly the last 3 years there hasn't been a (semi-)regular publication to keep us all happy.

What people regularly underestimate is the amount of time that goes into a newsletter - little things like doing mailing list summaries easily take an hour for every newsletter. Then there are items like interviews that are open-ended ... of course you can finish one up in 30 minutes, but that will be a bit bland and boring. So you find new questions, ask for clarifications on answers and soon you're looking at a few hours of time to process it nicely. Then you get semi-automated tasks like bug statistics and GLSAs, and once you have all those fragments you need to glue them together sanely, check that the formatting makes sense and send it to the gentoo-core@ mailing list. People will find dozens of issues you've overlooked, so you correct them all, send it again and wait for the next round of corrections. And once that is done you can think about publishing ...

Of course you want to keep it sustainable, it'd be silly if everyone involved quit after two weeks and you had no one writing it. So you want to be able to plan ahead a bit and commit to a long-term schedule. Usually life gets in the way anyway, but you can mitigate it by having backup people for every role.
As you can guess I've not found the stability and time to get things going again, but it's itching me. So I'm scheming and plotting and preparing ... soon, soon I say! We shall have a Newsletter again. And if it kills me ;)

I think that many others want it to happen too, so if everyone just gives one or two hours of work every other week we should have enough manpower behind it to keep it going indefinitely. And you get the awesome feeling that you made people happy by telling them what happened in Gentooland this week. How can you say no to that!

Posted by Patrick | Permalink

Thu May 26 18:48:50 CEST 2011

Life of an IT-Ronin

Some people might have noticed that I haven't been able to spend as much time as usual on all things Gentoo.

Well, I have a pretty decent excuse. Work has been absorbing quite a bit of time, and I haven't been much at home in the last weeks. Whatever home means ...
Through some luck and chance I was able to spend 3 weeks in China, in the insane place they call Shanghai. It's a really intense and exciting place. I really enjoyed my time there, and if I get another chance to go there ...
Of course there's a language barrier (but then most Chinese appear to have trouble understanding each other), and lots of little things that are just different enough to confuse you badly.
But ... the food! Oh dear. The food alone is worth it, and that's why most foreigners gain 15kg in the first year. Then there's things like the omnipresent taxis - want to go somewhere? Raise your hand, there's a taxi. Like in movies, only a bit more random - sometimes normal motorists stop and offer you an impromptu taxi.

At night, when it's raining, the city center is exactly like a William Gibson book that became alive - neon lights making everything glow in false colours, people running around like ants, the rain like static noise hiding everything in the distance ... it's pretty awesome. Did I mention the food?
There are some not-so-nice things, like the lack of regulations we take for granted in Europe - a construction site will run 24/7 and no one cares that it might be noisy. The air quality is at times questionable; "blue sky" is a hilarious idea. The number of people is pretty insane and everything is alive - even at 4 in the morning there are still people running around, and so many shops are open 24/7. Well, most shops are open every day, why would you not be open? That's just time you're not earning money!
In Shanghai I never felt threatened, even at night; crime is relatively low, even with such a huge disparity between rich and poor. At worst some beggars follow you for a while in the hope of getting a generous donation out of you. The police are almost omnipresent, but passive - they don't interfere with normal life if they can avoid it.
The traffic is insane (not just Caribbean-insane or Belgium-insane, but omgwtf-insane); sometimes you see someone driving backwards on the highway because he missed an exit ... cyclists never seem to care about traffic lights, so you need to be vigilant all the time. But once you accept it, it's a free-form, self-organizing traffic system that is surprisingly efficient. And it reminds me of the old game "Frogger" - just in 3D on expert mode.

So anyway, if you are a PHP coder ... the nice people at The Net Circle are really decent folks with a great work environment. You'd have to move to Shanghai, but I guess that's just another job bonus. I had lots of fun there ...

And just when I had returned to Europe I was summoned by the happy people at Lieferheld, a German startup trying to conquer the market for online food delivery ordering. Compared to Shanghai the nice little town of Berlin appears so calm and peaceful, but it's still a great city.
Actually Berlin feels like multiple little towns glued together - every area has a distinct feel and architecture. There's insane stuff like the remains of the Wall that was built through the middle of the city - what on earth were people thinking back then!?
It's an international city in the sense that speaking German is optional. But this just makes it much more accessible and communicative. There's always something happening, so the concept of getting bored is pretty much a theoretical construct. Especially in late spring and summer it's nice, because there are also many little parks and lots of places where you can just chill, grill or get a nice sunburn.

Now if you are a Python programmer, maybe even with some experience in Django, you might find an interesting challenge there.

And there's a good chance that after my stay at Lieferheld I'll be absorbed into the family of OTRS, one of Germany's few real internet companies - when you work online you can be wherever you are, so they have managed to let most of their programmers and IT people work from home.

So blame these people for absorbing way too much of my time - I really want to do more than 2 alibi-commits a month, but these modern slavedrivers just find good ways to absorb all of my time ;)

And during all my travels I've met lots of great people, made some good new friends and learned quite a bit about life, the universe and everything.

So - what's next? We'll see. And if you have some good ideas for me, please, tell me. I'm not opposed to random new experiences ...

Posted by Patrick | Permalink