barrios kernel story: 2002 kernel trap Ingo Molnar Interview

Only the parts of the 2002 interview that interest me...

http://kerneltrap.org/node/517

Jeremy Andrews: When did you get started with Linux?

Ingo Molnar: i think i first heard about Linux around 1993, but i truly got hooked on kernel development in 1995 when i bought the German edition of the 'Linux Kernel Internals' book. It might sound a bit strange but i installed my first Linux box for the sole purpose of looking at the kernel source - which i found (and still find) fascinating. So i guess i'm one of the few people who started out as a kernel developer, later on learned their way to be a Linux admin and then finally learned their way around as a Linux user ;-)

JA: What was your first contribution to the kernel?

Ingo Molnar: my very first contribution was a trivial #ifdef bugfix to the networking code, which was reviewed and merged by Alan Cox. At that point i'd been lurking on the kernel mailing list for a couple of months already. My first bigger patch was to arch/i386/kernel/time.c, i implemented timestamp-counter based gettimeofday() on Pentiums (which sped up the gettimeofday() syscall by a factor of ~4) - that code is still alive in current kernels. This patch too was first reviewed by Alan Cox.
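[ A rough illustration of the timestamp-counter idea, as a sketch under stated assumptions and not the code from arch/i386/kernel/time.c. It assumes the kernel remembers the wall-clock time and the cycle counter value at the last timer tick and knows a calibrated cycles-per-microsecond figure. The names tick_snapshot, simple_timeval, tsc_gettimeofday and cycles_per_usec are hypothetical. ]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot taken at the last timer interrupt. */
struct tick_snapshot {
    int64_t  sec;             /* wall-clock seconds at the tick */
    int64_t  usec;            /* wall-clock microseconds at the tick (0..999999) */
    uint64_t tsc;             /* cycle counter value at the tick */
    uint64_t cycles_per_usec; /* calibrated CPU speed, assumed constant */
};

struct simple_timeval {
    int64_t sec;
    int64_t usec;
};

/*
 * Interpolate the current time from the last tick plus the cycles that
 * have elapsed since then. Only the cycle counter has to be read, which
 * is much cheaper than querying a hardware timer chip.
 */
struct simple_timeval tsc_gettimeofday(const struct tick_snapshot *snap,
                                       uint64_t tsc_now)
{
    assert(snap->cycles_per_usec > 0);
    assert(tsc_now >= snap->tsc);

    uint64_t delta_usec = (tsc_now - snap->tsc) / snap->cycles_per_usec;
    uint64_t total_usec = (uint64_t)snap->usec + delta_usec;
    struct simple_timeval tv;

    /* Carry whole seconds out of the microsecond field. */
    tv.sec  = snap->sec + (int64_t)(total_usec / 1000000u);
    tv.usec = (int64_t)(total_usec % 1000000u);
    return tv;
}
```

[ On real hardware tsc_now would come from the CPU's cycle counter; here it is passed in as a parameter so the arithmetic can be followed on its own. ]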

I strongly believe that a positive 'first contact' between kernel newbies and kernel oldbies is perhaps the single most important factor in attracting new developers to Linux. Besides having the ability to code, kernel developers also need the ability to talk and listen to other developers.

JA: Did you base the design on any existing scheduler implementations or research papers?

Ingo Molnar: this might sound a bit arrogant, but i have only read (most of the) research papers after writing the scheduler. This i found to be a good approach in the area of Linux - knowing about too many well-researched details can often confuse the real direction we have to take. I like writing new code, and i prefer to approach things from the physics side: take a few elementary rules and build up the 'one correct' solution, no compromises. This might not be as effective as first reading all the available material and then cherry-picking a few ideas and thinking up the remaining things, but it sure gives me lots of fun :-)

[ One thing i always try to ensure: i take a look at all existing kernel patches that were announced on the linux-kernel mailing list in the same area, to make sure there's no duplication of effort or NIH syndrome. Since such kernel mailing-list postings are progress reports of active research, it can be said that i read a lot of bleeding-edge research. ]

JA: How do JVMs trigger an inefficiency in the old scheduler?

Ingo Molnar: the Java programming model prefers the use of many 'threads' - which is a valid and popular application programming model. So JVMs under Linux tend to be amongst the applications that use the most processes/threads, which are interacting in complex ways. Schedulers usually have the most work to do when there are more tasks in the systems, so JVMs tend to trigger scheduler inefficiencies sooner than perhaps any other Linux application.
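[ To make the scaling problem concrete, here is a toy sketch, not kernel code. It assumes a scheduler that picks the next task by examining every runnable task, so the work grows with the task count, and contrasts it with a bitmap over per-priority queues, where the pick costs the same no matter how many tasks are queued. All names and the 32-priority limit are hypothetical. ]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PRIOS 32  /* illustrative: fits in one 32-bit bitmap word */

struct sched_task {
    int prio;  /* 0 is the highest priority */
};

/* O(n): look at every runnable task to find the best one. */
int pick_by_scan(const struct sched_task *tasks, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 || tasks[i].prio < tasks[best].prio)
            best = (int)i;
    }
    return best;  /* index of the chosen task, or -1 if there is none */
}

/* One queue per priority, summarized by a bitmap of non-empty queues. */
struct toy_runqueue {
    uint32_t bitmap;            /* bit p set => priority p has tasks */
    int      count[NUM_PRIOS];  /* number of tasks queued per priority */
};

void rq_enqueue(struct toy_runqueue *rq, int prio)
{
    assert(prio >= 0 && prio < NUM_PRIOS);
    rq->count[prio]++;
    rq->bitmap |= (uint32_t)1 << prio;
}

/*
 * Find the best non-empty priority. The loop is bounded by NUM_PRIOS,
 * not by the number of tasks; real code would typically use a
 * find-first-set instruction instead of a loop.
 */
int rq_pick_prio(const struct toy_runqueue *rq)
{
    for (int p = 0; p < NUM_PRIOS; p++) {
        if (rq->bitmap & ((uint32_t)1 << p))
            return p;
    }
    return -1;  /* runqueue is empty */
}
```

[ With thousands of threads, as in a busy JVM, pick_by_scan touches every task on each decision, while rq_pick_prio does the same bounded amount of work. ]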

JA: You're also the author of the original kernel preemption patch. How did your patch differ from the more recent work Robert Love has done in this area?

Ingo Molnar: it was a small concept-patch from early 2000 that just showed that a preemptible kernel can indeed be done by using SMP spinlocks. The patch, while it booted and appeared to work to a certain degree, had bugs and did not handle the many cases that need special care, which Robert's patches and the current 2.5 kernel handle correctly.

otherwise the base approach is IMO very similar, it has things like:

+               preempt_on();
                clear_highpage(page);
+               preempt_off();

and:

+               atomic_inc_local(&current->may_preempt);        \

which is quite similar to what we have in 2.5 today, with the difference that
Robert and the kernel developer community actually did the other 95% of the work :-)
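[ The general idea of building preemption on top of spinlocks can be sketched in user space as follows. This is an illustration under assumptions, not the original patch or the 2.5 code: each task carries a nesting counter, taking a lock raises it, and a pending reschedule is only acted upon when the counter drops back to zero. The pc_task structure and the toy_ function names are hypothetical. ]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state for this sketch. */
struct pc_task {
    int  preempt_count;  /* greater than 0 means "do not preempt me" */
    bool need_resched;   /* set when another task should run instead */
    int  reschedules;    /* how many times the "scheduler" was invoked */
};

/* Stand-in for the real scheduler: just record that it ran. */
static void toy_schedule(struct pc_task *t)
{
    t->need_resched = false;
    t->reschedules++;
}

void toy_preempt_disable(struct pc_task *t)
{
    t->preempt_count++;
}

void toy_preempt_enable(struct pc_task *t)
{
    assert(t->preempt_count > 0);
    t->preempt_count--;
    /* Only the outermost enable may trigger a reschedule. */
    if (t->preempt_count == 0 && t->need_resched)
        toy_schedule(t);
}

/* A spinlock section is also a non-preemptible section. */
void toy_spin_lock(struct pc_task *t)
{
    toy_preempt_disable(t);
    /* actual lock acquisition omitted in this sketch */
}

void toy_spin_unlock(struct pc_task *t)
{
    /* actual lock release omitted in this sketch */
    toy_preempt_enable(t);
}
```

[ Because the counter nests, code holding two locks stays non-preemptible until the second unlock, which is the point at which a deferred reschedule happens. ]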

JA: Are you also actively working on 2.5 preemptible kernel development?

Ingo Molnar: The maintainer is Robert - i do tend to send smaller preempt related patches (and even a larger one, the 'IRQ lock removal' patch centered around the use of the preemption count). I'm obviously interested in the topic, and i'm happy that all the seemingly conflicting concepts such as lowlatency and preemption are now properly merged into 2.5 and that we have really good kernel latencies. Other pressing topics like the scheduler and the threading code still keep me busy most of the time.

JA: Your IRQ rewrite and Robert's preemptible kernel work have resulted in a unified per-task atomic count (the preempt_count) and a lot of code being cleaned up. Do you have plans to do more work in this area?

Ingo Molnar: not at the moment - right now i think that the IRQ code could hardly be any cleaner than it is today :-)
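[ One way a single per-task count can track several kinds of nesting at once is to give each kind its own bit field, as in the sketch below. The 8-bit field widths, the TOY_ macro names and the helper functions are assumptions made for illustration and are not the kernel's exact layout. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout, assumed for this sketch: 8 bits per field. */
#define TOY_PREEMPT_SHIFT  0
#define TOY_SOFTIRQ_SHIFT  8
#define TOY_HARDIRQ_SHIFT  16

#define TOY_PREEMPT_OFFSET ((uint32_t)1 << TOY_PREEMPT_SHIFT)
#define TOY_HARDIRQ_OFFSET ((uint32_t)1 << TOY_HARDIRQ_SHIFT)

#define TOY_PREEMPT_MASK   ((uint32_t)0xff << TOY_PREEMPT_SHIFT)
#define TOY_SOFTIRQ_MASK   ((uint32_t)0xff << TOY_SOFTIRQ_SHIFT)
#define TOY_HARDIRQ_MASK   ((uint32_t)0xff << TOY_HARDIRQ_SHIFT)

/* Explicit "do not preempt" sections bump the lowest field. */
void toy_count_preempt_disable(uint32_t *count)
{
    *count += TOY_PREEMPT_OFFSET;
}

void toy_count_preempt_enable(uint32_t *count)
{
    assert(*count & TOY_PREEMPT_MASK);
    *count -= TOY_PREEMPT_OFFSET;
}

/* Entering a hardware interrupt handler bumps the hardirq field. */
void toy_irq_enter(uint32_t *count)
{
    *count += TOY_HARDIRQ_OFFSET;
}

void toy_irq_exit(uint32_t *count)
{
    assert(*count & TOY_HARDIRQ_MASK);
    *count -= TOY_HARDIRQ_OFFSET;
}

/* "Are we in interrupt context?" becomes a single mask test. */
bool toy_in_interrupt(uint32_t count)
{
    return (count & (TOY_HARDIRQ_MASK | TOY_SOFTIRQ_MASK)) != 0;
}

/* The task may be preempted only if every field is zero. */
bool toy_preemptible(uint32_t count)
{
    return count == 0;
}
```

[ The appeal of such a layout is that one comparison against zero answers "is it safe to preempt?" regardless of why preemption is currently off. ]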

JA: What other kernel projects are you currently working on?

Ingo Molnar: mainly the scheduler, plus these days i'm working on enhancing the handling of 'threads' under Linux, utilized by the NPTL project done by glibc maintainer Ulrich Drepper. This has a high number of components that are in the 2.5 kernel already.

JA: Can you further describe the components that have already been merged into the 2.5 kernel?

Ingo Molnar: TLS stands for 'Thread Local Storage'. You can find the first announcement of the patch at:

http://lwn.net/Articles/5851/

a number of followup patches were posted, and it all got eventually merged
into 2.5.31.

Plus there were a few other things related to threading:

http://lwn.net/Articles/8131/

http://lwn.net/Articles/8034/

http://lwn.net/Articles/7618/

http://lwn.net/Articles/7617/

http://lwn.net/Articles/7603/

http://lwn.net/Articles/7411/

http://lwn.net/Articles/7408/

(note that most of the above patches got reworked significantly before they
got into the 2.5 kernel, but the concepts were all preserved.)

JA: What other Linux kernel related projects have you worked on in the past?

Ingo Molnar: here's a probably incomplete list of the bigger pieces that made it into the kernel: software-RAID support, 3-level paging on x86 (and highmem), the recent IRQ handling rewrite in 2.5 (which also removed the 'big IRQ lock'), the timer scalability patch, kernel workqueues, the CPU affinity syscalls, the initial SMP pagecache scalability code in 2.3, and i also wrote the original 'writeback pagecache' patch for 2.3, wrote various fixes and enhancements to the 'old' scheduler, wrote the 'wake one' support patch for 2.4, wrote the original zoned allocator, bootmem and mempool subsystems. Ie. all across the spectrum.

One project that is not in the 2.5 kernel is the Tux webserver (and now FTP server as well). If you want to see a Tux/FTP server that can serve 10,000 users then do:

ftp ftp.rpmfind.net

some smaller but interesting patches: the NMI watchdog, the ability of the 2.4 kernel to create more than ~4000 processes on x86 (ie. the removal of per-thread TSS), netconsole/netdump, 'big reader locks', and one older patch from 2.2 times i'm particularly proud of: i wrote the original 'current task pointer' implementation, which uses the stack pointer to get to the 'current task pointer' on SMP systems. I also wrote the 'memleak' and 'ktrace' debugging helper tools, which have been picked up by other projects.
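[ The stack-pointer trick can be illustrated in user space. The sketch below assumes the task structure sits at the lowest address of a power-of-two sized and equally aligned stack block, so that masking any address inside the block yields the structure. The 8192-byte block size and all toy_ names are assumptions for illustration, not the kernel's definitions. ]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_STACK_SIZE 8192u  /* assumed: power of two, block aligned to it */

struct toy_task_struct {
    int pid;
};

/*
 * Allocate one aligned block that holds the task structure at its base;
 * the rest of the block would serve as that task's stack.
 */
struct toy_task_struct *toy_alloc_task(int pid)
{
    void *block = aligned_alloc(TOY_STACK_SIZE, TOY_STACK_SIZE);
    if (block == NULL)
        return NULL;

    struct toy_task_struct *t = block;
    t->pid = pid;
    return t;
}

/*
 * Recover the task from any address inside its block by rounding the
 * address down to the block boundary. No global variable or per-CPU
 * lookup is needed, only the (stack) pointer itself.
 */
struct toy_task_struct *toy_current_from_sp(const void *sp)
{
    uintptr_t base = (uintptr_t)sp & ~((uintptr_t)TOY_STACK_SIZE - 1);
    return (struct toy_task_struct *)base;
}
```

[ In a kernel, sp would be the CPU's stack pointer register, so each CPU finds its own running task from nothing but its own stack. A block obtained from toy_alloc_task is released with free(). ]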

JA: Your list of contributions is staggering!

Ingo Molnar: well, it's just that i've been around long enough, and that i'm interested in many different areas. So a colorful mix of contributions piled up.

JA: Are you still working on the Tux webserver?

Ingo Molnar: occasionally yes, but other things take precedence currently. But life has not stopped, eg. Anton Blanchard has ported Tux to 2.5, and Arjan van de Ven keeps the 2.4 patch up to date.

JA: What still needs to be modified in the generic kernel?

Ingo Molnar: it's mainly two VFS changes, an exit()-time cleanup function and one new TCP event callback. All the 'big' features that were induced by TUX are in the 2.5 kernel already, zerocopy and the scalability work, so TUX for 2.5 is a really unintrusive patch.

JA: Of all these many impressive accomplishments, which are you the most proud?

Ingo Molnar: well, perhaps the scheduler, it manages to solve a few really hard conceptual problems in a pretty critical piece of code that already got called a couple of thousand times while eg. reading this article on a Linux box! :-)

JA: What is your background in programming prior to getting involved with Linux?

Ingo Molnar: well, like many others, i grew up on programming all possible (and even some impossible) aspects of Commodore micro-computers, since age 11. Completely knowing a greatly simplified but fully functional computer architecture helped a lot in kernel development.

I think kids today have a harder time, since hardware vendors are much more tightlipped about computer internals, and the complexity of computer systems skyrocketed as well. Linux perhaps helps here too, as a central 'documentation' and reference implementation for "all computer internals that matter".

JA: Much of your work seems to be focused on improving the performance and scalability of the 2.5 kernel. Is this the result of RedHat's product requirements, or your own interests?

Ingo Molnar: well, i'm in the fortunate position that the two are a perfect match.

JA: Can you describe your development environment, including the hardware and software tools you typically use?

Ingo Molnar: i use all the normal text based kernel development tools: vim, gcc/make/etc., i use a serial line to a test-system to debug kernels, and that's all. I like it simple when reading kernel code: i use text consoles (on an LCD screen) to do most of my development work. Occasionally i drop into X for tools that make sense only there, such as ethereal or some of the BK tools.

JA: Have you worked with any other open source kernels?

Ingo Molnar: not really. I occasionally take a look at FreeBSD - some things they do right, some things they don't. In the areas i'm most interested in, the Linux kernel is currently ahead both design-wise and implementation-wise. Finally we caught up in the VM subsystem as well, with Andrea's big and important 2.4 rewrite, Rik's great rmap code and Andrew's fantastic integration work. But what other answer would one expect from a Linux kernel developer? :-)

JA: FreeBSD 5.0 is due to be released around December of this year, with some significant changes to the kernel. Have you followed this development?

Ingo Molnar: not really. The thing i sometimes do is look at their code. Also, when i search for past discussions regarding some specific topic, sometimes there's a FreeBSD hit and then i read it. That's all i can tell. But i do wish their kernel gets better just as much as the Linux kernel gets better, there needs to be competition to drive both projects forwards. (the Windows kernel is closed up enough so that it does not create any development stimulus for Linux (and vice versa). Rarely do any Windows features get discussed.)

JA: What areas of the Linux kernel do you think still lag behind FreeBSD?

Ingo Molnar: there were two areas where i think we used to lag, the VM and the block IO subsystem - both have been significantly reworked in 2.5. Whether the VM got better than FreeBSD's remains to be seen (via actual use), but the Linux VM already has features that FreeBSD does not have, eg. support for more than 4 GB RAM on x86 (here i guess i'm biased, i wrote much of that code). But FreeBSD's core VM logic itself, ie. the state machine that decides what to throw out under memory pressure, how to swap and how to do IO, is top-notch. I think with Andrew Morton's and Jens Axboe's latest VM and IO work we are top-notch as well (with a few extras perhaps).

There's also an interesting VM project in the making, Arjan van de Ven's O(1) VM code. [without doubt i do appear to have a sweet spot for O(1) code :-) ] Rik van Riel has merged Arjan's code a couple of days ago. The code converts every important VM algorithm (laundering, aging) to a O(1) algorithm while still keeping the fundamentals - this is quite nontrivial for things like page aging. It's in essence the VM overhead reduction work that Andrea Arcangeli has started in 2.4.10, brought to the extreme. I have run Arjan's O(1) VM under high memory pressure, and it's really impressive - kswapd (the central VM housekeeping kernel thread), which used to eat up lots of CPU time under VM load, has almost vanished from the CPU usage chart.

I do have the impression that the Linux VM is close to a conceptual breakthrough - with all the dots connected we now have something that is the next level of quality. The 2.5 VM has merged all the seemingly conflicting VM branches that fought it out in 2.4, and the many complex subsystems involved suddenly started playing in concert and produce something really nice.

JA: A much earlier version of the rmap code was originally in the 2.4 kernel, but got ripped out. Do you feel it has improved enough that this won't happen again?

Ingo Molnar: this most definitely won't happen. We already rely on rmap for some other features, so it's not just a matter of undoing one patch. Rmap is essential to the new VM, without rmap the VM would be like a Ferrari with an old diesel motor - looks good but is pretty unusable.

the problem of rmap in 2.4 was simply its complexity, relative youth as a project and the relative low number of people that tested it. So in 2.4 it would have been quite a stretch to keep it in. But it was a fair game for 2.5, and with Andrew's simplification/robustization/speedup of Rik's rmap code it was very manageable.

JA: What other major improvements have gone into 2.5, beyond the scheduler and VM rewrites?

Ingo Molnar: the block IO rewrite, lots of VFS changes, a rework of the module code and (plug) the new threading implementation. The block IO rewrite was long overdue and that's the one i'm most happy about.

JA: Do you feel the changes are significant enough to call the next major kernel 3.0 instead of 2.6?

Ingo Molnar: well, i do think they are significant enough to be called 3.0 - on the other hand it might not matter much whether it's called 2.6 or 3.0, after all what ordinary people know about is this new shiny Linux 9.0 release, right? ;)

JA: Looking into the future, what do you see in store for the next development kernel, version 2.7?

Ingo Molnar: no idea, really, i don't think trying to look into the future brings many fruits, the kernel needs to handle what is available here and today. Sometimes we are lucky and create stuff that happens to work for years :-) Perhaps something like OpenMosix would be nice to have in the kernel. Plus even better (native) support for User Mode Linux. Things like this.

JA: Do you have any advice to offer those aspiring to become productive kernel developers?

Ingo Molnar: only the old mantra: to read the source and the mailing lists. And take it easy - do what you like doing most.


2002 kernel trap Ingo Molnar Interview Dec 11, 2008
