I’ve had a bunch of posts like this so far, but that’s not a bad thing. Sepherosa Ziehau has a pair of optimizations that appear to make performance with big pipes (1G) and tiny packets (18b, if I read correctly) reach near the physical maximum for 1000-base-T Ethernet.
Category: Committed Code
Francois Tigeot has been working for quite a while on a VFS accounting system. It doesn’t restrict to a quota (yet), but it will give you byte totals for each mounted filesystem. It has been committed, so it looks like a good way to tell which PFS is eating your disk.
Update: Francois pointed out he’s still adding parts for this. So it’s not quite done yet, but soon.
Buildworlds are now much faster, because they can run themselves in parallel. Invoke it using the -j option to make. Matthew Dillon saw a 25% reduction in time when using ‘make -j 12 buildworld’ on a 4-core system. You may need to manually update xinstall and mkdir:
cd /usr/src/usr.bin/xinstall make clean; make obj; make all install cd /usr/src/bin/mkdir make clean; make obj; make all install
It’ll also use more memory than a non-parallel build, but heck, that’s cheap these days.
Venkatesh Srinivas made a minor change to a ddb backtrace – it now prints the raw instruction pointers. On x86_64, a backtrace would not print the correct objects out, so this is better. It’s a minor change, but I’m pointing it out because it totally helped solve a problem for me on a package-building machine.
The general rule of thumb is that if you have a function written in an interpreted language (Perl, Python, etc.), it’ll be faster in C. If you need it faster than that, you go to assembly. Prepare to have your world rocked: Venkatesh Srinivas found that strlen() in libc was actually slower written in assembly than in C. His commit message has numbers to back that up.
It’s another throughput tweak from Sepherosa Ziehau: soaccept is run differently when pulling in network data from a socket. The commit message once again shows the results of the change using httperf.
Binutils in DragonFly is now up to version 2.22 – the commit linked is one of several.
Some time ago, Matthew Dillon worked on a bulk build system that built as much of pkgsrc in parallel as possible. It’s in the tree now as ‘fastbulk‘, for anyone wanting to try it out. I used it a bit; I didn’t measure the degree of speed increase, but was able to get about 70% of the packages built.
Sepherosa Ziehau has implemented another networking speedup. Read the commit message for details on what he changed, since it’s rather in-depth. He shows an 18% improvement in netperf results.
If you’re tracking DragonFly current, you will need to do a full buildworld on your next update. Sepherosa Ziehau made some changes in route(8) that a quickworld will not catch.
Alex Hornung has created ‘dfregress’, a test framework designed to be as simple as possible for adding tests to DragonFly. This would make it easier to verify an upcoming release is correct, for instance. See his commit note for extensive details, and add a trivial test for anything you value.
This is another one of those features that I bet goes away, and nobody would notice because nobody uses it any more. Sascha Wildner has removed AppleTalk from DragonFly.
DragonFly has a new memory allocator, called (not surprisingly) “dmalloc“. It’s only present on x86_64, not i386, because it could eat up more VSZ (virtual memory) than an i386 kernel may have available.
The presence of /usr/include/crypt.h in DragonFly (starting in December 2010) meant that some programs compiled during that time will expect that file to always be there. It was recently removed, so any programs compiled in that timeframe will also need to be recompiled. Right now, this affects you only if you are running DragonFly 2.13 , since that’s the only place crypt.h was removed. This may be an issue for the release, but we’ll worry about that when we get there… I’m kicking off new 2.13 bulk builds now.
You can now have, in theory, up to 32 terabytes of RAM on your 64-bit DragonFly system, from a change made by Matthew Dillon. I’m curious to see if anyone has even 1 terabyte, as that’s at least feasible.
Matthew Dillon wrote up an explanation of how performance on systems with a lot of CPU cores has been significantly improved – up to 300%! (He says 200%, but I think he’s treating it as a percentage of a whole rather than percent changed.) Apparently finally getting rid of lock contention is the trick.
Antonio Huete Jimenez’s ‘libhammer‘, a library to make various Hammer functions available to userland programs, has been added. It implements ‘hammer info’ only at this point, if I understand correctly.
From what I can tell, Sepherosa Ziehau’s made some changes where you can control TCP timeout and keepalive timing on a per-tcpcb basis, or at least that’s what I gleaned from the docs. He’s been doing a lot of work lately, but it’s hard to link to because so much of it is at a basic level that makes it difficult to summarize in terms of how the features affect the user.
Sascha Wildner updated time zone files again. It’s a regular thing, but I wanted to draw attention to this little change:
Samoa moves from east to west of the international date line (changes from UTC-11 to UTC+13). It will skip December 30, 2011.
2011/12/30 in Samoa will never exist or have existed, which is entirely odd.
If you’re running 64-bit DragonFly, and you’re on version 2.11, you will want to rebuild with the latest sources. Peter Avalos found a bug with file descriptor passing, and Venkatesh Srinivas fixed it. It will require a quickworld/kernel build – maybe a full buildworld and kernel? I’m not sure. Some pkgsrc packages might need recompilation, too if they also passed file descriptors around.
Well, if you tell it to do so. Matthew Dillon has added a user-settable limit to the amount of memory used during deduplication, so if your Hammer-using system is low on RAM, you can conserve. This is probably most useful if you are running DragonFly in an extremely small VM, or if your name is Venkatesh.
(inside joke; Venkatesh has a crazy old desktop for DragonFly.)
I really just like that phrase and the action movie feeling of using it, like “Watch out! The pulse-width modulated time-domain multiplexer is targeting us!” Sorta like a PU-36 space modulator. It’s actually a recently-committed mechanism to improve write performance in Hammer, but my idea sounds more exciting.
Alex Hornung has made a pile of changes for disk encryption, including adding libdm, a “simple BSD-licensed libdevmapper“,and adding tcplay, a 100% compatible implementation of TrueCrypt. This should make you very happy if you like running from an encrypted disk.
Update: Alex has written an in-depth explanation of this work. It’s a huge change!
Update update: Hey, it’s showing on Hacker News too!
If you’re running a recent version of DragonFly 2.11, it’s worth updating. Matthew Dillon fixed a networking bug that I’ve seen cause problems. It was introduced within 2.11’s lifetime, so as far as I know, this won’t affect anyone on 2.10.
Matthew Dillon has made some changes to AHCI support; if you have an Intel motherboard with an SSD drive that occasionally doesn’t want to co-operate on a cold boot, this recent update may fix it.
The i386 architecture now supports LAPIC and I/O APIC. If you had weird interrupt problems when installing DragonFly before, now might be a good time to try the latest bleeding-edge version of DragonFly and see if the problem vanished.
It looks like Sepherosa Ziehau is working on getting multiprocessor kernels able to boot on single-processor systems. This makes life a bit easier, since there’s only one kernel needed for any given processor. I don’t know if it’s in a finished state yet.
If you have a really old DragonFly system, meaning you’ve been upgrading it since version… 1.8 (I think?), you may have libpthread linked to libc_r instead of libxu. This means that if you have a system that old, you will now need to set THREAD_LIB or just recompile your pkgsrc programs on your next upgrade to something after DragonFly 2.10. I don’t think this is going to apply to a lot of people.
(I hope I got the lib details right…)
ipfilter has now been removed from DragonFly, by Sascha Wildner. We now have “only” ipfw2 and pf for software firewalls.
John Marino has updated the GNU Debugger (GDB) from version 7.0 to version 7.2. The lengthy commit message describes how surprisingly complex the upgrade proved to be.
John Marino’s gone on a tear and updated GCC to version 4.4.6, diffutils from 2.87 to 3.0, and texinfo from 4.8 to 4.13. Each commit message that I linked to has plenty of notes on what’s different, so they’re worth following. This is the first update for texinfo in 6 years, so the quantity of updates is not surprising.
Then use the new LINT64 config added by Sascha Wildner. LINT kernels have every option turned on, so it’s pretty easy to have problems due to conflicts or untested parts and so on. You probably won’t get a kernel out of it, but now there’s a comprehensive list of your 64-bit kernel options for when you’re building a kernel that works.
GNU grep on DragonFly has been updated from version 2.4d to 2.7. Other BSDs have switched/will switch to bsdgrep, but as John Marino points out in his commit message, GNU grep’s still faster. He’s also brought in NetBSD’s version of sort, to replace the GNU flavor. I don’t know why on that one.
Peter Avalos also updated file to 5.06.
Sepherosa Ziehau has made some changes to default SCI settings in ACPI. This may make it possible to boot a computer, or to boot a computer with ACPI, that did not boot before. If it causes problems, he lists some various tunables to set. Just don’t ask me what SCI does.
Sascha Wildner has moved gcc in DragonFly to a slightly newer version: 4.4.5. It mostly seems to make things easier to compile, going by the reports I’ve heard. This is the version that will be in DragonFly 2.10.
Enabling the vfs.hammer.double_buffer=1 sysctl will greatly improve Hammer performance when you’ve exceeded your memory cache (at a possible slight penalty when you have not) and also speed things up when using live deduplication.
Update: Venkatesh Srinivas says:
“double_buffer makes sense when: 1) you want all CRCs to be checked on reads. 2) you’re running live dedup and care about dedup performance rather than say read-heavy performance; 3) you have swapcache but are often running into the vnode limit in what you can cache.”
So, not always useful.
Also, Matthew Dillon has made version 6 the default version of Hammer in DragonFly 2.9. Version 6 has improved handling of directory names in some circumstances. Just don’t ask me which, cause I lost track. It’s been a hard day!
If you’re using pf to control how your bandwidth is used, you may want to look at the recent fairq updates from Matthew Dillon. It should perform better now in situations where one traffic group is saturating its available bandwidth. Here’s a handy link that explains this sort of problem, yoinked from IRC.
John Marino’s work on getting support for DragonFly ‘natively’ into binutils, upstream, has been successful. Thanks, John!
There’s two recent changes for pkgsrc and DragonFly:
- DragonFly-current (2.9) now pulls the most recent pkgsrc quarterly release (2010Q4) by default, instead of pkgsrc-current. This means more packages will be working with the default setup, plus pkg_radd and other tools will be pulling the same ‘generation’ of software.
- The DragonFly/git version of pkgsrc can now be created as a shallow clone. This means less file history, but also means a much faster download.
A full buildworld/buildkernel is probably the best strategy. I’ll be rebuilding all the pkgsrc packages for 2.9 using gcc 4.4… This will take at least a week.
Matthew Dillon’s improved bridging to the point where you can now modify the MAC address of the bridge and most everything, including ARP, will come from it correctly. It’s even possible to bond 2 or more interfaces together, with the side effect of dragonflybsd.org having a lot more bandwidth.
Update 2: More notes here.
Matthew Dillon has continued his bridge work, with another commit adding various features. Go, read.
Matthew Dillon has added transparent bridging, mostly to overcome issues with the AT&T DSL modem he’s using. With this non-default feature, IP packets retain the original MAC address when retransmitted through a new interface.
As Matthew Dillon notes in a recent post, procedures are now assumed to be MPSAFE (i.e. without the Giant Lock) by default. Any new work should follow this idea, and it doesn’t have to be documented specially. The inverse used to be true, where the code that happened to work without the Lock was rare, and therefore needed to be pointed out. Now, the good result is the norm.
Matthew Dillon’s made some scheduler changes, which blogbench tests are showing give the default scheduler better performance under heavy load. It’s a pretty technical writeup, so I’ll just point you at it rather than attempt to summarize.
Matthew Dillon’s made some changes that will speed up the booting process for people with a ridiculous amount of memory, like 64G. This is x86_64 only, but that should not be a surprise if you think about it.