Performance tuning

Matthew Dillon did some more performance tuning for DragonFly.  I’ll just pull a paragraph from the commit message, since that will have more impact than anything I say:

Improves fork/exec concurrency on monster of static binaries from 14200/sec to 55000/sec+. For dynamic binaries improve from around 2500/sec to 9000/sec or so (48 cores fork/exec’ing different dynamic binaries). For the same dynamic binary it’s more around 5000/sec or so.

“monster” is a 48-core machine used for testing.
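
For a rough feel of what a fork/exec rate measures, here is a minimal sketch in Python. It is not the test behind the commit; the function name `fork_exec_rate`, the worker count, and the duration are all made up for illustration. Every worker runs the same binary, so it is closest to the "same dynamic binary" case in the quote.

```python
import os
import sys
import time


def fork_exec_rate(argv, workers=4, duration=1.0):
    """Return aggregate fork/exec completions per second (hypothetical helper).

    Starts `workers` processes; each one forks, execs argv, and waits for
    the child in a loop for about `duration` seconds.
    """
    read_fd, write_fd = os.pipe()
    start = time.monotonic()
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Worker process: always leave via os._exit so it never
            # returns into the caller's code.
            try:
                os.close(read_fd)
                count = 0
                deadline = time.monotonic() + duration
                while time.monotonic() < deadline:
                    child = os.fork()
                    if child == 0:
                        try:
                            os.execv(argv[0], argv)
                        finally:
                            # Only reached if exec failed.
                            os._exit(127)
                    os.waitpid(child, 0)
                    count += 1
                # Short writes to a pipe are atomic, so lines do not interleave.
                os.write(write_fd, b"%d\n" % count)
            finally:
                os._exit(0)
        pids.append(pid)

    os.close(write_fd)
    for pid in pids:
        os.waitpid(pid, 0)
    elapsed = time.monotonic() - start

    # All writers have exited, so reading runs to end-of-file.
    with os.fdopen(read_fd) as results:
        total = sum(int(line) for line in results)
    return total / elapsed


if __name__ == "__main__":
    # Exec'ing the Python interpreter is slow; it is used here only because
    # its path is always known. Substitute a small binary for real numbers.
    print(fork_exec_rate([sys.executable, "-c", "pass"], workers=4))
```

On a machine like monster, the worker count would presumably be set to the core count (48), and `argv` would point at a small static or dynamic binary. The numbers from an interpreted harness like this will not be comparable to the figures in the commit message.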