I finished and committed the topology aware cpu scheduling that I discussed in earlier posts. I also implemented a mechanism for CPU provisioning that you can use to restrict groups of processes to sets of cpus which can be dynamically migrated. This will be useful for restricting jails to certain CPUs or dedicating some CPUs to real-time special-purpose tasks for example.
Over the last few days I cleaned up my cpu switch optimizations and got those in. The results are 25% faster context switching in a yield benchmark. Even faster than linux on the same hardware. Some day I'll put open solaris on so I have something else to compare to.
Separate from the other switch benchmarks I've been working on reimplementing amd64's context switching routine almost entirely in C. I just wanted to do it because we're putting more complex things in and it was getting hard to find registers but it turns out you can make it much faster too. The yield benchmark is another 10% faster with the C switch routine. Mostly due to enabling more complex checks, like not setting MSR_FSBASE/GSBASE if they haven't changed, and getting uncommon code out of the fast path.