I ended up fine-grain locking it and removing the possibility of select collisions, which Unix historically had and BSD had not changed until now. Select collisions occurred because there was only one data structure per file to record the thread that was waiting for select events. If another thread waited on the same file and that file triggered an event, every thread in the system waiting in select would have to rescan its select set to see if it was involved in the collision!
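To make the collision problem concrete, here is a rough userland model of the old single-slot scheme. It is only a sketch, not the kernel code: the names old_selinfo, old_selrecord and old_selwakeup are made up for illustration, thread ids are plain ints, and all locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the old scheme: one waiter slot per file. */
struct old_selinfo {
    int  waiter;     /* id of the single recorded waiter, -1 if none */
    bool collision;  /* set when a second thread selects on this file */
};

void old_selinfo_init(struct old_selinfo *si)
{
    assert(si != NULL);
    si->waiter = -1;
    si->collision = false;
}

/* Thread `tid` starts selecting on this file. */
void old_selrecord(struct old_selinfo *si, int tid)
{
    assert(si != NULL);
    if (si->waiter == -1 || si->waiter == tid)
        si->waiter = tid;
    else
        si->collision = true;  /* no room to remember the second waiter */
}

/*
 * The file became ready. Returns how many threads have to rescan,
 * given `nselecting` threads currently asleep in select system-wide.
 */
int old_selwakeup(struct old_selinfo *si, int nselecting)
{
    int woken;

    assert(si != NULL);
    if (si->waiter == -1)
        return 0;
    /* After a collision we no longer know who is waiting: wake everyone. */
    woken = si->collision ? nselecting : 1;
    si->waiter = -1;
    si->collision = false;
    return woken;
}
```

With one waiter the wakeup costs one rescan. As soon as a second thread selects on the same file, the cost in this model jumps to one rescan per selecting thread in the system, whether or not those threads care about the file.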
So my improved select patch actually allocates a data structure per file per thread on every call to select. This was required to make the locking sane as well. I also optimized the rescan when events are detected so that it only rescans the descriptors that signaled events. All in all, the new diff requires several lock operations and an allocation per descriptor.
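The per-file per-thread idea can be sketched in userland C as follows. Again this is only an illustration under my own assumptions, not the actual patch: sel_waiter, sel_file, sel_record, sel_wakeup and sel_rescan are hypothetical names, the locking is omitted entirely, and the "rescan" just collects descriptor numbers instead of polling them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* One record per (file, thread) pair, allocated on each select call. */
struct sel_waiter {
    int  tid;                 /* waiting thread (illustrative int id) */
    int  fd;                  /* descriptor this record was made for */
    bool signaled;            /* set when the file reports an event */
    struct sel_waiter *next;  /* next waiter on the same file */
};

/* Per-file state: a list of waiters instead of a single slot. */
struct sel_file {
    struct sel_waiter *waiters;
};

/* Allocate a record and link it onto the file. Returns NULL on failure. */
struct sel_waiter *sel_record(struct sel_file *f, int tid, int fd)
{
    struct sel_waiter *w;

    assert(f != NULL);
    w = malloc(sizeof(*w));
    if (w == NULL)
        return NULL;
    w->tid = tid;
    w->fd = fd;
    w->signaled = false;
    w->next = f->waiters;
    f->waiters = w;
    return w;
}

/* The file became ready: flag exactly its own waiters and unlink them. */
int sel_wakeup(struct sel_file *f)
{
    struct sel_waiter *w = f->waiters;
    int n = 0;

    while (w != NULL) {
        struct sel_waiter *next = w->next;
        w->signaled = true;
        w->next = NULL;
        n++;
        w = next;
    }
    f->waiters = NULL;
    return n;  /* number of threads that need to look again */
}

/* A woken thread collects only the descriptors whose records were flagged. */
int sel_rescan(struct sel_waiter *const recs[], int nrecs, int ready_fds[])
{
    int n = 0;

    for (int i = 0; i < nrecs; i++)
        if (recs[i] != NULL && recs[i]->signaled)
            ready_fds[n++] = recs[i]->fd;
    return n;
}
```

The price is visible in sel_record: one allocation per descriptor per call. In exchange, a wakeup touches only the threads that actually waited on that file, and each of them only has to re-poll the descriptors that were flagged.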
Despite all of this, Diane Bruce of ircd fame benchmarked it to be slightly faster than the old code while selecting on 8k and 1k file descriptors. Kris Kennaway also benchmarked MySQL and PostgreSQL using the familiar sysbench tests and got the following results:
It is interesting to note that for this test pgsql is significantly faster: ~5200 vs. ~3200 tps at peak. It also scales more linearly from 1 to 8 CPUs. You can also see here how significant the per-CPU scheduler locks were for postgres. Compare the red line to the green line.