This also has results for SCHED_4BSD, the old FreeBSD scheduler with no affinity. SCHED_ULE is the version of ULE that doesn't use per-CPU locks, and SCHED_SMP is ULE with per-CPU locks and various other tuning. I have had better results than those on the graph, but I don't feel like regenerating it right now.
Linux has definitely improved, but it still has some significant problems. I will try with tcmalloc later.