jeffr_tech ([info]jeffr_tech) wrote,
@ 2007-06-18 19:33:00
I have updated my scaling results with Linux 2.6.21.5 and glibc 2.6. You can see them at http://people.freebsd.org/~jeff/sysbench.png.

The graph also includes the old FreeBSD scheduler, SCHED_4BSD, which has no affinity. SCHED_ULE is the version of ULE that doesn't use per-CPU locks, and SCHED_SMP is ULE with per-CPU locks and various other tuning. I have actually had better results than those on the graph, but I don't feel like regenerating it right now.

Linux has definitely improved, but it still has some significant problems. I will try with tcmalloc later.




Possible problem
(Anonymous)
2007-06-19 03:28 am UTC (link)
Assuming the glibc you are using is doing the optimal malloc thing now, you
still need the following patch to the kernel which AFAIK is not in 2.6.21.x.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=0a27a14a62921b438bb6f33772690d345a089be6

If you don't know whether your glibc is doing the right thing -- well if the
above patch changes anything, then it is going through that path, and if not
then it is not :)


Re: Possible problem
[info]jeffr_tech
2007-06-19 04:34 am UTC (link)
The numbers changed slightly but it's hard to say because there is significant variance.

I don't believe this is the root cause. It looks like the scheduler is simply not keeping all of the processors busy.


Re: Possible problem
(Anonymous)
2007-06-19 04:42 am UTC (link)
>The numbers changed slightly but it's hard to say because there is
>significant variance.

Then I'd guess that your glibc is still using the old path for allocation /
freeing.


>I don't believe this is the root cause. It looks like the scheduler
>is simply not keeping all of the processors busy.

That's what it looks like because there is significant contention on the
mmap_sem semaphore which is shared among all threads. The patches fix that.


Re: Possible problem
[info]jeffr_tech
2007-06-19 04:49 am UTC (link)
I downloaded glibc 2.6 from ftp.gnu.org. How do I verify whether or not this is using the 'new' allocation routines?


Re: Possible problem
(Anonymous)
2007-06-19 04:53 am UTC (link)
I'm not completely sure from the version number -- I'm not involved with glibc
development too much.

You could strace the MySQL daemon and see whether it is making use of
madvise(MADV_DONTNEED) or whether it is doing a lot of mmap() calls. The
former is what you want.
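One way to make that check less of an eyeball exercise is to save the strace output to a file and count the relevant calls. The following is only a minimal sketch: the function name, the regular expression, and the sample lines are illustrative assumptions, not captured output from a real MySQL run.

```python
import re
from collections import Counter

# Matches the syscall name at the start of a strace line, optionally
# preceded by a PID column such as "[pid  4242] " or "4242 ".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def tally_allocator_syscalls(strace_text):
    """Count mmap/munmap calls and madvise(MADV_DONTNEED) calls."""
    counts = Counter()
    for line in strace_text.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if not match:
            continue
        name = match.group(1)
        if name == "madvise" and "MADV_DONTNEED" in line:
            counts["madvise_dontneed"] += 1
        elif name in ("mmap", "mmap2", "munmap"):
            counts[name] += 1
    return counts


# Hypothetical sample lines, for illustration only.
SAMPLE = """\
[pid  4242] madvise(0x7f3a10000000, 1048576, MADV_DONTNEED) = 0
[pid  4243] mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a20000000
[pid  4242] madvise(0x7f3a10100000, 524288, MADV_DONTNEED) = 0
[pid  4243] munmap(0x7f3a20000000, 1052672) = 0
"""

if __name__ == "__main__":
    print(dict(tally_allocator_syscalls(SAMPLE)))
```

If the madvise count dominates under load, the allocator is probably on the newer path; a steady stream of mmap/munmap pairs would suggest the older one.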


Re: Possible problem
[info]jeffr_tech
2007-06-19 04:59 am UTC (link)
The malloc implementation is using MADV_DONTNEED to shrink the heap.

I have uploaded a new graph, generated after removing some debugging support that was built into the kernel.


Re: Possible problem
(Anonymous)
2007-06-19 05:15 am UTC (link)
Weird. It looks very different from the results I get, which have Linux
following a similar curve to your FreeBSD results.

http://www.thisishull.net/showpost.php?s=5d2bfa8b5a0707286a86d7c57a2c6308&p=1010222&postcount=2

sysbench transactions per sec (higher is better)

kernel is 2.6.21
threads   unpatched tps   patched tps
      1             453           476
      2             831           871
      4            1468          1529
      8            2139          2235
     16            2118          2177
     32            1051          2120
     64             917          1949

With the patched kernel, you can see it's only losing about 5% between the
peak at 8 threads and 32 threads (which looks roughly in line with what you
see).
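The "about 5%" figure can be checked directly from the table above. This is a small sketch that recomputes the drop from peak for both kernels; the numbers are copied from the table, while the dictionary and function names are just illustrative.

```python
# Figures copied from the table above: threads -> (unpatched tps, patched tps).
RESULTS = {
    1: (453, 476),
    2: (831, 871),
    4: (1468, 1529),
    8: (2139, 2235),
    16: (2118, 2177),
    32: (1051, 2120),
    64: (917, 1949),
}

UNPATCHED, PATCHED = 0, 1


def drop_from_peak(results, column, threads):
    """Percent of throughput lost at `threads` relative to the column's peak."""
    peak = max(row[column] for row in results.values())
    return 100.0 * (peak - results[threads][column]) / peak


if __name__ == "__main__":
    for label, column in (("unpatched", UNPATCHED), ("patched", PATCHED)):
        loss = drop_from_peak(RESULTS, column, 32)
        print(f"{label}: {loss:.1f}% below peak at 32 threads")
    # unpatched: 50.9% below peak at 32 threads
    # patched: 5.1% below peak at 32 threads
```

Both columns peak at 8 threads, so the comparison is between the same two points in each case.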

OTOH, I was testing with a patched 2.4 glibc IIRC, so maybe something else
is happening in 2.6. I'll have to retest it.


Re: Possible problem
[info]jeffr_tech
2007-06-19 05:20 am UTC (link)
Yes, that does look consistent with my findings. Can I see the patch you have for 2.4? I could see how the diff looks against the 2.6 sources I have.


Re: Possible problem
(Anonymous)
2007-06-19 05:26 am UTC (link)
I was using Jakub's uploaded glibc from this post (sorry, it was
2.5.x, not 2.4)

http://www.ussg.iu.edu/hypermail/linux/kernel/0704.2/2064.html

The rpms appear to no longer be available, but I assume his patch
is the only deviation from the glibc cvs at the time.


Re: Possible problem
[info]jeffr_tech
2007-06-19 05:35 am UTC (link)
Thanks, it looks like most of this patch went in, but the chunk that tries MADV_FREE is not in. The rest of the accounting was checked in. I'm testing that now.

Is this Nick Piggin, btw?


Re: Possible problem
[info]jeffr_tech
2007-06-19 05:40 am UTC (link)
This did not seem to make a difference. There may be some other contention that is masking the effects of this patch. How do I see what resources we're contending on in Linux? Is there some lock profiling or tracing?


Re: Possible problem
(Anonymous)
2007-06-19 05:50 am UTC (link)
Yeah this is Nick P.

The MADV_FREE hunk is not relevant to upstream kernels, it was just implemented
to test some other ideas.

It could well be the case that there is some other contention introduced from
somewhere. I don't think there is a good lock profiling infrastructure
upstream yet -- especially not for sleeping locks where you can't just wing
it by counting CPU time.

If you boot with `profile=schedule,1`, and use the regular readprofile tool
over the running test, it should profile the number of places that invoke a
context switch -- if semaphore functions show up highly here, then it will
be because they are contended.

OTOH, that doesn't always work very well, because it can get swamped by other
things (especially if your semaphore hold times are longish, so you don't have
a context switching frenzy, but neither do you have enough runnable threads).
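To read the result, it can help to rank the readprofile output and see what share of the samples lands in lock-related functions. The sketch below assumes the usual three-column readprofile layout (tick count, symbol, normalized load) with a trailing "total" row. The sample lines and the name hints used to spot semaphore functions are hypothetical, and substring matching is crude, so treat the share as a rough indicator only.

```python
def parse_readprofile(text):
    """Return (symbol, ticks) pairs sorted by ticks, highest first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        ticks, symbol = int(fields[0]), fields[1]
        if symbol == "total":
            continue  # summary row, not a real symbol
        rows.append((symbol, ticks))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def lock_share(rows, hints=("sem", "mutex", "__down")):
    """Fraction of ticks in symbols whose name contains any hint.

    The default hints are an assumption about how sleeping-lock
    functions are named; adjust them for the kernel being profiled.
    """
    total = sum(ticks for _, ticks in rows)
    locked = sum(t for sym, t in rows if any(h in sym for h in hints))
    return locked / total if total else 0.0


# Hypothetical sample, for illustration only.
SAMPLE = """\
  5120 rwsem_down_read_failed   42.6667
  1280 schedule_timeout          8.0000
   640 __down                    5.0000
   960 do_select                 1.2000
  8000 total                     0.0031
"""

if __name__ == "__main__":
    rows = parse_readprofile(SAMPLE)
    for symbol, ticks in rows:
        print(f"{ticks:8d}  {symbol}")
    print(f"lock-related share: {lock_share(rows):.0%}")
```

A high share here would point at a contended semaphore, subject to the caveat above about long hold times hiding the problem.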


new Linux scheduler
(Anonymous)
2007-06-21 10:05 pm UTC (link)
Jeff, what do you think of the new Linux scheduler? -- http://kerneltrap.org/node/8059
