jeffr_tech ([info]jeffr_tech) wrote,
@ 2007-01-02 22:15:00
Previous Entry  Add to memories!  Tell a Friend  Next Entry
ULE 2.0
I've been undertaking some major updates to ULE. Seems like a good subject to get back into my tech journal with. This is a long discussion of scheduling mechanisms. I'll be impressed if you finish it.


I was never very happy with the double-queue mechanism for ensuring fairness that I borrowed from the Linux O(1) scheduler. Briefly, O(n) schedulers would typically iterate over every thread that was waiting to run and slightly boost its priority to ensure that it would eventually run. Think of a thread which had used 100% cpu for some time. It would have a poor priority and would be placed on the end of the queue. If many new threads with lower priorities now want to run they will starve the original thread until their priorities also drop. Initially that's what you want, but eventually it should be allowed to run again, and more quickly than it takes to decay the priorities of other threads otherwise the latency is intolerable.

Obviously this algorithm forces us to periodically visit each scheduled thread and to keep accurate cpu% estimation we would visit every thread, scheduled or not. To get O(1) scheduling, Linux and ULE 1.0 employed a double queue mechanism where threads are inserted into one queue and run from another. These queues are swapped when the running queue is empty. Some high priority threads are always inserted into the running queue.

This gives us O(1) but also some peculiar scheduling features. Firstly, it's very difficult to implement nice correctly. When I evaluated the linux scheduler years ago it just punted on the problem. This is basically because any task which is not always inserted into the active queue, which are the interactive tasks, ends up being semi-priority order round-robin. They all get one turn for each flip of the queues and everyone's turn has about the same time quantum allocated to it. If your priority is better you simply get your quantum sooner and then you wait for the flip again. I gave some response to nice values by altering the scheduling quantum based on nice value but this was more complicated than it sounds.

For fair scheduling, what we'd really like is to give all userspace threads a proportional share of the CPU available for batch jobs with the option of user adjustment via nice. To do this, the old BSD scheduler is essentially modeling how much cpu each process has used recently. The priority is adjusted to reflect this and all threads are run in priority order. The problem 4BSD had was there is no distinction between batch and interactive tasks. Threads simply get higher priority treatment when they run infrequently. In ULE I solved this by having an interactivity score which takes into consideration how much time the thread ran vs how much time it voluntarily slept. This differs from how much time it ran vs overall system time and is an important distinction.

Back to the two queues.. There were other practical problems with having two queues. Firstly, it is expensive in memory footprint and cache hits. The priority queues are big datastructures. Accessing many of them per cpu has a measurable impact. Secondly, having threads on multiple queues caused problems for the rest of the system. If a thread waiting to run needs its priority elevated for priority propagation do we have to move it from one queue to another? How does that effect our queue-flipping and starvation strategy?

So between the poor performance, the quirky batch job scheduling behavior, and the difficulty in manipulating priorities for the rest of the system, I was never quite satisfied with the double queues. To replace it, I've used a circular queue inspired by a calender queue I used in a real-time callout system. Imagine one bucket per priority with an insertion index. The priority is
relative to the insertion index. As time moves on the insertion index is moved down and a thread that was inserted with a worse
priority is elevated relative to this index without specifically visiting each thread. We have only one queue so there is little extra cpu overhead and we have more flexibility in deciding how much runtime each thread gets relative to those with a lower priority. ULE only uses this queue for batch jobs. For jobs that ULE considers interactive we place them on the same queue we place realtime and interrupt threads. They are always serviced before any non-realtime threads and they are quickly punted to the batch queue if they overuse the cpu.

In the old system priorities were decided based on the interactivity score, which is the ratio runtime to involuntary sleeps. If you have many threads that run without voluntary sleeps they quickly reach identical priorities regardless of what share of the cpu they have gotten. This was convenient because I was already calculating interactivity, but ULE also now accurately calculates CPU utilization over the last 10 seconds. I now base the priority on this metric, which gives us a better response with many competing cpu hogs. Old hogs temporarily aren't allowed to run until new hogs have evened things out.

ULE is also now slightly faster on uni-processor than 4BSD. The balance used to tip in 4BSD's favor on UP. The SMP balancing algorithms are unaffected by this latest change. I still have some tuning and adjusting to do but I expect to commit this within the next week or so. I've been running some version of the circular queue for a number of months now although I've just recently refined the design to the point of general appeal.

p.s. I sort of like programming again.



(28 comments) - (Post a new comment)


[info]jlb
2007-01-03 07:01 am UTC (link)
"I can't believe I read the whole thing."

Glad you're posting again.

(Reply to this) (Thread)

(Reply from suspended user)
test
[info]oyunlar2
2009-03-16 08:06 pm UTC (link)
test
oyunlar | nedir

(Reply to this) (Parent)(Thread)

Re: test
[info]oyunlar2
2009-03-16 08:13 pm UTC (link)
oyunlar

(Reply to this) (Parent)

So...
[info]smkelly
2007-01-03 07:04 am UTC (link)
If I read the whole thing, do I get a prize?

(Reply to this) (Thread)

Re: So...
[info]jeffr_tech
2007-01-03 08:45 pm UTC (link)
Yes, a new scheduler.

(Reply to this) (Parent)


[info]natalielekyg
2008-07-17 02:39 am UTC (link)
] Follow the link to read the full entry: 8 Stupid Things Webmasters Do To Mess Up Their Analytics * Read the whole thing, if it’s boring, just stare at the pretty screenshots until they start [.

(Reply to this) (Parent)

(Reply from suspended user)

[info]nealsid
2007-01-03 07:44 pm UTC (link)
In ULE I solved this by having an interactivity score which takes into consideration how much time the thread ran vs how much time it voluntarily slept. This differs from how much time it ran vs overall system time and is an important distinction.

By voluntarily slept, do you mean that how often it was waiting for user input? Do I/O bound processes make this distinction hard at all?

(Reply to this) (Thread)


[info]jeffr_tech
2007-01-03 11:10 pm UTC (link)
I don't draw a distinction between disk io and waiting on user input. You are correct in assuming that this gives i/o bound processes an advantage. So long as they continue to have little run time there is no harm. We detect them transitioning into a cpu bound mode fairly quickly so this is not a problem in practice.

BSD does have a facility through which I could determine, in most cases, if the sleep was for disk i/o. However, interactive tasks also often do disk i/o (think about mozilla).

(Reply to this) (Parent)


[info]nikkiqafux
2008-07-11 11:25 am UTC (link)
These comments do have their place, but all too often they make us laugh at someone else’s expense.

(Reply to this) (Parent)


[info]lostinmind
2007-01-05 04:59 am UTC (link)
Finally! I was looking forward to ULE so long ago. It's nice you've cleared up some design issues. I hope it goes well. It seems there are quite a few schedulers now, and I have no clue what the differences are.
ULE, CORE, 44BSD, and someone mentioned pluggable schedulers but that boggles my mind.

Good luck on ULE.

(Reply to this)

ULE vs. 4BSD
(Anonymous)
2007-01-06 04:09 pm UTC (link)
Hi, I am devel of PC-BSD and we already used ULE scheduler as default one in PC-BSD 1.2 release (It feel faster actually than 4BSD- everyone admit it), but unfortunately now we are using 4BSD again because of some strange behaviour (crashes) with ULE- this is not a fact but most FreeBSD kernel developers don't recommend ULE at all.

Is there any stabilization in implementation or all this crash stuff is plain FUD?

(Reply to this) (Thread)

Re: ULE vs. 4BSD
[info]jeffr_tech
2007-01-06 10:28 pm UTC (link)
What kernel version is your project using? For a long time ULE was broken by constant changes to the threading infrastructure and I was not around to maintain it. I very recently have done a considerable amount of work on it in -CURRENT. It has been very stable for me but it hasn't received as much testing by other people as I would like.

(Reply to this) (Parent)


[info]sas_spidey01
2007-01-06 07:17 pm UTC (link)
Very interesting read. I thought ULE was going to be replaced altogether.

I've never had a problem using ether scheduler but I've only gotten as close to a SMP box as a Dual Core CPU allows :-(

(Reply to this)

FreeBSD and ULE 2.0 is the way to go!
[info]arabchat
2007-02-15 09:34 am UTC (link)
I'm sure ULE 2.0 will be the way to go!

It has rocked my Turion64 X2 laptop with the KDE desktop, and will use it when it got more polishing and 7.0-RELEASE goes live! for my 9 servers which run MySQL :)

(Reply to this)

(Reply from suspended user)

(Reply from suspended user)

(Reply from suspended user)

(Reply from suspended user)

(Reply from suspended user)

(Reply from suspended user)
seks hikaye
[info]wust
2009-01-12 10:23 pm UTC (link)
a href="http://www.seksdelisi.net/">seks hikaye</a></div>

(Reply to this)

erotik hikaye
[info]wust
2009-01-12 10:25 pm UTC (link)
erotik hikaye

(Reply to this)

(Reply from suspended user)

(Reply from suspended user)

(Reply from suspended user)
Oyuncular
[info]oyunlarx
2009-03-14 01:17 am UTC (link)
buona descrizione dei dettagli. Grazie per aver condiviso
Oyun | Sinema | Hava durumu

(Reply to this)


(28 comments) - (Post a new comment)

Create an Account
Forgot your login or password?
Login w/ OpenID
English • Español • Deutsch • Русский…