The other day I was watching Mythbusters with one of my kids and on the show they were examining whether it was faster to have one line per cash register at a grocery store (and allow people to choose their own line) or to have one line that everybody waits in with the shopper at the front going to the next open register. Part way through the episode I realized that the former solution was going to be faster. Effectively, the problem reduces to task scheduling on a multiprocessor system. The cash registers are the hardware threads in the system and the customers waiting to check out are the tasks that need to be scheduled1.
My guess that having one line per register would be faster was due to that solution being analogous to a work stealing scheduler, while the single line corresponds to having a single queue that all the threads pull tasks out of. The analogy to work stealing isn’t 100 percent there, since we don’t really have forking and joining of tasks going on2. It’s more like we’re constantly dropping new root tasks into the queues for the hardware threads. But there seems to be enough of a connection that one can argue about performance by analogy.
One thing that makes work stealing perform better than having a single queue is that threads aren’t constantly contending for the task at the head of the queue. In a work stealing system each thread only looks at its own queue, unless it’s empty, in which case it will try to steal work from another thread’s queue. But even while this theft is going on, unless the queue being burgled is nearly empty there should not be any contention, since the thief tries to take from one end of the queue while the queue’s owner is grabbing tasks from the other end. The contention for the head of the single queue expresses itself in the grocery store as the time it takes the shopper at the head of the line to reach the next open register (this is the cache synchronization that’s going to be needed, since the head of the queue is a memory location shared by all the processors involved). If multiple registers become open at the same time then the cashiers will have to wait while which shopper goes to which open register gets sorted out (this is analogous to the thread synchronization, whether by locks or CAS operations, that accessing the head of the queue will require).
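The two-ends trick described above can be sketched as a little Python class. This is a minimal illustration, not a real scheduler: the class name and the use of a single lock are my own simplifications (production work stealing deques, like the one in Java’s ForkJoinPool, use lock-free CAS operations instead), but it shows why the owner and a thief rarely fight over the same task.

```python
from collections import deque
import threading

class WorkStealingDeque:
    """Per-thread task queue: the owner pushes and pops at one end,
    thieves steal from the other end, so unless the queue is nearly
    empty they are touching different tasks."""

    def __init__(self):
        self._tasks = deque()
        # A real implementation would be lock-free; a lock keeps
        # this sketch simple and obviously correct.
        self._lock = threading.Lock()

    def push(self, task):
        """Called only by the owning thread."""
        with self._lock:
            self._tasks.append(task)

    def pop(self):
        """Called only by the owning thread: takes from the same
        end it pushes to (LIFO, good for cache locality)."""
        with self._lock:
            return self._tasks.pop() if self._tasks else None

    def steal(self):
        """Called by other threads: takes from the opposite end."""
        with self._lock:
            return self._tasks.popleft() if self._tasks else None
```

The owner working LIFO while thieves steal FIFO is the standard arrangement: the owner keeps reusing recently touched data, and thieves grab the oldest (often largest) pending work.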
By contrast, as long as the lines stay full, the separate lines don’t have the cache synchronization or contention penalties to pay. When one shopper is done checking out, the next shopper is right there ready to go, and may even have started unloading their cart (analogous to having a good chance of getting a cache hit in a work stealing system since you’re not constantly reading and writing to a single memory location shared by all the threads, though apparently even work stealing can be improved upon in this regard3). There is also no contention for the shopper at the head of the line since each register has its own head of the line. In the rare event that the line empties then there is some contention as shoppers jump from another line to the empty one, but it is a relatively rare event4.
In the end the single line was slower, so the analogy bears out. Though there’s another property of these systems that also appeared to carry over. In addition to timing the shoppers, the Mythbusters also asked them to rate their experience. The single line got higher marks than the multiple lines. My guess, as well as the Mythbusters’, is that this is due to the single line being fair while the multiple lines are not. Here fair means that everybody has to wait the same amount of time. Or, to put it a bit more formally: once you’re in line, no one that gets in line after you will be able to start checking out before you do. This is a property of the single queue system that work stealing systems do not have. The only guarantee that a work stealing system can make is that a task you give it will be run at some point. There are no guarantees about ordering, other than that tasks that depend on other tasks will run after the tasks they depend on. So with a single line no one has to watch someone who lined up after them get to check out before them.
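The unfairness of separate lines is easy to demonstrate with a tiny deterministic simulation. The arrival order, service times, and round-robin line assignment below are all made up for illustration: two registers, one three times slower than the other, with customers numbered by arrival order.

```python
# Hypothetical setup: task number = arrival order; customers are dealt
# round-robin into two lines, and register 0 is three times slower.
line0, line1 = [0, 2, 4], [1, 3, 5]
slow, fast = 3, 1  # ticks per customer at each register

# Finish time of the i-th customer in a line is (i+1) * service time.
finishes = [(slow * (i + 1), t) for i, t in enumerate(line0)]
finishes += [(fast * (i + 1), t) for i, t in enumerate(line1)]
finish_order = [t for _, t in sorted(finishes)]
print(finish_order)  # → [1, 3, 0, 5, 2, 4]
```

Customer 3 arrived after customers 0 and 2 but checks out before both of them, which is exactly the ordering violation a single shared queue rules out.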
It’s kind of embarrassing how long it took me to realize this. In hindsight it seems like a somewhat obvious analogy.↩
At least I hope there’s no spawning of new tasks going on as that would either mean that people have been in line so long that they’re now having kids while in line, or they’ve taken to performing cloning experiments while in line.↩
Admittedly the analogy breaks down a bit here. In the grocery store the shoppers are autonomous agents and if one of the lines gets noticeably shorter than the others then some shoppers will probably migrate to it of their own accord before it can become empty. This would be like having a thread dedicated to balancing the queues in a work-stealing system. In that case you’re probably better off letting the queues be unbalanced and dedicating that thread to running tasks as well.↩