In an earlier post post I mentioned software transactional memory (STM) as an alternative to using locks that maintains one’s ability to compose solutions to problems. That’s all well and good, but does anyone actually use it other than as a research toy? The answer to that question is yes: the team that I work with are using it in a production piece of software and have been for three years now. This probably raises some questions: how well does it work1? Are you insane for using a research toy in production software2? What the hell is this STM thing you keep babbling about anyways3? Let’s start with that last question and work backwards from there.
So what is STM and how does it allow us to program multi-threaded programs while maintaining composibility? To start with lets say we have some variables that are being shared between threads. So we start a transaction, don’t ask me how, we’ll get to that later. Now we’re going to read some of the variables and write to some of them. The reads are tracked by the transaction so that multiple reads of the same variable in a transaction will always yield the same value4. This read tracking will also be important in a bit when it comes time to commit the transaction. The writes are also tracked in the transaction and are not visible to other threads until we’re done messing with the variables and we commit the transaction. The first thing a commit does is check all the variables that we read to make sure that no other threads have committed any writes to them since we first read them. If any of the variables have changed then we have a conflict and we have to start our transaction over. If there were no changes then our writes are made visible to other threads in an atomic fashion (i.e. if another thread sees one of our writes it must see all of them, as long as none of those reads were done before we committed). If you’ve worked with transactional databases before then this all probably sounds familiar. It’s the same idea, just working with stuff in memory instead of database rows.
An example might make this more clear. Say we have two variables,
B in the diagram above. We start a transaction on thread 1, then before thread 1 can read anything we start a transaction on thread 2 and write to both
B. Before the transaction on thread 2 can commit, thread 1 reads
A. Even though we’ve written to
A in thread 2 we get the old value (
A = 1) in thread 1 since thread 2 hasn’t committed its transaction yet. Now thread 2 commits before thread 1 can read
B so when thread 1 does read
B it gets the new value (
B = 3). Note that if we read
A again we get the same value we read the first time (
A = 1). Now if we write to
A and then read it we get the value that we wrote (
A = 3). When we try to commit the transaction on thread 1 we get a conflict since
A was changed after thread 1 first read it. So the transaction’s writes are abandoned and the transaction starts over, this time getting both of the changes made in transaction 2.
There are lots of variations on the STM theme: word versus object based, implicit versus explicit, how privatization is handled, weak versus strong atomicity, and bounded versus unbounded. All of these influence how you program the system, its performance characteristics, and what types of problems you can run into.
STM systems can work at either the word or object level. When working at the word level the STM system is transacting the individual bytes that you’re working with. STM systems that work at the object level are transacting full objects. It seems like word based systems have their place, but only if you’re doing a lot of bit twiddling at a low level. For most applications object based systems make more sense to me. Plus, the two systems that I have experience with – Haskell’s built-in system and the home-grown system we use at work – are both object based systems. So keep in mind that I’m primarily talking about object based systems when I start spouting off about how well things work and making suggestions on good STM usage.
One can also categorize STM systems based on whether they are explicit (the variables to be handled transactionally are declared to be so by the programmer) or implicit (the STM system figures out what to transact on its own). If your system isn’t built into the compiler then there’s little chance of it being implicit as there’s a lot of code instrumenting that needs to be done during compilation to make it work – the system proposed for the C++ standard and Intel’s experimental system are both implicit systems and both are implemented at the compiler level. Our home-grown system and Haskell’s system (that my system is modeled after to some degree) are explicit: in order for something to be transacted it has to be declared as a transacted value (either as a
STM::Var in our system or
TVar a in Haskell). Both are also implemented at the library level so really there’s no way they could be made implicit.
Implicit systems don’t require you to litter your code with special types or function calls, that’s supposed to make them easier to use. But there is a danger: if you access a variable in one thread transactionally while another thread accesses it non-transactionally then you’ve got a race condition. I suppose that’s not any worse than the current situation with mutexes: if you access variables without adequately protecting them with mutexes then you end up with a race condition as well. With an explicit system you lean on the type system to prevent things from being accessed non-transactionally and causing race conditions. There might be a performance advantage to being able to sometimes access a value non-transactionally, but that sounds like a dangerous area to get into without really knowing what you’re doing, analogous to lock-free programming.
Another differentiator between systems is how privatization is handled. When you’re working in a transaction not everything needs to be transacted: there’s all sorts of data that is private to the thread and transacting it would be an unnecessary drag on performance. In explicit systems the programmer is already encoding the information about what can be privatized by what is and is not stored in the transaction variables. The implicit system has to figure out on its own what can be privatized and what can’t. In either case overeager privatization (i.e. privatizing a variable that is actually shared between threads) can create race conditions. In this respect the implicit system comes out ahead since the programmer doesn’t have to worry about what is privatized and what isn’t, it should all just work. Though this can lead to performance degradation if the system’s privatization algorithm is too conservative.
In a strongly atomic system both a transacted variable and any state internal to that object must be transacted. Systems that can’t do this have weak atomicity, if you pull an object out of a variable in a transaction you are free to use that object all you want outside of a transaction as far as the STM system is concerned (whether that is a good idea or not is up to your design beyond the STM system). Our system is weakly atomic because I don’t know how it would be possible to create a strong system without modifying the compiler. In a later post I’ll get into how we avoid race conditions when using a weak system.
If the number of reads and/or writes that you can do in a transaction is limited then the system is bounded, if you can do reads and writes to your hearts content then it is unbounded. Normally a system is bounded because it is tying into transactional memory hardware that is limited in how many reads or writes it can deal with in any given transaction, or maybe there’s a special algorithm underlying the system that gives it better performance at the cost of limiting the number of reads or writes. Neither of those conditions apply to our system so it’s unbounded.
In the event of a conflict a transaction gets restarted from the beginning. This means that one has to be careful about side-effects when using an STM system. In Haskell they are able to prevent side-effects entirely (barring the use of
unsafePerformIO) in transactions by using the type system. C++ (and most other languages) don’t have strong enough type systems to do this, so we have to rely on programmer discipline instead. Though a good system should allow the programmer to either white-list or black-list functions as OK or not OK to call in transactions. Some systems also have methods for scheduling side-effects to happen after the transaction commits. Our system supports both of these, we have a method for scheduling side-effects and we can black-list functions from transactions. Unfortunately our blacklisting results in a run-time error if a blacklisted function is called in a transaction. I couldn’t come up with any way to exploit the C++ type system to make the check take place at compile time, a feature an implicit system built into a compiler can provide.
That should give you some idea of what STM is, why it’s useful, and enough useless detail to glaze your eyes over. Sorry for the excessive verbiage, but STM is a big subject that I needed to whittle it down a bit before I could start getting into how we use it, how well it works and why I think it works well for us. More to come…
- TL;DR: great, but YMMV. 5 ↩
- Not completely and I will justify what I did in a later post. ↩
- Given the job interviews I’ve been conducting lately I’m guessing this is probably the most popular question of the three. ↩
- Not all STM systems treat reads in this way. Some demand consistency from the beginning of the transaction, and others don’t provide any consistency. I don’t want to go down that rabbit hole right now, our system works as stated so I’m going to leave it at that. ↩
- The application I work on may be in the sweet spot for STM. Details will follow in a later post. ↩