leastfixedpoint

Deadlocks are annoying

This page is a mirrored copy of an article originally posted on the LShift blog; see the archive index here.

Java’s concurrency model provides a sophisticated menu of ways in which to shoot yourself in the foot. Many styles and many variations are available. To give a taste of some of the delicacies on offer, here’s the essence of a problem I found in some code I’d written recently.

Let thread one perform the tasks:

  1. lock C
  2. wait for event 1
  3. unlock C
  4. lock C
  5. wait for event 2
  6. unlock C

Let thread two perform the tasks:

  1. signal event 1
  2. lock C
  3. unlock C
  4. signal event 2

When we were trying to pin down the issue, we found that the code would run through to the end about 10% of the time. The other runs locked up with thread one blocked at step 5, and thread two at step 2.

Java doesn’t make any promises about fairness: specifically, thread one’s release of the lock before reacquiring the same lock need not cause a thread switch, even when the runtime knows there’s another thread waiting for that lock.

There are solutions: both STM and shared-nothing message passing are models much easier to reason about. Let’s hope future revisions to the Java system raise the level of discourse.

Comments

On 18 June, 2007 at 2:29 pm, matthew wrote:

Fairness is stunningly hard to deal with. There’s a guy doing a PhD behind me (literally, desk behind mine) who has spent two years trying to define fairness for Chorded languages.

Java’s threading model is very simple and deliberately does nothing clever behind the scenes. The documentation is pretty explicit these days: notifyAll() simply schedules all blocked threads (on the given obj) to be woken. It says nothing about the current thread’s quantum of CPU time being over or yielding. You can easily argue that to do anything along those lines is unfair.

It seems to be that you’re problem is that you’re trying to do synchonization without a consistent locking strategy. When waiting for event 2 you need to make sure you release any locks that could ever prevent event 2 from being signalled.

On 18 June, 2007 at 3:23 pm, Dale wrote:

As Matthew says your code is incorrect. Don’t blame the tool.

That being said, it is important to realize that multithreading/concurrent programming is harder than you (or most anyone else) thinks it is. The crux of the matter in my opinion is that the source code doesn’t accurately represent how the program gets executed. Since most programmer’s mental model of program execution is tied closely to the textual representation of the source code there is a mismatch that allows errors to creep in.

So as concurrent programming becomes more common, we need a better representation scheme for programs that supports better mental models in programmers.

And there I go, blaming the tool ;-)

Dale

On 18 June, 2007 at 3:31 pm, matthew wrote:

Dale, I utterly agree. This is one of the worst aspects of object oriented programming: whilst the relationship between data items and types is clear, the interaction between threads is utterly unrepresented in the code.

Message passing gets us quite a lot closer, but typing systems for message passing are currently pretty underdeveloped. The issue is almost syntactic: how can you represent thread interaction clearly in a text file? The Actor paradigm ”feels” easier, but really all it’s doing is the same old “we know how to recognise the starts and ends of code blocks” and attaching to that simple and effective thread semantics. Thus it’s achiving the solution transitively, almost by side-effect, rather than directly.

On 18 June, 2007 at 6:03 pm, tonyg wrote:

The problem is easy to spot in the distillation of the problem I’ve written above. It was much, much harder to spot in the real code. The code was incorrect, but it wasn’t immediately apparent that it was incorrect. We’ve since changed the locking model we’re using.

The promise of STM (and related inventions) is that the system becomes more transparent and easy to reason about. You get to reason about each transaction, as written, in isolation.

On 18 June, 2007 at 7:24 pm, tonyg wrote:

@matthew: How is it anything to do with object-oriented programming? It’s more to do with mutable shared state plus simple locks, I would have thought. C, C++, C#, Java, Scheme, Lisp, machine code etc etc all suffer from the same extremely-low-level approach to concurrency.

On 18 June, 2007 at 7:29 pm, David MacIver wrote:

It’s actually relatively easy to build shared nothing message passing concurrency on top of Java’s existing threading model. For a full blown library / framework / wossname there’s JCSP (http://www.cs.kent.ac.uk/projects/ofa/jcsp/). Even without that it sometimes makes sense to model things as threads which simply wait on some sort of in memory or JMS queue.

Of course, that’s just the ‘message passing’ part. Shared nothing requires a bit more discipline to enforce.

The major problem with this is a performance one. Java threads just weigh too much for this to be a viable model for non toy applications, unless you start messing around with thread pools.

On 18 June, 2007 at 9:52 pm, matthew wrote:

@Tony, you’re right, my comments apply to more than just object oriented programming languages, but they do particularly apply for the following reasons:

  1. (this is pretty weak) object oriented languages focus hard on the relations between values: most of the major language structure is designed around expressing these relationships. Now that’s all well and good, but it does distract from the relationships between threads: i.e. thread interactions are obscured because of the amount of “noise” going on elsewhere in the language. Ok, this is a pretty tenuous argument, but it’s certainly the case that the less code you have, the easier it is to understand what’s going on.

  2. Object Orientation is designed around shared mutable state. Yes, you could have an OO language that has no shared mutable state, but it’d feel pretty weird. As others have said, the shared mutable state introduces lots of reasons for needing locking and atomic sections etc etc.

On 19 June, 2007 at 1:48 am, tonyg wrote:

I’m not so sure about the weirdness of a pure-functional OO approach. In fact, I think it might feel pretty natural - a bit like ML, a bit like Smalltalk, a bit like Haskell. But that’s neither here nor there, I suppose :-)