Erlang processes vs. Java threads
This page is a mirrored copy of an article originally posted on the (now sadly defunct) LShift blog; see the archive index here.
Sun, 10 September 2006
Earlier today I ran a simple test of Erlang’s process creation and teardown code, resulting in a rough figure of 350,000 process creations and teardowns per second. Attempting a similar workload in Java gives a figure of around 11,000 thread creations and teardowns per second - to my mind, a clear demonstration of one of the main advantages of Erlang’s extremely lightweight processes.
Here’s the Java code I used - see the earlier post for the Erlang code, to compare:
// Java 5 - uses a BlockingQueue.
import java.util.concurrent.*;

public class SpawnTest extends Thread {
    public static void main(String[] args) {
        // M = number of independent chains, N = total number of threads to spawn.
        int M = Integer.parseInt(args.length > 0 ? args[0] : "1");
        int N = Integer.parseInt(args.length > 1 ? args[1] : "1000000");
        int NpM = N / M;
        BlockingQueue<Body> queue = new LinkedBlockingQueue<Body>();
        long startTime = System.currentTimeMillis();
        // Start M chains, then wait for the last thread of each chain to report in.
        for (int i = 0; i < M; i++) { new Body(queue, NpM).start(); }
        for (int i = 0; i < M; i++) { try { queue.take(); } catch (InterruptedException ie) {} }
        long stopTime = System.currentTimeMillis();
        // Thread creations (and teardowns) per second.
        System.out.println((NpM * M) / ((stopTime - startTime) / 1000.0));
    }

    public static class Body extends Thread {
        BlockingQueue<Body> queue;
        int count;

        public Body(BlockingQueue<Body> queue, int count) {
            this.queue = queue;
            this.count = count;
        }

        public void run() {
            if (count == 0) {
                // End of the chain: signal the main thread.
                try { queue.put(this); } catch (InterruptedException ie) {}
            } else {
                // Spawn the next thread in the chain; this thread then exits.
                new Body(queue, count - 1).start();
            }
        }
    }
}
Comments
On 10 September, 2006 at 10:17 pm,
wrote:
On 11 September, 2006 at 11:20 pm,
wrote:@badger: Do you mean for testing IPC throughput? That’s a little out-of-scope here, I think; or does JMS have something to say about straight thread-creation time that I’m not seeing?
Note that the use of BlockingQueue is a convenience for performing the equivalent of Thread.join in the linear chain of created threads I’m testing - BlockingQueue is not a central part of the test.
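As a rough illustration of that remark, here is a minimal sketch of the two ways of waiting for a thread. The class and method names (WaitForThread, viaJoin, viaQueue) are hypothetical and not part of the benchmark. Thread.join needs a reference to the thread being waited for, whereas a queue only needs to be shared, which is handy when the final thread of a chain is created many steps away from the code that waits for it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class: two ways of waiting for a worker thread to finish.
class WaitForThread {
    // Waiting with Thread.join: the caller must hold a reference to the thread.
    static String viaJoin() {
        final String[] result = new String[1];
        Thread t = new Thread(new Runnable() {
            public void run() { result[0] = "done"; }
        });
        t.start();
        try {
            t.join(); // returns once t has terminated
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    // Waiting with a queue: the caller only needs the shared queue,
    // not a reference to whichever thread eventually signals completion.
    static String viaQueue() {
        final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
        new Thread(new Runnable() {
            public void run() {
                try {
                    queue.put("done");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }).start();
        try {
            return queue.take(); // blocks until the worker has put its token
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaJoin());
        System.out.println(viaQueue());
    }
}
```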
On 12 September, 2006 at 6:53 pm,
wrote:Concurrency benchmarks are perhaps the least successful part of The Computer Language Shootout - too many different ways to implement concurrency for a sensible comparison.
Nonetheless we have chameneos and cheap-concurrency.
On 12 September, 2006 at 8:32 pm,
wrote:I was thinking of some sort of Java message passing benchmark. Is there a more lightweight means of doing this than JMS?
On 12 September, 2006 at 10:43 pm,
wrote:Isaac Gouy wrote:
Concurrency benchmarks are perhaps the least successful part of The Computer Language Shootout - too many different ways to implement concurrency for a sensible comparison.
Case in point: the Erlang code for cheap-concurrency is cheating. Unlike all the other implementations I have looked at, it waits for each number to ripple through the entire pipeline of processes before injecting the next number. That results in zero parallelism (at any one time only one process can actually make progress), zero queuing and zero contention.
On 13 September, 2006 at 4:16 am,
wrote:I don’t think you can compare these things. An Erlang process does not start an actual thread in the operating system like Java does. I’d almost be vaguely interested in seeing a comparison between Erlang processes, protothreads and Windows fibers.
On 13 September, 2006 at 11:40 am,
wrote:@Ron: The difference between the systems is precisely the interesting point. Java has no analogue to Erlang’s lightweight processes. The best you can do is either threads or hand-rolled explicit continuation passing style. It comes down, IMO, to the differences in philosophy: the Erlang virtual machine is designed around concurrency; the Java VM has concurrency crudely bolted on.
@Badger: I have done a few quick experiments with the new Java 5 java.util.concurrent.BlockingQueue implementation - I’ll post on those soon. (Briefly: Erlang wins by a factor of about three (for what I called “oneway” the other day) on a single-CPU machine, but for some reason loses badly on a multicore machine.)
On 13 September, 2006 at 4:49 pm,
wrote:Does Erlang use cooperative, single kernel-process “green” threads? That would explain why it loses on multicore. Java threads can be scheduled per CPU.
On 13 September, 2006 at 6:04 pm,
wrote:@Julian: Yes, it does (roughly), but that’s not the reason it’s losing - I’ll explain further when I post about the comparisons I ran in more detail.
On 13 September, 2006 at 8:23 pm,
wrote:Java has no analogue to Erlang’s lightweight processes.
That’s not exactly true. Java threads do not have to be implemented as kernel threads. In fact, a big complaint with Java early on was that many JVMs didn’t use kernel threads & thus couldn’t take full advantage of multiple processors. I got exactly this sort of result when comparing various JVMs years ago.
The real drawback with Java here is that it forces JVMs to choose one thread model for everything rather than being able to expose multiple multitasking models to the programmer.
On 13 September, 2006 at 9:37 pm,
wrote:matthias wrote … the Erlang code for cheap-concurrency is cheating
Argue that in the discussion forum, let’s be considerate and not hijack this blog.
On 13 September, 2006 at 10:08 pm,
wrote:@tony: Cool. I’ll read up on java.util.concurrent.BlockingQueue in the meantime. You’re right about concurrency in Java, java.util.concurrent is truly a horror! ;)
@julian: The most recent version of Erlang is SMP aware.
On 10 October, 2006 at 6:27 am,
wrote:blink blink… blink
Sun Workstation, Opteron 150, 1gig ram, Redhat
Java: 27055
Erlang: 1.03332e+6
1 million something?
Goodness…. And you’d figure that a Sun workstation would be where Java would somehow be champ.
On 13 December, 2006 at 4:35 pm,
wrote:Come on, no Java developer writes code like that.
import java.util.concurrent.*;

public class Test25 extends Thread {
    public static Executor executor;

    public static void main(String[] args) throws InterruptedException {
        int M = Integer.parseInt(args.length > 0 ? args[0] : "1");
        int N = Integer.parseInt(args.length > 1 ? args[1] : "1000000");
        executor = Executors.newFixedThreadPool(M);
        int NpM = N / M;
        BlockingQueue queue = new ArrayBlockingQueue(1000 * 1000);
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < M; i++) executor.execute(new Body(queue, NpM));
        for (int i = 0; i < N; i++) queue.take();
        long stopTime = System.currentTimeMillis();
        System.out.println((NpM * M) / ((stopTime - startTime) / 1000.0));
    }

    public static class Body implements Runnable {
        BlockingQueue queue;
        int count;

        public Body(BlockingQueue queue, int count) {
            this.queue = queue; this.count = count;
        }

        public void run() {
            try {
                for (int i = 0; i < count; i++) queue.put(this);
            } catch (InterruptedException ex) {
                ex.printStackTrace();
            }
        }
    }
}
That makes 1.5 million/sec on a 3 GHz P4 (Windows XP).
It uses a thread pool and a loop instead of recursion.
On 13 December, 2006 at 7:46 pm,
wrote:Denis, you’ve missed the point: I’m measuring how many thread creations and teardowns per second the system tops out at, not how many messages per second the system can send with a fixed pool of workers. Note the indirectness of the recursion in the run() method - it spawns a whole new thread for each step of the recursion.
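To make the shape of that recursion concrete, here is a small sketch that counts how many threads one chain actually starts. The class ChainCount and its methods are hypothetical names introduced for illustration; the counting is not part of the original benchmark. A chain started with count = n starts n + 1 threads, each of which lives only long enough to start its successor, whereas a fixed pool of M workers never starts more than M threads however many messages it sends.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: counts the threads started by one spawn chain.
class ChainCount {
    // Runs a chain starting at `count` and returns how many threads were started.
    static int threadsStarted(int count) {
        final AtomicInteger started = new AtomicInteger();
        final CountDownLatch done = new CountDownLatch(1);
        spawn(count, started, done);
        try {
            done.await(); // wait for the last thread in the chain
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return started.get();
    }

    // Starts one new thread; that thread either ends the chain or spawns the next.
    static void spawn(final int remaining, final AtomicInteger started,
                      final CountDownLatch done) {
        started.incrementAndGet();
        new Thread(new Runnable() {
            public void run() {
                if (remaining == 0) {
                    done.countDown();
                } else {
                    spawn(remaining - 1, started, done);
                }
            }
        }).start();
    }

    public static void main(String[] args) {
        System.out.println(threadsStarted(5)); // prints 6
    }
}
```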
On 14 December, 2006 at 11:10 am,
wrote:Tonyg, yeah, I do understand that. But is there a good reason to require creating a new thread each time?
I mean that in Java, to achieve high performance, developers use a pool of workers, which does the same work as spawning a lot of threads would.
I believe there is no practical task that cannot be solved with a pool of workers; if a worker is blocked by network IO, we can always use non-blocking IO.
So I mean that Java can actually do better than Erlang at concurrent programming if it’s used properly, keeping in mind its limitations and advantages.
On 14 December, 2006 at 1:49 pm,
wrote:Denis, see my comment number 7 above. Beware of the Turing tarpit! If you need to support, say, 100,000 genuinely concurrent activities, you have two options: using the threads provided by the language, or modelling threads using a hand-rolled CPS transform. Java’s native threads don’t scale for one reason (heavyweight JVM implementation), as we see above for one axis of measurement, and pseudothreading-by-CPS doesn’t scale for another (human fallibility, poor compositionality). The test I wrote was a simple way of gauging how an Erlangish style of programming might perform in Java. The point I’m interested in is that Erlang’s lightweight threads make programming highly concurrent systems natural, in a way that Java’s threads never will (modulo possible-but-unlikely wholesale revisions of the language).
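For readers unfamiliar with the second option, the following is a minimal sketch of hand-rolled pseudothreading, using an explicit state machine as a simple stand-in for a full CPS transform. All names here (PseudoThreads, Task, Counter, runAll) are hypothetical. The thing to notice is that the loop counter, which would be a local variable in a real thread, has to become a field, and the loop body has to be cut into steps that a scheduler calls one at a time. That is cheap at runtime, but the manual rewriting is where the fallibility and poor compositionality come in.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo class: many "activities" multiplexed onto one thread.
class PseudoThreads {
    // An activity must expose its progress as explicit, resumable steps.
    interface Task {
        boolean step(); // returns true while there is more work to do
    }

    // Counts up to `limit`, one increment per step. In a real thread this
    // would simply be a for loop with a local counter.
    static class Counter implements Task {
        final int limit;
        int i = 0; // state that would otherwise live on the thread's stack

        Counter(int limit) { this.limit = limit; }

        public boolean step() {
            if (i < limit) i++;
            return i < limit;
        }
    }

    // Round-robin scheduler running on the calling thread.
    // Returns the total number of steps executed.
    static long runAll(int tasks, int limit) {
        Deque<Task> ready = new ArrayDeque<Task>();
        for (int n = 0; n < tasks; n++) ready.add(new Counter(limit));
        long steps = 0;
        while (!ready.isEmpty()) {
            Task t = ready.poll();
            steps++;
            if (t.step()) ready.add(t); // not finished: back of the queue
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(runAll(100000, 3)); // prints 300000
    }
}
```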
On 14 December, 2006 at 2:29 pm,
wrote:Tonyg, I got your point, but most examples where we need to run 100k concurrent activities are built around slow IO. The most frequent case is a web server with a massive number of concurrent keepalive users.
Such tasks are solved in Java by using non-blocking IO and a single thread that does the actual processing. With a modern framework, this programming style is either completely transparent to the program or very simple to use. See AsyncWeb & Grizzly/Glassfish.
Also, I think the thread pool is extremely helpful & provides simplicity identical to thread spawning in 90% of cases.
Actually, Java used to have a number of lightweight thread implementations, like the famous green threads or JRockit thinthreads. All of them were invented to cope with slow Linux threading before kernel 2.6 & NPTL, and were dropped immediately after kernel 2.6 was adopted, simply because there are no real business cases for Java where this kind of lightweight process is useful.
Could you please give me a real-world example of a task that cannot be solved with a simple thread pool and is not IO bound?
Also, why does Erlang have slow message passing in an SMP configuration? Is the reason a scheduler that can’t place dependent processes on a single OS thread?
On 28 March, 2008 at 8:55 pm,
wrote:@Denis, think simulation. Thousands of active agents interacting with each other.
On 3 April, 2008 at 2:43 pm,
wrote:Denis is correct. Tony, none of your arguments hold water.
You’re basically arguing green threads vs. native threads, not Java vs. Erlang, both of which I believe come in various threading flavours.
The thread limits, memory, etc. are an OS thing, not a Java thing. Vary your JVM and you’ll get different results: e.g. an early JVM will most likely use green threads and, with bad coding like yours, possibly beat a new JVM on a single core.
Yes, a badly written Java app, particularly on a very old JVM (not too old, mind, as the very old ones tend to be green-threaded and that would give Java a chance) on a single-core machine, can be beaten by a green-thread app, even Visual Basic.
… but surprise: if you use modern Java, or allow thread pooling, and/or combine with multicore, your Erlang looks like it’s in big trouble.
“Thousands of active agents interacting with each other”: so what, no problem. Don’t create an OS thread per agent; Erlang doesn’t!! It just makes a representation of a thread, and Java can do that.
PS Note that in no way am I commenting on the elegance of the languages, purely on the benchmarks.
On 3 July, 2008 at 5:44 pm,
wrote:((I accidentally deleted a bunch of comments I didn’t mean to delete today, so I’m having to repost them manually:))
VB wrote:
To ChrisH:
“Thousands of active agents interacting with each other” so what, no problem, don’t create an OS thread per agent, Erlang isn’t !!”
How would you model the agent then? With a state machine?
That is going to be hell.
By the way, Erlang is not about green threads but about share-nothing, side-effectlessness, concurrency and composability. It’s just better suited to concurrency than Java, that’s it.
On 8 January, 2009 at 3:04 am,
wrote:“that’s it” means you can not prove it?
On 6 March, 2010 at 11:14 pm,
wrote:Bra Dude,
You are seriously missing the point. Erlang’s “processes” aren’t real processes. The runtime treats them like tasks and has to schedule them on real OS threads.
You can simulate millions of agents quite naturally in a single thread. Take a look at n-body simulations etc. State is persisted in a variety of arrays, and the effect of dynamics occurring in parallel is simulated by applying the rules of motion to each element. This can easily be parallelized by dividing the state updates amongst available processors and waiting for a barrier to be reached before advancing.
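The array-plus-barrier approach described in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not anyone's actual simulation: the class BarrierSimulation and the toy rule (each agent takes on the value of its right-hand neighbour on a ring) are made up for the example, and java.util.concurrent.CyclicBarrier is just one way to implement the barrier. Two buffers are used so that every worker reads the previous step's state while writing the next.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical demo class: agents are array slots, not threads.
class BarrierSimulation {
    // Advances the state `steps` times using `workers` threads and
    // returns the final state. The input array is not modified.
    static int[] simulate(int[] initial, final int steps, final int workers) {
        final int n = initial.length;
        // Double buffering: read from one array, write to the other.
        final int[][] buf = { initial.clone(), new int[n] };
        final CyclicBarrier barrier = new CyclicBarrier(workers);
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            // Each worker owns a contiguous slice [lo, hi) of the agents.
            final int lo = w * n / workers;
            final int hi = (w + 1) * n / workers;
            threads[w] = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int s = 0; s < steps; s++) {
                            int[] cur = buf[s % 2];
                            int[] next = buf[(s + 1) % 2];
                            // Toy rule: copy the right-hand neighbour's value.
                            for (int i = lo; i < hi; i++) next[i] = cur[(i + 1) % n];
                            barrier.await(); // nobody advances until all slices are done
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } catch (BrokenBarrierException e) {
                        throw new IllegalStateException(e);
                    }
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return buf[steps % 2];
    }

    public static void main(String[] args) {
        int[] result = simulate(new int[] {1, 2, 3, 4}, 1, 2);
        System.out.println(java.util.Arrays.toString(result)); // prints [2, 3, 4, 1]
    }
}
```

Note that the number of threads here tracks the number of processors rather than the number of agents, which is exactly the trade-off being argued about in this thread: the per-agent logic has to be written as a rule applied to array elements rather than as a sequential program per agent.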
How about a benchmark of JMS (or similar)?