Many common programming languages today eschew manual memory management in favor of garbage collection. While manual management certainly has its place for some use cases, I believe the majority of application development today occurs in garbage collected languages, and for good reason. Garbage collection moves responsibility for many memory issues from the developer into the language’s runtime, mostly removing the possibility of entire classes of bugs while reducing cognitive overhead. In other words, it separates out a concern. That’s not to say that garbage collection is perfect or appropriate in all cases, but in many common cases it greatly simplifies code.
Languages like Haskell, Erlang, and Go provide a similar separation of concerns in the form of green threads. Instead of requiring the developer to manually deal with asynchronous (or non-blocking) I/O calls, the runtime system takes responsibility for this. Like garbage collection, it’s not appropriate for all use cases, but for many common cases it’s a great fit and reduces code complexity. This post will discuss what green threads are, the problems they solve, some cases where they aren’t the best fit, and how to get started with green thread based concurrency easily.
If you want to jump to that last point, you can download the Stack build tool and start with the async library tutorial.
Suppose you’re writing a web server. A naive approach would be to write something like the following pseudo-code:
function webServer(port) {
    var listenSocket = bindSocket(port);
    while(true) {
        var socket = accept(listenSocket);
        forkThread(handleConnection(socket));
    }
}

function handleConnection(socket) {
    while(true) {
        var bytes = read(socket);
        var request = parseRequest(bytes);
        var response = getResponse(request);
        write(socket, renderResponse(response));
    }
}
The read and write calls appear to perform blocking I/O, which means that the entire system thread running them will be blocked on the kernel until data is available. Depending on what our forkThread call did, this could mean one of two things:
- If forkThread forks a new system thread, then performing blocking I/O isn’t a problem: that thread has nothing to do until read and write complete. However, forking a new system thread for each connection is expensive, and does not scale well to hundreds of thousands of concurrent requests.
- If forkThread instead forks a new thread within your runtime (sometimes called a fiber), then multiple fibers will all be running on a single system thread, and each time you make a blocking I/O call, the entire thread will be blocked, preventing any progress on other connections. You’ve essentially reduced your application to handling one connection at a time.

Neither of these approaches is very attractive for writing robust concurrent applications (though the former is certainly better than the latter). Another approach is to use non-blocking I/O. In this case, instead of making a call to read or write which blocks until data is available, you make a call and provide a callback function or continuation to handle what to do with the result data. Let’s see what our web server above will look like:
function webServer(port) {
    var listenSocket = bindSocket(port);
    listenLoop(listenSocket);
}

function listenLoop(listenSocket) {
    acceptAsync(listenSocket, function(socket) {
        handleConnection(socket);
        listenLoop(listenSocket);
    });
}

function handleConnection(socket) {
    readAsync(socket, function(bytes) {
        var request = parseRequest(bytes);
        var response = getResponse(request);
        writeAsync(socket, renderResponse(response), function() {
            handleConnection(socket);
        });
    });
}
Let’s note some differences:

- Instead of looping with while(true), listenLoop and handleConnection continue by calling themselves recursively from inside their callbacks.
- Instead of storing the result of read in a variable bytes, we provide readAsync with a callback function, and that callback function is provided the bytes value when it becomes available. We sometimes call these callbacks continuations, since they tell us how to continue processing from where we left off.

This approach solves the problems listed above with blocking I/O: there is no performance overhead of spawning threads or fibers, and multiple requests can be processed concurrently without being blocked by each other’s I/O calls.
Unfortunately, this new style does not get away scot-free. The callback-based code is harder to follow than its blocking equivalent, and a subtler problem remains: if the parseRequest or renderResponse functions perform any blocking I/O, or use a significant amount of CPU time, other requests will need to wait until that processing finishes before they can resume their own processing.

For those interested, we had a previous blog post on Concurrency and Node which delved more deeply into these points.
Let’s deliver on our promise from the introduction: turning non-blocking I/O into a runtime system concern. The theory behind this is that we keep writing our code in the straightforward blocking style, and the runtime system takes responsibility for converting each apparently-blocking call into a non-blocking one, scheduling other work while we wait.
That may sound like a lot to deliver on, but green threads are up to the challenge. They are very similar to the fibers that we described above, with one major difference: seemingly blocking I/O calls actually use non-blocking I/O under the surface. Let’s see how this would work with our web server example from above (copied here for convenience):
function webServer(port) {
    var listenSocket = bindSocket(port);
    while(true) {
        var socket = accept(listenSocket);
        forkThread(handleConnection(socket));
    }
}

function handleConnection(socket) {
    while(true) {
        var bytes = read(socket);
        var request = parseRequest(bytes);
        var response = getResponse(request);
        write(socket, renderResponse(response));
    }
}
Starting from after the bindSocket call:

1. The main thread calls accept. The runtime system essentially rewrites this to an acceptAsync call like our callback version, puts the main green thread to sleep, and has the runtime system’s event loop schedule a wakeup when new data is available on the listenSocket.
2. When a connection comes in, the main thread is woken up with the socket value filled in. Our thread then forks a new green thread (let’s call it worker 1) running the handleConnection call.
3. Worker 1’s read call is similarly rewritten to readAsync with a callback, and the runtime system puts worker 1 to sleep and schedules a wakeup when data is available on socket.
4. Meanwhile, the main thread returns from the forkThread call, and iterates on the while loop. It arrives back at the accept call and, like before, the runtime system puts the main thread to sleep and schedules a wakeup when there’s a connection available on listenSocket.
5. Both green threads now sleep until listenSocket or socket have some data available. When the operating system tells us (via a system call like epoll or select) that data is available, the runtime system can wake the relevant thread up, and do some more work until the next I/O call that puts it to sleep.
6. If a connection is closed, e.g. while a thread is waiting on a read call, the runtime system can throw an exception from that point in the code.

I’d argue that this green thread approach – for the most part – gives us the best of both worlds: the high level, easy to read and write, robust code that comes from using threads, with the high performance of the non-blocking callback code.
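This is exactly how GHC Haskell behaves: forkIO spawns a green thread, and calls that look blocking (such as threadDelay or a socket read) merely park that green thread in the runtime’s event manager. As a minimal self-contained sketch (my own example, not from the server above), forking ten thousand green threads is cheap:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM_)

main :: IO ()
main = do
  done <- newEmptyMVar
  -- Fork 10,000 green threads. In GHC these are cheap: each starts
  -- with a tiny stack and is multiplexed over a few OS threads.
  replicateM_ 10000 $ forkIO $ do
    threadDelay 1000 -- looks like a blocking 1ms sleep, but only parks
                     -- this green thread; no OS thread is blocked
    putMVar done ()
  -- Wait for every worker to signal completion.
  replicateM_ 10000 (takeMVar done)
  putStrLn "10000 green threads finished"
```

Try the equivalent with one OS thread per task and you’ll feel the difference in memory and startup cost.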
Like garbage collection, green threads have downsides as well. The main one is that for specific performance-sensitive cases, the runtime system’s scheduling can be an impediment compared to hand-tuned non-blocking code. But also like garbage collection, there is a wide array of cases where the gains in code simplicity and lower bug rates more than pay for the slight performance overhead.
Let’s go ahead and get started with a short example right now.
The only tools you’ll need are the Stack build tool and a text editor. If you’re on a Mac or Linux system, you can get Stack by running curl -sSL https://get.haskellstack.org/ | sh. On Windows, you probably want the 64-bit installer.
Once you have Stack installed, save the following code into a file called echo.hs:
#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
import Data.Conduit
import Data.Conduit.Network
import qualified Data.ByteString.Char8 as B8

main = do
    putStrLn "OK, I'm running!"

    -- This automatically binds a new listening socket and forks a new
    -- green thread for each incoming connection.
    runTCPServer settings app
  where
    -- Listen on all interfaces on port 4500
    settings = serverSettings 4500 "*"

    -- Create a simple pipeline connecting the input from the network
    -- (appSource) to our echo program to the output to the network
    -- (appSink).
    app appData = runConduit (appSource appData .| echo .| appSink appData)

    -- awaitForever keeps running the inner function as long as new data
    -- is available from the client
    echo = awaitForever (\bytes -> do
        -- Echo back the raw message we received
        yield bytes

        -- And now send back the Fibonacci value at the length of the
        -- input. We need to use B8.pack to convert our String into a
        -- binary format we can send over the network.
        yield (B8.pack (show (fib (B8.length bytes)) ++ "\n")))

-- Written badly for high CPU usage!
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)
Now you can run this program with stack echo.hs.
The first run will take a bit of time as it downloads a compiler
and a number of libraries. This will only happen on the first run;
subsequent runs will reuse the previously downloaded and installed
tools and libraries. Once you’ve got this running, you can connect
to it to play with:
$ telnet localhost 4500
Go ahead and play with it with some short lines, and confirm that it responds. For example, here’s a sample session:
$ telnet localhost 4500
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Hello world!
Hello world!
610
Bye!
Bye!
13
Now to prove our claims of the system remaining responsive in the presence of high CPU: try entering a long string, which will require a lot of CPU time to calculate the Fibonacci value, e.g.:
This is a fairly long string which will take a bit of time unless you have a supercomputer.
This is a fairly long string which will take a bit of time unless you have a supercomputer.
As you might expect, further interactions on this connection will have no effect as it is computing its response. But go ahead and open up a new telnet session in a different terminal. You should be able to continue interacting with the echo server and get results, thanks to the scheduling behavior of the runtime system. Notice how we get this behavior without any explicit work on our part to break up the expensive CPU operation into smaller bits!
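The same scheduling behavior can be seen in miniature, without a network in the way. Here’s a small self-contained sketch (my own example, reusing the post’s deliberately slow fib): one green thread burns CPU while the main thread carries on, because GHC’s scheduler preempts green threads at allocation points.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- The same deliberately slow Fibonacci as the echo server
fib :: Int -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

main :: IO ()
main = do
  done <- newEmptyMVar
  -- A CPU-bound green thread. The runtime preempts it periodically,
  -- so other green threads still get scheduled.
  _ <- forkIO $ do
    print (fib 30)
    putMVar done ()
  -- The main thread keeps making progress while fib churns.
  putStrLn "main thread is still responsive"
  takeMVar done
```

No manual chunking of the expensive computation is needed; the runtime interleaves the two threads for us.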
EDIT As pointed out on lobste.rs, the above will not expand to multiple cores, since GHC’s interpreter mode will only use a single core. In order to see this take advantage of additional CPU cores for additional requests, first compile the executable with:
stack ghc --resolver lts-7.14 --install-ghc --package conduit-extra -- --make -threaded echo.hs
And then run it with:
./echo +RTS -N4
The -N4 flag tells the GHC runtime to use 4 cores.
There are great libraries in Haskell to take advantage of green threads for easy and beautiful concurrency. My all time favorite is the async package. Paired with powerful abstractions like Software Transactional Memory, you can quickly whip up high-performance, concurrent network services while avoiding common pitfalls like deadlocks and race conditions.
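As a small taste (my own sketch, not from the post; it assumes the async package is available, e.g. via stack’s --package async), concurrently runs two actions on separate green threads and waits for both results:

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (concurrently)

-- Pretend network request: sleep for n milliseconds, return a result.
fakeRequest :: Int -> IO Int
fakeRequest n = do
  threadDelay (n * 1000) -- threadDelay takes microseconds
  return (n * 2)

main :: IO ()
main = do
  -- concurrently runs both actions in separate green threads and
  -- waits for both results; if either throws an exception, the
  -- other is cancelled automatically.
  (a, b) <- concurrently (fakeRequest 100) (fakeRequest 200)
  print (a + b) -- prints 600
```

Because the two fake requests sleep concurrently, the whole program takes roughly as long as the slower request, not the sum of the two.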
If you or your team are interested in learning more about functional programming and Haskell, check out our training options and our Haskell syllabus. If you’d like to learn about how FP Complete can help you with your server application needs, contact us about our consulting.