Wednesday, June 11, 2008

Erlang R12B-3 Released

There's a new release of Erlang/OTP out today, which you can download from the usual location. I've also imported it into the erlang-otp repository at GitHub.

According to the readme, this release contains an "experimental" regular expression module called re. The module wraps a lower-level PCRE library and is "many times faster than the pure Erlang implementation". It also looks like it works equally well with both binary- and list-based strings. So I guess the race is one to see who can get their WideFinder implementation up and running first...

Sunday, June 8, 2008

Updated Erlang/OTP Repository at GitHub

I ended up deleting and recreating the erlang-otp repository at github today, in case anyone's having problems accessing it. My original import script had a bug that prevented files from being deleted correctly between versions, which meant the files in the repository didn't exactly match the files in the source tarballs. It should be fixed now, but you were using the old repo you'll probably need to do a clean git clone to get things working again.

Sunday, June 1, 2008

Message Ordering in Erlang

I've noticed that when people first learn Erlang (myself included), they tend to assume that messages sent between processes must be handled in the same order they're received. Turns out this isn't the case - you can pluck a message out from anywhere in a process's inbox, assuming you know a pattern that uniquely matches it. This is called a "selective receive".

For example, let's say we fire off a couple of messages to a process registered as myserver:

myserver ! {name, "World"},
myserver ! {greeting, "Hello"}
Even though the we're sending the greeting message after the name message, it's possible for the receiver to process the greeting first, by matching on the greeting atom in the first position of the tuple:
Greeting = receive {greeting,G} -> G end,
Name = receive {name,N} -> N end
Here the first line removes the second message (the greeting) from the inbox, and the second line removes the first message (the name).

This may not seem like a big deal, but there are a lot of cases where this behavior comes in extremely handy. Below are couple of examples: synchronous messaging, and parallel processing of lists:

Synchronous Messaging

Layering synchronous messaging (i.e. your basic remote request/response) on top of the asynchronous stuff built into Erlang seems like it should be easy - you just send a message to another process, and wait until you get a message back. And a lot of the time that's all there is to it.

However, it's possible that your client process has lots of messages coming in from different places, in which case there's no guarantee that the first message you receive after sending a request will be the actual response to your request. You'll have to do a selective receive to filter out the irrelevant messages.

An easy way to do this is to include a unique token with each request, and have the server include it in its response. That way the client can just pattern match on the token.

For example, say we have a time server that sends the current time back to any process that requests it (plus the token):
time_server() ->
receive
{get_time, Pid, Token} ->
Pid ! {time, time(), Token}
end,
time_server().
We can then get the time from the server using the following two functions. The first is responsible for sending the get_time message to the time server. It uses make_ref the generate the token:
request_time(TimeServer) -> 
Token = make_ref(),
TimeServer ! {get_time, self(), Token},
Token.
The second is responsible for taking that token and using it to selectively retrieve the response from the inbox:
receive_time(Token) ->
receive {time, Time, Token} -> Time end.
That's pretty much it. We just need to put the two together:
get_time(TimeServer) ->
Token = request_time(TimeServer),
receive_time(Token).
We now have a function that will send a request to another process and block until it receives a response, without affecting any of the other messages in the inbox.

Parallel Processing Of Lists

Now let's say we want to make calls to a bunch of different servers. And we'd like those servers to execute the calls in parallel because, well, that's just how we do things in Erlang.

So continuing our previous example, assume we have a list of time_server processes. We want to send our get_time message to all of them, and we want to block until we receive all of the responses. We'd also like the responses to be returned as a list whose order corresponds with the original list, so that we know which time value goes with which time_server. (We can't just use the order the responses are received, since that's basically random.)

A very concise way to implement all this in Erlang is to use two list comprehensions: one to send the requests out and generate the list of tokens, and another to take the list of tokens and selectively receive the responses. For example:
get_times(TimeServers) ->
Tokens = [ request_time(TimeServer) || TimeServer <- TimeServers],
[ receive_time(Token) || Token <- Tokens ].
That's all it takes to execute those calls in parallel, and to put the results in the right order. Not bad for two lines of code.

And More...

We could take this example a lot further: maybe we want to ignore time servers that don't respond within a certain amount of time; or automatically restart them when they crash; or process the responses as they come in instead of collecting them in a list; and so on.

All these scenarios are easy to implement using selective receives (and other language features like linked processes). The interprocess communication built into Erlang seems simple, but it's been carefully designed to give you a lot of flexibility when designing your system.