Friday, August 1, 2008

Erlang, Multiple Assignment, and Closures

There's been some discussion on the tubes recently about the value of Erlang's single-assignment semantics, with people weighing in on both sides.

However, no one seems to be asking the key question: what happens when you mix closures (which already exist in Erlang) with multiple-assignment variables? That's probably because the answer is a bit messy: you get an object-oriented language.

To see why this is the case, let's go ahead and define a Person class in this (hypothetical) Erlang variant. The Person class has a single Name field and corresponding getter and setter methods:

    -record(person,{get_name,set_name}).

    new_person(Name) ->
        #person{
            get_name=fun() ->
                Name
            end,
            set_name=fun(NewName) ->
                Name = NewName % Multiple assignment
            end
        }.

That's all there is to it. The arguments to the function (i.e. Name) act as the fields of the object, and the anonymous functions stored in the record (i.e. get_name and set_name) act as the methods. This means we can now create Person objects, pass them around, and modify them whenever we want:

    Person = new_person("Sean Combs"),
    % Parentheses are needed to call a fun stored in a record field
    print((Person#person.get_name)()),
    (Person#person.set_name)("Puff Daddy"),
    print((Person#person.get_name)()).

etc...
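
For contrast, here is a minimal sketch of what the same setter does in stock, single-assignment Erlang. The module and function names (single_assignment_demo, try_rename) are made up for illustration. Because Name is already bound when the fun is created, Name = N is a pattern match rather than an assignment, so it only succeeds when the new name equals the old one:

```erlang
-module(single_assignment_demo).
-export([try_rename/2]).

%% Hypothetical helper: builds the same kind of "setter" closure as
%% new_person/1 above and reports what happens when it is called.
try_rename(Name, NewName) ->
    %% Name is already bound here, so this is a match, not an assignment.
    SetName = fun(N) -> Name = N end,
    try SetName(NewName) of
        _ -> {ok, Name}
    catch
        %% A failed match raises badmatch with the right-hand-side value.
        error:{badmatch, Value} -> {badmatch, Value}
    end.
```

Calling try_rename("Sean Combs", "Puff Daddy") comes back as {badmatch, "Puff Daddy"}, while passing the same name twice returns {ok, Name}. In other words, today's closures can read Name but never change it, which is exactly what keeps them from turning into objects.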

Why does this matter? Because objects have identity, and identity's a real bitch when it comes to distributed systems. In particular, a lot of questions pop up when you try to send an object over the wire to another process. Do you:

  • Copy the object, in which case any changes made to it on the remote process won't be visible locally

  • Send a reference to the object, such that any time the object is referred to in the remote process, a call is made back to the original process (at great cost to performance)

  • Let the developer indicate on an object-by-object basis which of the previous two options they would prefer

  • Punt, and simply forbid access to mutable variables from within closures.
    This is pretty limiting, to the point where you can't even use a simple foreach function to accumulate a result. For example, the following code, which adds the numbers [1,2,3] together, wouldn't be allowed (since Sum is mutable and accessed from a closure):

    Sum = 0,
    foreach(fun(N)-> Sum += N end, [1,2,3]),
    print(Sum).

    Assuming we're OK with this, we still have to deal with the fact that we now have two types of variables in the language: mutable and immutable. Therefore we must either:

    • Prefix immutable variables with "final" to indicate they can be used within a closure

    • Prefix mutable variables with "var" to indicate they cannot be used within a closure

    • Rely on the compiler to infer which variables are mutable and which are used within closures (possibly producing some very cryptic error messages)

    • Take a "snapshot" of all variables in scope when a closure is created, and ignore any subsequent changes to the variable. For example, the following code would print "foo":

      Name = "foo",
      PrintName = fun() -> print(Name) end,
      Name = "bar",
      PrintName().

      But this would print "bar":

      Name = "foo",
      Name = "bar",
      PrintName = fun() -> print(Name) end,
      PrintName().

      Confusing, to say the least.
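
For comparison, the summation that the "punt" option rules out is usually written in today's Erlang by threading the running total through a fold instead of mutating a captured variable. A minimal sketch follows; the module name sum_demo is made up for illustration, while lists:foldl/3 is the standard library function:

```erlang
-module(sum_demo).
-export([sum/1]).

%% Sum a list without any mutable variable: the running total is handed
%% to the fun as its second argument and the fun returns the new total.
sum(Numbers) ->
    lists:foldl(fun(N, Acc) -> Acc + N end, 0, Numbers).
```

Here sum_demo:sum([1,2,3]) evaluates to 6. The closure never touches a variable outside itself, so none of the questions above come up.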

Personally, I don't find any of these options very appealing. But if you really want multiple-assignment variables in Erlang, you're going to have to pick one of them.