Friday, November 21, 2008

New Job!

I've just finished my second week here at DRW Trading Group. So far I've been extremely impressed by the quality of the people, the number of interesting projects going on, and the overall energy of the place. It's also been fun catching up with the various ThoughtWorkers roaming the halls here.

Windows XP was installed on my machine when I first arrived, but I'm happy to say that's been remedied and I'm now up and running with Ubuntu 8.10. Combined with a dual quad-core machine and a 30" monitor (plus a 19" side monitor) it's definitely the nicest workstation I've ever had. Much appreciated.

As for what I'm actually working on here at DRW, all I can say is that I'm in the "algorithmic trading group", of which the first rule is "don't talk about the algorithmic trading group" (intellectual property and all). So we'll leave it at that for now...

Monday, September 8, 2008

Erlang R12B-4 Released

The erlang-otp repository at GitHub has been updated.

Some folks were having trouble building the previous releases, due to a well-known "feature" in git that causes it to ignore empty directories. I've modified my import script to add empty .gitignore files where necessary, so that those empty directories come out the other end. The build scripts should now run out of the box.
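For illustration, the .gitignore trick can be sketched in a few lines of Erlang. This is only a sketch of the idea, not the actual import script; the module name keep_empty_dirs and the function fill/1 are made up for the example, and it assumes the tree contains no symlink cycles.

```erlang
-module(keep_empty_dirs).
-export([fill/1]).

%% Hypothetical helper: walk Dir recursively and drop an empty .gitignore
%% into every directory that has no entries, so that git will track it.
%% Returns the list of marker files that were written.
fill(Dir) ->
    {ok, Entries} = file:list_dir(Dir),
    case Entries of
        [] ->
            Marker = filename:join(Dir, ".gitignore"),
            ok = file:write_file(Marker, <<>>),
            [Marker];
        _ ->
            %% Recurse into subdirectories, skipping git's own metadata.
            SubDirs = [P || E <- Entries,
                            E =/= ".git",
                            P <- [filename:join(Dir, E)],
                            filelib:is_dir(P)],
            lists:flatmap(fun fill/1, SubDirs)
    end.
```

Calling keep_empty_dirs:fill("otp_src") on an unpacked source tree (the directory name here is just a placeholder) would return the paths of the marker files it created.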

Wednesday, September 3, 2008

LLVM Buzz

I've been playing around with the Low Level Virtual Machine toolkit a bit recently, which is a whole suite of tools for implementing compilers, virtual machines, and other goodies only a C.S. major could love. I'd always been vaguely aware of the project, but a couple of recent developments made me think it's time to take a serious look at it:

FlaCC

The first was the presentation that Scott Peterson of Adobe gave at the 2008 LLVM Developer's Meeting a few weeks ago (slides and video here). Scott's managed to get LLVM bytecode running on top of the Flash VM, which has all sorts of interesting implications (although it seems a bit backwards - shouldn't the high-level virtual machine, i.e. Flash, run on top of the low-level virtual machine - not the other way around?). In particular, it means that you could potentially run arbitrary C programs in the browser, since a C-to-LLVM bytecode compiler already exists.

Most of the major programming languages out there have interpreters written in C, so it follows that you could then run Ruby or Python or whatever in the modified Flash runtime without a whole lot of work. The browser would become even more like an OS, hosting arbitrary applications and giving them low-level APIs for accessing the DOM and other resources, instead of imposing particular languages or object models. Huzzah.

Unfortunately, it doesn't look like FlaCC will be supported anytime soon, which means we'll have to limp along with JavaScript and ActionScript for the time being. And to be honest I can't see a reason why Adobe would want to support this - if I could run vanilla Ruby in the browser, why would I pony up the big bucks for all those ActionScript development tools?

Snow Leopard

The second interesting development is Apple's use of LLVM in the next release of their operating system, code named Snow Leopard. It appears that both Grand Central (which distributes processing across cores) and OpenCL (which distributes processing between the CPU and the GPU) will make use of LLVM. In fact, Apple seems to be throwing so much weight behind the LLVM project that it makes me wonder if they're building their own LLVM-based browser plugin. There does seem to be a rather gaping hole in this chart:





Company      Preferred Language    Virtual Machine
Microsoft    C#, Visual Basic      .NET CLR
Adobe        ActionScript          Flash
Sun          Java                  Java Virtual Machine
Apple        Objective-C           ???


Since Apple refuses to support Flash on the iPhone, and they haven't been shy about using iTunes as a way to get users to install other Apple software, my guess is that they will release an LLVM-based browser plugin at some point. Just a hunch.

Anyway, to get the hang of things I'm working on an LLVM backend for a Lisp-like language. More to come...

Friday, August 1, 2008

Erlang, Multiple Assignment, and Closures

There's been some discussion on the tubes recently about the value of Erlang's single-assignment semantics, with people weighing in on both sides.

However, no one seems to be asking the key question, namely what happens when you mix closures (which already exist in Erlang) with multiple-assignment variables? That's probably because the answer is a bit messy: you get an object-oriented language.

To see why this is the case, let's go ahead and define a Person class in this (hypothetical) Erlang variant. The Person class has a single Name field and corresponding getter and setter methods:

    -record(person,{get_name,set_name}).

    new_person(Name) ->
        #person{
            get_name=fun() ->
                Name
            end,
            set_name=fun(NewName) ->
                Name = NewName % Multiple assignment
            end
        }.

That's all there is to it. The arguments to the function (i.e. Name) act as the fields of the object, and the anonymous functions assigned to the record (i.e. get_name and set_name) act as the methods. This means we can now create Person objects, pass them around, and modify them whenever we want:

    Person = new_person("Sean Combs"),
    print(Person#person.get_name()),
    Person#person.set_name("Puff Daddy"),
    print(Person#person.get_name()).
    etc...
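
For contrast, here is roughly how the same Person might look in real, single-assignment Erlang. This is a minimal sketch; the module name person and its three functions are made up for the comparison. The "setter" cannot change anything in place, so it returns a new record and the old one keeps its value:

```erlang
-module(person).
-export([new/1, get_name/1, set_name/2]).

-record(person, {name}).

%% Build a person value; there is no hidden mutable state.
new(Name) ->
    #person{name = Name}.

get_name(#person{name = Name}) ->
    Name.

%% Returns a *new* record; the Person passed in is left untouched.
set_name(Person, NewName) ->
    Person#person{name = NewName}.
```

After P1 = person:new("Sean Combs") and P2 = person:set_name(P1, "Puff Daddy"), P1 still holds "Sean Combs". The hypothetical closure-based version offers no such guarantee: everyone holding Person sees the rename.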

Why does this matter? Because objects have identity, and identity's a real bitch when it comes to distributed systems. In particular, a lot of questions pop up when you try to send an object over the wire to another process. Do you:

  • Copy the object, in which case any changes made to it on the remote process won't be visible locally

  • Send a reference to the object, such that any time the object is referred to in the remote process, a call is made back to the original process (at great cost to performance)

  • Let the developer indicate on an object-by-object basis which of the previous two options they would prefer

  • Punt, and simply forbid access to mutable variables from within closures.
    This is pretty limiting, to the point where it makes it impossible to even write a simple foreach function. For example, the following code, which adds the numbers [1,2,3] together, wouldn't be allowed (since Sum is mutable and accessed from a closure):
    Sum = 0, 
    foreach(fun(N)-> Sum += N end, [1,2,3]),
    print(Sum)

    Assuming we're OK with this, we still have to deal with the fact that we now have two types of variables in the language: mutable and immutable. Therefore we must either:

    • Prefix immutable variables with "final" to indicate they can be used within a closure

    • Prefix mutable variables with "var" to indicate they cannot be used within a closure

    • Rely on the compiler to infer which variables are mutable and which are used within closures (possibly producing some very cryptic error messages)

    • Take a "snapshot" of all variables in scope when a closure is created, and ignore any subsequent changes to the variable. For example, the following code would print "foo":

      Name = "foo",
      PrintName = fun() -> print(Name) end,
      Name = "bar",
      PrintName().

      But this would print "bar":

      Name = "foo",
      Name = "bar",
      PrintName = fun() -> print(Name) end,
      PrintName().

      Confusing, to say the least.

Personally, I don't find any of these options very appealing. But if you really want multiple-assignment variables in Erlang, you're going to have to pick one of them.
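
For comparison, the summing example from the list above needs no mutable variable in Erlang as it stands today. A minimal sketch (the module name sum_fold is made up): instead of a closure updating Sum, a fold threads the running total through as an explicit accumulator.

```erlang
-module(sum_fold).
-export([sum/1]).

%% Sum a list by passing an accumulator through lists:foldl/3,
%% rather than mutating a variable captured by the closure.
sum(Numbers) ->
    lists:foldl(fun(N, Acc) -> Acc + N end, 0, Numbers).
```

Here sum_fold:sum([1,2,3]) evaluates to 6, and the closure only ever sees immutable values.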

Saturday, July 19, 2008

RESTful Protocol Buffers

I'm considering adding support for Protocol Buffers to the RESTful application that I'm currently working on, and was wondering what people thought.

I'm aware of all the arguments against using binary formats in REST APIs - but we deal with some pretty big sets of data on our project, and the speed and compactness of Protocol Buffers compared to XML/JSON is just too tempting. So here's how I'm thinking we might implement things:


  • XML will remain the default representation for all resources.

  • If the client includes the application/x-protobuf MIME type in their Accept header, the server will return a Protocol Buffer instead (and set the Content-Type to application/x-protobuf).

  • When a Protocol Buffer is returned, the HTTP response will also include an X-Protobuf-Schema header containing the URI for the .proto schema file.
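
The three rules above could be sketched roughly as follows. This is only an illustration under some assumptions: the module name negotiate and the function choose/2 are made up, "application/xml" is assumed as the XML content type, q-values in the Accept header are ignored, and the string functions used require a reasonably recent OTP release (20 or later).

```erlang
-module(negotiate).
-export([choose/2]).

-define(PROTOBUF, "application/x-protobuf").
-define(XML, "application/xml").

%% AcceptHeader: raw value of the Accept header (a string), or undefined.
%% SchemaUri:    hypothetical URI of the .proto file for this resource.
%% Returns {ContentType, ExtraResponseHeaders}.
choose(undefined, _SchemaUri) ->
    {?XML, []};
choose(AcceptHeader, SchemaUri) ->
    %% Split "a/b;q=0.8, c/d" into bare media types.
    Types = [media_type(Part) || Part <- string:lexemes(AcceptHeader, ",")],
    case lists:member(?PROTOBUF, Types) of
        true  -> {?PROTOBUF, [{"X-Protobuf-Schema", SchemaUri}]};
        false -> {?XML, []}
    end.

%% Drop any ";param=value" suffix, then trim and lowercase the type.
media_type(Part) ->
    case string:lexemes(Part, ";") of
        [Type | _Params] -> string:lowercase(string:trim(Type));
        []               -> ""
    end.
```

A request with "Accept: text/html, application/x-protobuf;q=0.9" would get the Protocol Buffer representation plus the X-Protobuf-Schema header; anything else falls back to XML.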

Is anyone else doing something similar? It would probably make sense to coordinate on MIME types, etc...

Wednesday, June 11, 2008

Erlang R12B-3 Released

There's a new release of Erlang/OTP out today, which you can download from the usual location. I've also imported it into the erlang-otp repository at GitHub.

According to the readme, this release contains an "experimental" regular expression module called re. The module wraps a lower-level PCRE library and is "many times faster than the pure Erlang implementation". It also looks like it works equally well with both binary- and list-based strings. So I guess the race is on to see who can get their WideFinder implementation up and running first...
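
A minimal sketch of what using the module looks like, assuming the re:run/3 interface as it exists in current OTP releases (the experimental version in this release may differ in details). The module name re_demo and the function first_match/2 are made up for the example.

```erlang
-module(re_demo).
-export([first_match/2]).

%% Return the first match of Pattern in Subject as a list-based string.
%% Subject may be either a list or a binary; the capture option asks
%% re:run/3 to hand the match back as a list in both cases.
first_match(Subject, Pattern) ->
    case re:run(Subject, Pattern, [{capture, first, list}]) of
        {match, [Match]} -> {ok, Match};
        nomatch          -> nomatch
    end.
```

Both re_demo:first_match("widefinder", "f[a-z]+") and re_demo:first_match(<<"widefinder">>, "f[a-z]+") return {ok, "finder"}.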

Sunday, June 8, 2008

Updated Erlang/OTP Repository at GitHub

I ended up deleting and recreating the erlang-otp repository at GitHub today, in case anyone's having problems accessing it. My original import script had a bug that prevented files from being deleted correctly between versions, which meant the files in the repository didn't exactly match the files in the source tarballs. It should be fixed now, but if you were using the old repo you'll probably need to do a clean git clone to get things working again.