April 13, 2019

A Frontend Programmer's Guide to Languages

Today, we're going to make a programming language.

No, it won't take a PhD, or years of your time (or even days for that matter). You and I, together, are going to learn what a programming language is and actually build one that can evaluate programs.

By the end of this post we'll have written a JavaScript function, evaluate, which interprets a small programming language with strings and variables. You won't be going off and rewriting your stack in it (though you're certainly welcome to try!), but I hope it's a fun exercise.

I encourage you to copy the code examples in your editor of choice and actually run them. You should be able to make it through this post in a single listen of Lights - Little Machines.

Hello, world!

To kick things off, we'll write an interpreter for a new HelloWorld language, and use it write a Hello, world! program.

Here's our goal:

const program = { type: "HelloWorld" }
console.log(evaluate(program))
// => Hello, world!

I'm sure you have questions. Real quick, let's introduce some terms that we'll more clearly define as we go.

Our "program" is represented by the JavaScript object { type: "HelloWorld" }. Throughout this post, we'll be adding more and more things to this object. For now, there's not much to it.
Our "evaluator" (or "interpreter" - it's your call) is a single JavaScript function that accepts a "program."
We receive a "value" from our "evaluator," which we send to the wonderful console.log function you're all familiar with.

Now let's build our evaluator.

function evaluate(node) {
  switch (node.type) {
    case "HelloWorld":
      return "Hello, world!"
    default:
      throw `evaluate -- unknown node type ${node.type}`
  }
}

This function is not terribly interesting, but I'd like to call out a few details.

Our function accepts a parameter that we call node as in, a node of a tree. (dramatic foreshadowing)
The crux of this function is a single switch statement that operates on the type field of our node. Our language is very simple, so we only have a single "node type" (namely, "HelloWorld").
For the "HelloWorld" expression - we return the string "Hello, world!"
If we see something we don't recognize - we throw an error. The programmer messed up!

At this point we have 8 lines of code that evaluate a simple HelloWorld language. I'll emphasize the simple here, there are no variables, no loops, no modules, not even numbers - but it's a language. Our language has a very small grammar, but it's a language.

Let's make things more interesting.

Strings

In this section we'd like to make a new HelloStrings language (marketing wants us to be able to print more than just "Hello, world!") that can produce strings and run two operations on them.

Here's our goal for this section:

console.log(
  evaluate({
    type: "String",
    content: "Apple",
  })
)
// => Apple

console.log(
  evaluate({
    type: "Excite",
    expression: {
      type: "String",
      content: "Banana",
    },
  })
)
// => Banana!

console.log(
  evaluate({
    type: "Append",
    first: {
      type: "String",
      content: "Apple",
    },
    second: {
      type: "Excite",
      expression: {
        type: "String",
        content: "Banana",
      },
    },
  })
)
// => AppleBanana!

Some initial observations:

We ditch our "HelloWorld" expression in favor of "String," which has a "content" field in addition to the "type" field we used in the previous section.
We introduce "Excite" which adds exclamation points to expressions.
We introduce "Append," which also contains expressions in its definition - two of 'em in fact!

Let's start off by creating an evaluator to operate on "String" expressions.

function evaluate(node) {
  switch (node.type) {
    case "String":
      return node.content
    default:
      throw `evaluate -- unknown node type ${node.type}`
  }
}

Great, so we replace "HelloWorld" with "String" as the node's type - and return its "content" value instead of the string "Hello, world!"

You just wrote a programming language with strings, let's celebrate!

But if we continue with our desired goal above, we see the following:

Apple
evaluate -- unknown node type Excite

We can evaluate { type: "String", content: "Apple" } no problem, but we don't know what to do with this "Excite" thing just yet.

So how might we evaluate "Excite" expressions? Based on how it produces "Banana!" in our desired output, we may be inclined to say that "Excite" takes a string and adds an exclamation point to it. Simple enough, right? But let's take a closer look.

{
  type: "Excite",
  expression: {
    type: "String",
    content: "Banana"
  }
}

The "expression" field of our Excite expression node isn't a string per se, but an expression in the HelloStrings language! Our expression contains an expression inside of it! (Are you beginning to see why our evaluator accepts a node as in "nodes of a tree"?)

To illustrate this further, allow me to propose the following, valid program for the HelloStrings language.

console.log(
  evaluate({
    type: "Excite",
    expression: {
      type: "Excite",
      expression: {
        type: "String",
        content: "Banana",
      },
    },
  })
)

In the above example, we'll evaluate the string "Banana" - excite it to get "Banana!" - than excite that to get "Banana!!" (how very exciting!)

Let's begin adding Excite expressions to our language.

function evaluate(node) {
  switch (node.type) {
    case "String":
      return node.content;
    case "Excite":
      // We have this node.expression thing, but what do
      // we do with it?
      ???
    default:
      throw `evaluate -- unknown node type ${node.type}`;
  }
}

Rather than just returning a "value" like we did for String expressions (return node.content), we'll need to use node.expression somehow. As the name suggests, node.expression is an expression, and what do we do with expressions?

We evaluate them!

case "Excite":
  return evaluate(node.expression) + "!";

evaluate is now a recursive function which operates on String and Excite expressions. Here's the function in all its glory and an invitation to implement Append yourself.

function evaluate(node) {
  switch (node.type) {
    case "String":
      return node.content
    case "Excite":
      return evaluate(node.expression) + "!"
    case "Append":
    // Now it's your turn, how might we evaluate
    // Append expressions?
    default:
      throw `evaluate -- unknown node type ${node.type}`
  }
}

Variables

At this point we have a small language which can produce strings with "Excite" and "Append" statements. Nice job if you've made it this far!

Let's use this momentum and expand our language to support a very popular language construct - variables.

By the end of this section, we'll have a language which can declare and retrieve the value of variables in HelloStrings expressions.

console.log(
  evaluate(
    {
      type: "Let",
      name: "x",
      value: {
        type: "String",
        content: "Hello, world",
      },
      expression: {
        type: "Excite",
        expression: {
          type: "Variable",
          name: "x",
        },
      },
    },
    {}
  )
)
// => Hello, world!

There's a lot there, so let's break it down.

evaluate now takes a second argument (dramatic foreshadowing)
We introduce a new "Variable" expression, which will lookup a variable by its name.
We introduce a "Let" expression, which makes a new variable with a "name" and a "value" (the value being an expression in our language!), and uses that when evaluating an inner expression.

In our case, we're setting "x" to the evaluation of a "Hello, world" String, then retrieving the value of "x" to Excite it.

Let's start by adding "Variable" expressions to evaluate.

function evaluate(node) {
  switch (node.type) {
    case "Variable":
      // We have `node.name`, but what to do with it?
    ...
  }
}

Consider the following code:

evaluate({ type: "Variable", name: "x" })

Here we want to evaluate the variable x, but there doesn't seem to be much to work with. In a typical programming language, what we might we do with a variable?

We look it up!

function evaluate(node) {
  switch (node.type) {
    case "Variable":
      return lookup(node.name);
    ...
  }
}

In order for lookup to do anything meaningful, we need to provide it with some sort of environment - a collection of variable names and values. We'll call this env.

function evaluate(node) {
  switch (node.type) {
    case "Variable":
      return lookup(env, node.name);
    ...
  }
}

And where does env come from? Rather unfortunately we can't make it out of thin air (believe me, I tried for many hours), but we can pass it in to our evaluator.

function evaluate(node, env) {
  switch (node.type) {
    case "Variable":
      return lookup(env, node.name)
    case "String":
      return node.content
    case "Excite":
      return evaluate(node.expression, env) + "!"
    case "Append":
    // Left as an exercise to the reader
    default:
      throw `evaluate -- unknown node type ${node.type}`
  }
}

Remember to add env to our recursive calls in Excite and Append (you did implement Append, right?)

Let's define lookup. For simplicity's sake, we'll say an environment is a JavaScript object with name keys and value values.

function lookup(env, name) {
  return env[name]
}

Simple, right?

console.log(evaluate({ type: "Variable", name: "x" }, { x: "Hello, world!" }))
// => Hello, world!

Congratulations you just added variables to your language!

Last but not least, we'll want an easy way to create new variables. Without this, we'll be evaluating Excite's and Append's with nothing more than a bunch of global variables - and the marketing department definitely doesn't want that.

Let's start by listing what a "Let" expression actually contains:

{
  type: "Let",
  name: "x",
  value: {
    type: "String",
    content: "Hello, world"
  },
  expression: {
    type: "Excite",
    expression: {
      type: "Variable",
      name: "x"
    }
  }
}

A "Let" type
A "name" field for naming our new variable
A "value" field for giving our new variable a value
An "expression" field to use our new variable in!

Excellent, now let's get started on implementing the thing.

function evaluate(node, env) {
  switch (node.type) {
    case "Variable":
      return lookup(env, node.name);
    case "Let":
      let newEnv = ???
      return evaluate(node.expression, newEnv);
    ...
  }
}

We'll be adding a new variable to our environment, and using that to evaluate our expression. For simplicity's sake, let's give ourselves an extendEnv function to do this.

function evaluate(node, env) {
  switch (node.type) {
    case "Variable":
      return lookup(env, node.name);
    case "Let":
      let name = ???
      let value = ???
      let newEnv = extendEnv(env, name, value)
      return evaluate(node.expression, newEnv);
    ...
  }
}

name is simple, that's just node.name. For value, however, we'll be given a HelloStrings expression to evaluate.

{
  type: "Let",
  name: "x",
  value: {
    type: "String",
    content: "Hello, world"
  },
  ...
}

Still, it's not much code :)

function evaluate(node, env) {
  switch (node.type) {
    case "Variable":
      return lookup(env, node.name);
    case "Let":
      let name = node.name;
      let value = evaluate(node.value, env);
      let newEnv = extendEnv(env, name, value);
      return evaluate(node.expression, newEnv);
    ...
  }
}

Finally, extendEnv.

Our env is a simple JavaScript object that maps names to values. We'll need to extend an env with a new name and value pair. We can do so with Object.assign:

function extendEnv(env, name, value) {
  let envAddition = {}
  envAddition[name] = value

  return Object.assign({}, env, envAddition)
}

Or, using some of the latest and greatest JavaScript features:

function extendEnv(env, name, value) {
  return {
    ...env,
    [name]: value,
  }
}

Putting it all together, the combination of lookup, extendEnv, and evaluate completes the language.

function lookup(env, name) {
  return env[name]
}

function extendEnv(env, name, value) {
  return {
    ...env,
    [name]: value,
  }
}

function evaluate(node, env) {
  switch (node.type) {
    case "String":
      return node.content
    case "Excite":
      return evaluate(node.expression, env) + "!"
    case "Append":
    // Left as an exercise to the reader
    case "Variable":
      return lookup(env, node.name)
    case "Let":
      let inner = node.expression
      let value = evaluate(node.value, env)
      let newEnv = extendEnv(env, node.name, value)
      return evaluate(node.expression, newEnv)
    default:
      throw `evaluate -- unknown node type ${node.type}`
  }
}

Closing notes and things to ponder

If you made it to the end of this post, congratulations and thank you :)

We developed a language and an evaluator that processes Abstract Syntax Trees (or ASTs for short, but I just called 'em nodes for simplicity). Our language has constants, some small expressions to operate on strings, and variables.

But this is only the beginning! You are ending this article with an evaluate function that can grow to support all sorts of common language features (numbers, comments, functions), the only limit is your imagination.

I'd like to leave you with a few concrete exercises, should you wish to revisit your new language on a rainy day:

How might we add Numbers to our language? How about operations to Add and Multiply them? What about rational numbers (fractions with a numerator and a denominator)?
As we saw earlier, we can kickstart our env by populating it with global values. Could we add functions to this environment? How might we call them from our language?
Can we take our abstract syntax tree and use it to generate JavaScript code? How about Ruby code?
Writing our programs in as JavaScript objects doesn't seem to scale well. How might we go about creating a syntax for our language and converting that to an AST?

I'll also go ahead and plug some of my favorite resources on Programming Languages.

Dan Grossman's Programming Languages on Coursera is an entirely free, entirely amazing three-part course for learning the ins and outs of different types of programming languages. I can't recommend it enough.
This article is inspired in no short part by @jamiebuilds's super tiny compiler. I encourage you to check out the code to see some of the concepts introduced in this article in more detail.
Crafting Interpreters is an effort to build "a handbook for making programming languages." An ambitious goal, but it's completely free and chock full of different interpreter concepts.

That's all for now. If you liked this article feel free to share it and follow me on Twitter while you're at it: @jdan.

Thanks again for reading,
Jordan