Continuous Learning

At my advanced age, I’m still happy to learn new things.

In my new job at Fitbit, I’ve primarily been working on an eCommerce web application in Java, but in support of a data migration/business intelligence project, I also find myself suddenly working in Python, a language I’ve never previously used, pulling data out of Google protocol buffers.  So I’m learning two new things at once, and enjoying it.  I can’t share any of that code of course, but I might post some code in Python here eventually, and/or some code demonstrating the use of protocol buffers in Scala or Java.

Meanwhile, in one of my primary “extracurricular” activities, I’m also taking up a new musical instrument.  For a couple years now, I’ve been involved in a group called “The Sing”, which meets weekly and, as the title suggests, sings!   In that time, I’ve sung in a variety of genres, sometimes accompanying myself on piano (not very well…), and sometimes recruiting a friend to accompany me on guitar.  But as I’m doing mainly music that makes sense with a guitar, I’ve decided to learn to play it.

So about a month ago I bought a guitar, and have been picking it up gradually on my own, learning chords and a little finger-picking, developing callouses, and stumbling through Phil Ochs’ “Too Many Martyrs” and “A Toast to Those Who Are Gone”, Tom Paxton’s “The Last Thing on My Mind”, and the Grateful Dead’s “Ripple”.   I’ve also signed up for Introduction to Guitar on Coursera starting in mid-October.  Maybe in a couple months I’ll be ready to play in public.   Not *too* public, but in the weekly sing, among friends.  Not giving up my day job!

Pattern Matching in Scala

In my article on parser combinators, I developed a parser that parses simple integer sums and products into an internal syntax tree. In this article, I’ll make a simple evaluator that takes such a syntax tree as parsed and computes the value of the expression.

This is the sort of thing that in Java you might do using the Visitor Pattern, but Scala (like some other functional languages) has a powerful mechanism called “pattern matching” that makes this somewhat complex pattern unnecessary in most cases, and allows us to create quite concise code.

For reference, the syntax created in the last article was given by:

1
2
3
4
5
6
7
8
9
10
package com.donroby.parsing
 
trait ExpressionSyntax {
 
  sealed abstract class Expression
 
  case class IntegerLiteral(i: Int) extends Expression
  case class Sum(e1: Expression, e2: Expression) extends Expression
  case class Product(e1: Expression, e2: Expression) extends Expression
}

Starting as usual with a test, I write:

1
2
3
4
5
6
7
8
9
10
11
12
package com.donroby.parsing
 
import com.donroby.parsing.ExpressionSyntaxEvaluator._
import org.scalatest.FlatSpec
 
class ExpressionSyntaxEvaluatorSpec  extends FlatSpec {
 
  "The expression syntax evaluator "  should "evaluate a literal" in {
    assert(evaluate(IntegerLiteral(42)) == 42)
  }
 
}

And in order to get it to compile, I create a scala object that will do the evaluation but leave the `evaluate` method unimplemented so the test definitely fails on first run:

1
2
3
4
5
package com.donroby.parsing
 
object ExpressionSyntaxEvaluator extends ExpressionSyntax {
  def evaluate(e: Expression): Int = ???
}

Now as I’ve already decided I’m going to use pattern matching to do this, I’ll start implementing a simple “case-expression” with only the already specified literal handling and a default case:

1
2
3
4
5
6
7
8
package com.donroby.parsing
 
object ExpressionSyntaxEvaluator extends ExpressionSyntax {
  def evaluate(e: Expression): Int = e match {
    case IntegerLiteral(i) => i
    case _ => 0
  }
}

This makes the test pass of course. It works because Scala supplies this syntax for matching a pattern based on case classes, which I used when defining the syntax. (It also works for some other classes supplied with “extractor objects”.)

What this case expression does depends of course on what Expression you pass it. If it is an IntegerLiteral, it unwraps (“extracts”) the contained value and returns it. For any other expression it returns 0. We’ll of course be adding cases for our other syntax classes.

Another test, for evaluating a sum:

1
2
3
  it should "evaluate a sum of two literals" in {
    assert(evaluate(Sum(IntegerLiteral(2), IntegerLiteral(3))) == 5)
  }

This test can be made to pass by adding a more complex case:

1
2
3
4
5
6
7
object ExpressionSyntaxEvaluator extends ExpressionSyntax {
  def evaluate(e: Expression): Int = e match {
    case IntegerLiteral(i) => i
    case Sum(IntegerLiteral(i), IntegerLiteral(j)) => i + j
    case _ => 0
  }
}

This is a bit silly, but it does reveal how complex the cases in such an expression are allowed to be. It will change soon.

I’ll make another test that forces us to handle complex sums instead of just sums of two integers:

1
2
3
4
5
6
7
8
9
10
11
12
13
  it should "evaluate a sum of two sum expressions" in {
    assert(
      evaluate(
        Sum(
          Sum(
            IntegerLiteral(2),
            IntegerLiteral(2)),
          Sum(
            IntegerLiteral(3),
            IntegerLiteral(1))
          )
      ) == 8)
  }

This of course fails, as we only handle sums of literals.

We can add a case for sums of arbitrary expressions:

1
    case Sum(e1, e2) => evaluate(e1) + evaluate(e2)

But as a literal is itself an expression, and evaluation of a literal is already defined to do the right thing, we can also remove the messy pattern match in the last case, leaving us with:

1
2
3
4
5
6
7
object ExpressionSyntaxEvaluator extends ExpressionSyntax {
  def evaluate(e: Expression): Int = e match {
    case IntegerLiteral(i) => i
    case Sum(e1, e2) => evaluate(e1) + evaluate(e2)
    case _ => 0
  }
}

Of course, we can do similarly for products, resulting in a complete test:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
package com.donroby.parsing
 
import com.donroby.parsing.ExpressionSyntaxEvaluator._
import org.scalatest.FlatSpec
 
class ExpressionSyntaxEvaluatorSpec  extends FlatSpec {
 
  "The expression syntax evaluator "  should "evaluate a literal" in {
    assert(evaluate(IntegerLiteral(42)) == 42)
  }
 
  it should "evaluate a sum of two literals" in {
    assert(evaluate(Sum(IntegerLiteral(2), IntegerLiteral(3))) == 5)
  }
 
  it should "evaluate a sum of two sum expressions" in {
    assert(
      evaluate(
        Sum(
          Sum(
            IntegerLiteral(2),
            IntegerLiteral(2)),
          Sum(
            IntegerLiteral(3),
            IntegerLiteral(1))
          )
      ) == 8)
  }
 
  it should "evaluate a product of two literals" in {
    assert(evaluate(Product(IntegerLiteral(2), IntegerLiteral(3))) == 6)
  }
 
  it should "evaluate a product of two expressions" in {
    assert(
      evaluate(
        Product(
          Product(
            IntegerLiteral(2),
            IntegerLiteral(2)),
          Sum(
            IntegerLiteral(3),
            IntegerLiteral(1))
        )
      ) == 16)
  }
 
}

And quite simple evaluating code:

1
2
3
4
5
6
7
8
9
10
package com.donroby.parsing
 
object ExpressionSyntaxEvaluator extends ExpressionSyntax {
  def evaluate(e: Expression): Int = e match {
    case IntegerLiteral(i) => i
    case Sum(e1, e2) => evaluate(e1) + evaluate(e2)
    case Product(e1, e2) => evaluate(e1) * evaluate(e2)
    case _ => 0
  }
}

The default case will now never be reached, as all possibilities are covered by the first three, but I elect to keep it as a safety, to ease adding more expression types later.

Parsing Expressions with Scala Parser Combinators

I’ve worked with Scala, and specifically with parsing in Scala, but it’s been a while. So this is a refresher for me.

The first challenge was getting Intellij Idea, SBT and a testing setup working so I could begin to code.

In previous work, I used specs2 and liked it very much, but my efforts to do that now led to alot of confusion, and I’ve for the moment given up and dropped back to scalatest. I expect I’ll try to get specs2 going later, but I really wanted to get moving rather than stumbling around.

The parser combinators also used to be bundled with the regular Scala libraries, but have now been made a separate distribution.

So my build.sbt file ended up looking like this:

1
2
3
4
5
6
7
8
name := "expression-parser"
version := "1.0"
scalaVersion := "2.11.1"
 
libraryDependencies ++= Seq(
"org.scalatest" % "scalatest_2.11" % "2.2.0" % "test" withSources(),
"org.scala-lang.modules" %% "scala-parser-combinators" % "1.0.1" withSources()
)

To get started with coding, I made a test in class ExpressionParserSpec:

1
2
3
4
5
6
7
8
9
10
11
12
package com.donroby.parsing
 
import com.donroby.parsing.ExpressionParsers._
import org.scalatest.FlatSpec
 
class ExpressionParserSpec extends FlatSpec  {
 
  "The parser" should "parse an integer literal" in  {
    assert(parseExpression("42") == IntegerLiteral(42))
  }
 
}

I also of course created the referenced ExpressionParsers in order to get this to compile, as a trait and companion object:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
package com.donroby.parsing
 
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.CharSequenceReader
 
trait ExpressionParsers extends RegexParsers with ExpressionSyntax {
 
  def expression:Parser[Expression] = ???
}
 
object ExpressionParsers extends ExpressionParsers {
 
  def parseExpression(s: CharSequence): Expression = {
    parseExpression(new CharSequenceReader(s))
  }
 
  def parseExpression(input: CharSequenceReader): ExpressionParsers.Expression = {
    parsePhrase(input) match {
      case Success(t, _) => t
      case NoSuccess(msg, next) => throw new IllegalArgumentException(
        "Could not parse '" + input + "' near '" + next.pos.longString + ": " + msg)
    }
  }
 
  def parsePhrase(input: CharSequenceReader): ParseResult[Expression] = {
    phrase(expression)(input)
  }
}

and a near-empty trait ExpressionSyntax:

1
2
3
4
5
6
7
package com.donroby.parsing
 
trait ExpressionSyntax {
 
  sealed abstract class Expression
  case class IntegerLiteral(i: Int) extends Expression
}

This is a bit more code than I like to do just to get the first test to compile and fail, but it’s mostly reasonably simple design organization decisions.

The companion object is the main entry point and has overloaded parseExpression methods primarily for the test as well as a parsePhrase (which might have been better named just parse) called by these and likely to be called in production use.

The ExpressionSyntax trait has a sealed trait Expression whose extensions will represent the abstract syntax tree of the parse. I made the first extension so that it could be referenced from the first test.

I’ve made the ExpressionParsers trait extend ExpressionSyntax for convenience of referencing the syntax types and RegexParsers because I know I’m going to be making at least one parser (the first) using regular expressions. I could have made it extend Parsers now and changed it later…

The test of course fails as tests should on first run, giving scala.NotImplementedError: an implementation is missing.

To make the test pass, we implement the first extremely simple parser:

In the ExpressionParsers:

1
2
3
  def expression:Parser[Expression] = """\d+""".r ^^ {
    s => new IntegerLiteral(s.toInt)
  }

The test passes.

As we know the expression parser is going to be a bit more complex, we’ll immediately refactor to pull out this simple parser to an appropriately named one:

1
2
3
4
5
  def integer:Parser[IntegerLiteral] = """\d+""".r ^^ {
    s => new IntegerLiteral(s.toInt)
  }
 
  def expression:Parser[Expression] = integer

Even this simple parser deserves a bit of explanation, and already involves a parser combinator.

The “””\d+””” is of course just a java.util.String and the .r produces a regular expression, which in Scala is a scala.util.matching.Regex.

^^ is a parser combinator. That takes a bit of explanation, which I’ll try to get in later when we get to the other parser combinators. At any rate, it is a method, which in Scala unlike many languages can have names like this.

It appears there’s some implicit conversion happening here, which can get quite confusing. The method `r` is actually defined in scala.collection.immutable.StringLike, so somewhere there must be an implicit conversion that already converted the String to a StringLike, and this method then converts it to a Regex. Similarly, the ^^ method is defined in an abstract class Parser[+T] in util.parsing.combinator.Parsers, so somewhere an implicit conversion has turned the Regex into a Parser. I have not delved into the source enough to know where these implicits are defined. A Regex is of course specifically converted into a RegexParser.

Before elaborating on the parser combinators, we’ll get another example in. But first let’s make sure it handles negative numbers!

Add another test:

1
2
3
  it should "parse a negative integer literal" in  {
    assert(parseExpression("-42") == IntegerLiteral(-42))
  }

It fails. We just need to add an optional minus sign to the regular expression:

1
2
3
4
5
  def integer:Parser[IntegerLiteral] = {
    """-?\d+""".r ^^ {
      s => new IntegerLiteral(s.toInt)
    }
  }

And both tests pass.

Now to make a slightly more complex expression, starting of course with the test:

1
2
3
4
5
6
7
8
  it should "parse a simple sum" in {
    assert(parseExpression("2 + 3") ==
      Sum(
        IntegerLiteral(2),
        IntegerLiteral(3)
      )
    )
  }

which for compilation also makes us add the new Expression subtype in ExpressionSyntax:

1
  case class Sum(e1: Expression, e2: Expression) extends Expression

And the test of course fails.

Now to get this working we have to use a couple more combinators. We add a `sum` parser and change the `expression` parser to add a reference to it:

1
2
3
4
5
  def sum:Parser[Sum] = integer ~ "+" ~ integer ^^ {
    case (x ~ "+" ~ y) => Sum(x, y)
  }
 
  def expression:Parser[Expression] = sum | integer

The tests all pass again.

Now we see two more parser combinators, the sequencing combinator `~` and the disjunction combinator `|`.

Now for a little bit of explanation. A parser is a subclass of the abstract class Parser[+T] defined in util.parsing.combinator.Parsers. This class has an unspecified method

1
    def apply(in: Input): ParseResult[T]

which subclasses must implement to do the parsing. The ParseResult[+T] class is a sealed abstract class also defined in util.parsing.combinator.Parsers along with some concrete subclasses of which the main ones are Success[+T] , and another sealed abstract class NoSuccess with subclasses distinguishing errors from failures.

The Success[+T] class is a case class with a field result holding the result of the parse and
a field next holding the remainder of the input for other parsers to continue parsing.

A parser combinator is a method combining two parsers or one parser with something else to make a new parser in some way combining the actions of the two or modifying the action of the one.

The sequencing combinator `~` creates a parser that applies the first parser and then if that parser succeeds passes its next field as input for the second. This combined parser succeeds only if both of its component parsers succeed.

The disjunction combinator `|` creates a parser that tries both of its component parsers and returns success if either of them does.

The function application combinator `^^` is a method taking a parser and a function as arguments and creating a parser that succeeds only if the argument parser succeeds and then applies the function to the result.

There are of course other parser combinators. We might get to more in a bit.

We can do product in a way analogous to sum. First the test:

1
2
3
4
5
6
7
8
  it should "parse a simple product" in {
    assert(parseExpression("2 * 3") ==
      Product(
        IntegerLiteral(2),
        IntegerLiteral(3)
      )
    )
  }

and the syntax definition:

1
  case class Product(e1: Expression, e2: Expression) extends Expression

The test fails, and we implement it by adding a `product` parser and again changing the `expression` parser:

1
2
3
4
5
  def product:Parser[Product] = integer ~ "*" ~ integer ^^ {
    case (x ~ "*" ~ y) => Product(x, y)
  }
 
  def expression:Parser[Expression] = sum | product | integer

And of course all the tests pass.

Of course, we could do division and exponentiation similarly, but this is enough for this demo.

But we can still only handle a simple sum or two integers and a simple product of two integers.

We of course want to do something more complex, and the easiest thing is to introduce parentheses. They’re needed for some expressions. We could allow omitting them sometimes if we also handled precedence, but we’re not going to do that now.

So we’ll first just allow parentheses to surround what we’ve already made, starting with two tests:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
  it should "parse a parenthesized sum" in {
    assert(parseExpression("(2 + 3)") ==
      Sum(
        IntegerLiteral(2),
        IntegerLiteral(3)
      )
    )
  }
 
  it should "parse a parenthesized product" in {
    assert(parseExpression("(2 * 3)") ==
      Product(
        IntegerLiteral(2),
        IntegerLiteral(3)
      )
    )
  }

And the implementation:

1
2
3
4
5
  def parenthesizedExpression = "(" ~ expression ~ ")" ^^ {
    case ("(" ~ e ~ ")") => e
  }
 
  def expression:Parser[Expression] = ( sum | product | integer | parenthesizedExpression )

With this the tests pass and we can use parentheses.

Now to allow more complex expressions, we add a test:

1
2
3
4
5
6
7
8
9
10
11
12
13
  it should "parse a complex expression" in {
    assert(parseExpression("(2 * 3) * (4 + 5)") ==
      Product(
        Product(
          IntegerLiteral(2),
          IntegerLiteral(3)),
        Sum(
          IntegerLiteral(4),
          IntegerLiteral(5)
        )
      )
    )
  }

To implement this we create another parser that allows integers or parenthesized expressions and use that in place of `integer` in the definitions of `sum` and `product`:

1
2
3
4
5
6
7
8
9
  def operand = (integer | parenthesizedExpression)
 
  def sum:Parser[Sum] = operand ~ "+" ~ operand ^^ {
    case (x ~ "+" ~ y) => Sum(x, y)
  }
 
  def product:Parser[Product] = operand ~ "*" ~ operand ^^ {
    case (x ~ "*" ~ y) => Product(x, y)
  }

Now by using a couple more combinators, we can simplify the definition of `parenthesizedExpression`.
Recall that its definition looks like:

1
2
3
  def parenthesizedExpression = "(" ~ expression ~ ")" ^^ {
    case ("(" ~ e ~ ")") => e
  }

The “(” and “)” in the parser are actually “constant” parsers that simply return their own value, which then has to be repeated stripped out in the `^^` combinator’s function.

The `~>` combinator acts just like the `~` combinator, but discards the result on the left. Similarly, the `<~` combinator discards the result on the right. So we can change this to:

1
  def parenthesizedExpression = "(" ~> expression <~ ")"

Doing this, all the tests still pass.

I also somewhere along the course of doing this added a test that parsing fails properly:

1
2
3
4
5
  it should "throw exception given incomplete input" in  {
    intercept[IllegalArgumentException] {
      parseExpression("2 +")
    }
  }

This actually tests the use of `phrase` in the parsePhrase method. The phrase method takes a parser as argument and creates a parser that fails if the argument parser fails or if it does not consume all input.

I think I actually did this pretty early, and added the use of `phrase` when I did it, because without this, one of the tests prematurely reported a successful parse because an already implemented parse succeeded without consuming the whole string.

Had I saved this in github intermittently as I wrote it, I might be able to go back and get that in the proper sequence. Think I’ll do that on the next thing I expect to be blogging about.

The code is in github now though.

Ctrl-Alt-Del

I’ve had this site for a while but not done much with it.  A few years ago I blogged for a year or so at donroby.blogspot.com. I’ve decided now to resurrect and update DonRoby.com, start blogging here, and probably put links here from places like my Stack Overflow and LinkedIn profiles.

The last blog entries on the old blog were while I was looking for work.   I found a job and stopped blogging…

I’ve been through some more jobs since, and now I’m looking again, as I was laid off (again!)  I’ll try to keep it going after I find a job this time.

I mainly work in Java, but I’ll likely post work in Scala because it’s more fun.  I might even sneak in some non-programming things about math, music or theater.