Scala's match operator and unapply methods

Categories: Java

My Scala notes article just mentions under “pattern matching” that apply/unapply is “too complex to discuss here”. I’ve attempted to address the issue in this article; as I am pretty new to Scala, please don’t take the info below as guaranteed correct. This article also covers only the most common features of pattern-matching; the Neophytes Guide articles cover the topic in more detail.

The Magic Unapply Method

Pattern-matching is a feature available in many functional programming languages, and is supported in Scala via the match operator together with the magic unapply method that can be defined on classes and singleton-objects. Method unapply takes as its only argument an instance of a target type and returns an Option containing the extracted values (a single value, or a tuple when there are several), or None if the argument does not match. Unapply methods are often referred to as “extractors”.
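A minimal sketch of such an extractor follows. The names Even, EvenDemo and describe are made up for illustration and do not come from any library.

```scala
// Hypothetical extractor: matches even integers and extracts half their value.
object Even {
  // Returns Some(half) when n is even, None otherwise.
  def unapply(n: Int): Option[Int] =
    if (n % 2 == 0) Some(n / 2) else None
}

object EvenDemo {
  // The pattern Even(half) causes Even.unapply(n) to be invoked.
  def describe(n: Int): String = n match {
    case Even(half) => s"$n is even, half of it is $half"
    case _          => s"$n is odd"
  }

  def main(args: Array[String]): Unit = {
    println(describe(10))
    println(describe(7))
  }
}
```

Here None simply means "this case does not apply", and the match moves on to the next case.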

The Match Algorithm

After some research, it appears that the pattern-matching algorithm is (in my phrasing) as follows.

Given an expression like the following:

var result = someobj match {
  case somepattern => ...
  case somepattern => ...
  case somepattern => ...
}

the algorithm is (roughly):

for each pattern in order:
  if somepattern is just "_" then the case succeeds
  else if somepattern is just a variable-name then the variable is bound to the object being matched and the case succeeds
  else if somepattern is just a literal value x (without following parentheses) then the case succeeds if someobj.equals(x)
  else if somepattern is of form "someMatcher(somepattern2)" then
    var unapplyMethod = someMatcher's unapply method (resolved at compile time)
    if someobj is not an instance of that method's parameter type, the case fails to match
    else
      var result = someMatcher.unapplyMethod(someobj)
      if result is None then the case fails to match
      else if result is Some(someobj2) then
        recursively invoke the above algorithm to try to match someobj2 against somepattern2
      else if result is Some(sometuple) then
        for each pair of values (someobj2, somepattern2) from the result tuple and the pattern tuple
          recursively invoke the above algorithm to try to match the object against the pattern

Value “someMatcher” is usually a reference to a Scala singleton object, in which case its name will start with a capital letter. And usually the singleton object is the companion-object of some class. However this is not a requirement; a reference to a non-singleton is also acceptable. Remember that singleton and non-singleton objects both have an associated class with a fixed set of methods and fields - it is just that the singleton instances cannot be manually instantiated.
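As a sketch of the non-singleton case, the hypothetical class MultipleOf below defines unapply, and ordinary instances of it are then used as matchers. All names here are invented for the example.

```scala
// Hypothetical class whose instances act as extractors.
class MultipleOf(divisor: Int) {
  // Succeeds with the quotient when n is an exact multiple of divisor.
  def unapply(n: Int): Option[Int] =
    if (n % divisor == 0) Some(n / divisor) else None
}

object MultipleOfDemo {
  // Ordinary instances created with new; the patterns refer to them by name.
  val byThree = new MultipleOf(3)
  val byFive  = new MultipleOf(5)

  def classify(n: Int): String = n match {
    case byThree(q) => s"$n = 3 * $q"
    case byFive(q)  => s"$n = 5 * $q"
    case _          => s"$n is a multiple of neither 3 nor 5"
  }

  def main(args: Array[String]): Unit =
    List(9, 10, 7).foreach(n => println(classify(n)))
}
```

The cases are tried in order, so a value such as 15 is reported as a multiple of 3 and the byFive case is never consulted.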

Note that a pattern of the form case Foo(1,2,3) => .. looks as if the apply method of (singleton) object Foo is being invoked and passed three parameters. However in match-patterns this is not the case; in fact the dataflow is quite different: Foo.unapply is invoked with the object being matched, and then the contents of the returned tuple are matched pairwise (as with the method zip) against the patterns (1,2,3). That the list following Foo is not a regular parameter list is more obvious when using wildcards and variables, e.g. case Foo(1, a, _).
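The two directions can be put side by side in a small sketch. Class Foo and its companion are hypothetical and written out by hand so that both methods are visible.

```scala
// Hypothetical class Foo with a hand-written companion object.
class Foo(val a: Int, val b: Int, val c: Int)

object Foo {
  // Factory method: used by the expression Foo(1, 2, 3).
  def apply(a: Int, b: Int, c: Int): Foo = new Foo(a, b, c)

  // Extractor: used by the pattern Foo(1, x, _).
  def unapply(foo: Foo): Option[(Int, Int, Int)] = Some((foo.a, foo.b, foo.c))
}

object FooDemo {
  def middle(foo: Foo): String = foo match {
    // Pattern position: Foo.unapply(foo) is called, then the three results are
    // matched against the literal 1, the variable x and the wildcard.
    case Foo(1, x, _) => s"first is 1, second is $x"
    case _            => "first is not 1"
  }

  def main(args: Array[String]): Unit = {
    val foo = Foo(1, 2, 3) // expression position: calls Foo.apply
    println(middle(foo))
  }
}
```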

A Match Example

As the above algorithm is somewhat abstract, let's walk through an example:

// Node is an abstract class so that subclasses can pass id to its constructor
abstract class Node(val id:String)
class Nonleaf(id:String, val left:Node, val right:Node) extends Node(id)
class Leaf(id:String, val name:String) extends Node(id)

object Nonleaf {
  def unapply(obj:Nonleaf) = Some((obj.id, obj.left, obj.right))
}

object Leaf {
  def unapply(obj:Leaf) = Some((obj.id, obj.name))
}

val leaf1 = new Leaf("l1", "leafname1")
// static type Any, so that the compiler accepts every case in the match below
val root: Any = new Nonleaf("root", null, leaf1)

val result = root match {
  case "hello" => "matched literal value"
  case Leaf(_, _) => "matched leaf"
  case Nonleaf("unknown", a, b) => "matched unknown"
  case Nonleaf("root", _, Leaf("l1", name)) => s"matched root with right child having name=$name"
  case _ => "Default case matched"
}
println(result)

The first case is evaluated:

  • The case pattern is a value (and not followed by parentheses), so it is directly compared against the object being matched; they are not equal (one is a string, the other is a Nonleaf instance) so the case fails.

The second case is evaluated:

  • Leaf is an object followed by parentheses, but method Leaf.unapply(root) cannot be invoked: Leaf.unapply accepts only a Leaf, and root is a Nonleaf. The compiler inserts a runtime type-test before the call to unapply; as that test fails here, the case fails to match.

The fact that Leaf is a singleton instance is not relevant; any object will be treated the same here.

The third case is evaluated:

  • Nonleaf is an object followed by parentheses, so method Nonleaf.unapply(root) is invoked; it returns Some(("root", null, leaf1)), i.e. “extracts” the properties of the root object
  • The elements of the returned tuple are then compared pairwise with the elements specified in the case: ("unknown", a, b), i.e. the match-algorithm is recursively applied to each (value, pattern-element) pair.
    • the first pair to compare are value “root” and literal “unknown” which are not equal, so the case fails to match

The fourth case is evaluated:

  • Nonleaf is an object followed by parentheses, so method Nonleaf.unapply(root) is invoked as with the case above
  • The elements specified in the case are then compared pairwise with the elements returned by unapply:
    • literal “root” is equal to literal “root”
    • an underscore matches anything
    • Leaf is an object followed by parentheses, so method Leaf.unapply(leaf1) is invoked, returning the tuple ("l1", "leafname1")
    • The elements specified in the case are recursively compared pairwise with the elements returned by unapply:
      • literal “l1” is equal to “l1”
      • varname “name” is bound to “leafname1”
  • there are no more elements to compare, so this case succeeds
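The steps for the fourth case can be written out by hand. The sketch below is only my approximation of what the pattern does, not the code the compiler actually generates; function fourthCase is hypothetical, and the classes are repeated so that the block is self-contained.

```scala
abstract class Node(val id: String)
class Nonleaf(id: String, val left: Node, val right: Node) extends Node(id)
class Leaf(id: String, val name: String) extends Node(id)

object Nonleaf {
  def unapply(obj: Nonleaf): Option[(String, Node, Node)] = Some((obj.id, obj.left, obj.right))
}

object Leaf {
  def unapply(obj: Leaf): Option[(String, String)] = Some((obj.id, obj.name))
}

object ManualMatchDemo {
  // Hand-written approximation of:
  //   case Nonleaf("root", _, Leaf("l1", name)) => Some(name)
  def fourthCase(obj: Any): Option[String] =
    if (!obj.isInstanceOf[Nonleaf]) None // type test before calling unapply
    else {
      val outer = Nonleaf.unapply(obj.asInstanceOf[Nonleaf])
      if (outer.isEmpty) None
      else if (outer.get._1 != "root") None             // literal "root"
      else if (!outer.get._3.isInstanceOf[Leaf]) None   // _ is skipped; third element must be a Leaf
      else {
        val inner = Leaf.unapply(outer.get._3.asInstanceOf[Leaf])
        if (inner.isEmpty || inner.get._1 != "l1") None // literal "l1"
        else Some(inner.get._2)                         // the value bound to name
      }
    }

  def main(args: Array[String]): Unit = {
    val root = new Nonleaf("root", null, new Leaf("l1", "leafname1"))
    println(fourthCase(root))
  }
}
```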

Notes

Classes Leaf and Nonleaf could be implemented as case-classes, in which case companion-objects would automatically be generated, with unapply methods that look just like the ones above.
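A sketch of the case-class variant follows. It is a self-contained rewrite of the classes above in which Node becomes a plain trait with an abstract id; function describe is a hypothetical addition for illustration.

```scala
trait Node { def id: String }

// No hand-written companion objects: the compiler generates them,
// including the extractors used by the patterns below.
case class Nonleaf(id: String, left: Node, right: Node) extends Node
case class Leaf(id: String, name: String) extends Node

object CaseClassDemo {
  def describe(node: Node): String = node match {
    case Leaf(_, name)                     => s"leaf named $name"
    case Nonleaf("root", _, Leaf(_, name)) => s"root whose right child is named $name"
    case Nonleaf(id, _, _)                 => s"inner node $id"
  }

  def main(args: Array[String]): Unit = {
    val leaf1 = Leaf("l1", "leafname1") // generated apply, so no "new" is needed
    println(describe(Nonleaf("root", null, leaf1)))
    println(describe(leaf1))
  }
}
```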

Note that Leaf.unapply does not take an object of type Node or Nonleaf as parameter. The compiled match-statement therefore first tests whether the object being matched is a Leaf, and skips the case-clause (without invoking unapply) if it is not.

An unapply method often returns values which are similar to, or identical to, the tuple passed to the object constructor. When an object defines an apply method for use as a factory-method, then often the apply params and unapply return-value are symmetrical. However this isn’t a requirement.

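An object can potentially define multiple unapply methods which take different types of parameters. As far as I can tell, the choice between them is made by the compiler using the static type of the object being matched, rather than at runtime using its runtime type; if the static type does not identify exactly one of the methods, the match does not compile.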

There is yet another magical method named unapplySeq which can be defined for objects which are containers for variable-length data. As an example,

   var somelist = List(....)
   somelist match { case List(first, second, _*) => ...}

The singleton object List defines an unapplySeq method rather than unapply, as the contents of a list cannot be reasonably represented as a tuple. (The common pattern head::tail does not use unapplySeq; it is matched via the case-class named ::.) Similarly, a regular expression applied to a string can have an arbitrary number of capture-groups so class Regex defines an unapplySeq method which returns Option[List[...]] rather than an unapply(..) method which returns Option[Tuple].
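The sketch below shows unapplySeq from both sides: Regex, which provides it in the standard library, and a hand-written extractor. Object Path and function describe are hypothetical names invented for this example.

```scala
// Hypothetical extractor that splits an absolute path into its segments.
object Path {
  // A variable number of results, hence Option[Seq[...]] rather than a tuple.
  def unapplySeq(path: String): Option[Seq[String]] =
    if (path.startsWith("/")) Some(path.split("/").toSeq.filter(_.nonEmpty))
    else None
}

object UnapplySeqDemo {
  // Regex.unapplySeq yields one value per capture group.
  val Date = """(\d{4})-(\d{2})-(\d{2})""".r

  def describe(s: String): String = s match {
    case Date(year, month, day) => s"date: year=$year month=$month day=$day"
    case Path("home", user)     => s"home directory of $user" // exactly two segments
    case Path(first, _*)        => s"path whose first segment is $first" // one or more
    case _                      => "unrecognised"
  }

  def main(args: Array[String]): Unit =
    List("2024-01-15", "/home/alice", "/usr/local/bin", "hello").foreach(s => println(describe(s)))
}
```

The number of sub-patterns decides how many elements the returned sequence must have, with _* standing for "any number of further elements".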

Methods unapply and unapplySeq are also used in “destructuring assignments” such as

    val SomeMatcher(field1, field2) = someobj
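A short sketch of such assignments, using extractors from the standard library. The names DestructuringDemo, KeyValue, parse and firstTwo are made up for the example.

```scala
object DestructuringDemo {
  // Regex provides unapplySeq, so a Regex value can be used as the matcher.
  private val KeyValue = """(\w+)=(\w+)""".r

  def parse(entry: String): (String, String) = {
    // Binds key and value to the two capture groups.
    // Throws scala.MatchError if entry does not have the form key=value.
    val KeyValue(key, value) = entry
    (key, value)
  }

  def firstTwo(xs: List[Int]): (Int, Int) = {
    // List.unapplySeq binds the leading elements; _* ignores the rest.
    // Throws scala.MatchError if xs has fewer than two elements.
    val List(first, second, _*) = xs
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    println(parse("colour=blue"))
    println(firstTwo(List(10, 20, 30, 40)))
  }
}
```

Unlike a match expression there is no fallback case here, so a value that does not fit the pattern results in an exception.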

The At Operator

A match expression can also include an @ symbol, which allows capturing of the “top level” object in the match.

  obj match {
    case widget:Widget => println(s"Encountered a Widget with content ${widget.content}") // match root object but not nested fields
    case Gadget(x, y) => println(s"Encountered a Gadget with x=$x y=$y") // match nested fields but not root object
    case f @ Fidget(x, y, 1) if x > 0 => println(s"Encountered a Fidget with obj=$f x=$x y=$y") // match root object and nested fields
    case other => println(s"encountered an object of type ${other.getClass.getName}") // match root object but not nested fields
  }

In the third case, not only are x and y bound to attributes of the matched object, but f is also bound to the entire object.

The @ is only needed when a reference to the whole matched object and bindings to attributes of that object are required at the same time.
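A runnable sketch of the third case above. The Fidget class and its extractor are hypothetical, as the article does not define them.

```scala
// Hypothetical class and hand-written extractor.
class Fidget(val x: Int, val y: Int, val z: Int) {
  override def toString: String = s"Fidget($x,$y,$z)"
}

object Fidget {
  def unapply(f: Fidget): Option[(Int, Int, Int)] = Some((f.x, f.y, f.z))
}

object AtDemo {
  def describe(obj: Any): String = obj match {
    // f is bound to the whole object, x and y to the extracted fields
    case f @ Fidget(x, y, 1) if x > 0 => s"whole=$f x=$x y=$y"
    // a type pattern binds the whole object without extracting fields
    case f: Fidget                    => s"some other fidget: $f"
    case other                        => s"not a fidget: $other"
  }

  def main(args: Array[String]): Unit = {
    println(describe(new Fidget(2, 3, 1)))
    println(describe(new Fidget(-2, 3, 1)))
    println(describe("hello"))
  }
}
```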

References