elixir's lovely pipelines

13 Jul 2015 - New York

Elixir’s pipeline operator is a stroke of genius.

All it does is pass the expression on its left-hand side as the first argument to the function on its right. Pretty simple.
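
For instance, these two expressions are equivalent:

  iex> "hello" |> String.upcase()
  "HELLO"
  iex> String.upcase("hello")
  "HELLO"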

Simple, but it makes a huge difference in how cleanly we can represent composed data transformations in code.

To illustrate, I’ll refer to an early iteration of a module I wrote for an Exercism exercise. It’s more verbose than it needs to be for the work being performed, but what it lacks in concision it makes up for in opportunities for Elixir’s syntax to shine.

defmodule Words do
  @doc """
  Count the number of words in a sentence.
  Case-insensitive; ignores leading/trailing punctuation.
  Tokens containing digits or other non-letter characters are dropped entirely.

  Returns a map in which each word (a string) maps to its count (an integer).

  ## Examples

     iex> Words.count("This is a sentence that has a certain number of wor3ds.")
     %{"a" => 2, "certain" => 1, "has" => 1, "is" => 1, "number" => 1,
       "of" => 1, "sentence" => 1, "that" => 1, "this" => 1}
  """
  def count(sentence_string) do
    sentence_string
    |> list_of_strings()
    |> list_of_words()
    |> word_tallies()
  end

  defp word_tallies(list) do
    list
    |> Enum.reduce(Map.new(), &update_tallies/2)
  end

  defp update_tallies(word, counts) do
    counts
    |> Map.update(String.downcase(word), 1, &(&1 + 1))
  end

  defp list_of_strings(string) do
    string
    |> String.split(~r/[[:space:]]/)
  end

  defp list_of_words(list) do
    list
    |> Enum.filter_map(&is_word?/1, &strip_non_alphanumeric_chars/1)
  end

  # A word starts with a letter, may contain letters and hyphens, and
  # may be wrapped in leading/trailing punctuation.
  defp is_word?(string) do
    string
    |> String.match?(~r/^[[:punct:]]*[[:alpha:]][[:alpha:]-]*[[:punct:]]*$/u)
  end

  # Keep only the first run of letters and hyphens, discarding the
  # surrounding punctuation.
  defp strip_non_alphanumeric_chars(string) do
    word_match = Regex.run(~r/[[:alpha:]-]+/u, string) || []
    word_match |> List.first()
  end
end
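
(If Map.update/4 is unfamiliar: given a map, a key, a default, and a function, it inserts the default when the key is absent and applies the function to the existing value when it’s present. That’s the whole trick behind update_tallies.)

  iex> %{} |> Map.update("word", 1, &(&1 + 1))
  %{"word" => 1}
  iex> %{"word" => 1} |> Map.update("word", 1, &(&1 + 1))
  %{"word" => 2}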

Our public function, count, serves a similar purpose to a composed method, providing a high-level overview of the transformations applied to the input data. It dispatches to a series of functions, each of which does one thing in a stateless fashion.
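
Desugared, the pipeline in count is just this nested call, read inside-out:

  word_tallies(list_of_words(list_of_strings(sentence_string)))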

Beginning with list_of_strings and list_of_words, we continue decomposing the problem, each time at a lower level of abstraction, with names that plainly document each function’s return value.

The functions is_word? and strip_non_alphanumeric_chars are passed to the higher-order function Enum.filter_map, and update_tallies to Enum.reduce.

These could be passed as lambdas, but as much as possible I try to keep all the logic in a given function at a single level of abstraction (the Single Level of Abstraction Principle, or SLAP), which makes the code easier both to maintain and to reason about.
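
For contrast, here’s a hypothetical version of list_of_words with the same logic inlined as lambdas. It behaves identically, but it drags the regex details up into a function that otherwise reads at one higher level:

  defp list_of_words(list) do
    list
    |> Enum.filter_map(
      &String.match?(&1, ~r/^[[:punct:]]*[[:alpha:]][[:alpha:]-]*[[:punct:]]*$/u),
      fn string -> Regex.run(~r/[[:alpha:]-]+/u, string) |> List.first() end
    )
  end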

The final product is straightforward to read and understand. Each function gives a clear sense of what comes in, how data is being transformed, and what comes out. And thanks to the elegance of |>, we get this with a minimum of syntactic noise getting in the way.

To achieve a similar effect in Ruby, for example, you’d have to use explaining variables,

  def count(sentence_string)
    word_list = list_of_words(sentence_string)
    word_tallies(word_list)
  end

or method composition, which reverses the left-to-right order in which the logic unfolds (naming methods with ‘from’ and using paren-free, Seattle-style invocation mitigates this marginally, but it still adds some cognitive overhead):

  def count(sentence_string)
    word_tallies_from list_of_words(sentence_string)
  end

or lastly, for a more authentically object-oriented design, one or more instance variables to maintain state:

class Words
  attr_reader :sentence

  def initialize(sentence)
    @sentence = sentence
  end

  def count
    parse_words
    tally_words
  end

  private

  def parse_words
    parse_list_of_strings_from_sentence
    parse_list_of_words_from_sentence
  end
  # . . . etc. . . .
end

Alternatively, if we have our command methods return self, we can approximate the succinctness Elixir achieves with |>, though in my opinion it overshoots the mark:

  # . . .
  def parse_words
    parse_list_of_strings.parse_list_of_words
  end

  def parse_list_of_strings
    @sentence = sentence.split(/[,\s_]/)
    self
  end
  # . . . etc. . . .

Each of these seems like an awkward fit for the kind of work we’re performing, however. That goes double for the object-oriented approach: a problem like this doesn’t require us to maintain any state, and doing so can add significant rigidity as modules grow.