Or maybe, more accurately, Functional Programming by a data scientist.

Functional programming is something I find myself thinking about quite a bit. I’ve written a good bit of Haskell in my time, and I’ve found myself hacking a lot with Lisp down the years too – to the point of making a Lisp of my own a while ago. It’s a hugely important paradigm, and one that gives a really useful perspective that in my opinion leads to better code. In this blog post I want to talk a bit about FP, with a particular focus on how you can make practical use of these concepts in Python.1



Functions

What is a function? The canonical function looks like this

#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}

It’s an operation, a print statement, encapsulated in a little box called main. This idea of a function as effectively just a way to carry around a set of operations is an old one, and ultimately it comes from Structured Programming, one of the first paradigms in language design and programming. This paradigm has two core elements. The first is Structured Data – essentially, picking an appropriate data structure will make your code cleaner and more efficient. The other is Structured Code: your problem has some inherent structure, and if you reflect that structure in your code then your solution will be better; something like this:

def problem():
    # the overall solution is just the sub-solutions, combined
    s1 = sub_problem1()
    s2 = sub_problem2()
    return s1 + s2

def sub_problem1():
    # solve the first, smaller piece of the problem
    return 1

def sub_problem2():
    # solve the second, smaller piece of the problem
    return 2

Both of these ideas are fairly unimpeachable; by now they’re so deeply ingrained in the way we approach problems that they almost go without saying. The second idea is the one I’ll focus on: breaking a problem down and making code more modular. We get a lot of benefits from modularity:

  • the code becomes easier to debug, because the functions are small
  • the small functions are easier to code in the first place, so they should be clearer
  • small functions are more likely to be re-usable than big, monolithic ones

We like modularity. Extrapolating from this, if we can write more modular code then we’ll generate better solutions, at a lower cost in programmer time. This all sounds great, but there’s one snag, embedded in a single character above: +. We can break our problem down into lots of small functions, but combining their outputs is rarely as simple as just summing them. To really leverage the power of modularity we need a flexible way to combine functions. Enter Functional Programming. FP is glue.2 It’s a way for us to take modules and stick them together at a higher level, giving us more powerful abstractions to work with.

There are four main ideas in FP (IMHO) that make it useful as glue:

  1. Purity
  2. Static Analysis
  3. Laziness
  4. Higher Order Functions


1 - Purity

The first idea we’ll look at is purity. Pure functions are very much associated with FP, and they’re quite simple to define. A pure function is one that has no state, no side-effects,3 and whose behaviour is solely determined by its arguments. At first blush this sounds, as Hughes puts it, like being ‘a medieval monk, denying himself the pleasures of life in the hope that it will make him virtuous’. But we get a lot in return for our stringency. For example, this function is impure:

data = [1, 2, 3, 4]
n = 5
def scale(lst):
    for i in range(len(lst)):
        lst[i] = lst[i] * n
    return lst

result = scale(data)
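
Running this shows the damage directly – the values below just follow from the code above:

data            # [5, 10, 15, 20] – the original values are gone
data is result  # True: result is just another name for the same list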

It relies on some external state, the global variable $n$, and it has the side effect of mutating its input: in the case above, after we’ve run the function, data == result. This is bad. Since the function depends on the value of $n$, which isn’t handed in as an argument, we need to know the value of that variable to know what the code will do – and that variable could be defined far away, or be altered by some other function. The fact that it mutates its input is also a real nuisance; it’s easy to spot in a simple case like this, but bugs that arise from mutated inputs can be much trickier to track down. We can refactor this code to make it pure, and do away with both issues:

def scale(lst, s=5):
    return [s * v for v in lst]
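
Now the caller’s list survives contact with the function:

data = [1, 2, 3, 4]
result = scale(data)
data     # still [1, 2, 3, 4]
result   # [5, 10, 15, 20]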

Much better. What we’ve gained from sticking to purity, in this trivial case, is a piece of code that’s a lot easier to reason about. When we can easily reason about what a piece of code does – when all it does is return outputs which are entirely determined by its inputs – we save ourselves a lot of mental effort when combining functions together.

2 - Static Analysis

Static analysis is a really powerful tool, and while it’s not limited solely to functional languages, functional languages tend to provide for strong static analysis.4 The idea is to check a piece of code for ‘correctness’ before having to run it. Python, being a fairly cavalier, dynamically typed language, doesn’t provide a lot of static analysis tools, but recently MyPy was made available. Python now supports type-hints in function signatures, and MyPy can check that those hints are obeyed. Taking our last example, we can now write

from typing import List

Vector = List[int]

def scale(lst: Vector, s: int = 5) -> Vector:
    return [s * v for v in lst]

Here we define a type called Vector, a list of ints, and we annotate our function to indicate that it should expect a Vector as input and return a Vector as output. It’s still possible to call this code with some other input – the hints are just hints and do nothing at run time – but we can now use MyPy to analyse the code. For instance, if we call the function with a list of strings as input, MyPy will tell us off, as in the example below. This is super handy, and can be integrated into a CI/CD tool so the checking happens whenever code is added. I like to think of the (huge) parameter space of possible inputs to a function: tests can verify that the code performs correctly for individual inputs, but static type checking like this can give us assurances about whole dimensions of the parameter space. With pure functions all we need to worry about are the inputs and outputs, and when we can type check those we start to get some meaningful guarantees about the correctness of our code.
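
For example, a call like this sails through the interpreter – Python will quite happily compute 5 * "a" == "aaaaa" – but MyPy flags the list of strings as incompatible with Vector (the exact wording of the error depends on your MyPy version):

scale(["a", "b", "c"])  # runs at run time, but MyPy reports a type error for this call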

3 - Laziness

Laziness is probably the most important kind of glue that FP gives us, and the one that’s hardest to recreate outside of functional languages. The idea is best seen with a set-up like $g(f(x))$, where $x$ is a potentially infinite list of inputs, $f$ generates a possible answer to the problem we’re trying to solve from each $x$, and $g$ selects the right answer from this haystack. The idea behind laziness is that we only evaluate $f(x)$ as many times as $g$ requires: $g$ drives the computation. $f$ is only started once $g$ tries to read some input, and only runs for long enough to deliver the output $g$ is trying to read. Once $g$ is finished, the process stops. Even though our input is infinite, we only do a finite amount of work. This makes for a very flexible way to glue together functions.

Python doesn’t have true laziness, but it does have something close: generators. Take this generator, for instance:

from typing import Iterator

def numbers(x: int = 0) -> Iterator[int]:
    while True:
        yield x
        x += 1

N = numbers()
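
Pulling a few values out by hand (from a second generator here, so we don’t consume values that the examples below will read from N):

M = numbers()   # a fresh generator, just for this demonstration
next(M)  # 0
next(M)  # 1
next(M)  # 2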

$N$ will produce all of the natural numbers, one by one, when we call next on it: a potentially infinite list of outputs, which are only computed when asked for. Python provides some really great functionality for working with generators, specifically in the itertools module. For instance, we can write functions like this, using generators:

from itertools import takewhile

def predicate(x: int) -> bool:
    p = x**2 + 10 * x + 50
    return p < 1000

odd_nums = (x for x in N if x % 2)        # another infinite, lazy stream: just the odd numbers
nums_lt = takewhile(predicate, odd_nums)  # still lazy: stops once the predicate fails
sum(nums_lt)                              # only now is anything computed; returns 169

In this example we use $N$ to supply a stream of numbers. We remove the even numbers using the generator comprehension syntax, which gives us another infinite generator containing only the odd numbers. We use takewhile to get all of the numbers for which the predicate we’ve defined is true – yet another generator. All of these generators are lazy; they haven’t done anything at this point, they’re more like a promise to compute these things if we ever need them. It’s only when we call the sum function that all of this computation is actually carried out and an answer returned. This is an extremely useful workflow, one that I use a lot when dealing with very large data sets.5 We can parse and analyse such data without ever needing to read more than a line at a time into memory.
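
As a sketch of that workflow (the file name and contents here are made up purely for illustration), we can chain generators over a file object, which Python itself reads lazily, one line at a time:

from typing import Iterator

def values(path: str) -> Iterator[float]:
    # file objects are themselves lazy: they yield one line at a time
    with open(path) as f:
        for line in f:
            yield float(line.strip())

big = values("measurements.txt")       # hypothetical, arbitrarily large file
positives = (v for v in big if v > 0)  # another lazy generator
total = sum(positives)                 # only here does any reading actually happen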

4 - Higher Order Functions

The final piece of the puzzle is Higher Order Functions. The idea here is that functions should be able to take other functions as input, the canonical example (and probably the oldest) being the map function:

list(map(str.capitalize, ['alice', 'bob', 'charlie']))

So map takes two inputs, a function and a list, and it applies the function to each value in the list – $\mathrm{map}(f, x) = [f(x_1), \ldots, f(x_n)]$. Notice also that I have to apply list to the map output: in Python 3, map is lazy and returns an iterator rather than a list.

There are a lot of HOFs out there, but really a few simple HOFs are all you need in your toolbox. Along with map, the Python standard library has filter, which takes a Boolean function and a list and returns the values of the list for which the function evaluates to true. The standard library also used to have reduce as a builtin, but it has been relegated to the functools module; apparently Guido van Rossum wasn’t a fan. reduce applies a function to a list in a kind of nested way (an operation often called folding in other languages):

\[\mathrm{reduce}(f, [x_1, x_2, x_3], i) = f(f(f(i, x_1), x_2), x_3)\]

This might seem like a weird operation, but it turns out to be very useful; for instance, sums and factorials can be written as reductions:

from functools import reduce
from operator import mul

reduce(mul, [1, 2, 3, 4], 1)  # factorial of 4 == 24
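
The sum mentioned above has exactly the same shape, just with a different function and a different identity element:

from operator import add

reduce(add, [1, 2, 3, 4], 0)  # sum == 10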

You can also combine reductions and maps to get the Map-Reduce paradigm. This is very common in Big Data; the idea is that these operations are easy to parallelise – each map job can happen independently, and the reduction then collects the outputs together:

from functools import reduce
from operator import mul, add

reduce(add, map(mul, [1, 2, 3, 4], [2, 3, 4, 5]))  # dot product == 40
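
To make the parallelism point concrete, here is a minimal sketch of the same dot product with the map step handed to a pool of worker processes, using the standard library’s multiprocessing module – clearly overkill for four numbers, but the shape is what matters:

from functools import reduce
from multiprocessing import Pool
from operator import add, mul

if __name__ == "__main__":
    with Pool(4) as pool:
        # each multiplication could run in a separate worker process
        products = pool.starmap(mul, zip([1, 2, 3, 4], [2, 3, 4, 5]))
    reduce(add, products)  # dot product == 40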

To get some more HOFs we can turn to the toolz module – probably my all-time favourite module, to be honest. It’s got a lot of useful stuff; I’ll just mention two HOFs I use quite a bit. The first is currying, which takes a function of many arguments and turns it into a sequence of functions, each taking a single argument:

from toolz import curry

def add_and_scale(x: int, y: int, z: int) -> int:
    return (x + y) * z

add_and_scale = curry(add_and_scale)

add_and_scale(10)(20)(2)  # (10 + 20) * 2 == 60
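
A nice detail is that toolz’s curry is more forgiving than the textbook definition: you can hand it several arguments at once, and it only waits for whatever is still missing.

add_and_scale(10, 20)(2)   # also 60
add_and_scale(10)(20, 2)   # likewise 60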

This is very handy, and very common in purely functional languages; in Haskell, for instance, every function is curried by default. The other HOF is composition, which allows us to take a series of functions and glue them together into one function, $g(f(x)) = (g \circ f)(x)$:

from toolz import compose, curry

def add(x: int, y: int) -> int:
    return (x + y)

def scale(x: int, y: int) -> int:
    return x * y

scale2 = curry(scale)(2)               # a function that doubles its input
add_and_scale2 = compose(scale2, add)  # add first, then double
add_and_scale2(10, 20)                 # (10 + 20) * 2 == 60

This is another really useful thing to be able to do; it’s FP as glue in the most literal sense – we can make pipelines of functions by composition.
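
As a final toy example (the parse, square and halve helpers here are made up purely for illustration), composition lets us bolt a few small, pure, typed functions together into a single pipeline:

from toolz import compose

def parse(line: str) -> int:
    return int(line.strip())

def square(x: int) -> int:
    return x * x

def halve(x: int) -> float:
    return x / 2

# compose reads right to left: parse, then square, then halve
pipeline = compose(halve, square, parse)
pipeline(" 6 \n")  # 18.0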



Conclusion

So that’s about it. The point I’ve tried to make here is that modularity is hugely important in programming: modular code is good code. And if we want to write modular code, then we need strong glue to stick the pieces together. We often hear that FP leads to great increases in programmer efficiency, and in my opinion this stems from the power of these functional concepts. Rather than programming languages needing to be bigger, the expressive power of a language comes from how flexibly it allows simple functions to be combined.



  1. This blog post is based on a talk I’ve given a couple of times; the slides are here

  2. This isn’t a new observation; it originates from a great paper by John Hughes (not the one who made Ferris Bueller’s Day Off, another one). 

  3. This just means the function changes nothing in the process of running, either by printing, or mutating data, or really doing anything other than computing and returning a result. 

  4. It’s sometimes said of Haskell, for instance, that if a piece of code compiles then it’s probably correct – such is the faith people put in the compiler’s power to validate code. 

  5. Or even potentially infinite data sets, like a channel that we’re reading from as new data is streaming in.