In a previous post, I discussed how functional programming constructs are becoming more prevalent in mainstream languages like C#, and how LINQ (Language Integrated Query) is built on fundamental functional language features such as lambda expressions and closures. That post looked closely at the C# syntax for lambda expressions. In this post, I would like to take a closer look at closures, how they make LINQ possible, and how they allow functions to be first-class citizens in C#.
What is a Closure?
A closure is a function that is bound to the environment in which it is declared, allowing the function to reference things, like variables, from that environment. Put another way, a closure allows an inner function to access variables that are local to the enclosing outer function, even after the outer function has returned and those variables would otherwise be gone. Closures have been around for a long time and are available in many functional languages as well as languages like JavaScript, Ruby, Smalltalk, and C#. Without closures, first-class functions would be far less useful, because a function passed around as a value could not carry any of its surrounding state with it.
This is probably about as clear as mud now, so let’s see a few examples in C#.
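Before the LINQ examples, here is a minimal sketch of the idea in isolation. The names CounterDemo and MakeCounter are made up for illustration. The point is only that the local variable count keeps living after MakeCounter has returned, because the lambda closes over it.

```csharp
using System;

public static class CounterDemo
{
    // Returns a function that remembers its own private count.
    // "count" is a local of MakeCounter, yet it survives after
    // MakeCounter returns because the returned lambda closes over it.
    public static Func<int> MakeCounter()
    {
        int count = 0;
        return () => ++count;
    }
}
```

Calling the returned function repeatedly yields 1, 2, 3, and so on. A second call to MakeCounter() produces an independent counter that starts again at 1, since each call gets its own count.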
A Few Examples
Let’s take the following simple example…searching a list of numbers for numbers that are greater than a defined maximum value. We can do this pretty simply with the syntax below.
    // Requires: using System.Linq;
    public int[] ReturnNumbersGreaterThanMax() {
        int max = 10;
        int[] numbers = { 2, 5, 7, 10, 12, 13, 17, 20, 25, 40 };
        return numbers.Where(number => number > max).ToArray();
    }
Many of us have written code this way 100 times without thinking too much about it. What you may not have realized is that you were using a functional construct called a closure. The lambda expression within the Where method is actually a separate function. If it is a separate function, then how does it have access to the variable named max? After all, max is a local variable of the ReturnNumbersGreaterThanMax method, not of the lambda. This works because the C# compiler creates a closure around max, making it available to the lambda expression. Behind the scenes, the compiler generates a class that has a public integer field named max and a method that executes the body of the lambda expression. The compiler-generated class acts as a referencing environment for max, so the lambda can continue to access and use it. Below is roughly what the compiler-generated class looks like.
    [CompilerGenerated]
    private sealed class <>c__DisplayClass1
    {
        // Fields
        public int max;
        // Methods
        public <>c__DisplayClass1();
        public bool <ReturnNumbersGreaterThanMax>b__0(int number);
    }
The implementation of <ReturnNumbersGreaterThanMax>b__0(int number) looks like the following:
    public bool <ReturnNumbersGreaterThanMax>b__0(int number)
    {
        return (number > this.max);
    }
Notice that the generated class above provides a referencing environment to execute the lambda expression and give the lambda access to the max variable. That’s all there is to a closure, but it allows you to do some pretty neat things.
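To make the translation concrete, here is a hand-written sketch of the same idea using names that are legal C#. MaxClosure, IsGreaterThanMax and ManualClosureDemo are hypothetical names chosen for this illustration. The real generated names differ and the exact output depends on the compiler version, so treat this as an approximation rather than the compiler's actual output.

```csharp
using System;
using System.Linq;

// Hand-written stand-in for the compiler-generated class.
// The name MaxClosure is hypothetical.
public sealed class MaxClosure
{
    // The captured local variable becomes a field.
    public int max;

    // The body of the lambda becomes an instance method.
    public bool IsGreaterThanMax(int number)
    {
        return number > this.max;
    }
}

public static class ManualClosureDemo
{
    public static int[] ReturnNumbersGreaterThanMax()
    {
        // "max" no longer lives on the stack; it lives in the closure object.
        var closure = new MaxClosure();
        closure.max = 10;

        int[] numbers = { 2, 5, 7, 10, 12, 13, 17, 20, 25, 40 };

        // A delegate bound to the instance method takes the place of the lambda.
        Func<int, bool> predicate = closure.IsGreaterThanMax;
        return numbers.Where(predicate).ToArray();
    }
}
```

This version returns the same numbers as the lambda version (12, 13, 17, 20, 25 and 40). Writing it by hand shows how much bookkeeping the lambda syntax saves.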
For example, let’s say we had a list of employees and we wanted to write a method that returns a list of employees that we feel are underpaid (we’re a generous company) based on pay amount and tenure. We could write something like the following:
    public Employee[] GetUnderpaidEmployees(int tenure, int minSalary) {
        List<Employee> employees = new List<Employee>();
        employees.Add(new Employee { Name = "Jim", Salary = 20000, Tenure = 10 });
        employees.Add(new Employee { Name = "Jane", Salary = 50000, Tenure = 2 });
        employees.Add(new Employee { Name = "Harry", Salary = 200000, Tenure = 30 });
        employees.Add(new Employee { Name = "Clyde", Salary = 80000, Tenure = 3 });
        employees.Add(new Employee { Name = "Matilida", Salary = 10000, Tenure = 20 });
        employees.Add(new Employee { Name = "Bob", Salary = 5000, Tenure = 35 });
        return employees.Where(employee => employee.Salary <= minSalary)
                        .Where(employee => employee.Tenure >= tenure)
                        .OrderByDescending(employee => employee.Salary).ToArray();
    }
Without closures, our lambda expressions (or inner functions) would not be able to reference tenure or minSalary. We wouldn’t be able to do anything really useful with LINQ. Closures make the above example possible.
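The same capture works when the lambda is built in one method and used in another. The sketch below assumes a simple Employee class (the post does not show the real one) and introduces two hypothetical helpers, IsUnderpaid and NamesOfUnderpaid. The predicate returned by IsUnderpaid keeps its own copies of tenure and minSalary after the method returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of Employee; the original class is not shown in the post.
public class Employee
{
    public string Name { get; set; }
    public double Salary { get; set; }
    public int Tenure { get; set; }
}

public static class EmployeeFilters
{
    // Builds a predicate that closes over both parameters.
    public static Func<Employee, bool> IsUnderpaid(int tenure, double minSalary)
    {
        return employee => employee.Salary <= minSalary
                        && employee.Tenure >= tenure;
    }

    // Uses the predicate long after IsUnderpaid has returned.
    public static string[] NamesOfUnderpaid(
        IEnumerable<Employee> employees, int tenure, double minSalary)
    {
        Func<Employee, bool> underpaid = IsUnderpaid(tenure, minSalary);
        return employees.Where(underpaid)
                        .OrderByDescending(e => e.Salary)
                        .Select(e => e.Name)
                        .ToArray();
    }
}
```

With the six sample employees from the example above, a tenure of 10 and a minimum salary of 25000, this should pick out Jim, Matilida and Bob, in that order.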
A Gotcha!
Take the simple example below. At first this example seems like it would produce the correct results.
    public int CalculateSumSquaresWoops() {
        int sumOfSquares = 0;
        List<Func<int>> functions = new List<Func<int>>();
        for (int i = 1; i <= 10; i++) {
            functions.Add(() => i * i);
        }
        foreach (var function in functions) {
            sumOfSquares += function();
        }
        return sumOfSquares;
    }
(On a side note, notice how I can create functions, put them in a list and execute them at a later time. In functional programming, functions are data and can be passed around like data.)
The result of this code is 1210, which is not what you would expect. The compiler creates a closure around i and uses the same instance of the closure for each function. By the time the functions finally execute, the loop has finished and i is 11, so every function computes 11 * 11, and ten of those add up to 1210. This can be tricky, but there is a way around it.
    public int CalculateSumSquares() {
        int sumOfSquares = 0;
        List<Func<int>> functions = new List<Func<int>>();
        for (int i = 1; i <= 10; i++) {
            int currentNum = i;
            functions.Add(() => currentNum * currentNum);
        }
        foreach (var function in functions) {
            sumOfSquares += function();
        }
        return sumOfSquares;
    }
Notice now that I am creating a new integer named currentNum for every loop iteration. Because currentNum is declared inside the loop body, each iteration gets its own variable, so the compiler creates a new closure instance for every iteration (and every function) to provide a referencing environment for that iteration's currentNum. The above code evaluates to 385, which is expected.
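One related detail is worth a sketch. Starting with C# 5, the loop variable of a foreach is a fresh variable on every iteration, while the variable of a for loop is still shared across iterations. The class and method names below are invented for this comparison, and the foreach result assumes a C# 5 or later compiler.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LoopCaptureDemo
{
    // for: a single variable "i" is shared by all iterations, so every
    // lambda sees its final value (11) when it is invoked later.
    public static int SumWithFor()
    {
        var functions = new List<Func<int>>();
        for (int i = 1; i <= 10; i++)
        {
            functions.Add(() => i * i);
        }
        return functions.Sum(f => f());
    }

    // foreach (C# 5 and later): each iteration gets its own "n",
    // so each lambda keeps the value from its own iteration.
    public static int SumWithForeach()
    {
        var functions = new List<Func<int>>();
        foreach (int n in Enumerable.Range(1, 10))
        {
            functions.Add(() => n * n);
        }
        return functions.Sum(f => f());
    }
}
```

SumWithFor returns 1210 and SumWithForeach returns 385, so the extra local copy is only needed for the for loop (or for foreach on compilers older than C# 5).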
Functions are Data
As I mentioned, functional programming allows us to treat functions, or code, as data. Closures are what make this practical, because each function can carry the data it needs along with it. Below is an example of a method that returns a list of functions that give each underpaid employee a raise (again, we’re generous). Each function returns a new instance of employee with their new salary. These functions can be executed at a later time, but for now we are just carrying them around as data.
    public List<Func<Employee>> GetUnderpaidEmployeeRaiseFunctions(int tenure, double minSalary, double raiseAmount) {
        List<Employee> employees = new List<Employee>();
        employees.Add(new Employee { Name = "Jim", Salary = 20000, Tenure = 10 });
        employees.Add(new Employee { Name = "Jane", Salary = 50000, Tenure = 2 });
        employees.Add(new Employee { Name = "Harry", Salary = 200000, Tenure = 30 });
        employees.Add(new Employee { Name = "Clyde", Salary = 80000, Tenure = 3 });
        employees.Add(new Employee { Name = "Matilida", Salary = 10000, Tenure = 20 });
        employees.Add(new Employee { Name = "Bob", Salary = 5000, Tenure = 35 });
        var underPaids = employees.Where(employee => employee.Salary <= minSalary)
                                  .Where(employee => employee.Tenure >= tenure);
        List<Func<Employee>> functions = new List<Func<Employee>>();
        foreach (var underPaid in underPaids) {
            // Copy the loop variable into a fresh local; remember our closure issue.
            // (Before C# 5, foreach shared one variable across all iterations.)
            Employee employee = underPaid;
            // Each function will return an employee instance and give the
            // employee a raise. The functions will be executed later.
            functions.Add(() => new Employee {
                Name = employee.Name,
                Salary = employee.Salary * (1 + raiseAmount),
                Tenure = employee.Tenure
            });
        }
        return functions;
    }
Kind of a contrived example, but pretty powerful. The next time you are using LINQ, remember that you are probably using closures. Also remember that functions are now data and can be treated as such.
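To round this out, here is a sketch of what "executed at a later time" might look like in practice. The Employee class is an assumed shape, and DeferredRaises, BuildRaiseFunctions and Apply are hypothetical names. The idea is that no raise is computed when the list of functions is built, only when each function is finally invoked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of Employee; the original class is not shown in the post.
public class Employee
{
    public string Name { get; set; }
    public double Salary { get; set; }
    public int Tenure { get; set; }
}

public static class DeferredRaises
{
    // Builds one function per employee. Nothing is computed yet;
    // each lambda just captures "current" and "raiseAmount".
    public static List<Func<Employee>> BuildRaiseFunctions(
        IEnumerable<Employee> employees, double raiseAmount)
    {
        var functions = new List<Func<Employee>>();
        foreach (var employee in employees)
        {
            Employee current = employee; // explicit copy, safe on any compiler version
            functions.Add(() => new Employee
            {
                Name = current.Name,
                Salary = current.Salary * (1 + raiseAmount),
                Tenure = current.Tenure
            });
        }
        return functions;
    }

    // Runs the stored functions; this is where the raises are calculated.
    public static List<Employee> Apply(List<Func<Employee>> functions)
    {
        return functions.Select(f => f()).ToList();
    }
}
```

For example, building the functions for Jim (salary 20000) with a raiseAmount of 0.5 and then calling Apply produces a new Employee named Jim with a salary of 30000, while the original object is left untouched.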
Posted by Shane Foster