Understanding Closures

February 15, 2009

In a previous post, I discussed how functional programming constructs are becoming more prevalent in mainstream languages like C#.  I also discussed how Linq (Language Integrated Query) is based on fundamental functional language features such as lambda expressions, closures, etc.  In the previous post, we looked closely at the C# syntax for lambda expressions. In this post, I would like to take a closer look at closures, how they make Linq possible and how they allow functions to be first class citizens in C#.

What is a Closure?

A closure is a function that is bound to the environment in which it is declared, allowing the function to reference things, like variables, from that environment.  To put it more concretely, a closure allows an inner function to access variables that are local to the enclosing outer function, even after the outer function has returned.  Closures have been around for a long time and are available in many functional languages as well as languages like JavaScript, Ruby, Smalltalk, C#, etc.  Without closures, treating functions as first class citizens would be far less useful, because a function passed around as a value could not carry any of its surrounding context with it.

This is probably about as clear as mud now, so let’s see a few examples in C#.

A Few Examples

Let’s take the following simple example…searching a list of numbers for numbers that are greater than a defined maximum value. We can do this pretty simply with the syntax below.

    public int[] ReturnNumbersGreaterThanMax() {
        int max = 10;
        int[] numbers = { 2, 5, 7, 10, 12, 13, 17, 20, 25, 40 };
        return numbers.Where(number => number > max).ToArray();
    }

Many of us have written code this way 100 times without thinking too much about it. What you may not have realized is that you were using a functional construct called a closure.  The lambda expression within the Where method is actually a separate function. If it is a separate function, then how does it have access to the variable named max? After all, max is a local variable of the ReturnNumbersGreaterThanMax method, and the lambda is invoked from inside Where, not from that method.  This works because the C# compiler creates a closure around max, making it available to the lambda expression.  Behind the scenes, the compiler creates a class that has a method which executes the lambda expression and a public integer field named max. The compiler-generated class creates a referencing environment around the max variable so that it can continue to be accessed and used by the lambda. Below is an example of what the compiler-generated class looks like (as shown by a decompiler; names such as <>c__DisplayClass1 are not valid in hand-written C#).

    [CompilerGenerated]
    private sealed class <>c__DisplayClass1
    {
        // Fields
        public int max;

        // Methods
        public <>c__DisplayClass1();
        public bool <ReturnNumbersGreaterThanMax>b__0(int number);
    }

The implementation of <ReturnNumbersGreaterThanMax>b__0(int number) looks like the following:

    public bool <ReturnNumbersGreaterThanMax>b__0(int number)
    {
        return (number > this.max);
    }

Notice that the generated class above provides a referencing environment to execute the lambda expression and give the lambda access to the max variable. That’s all there is to a closure, but it allows you to do some pretty neat things.
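To make the compiler's trick concrete, here is a rough hand-written equivalent of the generated class. This is only a sketch: the names MaxClosure, IsGreaterThanMax and ClosureByHand are made up for illustration, and the real generated code differs in its details.

```csharp
using System;
using System.Linq;

// Hand-written stand-in for the compiler-generated <>c__DisplayClass1.
// The names MaxClosure and IsGreaterThanMax are illustrative only.
public sealed class MaxClosure
{
    // The captured local lives here as a field on a heap object.
    public int max;

    // The body of the lambda: number => number > max
    public bool IsGreaterThanMax(int number)
    {
        return number > this.max;
    }
}

public static class ClosureByHand
{
    public static int[] ReturnNumbersGreaterThanMax()
    {
        // What used to be the local "max" is now a field on the closure object.
        MaxClosure closure = new MaxClosure();
        closure.max = 10;

        int[] numbers = { 2, 5, 7, 10, 12, 13, 17, 20, 25, 40 };

        // The delegate is bound to the closure instance, so the predicate
        // can still read max when Where calls it.
        Func<int, bool> predicate = new Func<int, bool>(closure.IsGreaterThanMax);
        return numbers.Where(predicate).ToArray();
    }
}
```

Calling ClosureByHand.ReturnNumbersGreaterThanMax() returns 12, 13, 17, 20, 25 and 40, the same result as the lambda version.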

For example, let’s say we had a list of employees and we wanted to write a method that returns a list of employees that we feel are underpaid (we’re a generous company) based on pay amount and tenure. We could write something like the following:


    public Employee[] GetUnderpaidEmployees(int tenure, int minSalary) {
        List<Employee> employees = new List<Employee>();
        employees.Add(new Employee { Name = "Jim", Salary = 20000, Tenure = 10 });
        employees.Add(new Employee { Name = "Jane", Salary = 50000, Tenure = 2 });
        employees.Add(new Employee { Name = "Harry", Salary = 200000, Tenure = 30 });
        employees.Add(new Employee { Name = "Clyde", Salary = 80000, Tenure = 3 });
        employees.Add(new Employee { Name = "Matilida", Salary = 10000, Tenure = 20 });
        employees.Add(new Employee { Name = "Bob", Salary = 5000, Tenure = 35 });

        return employees.Where(employee => employee.Salary <= minSalary)
                        .Where(employee => employee.Tenure >= tenure)
                        .OrderByDescending(employee => employee.Salary).ToArray();
    }

Without closures, our lambda expressions (or inner functions) would not be able to reference tenure or minSalary. We wouldn’t be able to do anything really useful with Linq. Closures make the above example possible.

A Gotcha!

Take the simple example below. At first this example seems like it would produce the correct results.


    public int CalculateSumSquaresWoops() {
        int sumOfSquares = 0;
        List<Func<int>> functions = new List<Func<int>>();
        for (int i = 1; i <= 10; i++) {
            functions.Add(() => i * i);
        }

        foreach (var function in functions) {
            sumOfSquares += function();
        }

        return sumOfSquares;
    }

(On a side note, notice how I can create functions, put them in a list and execute them at later time. In functional programming, functions are data and can be passed around like data.)

The result of this code is 1210. Not what you would expect. The compiler creates a closure around i and uses the same instance of the closure for each function. When the functions finally execute, the loop has already finished and i is 11…so it’s 11 * 11, 11 * 11…ten times over. This can be tricky, but there is a way around this.


    public int CalculateSumSquares() {
        int sumOfSquares = 0;
        List<Func<int>> functions = new List<Func<int>>();
        for (int i = 1; i <= 10; i++) {
            int currentNum = i;
            functions.Add(() => currentNum * currentNum);
        }

        foreach (var function in functions) {
            sumOfSquares += function();
        }

        return sumOfSquares;
    }

Notice now that I am creating a new integer named currentNum for every loop iteration. This tells the compiler that it needs to create a new instance of the closure for every iteration (and every function). It’s a different currentNum instance for every iteration, so the compiler creates a new closure instance to provide a referencing environment to currentNum for each function. The above code evaluates to 385, which is expected.
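The root of the gotcha is that a C# lambda captures the variable itself, not the value it held when the lambda was created. A minimal sketch of that behavior follows; the class and method names are made up for illustration.

```csharp
using System;

public static class CaptureDemo
{
    // Returns the value the lambda sees after the captured local is changed.
    public static int ReadAfterChange()
    {
        int counter = 1;
        Func<int> read = () => counter;   // captures the variable, not the value 1
        counter = 5;                      // the change is visible to the lambda
        return read();                    // returns 5
    }

    // Shows a lambda writing to the captured local.
    public static int WriteThroughLambda()
    {
        int total = 0;
        Action<int> add = amount => total += amount;
        add(3);
        add(4);
        return total;                     // returns 7
    }
}
```

In the loop above, every lambda shares the single variable i, so they all see its final value. Declaring currentNum inside the loop body gives each lambda its own variable.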

Functions are Data

As I mentioned, functional programming allows us to treat functions, or code, as data. Closures are what make this practical, because each function carries the variables it needs along with it. Below is an example of a method that returns a list of functions that give each underpaid employee a raise (again, we’re generous). Each function returns a new instance of employee with their new salary.  These functions can be executed at a later time, but for now we are just carrying them around as data.


    public List<Func<Employee>> GetUnderpaidEmployeeRaiseFunctions(int tenure, double minSalary, double raiseAmount) {
        List<Employee> employees = new List<Employee>();
        employees.Add(new Employee { Name = "Jim", Salary = 20000, Tenure = 10 });
        employees.Add(new Employee { Name = "Jane", Salary = 50000, Tenure = 2 });
        employees.Add(new Employee { Name = "Harry", Salary = 200000, Tenure = 30 });
        employees.Add(new Employee { Name = "Clyde", Salary = 80000, Tenure = 3 });
        employees.Add(new Employee { Name = "Matilida", Salary = 10000, Tenure = 20 });
        employees.Add(new Employee { Name = "Bob", Salary = 5000, Tenure = 35 });

        var underPaids = employees.Where(employee => employee.Salary <= minSalary)
                                  .Where(employee => employee.Tenure >= tenure);

        List<Func<Employee>> functions = new List<Func<Employee>>();
        foreach (var underPaid in underPaids) {

            // Create a new variable of employee, remember our closure issue.
            Employee employee = underPaid;

            // Each function will return an employee instance and give the
            // employee a raise. The functions will be executed later.
            functions.Add(() => new Employee
                                { Name = employee.Name,
                                  Salary = employee.Salary * (1 + raiseAmount),
                                  Tenure = employee.Tenure });
        }

        return functions;
    }

Kind of a contrived example, but pretty powerful. The next time you are using Linq, remember that you are probably using closures. Also remember that functions are now data and can be treated as such.
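To show the "executed at a later time" part, here is a small sketch that builds the raise functions in one place and runs them in another. The Employee class shape is an assumption (the post never shows it), and RaiseDemo, BuildRaises and ApplyRaises are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the Employee class; the post never shows it.
public class Employee
{
    public string Name { get; set; }
    public double Salary { get; set; }
    public int Tenure { get; set; }
}

public static class RaiseDemo
{
    // Builds the raise functions without running them.
    public static List<Func<Employee>> BuildRaises(IEnumerable<Employee> employees, double raiseAmount)
    {
        List<Func<Employee>> functions = new List<Func<Employee>>();
        foreach (Employee current in employees)
        {
            Employee employee = current;   // copy for the closure, as in the post
            functions.Add(() => new Employee
            {
                Name = employee.Name,
                Salary = employee.Salary * (1 + raiseAmount),
                Tenure = employee.Tenure
            });
        }
        return functions;
    }

    // Runs the stored functions at a later time and collects the results.
    public static List<Employee> ApplyRaises(List<Func<Employee>> raises)
    {
        List<Employee> results = new List<Employee>();
        foreach (Func<Employee> raise in raises)
        {
            results.Add(raise());
        }
        return results;
    }
}
```

Nothing is computed until ApplyRaises invokes each function, and the original Employee objects are left untouched. The copy into employee follows the post; C# 5.0 and later give foreach a fresh variable on each iteration, so the copy is harmless there but no longer required.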


Working from home with Dropbox

December 28, 2008

Working from home at my company, if you’re a developer working within an IDE, is a bit of a pain.  In order to work within the IDE, one needs to log in to Citrix, remote desktop into their machine at work, then begin working within Visual Studio over a remote connection. It’s painful.  It’s actually easier to drive to work in a blizzard.

The obvious solution is to use some sort of VPN, allowing the developer to work locally from their home machine (assuming they have Visual Studio installed at home) and use the VPN to access the Subversion repository only when needed. All coding, compiling, debugging, etc. is done locally. No more fighting technology and getting nothing done.

However, since we have no current plans to get a VPN, it would be nice to be able to synchronize multiple machines to a single location. Any changes to the “synchronized” files would automatically update every other machine that you have selected to participate (no more carrying files around on a USB drive).

A relatively new product called Dropbox allows you to download a small client to multiple machines and select files or folders to synchronize across all of the machines. There is also a web site that allows you to manage your account, view revision history, etc. The base account is free and gives you 2 GB of space.

This could work well to ease our work from home pains. We could all synchronize our local work machine’s repository folders to our Dropbox accounts. Any time we made a change to the local repository at work, it would synchronize with Dropbox. If we needed to work from home, we would have the same code files on our home machine that we have on our work machine. No more worrying about updating our USB drive or using remote desktop through Citrix.  The only time we would need to remote in from home is to update the Subversion repository (easy to do since our work machine local repository will be automatically synchronized to the changes we made from our home machine). We could also just wait until we get into work the next day to update the Subversion repository.

I’m installing Dropbox on my work machine when I get in Monday.


Continuous Integration = Continuous Improvement

September 14, 2008

Typical Software Development Nightmares

Release Integration Nightmare – You and your team have been frantically building a new system for a few months now, and the first iteration release is coming in a couple of weeks.  There are many different components involved and each one has been completed. Today is the day you will start making the components work together. You put everything in place…and things are broken. Interfaces are not compatible, and there are major bugs. Components that worked well in isolation just don’t work well together. There is no way you will be able to solve all of these issues before the release.

Can’t Build The Source Nightmare – Your team has built up quite the code repository. It’s complex and there are many dependencies. You have been tasked with enhancing a particular part of the application. You download the source from the repository and try to build it on your machine…it doesn’t build.  You’re missing class files, interfaces, and other dependencies. You’ve worked with “Ted, the Programmer” and 3 other developers to track down everything you need to build this thing. After two days, you’ve built it and now you can start the enhancements so the next person that downloads the source can’t build. If you can’t build on your machine, what the hell is in production?

I’m Not Sure What I’m Putting In Production Nightmare – With developers always building the software from their local machine, you are never sure if the most recent version is actually in the repository. As a matter of fact, you really don’t even trust your repository. When it is time for you to deploy, you just build the software on your machine and deploy it. The problem is, Ted also built the software on his machine to fix a nasty bug, which he later deployed. He also forgot to commit the change to the repository (of course, you don’t trust the repository so you probably wouldn’t have downloaded the change anyway).  You deploy your build (which does not contain Ted’s bug fix).  The nasty bug is back. It sucks to be you.

I Don’t Have Visibility Into The Code Nightmare – The code base is large and you have no way of knowing how the pieces fit together. There is no “big picture” view of the software and the dependencies. You would like to refactor and put together some improvement plans, but you don’t even know where to start.

Continuous Integration

Integration is one of the primary risks in a software development project. Unless you are working on your own, you will need to worry about integration and making sure that the components created can actually work together.  Practicing continuous integration involves the following:

Build software at every change – Whenever anything is changed and committed to the repository (you do have a code repository, don’t you?), an automated build system downloads the changes from the repository and completely builds all of the source. This build should be quick (no longer than 10 minutes).

All tests pass every time a build occurs – While simply building and compiling your code can help, automated tests are central to continuous integration.  Having a large set of tests (unit, component, etc.) will help you find bugs very quickly.  A good test suite gives the team a lot of confidence every time they commit.

Inspect code continuously and automatically – There are a lot of great inspection tools that can alert the team of design, security, and coding standards issues.  In the .NET world, tools such as NDepend and FxCop can do some of the more mundane code inspections automatically, allowing manual code inspections to focus on high level design and requirements issues.

Continuous deployment – With builds happening multiple times per day, the team is always ready to grab the latest build to demo, test, etc.  Deployment becomes much less of a hassle because you have been doing it continuously throughout the release.

Documentation – I’m one of those people that believe the best documentation about the code is the code itself.  In my experience, an external document about the code and design almost always becomes inaccurate over time, and developers would rather look at the code anyway.  I’m not talking about user documentation, I’m talking about programmer documentation.  Creating extensive documents about interfaces, classes, etc ends up being a waste of time because no one reads them for fear that they are inaccurate.  There are many nice tools out there that can create code documentation automatically for you (from the code itself), providing descriptions of all of the classes and interfaces based off the actual code files. Some tools can even create diagrams based off the design and dependencies. Including these tools in your continuous integration environment allows code to be documented naturally, as it is written. You don’t have to worry about keeping your documentation in sync with all of your code, it happens automatically…no double effort.

Fix issues quickly – If a continuous integration build fails, it “goes red” so to speak and it becomes a priority to fix the issue as soon as possible.  Integration and test failures do not sit around…bugs found by automated tests are fixed immediately.

It’s Agile?

A cornerstone of being agile is having access to continuous information, allowing one to identify issues quickly and notice trends. In order to change course accurately, you must have up to date information.  Continuous integration allows you to reduce risk and provides excellent visibility into the code base.  With continuous integration, team members know right away if a change fails a test, or does not integrate well with existing components.  It also provides immediate feedback and alerts you if code is not within standards, or design dependencies are not where you would like them to be.  The team finds out about these things as soon as they happen, not months into the project. Yes…it’s agile.

Continuous Integration = Continuous Improvement

Without information at your fingertips, it’s very difficult to find areas where you can improve. For example, without some visibility into the code, it can be difficult to find areas of the code that are candidates for refactoring. Continuous integration allows the team to keep their fingers on the pulse of the code base. The team gets a status of the code base multiple times per day. Trends can be identified and areas of code that lack tests can have test coverage added.  Day by day you continue to gather information, allowing the team to steadily improve the code base. For continuous improvement, you need to be able to measure a little, change a little, and measure again.  Continuous integration allows a team to apply these principles by providing continuous feedback into the code base.


Functional Programming Leaks Into The Mainstream

August 24, 2008

What is Functional Programming?

Functional programming is a programming paradigm that has been around for a very long time, but has never really been utilized outside of academia…until recently.  Functional programming allows the developer to express solutions using functions, providing a succinct and declarative style while limiting side effects.  This style of programming allows the developer to write code that expresses what to do instead of how to do it.

This probably makes no sense to someone who has not had any experience with functional programming, so let me provide a simple example.

We are all familiar with the “for loop”. Below is a standard for loop in C# that squares each element in the array and then returns the sum of the squared values:

public int CalculateSumSquaresImperative() {
    int[] numbers = { 1, 2, 3, 4, 5 };
    int sum = 0;
    for (int i = 0; i < numbers.Length; i++) {
        int numberSquared = numbers[i] * numbers[i];
        sum += numberSquared;
    }
    return sum;
}

As you can see, I had to tell the compiler how to do everything. I said, “Mr. Compiler, initialize the sum variable, loop through each element in the numbers array (by providing the correct index of the array), calculate the square of the current item in the array, update the value of sum, rinse and repeat.” This type of code is imperative. In other words, you have to tell the compiler how to do everything. I also had to change the state of the sum variable every time through the loop. Contrast this imperative example with a functional example:

public int CalculateSumSquaresFunctional() {
    int[] numbers = { 1, 2, 3, 4, 5 };
    return numbers.Sum(i => i * i);
}

Hmmm…this code is extremely clean and succinct, and I told the compiler what to do instead of how to do it.  With this code I said, “Mr. Compiler, square each number in this list and return the sum of the squares.” Ahhh…much easier. I tell the compiler what to do and he can decide how to do it. I don’t have to explain everything to him.

I’d like to take a second and explain some of the syntax in the code example above. I’m using the new Linq (Language Integrated Query) libraries provided in .NET 3.5. The i => i * i is called a lambda expression and is common in functional programming. It allows us to define a function with no name (anonymous) and easily pass the function to another function (Sum() in this case). It can be read as follows:

parameter(s) => function body.

Passing functions around like this is core to functional programming. Functions are first class entities and can be passed to other functions as parameters, returned from other functions, etc. Below is another example that displays just how much cleaner a functional syntax can be.
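As a small sketch of both directions, here is one function that returns a function and one that takes a function as a parameter. The names HigherOrder, MakeMultiplier and ApplyTwice are made up for this illustration.

```csharp
using System;

public static class HigherOrder
{
    // Returns a new function; "factor" is captured by the returned lambda.
    public static Func<int, int> MakeMultiplier(int factor)
    {
        return n => n * factor;
    }

    // Takes a function as a parameter and applies it twice.
    public static int ApplyTwice(Func<int, int> f, int value)
    {
        return f(f(value));
    }
}
```

For example, HigherOrder.ApplyTwice(HigherOrder.MakeMultiplier(3), 2) multiplies 2 by 3 twice and returns 18.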

public int[] GetOddNumbersImperative() {
    int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    List<int> oddNumbers = new List<int>();
    for (int i = 0; i < numbers.Length; i++) {
        if ((numbers[i] % 2) != 0) {
            oddNumbers.Add(numbers[i]);
        }
    }
    return oddNumbers.ToArray();
}

In this example, I’m using imperative programming to identify all of the odd numbers in the original list. I then place them into the new list and return it.  Functional programming can make this much easier.

public int[] GetOddNumbersFunctional() {
    int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    return numbers
           .Where(i => (i % 2) != 0)
           .ToArray();
}

Once again, telling the compiler what to do instead of how to do it results in much cleaner code, and state doesn’t have to be managed or changed (this is important).  As you can see, a functional style helps keep us from writing code that is bug prone: there is no index to get wrong and no intermediate list to forget to update. I can’t tell you how many bugs I have needed to work through with code similar to the above imperative examples.

Why Should I Care About This Functional Stuff?

Notice that the above examples are written in C#. I could have also written these examples in JavaScript, VB.NET, Ruby, or F#.  Are any of these languages purely functional? The answer is no. C#, JavaScript, VB.NET, and Ruby lean toward the object oriented and imperative side of programming, but they do have elements of functional programming. F# leans toward the functional side of programming, but it also allows one to write object oriented code. We are reaching a time where multi-paradigm languages are becoming more popular. The functional style of programming is just starting to leak into the mainstream development community.

Functional programming is leaking into languages like C# not only because functional programs can be cleaner and more succinct, but also because the functional style avoids modifying state.  This is particularly useful as we develop more programs that handle large data sets and need to be executed in parallel.  The details of processing data can be handled by the system, which may be able to split the processing of data across multiple CPUs or servers. The programmer doesn’t have to tell the system how to do this. When code does not modify shared state, we have far less to worry about in terms of concurrency issues, such as several threads reading and writing the same memory location. If you’re a C# programmer, watch out for PLinq.
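As a rough sketch of where this leads, here is the sum of squares written against PLinq as it later shipped in .NET 4 (the AsParallel extension method in System.Linq). The class and method names are made up for illustration. Because the lambda touches no shared state, the runtime is free to split the work across cores.

```csharp
using System.Linq;

public static class ParallelSum
{
    // Squares the numbers 1..count and sums them, letting PLinq
    // decide how to partition the work across threads.
    public static long SumOfSquares(int count)
    {
        return Enumerable.Range(1, count)
                         .AsParallel()
                         .Select(i => (long)i * i)   // no shared state is modified
                         .Sum();
    }
}
```

The only change from the sequential version is the call to AsParallel. For small inputs like the ones in this post, the overhead of parallelizing will likely outweigh any gain.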

If you’re a C# programmer and starting to learn Linq, don’t just learn it with your blinders on. Start reading articles and books about functional programming…Linq has its roots in functional programming. Play around with Scheme or F#. You don’t need to become an expert, but learning the concepts of functional programming will make your C# or JavaScript better. Learn about lambdas, anonymous functions, higher order functions, currying, closures, etc. You’ll be a more well rounded programmer and you’ll have a head start on something that will become more prevalent over the next few years. Functional programming will continue to become more important. You need to become a multi-paradigm programmer if you want to stay relevant in the future.

It’s just a leak now, so start learning before you get behind. I’ll have more posts on this subject…stay tuned.


Getting the Lobotomy!

August 10, 2008

At some point in your career as a software developer, you may need to decide if you want to manage other people for a living.  It’s a difficult decision and should not be taken lightly.  You’ve always been one of the best developers on your team and you are a master in your technical arena.  Your boss and colleagues have always looked to you to solve the most difficult problems and complete the most important projects.  Maybe it’s time that you use those great technical skills to manage other software developers?  There’s only one problem…technical skill is not the most important attribute needed to be a successful manager.

A Typical Day in the Life of a Software Developer

  • Designed a new “WizBang” class
  • Implemented some of the stub methods in the “WizBang” class
  • Wrote some unit tests for “WizBang” and fully implemented most of the methods
  • Integrated “WizBang” with the “Super Duper” application, fixed some bugs
  • A bug was reported in some previous code I wrote, spent about 2 hours debugging and fixing
  • Checked some email
  • The build broke…spent some time helping to fix it.
  • A fellow developer asked me to perform a code review before a deployment, spent some time going over some possible refactorings
  • Read an article about dependency injection and how to use it to mock for unit testing
  • Met with my manager and talked a bit about the status of my current project

A Typical Day in the Life of a Software Development Manager

  • Read email and created follow-up tasks
  • Attended a management meeting and discussed resource allocation and project status
  • Worked on an annual review for “Ted the Programmer”
  • Met one on one with a couple of the developers to discuss project status, objectives, etc.
  • Reviewed and approved some deployments for later today
  • Had a phone interview with a potential candidate
  • Had multiple “ad-hoc” discussions with multiple team members (I keep an open office)
  • Helped “Janie the Programmer” prioritize conflicting release deadlines
  • Read some more email and updated my task list

The Skills Required Are Different

What do you notice about what the developer does every day compared to what the development manager does every day? You may answer – “Well, Fozz…the developer actually does real work!” You’re correct.

The developer spends the vast majority of her time on the actual code base, doing the stuff she loves – writing and debugging code, solving technical problems, learning new design patterns, refactoring an existing class, writing tests, etc.  This is the stuff that developers love to do.

The manager, on the other hand, spends time doing everything except working directly on the code base. She coordinates resources, plans future releases, attends meetings, searches for strong candidates, communicates with staff on progress and road blocks, removes obstacles, etc.

Both jobs can be fun and challenging, but they require completely different skills and desires.  Just because you were always the best programmer does not mean you will be the best manager.  Also, just because you’re a great manager does not mean you could jump in and be the best programmer. It’s a different job, period.

When you’re a non-manager, you’re typically judged by your individual contributions. For the most part, you can worry about yourself and make sure you are doing your part for the team.  When you’re a manager, you need to get work done through other people.  Oftentimes, it’s difficult to tie anything concrete to what a manager does on a daily basis (I’m sure my team often asks what the hell I do every day). Ultimately, a manager’s performance is based on the performance of the entire team.

The Lobotomy

In software, moving from the purely technical realm to management is considered “getting the lobotomy”.  Trust me, your technical skills will get rusty when you’re not writing code every day. You will not be as good at the nitty-gritty technical details as your team is. The longer you manage, the dumber you will become (from a nitty-gritty technical perspective).  Accepting this is essential to becoming a successful manager. You don’t need to be in the nitty-gritty details, anyway. Let your team handle the nitty-gritty (more on this in future posts).


Interviews Should Include Code Discussions

July 27, 2008

I’m amazed by the number of interviews I’ve been through, or heard about,  where real code wasn’t discussed, or code examples were not written.  The primary responsibility of a software developer is to write code. It’s not the only responsibility, but it’s really what you’re hiring a software developer to do on a daily basis.  As a manager, you have a very short period of time to select a candidate that you will work with for the next few years, if not more.  You’re taking a huge risk if you don’t ask the candidate about specific code examples, or have them write some code that can be discussed during an interview.

It’s possible that many hiring managers don’t use real code because they have no idea what to look for or what questions to ask. If this is the case, you need to have a guru on your team that can help you with this.  There are a lot of bad programmers out there. Many people can talk all day about technology, but write code like a 3 year old writes their name…it’s a cute effort, but it’s ugly.

An example that I’ve used in the past is not a question that I made up, but it’s a common question that can tell you a lot about a candidate. I’ll typically have the candidate turn in the code example before an in-person interview.  The question is as follows:

Write a function that takes an array of strings and returns an array of strings with no duplicates. Consider an algorithm that is most efficient from a “Big O” perspective. Treat the function as if you were writing it for production code.

A solution for this question can be written in a short amount of time (which is important for interview purposes), but it also tells a lot about how a candidate thinks and how they write code.

Below is an example solution I have seen from a candidate:

 public string[] RemoveDuplicates(string[] localArray) {
            //Internal arraylists used for processing the inputs
            ArrayList original = new ArrayList();
            ArrayList final = new ArrayList();
            int j = 0;

            try {
                //load the input array into an arraylist
                for (int i = 0; i <= localArray.GetUpperBound(0); i++) {
                    original.Add(localArray[i]);
                }

                //sort the array (Arraylist.Sort implements Quicksort)
                original.Sort();

                //Loop through the sorted arraylist and removes any duplicate strings, the
                //duplicate check is NOT case-sensitive
                while (j < original.Count) {
                    if (j == 0) {
                        //add the first element
                        final.Add(original[j]);
                    } else {
                        //check for duplicate and if not found add the element
                        if (String.Compare(original[j].ToString(), original[j - 1].ToString(), true) != 0) {
                            final.Add(original[j]);
                        }
                    }
                    j++;
                }
            } catch (Exception e) {
                System.Diagnostics.Debug.WriteLine(e.Message);
            }

            //return the final cleaned array
            return (string[])final.ToArray(typeof(string));

        }

OK…let’s talk about the positives. The solution will return the correct answer, but the correct answer is not all I’m looking for. If the function doesn’t return the correct result, the interview will be over pretty quickly.  I’m looking for an answer that is not only correct, but is also well written.  Here are some things that stand out about this example:

  • The input array is immediately enumerated and placed into an ArrayList. When I asked this particular candidate why he did this, his answer was “I prefer ArrayLists over the standard Array.” OK…wrong answer.
  • The input parameter is not validated. Why don’t developers validate their input anymore?
  • He used the standard ArrayList object (this example was written in C# 2.0). This forced him into calling ToString on line 24. This tells me he probably doesn’t know a thing about generics (something most C# developers should know about by now).
  • A for loop would probably be tighter than using while.
  • What’s with the try/catch? Nothing of value is being done with the exception, so just let it run up the call stack.
  • The code just seems longer than it should be.

Let’s see another, more concise example:

  public string[] RemoveDuplicates(string[] input) {
            if (input == null) {
                throw new ArgumentNullException("input");
            }
            List<string> returnList = new List<string>();
            Array.Sort<string>(input);
            for (int index = 0; index < input.Length; index++) {
                if (index == 0) {
                    // Add the first element.
                    returnList.Add(input[index]);
                } else {
                    if (String.Compare(input[index], input[index - 1], true) != 0) {
                        returnList.Add(input[index]);
                    }
                }
            }
            return returnList.ToArray();
        }

This solution is much cleaner and tighter than the previous example: parameters are checked, generics are used, and the needless copy and try/catch are gone.  Now, I would not eliminate the candidate who wrote the first example, but I would ask a bunch of questions about the code during the interview. The candidate may be able to explain their solution and may also be able to discuss alternatives…this would be a big plus for the candidate.

The second example leaves a much better first impression. It shows that they can write clear and concise code, but what about the “Consider an algorithm that is most efficient from a ‘Big O’ perspective” part of the question? With this solution the array must be sorted and then enumerated.  Array.Sort uses the Quick Sort algorithm, which is O(n log n) on average. That is fine for sorting, but you really don’t need to sort the array to begin with. Why not use a hash table?

        public string[] RemoveDuplicates(string[] input) {
            if (input == null) {
                throw new ArgumentNullException("input");
            }
            List<string> returnList = new List<string>();
            Dictionary<string, string> lookup = new Dictionary<string, string>();
            foreach (string stringItem in input) {
                string lowercaseStringItem = stringItem.ToLower();
                if (!lookup.ContainsKey(lowercaseStringItem)) {
                    returnList.Add(lowercaseStringItem);
                    lookup.Add(lowercaseStringItem, lowercaseStringItem);
                }
            }
            return returnList.ToArray();
        }

When a candidate provides this solution, it immediately puts a smile on my face. It’s actually a big jump for a candidate to go from the sort solution to the hash table solution. This solution is clean and runs in about O(n) time…much better than the example that sorts the array.  Of course, I would still like the candidate to know that the sort-based solution is possible and that the hash table example potentially uses more memory because a hash table is created, etc.  That’s the beauty of having a candidate provide a code example…it generates a bunch of follow-up questions that will tell you a lot about the habits of the candidate, how they think, etc.
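
One follow-up worth raising about the hash table version above: because it stores and returns the result of ToLower, the caller gets lowercased strings back rather than the strings they passed in, which differs from the sort-based examples. Below is a minimal sketch of one way to keep the original casing, assuming the first occurrence of each string should win. The class and method names (DuplicateRemover, RemoveDuplicatesPreservingCase) are made up for illustration; the only library feature relied on is the Dictionary constructor that accepts an equality comparer, here StringComparer.OrdinalIgnoreCase.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original examples.
public static class DuplicateRemover {
    public static string[] RemoveDuplicatesPreservingCase(string[] input) {
        if (input == null) {
            throw new ArgumentNullException("input");
        }
        List<string> returnList = new List<string>();
        // The comparer makes key lookups case-insensitive, so no ToLower call is needed.
        Dictionary<string, bool> seen =
            new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase);
        foreach (string stringItem in input) {
            // Assumption: elements are non-null (a null key makes ContainsKey throw).
            if (!seen.ContainsKey(stringItem)) {
                returnList.Add(stringItem);   // keep the caller's original casing
                seen.Add(stringItem, true);
            }
        }
        return returnList.ToArray();
    }
}
```

Called with { "Apple", "banana", "APPLE", "Cherry", "BANANA" }, this sketch returns { "Apple", "banana", "Cherry" }, in input order.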

During a recent interview, one of my co-workers mentioned that .NET 3.5 actually has a new HashSet class.  I had forgotten about this, but it was a good point.  A candidate would really make me happy if they provided the following example:

         public string[] RemoveDuplicates(string[] input) {
            if (input == null) {
                throw new ArgumentNullException("input");
            }
            HashSet<string> stringSet = new HashSet<string>(input, StringComparer.OrdinalIgnoreCase);
            string[] returnList = new string[stringSet.Count];
            stringSet.CopyTo(returnList);
            return returnList;
        }

Wow. This code is extremely clean and simple…4 lines without the parameter validation. The original example had around 20 lines or so. It also shows that the candidate keeps up with new tools.
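
For comparison, here is a rough sketch of the same idea written with Linq, since Distinct also accepts an equality comparer. The names DistinctExample and RemoveDuplicatesWithLinq are hypothetical. The documentation describes the result of Distinct as unordered, so, as with the HashSet version, the order of the returned strings should not be relied on.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of the original examples.
public static class DistinctExample {
    public static string[] RemoveDuplicatesWithLinq(string[] input) {
        if (input == null) {
            throw new ArgumentNullException("input");
        }
        // Distinct uses the supplied comparer to decide which strings count as equal.
        return input.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();
    }
}
```

Whether a candidate reaches for HashSet or Distinct matters less than whether they can explain what the comparer is doing and what each call costs.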

Now…some may think this is unnecessary and that most developers don’t need to analyze their algorithms so closely. I disagree. Multiply the first example by the thousands of other functions that the candidate will write if they are hired. The best developers pay attention to this kind of detail and are always trying to write clear and concise code.  One function is not that bad, but thousands written without this attention to detail lead to applications that are difficult to support.

Something else to keep in mind is that I don’t dictate to the candidate what language to use when they write the code example. We currently use C#, but our interviews are not necessarily technology specific. Good programmers can master new languages quite easily.

Interviews should include discussions about real code that the candidate has written. It’s not the only thing you will use to evaluate a candidate, but it is one more tool for your interviewing toolbox.