Schwartzian Transform in C# 3.0

23Aug07

(Update 8/25: Changed the example scenario slightly.)

Inspired by Beautiful Code, which I just started reading yesterday, I’ve decided to post some beautiful code of my own. Specifically, I’d like to show off some of the fancy features of C# 3.0, and explain why they’re good for much more than just LINQ.

What’s a Schwartzian Transform? It’s a way of efficiently sorting a list of objects according to some potentially expensive-to-calculate property of those objects. For example, let’s say you have an array of filenames, and you want to sort them by their last modified dates.

First, let’s provide the function that gets the date. I’ll use this throughout the rest of the post.

    public static DateTime GetDate(string file)
    {
        return File.GetLastWriteTimeUtc(file);
    }

And here’s a naive sort, without doing a Schwartzian Transform.

    public static void Sort1(List<string> files)
    {
        files.Sort(new ByDateComparer());
    }

    private class ByDateComparer : IComparer<string>
    {
        public int Compare(string x, string y)
        {
            return GetDate(x).CompareTo(GetDate(y));
        }
    }

OK, we can do better than that. Let’s at least use C# 3.0’s new closure syntax, which we’re calling lambda expressions.

    public static void Sort2(List<string> files)
    {
        files.Sort((x, y) => GetDate(x).CompareTo(GetDate(y)));
    }

In either case, GetDate is getting executed twice per comparison. Assuming that List.Sort is a quicksort, that’s O(n log n) times. And it’s pretty unnecessary, if you think about it; why repeatedly fetch the date if the values are never going to change?

The idea behind the Schwartzian Transform is to avoid all these excessive executions of GetDate using the following steps.

  1. Create a new array that’s the same length as the original array.
  2. Stuff both the value and the computed property (i.e., the filename and the DateTime) into the new array (using a 2-tuple, pair, etc.).
  3. Sort the new array using the computed property.
  4. Re-populate the old array using the values from the new array.

And since this is C#, we would ideally like to do this in a fully typesafe manner–no casts. Here’s another strawman implementation:

    public static void Sort3(List<string> files)
    {
        List<KeyValuePair<DateTime, string>> pairs;
        pairs = files.ConvertAll(
            file => new KeyValuePair<DateTime, string>(

                GetDate(file), file));
        pairs.Sort((x, y) => x.Key.CompareTo(y.Key));
        for (int i = 0; i < files.Count; i++)
            files[i] = pairs[i].Value;
    }

If computing the property is expensive, this is a lot more efficient. But, wow, is it ugly! Compare it to Sort2 above, which gets straight to the point. Fortunately, we can write a higher-order function to put the ugliness into an encapsulated, reusable package.

    public static void SchwartzSort<TElement, TSortBy>(
        List<TElement> list,
        Converter<TElement, TSortBy> converter,
        Comparison<TSortBy> comparison)
    {
        List<KeyValuePair<TSortBy, TElement>> pairs;
        pairs = list.ConvertAll(
            x => new KeyValuePair<TSortBy, TElement>(converter(x), x));
        pairs.Sort((x, y) => comparison(x.Key, y.Key));
        for (int i = 0; i < list.Count; i++)
            list[i] = pairs[i].Value;
    }

    public static void Sort4A(List<string> files)
    {
        SchwartzSort(files,
            x => GetDate(x),
            (a, b) => a.CompareTo(b));
    }

SchwartzSort can take a list of any type, perform any conversion, and do any comparison between the converted values, using the efficient Schwartzian Transform algorithm. But three things stop this code from being as beautiful, especially compared to the Ruby version: files.sort_by {|x| GetDate(x)}

  1. SchwartzSort is a separate method–not very object-oriented
  2. Have to pass in a comparison operation, even when there is a “natural” sort order to the elements, as in this case
  3. Lots of ugly angle brackets and type info in the SchwartzSort implementation

We can fix #1 by converting SchwartzSort to an extension method on List called SortBy. #2 is easy–just add another override that doesn’t take the comparison argument. #3 can be greatly improved using type inference and anonymous types, both new to C# 3.0.

    public static void SortBy<TElement, TSortBy>(
        this List<TElement> coll,
        Converter<TElement, TSortBy> converter,
        Comparison<TSortBy> comparison)
    {
        var temp = coll.ConvertAll(
            el => new { Key = converter(el), Value = el });
        temp.Sort((a, b) => comparison(a.Key, b.Key));
        for (int i = 0; i < coll.Count; i++)
            coll[i] = temp[i].Value;
    }

    public static void SortBy<TElement, TSortBy>(
        this List<TElement> coll,
        Converter<TElement, TSortBy> converter)
        where TSortBy : IComparable
    {
        SortBy(coll, converter, (a, b) => a.CompareTo(b));
    }

    public static void Sort5(List<string> files)
    {
        files.SortBy(x => GetDate(x));
    }

Sort5 is actually more beautiful than the Ruby equivalent! Not only is it slightly shorter in terms of number of characters, but it does more because it’s fully typesafe. If files was List<int>, this wouldn’t compile. And if GetDate(x) was replaced with an expression whose type doesn’t have a natural sort order (i.e. it doesn’t implement IComparable), that would be a compile error too! (However, you could then use the other override and pass in an explicit comparison operation.)

The SortBy implementation is pretty free of visual clutter–now the worst offender is the for-loop, and that’s only necessary because we’re preserving the in-place semantics of the standard List.Sort operator. If Sort returned a new instance and we wanted SortBy to do the same, the implementation could be pretty beautiful too:

    public static List<TElement> SortBy2<TElement, TSortBy>(
        this List<TElement> coll,
        Converter<TElement, TSortBy> converter,
        Comparison<TSortBy> comparison)
    {
        return coll
            .ConvertAll(el => new { Key = converter(el), Value = el })
            .Sort((a, b) => comparison(a.Key, b.Key))
            .ConvertAll(x => x.Value);
    }

Still fully typesafe, and impressively free of cruft–almost every token has a clear semantic purpose. Now that’s my definition of beauty!

About these ads


13 Responses to “Schwartzian Transform in C# 3.0”

  1. The example is a bit unfortunate, in that it uses a sane date format to begin with: one that can be sorted as a string and give the right order.

    Luckily there are enough insane date formats to choose from that will illustrate the point better, for example a mail header: “Date: Sat, 25 Aug 2007 15:10:39 +0200″

  2. Thanks for pointing that out, Bart. It was a stupid example anyway. I’ve changed it to be /slightly/ less stupid.

  3. Wow. Interesting to see how different it looks in C#. Of course, I think the original in Perl still looks the most elegant. :)

  4. With query syntax, the C# version can look closer to the Perl version (structurally, anyway). A LINQ query can perform the transform, but I don’t think it can sort in place. See http://eltonomicon.blogspot.com/2007/10/another-schwartzian-transform-using.html for an example.

  5. 5 Mark Rockmann

    Actually, the SortBy operator (and its relatives) as implemented in the current framework perform a Schwartzian Sort. No key is computed more than once for each element. This fact is not documented, but I was able to verify it experimentally and by using Reflector. I was positively surprised to find another thing about LINQ’s sort algorithm: it is stable. This fact, however, is documented.

  6. Thank you for this blog post. I was looking for exactly this topic.

    I am not familiar with the new features of C# 3.0 and could use your help on compiling the last method. The compiler error message is “Operator ‘.’ cannot be applied to operand of type ‘void’ “. The compiler seems to interpret the last ConvertAll to operate on the void returned by the Sort call.

    static class BlogCode
    {
    public static List SortBy2(
    this List coll,
    Converter converter,
    Comparison comparison)
    {
    return coll
    .ConvertAll(el => new { Key = converter(el), Value = el })
    .Sort((a, b) => comparison(a.Key, b.Key))
    .ConvertAll(x => x.Value);
    }
    }

    • 7 Jonathan Lim

      This is an attempt to get the sample code to look better.

      static class BlogCode
      {
      public static List SortBy2(
      this List coll,
      Converter converter,
      Comparison comparison)
      {
      return coll
      .ConvertAll(el => new { Key = converter(el), Value = el })
      .Sort((a, b) => comparison(a.Key, b.Key))
      .ConvertAll(x => x.Value);
      }
      }

      • 8 Jonathan Lim

        Another attempt to try to get the code to look better.

            static class BlogCode
            {
                public static List SortBy2(
                      this List coll,
                      Converter converter,
                      Comparison comparison)
                {
                    return coll
                        .ConvertAll(el => new { Key = converter(el), Value = el })
                        .Sort((a, b) => comparison(a.Key, b.Key))
                        .ConvertAll(x => x.Value);
                }
            }
        

  1. 1 Scott Hanselman's Computer Zen - The Weekly Source Code 7
  2. 2 Scott Hanselman's Computer Zen - The Weekly Source Code 7
  3. 3 22 Links Today (2007-10-12)
  4. 4 Schwartzian Transform in C# 3.0, Part II « whateverblog.
  5. 5 Charlie Calvert's Community Blog : Community Convergence XXXIII

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: