Equals and compareTo in Subclasses

2010-03-15

The other day I read this interesting paper on contructing a correct equals method when subclassing. It is about Java, but it applies equally well to C#. They cite Josh Bloch’s book Effective Java, who writes:

There is no way to extend an instantiable class and add a value component while preserving the equals contract, unless you are willing to forgo the benefits of object-oriented abstraction. I read this book a while back—maybe six or seven years ago now. At the time I thought it was invaluable. It seemed like the Java version of Expert C Programming: Deep C Secrets (the fish book). But when I think back on it now, all I can remember is the author’s ongoing struggle to overcome Java’s limitations and contradictions. It seemed he was constantly recommending more and more verbose code to get around problems in the underlying language. I guess that’s not Bloch’s fault, but Java just seems to be like that. It’s the reason my resume has a crowded half-page of Java TLAs.

Anyway, contrary to Bloch’s commonly-accepted denial, the authors of the paper present a way to write an equals method that preserves the contract of equals even when extending from another non-abstract class and adding more state. Their solution isn’t even that verbose or twisted. It’s worth a read. Basically they recommend this (some parts changed for brevity):

public class Point {
    public int x;
    public int y;

    public boolean equals(Object other) {
        boolean result = false;
        if (other instanceof Point) {
            Point that = (Point)other;
            result = that.canEqual(this) && this.x == that.x && this.y == that.y;
        }
        return result;
    }

    public boolean canEqual(Object other) {
        return (other instanceof Point);
    }

    public int hashCode() {
        return 41 * (41 + x) + y;
    }
}

public class ColoredPoint extends Point {
    public Color color;

    public boolean equals(Object other) {
        boolean result = false;
        if (other instanceof ColoredPoint) {
            ColoredPoint that = (ColoredPoint)other;
            result = that.canEqual(this) && color.equals(that.color) && super.equals(that);
        }
        return result;
    }

    public boolean canEqual(Object other) {
        return (other instanceof ColoredPoint);
    }

    public int hashCode() {
        return 41 * super.hashCode() + color.hashCode();
    }
}

The trick here is that canEquals method. It is not really a public method; it is only called from within the equals method. But note that an object doesn’t call it’s own canEquals method; it calls it on the other object. This lets the two objects agree that they are really equal, and it solves the problem of non-symmetric implementations of equals (where a.equals(b) != b.equals(a)). This is a common problem, because a Point might think it equals a ColoredPoint, whereas the ColoredPoint knows it doesn’t equal the Point.

The naive way of ensuring symmetry would be to replace instanceof with a comparison of the object’s actual class. But this is too crude, because it means you can’t use anonymous classes. For instance, a Point should still equal an anonymous class instance like this one:

Point pAnon = new Point() {
    public void overrideSomeMethod() {
        // ...
    }
}

With canEquals, the anonymous class simple inherits canEquals from Point, and the two objects will still agree on their equality. I think this is a really nice solution to a thorny problem.

The forum discussion about the paper (which is almost as good as the paper itself) argues that Java ought to support an Equalator interface as a parallel to Comparator<T>. The idea is that just as you can override the “natural ordering” of a class, you should be able to override its “natural equivalence.” This would let you instantiate a Set, HashMap, etc. with an Equalator to get a different notion of equals than usual. Just as objects may sort differently in different contexts, so they may “be equal” differently in different contexts, depending on what you care about. Who hasn’t run into the need for a Set based on reference identity, for example? Apache Collections provides just such a class.

The need for an Equalator seems most pressing in classes like TreeSet that use compareTo rather than equals to test for set duplicates. If you use a TreeSet with a Comparator that is not consistent with equals, that the TreeSet will appear to violate the set contract, because you could have a.equals(b) but s.contains(a) != s.contains(b). I went to bed thinking it’s a shame Sun hasn’t added this Equalator concept.

But as I was thinking about it over the night, I started to believe Sun is right to leave it out, at least in so far as it pertains to classes like TreeSet that use compareTo instead of equals. Basically, the TreeSet‘s Comparator is already operating as an Equalator here. Why do you need an Equalator, too? What problem would it solve that isn’t already solved by the Comparator? If you passed an Equalator to a TreeSet, it wouldn’t change this code problem: a.equals(b) && (s.contains(a) != s.contains(b)). The whole point of an Equalator is to impose a different notion of equality on a limited context, and with TreeSet a Comparator is sufficient for that.

Of course, that’s not to say an Equalator wouldn’t be useful in a regular Set or Map. It turns out that C# does have the Equalator idea, but it’s called IEqualityComparer<T>. It doesn’t seem to be used much, but Dictionary<K,V> and HashSet<T> both support it.

I actually came across this paper while thinking about how C#‘s CompareTo<T> can work in a class heirarchy. As in Java, this method must be “consistent with Equals.” That is, whenever Equals returns true, CompareTo must return 0, and whenever CompareTo returns 0, Equals must return true. Put into code, (a.CompareTo(b) == 0) == a.Equals(b). The CompareTo method is tricky because it’s parameterized: its signature is bool CompareTo(T o), where T comes from IComparable<T>.

So what happens if you have Base : IComparable<Base> and Subclass : Base, IComparable<Subclass>? My instinct is you’re asking for trouble, although when I think it through it seems that the compiler will choose the method based on the current static type, not the instance’s actual type, so maybe you’d be okay. If your code is interested in comparing Bases, you’ll call that method; you’ll only call CompareTo(Subclass o) if you’re explicitly comparing Subclasses. So maybe everything will work out, but I’m still uneasy.

I also see that C# 4.0 is supporting new keywords for co- and contra-variance in generic parameters. So we get IEnumerable<out T> and IComparable<in T>. This means that if you implement IEnumerator<Subclass> GetEnumerator, you also fulfill the contract for IEnumerable<Base>, and if you implement CompareTo(Base o), your subclass doesn’t have to implement CompareTo(Subclass o) in order to fulfill the contract for IComparable<Subclass>. I hope I’ve got that right!

The first part—covariant return types—seems like the bigger deal here. (I only wish it were full covariant return types as in C++!) But the part about IComparable seems nice, too. It should save a bit of code, because it means that if you have a full-featured base class and you want to write a quick subclass on top of it, you can still use your subclass in things that require an IComparable<Subclass> (like List<Subclass>) without writing another CompareTo implementation.

blog comments powered by Disqus Prev: A Linq Catalog Next: Viewing Unit Test Output in Visual Studio