
Equals and compareTo in Subclasses

Monday, March 15th, 2010

The other day I read this interesting paper on constructing a correct equals method when subclassing. It is about Java, but it applies equally well to C#. The authors cite Josh Bloch, who writes in his book Effective Java:

There is no way to extend an instantiable class and add a value component while preserving the equals contract, unless you are willing to forgo the benefits of object-oriented abstraction.

I read this book a while back—maybe six or seven years ago now. At the time I thought it was invaluable. It seemed like the Java version of Expert C Programming: Deep C Secrets (the fish book). But when I think back on it now, all I can remember is the author’s ongoing struggle to overcome Java’s limitations and contradictions. It seemed he was constantly recommending more and more verbose code to get around problems in the underlying language. I guess that’s not Bloch’s fault, but Java just seems to be like that. It’s the reason my resume has a crowded half-page of Java TLAs.

Anyway, contrary to Bloch’s commonly-accepted denial, the authors of the paper present a way to write an equals method that preserves the contract of equals even when extending from another non-abstract class and adding more state. Their solution isn’t even that verbose or twisted. It’s worth a read. Basically they recommend this (some parts changed for brevity):


// Point.java
public class Point {
    public int x;
    public int y;

    @Override
    public boolean equals(Object other) {
        boolean result = false;
        if (other instanceof Point) {
            Point that = (Point)other;
            // Ask the *other* object whether it is willing to be equal to this one.
            result = that.canEqual(this) && this.x == that.x && this.y == that.y;
        }
        return result;
    }

    // Subclasses that add state override this to reject plain Points.
    public boolean canEqual(Object other) {
        return (other instanceof Point);
    }

    @Override
    public int hashCode() {
        return 41 * (41 + x) + y;
    }
}

// ColoredPoint.java
import java.awt.Color;

public class ColoredPoint extends Point {
    public Color color;

    @Override
    public boolean equals(Object other) {
        boolean result = false;
        if (other instanceof ColoredPoint) {
            ColoredPoint that = (ColoredPoint)other;
            result = that.canEqual(this) && color.equals(that.color) && super.equals(that);
        }
        return result;
    }

    @Override
    public boolean canEqual(Object other) {
        return (other instanceof ColoredPoint);
    }

    @Override
    public int hashCode() {
        return 41 * super.hashCode() + color.hashCode();
    }
}

The trick here is the canEqual method. Although it is declared public, it isn't meant for general use; it is only called from within equals. Note that an object doesn't call its own canEqual method; it calls it on the other object. This lets the two objects agree that they are really equal, and it solves the problem of non-symmetric implementations of equals (where a.equals(b) != b.equals(a)). This is a common problem, because a Point might think it equals a ColoredPoint, whereas the ColoredPoint knows it doesn't equal the Point.
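To make the asymmetry concrete, here is a minimal sketch of the broken version that canEqual is meant to fix. NaivePoint and NaiveColoredPoint are hypothetical names of mine, not the paper's; they use instanceof alone, and the color is reduced to an int so the example needs no java.awt import.

```java
// Hypothetical classes showing the asymmetry that canEqual is meant to prevent.
class NaivePoint {
    final int x;
    final int y;

    NaivePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        // Accepts any NaivePoint, including subclasses that carry extra state.
        if (!(other instanceof NaivePoint)) {
            return false;
        }
        NaivePoint that = (NaivePoint) other;
        return x == that.x && y == that.y;
    }

    @Override
    public int hashCode() {
        return 41 * (41 + x) + y;
    }
}

class NaiveColoredPoint extends NaivePoint {
    final int color; // stand-in for java.awt.Color

    NaiveColoredPoint(int x, int y, int color) {
        super(x, y);
        this.color = color;
    }

    @Override
    public boolean equals(Object other) {
        // Rejects plain NaivePoints, so equality is no longer symmetric.
        if (!(other instanceof NaiveColoredPoint)) {
            return false;
        }
        NaiveColoredPoint that = (NaiveColoredPoint) other;
        return color == that.color && super.equals(that);
    }

    @Override
    public int hashCode() {
        return 41 * super.hashCode() + color;
    }
}
```

With these, new NaivePoint(1, 2).equals(new NaiveColoredPoint(1, 2, 7)) is true, but the reverse call is false. Adding canEqual to both classes, as in the paper's code, makes both calls return false.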

The naive way of ensuring symmetry would be to replace instanceof with a comparison of the objects' actual classes (via getClass()). But this is too crude, because then an instance of an anonymous subclass can never equal an instance of its parent class, even when the subclass adds no state. For instance, a Point should still equal an anonymous class instance like this one:


Point pAnon = new Point() {
    public void overrideSomeMethod() {
        // ...
    }
};

With canEqual, the anonymous class simply inherits canEqual from Point, so the two objects still agree on their equality. I think this is a really nice solution to a thorny problem.
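For contrast, here is a sketch of the getClass() approach described above. StrictPoint is a hypothetical class for illustration only. Its equals is symmetric, but a stateless anonymous subclass is never equal to a plain instance, which is exactly the drawback canEqual avoids.

```java
// Hypothetical "naive" point: equality requires identical runtime classes.
class StrictPoint {
    final int x;
    final int y;

    StrictPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        // Symmetric, but rejects every subclass, even ones that add no state.
        if (other == null || other.getClass() != getClass()) {
            return false;
        }
        StrictPoint that = (StrictPoint) other;
        return x == that.x && y == that.y;
    }

    @Override
    public int hashCode() {
        return 41 * (41 + x) + y;
    }
}

final class StrictPointDemo {
    // Returns true if either direction considers the two points equal.
    static boolean anonymousEqualsPlain() {
        StrictPoint plain = new StrictPoint(1, 2);
        StrictPoint anon = new StrictPoint(1, 2) {
            @Override
            public String toString() {
                return "anonymous subclass";
            }
        };
        return plain.equals(anon) || anon.equals(plain);
    }
}
```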

The forum discussion about the paper (which is almost as good as the paper itself) argues that Java ought to support an Equalator interface as a parallel to Comparator<T>. The idea is that just as you can override the “natural ordering” of a class, you should be able to override its “natural equivalence.” This would let you instantiate a Set, HashMap, etc. with an Equalator to get a different notion of equals than usual. Just as objects may sort differently in different contexts, so they may “be equal” differently in different contexts, depending on what you care about. Who hasn’t run into the need for a Set based on reference identity, for example? Apache Collections provides just such a class.
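Java has no Equalator, so the following is only a sketch of what the idea might look like. The Equalator interface, its areEqual and hash methods, and the EqualatorSet class are all hypothetical names I made up; the set simply wraps each element in a key object whose equals and hashCode delegate to the Equalator. (For the specific identity case, the JDK already gets you there with Collections.newSetFromMap(new IdentityHashMap<>()), available since Java 6.)

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface: Java has no Equalator; this just mirrors the forum proposal.
interface Equalator<T> {
    boolean areEqual(T a, T b);

    int hash(T a);
}

// Hypothetical set that asks an Equalator, not equals/hashCode, about duplicates.
final class EqualatorSet<T> {
    // Wraps each element so that HashSet's lookups go through the Equalator.
    private static final class Key<E> {
        final E value;
        final Equalator<? super E> eq;

        Key(E value, Equalator<? super E> eq) {
            this.value = value;
            this.eq = eq;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Key)) {
                return false;
            }
            // Every Key in one EqualatorSet wraps the same element type.
            @SuppressWarnings("unchecked")
            Key<E> that = (Key<E>) other;
            return eq.areEqual(value, that.value);
        }

        @Override
        public int hashCode() {
            return eq.hash(value);
        }
    }

    private final Equalator<? super T> eq;
    private final Set<Key<T>> keys = new HashSet<>();

    EqualatorSet(Equalator<? super T> eq) {
        this.eq = eq;
    }

    boolean add(T value) {
        return keys.add(new Key<>(value, eq));
    }

    boolean contains(T value) {
        return keys.contains(new Key<>(value, eq));
    }

    int size() {
        return keys.size();
    }

    // Example Equalator: reference identity instead of equals().
    static <T> Equalator<T> identity() {
        return new Equalator<T>() {
            @Override
            public boolean areEqual(T a, T b) {
                return a == b;
            }

            @Override
            public int hash(T a) {
                return System.identityHashCode(a);
            }
        };
    }
}
```

With the identity Equalator, two distinct String objects that both contain "x" are kept as separate members, which a normal HashSet<String> would never do.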

The need for an Equalator seems most pressing in classes like TreeSet that use compareTo rather than equals to detect duplicates. If you give a TreeSet a Comparator that is not consistent with equals, the TreeSet will appear to violate the Set contract, because you could have a.equals(b) but s.contains(a) != s.contains(b). I went to bed thinking it's a shame Sun hasn't added this Equalator concept.
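Here is a small sketch of that exact situation. Tagged is a hypothetical class whose equals looks only at value, while the TreeSet's Comparator orders by tag, so the Comparator is not consistent with equals.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical value class: equality looks only at 'value', ignoring 'tag'.
final class Tagged {
    final int value;
    final String tag;

    Tagged(int value, String tag) {
        this.value = value;
        this.tag = tag;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Tagged && ((Tagged) other).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }
}

final class TreeSetDemo {
    // Orders by tag, which equals() ignores: NOT consistent with equals.
    static final Comparator<Tagged> BY_TAG = Comparator.comparing((Tagged t) -> t.tag);

    // Returns {a.equals(b), s.contains(a), s.contains(b)}.
    static boolean[] probe() {
        Tagged a = new Tagged(1, "a");
        Tagged b = new Tagged(1, "b");
        TreeSet<Tagged> s = new TreeSet<>(BY_TAG);
        s.add(a);
        return new boolean[] { a.equals(b), s.contains(a), s.contains(b) };
    }
}
```

probe() reports that a.equals(b) is true and s.contains(a) is true, yet s.contains(b) is false, because the TreeSet only ever consults the Comparator.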

But as I thought about it overnight, I started to believe Sun is right to leave it out, at least insofar as it pertains to classes like TreeSet that use compareTo instead of equals. Basically, the TreeSet's Comparator is already operating as an Equalator here. Why would you need an Equalator, too? What problem would it solve that the Comparator doesn't already solve? Passing an Equalator to a TreeSet wouldn't fix this problem: a.equals(b) && (s.contains(a) != s.contains(b)). The whole point of an Equalator is to impose a different notion of equality on a limited context, and with TreeSet a Comparator is sufficient for that.

Of course, that’s not to say an Equalator wouldn’t be useful in a regular Set or Map. It turns out that C# does have the Equalator idea, but it’s called IEqualityComparer<T>. It doesn’t seem to be used much, but Dictionary<K,V> and HashSet<T> both support it.

I actually came across this paper while thinking about how C#'s IComparable<T>.CompareTo can work in a class hierarchy. As in Java, this method is supposed to be “consistent with Equals.” That is, whenever Equals returns true, CompareTo must return 0, and whenever CompareTo returns 0, Equals must return true. Put into code, (a.CompareTo(b) == 0) == a.Equals(b). CompareTo is tricky because it's parameterized: its signature is int CompareTo(T other), where T comes from IComparable<T>.
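In Java terms, the consistency rule can be checked pair by pair with a tiny helper. ConsistencyCheck is a hypothetical name; the counterexample is a real one from the JDK: BigDecimal's compareTo treats 1.0 and 1.00 as equal, while its equals does not, because the scales differ.

```java
// Hypothetical helper: does this pair satisfy (a.compareTo(b) == 0) == a.equals(b)?
final class ConsistencyCheck {
    static <T extends Comparable<? super T>> boolean consistentWithEquals(T a, T b) {
        return (a.compareTo(b) == 0) == a.equals(b);
    }
}
```

For example, consistentWithEquals(new BigDecimal("1.0"), new BigDecimal("1.00")) returns false, while any pair of Strings passes.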

So what happens if you have Base : IComparable<Base> and Subclass : Base, IComparable<Subclass>? My instinct is you’re asking for trouble, although when I think it through it seems that the compiler will choose the method based on the current static type, not the instance’s actual type, so maybe you’d be okay. If your code is interested in comparing Bases, you’ll call that method; you’ll only call CompareTo(Subclass o) if you’re explicitly comparing Subclasses. So maybe everything will work out, but I’m still uneasy.
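Java can't express this exact case (a class can't implement Comparable<Base> and Comparable<Subclass> at once, because both erase to the same interface), but the overload-resolution point carries over. In this hypothetical sketch, BaseItem and SubItem stand in for Base and Subclass, and compareKind stands in for CompareTo: the compiler picks the overload from the static type of the receiver and argument, not from the runtime type.

```java
// Hypothetical classes: Java picks an overload at compile time from static types.
class BaseItem {
    String compareKind(BaseItem other) {
        return "Base overload";
    }
}

class SubItem extends BaseItem {
    // An overload, not an override: the parameter type differs.
    String compareKind(SubItem other) {
        return "Sub overload";
    }
}

final class OverloadDemo {
    static String[] run() {
        SubItem s1 = new SubItem();
        SubItem s2 = new SubItem();
        BaseItem b1 = s1; // same object, but static type BaseItem
        return new String[] { s1.compareKind(s2), b1.compareKind(s2) };
    }
}
```

The first call returns "Sub overload" and the second returns "Base overload", even though both receivers are the same SubItem object.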

I also see that C# 4.0 is supporting new keywords for co- and contra-variance in generic parameters. So we get IEnumerable<out T> and IComparable<in T>. This means that if you implement IEnumerator<Subclass> GetEnumerator, you also fulfill the contract for IEnumerable<Base>, and if you implement CompareTo(Base o), your subclass doesn’t have to implement CompareTo(Subclass o) in order to fulfill the contract for IComparable<Subclass>. I hope I’ve got that right!

The first part, covariance in IEnumerable<out T>, seems like the bigger deal here. (I only wish it were full covariant return types as in C++!) But the part about IComparable seems nice, too. It should save a bit of code, because it means that if you have a full-featured base class and you want to write a quick subclass on top of it, you can still use your subclass in places that require an IComparable<Subclass> (such as a generic method constrained with where T : IComparable<T>) without writing another CompareTo implementation.
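Java gets a similar effect through use-site wildcards rather than declaration-site in/out keywords. Collections.sort is declared as <T extends Comparable<? super T>>, so a subclass that only inherits its parent's compareTo can still be sorted. Employee and Manager below are hypothetical classes for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Employee implements Comparable<Employee> {
    final int id;

    Employee(int id) {
        this.id = id;
    }

    @Override
    public int compareTo(Employee other) {
        return Integer.compare(id, other.id);
    }
}

// Adds no compareTo of its own; it only inherits compareTo(Employee).
class Manager extends Employee {
    Manager(int id) {
        super(id);
    }
}

final class VarianceDemo {
    static List<Manager> sortedManagers() {
        List<Manager> managers = new ArrayList<>();
        managers.add(new Manager(3));
        managers.add(new Manager(1));
        managers.add(new Manager(2));
        // Compiles because Manager is a Comparable<Employee>, which satisfies
        // Collections.sort's bound of Comparable<? super Manager>.
        Collections.sort(managers);
        return managers;
    }
}
```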

Perseus Installation Directions

Friday, June 5th, 2009

I managed to get Perseus up and running on my MacBook Pro with OS X 10.4. Everything appears to work except dictionary and morphology lookups. If I get those working, I’ll post how here. In the meantime, I thought I’d post my process for getting the rest of it working, especially since other people seem to have encountered so many problems. For the most part, I followed the instructions in the INSTALLWITHDATA.html file. Any divergences from that file are noted below.

For clarity, I begin all Perseus paths and filenames with the sgml directory. So of course they are relative to wherever you put that directory. In my case, it is at /Users/paul/local/perseus/sgml. Relative paths for MySQL, Apache, and Tomcat are relative to their own installation directories, in my case /usr/local/mysql, /www, and ~/local/java/tomcat. All ant scripts are run from Perseus’ sgml/reading directory.

Installing MySQL

I put MySQL in /usr/local/mysql. This was complicated for me because I had to upgrade from MySQL 4 while supporting an existing installation of MediaWiki, but it should be easy enough for others. I just moved my old installation aside, installed MySQL 5, and then copied my data directory into the new location. Since there is no my.cnf file initially, I used support-files/my-medium.cnf as a starting point.

I’m not sure why Perseus wants this line in the config file about InnoDB tables, because all the tables created in the sql scripts are MyISAM:

innodb_data_file_path = ibdata1:500M


Oh well. It was an inconvenience for me, because I already had an ibdata1 file for MediaWiki’s InnoDB tables, and since it wasn’t 500M, MySQL refused to start the InnoDB engine. When I hit MediaWiki, I got errors about the InnoDB engine not being found. You can tell if you have this problem by checking the .err file in your data directory. You’ll see lines like this:

InnoDB: Error: data file /usr/local/mysql/data/ibdata1 is of a different size
InnoDB: 1152 pages (rounded down to MB)
InnoDB: than specified in the .cnf file 32000 pages!
InnoDB: Could not open or create data files.
InnoDB: If you tried to add new data files, and it failed here,
InnoDB: you should now edit innodb_data_file_path in my.cnf back
InnoDB: to what it was, and remove the new ibdata files InnoDB created
InnoDB: in this failed attempt. InnoDB only wrote those files full of
InnoDB: zeros, but did not yet use them in any way. But be careful: do not
InnoDB: remove old data files which contain your precious data!
090528 17:27:46 [ERROR] Plugin 'InnoDB' init function returned error.
090528 17:27:46 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.


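I fixed this by working out the size of my existing file. InnoDB's default page size is 16 KB, so there are 64 pages per MB, and the 1152 pages in the log come to 18 MB. So I changed the config line to this: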

innodb_data_file_path = ibdata1:18M:autoextend
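The arithmetic is simple enough to sketch. InnoDbSizing is a hypothetical helper, and it assumes the default 16 KB page size; if your server uses a different page size, the constant changes.

```java
// Hypothetical helper; assumes InnoDB's default 16 KB page size.
final class InnoDbSizing {
    static final int PAGES_PER_MB = 1024 / 16; // 64

    // Rounds down, like the "(rounded down to MB)" note in the error log.
    static long pagesToMegabytes(long pages) {
        return pages / PAGES_PER_MB;
    }

    static String dataFilePathLine(long existingPages) {
        return "innodb_data_file_path = ibdata1:" + pagesToMegabytes(existingPages) + "M:autoextend";
    }
}
```

The 32000 pages from the .cnf file work out to the 500M Perseus asks for, and the 1152 pages of my existing file to 18M.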


But I guess this is tangential to Perseus. If you aren’t running InnoDB tables in any existing databases, the standard config line should be harmless.

By default, Perseus expects your database username and password to be webuser/webuser. That is a rather uninformative username, especially if your database hosts several apps. Anything with a web interface could potentially be webuser; it’s better to use descriptive names like addressbook and budget and library. So I changed the Perseus database user to perseus. According to the installation directions, this is okay as long as you put the correct values in properties/hopper.properties, altering these lines:

hopper.database.username=webuser
hopper.database.password=webuser


This is not quite true. You do have to change those lines, but you should also change every instance of webuser in these files:

jsp/META-INF/context.xml
src/perseus/util/HibernateUtil.java


I suspect the latter may not actually be necessary, but I did it anyway just to be safe.

I loaded the SQL files per the README instructions. There are a whole lot of SQL files to download for this step, and I couldn't find a single archive (.tar or otherwise) with them all together. I don't know why this is missing.

Installing Apache

I couldn't get Apache 2.2.9, 10, or 11 to compile with shared libraries. You can see my bug report here. Apache 2.2.8 worked, so I used that. My configure line was:

./configure --prefix=/www --enable-proxy=shared --enable-rewrite=shared --enable-ssl --enable-dav


Before running configure, I also made some source code changes to srclib/apr/network_io/unix/sendrecv.c, as described here (Macs only). I'm not sure if these are really necessary, but I did it just to be safe.

My Perseus installation is private, just running on my laptop, but if you're planning to make yours public, you may want to work harder at getting 2.2.11 to work, so you have the latest security updates. I also built and installed PHP 5.2.6, again for non-Perseus use.

The directions from Perseus seem to assume that Apache serves nothing but Perseus. This is not my case, so I had to do things a little differently. Basically, Apache needs a cue somewhere within the URL indicating when a request belongs to Perseus. You can place this in the hostname portion using name-based virtual hosts (recommended), or you can place it in the directory portion using Apache's Alias command. I went the former route.

First, I added perseus as an alternate hostname for my machine, by editing the /etc/hosts file, like so:

127.0.0.1	localhost perseus


This will let me type just perseus into my browser, and I'll go straight to my local installation. A shortcut like this is really only suitable for private installations like my own. To get a similar effect on a LAN, you should do it with DNS records. Or you could forget this syntactic sugar altogether, use your real hostname, and serve perseus with some directory prefix, like http://yourhost/hopper/. I won't go into the details of this approach, but it's not hard using the Alias command.

Once I had the perseus hostname pointing at my laptop, I told Apache to use perseus as a name-based virtual host, by adding this section to the conf/extra/httpd-vhosts.conf file:

<VirtualHost *:80>
  ServerName perseus
  DocumentRoot "/Users/paul/local/perseus/sgml/reading/static"
</VirtualHost>


Note that I'm using sgml/reading/static as the DocumentRoot. This lets me ignore the part in the Perseus instructions about creating symbolic links to the {css,img,js,xml} directories. This is perfectly safe, as there are no other files in sgml/reading/static. (You should make sure that all the parent directories are writable by you alone, however.)

You also need to configure Apache so it permits access to the Perseus directory, so in conf/httpd.conf you'll need something like this:

<Directory "/Users/paul/local/perseus/sgml/reading/static">
  Options FollowSymLinks
  Order allow,deny
  Allow from all
</Directory>

At this point, I recommend testing your configuration before adding Tomcat to the mix. Add a one-line index.html file to sgml/reading/static. Let it just say testing or the like. Restart Apache and try to reach your index.html file. In my case, that meant visiting http://perseus/index.html. If that doesn't work, then something above isn't right.

Perseus wants you to use conf/localhost.conf as an additional Apache configuration file, but that didn't work for me. If you also want to tweak the settings or do something different, it might help to know just what localhost.conf is doing. Here is an explanation of its rewrite rules. I assume some basic knowledge of regular expression syntax. Naturally, the first part of the rule is what to match; the second part, what to change it to. The letters in brackets at the end are flags. Full documentation is available here.

The first rule is this:

RewriteRule		^/hopper$		http://localhost/hopper/		[R]


This means that if you forget the trailing slash on /hopper, then Apache tells your browser to try again with it added onto the end. The [R] means "redirect." This is a pretty trivial rule.

The next two rules read:

RewriteRule		^/hopper/home$		http://localhost/hopper/		[P,L]
RewriteRule		^/hopper/home/$		http://localhost/hopper/		[P,L]


The [P,L] stands for "proxy and last." The P means that if the rule matches, the request should be proxied (in our case, proxied to Tomcat). The L means to stop applying rewrite rules. (It is actually redundant, because a P flag always terminates rewriting.) These rules apply if the user asks for /hopper/home or /hopper/home/. These are both weird special cases, which make /hopper/home/ synonymous with simply /hopper/. I assume they are meant to maintain backwards compatibility with older links that erroneously point to /hopper/home/.

The fourth rule says:

RewriteRule		^/hopper/opensource/(.*)	http://www.perseus.tufts.edu/hopper/opensource/$1	[P,L]


This is also a special case. If a user visits the opensource part of the site, he should go to the real Perseus site hosted by Tufts, not yours.

The last rule is the most general. It catches everything not covered by the above special cases:

RewriteRule		^/hopper/(.*)		http://localhost:8080/hopper/$1		[P,L]


This means that any requests beginning with /hopper/ should be proxied to Tomcat (at port 8080).
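If you want to check which rule a given path will hit, here is a rough simulation of localhost.conf's five rules using java.util.regex. RewriteSim is hypothetical and simplified: it assumes the first matching rule wins, which for these particular rules should match what mod_rewrite does (every rule but the first is [P,L], and the first only matches /hopper exactly), and it ignores query strings and everything else mod_rewrite considers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of localhost.conf's RewriteRules: first matching rule wins.
final class RewriteSim {
    private static final class Rule {
        final Pattern pattern;
        final String substitution;
        final String flags;

        Rule(String regex, String substitution, String flags) {
            this.pattern = Pattern.compile(regex);
            this.substitution = substitution;
            this.flags = flags;
        }
    }

    private static final List<Rule> RULES = Arrays.asList(
            new Rule("^/hopper$", "http://localhost/hopper/", "[R]"),
            new Rule("^/hopper/home$", "http://localhost/hopper/", "[P,L]"),
            new Rule("^/hopper/home/$", "http://localhost/hopper/", "[P,L]"),
            new Rule("^/hopper/opensource/(.*)",
                    "http://www.perseus.tufts.edu/hopper/opensource/$1", "[P,L]"),
            new Rule("^/hopper/(.*)", "http://localhost:8080/hopper/$1", "[P,L]"));

    // Returns "<flags> <target>" for the first matching rule, or null if none match.
    static String rewrite(String path) {
        for (Rule rule : RULES) {
            Matcher m = rule.pattern.matcher(path);
            if (m.find()) {
                String target = rule.substitution;
                if (m.groupCount() >= 1) {
                    // Like mod_rewrite, $1 is the first captured group.
                    target = target.replace("$1", m.group(1));
                }
                return rule.flags + " " + target;
            }
        }
        return null; // no rule: Apache serves the file from DocumentRoot
    }
}
```

For example, rewrite("/hopper/text") gives "[P,L] http://localhost:8080/hopper/text", while a path like /css/site.css matches nothing and is left for Apache to serve statically.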

My rules are similar, but with a few minor tweaks. I added them to the VirtualHost section of Apache's conf/extra/httpd-vhosts.conf file. I also gave a two-line version of this section above. Now it expands to something larger:

<VirtualHost *:80>
    ServerName perseus
    ServerAlias perseus
    DocumentRoot "/Users/paul/local/perseus/sgml/reading/static"

    <IfModule mod_rewrite.c>
        RewriteEngine On

        RewriteRule     ^/$             /hopper/    [R]
        RewriteRule     ^/hopper$       /hopper/    [R]
        RewriteRule     ^/hopper/home$  /hopper/    [P,L]
        RewriteRule     ^/hopper/home/$ /hopper/    [P,L]
        RewriteRule     ^/hopper/opensource/(.*)    http://www.perseus.tufts.edu/hopper/opensource/$1   [P,L]
        RewriteRule     ^/hopper/(.*)   http://localhost:8080/hopper/$1  [P,L]
    </IfModule>

    <IfModule mod_proxy.c>
        <Proxy *>
          Order deny,allow
          Allow from all
        </Proxy>
        ProxyRequests Off

        ProxyPass           /hopper    http://localhost:8080/hopper
        ProxyPassReverse    /hopper    http://localhost:8080/hopper
    </IfModule>
</VirtualHost>

Installing Tomcat

I unpacked Tomcat at ~/local/java/tomcat. This is not where Perseus expects it. If you look at sgml/properties/hopper.properties, near the bottom you'll find a line that reads:

tomcat.home=/usr/local/tomcat


I changed this to:

tomcat.home=/Users/paul/local/java/tomcat


Actually, tomcat is a symlink to the directory I got when I unpacked the Tomcat tarball:

$ ls -ld /Users/paul/local/java/*tomcat*
drwxr-xr-x   15 paul  paul  510 Dec 14  2007 /Users/paul/local/java/apache-tomcat-5.5.25
lrwxr-xr-x    1 paul  paul   20 May 31 17:41 /Users/paul/local/java/tomcat -> apache-tomcat-5.5.25

You will probably need to adjust Tomcat's memory settings, either by editing bin/startup.sh or by setting JAVA_OPTS or CATALINA_OPTS (which bin/catalina.sh passes to the JVM), or you'll eventually get OutOfMemoryErrors in your log. But I haven't really looked into this yet.

You also need to set up Tomcat's manager webapp. This is how Perseus gets installed when you type ant install. You can read about the Tomcat manager app here.

By default, the manager app is deactivated, to prevent unauthorized people from controlling Tomcat. To turn it on, you must follow the instructions at the link above. Part of the process is choosing a username and password. You'll have to give this same username and password to the Perseus build process, so it can upload your webapp. You can set those values in this file:

sgml/properties/hosts/localhost.properties


You just need to change the lines here:

localhost.tomcat.manager.username=username
localhost.tomcat.manager.password=password


You can see if your manager is running by going here:

http://localhost:8080/manager/html


After you enter the username and password, you should see a page titled "Tomcat Web Application Manager." You don't need to do anything there; the ant script will do it all for you. Visiting the page is just to test that it's up and running.
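If you'd rather check from code than from a browser, here is a minimal sketch. ManagerCheck is a hypothetical class; it sends the same username and password as HTTP Basic authentication, which is how the browser login works. Roughly, a 200 means the manager is up and accepted your credentials, a 401 (or 403) means the credentials or the manager role are wrong, and -1 means nothing answered on port 8080.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical check that Tomcat's manager app is up and accepts your credentials.
final class ManagerCheck {
    // Builds an HTTP Basic Authorization header value.
    static String basicAuth(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Returns the HTTP status of the manager page, or -1 if Tomcat can't be reached.
    static int managerStatus(String user, String password) {
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://localhost:8080/manager/html").toURL().openConnection();
            conn.setRequestProperty("Authorization", basicAuth(user, password));
            int status = conn.getResponseCode();
            conn.disconnect();
            return status;
        } catch (IOException e) {
            return -1;
        }
    }
}
```

Use the same values you put in localhost.tomcat.manager.username and localhost.tomcat.manager.password.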

Note that once you have installed the webapp, you should no longer use ant install. If for whatever reason you want to re-deploy the webapp, you must instead use ant remove followed by ant install. ant remove install may also work, but I'm not sure it will always give Tomcat enough time to clear out the old version before adding the new. From what I can tell, ant reload does not work. It reloads the existing webapp, but new changes go unnoticed. Perhaps this is by design. In any case, it's an issue with the Tomcat tools, not Perseus.

At this point everything should be ready for building and deploying Perseus! You do this with these commands (as in the README file):

ant dist jsp
ant build-release
ant install

If something is wrong, you can start to isolate the problem by hitting Tomcat directly, rather than via Apache. Go to this URL:

http://localhost:8080/hopper/


If you see a Perseus page (with no images or formatting), then Tomcat is handling things properly, and the issue probably lies with your Apache configuration. If you get some kind of error message, then the problem is with Tomcat. Check catalina.out in its logs directory. If you don't see anything there, then probably your webapp was never even deployed, due either to a build failure or not connecting to the manager app.
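The troubleshooting order above can be written down as a small sketch. PerseusDiagnosis is hypothetical, and the two URLs assume the setup from this post (the perseus hostname in front, Tomcat on port 8080); adjust them to your own. It only looks at HTTP status codes, so it can't tell you whether the page is missing its images or formatting.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper encoding the check-Tomcat-first reasoning above.
final class PerseusDiagnosis {
    // Returns the HTTP status for url, or -1 if nothing answered at all.
    static int status(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            int code = conn.getResponseCode();
            conn.disconnect();
            return code;
        } catch (IOException e) {
            return -1;
        }
    }

    // Tomcat is checked first, since Apache only proxies to it.
    static String diagnose(int viaApache, int viaTomcat) {
        if (viaTomcat != 200) {
            return "Tomcat: check logs/catalina.out; the webapp may never have been deployed";
        }
        if (viaApache != 200) {
            return "Apache: Tomcat answers, so look at the proxy and rewrite configuration";
        }
        return "OK: both Apache and Tomcat serve /hopper/";
    }

    static String check() {
        return diagnose(status("http://perseus/hopper/"), status("http://localhost:8080/hopper/"));
    }
}
```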

I hope this helps!