Perseus Installation Directions

2009-06-05

I managed to get Perseus up and running on my MacBook Pro with OS X 10.4. Everything appears to work except dictionary and morphology lookups. If I get those working, I’ll post how here. In the meantime, I thought I’d post my process for getting the rest of it working, especially since other people seem to have encountered so many problems. For the most part, I followed the instructions in the INSTALLWITHDATA.html file. Any divergences from that file are noted below.

For clarity, I begin all Perseus paths and filenames with the sgml directory. So of course they are relative to wherever you put that directory. In my case, it is at /Users/paul/local/perseus/sgml. Relative paths for MySQL, Apache, and Tomcat are relative to their own installation directories, in my case /usr/local/mysql, /www, and ~/local/java/tomcat. All ant scripts are run from Perseus’ sgml/reading directory.

Installing MySQL

I put MySQL in /usr/local/mysql. This was complicated for me because I had to upgrade from MySQL 4 while supporting an existing installation of MediaWiki, but it should be easy enough for others. I just moved my old installation aside, installed MySQL 5, and then copied my data directory into the new location. Since there is no my.cnf file initially, I used support-files/my-medium.cnf as a starting point.

I’m not sure why Perseus wants this line in the config file about InnoDB tables, because all the tables created in the sql scripts are MyISAM:

innodb_data_file_path = ibdata1:500M

Oh well. It was an inconvenience for me, because I already had an ibdata1 file for MediaWiki’s InnoDB tables, and since it wasn’t 500M, MySQL refused to engage the InnoDB engine upon startup. When I hit MediaWiki, I got errors about the InnoDB engine not being found. You can tell if you have this problem by checking the .err file in your data directory. You’ll see lines like this:

InnoDB: Error: data file /usr/local/mysql/data/ibdata1 is of a different size
InnoDB: 1152 pages (rounded down to MB)
InnoDB: than specified in the .cnf file 32000 pages!
InnoDB: Could not open or create data files.
InnoDB: If you tried to add new data files, and it failed here,
InnoDB: you should now edit innodb_data_file_path in my.cnf back
InnoDB: to what it was, and remove the new ibdata files InnoDB created
InnoDB: in this failed attempt. InnoDB only wrote those files full of
InnoDB: zeros, but did not yet use them in any way. But be careful: do not
InnoDB: remove old data files which contain your precious data!
090528 17:27:46 [ERROR] Plugin 'InnoDB' init function returned error.
090528 17:27:46 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
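If you would rather not read the log by eye, the mismatch can be picked out with a couple of regular expressions. This is only a minimal Python sketch, assuming the message wording shown above (other MySQL versions may phrase it differently); the function name find_size_mismatch is made up for illustration.

```python
import re

def find_size_mismatch(log_text):
    """Return (actual_pages, configured_pages) if the InnoDB size-mismatch
    error appears in log_text, otherwise None."""
    actual = re.search(r"InnoDB: (\d+) pages \(rounded down to MB\)", log_text)
    configured = re.search(r"than specified in the \.cnf file (\d+) pages", log_text)
    if actual and configured:
        return int(actual.group(1)), int(configured.group(1))
    return None

# The relevant lines from the log excerpt above.
sample = """InnoDB: Error: data file /usr/local/mysql/data/ibdata1 is of a different size
InnoDB: 1152 pages (rounded down to MB)
InnoDB: than specified in the .cnf file 32000 pages!
"""
print(find_size_mismatch(sample))  # (1152, 32000)
```

To check a real log, pass in the contents of the .err file from your data directory instead of the sample string.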

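I fixed this by working out the real size of my existing file. There are 64 pages per MB (InnoDB pages are 16 KB by default), so 1152 pages is 18 MB, and the 32000 pages in the error message is the 500M that Perseus asks for. I changed the config line to this: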

innodb_data_file_path = ibdata1:18M:autoextend
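The arithmetic behind that line can be sketched in a few lines of Python. It assumes InnoDB's default page size of 16 KB, which is where 64 pages per MB comes from; the helper names are hypothetical.

```python
PAGE_SIZE_KB = 16                      # InnoDB default page size (assumed unchanged)
PAGES_PER_MB = 1024 // PAGE_SIZE_KB    # 64

def pages_to_mb(pages):
    """Convert an InnoDB page count to whole megabytes, rounding down."""
    return pages // PAGES_PER_MB

def data_file_path(filename, pages):
    """Build an innodb_data_file_path value sized to an existing data file."""
    return f"{filename}:{pages_to_mb(pages)}M:autoextend"

print(pages_to_mb(1152))                 # 18
print(pages_to_mb(32000))                # 500
print(data_file_path("ibdata1", 1152))   # ibdata1:18M:autoextend
```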

But I guess this is tangential to Perseus. If you aren’t running any InnoDB tables in prior databases, the standard config line should be harmless.

By default, Perseus wants your database username/password to be webuser/webuser. That is a rather uninformative username, especially if your database hosts several apps. Anything with a web interface could potentially be webuser; it’s better to use descriptive names like addressbook and budget and library. So I changed the Perseus database user to perseus. According to the installation directions, this is okay as long as you put the correct values in properties/hopper.properties, altering these lines:

hopper.database.username=webuser
hopper.database.password=webuser

This is not quite true. You do have to change those lines, but you should also change all instances of webuser in these files, too:

jsp/META-INF/context.xml
src/perseus/util/HibernateUtil.java

I suspect the latter may not actually be necessary, but I did it anyway just to be safe.
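To see whether any other files still mention the old username, you can search the whole tree. Here is a minimal Python sketch; the function name find_occurrences is hypothetical, and it simply reports every line containing the string rather than deciding which ones matter.

```python
import os

def find_occurrences(root, needle="webuser"):
    """Yield (path, line_number, line) for each line under root containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets us skim binary files without crashing
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file; skip it
```

For example, `for hit in find_occurrences("sgml"): print(*hit, sep=":")` lists each match as path:line:text, assuming you run it from the directory that contains sgml.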

I loaded the SQL files per the README instructions. There are a whole lot of SQL files to download for this step. I couldn’t find a single .tar or zip file with them all together. I don’t know why this is missing.

Installing Apache

I couldn’t get Apache 2.2.9, 2.2.10, or 2.2.11 to compile with shared libraries. You can see my bug report here. Apache 2.2.8 worked, so I used that. My configure line was:

./configure --prefix=/www --enable-proxy=shared --enable-rewrite=shared --enable-ssl --enable-dav

Before running configure, I also made some source code changes to srclib/apr/network_io/unix/sendrecv.c, as described here (Macs only). I’m not sure if these are really necessary, but I did it just to be safe.

My Perseus installation is private, just running on my laptop, but if you’re planning to make yours public, you may want to work harder at getting 2.2.11 to work, so you have the latest security updates. I also built and installed PHP 5.2.6, again for non-Perseus use.

The directions from Perseus seem to assume that Apache serves nothing but Perseus. This is not my case, so I had to do things a little differently. Basically, Apache needs a cue somewhere within the URL indicating when a request belongs to Perseus. You can place this in the hostname portion using name-based virtual hosts (recommended), or you can place it in the directory portion using Apache’s Alias command. I went the former route.

First, I added perseus as an alternate hostname for my machine, by editing the /etc/hosts file, like so:

127.0.0.1	localhost perseus

This will let me type just perseus into my browser, and I’ll go straight to my local installation. A shortcut like this is really only suitable for private installations like my own. To get a similar effect on a LAN, you should do it with DNS records. Or you could forget this syntactic sugar altogether, use your real hostname, and serve perseus with some directory prefix, like http://yourhost/hopper/. I won’t go into the details of this approach, but it’s not hard using the Alias command.
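A quick way to confirm that the hosts entry says what you think it says is to parse the file. This is a small Python sketch, assuming the usual hosts format of an address followed by one or more names; hosts_entries is a made-up helper name.

```python
def hosts_entries(hosts_text):
    """Map each hostname found in hosts-file text to its address."""
    mapping = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)  # keep the first entry for a name
    return mapping

sample = "127.0.0.1\tlocalhost perseus\n"
print(hosts_entries(sample).get("perseus"))  # 127.0.0.1
```

To check the real file, pass in `open("/etc/hosts").read()` instead of the sample string.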

Once I had the perseus hostname pointing at my laptop, I told Apache to use perseus as a name-based virtual host, by adding this section to the conf/extra/httpd-vhosts.conf file:

<VirtualHost *:80>
  ServerName perseus
  DocumentRoot "/Users/paul/local/perseus/sgml/reading/static"
</VirtualHost>

Note that I’m using sgml/reading/static as the DocumentRoot. This lets me ignore the part in the Perseus instructions about creating symbolic links to the {css,img,js,xml} directories. This is perfectly safe, as there are no other files in sgml/reading/static. (You should make sure that all the parent directories are writable by you alone, however.)

You also need to configure Apache so it permits access to the Perseus directory, so in conf/httpd.conf you’ll need something like this:

<Directory "/Users/paul/local/perseus/sgml/reading/static">
  Options FollowSymLinks
  Order allow,deny
  Allow from all
</Directory>

At this point, I recommend testing your configuration before adding Tomcat to the mix. Add a one-line index.html file to sgml/reading/static. Let it just say testing or the like. Restart Apache and try to reach your index.html file. In my case, that meant visiting http://perseus/index.html. If that doesn’t work, then something above isn’t right.
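If you prefer a script to a browser for this check, something like the following Python sketch would do. The fetch helper and its opener parameter are my own invention (the parameter only exists so the function can be exercised without a live server), and the URL in the usage note is the one from my setup.

```python
import urllib.error
import urllib.request

def fetch(url, timeout=5, opener=urllib.request.urlopen):
    """Return (status_code, body_text), or (None, reason) if the server
    cannot be reached at all."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        return err.code, ""
    except (urllib.error.URLError, OSError) as err:
        # No answer: DNS or hosts problem, Apache not running, and so on.
        return None, str(err)
```

Calling `fetch("http://perseus/index.html")` should give back status 200 and the contents of your test file. A 403 or 404 points at the Directory or DocumentRoot settings, and None suggests the hostname doesn't resolve or Apache isn't running.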

Perseus wants you to use conf/localhost.conf as an additional Apache configuration file, but that didn’t work for me. If you also want to tweak the settings or do something different, it might help to know just what localhost.conf is doing. Here is an explanation of its rewrite rules. I assume some basic knowledge of regular expression syntax. Naturally, the first part of the rule is what to match; the second part, what to change it to. The letters in brackets at the end are flags. Full documentation is available here.

The first rule is this:

RewriteRule		^/hopper$		http://localhost/hopper/		[R]

This means that if you forget the trailing slash on /hopper, then Apache tells your browser to try again with it added onto the end. The [R] means “redirect.” This is a pretty trivial rule.

The next two rules read:

RewriteRule		^/hopper/home$		http://localhost/hopper/		[P,L]
RewriteRule		^/hopper/home/$		http://localhost/hopper/		[P,L]

The [P,L] stands for “proxy and last.” The P means that if the rule matches, the request should be proxied (in our case, proxied to Tomcat). The L means to stop applying rewrite rules. (It is actually redundant, because a P flag always terminates rewriting.) These rules apply if the user asks for /hopper/home or /hopper/home/. These are both weird special cases, which make /hopper/home/ synonymous with simply /hopper/. I assume they are meant to maintain backwards compatibility with older links that erroneously point to /hopper/home/.

The fourth rule says:

RewriteRule		^/hopper/opensource/(.*)	http://www.perseus.tufts.edu/hopper/opensource/$1	[P,L]

This is also a special case. If a user visits the opensource part of the site, he should go to the real Perseus site hosted by Tufts, not yours.

The last rule is the most general. It catches everything not covered by the above special cases:

RewriteRule		^/hopper/(.*)		http://localhost:8080/hopper/$1		[P,L]

This means that any requests beginning with /hopper/ should be proxied to Tomcat (at port 8080).
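To make the ordering concrete, here is a rough Python simulation of those five rules. It is only a sketch, not how mod_rewrite is implemented: it stops at the first matching rule (Apache would keep going after the [R] rule, but none of the later patterns can match the rewritten URL), it ignores query strings, and the names RULES and apply_rules are made up.

```python
import re

# (pattern, substitution, flags), in the order they appear in localhost.conf
RULES = [
    (r"^/hopper$", "http://localhost/hopper/", "R"),
    (r"^/hopper/home$", "http://localhost/hopper/", "P,L"),
    (r"^/hopper/home/$", "http://localhost/hopper/", "P,L"),
    (r"^/hopper/opensource/(.*)",
     "http://www.perseus.tufts.edu/hopper/opensource/$1", "P,L"),
    (r"^/hopper/(.*)", "http://localhost:8080/hopper/$1", "P,L"),
]

def apply_rules(path):
    """Return (action, target) for the first rule matching path, else None."""
    for pattern, substitution, flags in RULES:
        match = re.search(pattern, path)
        if match is None:
            continue
        target = substitution
        if match.groups():
            # $1 stands for whatever the parenthesized group captured
            target = target.replace("$1", match.group(1))
        action = "redirect" if flags == "R" else "proxy"
        return action, target
    return None

for path in ["/hopper", "/hopper/home", "/hopper/opensource/x.jsp",
             "/hopper/text", "/other"]:
    print(path, "->", apply_rules(path))
```

The special cases have to come before the general rule: if the last rule were listed first, /hopper/home and /hopper/opensource/... would be proxied straight to Tomcat and the earlier rules would never fire.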

My rules are similar, but with a few minor tweaks. I added them to the VirtualHost section of Apache’s conf/extra/httpd-vhosts.conf file. I also gave a two-line version of this section above. Now it expands to something larger:

<VirtualHost *:80>
    ServerName perseus
    ServerAlias perseus
    DocumentRoot "/Users/paul/local/perseus/sgml/reading/static"

    <IfModule mod_rewrite.c>
        RewriteEngine On

        RewriteRule     ^/$             /hopper/    [R]
        RewriteRule     ^/hopper$       /hopper/    [R]
        RewriteRule     ^/hopper/home$  /hopper/    [P,L]
        RewriteRule     ^/hopper/home/$ /hopper/    [P,L]
        RewriteRule     ^/hopper/opensource/(.*)    http://www.perseus.tufts.edu/hopper/opensource/$1   [P,L]
        RewriteRule     ^/hopper/(.*)   http://localhost:8080/hopper/$1  [P,L]
    </IfModule>

    <IfModule mod_proxy.c>
        <Proxy *>
          Order deny,allow
          Allow from all
        </Proxy>
        ProxyRequests Off

        ProxyPass           /hopper    http://localhost:8080/hopper
        ProxyPassReverse    /hopper    http://localhost:8080/hopper
    </IfModule>
</VirtualHost>

Installing Tomcat

I unpacked Tomcat at ~/local/java/tomcat. This is not where Perseus expects it. If you look at sgml/properties/hopper.properties, near the bottom you’ll find a line that reads:

tomcat.home=/usr/local/tomcat

I changed this to:

tomcat.home=/Users/paul/local/java/tomcat

Actually, tomcat is a symlink to the directory I got when I unpacked the Tomcat tarball:

$ ls -ld /Users/paul/local/java/tomcat
drwxr-xr-x 15 paul paul 510 Dec 14 2007 /Users/paul/local/java/apache-tomcat-5.5.25
lrwxr-xr-x 1 paul paul 20 May 31 17:41 /Users/paul/local/java/tomcat -> apache-tomcat-5.5.25

You probably need to adjust Tomcat’s memory settings by editing bin/startup.sh, or you’ll eventually get OutOfMemoryErrors in your log. But I haven’t really looked into this yet.

You also need to set up Tomcat’s manager webapp. This is how Perseus gets installed when you type ant install. You can read about the Tomcat manager app here.

By default, the manager app is deactivated, to prevent unauthorized people from controlling Tomcat. To turn it on, you must follow the instructions at the link above. Part of the process is choosing a username and password. You’ll have to give this same username and password to the Perseus build process, so it can upload your webapp. You can set those values in this file:

sgml/properties/hosts/localhost.properties

You just need to change the lines here:

localhost.tomcat.manager.username=username
localhost.tomcat.manager.password=password
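If you would rather script this edit than do it by hand, here is a minimal Python sketch. It assumes plain key=value lines, which is what these files use as far as I have seen; set_properties is a hypothetical helper and deployer is a placeholder username.

```python
def set_properties(text, updates):
    """Return properties-file text with the given keys set to new values.

    Matching key=value lines are rewritten; comments, blank lines and other
    keys are kept as they are. Keys that were not found are appended.
    """
    remaining = dict(updates)
    output = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in remaining:
            output.append(f"{key}={remaining.pop(key)}")
        else:
            output.append(line)
    output.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(output) + "\n"

sample = (
    "localhost.tomcat.manager.username=username\n"
    "localhost.tomcat.manager.password=password\n"
)
print(set_properties(sample, {"localhost.tomcat.manager.username": "deployer"}), end="")
```

The same helper would work for the tomcat.home and hopper.database.* lines in hopper.properties.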

You can see if your manager is running by going here:

http://localhost:8080/manager/html

After you enter the username and password, you should see a page titled “Tomcat Web Application Manager.” You don’t need to do anything there; the ant script will do it all for you. Visiting the page is just to test that it’s up and running.
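The same check can be scripted. The Python sketch below only builds the request; it assumes the manager is protected with HTTP Basic authentication (the usual setup, as far as I know) and that Tomcat is on port 8080. The name manager_request is made up.

```python
import base64
import urllib.request

def manager_request(user, password, url="http://localhost:8080/manager/html"):
    """Build a request for the Tomcat manager page carrying HTTP Basic credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request

# Usage (needs a running Tomcat, so it is left commented out):
# with urllib.request.urlopen(manager_request("username", "password")) as response:
#     print(response.status)   # 200 means the manager accepted the credentials
```

If the credentials are wrong, urlopen raises an HTTPError with code 401, which tells you the manager is up but the username or password doesn't match what Tomcat has.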

Note that once you have installed the webapp once, you should no longer use ant install. If for whatever reason you want to re-deploy the webapp, you must instead use ant remove followed by ant install. ant remove install may also work, but I’m not sure it will always give Tomcat enough time to clear out the old version before adding the new. From what I can tell, ant reload does not work. It reloads the existing webapp, but new changes go unnoticed. Perhaps this is by design. In any case, it’s an issue with the Tomcat tools, not Perseus.
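The remove-then-install sequence with a pause in between could be wrapped up like this. It is a sketch only: the five-second pause is an arbitrary guess rather than a measured value, redeploy is a hypothetical name, and the run parameter exists just so the function can be tried without ant installed.

```python
import subprocess
import time

def redeploy(workdir, pause_seconds=5, run=subprocess.run):
    """Run `ant remove`, wait a little, then run `ant install` from workdir."""
    # check=True makes a failed ant run raise instead of carrying on silently
    run(["ant", "remove"], cwd=workdir, check=True)
    time.sleep(pause_seconds)   # give Tomcat time to clear out the old webapp
    run(["ant", "install"], cwd=workdir, check=True)
```

In my layout the call would be `redeploy("/Users/paul/local/perseus/sgml/reading")`, since all the ant scripts run from sgml/reading.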

At this point everything should be ready for building and deploying Perseus! You do this using these steps (as in the README file):

ant dist jsp
ant build-release
ant install

If something is wrong, you can start to isolate the problem by hitting Tomcat directly, rather than via Apache. Go to this URL:

http://localhost:8080/hopper/

If you see a Perseus page (with no images or formatting), then Tomcat is handling things properly, and the issue probably lies with your Apache configuration. If you get some kind of error message, then the problem is with Tomcat. Check catalina.out in its logs directory. If you don’t see anything there, then probably your webapp was never even deployed, due either to a build failure or not connecting to the manager app.
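That decision process is simple enough to write down as a small function. This Python sketch takes the HTTP status you got from Tomcat directly and the one you got through Apache (None if there was no answer at all); the name diagnose and the wording of the hints are mine.

```python
def diagnose(tomcat_status, apache_status):
    """Suggest where to look, given HTTP statuses (or None if unreachable)
    from http://localhost:8080/hopper/ and from the same page via Apache."""
    if tomcat_status != 200:
        return "Tomcat: check catalina.out, then the build and the manager deployment"
    if apache_status != 200:
        return "Apache: check the rewrite and proxy configuration"
    return "both respond: nothing obviously wrong"

print(diagnose(200, 404))    # Apache: check the rewrite and proxy configuration
print(diagnose(None, None))  # Tomcat: check catalina.out, then the build and the manager deployment
```

Tomcat is checked first on purpose: if it isn't serving the webapp, Apache has nothing to proxy to, so its status tells you nothing yet.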

I hope this helps!
