Disposable Staging Site on Heroku

November 19th, 2011

We are building ElectNext on Heroku. Although we’ve had some stability problems, we’re very happy with all the infrastructure they provide us. One advantage is that we can treat applications as “disposable.” We can’t take this as far on Heroku as we could on AWS, but there are still places it comes in handy. One example is for our staging site. We deploy here just before deploying to production, as a final check that everything works as expected. Really, at this point we are testing the act of deploying more than the code itself. The staging site lets us rehearse each deployment.

Of course we want staging to mirror production as closely as possible, but inevitably the two get out of sync. On bare metal, this used to be a thorny problem to solve, but in the cloud it’s easy to throw away staging completely and re-create it based on production. That means the two never drift apart, and we can re-initialize staging just before testing a deploy. This gives us greater confidence that the production deployment will run smoothly. Also, if the deployment to staging fails, then we can re-initialize again so that we are testing our fixes appropriately.

To perform a staging initialization, I wrote the following Makefile. You’ll want to adapt a few things, but it’s pretty generic. It assumes that you have two git remotes, one named heroku (for production) and one named staging. Although on Heroku code usually runs in the Rails production environment, we run staging in a custom staging environment by setting RACK_ENV=staging. You also may not want the same add-ons, config vars, etc. as are listed here. But hopefully this will be a start:


# We use this Makefile to destroy the staging site
# and re-create it based on production.
# This is useful to keep staging current,
# and it's especially nice so we can rehearse deployments.
# If we nuke staging and then deploy the latest code there,
# we can be more confident that the production deploy
# will run smoothly.

PRODUCTION_APP=example
STAGING_APP=example-staging

initialize_staging: destroy_staging create_staging populate_staging_database

populate_staging_database:
	heroku pgbackups:capture --expire --app=${PRODUCTION_APP}
	heroku pgbackups:restore DATABASE `heroku pgbackups:url --app=${PRODUCTION_APP}` --app=${STAGING_APP} --confirm ${STAGING_APP}
	# We use these next two Rake tasks so it's easy to get consistent
	# indexes and foreign keys on all systems,
	# and they are all listed in one place.
	# This is especially handy for foreign keys,
	# because Heroku's db:{push,pull} commands don't transfer them.
	heroku rake db:add_indexes	--app=${STAGING_APP}
	heroku rake db:add_foreign_keys	--app=${STAGING_APP}
	heroku ps:restart		--app=${STAGING_APP}

create_staging:
	heroku apps:create --remote staging --stack bamboo-mri-1.9.2 ${STAGING_APP}
	git push staging master
	heroku sharing:add someone@example.com                          --app=${STAGING_APP}
	# We run staging as its own Rails environment:
	heroku config:add RACK_ENV=staging			        --app=${STAGING_APP}
	# Uncomment this line if you want a bigger database on staging:
	# heroku addons:upgrade shared-database:20gb			--app=${STAGING_APP}
	heroku addons:add releases:basic				--app=${STAGING_APP}
	heroku addons:add pgbackups:basic				--app=${STAGING_APP}
	heroku addons:add custom_domains:basic				--app=${STAGING_APP}
	heroku addons:add newrelic:standard   				--app=${STAGING_APP}
	heroku addons:upgrade logging:expanded 				--app=${STAGING_APP}
	heroku domains:add staging.example.com                          --app=${STAGING_APP}
	heroku addons:add memcache:5mb					--app=${STAGING_APP}
	# ...
	heroku ps:restart						--app=${STAGING_APP}

destroy_staging:
	heroku apps:destroy --app=${STAGING_APP} --confirm ${STAGING_APP}

# vim: set filetype=make :

Piping in Ruby with popen3

October 24th, 2011

If you’re using Ruby to glue together shell commands, you may want to pass some values through a filter and read the results back into Ruby. This is trivial to do in a shell script, but from a language like Perl or Ruby it is really hard.

For example, suppose you have a database full of names, and you want to parse each one into first name, last name, title, suffix, etc. This is a really complicated task, so you’re much better off using a specialized tool. Now it happens there is no such tool in Ruby, but there is a great Perl library called Lingua::EN::NameParse. So you (i.e. I) decide to write a filter in Perl that will read one name per line on stdin and print the name breakdown to stdout in YAML. Then you can read the YAML back into your Ruby script and do whatever you like with it. Voilà: structured data!

The problem is that Ruby doesn’t give any easy way to run a command for which you both read and write. There is the popen3 method, which you can call like this:

require 'open3'

Open3.popen3([cmd, cmd]) do |stdin, stdout, stderr|
  # ...
end

But if you try that approach, it is probably going to deadlock your code. This is a well-known problem; you’ll encounter the same thing in Perl or Python. The problem is that you need to keep feeding lines to your filter and consuming them at the same time. All the pipes (stdin, stdout, and stderr) have limited buffers, and when one fills up, everything is going to stop. Here is one page that gives a long description of the problem and attempts a (very complicated) solution using select(3). Here is another page that tackles the problem with threads.

But I found an even easier solution. In io/wait there is a module that adds the ready? method to IO objects. ready? is sort of like a non-blocking read, except it doesn’t read anything. It returns true if it’s possible to read without blocking, false if not, and nil if it’s unknown. So you can write your code like this:

require 'open3'
require 'io/wait'

yaml = []
errors = []

stdin, stdout, stderr = Open3.popen3([cmd, cmd])

names.each do |name|
  stdin.puts name

  while stdout.ready?
    yaml << stdout.readline
  end
  while stderr.ready?
    errors << stderr.readline
  end
end

# Now get whatever else we still have to read:
stdin.close
stdout.each_line do |line|
  yaml << line
end
stderr.each_line do |line|
  errors << line
end

Don’t forget the require 'io/wait'!

The only other thing you should do is ensure your Perl code flushes its output after every print, even when it is not writing to a tty (by default, output to a pipe is block-buffered). Just include this near the top of your file:

$| = 1;

Now Perl will print each line as you ask it to, so your Ruby code will get data as you tell Perl to print it.

The Ruby code above still isn’t perfect. If our Perl program writes an incomplete line, it will deadlock. We could fix this by not using readline, but since we have complete control over the Perl program, that seems unnecessarily complicated.

Another problem with my Ruby here is it is hard to debug if Perl quits unexpectedly (e.g. with die). You’ll just see a “Broken pipe” error, probably when you try stdin.puts. The text you give to die will get lost, so it may be challenging to track down the source of the problem. For quick-and-dirty data munging, this isn’t such a problem, but it would be nice to solve somehow.
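One way the lost die message might be recovered is the thread-based approach mentioned earlier, sketched below under some assumptions. Here run_filter and FILTER are hypothetical names, and a tiny Ruby one-liner stands in for the Perl filter so the example is self-contained. Two reader threads drain stdout and stderr while the main thread writes, so no pipe buffer can fill up, and whatever the child wrote to stderr before exiting is still available afterwards, along with its exit status.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical stand-in for the Perl filter: upcases each line it reads.
FILTER = [RbConfig.ruby, '-e', 'STDIN.each_line { |l| puts l.upcase }']

# Feeds lines to cmd and returns [stdout_lines, stderr_lines, exit_status].
def run_filter(cmd, lines)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    # Drain both output pipes in the background so neither buffer fills.
    out_reader = Thread.new { stdout.readlines }
    err_reader = Thread.new { stderr.readlines }

    begin
      lines.each { |line| stdin.puts line }
    rescue Errno::EPIPE
      # The child quit early; its stderr (collected below) should say why.
    ensure
      stdin.close unless stdin.closed?
    end

    [out_reader.value, err_reader.value, wait_thr.value]
  end
end

out, err, status = run_filter(FILTER, %w[alice bob])
puts out
warn err.join unless status.success?
```

If the filter dies, the "Broken pipe" error is swallowed and err holds the text it printed on the way out, which is usually enough to track down the problem.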

Synchronous Ajax in Rails 3

August 17th, 2011

Synchronous Ajax sounds like an oxymoron, but sometimes it is what you want. That is, you still want the Ajax requests to happen in the background, but you want to make sure they hit the server in the order the user clicked. For example, suppose you have a multiple-choice question, and when the user clicks an answer, you record it using Ajax and also update the display to show the answer the user chose. You need to synchronize these requests, because whatever answer the user clicked last should override all the previous answers. If network strangeness causes your requests to hit the server out-of-order, the wrong answer is going to get recorded.

There is a nice jQuery plugin I’ve been using for these situations, called Ajax Manager. For something like my question/answer example, I’d use code like this:


$(function() {
  var answerManager = $.manageAjax.create({
    queue: 'clear',
    abortOld: true,
    maxRequests: 1
  });
  $('a.answer').click(function(event) {
    event.preventDefault();
    answerManager.add({
      url: $(this).attr('href'),
      type: 'POST'
    });
    return false;
  });
});

The trouble comes when I combine this with Rails 3. Suppose I’m creating my HTML with this Rails code (in HAML):


- @question.choices.each do |ch|
  = link_to question_answers_path(@question, :choice_id => ch.id), :method => :post, :remote => true do
    .choice= ch.text

One problem is that when Rails sees the :remote => true, it’s going to set its own jQuery listener on that link, which will interfere with my own click listener. So we take that off. But the real problem is saying :method => :post, because that also causes Rails to listen on the link, so it can turn the GET into a POST. Because Rails is going to send a POST request from its own Javascript code, it doesn’t matter that I’m returning false from my own click listener. The anchor tag will get canceled, but Rails’ Javascript will still submit a POST for me (and not via Ajax).

I could remove the :method => :post, but then link_to is going to complain because it thinks I want the :index route, not the :create route, and maybe that doesn’t exist. Arg! Besides, with this approach my code is misleading. The HAML looks like I want a GET link, but elsewhere I’m substituting a POST. We could just write the link by hand, without using link_to, but that doesn’t make the code any less misleading, and it’s just too far down the road of fighting your framework to make me comfortable.

The solution, which unties all these knots, is to use Rails’ Ajax callbacks to cancel Rails’ own Ajax submission and replace it with our own. With this approach, you keep :method => :post, :remote => true, so link_to is happy and your HAML makes sense. But in your Javascript, instead of listening for click, you listen for ajax:before:


$(function() {
  var answerManager = $.manageAjax.create({
    queue: 'clear',
    abortOld: true,
    maxRequests: 1
  });
  $('a.answer').bind('ajax:before', function() {
    answerManager.add({
      url: $(this).attr('href'),
      type: 'POST'
    });
    return false;
  });
});

Here we’re binding to this custom Rails-provided event, so instead of fighting with our framework, we’re cooperating with it. Note our event parameter is gone, but we don’t need it. We launch our own request via Ajax Manager, and then we return false to cancel the Rails request.

If you needed the xhr or settings parameters, you could bind to ajax:beforeSend instead. Either event is cancelable by returning false from its handler.

Here is some documentation on Rails’ Ajax integration, if you want to read more.

RESTless Doubts

July 11th, 2011

I’ve wanted to blog for a while now about some user experience problems I’ve seen with a RESTful approach in Rails. I love the RoR framework, and I think REST is very useful for giving a consistent structure to your web interface. But there are a couple things that harm a RESTful site’s usability.

The first comes about when a form gets submitted to create a new object, and it has errors. Before submitting, you were on /employees/new, and now you’ve POSTed to /employees. In the Ruby world, you gather up the error messages and re-render the form, but the URL doesn’t change. That means your location bar still shows /employees, even though you’re seeing the same thing you saw on /employees/new. It also means that if you type Ctrl-L, Enter to reload the page (am I the only one who does this?), you get either the index page (if it exists), a nasty routing error (if the developer was careless), or something else unexpected (if bad routing errors are handled).

You get similar weirdness when updating an existing object, because you PUT to /employees/1, and re-requesting the page from the location bar takes you to the show page, not the edit one.

It seems better either to create new objects by POSTing back to /employees/new and update old ones by PUTting to /employees/1/edit, or to issue a redirect on errors so the user goes back to the original page. I would prefer this second approach, because it keeps things consistent: the form for new objects is always at /employees/new, never at /employees, etc. It also preserves bookmarkability, which I guess is the more mainstream version of my Ctrl-L, Enter. Sadly, RoR’s way of storing errors on your model object makes the redirect approach impractical. You’d have to save model state and errors on some flash-like object. Just putting the messages into flash[:error] isn’t quite good enough if you want to preserve the user’s input and highlight the problematic form fields. Then the form would need to know to check flash (or rather your improved version that has some drawer for not-to-be-rendered-as-a-notice type things), or your own code would have to pull the model out of flash, if it exists there. And of course flash is implemented via session, so this might break with the new cookie-based session storage.

The second problem with REST involves the back button. Suppose you’re writing a survey, where each question can have one answer per user. The question appears on /questions/1. That page contains a form either POSTing to /questions/1/answers or PUTting to /questions/1/answers/1. Now suppose I’m taking the test, so I see a question, POST a new answer, then have second thoughts. I click Back, change my answer, re-POST . . . and see an error message. This is because Rails wrote the form to POST to /questions/1/answers, not to PUT to /questions/1/answers/1. It would seem more intuitive to always POST to /questions/1/answer (“answer” is a verb here).

Arguably my first complaint is about Rails rather than REST. The second complaint, on the other hand, seems to have no good RESTful solution. How can any RESTful framework know where to submit the form, if it has to work with the Back button? If Rails has any fault here, I’d say it is in making it so easy for new.html.erb and edit.html.erb to share the same form code, even though the form gets rendered differently in each case.
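To make the "always POST to /questions/1/answer" idea concrete, here is a minimal sketch in plain Ruby rather than Rails. AnswerStore is a hypothetical in-memory stand-in for an answers table with one row per question and user; none of these names come from a real app. The point is only that a single create-or-update action gives the same result whether the form is fresh or stale.

```ruby
# Hypothetical in-memory stand-in for an answers table
# holding at most one answer per (question, user) pair.
class AnswerStore
  def initialize
    @answers = {}
  end

  # What a "POST /questions/:question_id/answer" action might do:
  # create the answer if it is missing, overwrite it otherwise.
  # Returns :created the first time and :updated on resubmission.
  def answer(question_id, user_id, choice_id)
    key = [question_id, user_id]
    status = @answers.key?(key) ? :updated : :created
    @answers[key] = choice_id
    status
  end

  def choice_for(question_id, user_id)
    @answers[[question_id, user_id]]
  end
end

store = AnswerStore.new
first  = store.answer(1, 42, 7)  # initial submission
second = store.answer(1, 42, 9)  # user clicks Back and resubmits the stale form
```

Because the stale form and the fresh form hit the same action, the second submission simply replaces the first instead of producing an error.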

STARTTLS Problems with Ruby

March 25th, 2011

If you want to send email from a Ruby program, one approach is to hit an SMTP server where you have an account, and ask it to send the email in your name. This is nice, because you ensure your From: and Sender: fields contain real addresses. There are lots of sites out there that describe how to do this with Gmail. The best library to use seems to be Pony. From a Ruby 1.8.7 script, you can say this:


#!/usr/bin/env ruby

require 'rubygems'
require 'pony'

from = 'example@gmail.com'
to = 'someone@example.com'
subject = 'testing'
msg = <<EOM
Hello, this is a test!
EOM

Pony.mail(:to => to,
          :from => from,
          :subject => subject,
          :body => msg,
          :via => :smtp,
          :via_options => {
            :address => 'smtp.gmail.com',
            :port => '587',
            :enable_starttls_auto => true,
            :user_name => from,
            :password => 'OMITTED',
            :authentication => :plain,
          }
         )

The :enable_starttls_auto line is actually optional, because it defaults to true. Unless you tell it otherwise, Ruby’s SMTP library will negotiate with the server for TLS. This is an SSL-like encryption scheme which you initiate from a regular SMTP session by sending the STARTTLS command. Without TLS, the whole SMTP conversation is in plaintext, including the email body and your account password. So in general, Ruby’s approach is a good thing. The trouble I encountered is that I didn’t want my emails to come from a Gmail account. I wanted them to come from illuminatedcomputing.com. And while my webhost does let me send emails via its SMTP server, its SSL certificate is a little wanting: the domain name on it is localhost.

Strictly speaking, this means a client should reject the certificate, because it doesn’t match the domain you think you’re connecting to. This is what Ruby does, which means Pony was failing with this error:

/usr/lib/ruby/1.8/openssl/ssl.rb:124:in `post_connection_check':
hostname was not match with the server certificate (OpenSSL::SSL::SSLError)

(I guess this is the kind of error message you get when you use a language from Japan. :-)

I couldn’t find any obvious way around this. You can tell Ruby’s SSLContext to ignore a mismatched domain name, but how can you tell Pony to pass this setting on to Mail, which should pass it on to SMTP, which should set it on the SSLContext? It seemed hopeless. My first attempt was rather heavy-handed, just overriding the default SSLContext:


class << Net::SMTP
  remove_method :default_ssl_context # if defined?(Net::SMTP.default_ssl_context)
end

module Net
    class SMTP
        def SMTP.default_ssl_context
            ctx = OpenSSL::SSL::SSLContext.new
            ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE
            ctx
        end
    end
end

This works, but it affects too much: every connection your application makes through Net::SMTP, not just this one.
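For comparison, if you were driving Net::SMTP directly rather than going through Pony, the relaxed context could be scoped to a single connection by handing it to enable_starttls. This is only a sketch under assumptions: the host and port are placeholders, the variable names are mine, and nothing connects until start is called.

```ruby
require 'net/smtp'
require 'openssl'

# A context that skips certificate verification, used for one connection only.
lenient = OpenSSL::SSL::SSLContext.new
lenient.verify_mode = OpenSSL::SSL::VERIFY_NONE

# Placeholder host and port; no network traffic happens here.
smtp = Net::SMTP.new('smtp.mydomain.com', 25)
smtp.enable_starttls(lenient)

# Sending would then look roughly like this (left commented out):
# smtp.start('mydomain.com', user, password, :plain) do |conn|
#   conn.send_message(message_text, from, to)
# end
```

Other Net::SMTP objects in the same process keep the default context, so only this one connection gives up verification.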

With some more digging, I noticed that Mail::SMTP accepted an option called :openssl_verify_mode, which could be set to the constant OpenSSL::SSL::VERIFY_NONE. If I passed this to Pony in the :via_options hash, would it pass it along? There was no documentation suggesting it would, but I tried it anyway, and guess what? It worked! So this code is a way to get around a mismatched-domain SSL certificate without disabling verification for every SMTP connection in your application:


Pony.mail(:to => to,
          :from => from,
          :subject => subject,
          :body => msg,
          :via => :smtp,
          :via_options => {
            :address => 'smtp.mydomain.com',
            :port => '25',
            :enable_starttls_auto => true,
            :user_name => from,
            :password => 'OMITTED',
            :authentication => :plain,
            :openssl_verify_mode => OpenSSL::SSL::VERIFY_NONE,
          }
         )

I’m very impressed that Pony lets you get away with this.

Please note that this approach is not entirely secure! The reason an SSL certificate contains a domain name is to prove you’re talking to the real owner of the domain, and VERIFY_NONE turns off certificate verification altogether, not just the domain name check. So the code above is not ideal. Please use it at your own risk, and don’t blame me if anything goes wrong. In my case, I’m not too surprised that my hosting provider doesn’t pay for a separate SSL certificate for every single customer, and I’m willing to risk a DNS spoofing attack.

UPDATE: The same parameter works with ActionMailer, so you can say this (Rails 3):


config.action_mailer.smtp_settings = {
      :address              => 'smtp.mydomain.com',
      :port                 => '25',
      :domain               => 'mydomain.com',
      :user_name            => 'example@mydomain.com',
      :password             => 'OMITTED',
      :authentication       => 'plain',
      :enable_starttls_auto => true,
      :openssl_verify_mode  => OpenSSL::SSL::VERIFY_NONE,
  }

What git add does

March 15th, 2011

I’ve used git on a few small projects now (not real work, just quick scripts for things) and so far I’ve mostly treated it like Subversion. I knew about the staging area, but it was a little blurry, and the need to type commit -a on files I’d already added was just a speck of annoyance. But suddenly it struck me that git’s add is completely different from svn’s add. On svn, add puts your file under version control. On git, add promotes your file (whether new, changed, or deleted) to the staging area. This distinction is somewhat obscured by commit -a, which automatically stages changes only to files git already tracks. That makes it seem as though new files are skipped because they are not yet under version control, when really it is just that you’ve never used add on them.

I’m sure this note is no news to regular git users, and maybe it’s incomprehensible to others, but for me it’s one of those minor epiphanies, and I thought I’d document it for a change.

jQuery-UI Linked Sliders

March 11th, 2011

For a current project, I needed to show multiple sliders whose values summed to a fixed amount, so when a user moved one slider, the others would move the other way to compensate. I googled around for a bit and found nothing (other than people complaining that this was hard). So I wrote my own version. For this project, I thought I’d try hosting the code on github. But I’ve also put a demo here.
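The compensation arithmetic can be sketched independently of jQuery-UI. What follows is not the code from the github project, just one plausible rule written in Ruby: when one slider moves, the remainder of the total is shared among the other sliders in proportion to their previous values, or equally if they were all at zero. The function name rebalance and its parameters are hypothetical.

```ruby
# Returns new slider values after slider `index` is moved to `new_value`,
# keeping the sum of all sliders equal to `total`.
def rebalance(values, index, new_value, total)
  # Clamp the moved slider to the allowed range.
  new_value = [[new_value, 0].max, total].min.to_f
  remaining = total - new_value
  others_sum = values.each_with_index.sum { |v, i| i == index ? 0 : v }
  others_count = values.size - 1

  values.each_with_index.map do |v, i|
    if i == index
      new_value
    elsif others_sum.zero?
      remaining / others_count      # nothing to scale from: split equally
    else
      v * remaining / others_sum    # keep the others' relative proportions
    end
  end
end

moved = rebalance([50, 30, 20], 0, 60, 100)
```

In a real page the returned values would be written back to the other sliders without triggering their own change handlers, otherwise each update would set off another round of rebalancing.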

Javascript: setTimeout/setInterval on Object Instances

March 1st, 2011

I was trying to use Javascript’s setInterval function, but I needed to call a method on an object, and that isn’t really supported:


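function Foo(x) {
    this.x = x;
    this.bar = function() {
        alert(this.x);
    }
    this.baz = function() {
        // Broken: the string is evaluated later in the global scope,
        // where "this" is not the Foo instance.
        setInterval("this.bar()", 50);
    }
}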

I could pretty much guess the above code wouldn’t work, because this inside the string would clearly be something else once the string was evaluated. I was hoping this code would be okay: setInterval(this.bar, 50). It doesn’t use the string, but passes a method reference directly. It does pass the right function, but unlike, say, Python, Javascript untethers the method from the object, so inside bar this refers to the global object (window in a browser) and doesn’t evaluate to your instance.

I solved the problem by using closures:


function Foo(x) {
    this.x = x;
    this.bar = function() {
        alert(this.x);
    }
    this.baz = function() {
        var _this = this;
        setInterval(function() {
            _this.bar();
        }, 50);
    }
}

If bar were essentially a private method and not used elsewhere, you could simplify things to this:


function Foo(x) {
    this.x = x;
    this.baz = function() {
        var _this = this;
        setInterval(function() {
            alert(_this.x);
        }, 50);
    }
}

Visual Studio Unit Tests with Resources

February 12th, 2011

I have a project that needs to load a lot of historical data from a JSON file to support various computations. This is a Windows Forms app built in C# using Visual Studio 2008. In the application, I can open the JSON file like this:


string json = File.ReadAllText(Path.Combine(Application.StartupPath, "figures.json"));

But I also want to write unit tests against the parts of the app that need this file. I’m using Visual Studio’s built-in unit test framework. I was trying to figure out how to ensure the unit test can get access to the file. It turns out, I just add this attribute to any test method that needs the file:


[DeploymentItem("figures.json")]

Then I can open the file like this:


string json = File.ReadAllText(Path.Combine(testContextInstance.TestDeploymentDir, "figures.json"));

I can also put the attribute on the test class, rather than tagging each separate method. Unfortunately I can’t apply it to the whole test assembly. I was hoping I could stick it in AssemblyInfo.cs, but then I get an error that it only works on classes and methods. Oh well!

C# Access Modifiers and Unit Tests

January 23rd, 2011

Access modifiers (public, private, protected, internal) for C++, Java, and C# were designed before unit testing really became popular, and it’s hard to write unit tests against non-public methods. I like the pattern of putting all tests into a separate package/assembly, so you can easily separate them from a production build. You can also use naming conventions, but that seems less robust. When I’ve tried that approach, inevitably some support classes for tests, like mock objects, leaked into the production release. That can be dangerous. But if you put your tests into a separate assembly, how do you test non-public methods?

Some people say you shouldn’t test non-public methods, but that sounds to me like coder religion. Why not? I find it very useful, when separating a big method into helper methods, to test the helper methods individually so I know they’re right. If the code is complex enough to warrant a helper method, it may be complex enough to warrant a test. I’ve been writing a lot of financial computation code lately, and that’s the kind of thing you need to get right. I’ve never been so diligent about writing unit tests as now. It’s quite handy to test units smaller than public method calls. So how do you do it?

One solution is to mark everything as public. I confess this actually appeals a little to my practical side. As long as you’re not building a truly public API (and few business programmers are), it’s okay to leave helper methods as public. It’s easy to spend unnecessary time thinking about public vs. private, and inevitably you wind up too strict somewhere so you waste time going back and changing things.

But ultimately I prefer to keep helper methods private. It is self-documenting. When you read someone else’s code (including month-ago you), it’s a relief to know that some methods are never called from outside the class. It helps you understand a method’s intent if public and private mean something. Isolating the true public methods can substantially lighten the complexity burden of understanding how a class works. Also, a private method doesn’t need its inputs sanitized quite so rigorously. And with Visual Studio, keeping helper methods private improves IntelliSense, if you use that. So I want to take advantage of private and friends, if I can make it work with unit testing.

Fortunately, C# offers a way to do this. In Properties/AssemblyInfo.cs, you can add a line like this:


[assembly:InternalsVisibleTo("FooTests")]

Then, whatever methods you mark internal are accessible to your test project. (Remember that internal allows access from anywhere in the same assembly.) Sadly this attribute doesn’t open up private methods, but if your development group knows that internal is basically synonymous with private, it’s not a bad compromise. It still lets you distinguish between the public API and everything else. I like this approach a lot.