Software, and the stupid stuff I did: ruby on rails

Wednesday, December 1, 2010

Ruby on Rails and inconsistent database results

I started stressing a RoR project that has grown pretty big. Some serious hands-on testing was showing that functionality was working well, performance was fine, but sometimes I would just get weird results from the database or ActiveRecord. I would create a new AR object, save it, use it a few times, update it, then it would suddenly just disappear. I would start getting ActiveRecord::RecordNotFound exceptions doing a Thingy.find(1234), when thingy#1234 definitely existed in the database. It would take a restart of Phusion Passenger, or for one of the workers to time out, before I would start seeing the object again. And if I refreshed a page with Thingy.all(:conditions=>c) the results would change, then change back. I'm using MySQL, so it's not exactly what I was expecting to see.

I had issues in the past with some forking of processes that could just run through to conclusion in the background - they were removed. I made sure that there were good Thingy.transaction do end blocks covering my updates. Still, things were getting worse, not better.

Eventually I ended up hunting around the code from the dim and distant past. That stuff I don't touch because it "just works". Well, I rolled up to an interesting section in a class:

sql = ActiveRecord::Base.connection
# note: autocommit is switched off here and never switched back on
sql.execute "SET autocommit=0"
sql.begin_db_transaction
sql.delete 'delete from a_table where some_conditions'
sql.update sqlstring
sql.commit_db_transaction

This was valid, as the SQL going on in the sqlstring was complex to say the least. But since I've removed this from the main flow of the application things seem to have settled down considerably.

I'm guessing that my standard transactions were getting caught up in my attempt to borrow a connection from the pool explicitly and who knows what was happening. Or maybe Passenger was losing its connection and recreating a connection. I don't know, but I'm not doing it again!
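For what it's worth, here is a rough sketch of how the same delete-then-update could be run without touching autocommit at all. It is only a guess that the SET autocommit=0 was the culprit (it may have stayed in force on that connection after it went back to the pool), and the method name replace_rows is made up for illustration. The only thing it assumes is a connection object that responds to transaction, delete and update the way ActiveRecord::Base.connection does.

```ruby
# Sketch only: `conn` is expected to behave like ActiveRecord::Base.connection.
# The method name and both SQL arguments are hypothetical.
def replace_rows(conn, delete_sql, update_sql)
  # The transaction block begins and commits by itself, and rolls back if
  # anything inside raises, so the session's autocommit setting is left alone.
  conn.transaction do
    conn.delete(delete_sql)
    conn.update(update_sql)
  end
end
```

In the application it would be called along the lines of replace_rows(ActiveRecord::Base.connection, 'delete from a_table where some_conditions', sqlstring).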

Wednesday, October 27, 2010

Counting pages

I've been working on a Ruby on Rails project for a while. One area of it has morphed into a bit of document management, and for some users it is important to know how many pages a specific document has in it. At least for PDFs and TIFFs.

Well, ImageMagick is one approach, letting you load the document then review its properties. But as anybody who has used it will know, unless you are careful, this can be a huge memory sink. In fact I use ImageMagick 'convert' as a way to force my machine to run out of memory during testing, to see if it fails gracefully.

So, I hunted around a bit and came up with these programs: tiffdump and pdfinfo. I also considered tiffinfo, although the 'rawness' of tiffdump just seemed more appealing when parsing out the data I needed.

To install them (on Ubuntu):

sudo apt-get install libtiff-tools poppler-utils

Then use the command line programs from Ruby, something like this:

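require 'webrick'
require 'shellwords'

# Returns the page count for a TIFF or PDF, nil for anything else.
def page_count(path)
  mime_type = WEBrick::HTTPUtils.mime_type(path, WEBrick::HTTPUtils::DefaultMimeTypes)
  file = Shellwords.escape(path)  # copes with quotes and spaces in the path
  if mime_type == 'image/tiff'
    # one 'Directory' line per page; "\n" has to be double-quoted to mean a newline
    `tiffdump #{file} | grep 'Directory'`.count("\n")
  elsif mime_type == 'application/pdf'
    `pdfinfo #{file} | grep 'Pages'`.split(':')[1].to_i
  else
    # whatever
  end
end

page_count('/home/someone/somewhere/somefile.xxx')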

Not pretty, not clever, but a lot faster than RMagick, and a lot easier than the Ghostscript approaches I've seen discussed but never actually working.
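If the grep pipeline ever gets in the way, the same parsing can be done in plain Ruby, which also makes it possible to test without the tools installed. This is only a sketch: the method names are made up, and the sample text at the bottom is illustrative rather than captured output. It assumes tiffdump starts one line per page with "Directory" and that pdfinfo prints a line starting with "Pages:".

```ruby
require 'open3'

# Counts the lines that start with "Directory" (assumed: one per TIFF page).
def tiff_pages_from(dump)
  dump.lines.count { |line| line.start_with?('Directory') }
end

# Pulls the number out of the "Pages:" line, or returns nil if there isn't one.
def pdf_pages_from(info)
  line = info.lines.find { |l| l.start_with?('Pages:') }
  line && line.split(':', 2)[1].to_i
end

# Runs pdfinfo without going through a shell, so no quoting of the path is needed.
def pdf_page_count(path)
  out, _status = Open3.capture2('pdfinfo', path)
  pdf_pages_from(out)
end

# Illustrative text only, shaped like the lines the parsers look for.
SAMPLE_DUMP = "somefile.tif:\nDirectory 0: offset 8\nImageWidth (256)\nDirectory 1: offset 4096\n"
SAMPLE_INFO = "Title:          something\nPages:          12\nPage size:      612 x 792 pts\n"

puts tiff_pages_from(SAMPLE_DUMP)
puts pdf_pages_from(SAMPLE_INFO)
```

Matching on the start of the line also avoids miscounting when the file path itself happens to contain the word "Directory".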