This is how you “open” data


I am halfway through a project of making all Unit 4 check regsiters truly accessible online. First, let me briefly provide an overview of what curretly exists, why it is not enough, and what I am doing about it.


Since about 2010, the Unit 4 Finance team has been posting check registers online via the district’s BoardDocs portal. If you have ever looked at the portal, you know that it is obscenely difficult to find documents unless you have some idea of where to look. For instance, Check Registers are always posted during the Regular Meeting under the Action agenda item “Bills and Treasurer’s Report”. It will always be for the prior month; so for the June Board Meeting, they posted the check registers from May. So, to give credit where credit is due, kudos to Unit 4 for posting check registers online – not many school districts do that.

But here is the problem – who looks at them? Among the small handful of people that actually take the time to hunt them down and scan through several pages of checks, who actually can make any use of it? If you want to hunt down all the transactions for a given “vendor”, you have to manually find and open each month’s check register yourself. I tried it once – it was quite painful. 🙂


In May of 2013 I asked Unit 4 to provide an index or some type of archive for all check registers. Stephanie Stuart and Matt Foster quickly responded and now we have a very tidy Check Register Archive. It isn’t perfect (some months are missing), but it is yet another step in a good direction, so for that I give Unit 4 another kudo. There is still an issue of tracking vendors and payments across months, which is a very manual process. Several times in the past year I have asked for read-only access to the data in one form or another, but I am always hitting legal obstructions along the way.


So here is my solution. Given that Unit 4 is providing a very nice central repository for check registers, and given that the PDF documents are true exports from a reporting tool (as opposed to those notorious PDF images of data that are so prevelant), without asking Unit 4 to lift a finger I can now provide the entirety of those check registers via a mildly reverse-engineered database. This is the first half of my project. The next half is to provide a handy dandy php-driven web page to allow dynamic access to those who have no idea how to connect to a database and write queries.


Sounds exciting, right? You are anxious to go look at the data, right? Unfortunately, the pretty stuff isn’t my forte. I can provide excel spreadsheets of the data for you, which is probably “good enough” for most everyone:


Why two different files? About the time I asked for the check register archive, it appears that Unit 4 changed the format of the output. The PDF files are in what’s called “fixed-width fields”, and the number and size of those fields changed near May of 2013. So another nice thing about having a database is that I can combine all these lines into one resource and make the formats a moot point.


More fun stuff to come later this week. I have been inspired by Adam Andrzejewski and his “Open the Books” campaign.


Technical details

connection string = jdbc:mysql://

user = u4reader/redaer#4

Number of records: 50,868

data loaded with perl using CAM::PDF modules; dynamically reads all PDF’s available on Unit 4 website, parses, cleans and loads the data.



One Response to “This is how you “open” data”

  1. charlesdschultz Says:

    Just noticed this morning that the Check Register archive has been updated. Sweet! Thanks to Matt Foster and David Hohman’s team.

    I will be generating a new dataset soon.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: