Posts Tagged technology

Automate Everything for Peace of Mind

Anybody who has written a computer program appreciates the warm glow of having a machine do your work.  That pleasure often masks the burden of holding the program’s hand — kicking it off and checking that it ran correctly.

At least for me, the natural tendency is to stop short of complete automation.  Instead, if I’m being lazy, I’ll start the program manually and catch errors by noticing odd behavior.  Perhaps I’ll see a glaring failure message.  Or maybe I’ll just notice that the program ran too quickly.

I do this because I’ve seen trusty programs fail in myriad ways.  I’ve seen malformed or truncated inputs.  I’ve seen choppy or slow network connections.  I’ve seen disks fill up.  I’ve seen memory hogs.

When an error like that pops up while you’re sitting right there at the command prompt, it’s typically easy to diagnose and move on.  When the same program runs unattended from a scheduler, you might not notice a problem until disaster strikes, sparking a late-night emergency call, or worse.  Therefore, I have to fight my natural tendency to hand-hold my programs.

Through experience, I’ve learned to go the extra mile and properly automate my programs.  These are some of my best practices — I hope other people can point out ones that I’ve overlooked.

  1. Every program must use a proper logger.  The first thing I do when I write *any* program is to make sure it’s set up to log using that language’s preferred logging utility.  This allows you to properly investigate what happened every time the program runs.
  2. Save the logs from each run.  It’s possible that an error will go unnoticed for a while, and there’s nothing worse than saying “unfortunately the log file was deleted.”
  3. Validate the program’s inputs.  I’ve been burned before by people who will “always save their Excel spreadsheet in the same way,” websites that “never change,” and data providers that “guarantee their data are validated before given to you.”
  4. Sanity check the outputs.  If you always expect your program to generate some new data, then compare the new results to the old results, and verify that there are actually new results.
  5. Catch errors and retry.  Computers are deterministic, but real world resources are unpredictable. I’ve seen one-off errors too often.  Perhaps I call an external program that fails for an unknown reason.  Or a file copy fails, but when I retry, it works fine.
  6. Have an extremely robust email notification system.  When there’s an error, you should be notified.  One trick is that a failure in the email system should never cause the program to fail.  Perfect this library and reuse it.
  7. Try your program every time you change it.  This is easy to say, but it’s tempting to say “oh, but I’m just adding one line” and not run your unit tests.  If you change it, at the very least, give it a whirl.
  8. Run the program from multiple machines.  Computers fail at all the wrong times.  At the very least, design your program so that only one instance publishes a final result.  Then if that computer fails, you can run a quick command on the other machine to publish whatever that program creates.
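
As a rough sketch of practices 1, 2, 5, and 6, here is one way the pieces might fit together in Python using the standard logging module.  The names setup_logging, retry, notify_safely, and run_job are made up for illustration, and the send argument stands in for whatever email function you already have.  Treat it as a starting point under those assumptions, not a finished library.

```python
import logging
import time

log = logging.getLogger("nightly_job")  # hypothetical job name


def setup_logging(log_path):
    """Send log records to a file; use one path per run so old logs survive."""
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    log.addHandler(handler)
    log.setLevel(logging.INFO)


def retry(func, attempts=3, delay_seconds=1.0):
    """Call func(); on an exception, log it, wait, and try again.

    Re-raises the last exception once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            log.exception("attempt %d of %d failed", attempt, attempts)
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


def notify_safely(send, subject, body):
    """Try to send a notification; a failure here is logged, never raised."""
    try:
        send(subject, body)
        return True
    except Exception:
        log.exception("could not send notification %r", subject)
        return False


def run_job(job, send, attempts=3, delay_seconds=1.0):
    """Run job with retries and notify on final failure.

    Returns True on success and False on failure.
    """
    try:
        retry(job, attempts=attempts, delay_seconds=delay_seconds)
        log.info("job succeeded")
        return True
    except Exception as exc:
        notify_safely(send, "job failed", repr(exc))
        return False
```

A scheduler entry would then call something like setup_logging("job_20090720.log") followed by run_job(my_job, my_send), where the file name, my_job, and my_send are placeholders for your own.  The design choice worth copying is that notify_safely swallows its own errors, so a broken mail server can never be the reason the job itself dies.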


SPY Closing Price Update

Just to round out my quick post on dirty financial data, I came into work today and saw that Thursday’s closing price for SPY is now correctly stated as 94.15.  Sometimes I half-wonder if somebody intentionally causes these mistakes just to put a stick in the wheel of quantitative backtests.

spy_hp_20090720


Dirty Financial Data: SPY Closing Price

Before joining finance, my naive assumption was that the market’s high stakes would necessitate accurate, high quality data.  In particular, I expected frequently traded stocks and ETFs on public exchanges to have accurately quoted end of day prices.  In reality, even those data are noisy.

Yesterday’s (7/16/2009) closing price for SPY is one example.  The NYSE reported a composite closing price of 93.11 for SPY, but the intraday price graph makes it clear that 94.15 is much more accurate.  Here are two Bloomberg screens showing both the tabulated closing price and the intraday price plot.

spy_hp_20090717

spy_gip2_20090717

Over 200 million shares of SPY trade every business day, yet bad closing prices like this still pop up.
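
A check along these lines could flag a suspicious close automatically.  This is only a sketch: the function name close_looks_wrong, the 0.5% tolerance, and the list of final intraday prints are all assumptions for illustration.  Only the two closing prices, 93.11 and 94.15, come from the example above.

```python
from statistics import median


def close_looks_wrong(reported_close, last_trades, tolerance=0.005):
    """Return True if reported_close differs from the median of last_trades
    by more than tolerance (a fraction, so 0.005 means 0.5%)."""
    if not last_trades:
        raise ValueError("need at least one intraday trade price")
    reference = median(last_trades)
    return abs(reported_close - reference) / reference > tolerance


# Hypothetical final prints of the session, clustered near 94.15.
last_trades = [94.10, 94.12, 94.15, 94.14, 94.15]

print(close_looks_wrong(93.11, last_trades))  # True: about 1.1% below the median
print(close_looks_wrong(94.15, last_trades))  # False: within tolerance
```

The median is used instead of the very last print so that a single bad tick at the bell does not become the reference price.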


First version of RankBuzzard.com

I spent the holiday weekend implementing a website using the Django web framework.  In just a few days, I managed to put together a fully functional website that displays Google Hot Trends data and collects user comments.  The general point might be described as “Why are these searches popular?”  You can see the results at RankBuzzard.com.

I last built a public website roughly 4 years ago, when I wrote imwatching.net (which is no longer up).  At a broad level, the two websites are similar.  Both collect time series data, put it in a database, and present it via HTML.  I built imwatching.net using a collection of Perl scripts that used the CGI module.  I had no ORM.  I had to build my own user management logic.  I had no templating system.  As a result, it took me at least 3 or 4 times as long to develop imwatching.net.  The end result also wasn’t nearly as tidy and well structured as my implementation of RankBuzzard.com.

What’s more, I hosted imwatching.net on a dedicated server that I rented from serverbeach.com for roughly $100/month.  Although I was happy with the quality of the server, the $100/month cost ultimately caused me to close the site.  Now I’m paying $20/month for a virtual server at linode.com.  Although it’s not a direct comparison, I’ve been very happy so far with the service at linode.com.

Hopefully some people will find RankBuzzard interesting, informative, and fun.  I have several ideas for how to improve it, and I plan on adding to it when I find the free time over the next few weeks.


Accelerating Wall Street

Today I attended a conference in New York called Accelerating Wall Street. It was an opportunity for technologists like me to hear how others in the financial industry are handling the ever-growing volume and speed of order and information flow on the electronic markets. The main topics of discussion were:

  • the true meaning of “ultra low latency” today, versus a couple years ago,
  • efforts to utilize multiple cores,
  • exploring complex event processing (CEP), and
  • hardware acceleration.

As I expected, due to Wall Street’s highly competitive and secretive nature, the participants did not share many actual experiences or plans. Instead, most of the conversations were high level and forward looking. A number of intelligent and influential people were there though, so I did leave with some fresh ideas.

First, technology on Wall Street is changing extremely rapidly. You cannot build up a system to implement a trading strategy, and then sit back, light your cigar, and count the money as it rolls in. To succeed you must continually improve your system and look for new opportunities. You must be highly agile so opportunities don’t disappear while you build your solution. For example, two years ago the most latency-sensitive traders were concerned about measuring and saving milliseconds, and now they’re concerned about hundreds of microseconds. Similarly, be cautious of anybody touting results or best practices from even one or two years ago; they are likely outdated and largely irrelevant.

As a result, only a few elite firms can compete in the latency race. To win the race, you must optimize all of your system components and all of their interactions. At the conference, several stories were tossed around about shops that focused on shaving microseconds within a system that included a hop taking tens of milliseconds. In a sense, this realization calls into question the relevance of the conference’s topic.
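
To put rough numbers on that, here is a small back-of-the-envelope calculation.  The stage names and latencies are invented for illustration; only the orders of magnitude (hundreds of microseconds versus tens of milliseconds) come from the stories above.

```python
def total_latency_us(stages):
    """End-to-end latency in microseconds for a dict of stage -> latency."""
    return sum(stages.values())


def relative_gain(stages, stage, saved_us):
    """Fraction of end-to-end latency removed by saving saved_us in one stage."""
    if saved_us > stages[stage]:
        raise ValueError("cannot save more time than the stage takes")
    return saved_us / total_latency_us(stages)


# Hypothetical order path, all values in microseconds.
stages = {
    "strategy logic": 300,
    "order gateway": 200,
    "slow network hop": 20_000,  # 20 milliseconds
}

gain = relative_gain(stages, "strategy logic", 100)
print(f"saving 100 us removes {gain:.2%} of end-to-end latency")
```

With these made-up figures the total is 20,500 microseconds, so a hard-won 100 microsecond saving in the strategy code removes less than half a percent of the end-to-end time.  The slow hop dominates everything else.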

Recently I have researched some of the major complex event processing systems. I have a hard time differentiating the various offerings or identifying the best product. This experience was echoed by others at the conference. Further, all of the participants who spoke directly about their experience using CEP said that they used Esper, which is open source. Another common observation, which I share, was that CEP products need to offer tighter integration with messaging bus products.

Finally, I didn’t hear any lucid comments about multi-core processors or grid computing. Instead of taking the fresh development perspective that’s needed to utilize parallel processing, Wall Street seems to be stuck in the mode of optimizing their legacy techniques and products. For example, some attendees discussed using hardware acceleration for Java components. The general mindset is still, “well, I code it up using threads in Java, and if that’s too slow I either recode it in C++ or accelerate it with special hardware.” The firms that manage to embrace more modern techniques for developing concurrent software systems will be the ultimate winners.

Edit: here is a post from one of the panelists at the conference.
