Trading Framework Part I: Tools I Use

I received a question from a reader regarding the software I use...more specifically...the open source software I use in trading. Instead of replying directly, I figured the answer might be useful to other readers of this blog.

My basic trading framework is the following:
Operating System: Windows Vista Home Premium
Programming Languages: Python 2.6.2 & R 2.9.1
Databases: SQLite 2.4.1, Numpy 1.3.0, & CSV
Programming Editor: SciTE 1.78
Graphing Engines: Matplotlib 0.98.5 & R
GUI: HTML & JavaScript
Scheduler: Windows Task Scheduler
Shells: Command.com (DOS) & Cygwin (Bash)
Historical Quotes: CSI & Yahoo Finance

Operating System
Choosing Windows as the operating system is mainly out of convenience. As you can see above, the only real item that would prevent a full move to Linux is the historical quote provider, CSI. Everything else can run on another platform or a suitable alternative is available.

Another reason I've stayed with Windows is my current job (a Windows shop). But, I will admit, I have been very close to switching to a Mac the past few months or possibly OpenSUSE. Just haven't taken the bite yet.

On a side note, prior to my current employer...I worked for a University that was really ahead of its time. Every program we developed had to pass a compatibility test, "Could it easily run on another platform?" While this at times was an impossible task due to user requirements...we still always coded with this compatibility in mind. And I've kept this same philosophy in developing the trading simulation engine.

Programming Languages
I'm originally a Cobol programmer. Yes, that's right...if you've never heard of one...now you're reading a blog by one. Cobol programmers, the good ones, are very keen on whitespace. When you're throwing a lot of code around...the whitespace is what keeps you sane. And so, when I was trying out the various scripting languages back in the day...Python really struck my fancy. I spent the better part of 9 years trying to force programmers to keep the code pretty in Cobol. Only to see Python come around and truly force programmers to code clean. Over the years, I have worked in various other languages, but I've always stuck with Python.

I think another reason I chose Python was due to WealthLab's Scripting language (Pascal-based). I felt I could build an environment similar to WealthLab that would offer the same scripting ease. So far, Python has done a great job in keeping the framework simple and extensible.

Another language I have used from time to time in my trading is R. I use R mainly to analyze trading results. A few years ago, I actually developed a prototype of the trading simulation engine in R. But, it was too slow. The loops killed it. With the recent development of Revolution Computing's ParallelR...I've often wondered what the results would now be. But, I'm past the point of no return with the engine in Python. Still, as far as fast analysis of CSV files...it is really hard to beat R.

Databases
I struggled several years with how to store and retrieve the historical price series data for the trading simulation engine. The main problem was the data could not fit into memory yet access had to be extremely fast. So, for years I used plain CSV files to store the data.

Basically, I read the CSV files from CSI and wrote out new price CSV files with my fixes for possible bad data along with additional calculated fields. At first I stored the data in 1 big CSV file, then used either the DOS sort or Bash sort command to sort the file by date. I was afraid I would run into file size limits (at the time I was on Windows XP 32-bit). So, I started writing the data out to thousands of files broken down by date. Basically, each file was a date containing all the prices for that date. Worked really well...except analysis on the backend became difficult. Plus, it felt kludgy.
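Here is a minimal sketch of that one-file-per-date layout, written for Python 3. The (date, symbol, close) column layout, the file naming, and the helper names split_by_date and read_date are assumptions for illustration, not the engine's actual code, and the prices are made-up sample values. It also buffers everything in memory for brevity, which the real data set would not allow.

```python
import csv
import os
import tempfile
from collections import defaultdict


def split_by_date(rows, out_dir):
    """Write one CSV per trading date; each file holds every symbol's row for that date.

    `rows` is an iterable of (date, symbol, close) tuples (an assumed layout).
    Returns the sorted list of dates written.
    """
    by_date = defaultdict(list)
    for date, symbol, close in rows:
        by_date[date].append((symbol, close))

    for date, prices in by_date.items():
        path = os.path.join(out_dir, date + ".csv")
        with open(path, "w", newline="") as handle:
            csv.writer(handle).writerows(sorted(prices))
    return sorted(by_date)


def read_date(out_dir, date):
    """Load all prices for a single date as a list of [symbol, close] strings."""
    with open(os.path.join(out_dir, date + ".csv"), newline="") as handle:
        return list(csv.reader(handle))


# Made-up sample rows, only to exercise the functions.
sample = [
    ("2009-08-03", "MSFT", 10.0),
    ("2009-08-03", "AAPL", 20.0),
    ("2009-08-04", "AAPL", 21.0),
]
out_dir = tempfile.mkdtemp()
dates = split_by_date(sample, out_dir)
```

Stepping through a simulation day by day then only needs read_date for the current date, which is probably why this layout was fast even though it made backend analysis awkward.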

I had always tried to use regular databases for the pricing backend...but they couldn't handle the storage and retrieval rates I required. Just too slow. And yes, I tried all of them: MySQL, PostgreSQL, Firebird, Berkeley DB, SQLite, etc.

It wasn't until I read an article by Bret Taylor covering how FriendFeed uses MySQL that I had an idea as to how to use a database to get the best of both worlds - fast storage & retrieval along with slick and easy access to the data. That's when I went back to SQLite and began a massive hacking of code while on a Texas Hill Country vacation. Really bumped the trading simulation engine to another level. The trick to fast storage & retrieval? Use fewer but bigger rows.
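The post doesn't spell out the schema, so the following is only one possible reading of "bigger rows": keep a whole price series for a symbol in a single row as a compressed blob, instead of one row per symbol per day. The table name, the JSON-plus-zlib encoding, and the function names are all assumptions made for this sketch.

```python
import json
import sqlite3
import zlib


def open_store(path=":memory:"):
    """Open (or create) a store with one row per symbol."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS series (symbol TEXT PRIMARY KEY, data BLOB)"
    )
    return conn


def save_series(conn, symbol, bars):
    """Serialize the whole series and write it as a single row."""
    blob = zlib.compress(json.dumps(bars).encode("utf-8"))
    conn.execute(
        "INSERT OR REPLACE INTO series (symbol, data) VALUES (?, ?)",
        (symbol, sqlite3.Binary(blob)),
    )
    conn.commit()


def load_series(conn, symbol):
    """Read one row back and decode it into a list of [date, close] pairs."""
    row = conn.execute(
        "SELECT data FROM series WHERE symbol = ?", (symbol,)
    ).fetchone()
    if row is None:
        return []
    return json.loads(zlib.decompress(row[0]).decode("utf-8"))


conn = open_store()
# Made-up sample values.
save_series(conn, "XYZ", [["2009-08-03", 10.0], ["2009-08-04", 10.5]])
```

With this shape, loading a symbol's full history is one indexed lookup and one decode rather than thousands of row fetches. The trade-off is that SQL can no longer filter inside a series.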

For a memory database? I use numpy. It's a fantastic in-memory multi-dimensional storage tool. I dump the price series from SQLite to numpy to enable row or column-wise retrieval. Only recently have I found the performance hit is a little too much. So, I've removed numpy from one side of the framework. And contemplating removing it from the other side as well. It takes more work to replicate numpy via a dictionary of dictionaries of lists. But, surprisingly, it is worth the effort when dealing with price series. Which means, I may not use numpy in the engine for long. Still a great tool to use for in-memory storage.
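As a rough sketch of what a "dictionary of dictionaries of lists" could look like for price series: here the outer key is the symbol, the inner key is the date, and the list holds the bar's fields. That key order, the field order, and the helper names are assumptions for illustration; the engine's real layout isn't described.

```python
from collections import defaultdict

# Hypothetical field order inside each innermost list.
OPEN, HIGH, LOW, CLOSE = range(4)

# prices[symbol][date] -> [open, high, low, close]
prices = defaultdict(dict)


def add_bar(symbol, date, bar):
    """Store one bar; the list is copied so callers can't mutate it by accident."""
    prices[symbol][date] = list(bar)


def row(symbol, date):
    """Row-wise access: one bar for one symbol on one date."""
    return prices[symbol][date]


def column(symbol, field):
    """Column-wise access: one field across all dates, in date order."""
    series = prices[symbol]
    return [series[date][field] for date in sorted(series)]


# Made-up sample values, inserted out of order on purpose.
add_bar("XYZ", "2009-08-04", [10.5, 11.0, 10.0, 10.8])
add_bar("XYZ", "2009-08-03", [10.0, 10.6, 9.9, 10.5])
```

Row access is two dictionary lookups. Column access has to walk the dates, which is the extra work numpy slicing would otherwise do for you.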

Graphing Engines and GUI
I really try to keep it simple in the front-end of the trading framework. I use Matplotlib to visualize price or trading results. And HTML along with JavaScript to display trading statistics. Honestly, not a lot has gone into this side of things. Still very raw. My goal for 2010 is to work more in this area.

I have used R quite a bit in analyzing the output of the trading backtests. R is really powerful here. Quickly and easily chart and/or view pretty much any subset of the data you wish.

If there are certain items I look at over and over in the backtests...I'll typically replicate them in Python & Matplotlib and include them in the backtest results.

Editor, Scheduler, and Shells
SciTE is hands down my favorite Python editor. I don't like the fancy IDE type stuff. SciTE keeps it simple.

Windows Task Scheduler is for the birds. I should know...my main job is centered around Enterprise Scheduling. But, the Windows Task Scheduler gets the job done most of the time. I just have to code around a lot of the times it misses or doesn't get things quite right. Which is okay...that's life. That's one of the main reasons I have thought about switching to a *nix box for cron and the like.

The DOS shell or Bash shell...I don't get too fancy in either. I do use the Bash shell quite a bit in performing global changes in the Python code. Or back when the database was CSV based. Again, *nix boxes win here. But, we Windows developers hopefully can always get a copy of Cygwin to save the day.

Historical Quotes
I have used CSIdata for many years. Mainly for the following reasons:
  • Dividend-adjusted quotes which are essential if analyzing long-term trading systems.
  • Adjusted closing price - needed if you wish to test the exclusion of data based on the actual price traded - not the split-adjusted price.
  • CSV files - CSI does a great job of building and maintaining CSV files of price history.
  • Delisted data - I thought this would be a bigger deal but didn't really impact test results...but still nice to have for confirmation.
  • Data is used by several hedge funds and web sites such as Yahoo Finance.
The only drawback I have with CSI is the daily limit on the number of stocks you can export out of the product. It can get frustrating trying to work around the limit. Of course, you can always pony up for a higher limit.

This covers Part I of the series. Next up? The type of analysis I perform with the trading framework.

Later Trades,

MT


What I'm Researching...


Jim Barry's Rexx Tutor Part2

Posted: 13 Nov 2008 01:00 PM CST

great summaries on the classic rexx functions.

Project Aardvark

Posted: 13 Nov 2008 12:53 PM CST

Joel on Software's Real World. A must see!

Reading List: Fog Creek Software Management Training Program - Joel on Software

Posted: 13 Nov 2008 12:50 PM CST

great reading list!

In Python how do I sort a list of dictionaries by values of the dictionary? - Stack Overflow

Posted: 09 Nov 2008 09:29 PM CST

nice, efficient way to sort a list of python dictionaries by one of their values.
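For reference, a small sketch of the usual idiom: sorted() with operator.itemgetter as the key. The positions list and its keys are made up for illustration.

```python
from operator import itemgetter

# Hypothetical list of position records.
positions = [
    {"symbol": "BBB", "shares": 50},
    {"symbol": "AAA", "shares": 200},
    {"symbol": "CCC", "shares": 100},
]

# Sort by one dictionary value; the original list is left untouched.
by_symbol = sorted(positions, key=itemgetter("symbol"))
by_shares_desc = sorted(positions, key=itemgetter("shares"), reverse=True)
```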

AT&T Labs Research - Yoix / YWAIT

Posted: 07 Nov 2008 07:36 AM CST

Interesting way to build a web application. Wonder how complex this would be to use versus traditional web-based systems (LAMP)? This may be easier to deploy if the goal of the software is simulation/visualizations. Something to toy with.

AT&T Labs Research - Yoix / Byzgraf

Posted: 07 Nov 2008 07:33 AM CST

Another great looking toolset using Yoix that enables plotting functions: line, bar, histograms, etc.

AT&T Labs Research - Yoix / YDAT

Posted: 07 Nov 2008 07:32 AM CST

Extremely cool visualization toolset from AT&T Labs Research. Handles graphviz files.


What I'm Researching...


Overview of RAMFS and TMPFS on Linux

Posted: 06 Nov 2008 11:02 PM CST

Map your memory as a drive? Wonder how this would work if you built a linux server with 32gb memory and mapped at least half that dedicated for simulations? How much faster would this be versus traditional disk-based sims?

Replacing multiple occurrences in nested arrays - Stack Overflow

Posted: 06 Nov 2008 10:58 PM CST

will this work in updating a dictionary of prices? if you have a dictionary of portfolio positions with values being python lists...would this be a good solution in updating the closing price of the stock (one of the items in the list)?
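For the simple case asked about here, a direct in-place update may be all that's needed. This is a sketch under an assumed layout (symbol mapped to a list of [shares, entry price, last close]); the slot names and the update_closes helper are hypothetical.

```python
# Hypothetical layout: positions[symbol] -> [shares, entry_price, last_close]
SHARES, ENTRY, LAST_CLOSE = range(3)

positions = {
    "AAA": [100, 10.0, 10.0],
    "BBB": [50, 20.0, 20.0],
}


def update_closes(positions, closes):
    """Overwrite the last-close slot in place for every held symbol that has a new quote."""
    for symbol, close in closes.items():
        if symbol in positions:
            positions[symbol][LAST_CLOSE] = close


# "ZZZ" is not held, so its quote is ignored.
update_closes(positions, {"AAA": 10.5, "ZZZ": 1.0})
```

Because lists are mutable, nothing is copied or rebuilt. Only the one slot changes.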


What I'm Researching...


RocketDock - About RocketDock

Posted: 20 Oct 2008 12:17 AM CDT

extremely cool application dock for windows.

Python Programming/Lists - Wikibooks, collection of open-content textbooks

Posted: 20 Oct 2008 12:12 AM CDT

Great collection of python list examples.

Introduction To New-Style Classes in Python

Posted: 19 Oct 2008 01:18 AM CDT

great explanation of python classes. check out the final part discussing the __slots__ feature. basically, reserve attributes...those not defined cannot be assigned.
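A quick sketch of the __slots__ behavior mentioned above. The Position class and its attributes are made up for illustration.

```python
class Position(object):
    # Only these attribute names may be assigned on instances.
    __slots__ = ("symbol", "shares")

    def __init__(self, symbol, shares):
        self.symbol = symbol
        self.shares = shares


pos = Position("XYZ", 100)
pos.shares = 150  # fine: "shares" is declared in __slots__

try:
    pos.price = 10.0  # "price" is not declared, so this assignment fails
    rejected = False
except AttributeError:
    rejected = True
```

One caveat: a subclass that doesn't define its own __slots__ gets a regular __dict__ again, so the restriction only holds where every class in the hierarchy declares slots.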

PyTables User's Guide

Posted: 18 Oct 2008 12:30 PM CDT

html version of the pytables userguide.

rdoc:graphics:barplot [R Wiki]

Posted: 17 Oct 2008 04:22 PM CDT

R doc for barplot

Welcome to DrQueue Commercial Website

Posted: 12 Oct 2008 11:44 PM CDT

queue manager with python binding. looks to be used as a render manager...but could see other uses as well.

Building home linux render cluster

Posted: 12 Oct 2008 11:30 PM CDT

excellent article on building a cheap 24 core x 48GB ram linux cluster.


What I'm Researching...


Linus' blog: .. so I got one of the new Intel SSD's

Posted: 07 Oct 2008 10:02 PM CDT

great analysis on evaluating SSD hard drives. read the comments for more info. as an aside...linus has a blog...cool.

pymc - Google Code

Posted: 07 Oct 2008 12:45 PM CDT

monte carlo in python? looks worth exploring further.


What I'm Researching...


The Sect of Homokaasu - The Rasterbator

Posted: 07 Oct 2008 01:45 AM CDT

Cool, print huge posters from normal paper - software breaks up images to fit on 8.5 x 11 paper. Hat-tip to my wife for finding this site.

PerTrac Support - Statistics

Posted: 06 Oct 2008 12:43 PM CDT

Great site covering formulas of investment stats. Useful for coding the performance part of the testing platform.

pickle(cPickle) vs numpy tofile/fromfile - Python - Snipplr

Posted: 05 Oct 2008 11:09 PM CDT

interesting code snippet comparing performance of cpickle and numpy to/from file routines. been thinking about this lately...using numpy directly or cpickle instead of using a bloated dbms for persistent storage of time series on the testing platform.

HintsForSQLUsers - Hierarchical Datasets in Python

Posted: 05 Oct 2008 11:06 PM CDT

covers many of the faq of SQL developers when developing with PyTables.

EasyvizDocumentation - scitools - Google Code - Easyviz Documentation

Posted: 05 Oct 2008 09:55 PM CDT

Python plotting interface to various backend plotting engines: Gnuplot, Matplotlib, Grace, Veusz, PyX, VTK, VisIt, OpenDX, and a few more. Seems like a fairly straight-forward interface. And choosing the backend used is a one-line import statement. Interesting.

PyX - Python graphics package

Posted: 05 Oct 2008 12:25 PM CDT

looks like a dead-simple plotting library in python to produce pub quality pdf/ps images. Need to explore.


What I'm Researching...


TinyMCE - Home

Posted: 05 Oct 2008 12:12 AM CDT

WYSIWYG JavaScript editor - haven't tried it...but may be worth testing on a new project of mine.

PyTables - Hierarchical Datasets in Python

Posted: 04 Oct 2008 01:35 PM CDT

the original python interface to the HDF5 library. Have tested this before...need to test again using new architecture. Original tests found speeds that were equivalent to SQLite but of course slower than CSV files.

Python bindings for the HDF5 library — h5py v0.3.1 documentation

Posted: 04 Oct 2008 01:33 PM CDT

a python interface to the excellent HDF5 library. worth testing in project.

Dive into Erlang

Posted: 04 Oct 2008 12:24 PM CDT

enjoyed reading this guy's take on Erlang. Of course, he had me with quoting Unix philosophy, "Do one thing and do it well."

Optimal RAID setup for SQL server - Stack Overflow

Posted: 04 Oct 2008 10:35 AM CDT

Excellent Q&A on choosing the optimal RAID config for disk i/o performance. By the by, stackoverflow is an awesome site for programmers!!!


Recent Links for 09/21/2007

Newbie - converting csv files to arrays in NumPy
Great message thread on how to convert csv files to numpy arrays.
Cookbook/InputOutput - Numpy and Scipy
File processing examples using numpy, scipy, and matplotlib. How to read/write a numpy array from/to ascii/binary files.
Numpy Example List
Examples of Numpy functions such as fromfile(), hsplit(), recarray(), shuffle(), sort(), split(), sqrt(), std(), tofile(), unique(), var(), vsplit(), where(), zeros(), empty(), and many more.
Introducing Plists: An Erlang module for doing list operations in parallel
Could you spawn a trading system process for each stock of a given day's trading (a list)? What if you had 20,000 stocks for a given day? Can plists/erlang handle 20,000 processes without hitting memory constraints?


Recent Links for 09/18/2007


Chapter 22. Struct and Array Modules
Overview of the python struct and array modules

Building Skills in Programming

Nice python tutorial.
Python Grimoire
Nice python cookbook.


Recent Links for 09/17/2007


Recent Links for 09/15/2007

Links for 2007-09-15 [del.icio.us]

Posted: 16 Sep 2007 12:00 AM CDT


Recent Links 09/05/2007

Speed up R, Python, and MATLAB - Going Parallel


Recent Links 09/04/2007

World Beta - Engineering Targeted Returns and Risk: More On The Endowment Style Of Investing

    • World Beta shares some links covering the endowment investing side of things...
      • A link to Frontier Capital Management - check out their knowledge section for more great papers similar to the ones Faber links to.
      • Faber mentions a great upcoming book covering the twelve top endowment CIOs.
      • From Alpha Magazine...Highbridge Capital Management shares its office organization - putting traders and developers together.  I've always thought this would be a great idea in any shop.  By putting users and developers together - manual tasks can be seen and automation can happen.

     - post by taylortree

SourceForge.net: tkdiff

  • Great little file compare utility.  Graphic front end to the diff program.
    note:  tested this today against a large file/program (well, not that large in my line of work...but I guess to Google's)...couldn't handle it.  But, works great on small files.
     - post by taylortree

Google Mondrian: web-based code review and storage

  • Online code review that works like a blog/wiki.  I wonder...is it possible to create a code review system similar to Mondrian within a source management toolset such as subversion?  Seems like most of the backend is there already...would only need to add some front end tools to display the changes being committed and allow comments on those changes.
     - post by taylortree


Recent Links 09/03/2007

ONLamp.com -- Numerical Python Basics

Programming in R

Finding Duplicate Elements in an Array :: Phil! Gregory

Now, suppose that the array is of length n and only contains positive integers less than n. We can be sure (by the pigeonhole principle) that there is at least one duplicate.

So, how do we find the beginning of the cycle? The easiest approach is to use Floyd's cycle-finding algorithm. It works roughly like this: Start at the beginning of the sequence. Keep track of two values (call them a_i and a_j). At each step of the algorithm, move a_i one step along the sequence, but move a_j two steps. Stop when a_i = a_j.
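A minimal Python sketch of the idea in the excerpt, assuming 0-based indexing: treat the array as a function from index i to a[i] and start the walk at index 0, which no value can point back to because every value is at least 1. The first loop is the tortoise-and-hare step quoted above. The second loop is the standard follow-up for locating the start of the cycle, which is the duplicated value; it is not part of the excerpt. The function name find_duplicate is my own.

```python
def find_duplicate(a):
    """Return a value that appears more than once in `a`.

    Assumes len(a) == n >= 2 and every element is an integer in 1..n-1.
    """
    # Phase 1: move one pointer a single step and the other two steps
    # until they land on the same position inside the cycle.
    slow = a[0]
    fast = a[a[0]]
    while slow != fast:
        slow = a[slow]
        fast = a[a[fast]]

    # Phase 2: restart one pointer from the beginning and advance both one
    # step at a time; they meet at the cycle's entry. Two different indices
    # lead into that entry, so it is the duplicated value.
    slow = 0
    while slow != fast:
        slow = a[slow]
        fast = a[fast]
    return slow


dup = find_duplicate([1, 3, 4, 2, 2])  # -> 2
```

This runs in linear time with constant extra memory and never modifies the array.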


      Author

      • Mike Taylor
        mike@taylortree.com
        I write about trading systems development, portfolio management, and systems research.

      Wise Words

      "True observation begins when one is devoid of set patterns."

      - Bruce Lee
