Posted by Mike Taylor | Thursday, February 11, 2010
I received a question from a reader regarding the software I use...more specifically...the open source software I use in trading. Instead of replying directly, I figured the answer might be useful to other readers of this blog.
My basic trading framework is the following:
| Component | Software |
|---|---|
| Operating System | Windows Vista Home Premium |
| Programming Languages | Python 2.6.2 & R 2.9.1 |
| Databases | SQLite 2.4.1, NumPy 1.3.0, & CSV |
| Programming Editor | SciTE 1.78 |
| Graphing Engines | Matplotlib 0.98.5 & R |
| Scheduler | Windows Task Scheduler |
| Shells | Command.com (DOS) & Cygwin (Bash) |
| Historical Quotes | CSI & Yahoo Finance |
Choosing Windows as the operating system is mainly out of convenience. As you can see above, the only real item that would prevent a full move to Linux is the historical quote provider, CSI. Everything else can run on another platform or a suitable alternative is available.
Another reason I've stayed with Windows is my current job (a Windows shop). But, I will admit, I have been very close to switching to a Mac the past few months or possibly OpenSUSE. Just haven't taken the bite yet.
On a side note, prior to my current employer...I worked for a university that was really ahead of its time. Every program we developed had to pass a compatibility test: "Could it easily run on another platform?" While this at times was an impossible task due to user requirements...we still always coded with this compatibility in mind. And I've kept this same philosophy in developing the trading simulation engine.
I'm originally a COBOL programmer. Yes, that's right...if you've never heard of one...now you're reading a blog by one. COBOL programmers, the good ones, are very keen on whitespace. When you're throwing a lot of code around...the whitespace is what keeps you sane. And so, when I was trying out the various scripting languages back in the day...Python really struck my fancy. I spent the better part of 9 years trying to force programmers to keep the code pretty in COBOL, only to see Python come around and truly force programmers to code clean. Over the years, I have worked in various other languages, but I've always stuck with Python.
I think another reason I chose Python was WealthLab's scripting language (Pascal-based). I felt I could build an environment similar to WealthLab that would offer the same scripting ease. So far, Python has done a great job in keeping the framework simple and extensible.
Another language I have used from time to time in my trading is R. I use R mainly to analyze trading results. A few years ago, I actually developed a prototype of the trading simulation engine in R. But, it was too slow. The loops killed it. With the recent development of Revolution Computing's ParallelR...I've often wondered what the results would now be. But, I'm past the point of no return with the engine in Python. Still, as far as fast analysis of CSV files goes...it is really hard to beat R.
I struggled several years with how to store and retrieve the historical price series data for the trading simulation engine. The main problem was the data could not fit into memory yet access had to be extremely fast. So, for years I used plain CSV files to store the data.
Basically, I read the CSV files from CSI and wrote out new price CSV files containing my fixes for possibly bad data along with additional calculated fields. At first I stored the data in 1 big CSV file, then used either the DOS sort or Bash sort command to sort the file by date. I was afraid I would run into file size limits (at the time I was on Windows XP 32-bit). So, I started writing the data out to thousands of files broken down by date. Basically, each file was a date containing all the prices for that date. Worked really well...except analysis on the backend became difficult. Plus, it felt kludgy.
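A minimal sketch of the one-file-per-date layout described above. It is not the original code: the function name `split_by_date`, the `date` column name, and the assumption that dates are written in a filename-safe form such as `20100210` are all hypothetical, and it is written for current Python 3 rather than the 2.6 listed earlier. Rows are streamed one at a time, so the big file never has to fit in memory.

```python
import csv
import os


def split_by_date(src_path, out_dir, date_field="date"):
    """Stream one big price CSV into one CSV per date.

    Assumes out_dir starts empty (rows are appended) and that the
    date values are safe to use as file names. Returns the sorted dates.
    """
    os.makedirs(out_dir, exist_ok=True)
    seen = set()
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        for row in reader:
            day = row[date_field]
            path = os.path.join(out_dir, day + ".csv")
            # Append so earlier rows for the same date are kept.
            with open(path, "a", newline="") as out:
                writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                if day not in seen:
                    writer.writeheader()  # first row seen for this date
                    seen.add(day)
                writer.writerow(row)
    return sorted(seen)
```

Reopening a file for every row is slow but keeps the sketch simple and avoids holding thousands of file handles open at once.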
I had always tried to use regular databases for the pricing backend...but they couldn't handle the storage and retrieval rates I required. Just too slow. And yes, I tried all of them: MySQL, PostgreSQL, Firebird, Berkeley DB, SQLite, etc.
It wasn't until I read an article by Bret Taylor covering how FriendFeed uses MySQL that I had an idea as to how to use a database to get the best of both worlds - fast storage & retrieval along with slick and easy access to the data. That's when I went back to SQLite and began a massive hacking of code while on a Texas Hill Country vacation. Really bumped the trading simulation engine to another level. The trick to fast storage & retrieval? Use fewer but bigger rows.
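The post doesn't show the actual table layout, so here is only one plausible reading of "fewer but bigger rows": instead of one row per symbol per date, keep a whole price series in a single row as a compressed blob. The table name `series`, the helper names, and the JSON-plus-zlib encoding are all assumptions made for the sketch (Python 3, standard library only).

```python
import json
import sqlite3
import zlib


def open_store(path=":memory:"):
    """Open (or create) a store with one row per symbol."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS series ("
        "symbol TEXT PRIMARY KEY, body BLOB NOT NULL)"
    )
    return conn


def save_series(conn, symbol, bars):
    """Store the full list of bars for a symbol as one compressed blob."""
    body = zlib.compress(json.dumps(bars).encode("utf-8"))
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO series (symbol, body) VALUES (?, ?)",
            (symbol, body),
        )


def load_series(conn, symbol):
    """Return the list of bars for a symbol, or None if it is not stored."""
    row = conn.execute(
        "SELECT body FROM series WHERE symbol = ?", (symbol,)
    ).fetchone()
    if row is None:
        return None
    return json.loads(zlib.decompress(row[0]).decode("utf-8"))
```

The trade-off is that the database can no longer filter inside a series; one read pulls the whole blob and the slicing happens in Python.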
For a memory database? I use NumPy. It's a fantastic in-memory multi-dimensional storage tool. I dump the price series from SQLite to NumPy to enable row or column-wise retrieval. Only recently have I found the performance hit is a little too much. So, I've removed NumPy from one side of the framework, and I'm contemplating removing it from the other side as well. It takes more work to replicate NumPy via a dictionary of dictionaries of lists. But, surprisingly, it is worth the effort when dealing with price series. Which means, I may not use NumPy in the engine for long. Still a great tool to use for in-memory storage.
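For illustration, a rough sketch of what a "dictionary of dictionaries of lists" could look like for price series: symbol, then field, then a list of values aligned by position. The field names and the helpers `build_store`, `column`, and `row` are hypothetical, not taken from the engine.

```python
def build_store(rows, fields=("open", "high", "low", "close")):
    """Build store[symbol][field] -> list of values.

    rows: iterable of (symbol, date, open, high, low, close) tuples,
    already sorted by date within each symbol.
    """
    store = {}
    for symbol, date, *values in rows:
        if symbol not in store:
            series = {"date": []}
            for field in fields:
                series[field] = []
            store[symbol] = series
        series = store[symbol]
        series["date"].append(date)
        for field, value in zip(fields, values):
            series[field].append(value)
    return store


def column(store, symbol, field):
    """Column-wise retrieval: every value of one field for a symbol."""
    return store[symbol][field]


def row(store, symbol, i):
    """Row-wise retrieval: all fields for a symbol at position i."""
    return {field: values[i] for field, values in store[symbol].items()}
```

Column access is a plain dictionary lookup, while row access has to gather one item from each list, so this layout favors code that walks a single field at a time.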
Editor, Schedulers, and Shells.
SciTE is hands down my favorite Python editor. I don't like the fancy IDE type stuff. SciTE keeps it simple.
Windows Task Scheduler is for the birds. I should know...my main job is centered around Enterprise Scheduling. But, the Windows Task Scheduler gets the job done most of the time. I just have to code around the times it misses a run or doesn't get things quite right. Which is okay...that's life. That's one of the main reasons I have thought about switching to a *nix box for cron and the like.
The DOS shell or Bash shell...I don't get too fancy in either. I do use the Bash shell quite a bit in performing global changes in the Python code. Or did, back when the database was CSV based. Again, *nix boxes win here. But, we Windows developers can hopefully always get a copy of Cygwin to save the day.
I have used CSIdata for many years. Mainly for the following reasons:
- Dividend-adjusted quotes which are essential if analyzing long-term trading systems.
- Adjusted closing price - needed if you wish to test the exclusion of data based on the actual price traded - not the split-adjusted price.
- CSV files - CSI does a great job of building and maintaining CSV files of price history.
- Delisted data - I thought this would be a bigger deal but didn't really impact test results...but still nice to have for confirmation.
- Data is used by several hedge funds and web sites such as Yahoo Finance.
Posted by Mike Taylor | Tuesday, July 21, 2009
Funny, I was working through a problem in R today and was seriously wishing R had the same presence as Python over at Stack Overflow. Looks like others have this wish as well...and they're doing something about it.
In concert with users online across the country, this session will lead a flashmob to populate Stack Overflow with R language content.
Very cool! Check out R on Stack Overflow. And post those questions!
Posted by Mike Taylor | Monday, October 27, 2008
Posted: 26 Oct 2008 02:28 PM CDT
pre-installed linux provider
Posted: 26 Oct 2008 02:26 PM CDT
pre-installed linux computers (laptops, desktops, servers).
Posted: 26 Oct 2008 11:18 AM CDT
date, time, calendar manipulations in R. Sample functions are diffTimeDate, isWeekday, isWeekend, and the very cool timeNdayOnOrAfter, timeNthNdayInMonth, timeLastNdayInMonth.
Posted: 26 Oct 2008 10:17 AM CDT
details how to wall mount your monitor. very cool.
Posted: 23 Oct 2008 06:59 AM CDT
Posted: 22 Oct 2008 12:38 PM CDT
really cool flex library to display graphviz graphs. haven't explored the flex toolset before...but may have to check it out.
Posted: 21 Oct 2008 12:18 AM CDT
Great summary on Workload Manager (WLM)...including tips for setup and troubleshooting existing setups.
Posted by Mike Taylor | Monday, October 20, 2008
Posted: 20 Oct 2008 12:17 AM CDT
extremely cool application dock for windows.
Posted: 20 Oct 2008 12:12 AM CDT
Great collection of python list examples.
Posted: 19 Oct 2008 01:18 AM CDT
great explanation of python classes. check out the final part discussing the __slots__ feature. basically, reserve attributes...those not defined cannot be assigned.
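A small sketch of the `__slots__` behavior mentioned above. The `Quote` class and the `try_set` helper are made up for the example and are not from the linked article.

```python
class Quote(object):
    # Only these attribute names may ever be assigned on an instance.
    __slots__ = ("symbol", "close")

    def __init__(self, symbol, close):
        self.symbol = symbol
        self.close = close


def try_set(obj, name, value):
    """Return True if the attribute was set, False if it was rejected."""
    try:
        setattr(obj, name, value)
        return True
    except AttributeError:
        return False
```

Assigning `close` on a `Quote` works, while assigning an undeclared name such as `volume` raises `AttributeError`, because instances of a slotted class have no per-instance `__dict__`.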
Posted: 18 Oct 2008 12:30 PM CDT
html version of the pytables userguide.
Posted: 17 Oct 2008 04:22 PM CDT
R doc for barplot
Posted: 12 Oct 2008 11:44 PM CDT
queue manager with python binding. looks to be used as a render manager...but could see other uses as well.
Posted: 12 Oct 2008 11:30 PM CDT
excellent article on building a cheap 24 core x 48GB ram linux cluster.
Posted by Mike Taylor | Thursday, October 09, 2008
Posted: 08 Oct 2008 12:27 PM CDT
examples of barplotting in R - color the bars, horizontal axis, stacked bar graph, and side by side graphs.
Posted: 08 Oct 2008 12:26 PM CDT
common functions in R - just a brief command set.
Posted: 08 Oct 2008 11:59 AM CDT
very cool desktop for ubuntu.
Posted by Mike Taylor | Tuesday, September 23, 2008
Since I'm using python...figured I had to give the matplotlib library a try. It is nice...simple...but something was missing. Couldn't put my finger on it. So, dug around and played with the R language plotting libraries. A bit more my speed...though a bit particular in the settings. Anyway, here's a function I wrote to generate bar charts using R with a replacement for pie charts in mind...
# Simple bar chart - use instead of pie chart when possible.
# NOTE: the xDesc default and the margin values below are assumed;
# they are not shown in this copy of the function.
barPie <- function(xSeries, chTitle="Your Bar Chart", xLab="X Label",
                   xDesc="") {
  xSeries <- sort(xSeries)

  # save off original settings in order to reset on exit
  oldPar <- par(no.readonly=TRUE)

  # set page margins in inches (bottom, left, top, right)
  par(mai=c(1.0, 1.5, 0.8, 0.4))

  # pad 30% for labels
  # start plotting at 0.0 unless negative
  if (min(xSeries) < 0.0) {
    xLim <- c((min(xSeries) * 1.3), (max(xSeries) * 1.3))
  } else {
    xLim <- c(0.00, (max(xSeries) * 1.3))
  }

  # horizontal barplot in color baby!
  bp <- barplot(xSeries, horiz=TRUE, xlim=xLim,
                xlab=xLab, las=1, col=rainbow(length(xSeries)),
                axes=FALSE, cex.names=0.7, main=chTitle)

  # if x negative then start label at 0.0
  # otherwise, start label at value of x.
  xVals <- ifelse(xSeries < 0.0, 0.0, xSeries)
  text(xVals, bp, paste(xSeries, xDesc, sep=""), pos=4, cex=0.65)

  # format x axis: numeric tick positions, labels with one decimal place
  xTicks <- pretty(xSeries)
  axis(1, at=xTicks, labels=formatC(xTicks, digits=1, format="f"), cex.axis=0.75)

  # restore par value to previous state
  par(oldPar)
}
Used data from my portfolio to plot sector allocations and called the function...
sectors <- c(10.64,119.83,162.66,66.48,71.78,35.44,32.77,161.17,53.91,
sectors <- round((sectors/sum(sectors)*100.00), 1)
# write to png driver (file name assumed; the original call is not shown)
png("sector_allocation.png")
barPie(sectors, "Sector Allocation", "Pct Allocated")
# stop writing to png driver
dev.off()
And here's the result...