TaylorTree: R

Showing posts with label R. Show all posts

Thursday, February 11, 2010

Trading Framework Part I: Tools I Use

I received a question from a reader regarding the software I use...more specifically...the open source software I use in trading. Instead of a direct response, I figured this type of question might be useful to other readers of this blog.

My basic trading framework is the following:

Operating System:	Windows Vista Home Premium
Programming Languages:	Python 2.6.2 & R 2.9.1
Databases:	SQLite 2.4.1, Numpy 1.3.0, & CSV
Programming Editor:	SciTE 1.78
Graphing Engines:	Matplotlib 0.98.5 & R
GUI:	HTML & JavaScript
Scheduler:	Windows Task Scheduler
Shells:	Command.com (DOS) & Cygwin (Bash)
Historical Quotes:	CSI & Yahoo Finance

Operating System
Choosing Windows as the operating system is mainly out of convenience. As you can see above, the only real item that would prevent a full move to Linux is the historical quote provider, CSI. Everything else can run on another platform or a suitable alternative is available.

Another reason I've stayed with Windows is due to my current job (windows shop). But, I will admit, I have been very close to switching to a Mac the past few months or possibly OpenSUSE. Just haven't taken the bite yet.

On a side note, prior to my current employer...I worked for a University that was really ahead of its time. Every program we developed had to pass a compatibility test, "Could it easily run on another platform?" While this at times was an impossible task due to user requirements...we still always coded with this compatibility in mind. And I've kept this same philosophy in developing the trading simulation engine.

Programming Languages
I'm originally a Cobol programmer. Yes, that's right...if you've never heard of one...now you're reading a blog by one. Cobol programmers, the good ones, are very keen on whitespace. When you're throwing a lot of code around...the whitespace is what keeps you sane. And so, when I was trying out the various scripting languages back in the day...Python really struck my fancy. I spent the better part of 9 years trying to force programmers to keep the code pretty in Cobol. Only to see Python come around and truly force programmers to code clean. Over the years, I have worked in various other languages, but I've always stuck with Python.

I think another reason I chose Python was due to WealthLab's Scripting language (Pascal-based). I felt I could build an environment similar to WealthLab that would offer the same scripting ease. So far, Python has done a great job in keeping the framework simple and extensible.

Another language I have used from time to time in my trading is R. I use R mainly to analyze trading results. A few years ago, I actually developed a prototype of the trading simulation engine in R. But, it was too slow. The loops killed it. With the recent development of Revolution Computing's ParallelR...I've often wondered what the results would now be. But, I'm past the point of return with the engine in Python. But, as far as fast analysis of CSV files...it is really hard to beat R.

Databases
I struggled several years with how to store and retrieve the historical price series data for the trading simulation engine. The main problem was the data could not fit into memory yet access had to be extremely fast. So, for years I used plain CSV files to store the data.

Basically, reading the CSV files from CSI and writing out new price CSV files with my fixes from possible bad data along with additional calculated fields. At first I stored the data into 1 big CSV file. Then used either the DOS sort or Bash sort command to sort the file by date. I was afraid I would run into file size limits (at the time I was on Windows XP 32-bit). So, I started writing the data out to thousands of files broken down by date. Basically, each file was a date containing all the prices for that date. Worked really well...except analysis on the backend became difficult. Plus, it felt kludgy.

I had always tried to use regular databases for the pricing backend...but they couldn't handle the storage and retrieval rates I required. Just too slow. And yes, I tried all of them: MySQL, PostGreSQL, Firebird, Berkely DB, SQLite, etc.

It wasn't until I read an article by Bret Taylor covering how FriendFeed uses MySQL that I had an idea as to how to use a database to get the best of both worlds - fast storage & retrieval along with slick and easy access to the data. That's when I went back to SQLite and began a massive hacking of code while on a Texas Hill Country vacation. Really bumped the trading simulation engine to another level. The trick to fast storage & retrieval? Use less but bigger rows.

For a memory database? I use numpy. It's a fantastic in-memory multi-dimensional storage tool. I dump the price series from SQLite to numpy to enable row or column-wise retrieval. Only recently have I found the performance hit is a little too much. So, I've removed numpy from one side of the framework. And contemplating removing it from the other side as well. It takes more work to replicate numpy via a dictionary of dictionaries of lists. But, surprisingly, it is worth the effort when dealing with price series. Which means, I may not use numpy in the engine for long. Still a great tool to use for in-memory storage.

Graphing Engines and GUI

I really try to keep it simple in the front-end of the trading framework. I use Matplotlib to visualize price or trading results. And HTML along with Javascript to display trading statistics. Honestly, not a lot has gone into this side of things. Still very raw. My goal for 2010 is to work more in this area.

I have used R quite a bit in analyzing the output of the trading backtests. R is really powerful here. Quickly and easily chart and/or view pretty much any subset of the data you wish.

If there's certain items I look at over and over in the backtests...I'll typically replicate in Python & Matplotlib and include in the backtest results.

Editor, Schedulers, and Shells.
SciTE is hands down my favorite Python editor. I don't like the fancy IDE type stuff. SciTE keeps it simple.

Windows Task Scheduler is for the birds. I should know...my main job is centered around Enterprise Scheduling. But, the windows task scheduler gets the job done most of the time. I just have to code around a lot of the times it misses or doesn't get things quite right. Which is okay...that's life. That's one of the main reasons I have thought about switching to a nix box for cron and the like.

The DOS shell or Bash shell...I don't get too fancy in either. I do use the Bash shell quite a bit in performing global changes in the python code. Or back when the database was CSV based. Again, nix boxes win here. But, us windows developers hopefully can always get a copy of Cygwin to save the day.

Historical Quotes
I have used CSIdata for many years. Mainly for the following reasons:

Dividend-adjusted quotes which are essential if analyzing long-term trading systems.
Adjusted closing price - needed if you wish to test the exclusion of data based on the actual price traded - not the split-adjusted price.
CSV files - CSI does a great job of building and maintaining CSV files of price history.
Delisted data - I thought this would be a bigger deal but didn't really impact test results...but still nice to have for confirmation.
Data is used by several hedge funds and web sites such as Yahoo Finance.

The only drawback I have to CSI is the daily limit to the number of stocks you can export out of the product. It can get frustrating trying to work around the limit. Of course, you can always pony up for a higher limit.

This covers Part I of the series. Next up? The type of analysis I perform with the trading framework.

Later Trades,

Tuesday, July 21, 2009

R on Stack Overflow...

Funny, I was working through a problem in R today and was seriously wishing R had the same presence as python over at Stack Overflow. Looks like others have this wish as well...and they're doing something about it.

In concert with users online across the country, this session will lead a flashmob to populate Stack Overflow with R language content.

Very cool! Check out R on Stack Overflow. And post those questions!

MT

Monday, October 27, 2008

What I'm Researching...

Linux Server, Linux Hardware

Posted: 26 Oct 2008 02:28 PM CDT

pre-installed linux provider

system76, Inc.

Posted: 26 Oct 2008 02:26 PM CDT

pre-installed linux computers (laptops, desktops, servers).

The R fCalendar package (pdf)

Posted: 26 Oct 2008 11:18 AM CDT

date, time, calendar manipulations in R. Sample functions are diffTimeDate, isWeekday, isWeekend, and the very cool timeNdayOnOrAfter, timeNthNdayInMonth, timeLastNdayInMonth.

How To... Mount Your Computer Screen

Posted: 26 Oct 2008 10:17 AM CDT

details how to wall mount your monitor. very cool.

Javascript style dot notation for dictionary keys unpythonic? - Stack Overflow

Posted: 23 Oct 2008 06:59 AM CDT

great thread on object-style dot notation. instead of stock['id'], this thread shows how to create stock.id.

vizierfx - Google Code

Posted: 22 Oct 2008 12:38 PM CDT

really cool flex library to display graphviz graphs. haven't explored the flex toolset before...but may have to check it out.

z/OS Workload Manager - How it works & How to use it (pdf)

Posted: 21 Oct 2008 12:18 AM CDT

Great summary on Workload Manager (WLM)...including tips for setup and troubleshooting existing setups.

Monday, October 20, 2008

What I'm Researching...

RocketDock - About RocketDock

Posted: 20 Oct 2008 12:17 AM CDT

extremely cool application dock for windows.

Python Programming/Lists - Wikibooks, collection of open-content textbooks

Posted: 20 Oct 2008 12:12 AM CDT

Great collection of python list examples.

Introduction To New-Style Classes in Python

Posted: 19 Oct 2008 01:18 AM CDT

great explanation of python classes. check out the final part discussing the __slots__ feature. basically, reserve attributes...those not defined cannot be assigned.

PyTables User's Guide

Posted: 18 Oct 2008 12:30 PM CDT

html version of the pytables userguide.

rdoc:graphics:barplot [R Wiki]

Posted: 17 Oct 2008 04:22 PM CDT

R doc for barplot

Welcome to DrQueue Commercial Website

Posted: 12 Oct 2008 11:44 PM CDT

queue manager with python binding. looks to be used as a render manager...but could see other uses as well.

Building home linux render cluster

Posted: 12 Oct 2008 11:30 PM CDT

excellent article on building a cheap 24 core x 48GB ram linux cluster.

Thursday, October 09, 2008

What I'm Researching...

Tips For Barplots in R

Posted: 08 Oct 2008 12:27 PM CDT

examples of barplotting in R - color the bars, horizontal axis, stacked bar graph, and side by side graphs.

R Functions and Procedures We Should Know

Posted: 08 Oct 2008 12:26 PM CDT

common functions in R - just a brief command set.

Gmail, Weather, Beauty on your Ubuntu Desktop | Quick Tweaks

Posted: 08 Oct 2008 11:59 AM CDT

very cool desktop for ubuntu.

Tuesday, September 23, 2008

Barplot function in R

Much of my backtesting platform is text driven. Not that I'm opposed to graphs...just felt my time was better spent developing the foundation for the platform before adding bells and whistles. Little did I realize how difficult it is to find a simple graphing engine for the platform. Problem is...I'm old school...couldn't care less about flash graphs. Keep it simple.

Since I'm using python...figured I had to give the matplotlib library a try. It is nice...simple...but something was missing. Couldn't put my finger on it. So, dug around and played with the R language plotting libraries. A bit more my speed...though a bit particular in the settings. Anyway, here's a function I wrote to generate bar charts using R with a replacement for pie charts in mind...


#-----------------------------------------------------------------
# Simple bar chart - use instead of pie chart when possible.
#-----------------------------------------------------------------
barPie <- function(xSeries, chTitle="Your Bar Chart", xLab="X Label",
                     xDesc="%")
{
 xSeries <- sort(xSeries)

 # save off original settings in order to reset on exit
 oldPar <- par(no.readonly=TRUE)

 plot.new()

 # set page margins in inches
 par(mai=c(1,1.5,1,1))


 # pad 30% for labels
 # start plotting at 0.0 unless negative
 if (min(xSeries) < 0.0)
 {
     xLim = c((min(xSeries) * 1.3), (max(xSeries) * 1.3))
 }
 else
 {
     xLim = c(0.00, (max(xSeries) * 1.3))
 }

 # horizontal barplot in color baby!
 bp <- barplot(xSeries, horiz=T,
       xlab=xLab, las=1, col=rainbow(length(xSeries)),
       xlim=xLim,
       axes=F, cex.names=0.7, main=chTitle)

 # if x negative then start label at 0.0
 # otherwise, start label at value of x.
 xVals = ifelse(xSeries < 0.0, 0.0, xSeries)
 text(xVals, bp, paste(xSeries, xDesc, sep=""),pos=4, cex=0.65)

 # format x axis
 xRange <- formatC(pretty(xSeries), 1, format="f")

 axis(1, at=xRange, labels=as.character(xRange), cex.axis=0.75)
 box()

 #restore par value to previous state
 on.exit(par(oldPar))
}

Used data from my portfolio to plot sector allocations and called the function...


sectors <- c(10.64,119.83,162.66,66.48,71.78,35.44,32.77,161.17,53.91,
               101.81,53.38,231.45,31.24,103.01)
sectors <- round((sectors/sum(sectors)*100.00), 1)

# write to png driver
png("c:/taylortrade/rlang/sectors_test.png")

barPie(sectors, "Sector Allocation", "Pct Allocated")

# stop writing to png driver
dev.off()

And here's the result...