- Top 100 price gainers in your price series database;
- Top 10 most volatile stocks in your price series database;
- Top 5 longest running batch jobs in your operations arena;
- Any many more...
import heapq class MaxItems(object): """ Caches the max X items of whatever you're analyzing and returns a list containing only those max X values and associated items. """ def __init__(self, size): self.size = size self._recs = [] def push(self, value, items): if len(self._recs) < self.size: heapq.heappush(self._recs, (value, items)) else: minitem = heapq.heappushpop(self._recs, (value, items)) def items(self): return heapq.nlargest(self.size, self._recs)Example call and results:
pricegains = [] pricegains.append({'symbol':'GOOG', 'gain':234.0}) pricegains.append({'symbol':'YHOO', 'gain':124.0}) pricegains.append({'symbol':'IBM', 'gain':1242.0}) pricegains.append({'symbol':'GE', 'gain':1800.0}) pricegains.append({'symbol':'ENE', 'gain':0.0}) pricegains.append({'symbol':'KREM', 'gain':12.0}) maxitems = MaxItems(3) for row in pricegains: maxitems.push(row['gain'], row) print maxitems.items() ---------------------------------------------------------- Results of call: (1800.0, {'symbol': 'GE', 'gain': 1800.0}) (1242.0, {'symbol': 'IBM', 'gain': 1242.0}) (234.0, {'symbol': 'GOOG', 'gain': 234.0})The heapq module works nicely in accomplishing the task. What's ironic is Python's heapq module implements the min-heap algorithm which works out nicely and efficiently in determining the maximum values over a list. But, does not work out so efficiently for determining the minimum values over a list.
I'll cover the MinItems class in another post. But, to give you a hint of what does work in collecting the minimum values over a list is one of the alternatives I explored in building the MaxItems class...
Alternative yet Inefficient version of MaxItems:
import bisect class MaxItems2(object): """ Caches the max X items of whatever you're analyzing and returns a list containing only those max X values and associated items. """ def __init__(self, size): self.size = size self._recs = [] def push(self, value, items): if len(self._recs) < self.size: bisect.insort(self._recs, (value, items)) elif bisect.bisect(self._recs, (value, items)) > self.size: bisect.insort(self._recs, (value, items)) minitem = self._recs.pop(0) def items(self): return sorted(self._recs, reverse=True)MaxItems2 takes advantage of the bisect module and while it works great; performance is at a minimum 2x worse on average than using the heapq method.
Test Code:
import random pricegains = [] maxitems = MaxItems(100) for x in xrange(500000): gain = random.uniform(1.0,500.0) maxitems.push(gain, ('This', 'is', 'Record')) rows = maxitems.items()Calling the above code with the wonderful timeit module produced the following results:
- heapq method: Ten iterations finished in 1.90 seconds.
- bisect method: Ten iterations finished in 3.80 seconds.
MT
No comments:
Post a Comment