Wednesday, July 25, 2012

simplified order book design doc - jump

It's tempting to use virtual function processMessage() to process various order types (A, M, X ...) and trade types, but virtual functions add runtime overhead. Template specialization is a more efficient design, but due to the limited timeframe I implemented an efficient and simple alternative.

Assumption: M messages can only change quantity. Price change not allowed -- Sender would need to cancel and submit new order. The B/S and price fields of the order should not change, but validation is omitted in this version.

Assumption: T messages and the corresponding M and X messages (also the initiator A message) are assumed consistent and complete. Validation is technically possible but omitted. Validation failure indicates lost messages.

The cornerstone of the design is the data structure of the order book -- a RB-tree of linked lists. Add is O(logN) due to the tree-insert. Modify is O(1) thanks to the lookup array. Remove is O(1) -- eliminating tree search. This is achieved with the lookup array, and by saving iterator into the order object.

There are 2 containers of pointers -- the map of lists and the lookup-array. It would be better to use container of smart pointers to ease memory management, but STL doesn't provide any smart pointer.

All equality test on doubles are done using "==". Should use some kind of tolerance if time permits.

Here's the documentation in the lookup array class

/*This class encapsulates an array of pointers.
 Assumption 1 -- Exchanges is likely to generate auto-incrementing orderID's. Here's my reasoning. OrderID's are unique, as stated in the question. If orderID generation isn't continuous, then the generator has 2 choices about the inevitable gap between 2 adjacent ID numbers. It can keep the gap forever wasted, or somehow "go back" into a gap and pick a number therein as a new orderID. To do the latter it must keep track of what numbers are already assigned -- rather inefficient. There are proven in-memory algorithms to generate auto-increment identity numbers. I assume an exchange would use them. Auto-increment numbers make a good candidate as array index, but what about the total number range?

 Assumption 2 -- each day the number range has an upper limit. Exchange must agree with exchange members on the format of the orderID. It's likely to be 32 bits, 64 bits etc and won't be a million bits.

 Question 1: Can the exchange use OrderID 98761234 on both IBM and MSFT during a trading day? I don't know and i feel it doesn't matter. Here's the reason.

 Case 1: suppose exchange uses an *independent* auto-increment generator for each stock. So IBM and MSFT generators can both generate 98761234. My design would use one array for IBM and one array for MSFT. For basket orders, additional generator instances might be needed.

 Case 2: suppose exchange uses an independent auto-increment generator for each stock, but each stock uses a non-overlap number range. 98761234 will fall into IBM number range. My design would need to know the number range so as to convert orderID to array index and conserve memory.

 Case 3: suppose exchange uses a singleton auto-increment generator across all stocks (bottleneck inside the exchange). My design would use one gigantic array. Given Assumption 1, the numbers would be quasi-continuous rather than sparse -- below 50% of the range is assigned. Suppose the range is S+1, S+2 ... S+N, then my array would be allocated N elements (where S is orderIDBase). There's a limit on N in reality. Every system is designed for a finite N -- no system can handle 10^9999 (that's one followed by ten thousand zeros) orders in a day. Each array element is a pointer. For a 64-bit machine, N elements take 64N bits or 8N bytes. If I have 640GB memory, N can be 80 billion but not higher. To scale out horizontally, we would hope Case 1 or 2 is the case.

 Therefore the answer to Question 1 shows array of pointer is feasible for the purpose of lookup by orderID. In a real system hash table is likely to be time/space efficient. In this exercise, only STL is available, which provides no hash table. Tree based map has logN time complexity -- too slow. My choice is between a built-in array vs a non-expanding vector. I chose array for simplicity.
 */

No comments:

Total Pageviews

my favorite topics (labels)

_fuxi (302) _misLabel (13) _orig? (3) _rm (2) _vague (2) clarified (58) cpp (39) cpp_const (22) cpp_real (76) cpp/java/c# (101) cppBig4 (54) cppSmartPtr (35) cppSTL (33) cppSTL_itr (27) cppSTL_real (26) cppTemplate (28) creditMkt (14) db (65) db_sybase (43) deepUnder (31) dotnet (20) ECN (27) econ/bank` (36) fin/sys_misc (43) finGreek (34) finReal (45) finRisk (30) finTechDesign (46) finTechMisc (32) finVol (66) FixedIncom (28) fMath (7) fMathOption (33) fMathStoch (67) forex (39) gr8IV_Q (46) GTD_skill (15) GUI_event (30) inMemDB (42) intuit_math (41) intuitFinance (57) javaMisc (68) javaServerSide (13) lambda/delegate (22) marketData (28) math (10) mathStat (55) memIssue (8) memMgmt (66) metaProgram` (6) OO_Design (84) original_content (749) polymorphic/vptr (40) productive (21) ptr/ref (48) py (28) reflect (8) script`/unix (82) socket/stream (39) subquery/join (30) subvert (13) swing/wpf (9) sysProgram` (16) thread (164) thread_CAS (15) thread_cpp (28) Thread* (22) timeSaver (80) transactional (23) tune (24) tuneDB (40) tuneLatency (30) z_ajax (9) z_algoDataStruct (41) z_arch (26) z_arch_job (27) z_automateTest (17) z_autoTrad` (19) z_bestPractice (39) z_bold (83) z_bondMath (35) z_book (18) z_boost (19) z_byRef^Val (32) z_c#GUI (43) z_c#misc (80) z_cast/convert (28) z_container (67) z_cStr/arr (39) z_Favorite* (8) z_FIX (15) z_forex (48) z_fwd_Deal (18) z_gz=job (33) z_gzBig20 (13) z_gzMgr (13) z_gzPain (20) z_gzThreat (19) z_hib (19) z_IDE (52) z_ikm (5) z_IR_misc (36) z_IRS (26) z_javaWeb (28) z_jdbc (10) z_jobFinTech (46) z_jobHunt (20) z_jobRealXp (10) z_jobStrength (15) z_jobUS^asia (27) z_letter (42) z_linq (10) z_memberHid` (11) z_MOM (54) z_nestedClass (5) z_oq (24) z_PCP (12) z_pearl (1) z_php (20) z_prodSupport (7) z_py (31) z_quant (14) z_regex (8) z_rv (38) z_skillist (48) z_slic`Problem (6) z_SOA (14) z_spring (25) z_src_code (8) z_swingMisc (50) z_swingTable (26) z_unpublish (2) z_VBA/Excel (8) z_windoz (17) z_wpfCommand (9)

About Me

New York (Time Square), NY, United States
http://www.linkedin.com/in/tanbin