Thursday, May 6, 2010

200,000,000 orders in cache, to be queried in real time (UBS

Say you have up to 200,000,000 orders/ticks each day. Traders need to query them like "all IBM", or "all IBM orders above $50". Real time orders. How do you design the server side to be scalable? (Now i think you need a tick db like kdb)

First establish theoretical limits. entire population + response time + up-to-date results -- choose any 2. google forgoes one of them.

Q: do users really need to query this entire population, or just a subset?

This looks like a reporting app. Drill-down is a standard solution. Keep order details (100+ fields) in a separate server. Queries don't need these details.

Once we confirm the population to be scanned (say, all 200,000,000), the request rate (say a few traders, once a while) and acceptable response time (say 10 seconds), we can allocate threads (say 200 in a pool). We can divide each query into 50 parallel queries ie 50 tasks in the task queue. On average we can work on 4 requests concurrently. Others queue up in a JMS queue.

DB techniques to borrow:
* partitioning to achieve near-perfect parallel execution
* indexing -- to avoid full scans? How about gemfire indexes?
* You can also use a standard DB like sybase or DB2 and give enough RAM so no disk IO needed. But it's even better if we can fit entire population in a single JVM -- no serialization, no disk, no network latency.

L1, L2 caches? What to put in the caches? Most used items such as index root nodes.

See post on The initiator mailbox MOM model.

No comments:

Total Pageviews

my favorite topics (labels)

_fuxi (302) _misLabel (13) _orig? (3) _rm (2) _vague (2) clarified (58) cpp (39) cpp_const (22) cpp_real (76) cpp/java/c# (101) cppBig4 (54) cppSmartPtr (35) cppSTL (33) cppSTL_itr (27) cppSTL_real (26) cppTemplate (28) creditMkt (14) db (65) db_sybase (43) deepUnder (31) dotnet (20) ECN (27) econ/bank` (36) fin/sys_misc (43) finGreek (34) finReal (45) finRisk (30) finTechDesign (46) finTechMisc (32) finVol (66) FixedIncom (28) fMath (7) fMathOption (33) fMathStoch (67) forex (39) gr8IV_Q (46) GTD_skill (15) GUI_event (30) inMemDB (42) intuit_math (41) intuitFinance (57) javaMisc (68) javaServerSide (13) lambda/delegate (22) marketData (28) math (10) mathStat (55) memIssue (8) memMgmt (66) metaProgram` (6) OO_Design (84) original_content (749) polymorphic/vptr (40) productive (21) ptr/ref (48) py (28) reflect (8) script`/unix (82) socket/stream (39) subquery/join (30) subvert (13) swing/wpf (9) sysProgram` (16) thread (164) thread_CAS (15) thread_cpp (28) Thread* (22) timeSaver (80) transactional (23) tune (24) tuneDB (40) tuneLatency (30) z_ajax (9) z_algoDataStruct (41) z_arch (26) z_arch_job (27) z_automateTest (17) z_autoTrad` (19) z_bestPractice (39) z_bold (83) z_bondMath (35) z_book (18) z_boost (19) z_byRef^Val (32) z_c#GUI (43) z_c#misc (80) z_cast/convert (28) z_container (67) z_cStr/arr (39) z_Favorite* (8) z_FIX (15) z_forex (48) z_fwd_Deal (18) z_gz=job (33) z_gzBig20 (13) z_gzMgr (13) z_gzPain (20) z_gzThreat (19) z_hib (19) z_IDE (52) z_ikm (5) z_IR_misc (36) z_IRS (26) z_javaWeb (28) z_jdbc (10) z_jobFinTech (46) z_jobHunt (20) z_jobRealXp (10) z_jobStrength (15) z_jobUS^asia (27) z_letter (42) z_linq (10) z_memberHid` (11) z_MOM (54) z_nestedClass (5) z_oq (24) z_PCP (12) z_pearl (1) z_php (20) z_prodSupport (7) z_py (31) z_quant (14) z_regex (8) z_rv (38) z_skillist (48) z_slic`Problem (6) z_SOA (14) z_spring (25) z_src_code (8) z_swingMisc (50) z_swingTable (26) z_unpublish (2) z_VBA/Excel (8) z_windoz (17) z_wpfCommand (9)

About Me

New York (Time Square), NY, United States
http://www.linkedin.com/in/tanbin