Tuesday, January 17, 2012

columnar DB + time series in-memory DB

(Focus of this write-up is in-memory time series DB, with some comments on columnar DB.)

Time series data have timestamps at fixed intervals. ("Usually fixed" is the correct wording in general purpose definitions of Time Series.) Strictly fixed intervals make system design much cleaner and more efficient.

When a new snapshot (with 9 fields for example) arrives, inserting a single row to disk is efficient for row-oriented. Columnar DB would be slow for such a disk insert, but in-memory is fine.

For bulk insert/updates however, columnar wins. (Exchange data typically come in bursts.)

My idea -- A simplistic and idealistic implementation could use 9 pre-allocated half-empty arrays for that many columns of the table. New snapshot is broken down to the 9 constituents, each inserted to one of the 9 arrays. I'd say each array can (and therefore should) be fixed-width since each element of the "price" array occupies exactly 4 bytes. Such an array is random-access if in-memory. Extreme simplicity begets extreme efficiency. Reconstructing a full snapshot record isn't that hard given the position in the array -- random-access by position hitting all 9 arrays (concurrently?). I feel it's no slower than (the conventional practice of) storing the entire row record in contiguous memory.

(SecDB includes a regular data store + a time-series data store but not sure of its implementation...)
----------
Columnar DB has many applications besides time-series, such as OLAP/business-intelligence, warehouse. Major implementations include Sybase-IQ, MS-SQL, BigTable, Cassandra, HBase.

Benefit #1 -- aggregation -- fast aggregate operations (max,avg,count..) over a single column. You need not read all the other irrelevant columns.

Benefit -- write -- bulk inserts and updates, perhaps concurrently on the 9 arrays.

Benefit -- index -- easier and faster. "columnar structure for the database simplifies indexing and joins, therefore dramatically speeds search performance"

No comments:

Total Pageviews

my favorite topics (labels)

_fuxi (302) _misLabel (13) _orig? (3) _rm (2) _vague (2) clarified (58) cpp (39) cpp_const (22) cpp_real (76) cpp/java/c# (101) cppBig4 (54) cppSmartPtr (35) cppSTL (33) cppSTL_itr (27) cppSTL_real (26) cppTemplate (28) creditMkt (14) db (65) db_sybase (43) deepUnder (31) dotnet (20) ECN (27) econ/bank` (36) fin/sys_misc (43) finGreek (34) finReal (45) finRisk (30) finTechDesign (46) finTechMisc (32) finVol (66) FixedIncom (28) fMath (7) fMathOption (33) fMathStoch (67) forex (39) gr8IV_Q (46) GTD_skill (15) GUI_event (30) inMemDB (42) intuit_math (41) intuitFinance (57) javaMisc (68) javaServerSide (13) lambda/delegate (22) marketData (28) math (10) mathStat (55) memIssue (8) memMgmt (66) metaProgram` (6) OO_Design (84) original_content (749) polymorphic/vptr (40) productive (21) ptr/ref (48) py (28) reflect (8) script`/unix (82) socket/stream (39) subquery/join (30) subvert (13) swing/wpf (9) sysProgram` (16) thread (164) thread_CAS (15) thread_cpp (28) Thread* (22) timeSaver (80) transactional (23) tune (24) tuneDB (40) tuneLatency (30) z_ajax (9) z_algoDataStruct (41) z_arch (26) z_arch_job (27) z_automateTest (17) z_autoTrad` (19) z_bestPractice (39) z_bold (83) z_bondMath (35) z_book (18) z_boost (19) z_byRef^Val (32) z_c#GUI (43) z_c#misc (80) z_cast/convert (28) z_container (67) z_cStr/arr (39) z_Favorite* (8) z_FIX (15) z_forex (48) z_fwd_Deal (18) z_gz=job (33) z_gzBig20 (13) z_gzMgr (13) z_gzPain (20) z_gzThreat (19) z_hib (19) z_IDE (52) z_ikm (5) z_IR_misc (36) z_IRS (26) z_javaWeb (28) z_jdbc (10) z_jobFinTech (46) z_jobHunt (20) z_jobRealXp (10) z_jobStrength (15) z_jobUS^asia (27) z_letter (42) z_linq (10) z_memberHid` (11) z_MOM (54) z_nestedClass (5) z_oq (24) z_PCP (12) z_pearl (1) z_php (20) z_prodSupport (7) z_py (31) z_quant (14) z_regex (8) z_rv (38) z_skillist (48) z_slic`Problem (6) z_SOA (14) z_spring (25) z_src_code (8) z_swingMisc (50) z_swingTable (26) z_unpublish (2) z_VBA/Excel (8) z_windoz (17) z_wpfCommand (9)

About Me

New York (Time Square), NY, United States
http://www.linkedin.com/in/tanbin