Sunday, September 5, 2010

sort a large joined table by any field

Hi ZR,

Let's start with a different challenge -- say I have a big table holding 10GB of data, and a lot of users. If I look at all the queries from these users, I see they search by every single column! If one day I see a search on a column that's not the leading column of any index, I know we have a real problem -- that search is a full table scan. Even if it happens only once a month, this is unacceptable, because it means one of the valid queries simply won't work.
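To see the "not the leading column" problem on a small scale, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database (my assumption, the real table could live in any RDBMS). The table name big, the columns a, b, c and the index idx_a_b are made up for illustration. It asks the optimizer for its plan for a search on the leading column and for a search on the second column of the same composite index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (a INTEGER, b INTEGER, c TEXT)")
# Composite index: a is the leading column, b is only the second column.
conn.execute("CREATE INDEX idx_a_b ON big (a, b)")


def plan(sql):
    """Return the optimizer's plan steps for a query as a list of strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last field is the readable detail


plan_a = plan("SELECT * FROM big WHERE a = 1")  # search by the leading column
plan_b = plan("SELECT * FROM big WHERE b = 1")  # search by a non-leading column

print(plan_a)
print(plan_b)
```

In SQLite's plan output, a step starting with SEARCH means an index lookup and a step starting with SCAN means reading every row. Other databases word their plans differently, but the same distinction applies.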

My solution -- create an index on every column, but make the table read-only for 23 hours a day and update it only during the remaining 1-hour window. The reason for the window: having so many indices slows down every insert/update/delete, and writers may lock data and slow down queries.
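The "one index per column" step can be sketched as a loop over the table's column list. Again this uses sqlite3 as a stand-in, and the table big with its four columns is hypothetical. The 23-hour read-only window is an operational schedule and is not part of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (id INTEGER, name TEXT, age INTEGER, city TEXT)")

# PRAGMA table_info returns one row per column; field 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(big)")]

# One single-column index per column, so every column leads some index.
for col in columns:
    conn.execute(f"CREATE INDEX idx_big_{col} ON big ({col})")

# PRAGMA index_list returns one row per index; field 1 is the index name.
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list(big)"))
print(index_names)
```

Every one of these indices has to be maintained on each write, which is exactly why the writes are pushed into a short window.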

Assumption -- if there are enough indices, then search will be fast enough. This is not true if an index has a very fat key, so that each index page holds only a handful of key values and the index tree becomes very deep. Also, some searches can't run fast whatever the index -- search by gender, search where age > 0, search for non-null names... These match such a large share of the rows that the optimizer will be smart enough to ignore the indices, since a full table scan (FTS) is the best query plan. I think these cases are rare and obvious, so the assumption is probably safe.
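The fat-key point is just arithmetic on how many keys fit in one index page. A rough sketch, assuming a B-tree where every level has the same fan-out (real indices are messier), with a made-up row count of 10 million and made-up fan-outs of 200 and 2 keys per page:

```python
def index_levels(n_keys, keys_per_page):
    """Estimate how many levels a B-tree index needs for n_keys entries."""
    if keys_per_page < 2:
        raise ValueError("need at least 2 keys per page for the tree to narrow")
    levels = 1
    pages = -(-n_keys // keys_per_page)  # ceiling division: leaf-level pages
    while pages > 1:
        pages = -(-pages // keys_per_page)  # pages on the next level up
        levels += 1
    return levels


thin_key_levels = index_levels(10_000_000, 200)  # slim key, many keys per page
fat_key_levels = index_levels(10_000_000, 2)     # fat key, two keys per page

print(thin_key_levels, fat_key_levels)
```

Each extra level is roughly one more page read per lookup, so the fat-key index pays several times the I/O of the slim one for the same search.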

Now let's modify the challenge -- a big "worktable" produced by a join, not a big physical table. Say table A has columns a1, a2, a3 ..., table B has b1, b2, b3 ..., and they are joined.

My solution -- create indices on all search columns (in our case all columns). As an experiment, keep the physical tables writable all day. Since table A is now much narrower than the big table earlier, writes to it may be faster. Writes to tables A and B can also execute independently, so a writer on one table does not block a writer on the other. This is similar to the segment-based Java ConcurrentHashMap design, where each segment has its own lock.

From my experience, if table A has 10 columns, it's ok to create that many indices. If a big table has 200 columns, I am not sure.

Now, your question of sorting a large joined table by any field. I believe the last solution may work. Every sort column is covered by an index, so the optimizer can pick the table that owns the sort column as the driver (outer) table and read it through the index on the sort column. All rows then come out in the right sequence without further sorting.
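Here is a small experiment along those lines, again with sqlite3 as a stand-in. The join column b_id, the index names and the generated data are all hypothetical. It sorts the joined rows by b2 and reports whether the plan contains a separate sort step. Whether the optimizer really drives from the index on the sort column is its own decision, so the sketch prints the plan rather than assuming the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a1 INTEGER PRIMARY KEY, a2 INTEGER, b_id INTEGER);
    CREATE TABLE b (b1 INTEGER PRIMARY KEY, b2 INTEGER, b3 TEXT);
    CREATE INDEX idx_a_a2   ON a (a2);
    CREATE INDEX idx_a_b_id ON a (b_id);
    CREATE INDEX idx_b_b2   ON b (b2);
    CREATE INDEX idx_b_b3   ON b (b3);
""")
conn.executemany("INSERT INTO b VALUES (?, ?, ?)",
                 [(i, (i * 7) % 50, f"b{i}") for i in range(50)])
conn.executemany("INSERT INTO a VALUES (?, ?, ?)",
                 [(i, (i * 13) % 200, i % 50) for i in range(200)])

# Sort the joined rows by a column that belongs to table b.
query = "SELECT a.a1, b.b2 FROM a JOIN b ON a.b_id = b.b1 ORDER BY b.b2"

plan_steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
# SQLite reports an explicit sort as a "USE TEMP B-TREE FOR ORDER BY" step.
needs_sort = any("TEMP B-TREE" in step for step in plan_steps)

rows = conn.execute(query).fetchall()
sort_keys = [b2 for _a1, b2 in rows]

print(plan_steps)
print("separate sort step:", needs_sort)
```

If the plan shows no separate sort step, the rows are coming straight out of the index in order, which is the behaviour I am hoping for on the real tables.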

http://en.wikipedia.org/wiki/Star_schema


About Me

New York (Time Square), NY, United States
http://www.linkedin.com/in/tanbin