Wednesday, May 27, 2009

table constraints

For simple data validation like Price > 0, you can implement the check in the application that loads the table, in the application that reads from the table, or as a table constraint.
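A minimal sketch of the third option, using Python's built-in sqlite3 module as a stand-in for whatever database actually holds the table. The table name price_feed, its columns and the constraint name price_positive are made up for illustration, and the constraint syntax may differ slightly on other databases.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE price_feed (          -- hypothetical table name
        instrument TEXT NOT NULL,
        price      REAL NOT NULL,
        CONSTRAINT price_positive CHECK (price > 0)
    )
    """
)

# A valid row is accepted.
conn.execute("INSERT INTO price_feed VALUES ('ABC', 101.5)")

# An invalid row is refused by the database itself, not by application code.
try:
    conn.execute("INSERT INTO price_feed VALUES ('XYZ', -3.0)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected by constraint:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM price_feed").fetchone()[0]
print("rows stored:", row_count)
```

The loader here contains no validation logic at all; the rule lives in the table definition.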

* reusable -- What if another source system (besides LE) needs to load into this table? Table constraints are automatically reused with zero effort.

* testing -- Table constraints are so simple and reliable that we need not test them.

* Table constraints are easier to switch on and off. We can drop or add each constraint individually, without source code migration or regression tests.

* flexible -- we can adjust a table constraint more easily than application code. Drop the old constraint and create a new one. No source code change. No new test required. No regression test either.

* modular -- table constraints are inherently more modular and less coupled than application modules. Each constraint exists on its own and can be removed and adjusted on its own. They don't interfere with each other.

* risk -- Table constraints are more reliable, so there is less risk of bad data affecting our trailers. Validation in the application can fail due to bugs. That's why people measure "test coverage".

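* gate keeper -- Table constraints are more reliable than validations in the application. They are gate keepers -- an enabled constraint is checked on every insert and update, whichever program issues it. One caveat worth verifying is bcp, because some bulk-load tools skip check constraints unless told to enforce them.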

* visible -- Table constraints are more visible and give us confidence that no data could possibly exist in violation of the rule.

* data quality -- You know ... both suffer from data quality problems. They know they should validate more, but validation in the application is non-trivial. The reality is we are short on resources.

* If we keep a particular validation as a table constraint, then we don't have to check it inside the loader AND again in the downstream trailer calculator (less testing too). You mentioned a reusable validation module. This way we don't need one.

* hand-work -- In contingency situations, it's extremely valuable to have the option to issue SQL insert/update/bcp by hand. Table constraints will offer some data validation. In my experience, few input feed tables never need hand work. I believe within 6 months after go-live, we will need hand insert/update on this table.

* Informatica -- Informatica is a huge investment waiting for ROI and we might one day consider using it to load lot data. Table constraints work well with Informatica.

* LOE -- The more validation we implement in the application, the higher the total LOE. That's one reason we have zero constraints in our tables. We are tight on resources.

* As a principle, people usually validate as early as possible, and avoid inserting any invalid data at all. Folks reading our tables (perhaps from another team) may not know "Hey this is a raw input table so not everything is usable." Once we load stuff into a commissions table, people usually think it's usable data. Out of our 100+ tables, do you know which ones can have invalid data?
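To make the reusable, gate keeper and hand-work points concrete, here is a rough sketch in which three different write paths hit the same table and none of them carries its own validation. It uses Python's built-in sqlite3 module purely as a stand-in database; the table, the column names and the "second source system" are invented for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption); names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE price_feed (
        source TEXT NOT NULL,
        price  REAL NOT NULL,
        CONSTRAINT price_positive CHECK (price > 0)
    )
    """
)
conn.execute("INSERT INTO price_feed VALUES ('LE', 10.0)")  # one good row

# Three independent write paths, each attempting to store a bad price.
# None of them contains any validation code of its own.
write_paths = {
    "LE loader": ("INSERT INTO price_feed VALUES ('LE', ?)", (-1.0,)),
    "second source system": ("INSERT INTO price_feed VALUES ('OTHER', ?)", (0.0,)),
    "hand-typed update": ("UPDATE price_feed SET price = ? WHERE source = 'LE'", (-5.0,)),
}

rejected = []
for name, (sql, params) in write_paths.items():
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        # The single constraint on the table stopped this write.
        rejected.append(name)

prices = [row[0] for row in conn.execute("SELECT price FROM price_feed")]
print("rejected:", rejected)
print("prices still in table:", prices)
```

One rule defined once on the table covers the existing loader, a future loader and manual fixes, so a reader of the table does not need to know which program wrote a given row.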

About Me

New York (Time Square), NY, United States
http://www.linkedin.com/in/tanbin