# Latest content was relocated to https://bintanvictor.wordpress.com. This old blog will be shut down soon.

## Monday, June 9, 2014

### linear correlation - minefield

If two "thingies" A and B have a (linear) correlation of 0.3 (rather low), it is easy to misinterpret that number.

* If A/B have a physical non-linear relationship, they are not independent, yet corr can be low. The corr coeff measures the linear relationship only. Stat risk has a lot to say about this.
* If the sample size is small, then the calculated corr may not reflect the true population corr.
** If we take a large enough sample (or many, many large samples), the estimate would converge and the true corr would emerge.
* As [[Prem Mann]] points out, there may not be a causal relationship between A and B even if corr is high.
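To make the first bullet concrete, here is a minimal sketch (my own toy numbers, not taken from any textbook). B is completely determined by A through B = A², yet the Pearson corr comes out at essentially zero because the relationship is symmetric rather than linear. The helper `pearson` and the variable names are just illustrative.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A is spread symmetrically around 0 (assumed toy data).
a = [i / 10 for i in range(-50, 51)]

# B depends entirely on A, but not linearly.
b = [x * x for x in a]
corr_nonlinear = pearson(a, b)  # essentially 0 despite total dependence

# For contrast, an exactly linear relationship gives a corr of 1.
c = [2 * x + 1 for x in a]
corr_linear = pearson(a, c)

print(f"corr(A, A^2)    = {corr_nonlinear:.6f}")
print(f"corr(A, 2A + 1) = {corr_linear:.6f}")
```

So a low corr by itself says nothing about whether A and B are independent.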

The most common confusion in my mind is about context. The 0.3 corr is typically calculated from a sample, but when folks say A and B are weakly correlated, they usually refer to the population. They say things like "when A increases, we are likely to see B increase too". We then automatically assume some causality.

Due to the factors mentioned above, an observed low corr often doesn't mean A and B are independent in the population (a healthy blood pressure reading doesn't prove complete health). However, an observed strong corr is often good evidence of real corr within the population, provided the sample is large enough for the result to be statistically significant.

Here's a classic example. Your friend stands behind you. In each experiment, she either stares at your head (1) or looks away (0). You guess whether she is staring (1) or not (0).

A = the actual 1/0

B = your guess, also 1/0

For a small sample of 10, you may see a strong correlation between A and B, and feel you can sense the stare from behind. In a large sample, corr will probably be close to 0.
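The staring experiment is easy to simulate. The sketch below assumes both the stare and the guess are independent fair coin flips (my assumption, i.e. nobody can really sense anything), repeats the 10-trial experiment 1000 times, and counts how often the sample corr looks "strong" (|corr| ≥ 0.5, an arbitrary threshold I picked). It then runs one very long experiment for comparison. The function names and the seed are just illustrative.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson corr; returns None when one series never varies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None  # corr is undefined for a constant series
    return cov / (sx * sy)

def staring_experiment(n_trials, rng):
    """A = actual stare (1/0), B = guess (1/0), drawn independently."""
    a = [rng.randint(0, 1) for _ in range(n_trials)]
    b = [rng.randint(0, 1) for _ in range(n_trials)]
    return pearson(a, b)

rng = random.Random(2014)  # fixed seed so the run is repeatable

# Many small experiments of 10 trials each; drop the undefined ones.
small = [staring_experiment(10, rng) for _ in range(1000)]
small = [r for r in small if r is not None]
share_strong = sum(abs(r) >= 0.5 for r in small) / len(small)

# One large experiment.
large_corr = staring_experiment(100_000, rng)

print(f"share of 10-trial samples with |corr| >= 0.5: {share_strong:.2%}")
print(f"corr in a 100,000-trial sample: {large_corr:+.4f}")
```

A noticeable share of the small samples shows a strong corr purely by chance, while the large sample lands very near 0.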

Understanding the concept of corr is simpler within natural processes, where we can repeat experiments to infer the population corr. However, in most economic problems the A/B thingies are influenced by human decisions, so the corr is harder to "manage". The population corr may change over time, or vary with gender or nationality. For example, if we take 2 samples, sample 1 may come from a population of mostly young Asian girls, and sample 2 may come from a different population. Then, unknown to us, the first population's corr could differ from the second population's.
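Here is a rough sketch of that last point, under made-up assumptions: population 1 has a true corr of 0.6 and population 2 a true corr of 0, and both A and B are standard normal in each. The two samples give very different corr estimates, and pooling them (as we would if we didn't know they came from different populations) gives a number that describes neither. The `draw_sample` helper and the rho values are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def draw_sample(rho, n, rng):
    """Draw n (A, B) pairs of standard normals with true corr rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = rng.gauss(0, 1)  # independent noise
        xs.append(x)
        ys.append(rho * x + math.sqrt(1 - rho * rho) * z)
    return xs, ys

rng = random.Random(9)  # fixed seed so the run is repeatable

x1, y1 = draw_sample(0.6, 20_000, rng)  # sample 1, population corr 0.6 (assumed)
x2, y2 = draw_sample(0.0, 20_000, rng)  # sample 2, population corr 0 (assumed)

corr1 = pearson(x1, y1)
corr2 = pearson(x2, y2)
corr_pooled = pearson(x1 + x2, y1 + y2)  # lists concatenated, not summed

print(f"sample 1 corr: {corr1:+.3f}")
print(f"sample 2 corr: {corr2:+.3f}")
print(f"pooled corr:   {corr_pooled:+.3f}")
```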