Lesson 6: Hypothesis Testing & Correlation
What We're Learning
- Hypothesis Testing - How to test whether claims about data are true
- Relationships Between Variables - How two things are connected
- Correlation & Covariance - Measuring the strength of relationships
🎯 PART 1: HYPOTHESIS TESTING
The Big Picture
The Problem: We can't check EVERYONE in a population, so we use a sample. But samples have errors!
The Solution: Design a test that minimizes the chance of making wrong decisions.
🏛️ The Court Analogy (Best Way to Understand!)
Think of hypothesis testing like a court trial: the defendant is presumed innocent, and the verdict changes only if the evidence against that presumption is strong enough.
The Two Hypotheses
| Hypothesis | Symbol | Meaning | Court Example |
|---|---|---|---|
| Null Hypothesis | H₀ | The "default" assumption | Person is innocent |
| Alternative Hypothesis | H₁ | What we're trying to prove | Person is guilty |
Memory Hack: "Innocent Until Proven Guilty"
- H₀ = Status quo (nothing special happening)
- H₁ = The exciting claim (something IS happening)
⚠️ Two Types of Errors
Type I Error (α - Alpha)
What it is: Rejecting H₀ when it's actually TRUE
Memory Hack: "False Alarm" 🚨
- Like sending an innocent person to jail
- We SET this probability ourselves (usually α = 0.05 or 5%)
Example: Saying a medicine works when it actually doesn't
Type II Error (β - Beta)
What it is: NOT rejecting H₀ when it's actually FALSE
Memory Hack: "Missed Detection"
- Like letting a guilty person go free
- Harder to control directly (it depends on the sample size and on how large the true effect is)
Example: Saying a medicine doesn't work when it actually does
Error Comparison Table
| Reality → / Decision ↓ | H₀ is TRUE | H₁ is TRUE |
|---|---|---|
| Don't Reject H₀ | ✅ Correct | ❌ Type II Error (β) |
| Reject H₀ | ❌ Type I Error (α) | ✅ Correct (Power = 1-β) |
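The table can be read as a lookup from (what is true, what we decided) to an outcome. Here is a minimal sketch of that lookup; the function name `classify_outcome` and its boolean arguments are illustrative assumptions, not part of the lesson.

```python
def classify_outcome(h0_is_true: bool, rejected_h0: bool) -> str:
    """Map reality and decision to one cell of the error table."""
    if h0_is_true and rejected_h0:
        return "Type I error (false alarm)"
    if not h0_is_true and not rejected_h0:
        return "Type II error (missed detection)"
    return "Correct decision"


# Walk through all four cells of the table.
for h0_is_true in (True, False):
    for rejected_h0 in (False, True):
        print(h0_is_true, rejected_h0, "->", classify_outcome(h0_is_true, rejected_h0))
```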
Key Terms Explained Simply
Level of Significance (α)
- The probability of Type I error we're willing to accept
- Usually 0.05 (5%) or 0.01 (1%)
- Memory Hack: "How much risk of false alarm we tolerate"
Power of Test (1 - β)
- Probability of correctly rejecting H₀ when it's false
- Memory Hack: "How good our test is at catching the truth"
Critical Value
- The dividing line: reject H₀ on one side, don't reject on the other
- Memory Hack: "The fence between guilty and not guilty"
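To see α as a false-alarm rate, one option is to simulate many experiments in which H₀ really is true and count how often the test rejects anyway. The sketch below makes several assumptions that are not in the lesson: a z-test with known standard deviation 1, a sample size of 30, and a two-sided critical value taken from the standard normal distribution.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05
# Two-sided critical value under a normal approximation (about 1.96 for alpha = 0.05).
CRITICAL = NormalDist().inv_cdf(1 - ALPHA / 2)


def false_alarm_rate(trials: int = 2000, n: int = 30, seed: int = 0) -> float:
    """Fraction of simulated experiments that reject H0 although H0 (mean = 0) is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = mean(sample) * n ** 0.5  # sigma is 1, so z = mean / (1 / sqrt(n))
        if abs(z) > CRITICAL:
            rejections += 1
    return rejections / trials


print(f"critical value: {CRITICAL:.2f}")
print(f"observed false-alarm rate: {false_alarm_rate():.3f}")
```

The observed rate should land close to the chosen α, which is the sense in which we "set" the Type I error probability ourselves.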
The 5 Steps of Hypothesis Testing
Step-by-Step Example: Do Men and Women Earn Different Salaries?
Step 1: State Hypotheses
- H₀: W(m) = W(f) → Salaries are the SAME
- H₁: W(m) ≠ W(f) → Salaries are DIFFERENT
Step 2: Choose α
- Let's use α = 0.05 (5% chance of false alarm)
Step 3: Pick Test Statistic
- We use a two-sample t-test
Step 4: Calculate T-statistic
T = Signal / Noise = Difference between groups / Variability of groups
T = |x̄₁ - x̄₂| / √(s₁²/n₁ + s₂²/n₂)
Here x̄ is each group's sample mean, s² its sample variance, and n its sample size.
Memory Hack for T-Test: "Signal vs Noise Radio"
- Signal (numerator): How different are the group averages?
- Noise (denominator): How much do the groups vary internally?
- Big T = Strong signal → Groups ARE different!
- Small T = Weak signal → Can't tell them apart from noise
Step 5: Make Decision
- If T exceeds the critical value for the chosen α → Reject H₀ (salaries ARE different!)
- If T stays below the critical value → Don't reject H₀ (not enough evidence)
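The five steps can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the salary figures are invented, the function name `t_statistic` is hypothetical, and the critical value uses a large-sample normal approximation (a t-table value would be somewhat larger for samples this small).

```python
from statistics import NormalDist, mean, variance


def t_statistic(sample_1: list[float], sample_2: list[float]) -> float:
    """T = |mean_1 - mean_2| / sqrt(s1^2/n1 + s2^2/n2), as in Step 4."""
    signal = abs(mean(sample_1) - mean(sample_2))
    noise = (variance(sample_1) / len(sample_1)
             + variance(sample_2) / len(sample_2)) ** 0.5
    return signal / noise


# Hypothetical monthly salaries, invented purely for illustration.
men = [5200, 4800, 5100, 5600, 4900, 5300, 5000, 5400]
women = [4700, 4900, 4600, 5000, 4500, 4800, 4400, 4700]

alpha = 0.05  # Step 2
# Assumption: normal approximation for the two-sided critical value.
critical = NormalDist().inv_cdf(1 - alpha / 2)

t = t_statistic(men, women)  # Step 4
decision = "reject H0" if t > critical else "do not reject H0"  # Step 5
print(f"T = {t:.2f}, critical value = {critical:.2f}: {decision}")
```

Note that `statistics.variance` is the sample variance (divides by n - 1), which is the usual choice for s² in this formula.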
🎯 PART 2: RELATIONSHIPS BETWEEN VARIABLES
What is a Statistical Relationship?
Simple Definition: When one variable changes, the other one tends to change too (in a predictable way)
IMPORTANT: Statistical relationship ≠ Causal relationship!
Memory Hack: "Ice Cream and Drowning"
- Ice cream sales and drowning deaths are correlated
- But ice cream doesn't CAUSE drowning!
- Both happen more in summer (hidden variable)
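A small simulation can make the hidden-variable idea concrete. In this sketch all numbers are simulated, not real data: temperature drives both series, neither series depends on the other, and the coefficients and noise levels are arbitrary assumptions. It uses Pearson's r, which this lesson defines further on.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation from population covariance and standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)


rng = random.Random(42)
# Simulated data: temperature is the hidden variable behind both series.
temperature = [rng.uniform(0, 35) for _ in range(500)]
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.1 * t + rng.gauss(0, 0.5) for t in temperature]

# Neither series appears in the other's formula, yet they are clearly correlated.
print(f"r(ice cream, drownings) = {pearson_r(ice_cream_sales, drownings):.2f}")
```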
Scatter Plots (The Visual Way)
Example: Experience vs Salary
| Employee | Experience (x) | Salary (y) |
|---|---|---|
| 1 | 2 | 7,000 |
| 2 | 4 | 10,000 |
| 3 | 5 | 8,000 |
| 4 | 7 | 11,000 |
| 5 | 8 | 13,000 |
| 6 | 9 | 15,000 |
| 7 | 12 | 13,000 |
| 8 | 14 | 16,000 |
| 9 | 20 | 17,000 |
| 10 | 25 | 19,000 |
When you plot this, the points trend up and to the right → positive relationship!
Reading Scatter Plots
Visual Guide
| Pattern | Relationship | Example |
|---|---|---|
| ↗️ Tight line | Near-perfect positive (r ≈ 1) | Age vs Height (kids) |
| ↗️ Scattered up | Weak positive (r ≈ 0.3) | Study time vs Grade |
| ↘️ Tight line | Near-perfect negative (r ≈ -1) | Gas in tank vs Distance driven |
| ↘️ Scattered down | Weak negative (r ≈ -0.3) | TV time vs Grade |
| Random | No relationship (r ≈ 0) | Shoe size vs IQ |
| Curve | Non-linear (r can be ≈ 0) | Age vs Reaction time |
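The last row is worth checking numerically: a perfectly predictable curve can still give r = 0. The sketch below uses a made-up U-shaped dataset (y = x² on symmetric x values) and, for contrast, a straight line; the helper `pearson_r` is an illustrative implementation of the correlation formula given later in the lesson.

```python
def pearson_r(xs, ys):
    """Pearson correlation from population covariance and standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)


xs = [-3, -2, -1, 0, 1, 2, 3]
curve = [x ** 2 for x in xs]     # perfect U-shaped relationship
line = [2 * x + 1 for x in xs]   # perfect straight-line relationship

r_curve = pearson_r(xs, curve)
r_line = pearson_r(xs, line)
print(f"curve: r = {r_curve:.2f}, line: r = {r_line:.2f}")
```

The left and right halves of the U cancel each other out, so r only reports that there is no straight-line trend.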
Covariance
What It Measures
How two variables move TOGETHER (but in their original units)
Formula
cov(x,y) = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / n
Memory Hack: "The Direction Detector"
- cov(x,y) > 0 → Positive relationship (both increase together)
- cov(x,y) < 0 → Negative relationship (one up, one down)
- cov(x,y) = 0 → No linear relationship
The Problem with Covariance
Its size depends on the units of x and y, so it is hard to interpret and hard to compare across datasets.
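The unit problem is easy to demonstrate with the experience and salary table from the scatter plot example. This is a minimal sketch; the function name `covariance` is an illustrative choice, and it divides by n to match the formula above.

```python
def covariance(xs, ys):
    """Population covariance: sum of products of deviations, divided by n."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Experience (years) and salary from the scatter plot example.
experience = [2, 4, 5, 7, 8, 9, 12, 14, 20, 25]
salary = [7000, 10000, 8000, 11000, 13000, 15000, 13000, 16000, 17000, 19000]
salary_in_thousands = [s / 1000 for s in salary]

cov_units = covariance(experience, salary)
cov_thousands = covariance(experience, salary_in_thousands)
print(cov_units)       # salary in the table's units
print(cov_thousands)   # same data, salary expressed in thousands
```

The relationship is identical in both cases, but the covariance differs by a factor of 1000 just because the salary unit changed.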
⭐ Pearson Correlation Coefficient (r)
What It Is
Standardized covariance → Same as covariance but on a scale of -1 to 1
Formula
r(x,y) = cov(x,y) / (sₓ × sᵧ)
Here sₓ and sᵧ are the standard deviations of x and y, computed with the same divisor (n) as the covariance.
Memory Hack: "Covariance with Training Wheels"
- Takes covariance
- Divides by both standard deviations
- Now it's always between -1 and 1!
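Dividing by both standard deviations is what removes the units. The sketch below applies the formula to the experience and salary table from the scatter plot example, once with salary as given and once in thousands; `pearson_r` is an illustrative helper name, and population (divide by n) versions are used throughout.

```python
def pearson_r(xs, ys):
    """r = cov(x, y) / (s_x * s_y), all computed with divisor n."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    s_x = (sum((x - mean_x) ** 2 for x in xs) / n) ** 0.5
    s_y = (sum((y - mean_y) ** 2 for y in ys) / n) ** 0.5
    return cov / (s_x * s_y)


experience = [2, 4, 5, 7, 8, 9, 12, 14, 20, 25]
salary = [7000, 10000, 8000, 11000, 13000, 15000, 13000, 16000, 17000, 19000]

r_units = pearson_r(experience, salary)
r_thousands = pearson_r(experience, [s / 1000 for s in salary])
print(f"r = {r_units:.2f} (units), r = {r_thousands:.2f} (thousands)")
```

Both calls give the same r, because rescaling salary changes the covariance and sᵧ by the same factor.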
🎯 Interpreting Correlation (r)
The Scale
-1 (perfect negative) ←------- 0 (no relationship) -------→ +1 (perfect positive)
Direction (Look at Sign)
- r > 0 → Positive relationship ↗️
- r < 0 → Negative relationship ↘️
- r = 0 → No LINEAR relationship
Strength (Look at Absolute Value)
| Absolute value of r | Strength | Interpretation |
|---|---|---|
| 0.00 - 0.10 | Negligible | Basically no relationship |
| 0.10 - 0.39 | Weak | Slight tendency |
| 0.40 - 0.69 | Medium | Noticeable pattern |
| 0.70 - 0.89 | Strong | Clear relationship |
| 0.90 - 1.00 | Very Strong | Almost perfect |
Memory Hack: "The Absolute Rule"
- r = 0.85 → STRONG positive
- r = -0.85 → STRONG negative (same strength, opposite direction!)
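The sign and strength rules combine naturally into one small helper. This is only a sketch: `describe_r` is a hypothetical name, and because the strength table lists 0.10 in two bands, treating exactly 0.10 as "weak" is an assumption made here.

```python
def describe_r(r: float) -> str:
    """Label a correlation using the lesson's strength bands and the sign of r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < 0.10:
        return "negligible"
    if size < 0.40:
        strength = "weak"
    elif size < 0.70:
        strength = "medium"
    elif size < 0.90:
        strength = "strong"
    else:
        strength = "very strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_r(0.85))   # strong positive
print(describe_r(-0.85))  # strong negative
```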
Complete Example: Stock Returns
Given Data
| Day | Stock A (x) | Stock B (y) |
|---|---|---|
| 1 | 1.0% | 3.0% |
| 2 | 1.5% | 4.5% |
| 3 | 2.2% | 4.7% |
| 4 | 1.4% | 4.0% |
| 5 | 0.2% | 3.5% |
Step 1: Calculate Means
- x̄ = (1.0 + 1.5 + 2.2 + 1.4 + 0.2) / 5 = 1.26%
- ȳ = (3.0 + 4.5 + 4.7 + 4.0 + 3.5) / 5 = 3.94%
Step 2: Calculate Covariance
cov(x,y) = [(1.0-1.26)(3.0-3.94) + (1.5-1.26)(4.5-3.94) + ... + (0.2-1.26)(3.5-3.94)] / 5
cov(x,y) = 1.568 / 5 = 0.3136 ≈ 0.31
Interpretation: Positive covariance → stocks tend to move together!
Step 3: Calculate Standard Deviations
- sₓ = √(2.152 / 5) ≈ 0.656
- sᵧ = √(1.972 / 5) ≈ 0.628
Step 4: Calculate Correlation
r(x,y) = 0.3136 / (0.656 × 0.628) = 0.3136 / 0.412 ≈ 0.76
Interpretation: r = 0.76 → STRONG positive correlation
- When Stock A goes up, Stock B tends to go up too!
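The four steps of this example can be reproduced directly from the data table. The sketch below follows the lesson's formulas (dividing by n); the variable names are illustrative.

```python
stock_a = [1.0, 1.5, 2.2, 1.4, 0.2]  # daily returns in percent
stock_b = [3.0, 4.5, 4.7, 4.0, 3.5]
n = len(stock_a)

# Step 1: means
mean_a = sum(stock_a) / n
mean_b = sum(stock_b) / n

# Step 2: covariance (divide by n, as in the lesson's formula)
cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(stock_a, stock_b)) / n

# Step 3: standard deviations (same divisor n)
sd_a = (sum((a - mean_a) ** 2 for a in stock_a) / n) ** 0.5
sd_b = (sum((b - mean_b) ** 2 for b in stock_b) / n) ** 0.5

# Step 4: correlation
r = cov_ab / (sd_a * sd_b)

print(f"means: {mean_a:.2f}, {mean_b:.2f}")
print(f"cov = {cov_ab:.2f}, r = {r:.2f}")
```

Carrying unrounded intermediate values through to the last step, as the code does, avoids small rounding drift in r.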
💰 BONUS: Portfolio Risk & Return
Key Concepts
Return Formula
Return = (Price(t+1) + Dividend - Price(t)) / Price(t)
Memory Hack: "What you gained divided by what you paid"
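As a quick sketch of the return formula, with a hypothetical function name and invented prices:

```python
def simple_return(price_now: float, price_next: float, dividend: float = 0.0) -> float:
    """Return = (Price(t+1) + Dividend - Price(t)) / Price(t)."""
    return (price_next + dividend - price_now) / price_now


# Hypothetical numbers: bought at 100, now worth 108, plus a dividend of 2.
one_period = simple_return(100.0, 108.0, 2.0)
print(f"{one_period:.1%}")  # 10.0%
```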
Variance of a SUM (IMPORTANT!)
var(x + y) = var(x) + var(y) + 2×cov(x,y)
Why This Matters for Investing
Memory Hack: "Don't Put All Eggs in One Basket"
If you invest in two stocks:
- Positive correlation: They move together → the 2×cov(x,y) term is positive → more risk!
- Negative correlation: They move opposite → the 2×cov(x,y) term is negative → LESS risk! (diversification)
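The variance-of-a-sum identity can be checked numerically. This sketch reuses Stock A and Stock B from the stock returns example and uses population (divide by n) versions on both sides; the helper names are illustrative.

```python
def covariance(xs, ys):
    """Population covariance (divide by n)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def variance(xs):
    """Variance is the covariance of a variable with itself."""
    return covariance(xs, xs)


stock_a = [1.0, 1.5, 2.2, 1.4, 0.2]
stock_b = [3.0, 4.5, 4.7, 4.0, 3.5]
combined = [a + b for a, b in zip(stock_a, stock_b)]

left_side = variance(combined)
right_side = variance(stock_a) + variance(stock_b) + 2 * covariance(stock_a, stock_b)
print(f"var(x + y) = {left_side:.4f}")
print(f"var(x) + var(y) + 2*cov(x,y) = {right_side:.4f}")
```

Because these two stocks have positive covariance, the combined variance comes out larger than var(x) + var(y) alone.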
Real Example: Portfolio Calculation
Scenario: WAR, RECESSION, STABLE, PROSPERITY, PEACE
| State | Prob | Stock C Return | Stock D Return |
|---|---|---|---|
| WAR | 15% | 67% | -60% |
| RECESSION | 25% | -20% | -40% |
| STABLE | 35% | 7% | 13% |
| PROSPERITY | 15% | 27% | 67% |
| PEACE | 10% | -33% | 233% |
Calculate Expected Returns
Stock C:
E(rC) = 15%×67% + 25%×(-20%) + 35%×7% + 15%×27% + 10%×(-33%)
E(rC) = 8.25% ≈ 8%
Stock D:
E(rD) = 15%×(-60%) + 25%×(-40%) + 35%×13% + 15%×67% + 10%×233%
E(rD) = 18.9% ≈ 19%
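An expected return is just a probability-weighted average, which the following sketch computes from the scenario table. Returns are written as decimals (67% becomes 0.67), and `expected_value` is an illustrative helper name.

```python
# Order of states: WAR, RECESSION, STABLE, PROSPERITY, PEACE
probabilities = [0.15, 0.25, 0.35, 0.15, 0.10]
returns_c = [0.67, -0.20, 0.07, 0.27, -0.33]
returns_d = [-0.60, -0.40, 0.13, 0.67, 2.33]


def expected_value(probs, outcomes):
    """Probability-weighted average of the outcomes."""
    return sum(p * x for p, x in zip(probs, outcomes))


# Sanity check: the scenario probabilities should add up to 100%.
assert abs(sum(probabilities) - 1.0) < 1e-9

e_c = expected_value(probabilities, returns_c)
e_d = expected_value(probabilities, returns_d)
print(f"E(rC) = {e_c:.2%}")
print(f"E(rD) = {e_d:.2%}")
```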
Calculate Variance
VAR(rC) = 15%×(67%-8%)² + ... + 10%×(-33%-8%)²
VAR(rC) ≈ 0.0941
VAR(rD) = 15%×(-60%-19%)² + ... + 10%×(233%-19%)²
VAR(rD) ≈ 0.6744
Calculate Covariance
COV(rC, rD) = 15%×(67%-8%)×(-60%-19%) + ... + 10%×(-33%-8%)×(233%-19%)
COV(rC, rD) ≈ -0.1025
Negative covariance = GOOD for portfolio! The stocks tend to move in opposite directions.
Calculate Correlation
ρ = -0.1025 / √(0.0941 × 0.6744)
ρ ≈ -0.41
Interpretation: Medium negative correlation → a real diversification benefit!
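The variance, covariance, and correlation steps all use the same probability-weighted pattern, so they can be sketched with one helper. Returns are written as decimals, the function names are illustrative, and the means are kept unrounded rather than rounded to 8% and 19%.

```python
# Order of states: WAR, RECESSION, STABLE, PROSPERITY, PEACE
probabilities = [0.15, 0.25, 0.35, 0.15, 0.10]
returns_c = [0.67, -0.20, 0.07, 0.27, -0.33]
returns_d = [-0.60, -0.40, 0.13, 0.67, 2.33]


def expected_value(probs, outcomes):
    """Probability-weighted average of the outcomes."""
    return sum(p * x for p, x in zip(probs, outcomes))


def weighted_covariance(probs, xs, ys):
    """Sum over states of prob * (x - E[x]) * (y - E[y])."""
    mean_x = expected_value(probs, xs)
    mean_y = expected_value(probs, ys)
    return sum(p * (x - mean_x) * (y - mean_y) for p, x, y in zip(probs, xs, ys))


# Variance is the covariance of a return with itself.
var_c = weighted_covariance(probabilities, returns_c, returns_c)
var_d = weighted_covariance(probabilities, returns_d, returns_d)
cov_cd = weighted_covariance(probabilities, returns_c, returns_d)
rho = cov_cd / (var_c * var_d) ** 0.5

print(f"VAR(rC) = {var_c:.4f}, VAR(rD) = {var_d:.4f}")
print(f"COV(rC, rD) = {cov_cd:.4f}, rho = {rho:.2f}")
```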
🧠 Memory Hacks Summary
| Concept | Memory Trick |
|---|---|
| Hypothesis Testing | "Court trial - innocent until proven guilty" |
| Type I Error | "False alarm - innocent in jail" |
| Type II Error | "Missed detection - guilty goes free" |
| Alpha (α) | "How much false alarm risk we accept" |
| T-test | "Signal vs Noise radio" |
| Covariance | "Direction detector (with units)" |
| Correlation | "Covariance with training wheels (-1 to 1)" |
| Portfolio Risk | "Don't put all eggs in one basket" |
🎯 Quick Reference Formulas
Two-Sample T-Test:
T = |x̄₁ - x̄₂| / √(s₁²/n₁ + s₂²/n₂)
Covariance:
cov(x,y) = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / n
Pearson Correlation:
r(x,y) = cov(x,y) / (sₓ × sᵧ)
Variance of Sum:
var(x + y) = var(x) + var(y) + 2×cov(x,y)
Return:
Return = (Price(t+1) + Dividend - Price(t)) / Price(t)
💡 Pro Tips for Exams
- Hypothesis Testing:
  - Always start with H₀ (null)
  - H₁ is what you're trying to prove
  - Be careful: "Don't reject" ≠ "Accept"
- Correlation:
  - Sign shows direction (+ or -)
  - Absolute value shows strength
  - r = 0 means NO LINEAR relationship (might be curved!)
- Common Mistakes:
  - Don't confuse correlation with causation
  - Don't forget to take the square root when going from variance to SD
  - Remember covariance has units, correlation doesn't
- Calculator Check:
  - Correlation must be between -1 and 1
  - If you get r = 2.5, you made a mistake!
Good luck! Remember: Statistics is about making the best decision with incomplete information! 🎲