
Lesson 6: Hypothesis Testing & Correlation


📊 What We're Learning

  1. Hypothesis Testing - How to test if claims about data are true

  2. Relationships Between Variables - How two things are connected

  3. Correlation & Covariance - Measuring the strength of relationships


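🎯 PART 1: HYPOTHESIS TESTING

The Big Picture

The Problem: We can't check EVERYONE in a population, so we use a sample. But a sample only approximates the population, so any conclusion drawn from it can be wrong!

The Solution: Design a test that caps the chance of a false alarm at a level we choose, while keeping the chance of missing a real effect as small as possible.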


๐Ÿ›๏ธ The Court Analogy (Best Way to Understand!)โ€‹

Think of hypothesis testing like a court trial:

Court Analogy for Hypothesis Errors (Type I & Type II)

The Two Hypothesesโ€‹

HypothesisSymbolMeaningCourt Example
Null HypothesisHโ‚€The "default" assumptionPerson is innocent
Alternative HypothesisHโ‚What we're trying to provePerson is guilty

Memory Hack: "Innocent Until Proven Guilty"

  • H₀ = Status quo (nothing special happening)
  • H₁ = The exciting claim (something IS happening)

โš ๏ธ Two Types of Errorsโ€‹

Type I Error (ฮฑ - Alpha)โ€‹

What it is: Rejecting Hโ‚€ when it's actually TRUE

Memory Hack: "False Alarm" ๐Ÿšจ

  • Like sending an innocent person to jail
  • We SET this probability ourselves (usually ฮฑ = 0.05 or 5%)

Example: Saying a medicine works when it actually doesn't


Type II Error (β - Beta)

What it is: NOT rejecting H₀ when it's actually FALSE

Memory Hack: "Missed Detection" 🙈

  • Like letting a guilty person go free
  • Harder to control directly: it depends on the sample size and on how big the real effect is (larger samples shrink it)

Example: Saying a medicine doesn't work when it actually does


Error Comparison Table

| Decision ↓ / Reality → | H₀ is TRUE | H₁ is TRUE |
|---|---|---|
| Don't Reject H₀ | ✓ Correct | ✗ Type II Error (β) |
| Reject H₀ | ✗ Type I Error (α) | ✓ Correct (Power = 1 - β) |

🔢 Key Terms Explained Simply

Level of Significance (α)

  • The probability of Type I error we're willing to accept
  • Usually 0.05 (5%) or 0.01 (1%)
  • Memory Hack: "How much risk of false alarm we tolerate"

Power of Test (1 - β)

  • Probability of correctly rejecting H₀ when it's false
  • Memory Hack: "How good our test is at catching the truth"

Critical Value

  • The dividing line: reject H₀ on one side, don't reject on the other
  • Memory Hack: "The fence between guilty and not guilty"
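α and power can be read as long-run rates. The simulation sketch below repeats a study many times and counts how often H₀ gets rejected, using the two-sample T statistic from later in this lesson. The settings are illustrative choices and are not part of the lesson: normally distributed data with standard deviation 1, two groups of 50, a true difference of 0.5 for the "H₁ true" case, and the large-sample cutoff 1.96 in place of an exact t critical value.

```python
import random
import statistics


def sample_variance(values):
    """Sample variance with the n - 1 divisor."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def t_statistic(a, b):
    """T = |mean(a) - mean(b)| / sqrt(s_a^2/n_a + s_b^2/n_b)."""
    se = (sample_variance(a) / len(a) + sample_variance(b) / len(b)) ** 0.5
    return abs(statistics.fmean(a) - statistics.fmean(b)) / se


def rejection_rate(true_diff, n=50, trials=4000, critical=1.96, seed=1):
    """Share of simulated studies whose T exceeds the critical value (i.e. H0 is rejected)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rejected = 0
    for _ in range(trials):
        group1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        group2 = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        if t_statistic(group1, group2) > critical:
            rejected += 1
    return rejected / trials


# H0 true (no real difference): every rejection is a Type I error, so this estimates alpha.
alpha_hat = rejection_rate(true_diff=0.0)
# H1 true (real difference of 0.5): every rejection is a correct detection, so this estimates power = 1 - beta.
power_hat = rejection_rate(true_diff=0.5)
print(f"estimated alpha: {alpha_hat:.3f}")
print(f"estimated power: {power_hat:.3f}")
```

When H₀ is true, the rejection rate should land close to 5%. It will sit slightly above that because 1.96 is a normal cutoff rather than a t cutoff. With the 0.5 shift, the same procedure estimates the power.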

📋 The 5 Steps of Hypothesis Testing

Step-by-Step Example: Do Men and Women Earn Different Salaries?

Step 1: State Hypotheses

  • H₀: W(m) = W(f) → Salaries are the SAME
  • H₁: W(m) ≠ W(f) → Salaries are DIFFERENT

Here W(m) and W(f) are the average salaries of men and women in the whole population.

Step 2: Choose α

  • Let's use α = 0.05 (5% chance of false alarm)

Step 3: Pick Test Statistic

  • We use a two-sample t-test, which compares the means of two independent groups

Step 4: Calculate T-statistic

T = Signal / Noise = Difference between group means / Standard error of that difference

T = |x̄₁ - x̄₂| / √(s₁²/n₁ + s₂²/n₂)

where x̄ᵢ, sᵢ and nᵢ are the sample mean, sample standard deviation and sample size of group i.

Memory Hack for T-Test: "Signal vs Noise Radio"

  • Signal (numerator): How different are the group averages?
  • Noise (denominator): How much could the difference in averages wobble just from sampling? (It grows with the spread inside each group and shrinks as the groups get bigger.)
  • Big T = Strong signal → evidence the groups differ!
  • Small T = Weak signal → Can't tell them apart from noise
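The formula translates almost line for line into code. The sketch below is minimal and uses Python's standard `statistics` module, which computes sample variances with the n - 1 divisor. The salary figures are invented purely for illustration.

```python
import math
import statistics


def two_sample_t(group1, group2):
    """T = |mean1 - mean2| / sqrt(s1^2/n1 + s2^2/n2)."""
    signal = abs(statistics.mean(group1) - statistics.mean(group2))
    noise = math.sqrt(statistics.variance(group1) / len(group1)
                      + statistics.variance(group2) / len(group2))
    return signal / noise


# Hypothetical annual salaries in thousands (made-up numbers, not real data)
men = [52, 55, 61, 58, 49, 63, 57, 60]
women = [50, 48, 55, 53, 47, 56, 51, 52]

print(f"T = {two_sample_t(men, women):.2f}")
```

Because of the absolute value, swapping the two groups gives the same T.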

Step 5: Make Decision

  • If T is larger than the critical value for the chosen α (for a two-sided test at α = 0.05 with large samples, about 1.96) → Reject H₀ (evidence that salaries differ)
  • Otherwise → Don't reject H₀ (not enough evidence)
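With large samples the t distribution is close to the standard normal. A common shortcut, which this sketch assumes, is therefore to take the critical value from the normal curve using `statistics.NormalDist` (Python 3.8+). With small groups the exact t critical value depends on the degrees of freedom and is somewhat larger, so treat this as an approximation.

```python
from statistics import NormalDist


def critical_value(alpha=0.05):
    """Two-sided large-sample cutoff: a T beyond this falls in the rejection region."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def decide(t_stat, alpha=0.05):
    """Step 5: compare T with the critical value for the chosen alpha."""
    if abs(t_stat) > critical_value(alpha):
        return "reject H0"
    return "do not reject H0"


print(f"critical value at alpha = 0.05: {critical_value(0.05):.2f}")
print(f"critical value at alpha = 0.01: {critical_value(0.01):.2f}")
print(decide(2.9), "|", decide(1.2))
```

Lowering α pushes the fence further out. For example, T = 2.2 clears the 0.05 cutoff but not the 0.01 one.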

🎯 PART 2: RELATIONSHIPS BETWEEN VARIABLES

What is a Statistical Relationship?

Simple Definition: When one variable changes, the other tends to change too, in a predictable way

IMPORTANT: Statistical relationship ≠ Causal relationship!

Memory Hack: "Ice Cream and Drowning"

  • Ice cream sales and drowning deaths are correlated
  • But ice cream doesn't CAUSE drowning!
  • Both rise in hot summer weather, a hidden (confounding) variable that drives each of them

📈 Scatter Plots (The Visual Way)

Example: Experience vs Salary

| Employee | Experience (x) | Salary (y) |
|---|---|---|
| 1 | 2 | 7,000 |
| 2 | 4 | 10,000 |
| 3 | 5 | 8,000 |
| 4 | 7 | 11,000 |
| 5 | 8 | 13,000 |
| 6 | 9 | 15,000 |
| 7 | 12 | 13,000 |
| 8 | 14 | 16,000 |
| 9 | 20 | 17,000 |
| 10 | 25 | 19,000 |

When you plot this, you see points going UP-RIGHT → positive relationship!


🎨 Reading Scatter Plots

Visual Guide

| Pattern | Relationship | Example |
|---|---|---|
| ↗️ Tight line | Strong positive (r close to +1; exactly 1 only if every point sits on the line) | Age vs Height (kids) |
| ↗️ Scattered up | Weak positive (r ≈ 0.3) | Study time vs Grade |
| ↘️ Tight line | Strong negative (r close to -1) | Gas in tank vs Distance driven |
| ↘️ Scattered down | Weak negative (r ≈ -0.3) | TV time vs Grade |
| Random cloud | No relationship (r ≈ 0) | Shoe size vs IQ |
| 📈 Curve | Non-linear (r can be near 0 even though the variables are related) | Age vs Reaction time |

🔗 Covariance

What It Measures

How two variables move TOGETHER (but in their original units)

Formula

cov(x,y) = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / n

Dividing by n gives the population covariance; the sample covariance divides by n - 1 instead. The sign, which is what we read below, is the same either way.

Memory Hack: "The Direction Detector"

  • cov(x,y) > 0 → Positive relationship (both increase together)
  • cov(x,y) < 0 → Negative relationship (one up, one down)
  • cov(x,y) = 0 → No linear relationship
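Here is the covariance formula written out as a small Python function that divides by n, as in the lesson's formula. The three toy datasets are invented to show each reading of the "direction detector".

```python
def covariance(x, y):
    """Population covariance: sum((x_i - x_bar) * (y_i - y_bar)) / n."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n


x = [1, 2, 3, 4, 5]
up = [2, 4, 6, 8, 10]       # rises with x
down = [10, 8, 6, 4, 2]     # falls as x rises
u_shape = [4, 1, 0, 1, 4]   # related to x, but not along a straight line

print(covariance(x, up))       # 4.0  -> positive
print(covariance(x, down))     # -4.0 -> negative
print(covariance(x, u_shape))  # 0.0  -> no LINEAR relationship
```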

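The Problem with Covariance

Its size depends on the units! Measure salary in dollars instead of thousands of dollars and the covariance becomes 1,000 times larger, even though the relationship itself is unchanged. That makes it hard to interpret or to compare across datasets.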
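The sketch below shows the unit problem directly. It reuses the experience and salary figures from the scatter-plot table, once with salary in thousands and once in dollars. The helper is the same divide-by-n covariance used in this lesson.

```python
def covariance(x, y):
    """Population covariance (divide by n)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n


experience = [2, 4, 5, 7, 8, 9, 12, 14, 20, 25]
salary_thousands = [7, 10, 8, 11, 13, 15, 13, 16, 17, 19]
salary_dollars = [1000 * s for s in salary_thousands]  # same data, different unit

cov_thousands = covariance(experience, salary_thousands)
cov_dollars = covariance(experience, salary_dollars)
print(f"salary in thousands: cov = {cov_thousands:.2f}")
print(f"salary in dollars:   cov = {cov_dollars:.2f}")
print(f"ratio: {cov_dollars / cov_thousands:.0f}")  # the unit change alone scales cov by 1000
```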


โญ Pearson Correlation Coefficient (r)โ€‹

What It Isโ€‹

Standardized covariance โ†’ Same as covariance but on a scale of -1 to 1

Formulaโ€‹

r(x,y) = cov(x,y) / (sโ‚“ ร— sแตง)

Memory Hack: "Covariance with Training Wheels"โ€‹

  • Takes covariance
  • Divides by both standard deviations
  • Now it's always between -1 and 1!

🎯 Interpreting Correlation (r)

The Scale

-1 (perfect negative) ←------- 0 (no linear relationship) -------→ +1 (perfect positive)

Direction (Look at Sign)

  • r > 0 → Positive relationship ↗️
  • r < 0 → Negative relationship ↘️
  • r = 0 → No LINEAR relationship

Strength (Look at Absolute Value)

| Absolute value of r | Strength | Interpretation |
|---|---|---|
| 0.00 - 0.10 | Negligible | Basically no relationship |
| 0.10 - 0.39 | Weak | Slight tendency |
| 0.40 - 0.69 | Medium | Noticeable pattern |
| 0.70 - 0.89 | Strong | Clear relationship |
| 0.90 - 1.00 | Very Strong | Almost perfect |

Memory Hack: "The Absolute Rule"

  • r = 0.85 → STRONG positive
  • r = -0.85 → STRONG negative (same strength, opposite direction!)
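The strength table and the absolute-value rule can be encoded as a small lookup. This is one possible sketch. The table's ranges touch at their edges (for example at 0.10), so the half-open boundaries used here are an assumption.

```python
def describe_r(r):
    """Return (strength, direction) for a correlation using the rule-of-thumb bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError(f"r = {r} is impossible: correlation must lie between -1 and 1")
    size = abs(r)  # strength comes from the absolute value only
    if size < 0.10:
        strength = "negligible"
    elif size < 0.40:
        strength = "weak"
    elif size < 0.70:
        strength = "medium"
    elif size < 0.90:
        strength = "strong"
    else:
        strength = "very strong"
    # direction comes from the sign only
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    return strength, direction


print(describe_r(0.85))   # ('strong', 'positive')
print(describe_r(-0.85))  # ('strong', 'negative')
```

The range check doubles as the calculator check from the exam tips: an r outside [-1, 1] signals an arithmetic mistake.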

📊 Complete Example: Stock Returns

Given Data

| Day | Stock A (x) | Stock B (y) |
|---|---|---|
| 1 | 1.0% | 3.0% |
| 2 | 1.5% | 4.5% |
| 3 | 2.2% | 4.7% |
| 4 | 1.4% | 4.0% |
| 5 | 0.2% | 3.5% |

Step 1: Calculate Means

  • x̄ = (1.0 + 1.5 + 2.2 + 1.4 + 0.2) / 5 = 1.26%
  • ȳ = (3.0 + 4.5 + 4.7 + 4.0 + 3.5) / 5 = 3.94%

Step 2: Calculate Covariance

cov(x,y) = [(1.0-1.26)(3.0-3.94) + (1.5-1.26)(4.5-3.94) + ... + (0.2-1.26)(3.5-3.94)] / 5

cov(x,y) = 1.568 / 5 ≈ 0.31

Interpretation: Positive covariance → stocks tend to move together!

Step 3: Calculate Standard Deviations (dividing by n = 5, to match the covariance)

  • sₓ = √(2.152 / 5) ≈ 0.66
  • sᵧ = √(1.972 / 5) ≈ 0.63

Step 4: Calculate Correlation

r(x,y) = 0.314 / (0.656 × 0.628) ≈ 0.314 / 0.412 ≈ 0.76

Interpretation: r ≈ 0.76 → STRONG positive correlation

  • When Stock A goes up, Stock B tends to go up too!
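All four steps can be checked in a few lines of plain Python. This sketch follows the example's choice of dividing by n for both the covariance and the standard deviations. It keeps the returns in percent, as in the table.

```python
import math

stock_a = [1.0, 1.5, 2.2, 1.4, 0.2]  # x, daily returns in %
stock_b = [3.0, 4.5, 4.7, 4.0, 3.5]  # y, daily returns in %
n = len(stock_a)

# Step 1: means
mean_a = sum(stock_a) / n
mean_b = sum(stock_b) / n
# Step 2: covariance (divide by n)
cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(stock_a, stock_b)) / n
# Step 3: standard deviations (same divisor n)
sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in stock_a) / n)
sd_b = math.sqrt(sum((b - mean_b) ** 2 for b in stock_b) / n)
# Step 4: correlation
r = cov_ab / (sd_a * sd_b)

print(f"means: {mean_a:.2f}, {mean_b:.2f}")
print(f"cov = {cov_ab:.4f}, s_x = {sd_a:.3f}, s_y = {sd_b:.3f}, r = {r:.2f}")
```

It should print a covariance of about 0.31, standard deviations of about 0.66 and 0.63, and r ≈ 0.76.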

💰 BONUS: Portfolio Risk & Return

Key Concepts

Return Formula

Return = (Price(t+1) + Dividend - Price(t)) / Price(t)

Memory Hack: "What you gained divided by what you paid"
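Here is the return formula as a one-line function. The prices and dividend in the example are hypothetical.

```python
def holding_period_return(price_t, price_next, dividend=0.0):
    """Return = (Price(t+1) + Dividend - Price(t)) / Price(t)."""
    gained = price_next + dividend - price_t  # what you gained
    return gained / price_t                   # ... divided by what you paid


# Hypothetical trade: bought at 100, now priced at 105, collected a dividend of 2
print(f"{holding_period_return(100, 105, 2):.1%}")  # 7.0%
```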

Variance of a SUM (IMPORTANT!)

var(x + y) = var(x) + var(y) + 2 × cov(x,y)
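Here is a quick numerical check of the identity. It assumes population (divide-by-n) variance and covariance and reuses the Stock A and Stock B returns from the complete example.

```python
def mean(v):
    return sum(v) / len(v)


def cov(x, y):
    """Population covariance; cov(v, v) is the population variance of v."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


stock_a = [1.0, 1.5, 2.2, 1.4, 0.2]
stock_b = [3.0, 4.5, 4.7, 4.0, 3.5]
total = [a + b for a, b in zip(stock_a, stock_b)]  # holding one unit of each

lhs = cov(total, total)  # var(x + y)
rhs = cov(stock_a, stock_a) + cov(stock_b, stock_b) + 2 * cov(stock_a, stock_b)
print(f"var(x + y)                    = {lhs:.4f}")
print(f"var(x) + var(y) + 2 cov(x, y) = {rhs:.4f}")
```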

Why This Matters for Investing

Memory Hack: "Don't Put All Eggs in One Basket"

If you invest in two stocks:

  • Positive correlation: They move together, so the 2 × cov term adds to the variance → more risk!
  • Negative correlation: They move opposite, so the 2 × cov term subtracts from it → LESS risk! (diversification)
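To put numbers on the eggs-in-one-basket idea, this sketch uses the weighted form of the variance-of-a-sum rule, var(aX + bY) = a²var(X) + b²var(Y) + 2ab·cov(X, Y), with cov(X, Y) = ρ·sₓ·sᵧ. The 50/50 split and the 20% standard deviation for both stocks are hypothetical values, chosen only to isolate the effect of correlation.

```python
import math


def portfolio_sd(w, sd_x, sd_y, rho):
    """Standard deviation of w*X + (1 - w)*Y."""
    cov_xy = rho * sd_x * sd_y
    var = w**2 * sd_x**2 + (1 - w) ** 2 * sd_y**2 + 2 * w * (1 - w) * cov_xy
    return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding when rho = -1


correlations = [1.0, 0.5, 0.0, -0.5, -1.0]
sds = [portfolio_sd(0.5, 0.20, 0.20, rho) for rho in correlations]
for rho, sd in zip(correlations, sds):
    print(f"rho = {rho:+.1f} -> portfolio SD = {sd:.1%}")
```

With the same two stocks, moving ρ from +1 to -1 cuts the portfolio's standard deviation from 20% to zero. At ρ = +1, combining them gives no diversification benefit at all.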

๐Ÿ“ Real Example: Portfolio Calculationโ€‹

Scenario: WAR, RECESSION, STABLE, PROSPERITY, PEACEโ€‹

StateProbStock C ReturnStock D Return
WAR15%67%-60%
RECESSION25%-20%-40%
STABLE35%7%13%
PROSPERITY15%27%67%
PEACE10%-33%233%

Calculate Expected Returns

Stock C:

E(rC) = 15%×67% + 25%×(-20%) + 35%×7% + 15%×27% + 10%×(-33%)
E(rC) = 8.25% ≈ 8%

Stock D:

E(rD) = 15%×(-60%) + 25%×(-40%) + 35%×13% + 15%×67% + 10%×233%
E(rD) = 18.9% ≈ 19%

Calculate Variance

VAR(rC) = 15%×(67%-8%)² + ... + 10%×(-33%-8%)²
VAR(rC) ≈ 0.0941 (standard deviation ≈ 30.7%)

VAR(rD) = 15%×(-60%-19%)² + ... + 10%×(233%-19%)²
VAR(rD) ≈ 0.6744 (standard deviation ≈ 82.1%)

Calculate Covariance

COV(rC, rD) = 15%×(67%-8%)×(-60%-19%) + ... + 10%×(-33%-8%)×(233%-19%)
COV(rC, rD) ≈ -0.1025

Negative covariance = GOOD for a portfolio! The stocks tend to move in opposite directions.

Calculate Correlation

ρ = -0.1025 / √(0.0941 × 0.6744)
ρ ≈ -0.1025 / 0.2519 ≈ -0.41

Interpretation: Medium negative correlation (|ρ| between 0.40 and 0.69) → a real diversification benefit!
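The scenario calculations are easy to slip on by hand, so here is a short check in plain Python. Returns are written as decimals (67% = 0.67). The sketch uses the unrounded expected returns (8.25% and 18.9%); compared with the rounded 8% and 19%, this changes the variances and covariance only in the fifth decimal place.

```python
import math

# States: WAR, RECESSION, STABLE, PROSPERITY, PEACE
prob = [0.15, 0.25, 0.35, 0.15, 0.10]
ret_c = [0.67, -0.20, 0.07, 0.27, -0.33]
ret_d = [-0.60, -0.40, 0.13, 0.67, 2.33]

assert abs(sum(prob) - 1.0) < 1e-12  # the states must cover every outcome


def expectation(values):
    """Probability-weighted average over the states."""
    return sum(p * v for p, v in zip(prob, values))


e_c = expectation(ret_c)
e_d = expectation(ret_d)
var_c = expectation([(c - e_c) ** 2 for c in ret_c])
var_d = expectation([(d - e_d) ** 2 for d in ret_d])
cov_cd = expectation([(c - e_c) * (d - e_d) for c, d in zip(ret_c, ret_d)])
rho = cov_cd / math.sqrt(var_c * var_d)

print(f"E(rC) = {e_c:.4f}, E(rD) = {e_d:.4f}")
print(f"VAR(rC) = {var_c:.4f}, VAR(rD) = {var_d:.4f}")
print(f"COV(rC, rD) = {cov_cd:.4f}, rho = {rho:.3f}")
```

This gives VAR(rC) ≈ 0.0941, VAR(rD) ≈ 0.6744, COV ≈ -0.1025 and ρ ≈ -0.41.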


🧠 Memory Hacks Summary

| Concept | Memory Trick |
|---|---|
| Hypothesis Testing | "Court trial - innocent until proven guilty" |
| Type I Error | "False alarm - innocent in jail" |
| Type II Error | "Missed detection - guilty goes free" |
| Alpha (α) | "How much false alarm risk we accept" |
| T-test | "Signal vs Noise radio" |
| Covariance | "Direction detector (with units)" |
| Correlation | "Covariance with training wheels (-1 to 1)" |
| Portfolio Risk | "Don't put all eggs in one basket" |

🎯 Quick Reference Formulas

Two-Sample T-Test:
T = |x̄₁ - x̄₂| / √(s₁²/n₁ + s₂²/n₂)

Covariance:
cov(x,y) = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / n

Pearson Correlation:
r(x,y) = cov(x,y) / (sₓ × sᵧ)

Variance of Sum:
var(x + y) = var(x) + var(y) + 2 × cov(x,y)

Return:
Return = (Price(t+1) + Dividend - Price(t)) / Price(t)

[Figure: Decision Flow Chart]


💡 Pro Tips for Exams

  1. Hypothesis Testing:

    • Always start with H₀ (null)
    • H₁ is what you're trying to prove
    • Be careful: "Don't reject" ≠ "Accept" (not rejecting H₀ only means the evidence was too weak, not that H₀ is proven true)
  2. Correlation:

    • Sign shows direction (+ or -)
    • Absolute value shows strength
    • r = 0 means NO LINEAR relationship (might be curved!)
  3. Common Mistakes:

    • Don't confuse correlation with causation
    • Don't forget to take square root when going from variance to SD
    • Remember covariance has units, correlation doesn't
  4. Calculator Check:

    • Correlation must be between -1 and 1
    • If you get r = 2.5, you made a mistake!

Good luck! Remember: Statistics is about making the best decision with incomplete information! 🎲📊