Location and Dispersion Metrics
🎯 What We're Learning Today
Main Topics:
- Review: Distribution shapes and central measures
- Location Metrics: Quartiles, Deciles, Percentiles
- Dispersion Metrics: How spread out is the data?
  - Range & Interquartile Range
  - Variance & Standard Deviation
Part 1: Quick Review - Distribution Shapes
The 5 Main Distribution Types:
Quick Reference Table:
| Distribution Type | Relationship | Visual |
|---|---|---|
| Normal (Bell) | Mode = Median = Mean | Perfect symmetry |
| Dual Peak | Two Modes, Median = Mean | Two humps 🐫 |
| Uniform | No clear mode, Median = Mean | Flat line |
| Positive Skew (Right) | Mode < Median < Mean | Tail → right |
| Negative Skew (Left) | Mean < Median < Mode | Tail ← left |
💡 Memory hack: "Mean follows the tail like a puppy!"
🧩 Classic Exam Question (Must Know!)
Q: "University graduate wages are positively skewed. Therefore, the percentage earning above average is greater than the percentage earning below average."
TRUE or FALSE?
Answer: FALSE!
Why?
- Positively skewed = tail extends to the right (high earners)
- Mean gets pulled UP by extreme high values
- Mean > Median
- Median splits data 50-50:
  - 50% below median
  - 50% above median
- Since Mean > Median:
  - MORE than 50% earn BELOW the mean
  - LESS than 50% earn ABOVE the mean
💡 Memory hack: "In positive skew, most people are below average because the rich people pull the average up!"
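The argument above can be checked numerically. The sketch below uses a small, made-up list of right-skewed wages (the numbers are hypothetical, not data from these notes) and counts how many values sit above and below the mean.

```python
from statistics import mean, median

# Hypothetical right-skewed wages (in thousands); invented for illustration only.
wages = [20, 22, 25, 27, 30, 32, 35, 40, 60, 150]

avg = mean(wages)
mid = median(wages)

# Share of people strictly above / strictly below the mean.
share_above_mean = sum(w > avg for w in wages) / len(wages)
share_below_mean = sum(w < avg for w in wages) / len(wages)

print(f"mean = {avg}, median = {mid}")
print(f"above mean: {share_above_mean:.0%}, below mean: {share_below_mean:.0%}")
```

In this sample the single very high wage drags the mean above the median, so only a minority of values end up above the mean.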
Part 2: Location Metrics
🎯 What Are Location Metrics?
Question they answer: "Where does a specific percentage of the data fall?"
Think of them as dividing lines in your data!
Quartiles (Q₁, Q₂, Q₃, Q₄)
Definition: Values that divide data into 4 equal parts (25% each)
The Four Quartiles:
| Quartile | Name | Meaning |
|---|---|---|
| Q₁ | First/Lower Quartile | 25% of data ≤ Q₁; 75% of data ≥ Q₁ |
| Q₂ | Second Quartile | = MEDIAN! 50% below, 50% above |
| Q₃ | Third/Upper Quartile | 75% of data ≤ Q₃; 25% of data ≥ Q₃ |
| Q₄ | Fourth Quartile | = MAXIMUM value; 100% of data ≤ Q₄ |
💡 Memory hack: Q₁, Q₂, Q₃, Q₄ = 25%, 50%, 75%, 100%
🔢 Calculating Quartiles - Discrete Variable
Example: Daily coffee consumption (200 patients)
| Cups (x) | f(x) | F(x) |
|---|---|---|
| 0 | 10 | 10 |
| 1 | 16 | 26 |
| 2 | 12 | 38 |
| 3 | 27 | 65 |
| 4 | 55 | 120 |
| 5 | 48 | 168 |
| 6 | 15 | 183 |
| 7 | 17 | 200 |
Step-by-Step Process:
Step 1: Calculate cumulative frequency F(x) ✓ (already done)
Step 2: Calculate the positions
For Q₁ (First Quartile):
$$\text{position} = \frac{n}{4} = \frac{200}{4} = 50$$
For Q₃ (Third Quartile):
$$\text{position} = \frac{3n}{4} = \frac{3 \times 200}{4} = 150$$
Step 3: Find the first x where F(x) exceeds each position
For Q₁ (position = 50):
- F(2) = 38 < 50
- F(3) = 65 > 50 ✓ (first time!)
- Q₁ = 3 cups
For Q₃ (position = 150):
- F(4) = 120 < 150
- F(5) = 168 > 150 ✓ (first time!)
- Q₃ = 5 cups
💡 Memory hack: "Keep climbing the F(x) stairs until you pass the target!"
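The stair-climbing rule can be sketched in a few lines of Python. The helper name `value_at_position` is made up for this illustration; it simply accumulates frequencies until F(x) passes the target position.

```python
def value_at_position(values, freqs, position):
    """Return the first value whose cumulative frequency F(x) exceeds `position`."""
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f
        if cumulative > position:
            return x
    return values[-1]  # position >= n: fall back to the maximum value

# Coffee data from the table above.
cups = [0, 1, 2, 3, 4, 5, 6, 7]
freq = [10, 16, 12, 27, 55, 48, 15, 17]
n = sum(freq)

q1 = value_at_position(cups, freq, n / 4)
q3 = value_at_position(cups, freq, 3 * n / 4)
print(q1, q3)
```

Other textbooks and software use slightly different quartile conventions, so results from a library function may not match this rule exactly.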
🔢 Calculating Quartiles - Continuous Variable
Example: Test scores (117 students)
| Scores (x) | f(x) | F(x) |
|---|---|---|
| 40-60 | 5 | 5 |
| 60-70 | 31 | 36 |
| 70-75 | 25 | 61 |
| 75-85 | 42 | 103 |
| 85-100 | 14 | 117 |
Calculate positions:
- Q₁ position: n/4 = 117/4 = 29.25
- Q₃ position: 3n/4 = 3(117)/4 = 87.75
Find the classes:
Q₁:
- F(60-70) = 36 > 29.25 ✓
- Q₁ is in class 60-70
Q₃:
- F(75-85) = 103 > 87.75 ✓
- Q₃ is in class 75-85
💡 Memory hack: For a continuous variable, just identify the CLASS, not an exact value (unless you interpolate).
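If an exact value is wanted, one common approach is linear interpolation inside the class, which assumes the observations are spread evenly across the class width. That assumption is not stated in these notes, and the function name `grouped_quantile` is hypothetical; treat this as a sketch rather than the required exam method.

```python
def grouped_quantile(classes, freqs, position):
    """Locate the class containing `position` and interpolate linearly inside it.

    classes: list of (lower, upper) bounds; freqs: matching class frequencies.
    Returns ((lower, upper), interpolated_value).
    """
    cumulative_before = 0
    for (lower, upper), f in zip(classes, freqs):
        if cumulative_before + f > position:
            # Fraction of the way through this class, assuming uniform spread.
            fraction = (position - cumulative_before) / f
            return (lower, upper), lower + fraction * (upper - lower)
        cumulative_before += f
    raise ValueError("position must be smaller than the total frequency")

# Test-score data from the table above.
classes = [(40, 60), (60, 70), (70, 75), (75, 85), (85, 100)]
freqs = [5, 31, 25, 42, 14]
n = sum(freqs)

q1_class, q1_value = grouped_quantile(classes, freqs, n / 4)
q3_class, q3_value = grouped_quantile(classes, freqs, 3 * n / 4)
print(q1_class, round(q1_value, 2))
print(q3_class, round(q3_value, 2))
```

The class returned matches the lookup done by hand; the interpolated number is only as good as the uniform-spread assumption.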
Deciles (P₁₀, P₂₀, ..., P₉₀)
Definition: Values that divide data into 10 equal parts (10% each)
Key Deciles:
| Decile | Symbol | Meaning |
|---|---|---|
| First Decile | P₁₀ | 10% below, 90% above |
| Second Decile | P₂₀ | 20% below, 80% above |
| Ninth Decile | P₉₀ | 90% below, 10% above |
Connection to Quartiles:
- P₂₅ = Q₁
- P₅₀ = Q₂ = Median
- P₇₅ = Q₃
Percentiles (Pz)
Definition: For any percentage z, Pz is the value where:
- z% of data is ≤ Pz
- (100-z)% of data is ≥ Pz
🔢 Calculating Percentiles - Discrete Variable
Example: Find P₆₅ (65th percentile) for coffee data
Step 1: F(x) already calculated ✓
Step 2: Calculate position
$$\text{position} = n \times \frac{z}{100} = 200 \times \frac{65}{100} = 130$$
Step 3: Find where F(x) first exceeds 130
| Cups (x) | f(x) | F(x) |
|---|---|---|
| 4 | 55 | 120 |
| 5 | 48 | 168 |
F(4) = 120 < 130 and F(5) = 168 > 130, so P₆₅ = 5 cups
💡 Memory hack: Formula is n × (z/100) or just n × z%
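The same lookup works for any percentile. As an alternative to a manual loop, the sketch below builds F(x) with `itertools.accumulate` and uses `bisect_right` to find the first cumulative frequency strictly above the position. The function name `percentile` is made up for this example.

```python
from bisect import bisect_right
from itertools import accumulate

# Coffee data from the quartile example.
cups = [0, 1, 2, 3, 4, 5, 6, 7]
freq = [10, 16, 12, 27, 55, 48, 15, 17]
cum_freq = list(accumulate(freq))  # F(x) for each value of x
n = cum_freq[-1]

def percentile(z):
    """Value of x where F(x) first exceeds n * z / 100 (valid for 0 <= z < 100)."""
    position = n * z / 100
    # bisect_right gives the index of the first F(x) strictly greater than position.
    return cups[bisect_right(cum_freq, position)]

p65 = percentile(65)
print(p65, percentile(25), percentile(50), percentile(75))
```

Calling it with 25 and 75 reproduces the quartiles found earlier, which is a quick way to confirm that quartiles are just special percentiles.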
🔢 Calculating Percentiles - Continuous Variable
Example: Find P₃₅ (35th percentile) for test scores (n = 117)
Step 1: Calculate position
$$\text{position} = n \times \frac{z}{100} = 117 \times \frac{35}{100} = 40.95$$
Step 2: Find the class
| Scores (x) | f(x) | F(x) |
|---|---|---|
| 40-60 | 5 | 5 |
| 60-70 | 31 | 36 |
| 70-75 | 25 | 61 |
F(60-70) = 36 < 40.95 and F(70-75) = 61 > 40.95, so P₃₅ is in class 70-75
Part 3: Dispersion Metrics
🎯 What is Dispersion?
Question: How SPREAD OUT is the data?
Two datasets, same mean = 9:
- Dataset 1: 9, 9, 9, 9, 9 (no spread!)
- Dataset 2: 1, 4, 9, 12, 19 (lots of spread!)
💡 Memory hack: Dispersion = "How scattered is the data?"
Measure 1: Range (R)
Definition: Difference between max and min
Formula:
$$R = x_{\max} - x_{\min}$$
Example: Coffee data
- xₘₐₓ = 7 cups
- xₘᵢₙ = 0 cups
- R = 7 - 0 = 7 cups
Characteristics:
✅ Very simple to calculate
✅ Easy to understand
❌ Affected by extreme outliers (one extreme value changes everything!)
❌ Ignores all middle values
💡 Memory hack: Range looks at the "edges" only, ignores the "middle"
Measure 2: Interquartile Range (IQR or Q)
Definition: Distance between Q₁ and Q₃ (covers the middle 50% of data)
Formula:
$$Q = Q_3 - Q_1$$
Example: Coffee data
- Q₃ = 5 cups
- Q₁ = 3 cups
- IQR = 5 - 3 = 2 cups
Characteristics:
✅ Not affected by extremes (only looks at middle 50%)
✅ Better than range for skewed data
❌ Ignores outer 50% of data
💡 Memory hack: IQR = "The middle stretch" where most normal people are
🧩 Special Cases - Range vs IQR
Case 1: Range = 0, what about IQR?
If Range = 0, every value is identical, so Q₁ = Q₃ and IQR must be 0 too.
Case 2: IQR = 0, what about Range?
Can be 0 or greater!
If all middle values are the same but extremes differ:
- IQR = 0
- Range > 0
Example data: 50, 70, 70, 70, 70, 70, 70, 70, 70, 90
- Range = 90 - 50 = 40
- Q₁ = 70, Q₃ = 70
- IQR = 0 (even though range ≠ 0!)
Why? Most data is at 70, so the middle 50% has no spread.
💡 Key insight: IQR focuses on the middle, Range looks at the extremes!
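Both special cases can be verified with a short script. This is a minimal sketch that reuses the "first F(x) above n/4 and 3n/4" rule from Part 2; the function name `range_and_iqr` and the all-70s dataset are invented for the illustration.

```python
from collections import Counter

def range_and_iqr(data):
    """Return (range, IQR) using the 'first F(x) above n/4 and 3n/4' rule."""
    counts = sorted(Counter(data).items())  # (value, frequency) pairs, ascending
    n = len(data)

    def value_at(position):
        cumulative = 0
        for value, f in counts:
            cumulative += f
            if cumulative > position:
                return value
        return counts[-1][0]

    q1, q3 = value_at(n / 4), value_at(3 * n / 4)
    return max(data) - min(data), q3 - q1

identical = [70] * 10               # Range = 0 forces IQR = 0
clustered = [50] + [70] * 8 + [90]  # IQR = 0 while Range > 0

print(range_and_iqr(identical))
print(range_and_iqr(clustered))
```

The second dataset shows why a zero IQR says nothing about the extremes: the two outer values change the range but never reach the quartile positions.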
IQR and Distribution Shape
Normal Distribution:
$$Q_3 - Q_2 = Q_2 - Q_1$$
(Symmetric - equal distances)
Positive Skew:
$$Q_3 - Q_2 > Q_2 - Q_1$$
(Upper half more spread than lower half)
Negative Skew:
$$Q_3 - Q_2 < Q_2 - Q_1$$
(Lower half more spread than upper half)
Measure 3: Variance (s²)
Definition: Average of squared deviations from the mean
Why squared? So negative deviations don't cancel positive ones!
Formula (Raw Data):
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
Plain English:
- Find the mean (x̄)
- For each value, compute (xᵢ - x̄)²
- Sum all squared differences
- Divide by n
💡 Memory hack: "Square the differences so negatives don't cancel!"
Example 1: No Spread
Data: 9, 9, 9, 9, 9
Step 1: Mean
$$\bar{x} = \frac{9 + 9 + 9 + 9 + 9}{5} = \frac{45}{5} = 9$$
Step 2: Calculate (xᵢ - x̄)²
| xᵢ | xᵢ - x̄ | (xᵢ - x̄)² |
|---|---|---|
| 9 | 0 | 0 |
| 9 | 0 | 0 |
| 9 | 0 | 0 |
| 9 | 0 | 0 |
| 9 | 0 | 0 |
| Sum | | 0 |
Step 3: Variance
$$s^2 = \frac{0}{5} = 0$$
Interpretation: No spread = variance is 0!
Example 2: With Spread
Data: 1, 4, 9, 12, 19
Step 1: Mean
$$\bar{x} = \frac{1 + 4 + 9 + 12 + 19}{5} = \frac{45}{5} = 9$$
Step 2: Calculate (xᵢ - x̄)²
| xᵢ | xᵢ - x̄ | (xᵢ - x̄)² |
|---|---|---|
| 1 | -8 | 64 |
| 4 | -5 | 25 |
| 9 | 0 | 0 |
| 12 | +3 | 9 |
| 19 | +10 | 100 |
| Sum | | 198 |
Step 3: Variance
$$s^2 = \frac{198}{5} = 39.6$$
Interpretation: Lots of spread = high variance!
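The two worked examples can be reproduced step by step in Python. The helper `variance_by_hand` is a made-up name that follows the recipe in these notes (divide by n); `statistics.pvariance` from the standard library computes the same population variance and serves as a cross-check.

```python
from statistics import mean, pvariance

def variance_by_hand(data):
    """Population variance: average squared deviation from the mean (divide by n)."""
    x_bar = mean(data)
    squared_devs = [(x - x_bar) ** 2 for x in data]
    return sum(squared_devs) / len(data)

no_spread = [9, 9, 9, 9, 9]
with_spread = [1, 4, 9, 12, 19]

print(variance_by_hand(no_spread), variance_by_hand(with_spread))
print(pvariance(no_spread), pvariance(with_spread))
```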
Formula (Frequency Table):
$$s^2 = \frac{1}{n}\sum_{x} f(x)\,(x - \bar{x})^2$$
Plain English: Weight each squared difference by its frequency!
💡 Memory hack: "If a value appears 5 times, its deviation counts 5 times!"
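As an illustration of the frequency-weighted version, the sketch below applies it to the coffee table from Part 2. The notes do not work this example themselves, so the printed figures are computed here rather than quoted from the source.

```python
# Coffee data: number of cups (x) and how many patients reported each value, f(x).
cups = [0, 1, 2, 3, 4, 5, 6, 7]
freq = [10, 16, 12, 27, 55, 48, 15, 17]

n = sum(freq)

# Frequency-weighted mean: each value counts f(x) times.
mean_cups = sum(f * x for x, f in zip(cups, freq)) / n

# Frequency-weighted variance: each squared deviation counts f(x) times.
variance_cups = sum(f * (x - mean_cups) ** 2 for x, f in zip(cups, freq)) / n

print(f"n = {n}, mean = {mean_cups:.2f}, variance = {variance_cups:.4f}")
```

Working from the table avoids expanding it into 200 individual observations, yet gives the same answer as the raw-data formula would.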
Measure 4: Standard Deviation (s)
Definition: Square root of variance
Formula:
$$s = \sqrt{s^2}$$
Why do we need it?
- Variance is in "squared units" (e.g., cups²)
- Standard deviation is in original units (e.g., cups)
- Easier to interpret!
Example: From Example 2
- s² = 39.6
- s = √39.6 ≈ 6.29
💡 Memory hack: "Standard deviation = variance in units I can understand!"
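One practical caution when checking this with software: these notes divide by n (the population formula), while many tools default to dividing by n - 1 (the sample formula). The sketch below, using only Python's standard library, shows both on the Example 2 data.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev

data = [1, 4, 9, 12, 19]

s_squared = pvariance(data)    # divides by n, like the formula in these notes
s = sqrt(s_squared)

print(round(s, 2))             # population standard deviation
print(round(pstdev(data), 2))  # same quantity straight from the standard library
print(round(stdev(data), 2))   # sample version: divides by n - 1, so it is larger
```

If a calculator or library gives a slightly larger number than the hand calculation, the n versus n - 1 convention is the first thing to check.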
Characteristics of Variance & Standard Deviation:
✅ Uses ALL data points
✅ Most widely used dispersion measure
✅ Foundation for advanced statistics
❌ Heavily affected by outliers (because we square deviations!)
✅ Always ≥ 0 (can be 0 only if all values are identical)
Linear Transformations & Dispersion
Rule 1: Adding/Subtracting a Constant (±a)
What happens?
If we add/subtract the same amount to every value:
- Mean changes: x̄' = x̄ ± a
- Variance UNCHANGED: s'² = s²
- Standard deviation UNCHANGED: s' = s
Why? Adding a constant shifts everything together - doesn't change spread!
Proof:
$$s'^2 = \frac{1}{n}\sum_{i=1}^{n}\big((x_i \pm a) - (\bar{x} \pm a)\big)^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = s^2$$
💡 Memory hack: "Shift everyone together = spread stays same!"
Visual:
Original: [1, 2, 3, 4, 5] → variance = 2
Add 10: [11, 12, 13, 14, 15] → variance still = 2
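Rule 2: Multiplying/Dividing by a Constant (×b or ÷b)
What happens?
If we multiply/divide every value by the same amount (dividing by b is the same as multiplying by 1/b):
- Mean changes: x̄' = b × x̄
- Variance changes: s'² = b² × s²
- Standard deviation changes: s' = |b| × s (which is just b × s when b > 0)
Proof:
$$s'^2 = \frac{1}{n}\sum_{i=1}^{n}(b x_i - b\bar{x})^2 = b^2 \cdot \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = b^2 s^2$$
💡 Memory hack:
- Variance gets multiplied by b²
- Standard deviation gets multiplied by |b|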
Transformation Summary:
| Transformation | Mean | Variance | Std Dev |
|---|---|---|---|
| x + a | x̄ + a | s² | s |
| x - a | x̄ - a | s² | s |
| b × x | b × x̄ | b² × s² | \|b\| × s |
| x ÷ b | x̄ ÷ b | s² ÷ b² | s ÷ \|b\| |
| a + bx | a + bx̄ | b² × s² | \|b\| × s |
🧮 Example: Grade Transformation
Original grades (7 students): 91, 77, 65, 83, 88, 71, 98
Given (round values assumed to keep the arithmetic simple; the exact mean of these seven grades is about 81.86 and the exact variance about 115.55):
- Mean: x̄ = 80
- Variance: s² = 120
- Standard deviation: s = √120 ≈ 10.95
Scenario 1: Add 2 points to everyone
New statistics:
- Mean: x̄' = 80 + 2 = 82
- Variance: s'² = 120 (unchanged!)
- Std Dev: s' = 10.95 (unchanged!)
Scenario 2: Multiply all grades by 1.05 (5% bonus)
New statistics:
- Mean: x̄' = 1.05 × 80 = 84
- Variance: s'² = (1.05)² × 120 = 1.1025 × 120 = 132.3
- Std Dev: s' = 1.05 × 10.95 ≈ 11.50
💡 Memory hack: "Add = shift only, Multiply = stretch everything!"
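The rules can also be checked on the seven listed grades themselves rather than on the assumed round figures (mean 80, variance 120). The helper names `summarize` and `transform` are made up for this sketch, and the multiplier of -1 is an extra case added here to show why the standard deviation scales by |b|.

```python
from statistics import mean, pstdev, pvariance

grades = [91, 77, 65, 83, 88, 71, 98]

def summarize(data):
    """Return (mean, population variance, population standard deviation)."""
    return mean(data), pvariance(data), pstdev(data)

def transform(data, a=0.0, b=1.0):
    """Apply the linear transformation a + b*x to every value."""
    return [a + b * x for x in data]

m, var, sd = summarize(grades)
m_add, var_add, sd_add = summarize(transform(grades, a=2))     # add 2 points
m_mul, var_mul, sd_mul = summarize(transform(grades, b=1.05))  # 5% bonus
m_neg, var_neg, sd_neg = summarize(transform(grades, b=-1))    # flip the sign

print(f"original:  mean={m:.2f}  var={var:.2f}  sd={sd:.2f}")
print(f"add 2:     mean={m_add:.2f}  var={var_add:.2f}  sd={sd_add:.2f}")
print(f"times 1.05: mean={m_mul:.2f}  var={var_mul:.2f}  sd={sd_mul:.2f}")
print(f"times -1:  mean={m_neg:.2f}  var={var_neg:.2f}  sd={sd_neg:.2f}")
```

The exact numbers differ from the rounded ones used in the scenarios, but the pattern is the same: adding shifts only the mean, while multiplying scales the variance by b² and the standard deviation by |b|.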
Quick Comparison Chart
| Measure | Formula | Units | Affected by Outliers? | Interpretation |
|---|---|---|---|---|
| Range | xₘₐₓ - xₘᵢₙ | Same as data | YES (very!) | Total spread |
| IQR | Q₃ - Q₁ | Same as data | NO | Middle 50% spread |
| Variance | Σ(xᵢ - x̄)²/n | Squared units | YES | Average squared deviation |
| Std Dev | √Variance | Same as data | YES | Typical deviation from mean |
🎯 Decision Guide: Which Measure?
General rules:
- Symmetric data, no outliers: Standard deviation
- Skewed or outliers: IQR
- Quick overview: Range
- Academic/scientific: Almost always variance/std dev
Key Formulas Summary
Location Metrics:
| Metric | Position Formula | Finding Method |
|---|---|---|
| Q₁ | n/4 | First F(x) > n/4 |
| Q₂ | n/2 | Median |
| Q₃ | 3n/4 | First F(x) > 3n/4 |
| Pz | n × z/100 | First F(x) > n × z% |
Dispersion Metrics:
| Metric | Formula |
|---|---|
| Range | R = xₘₐₓ - xₘᵢₙ |
| IQR | Q = Q₃ - Q₁ |
| Variance | s² = Σ(xᵢ - x̄)²/n |
| Std Dev | s = √s² |
| Variance (freq) | s² = Σ f(x)(x - x̄)²/n |
Transformations:
| Transform | Variance | Std Dev |
|---|---|---|
| ±a | No change | No change |
| ×b | ×b² | ×\|b\| |
💡 Master Memory Hacks
- Quartiles = Quarter marks at 25%, 50%, 75%, 100%
- Percentiles = Percent-tiles - the z% mark
- Range = Edge to edge (min to max)
- IQR = The middle crowd (ignores extremes)
- Variance = Squared differences (so negatives don't cancel)
- Std Dev = Variance in real units (take √ to fix squaring)
- Adding = shifts, no spread change
- Multiplying = stretches the spread
- Mean chases tail, outliers affect it
- IQR ignores tail, robust to outliers
⚠️ Common Exam Mistakes
❌ Computing IQR as Q₁ - Q₃ (it should be Q₃ - Q₁)
❌ Forgetting to square the deviations in variance
❌ Thinking adding a constant changes variance (it doesn't!)
❌ Using Range when outliers are present
❌ Multiplying variance by b instead of b² in a transformation (s'² = b² × s², not b × s²)
❌ Thinking positive skew means more above average (it's the opposite!)
Pro Exam Tips
- Quartiles? Calculate n/4 and 3n/4, then find in F(x)
- Percentiles? Calculate n × z%, then find in F(x)
- Variance question? Check if it's asking for raw or transformed data
- Linear transformation?
  - Adding/subtracting: dispersion unchanged
  - Multiplying: variance ×b², std dev ×|b|
- Outliers present? Use IQR, not range or std dev
- Skewed distribution? Remember the ordering of mean, median, and mode
- Always check units: variance in squared units, std dev in original units
Final tip: Distribution shape tells you about outliers AND about which measures to trust!