Location and Dispersion Metrics

🎯 What We're Learning Today

Main Topics:

  1. Review: Distribution shapes and central measures

  2. Location Metrics: Quartiles, Deciles, Percentiles

  3. Dispersion Metrics: How spread out is the data?

    • Range & Interquartile Range

    • Variance & Standard Deviation


Part 1: Quick Review - Distribution Shapes

The 5 Main Distribution Types:

Quick Reference Table:

| Distribution Type | Relationship | Visual |
|---|---|---|
| Normal (Bell) | Mode = Median = Mean | Perfect symmetry 🔔 |
| Dual Peak | Two Modes, Median = Mean | Two humps 🐫 |
| Uniform | No clear mode, Median = Mean | Flat line |
| Positive Skew (Right) | Mode < Median < Mean | Tail → right 📈 |
| Negative Skew (Left) | Mean < Median < Mode | Tail → left 📉 |

💡 Memory hack: "Mean follows the tail like a puppy!"


🧩 Classic Exam Question (Must Know!)

Q: "University graduate wages are positively skewed. Therefore, the percentage earning above average is greater than the percentage earning below average."

TRUE or FALSE?

Answer: FALSE!

Why?

  1. Positively skewed = tail extends to the right (high earners)

  2. Mean gets pulled UP by extreme high values

  3. Mean > Median

  4. Median splits data 50-50:

    • 50% below median

    • 50% above median

  5. Since Mean > Median:

    • MORE than 50% earn BELOW the mean

    • LESS than 50% earn ABOVE the mean

💡 Memory hack: "In positive skew, most people are below average because the rich people pull the average up!"
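
To see the argument with numbers, here is a minimal sketch using a made-up wage list (the values are hypothetical, chosen only so that one high earner creates a right tail). It checks that the mean exceeds the median and counts how many people fall on each side of the mean.

```python
from statistics import mean, median

# Hypothetical wages (in thousands); the single 184 creates the right tail.
wages = [28, 30, 31, 33, 35, 36, 38, 40, 45, 184]

avg = mean(wages)      # pulled up by the extreme value
mid = median(wages)    # middle of the sorted values

# Share of people strictly above / below the mean
share_above = sum(w > avg for w in wages) / len(wages)
share_below = sum(w < avg for w in wages) / len(wages)

print(f"mean = {avg}, median = {mid}")
print(f"above mean: {share_above:.0%}, below mean: {share_below:.0%}")
```

With these assumed wages the mean is 50 and the median is 35.5, so 9 of the 10 people earn below the mean.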


Part 2: Location Metrics

🎯 What Are Location Metrics?

Question they answer: "Where does a specific percentage of the data fall?"

Think of them as dividing lines in your data!


📊 Quartiles (Q₁, Q₂, Q₃, Q₄)

Definition: Values that divide data into 4 equal parts (25% each)

The Four Quartiles:

| Quartile | Name | Meaning |
|---|---|---|
| Q₁ | First/Lower Quartile | 25% of data ≤ Q₁; 75% of data ≥ Q₁ |
| Q₂ | Second Quartile | = MEDIAN! 50% below, 50% above |
| Q₃ | Third/Upper Quartile | 75% of data ≤ Q₃; 25% of data ≥ Q₃ |
| Q₄ | Fourth Quartile | = MAXIMUM value; 100% of data ≤ Q₄ |

💡 Memory hack: Q₁, Q₂, Q₃, Q₄ = 25%, 50%, 75%, 100%


🔢 Calculating Quartiles - Discrete Variable

Example: Daily coffee consumption (200 patients)

| Cups (x) | f(x) | F(x) |
|---|---|---|
| 0 | 10 | 10 |
| 1 | 16 | 26 |
| 2 | 12 | 38 |
| 3 | 27 | 65 |
| 4 | 55 | 120 |
| 5 | 48 | 168 |
| 6 | 15 | 183 |
| 7 | 17 | 200 |

Step-by-Step Process:

Step 1: Calculate cumulative frequency F(x) ✓ (already done)

Step 2: Calculate the positions

For Q₁ (First Quartile):

$$\frac{n}{4} = \frac{200}{4} = 50$$

For Q₃ (Third Quartile):

$$\frac{3n}{4} = \frac{3 \times 200}{4} = 150$$

Step 3: Find the first x where F(x) exceeds each position

For Q₁ (position 50):

  • F(2) = 38 < 50 (not yet)

  • F(3) = 65 > 50 ✓ (first time!)

  • Q₁ = 3 cups

For Q₃ (position 150):

  • F(4) = 120 < 150 (not yet)

  • F(5) = 168 > 150 ✓ (first time!)

  • Q₃ = 5 cups

💡 Memory hack: "Keep climbing the F(x) stairs until you pass the target!"
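
The three steps can be sketched in Python as follows. The helper name `quantile_from_table` and its list-of-(x, f) input are illustrative choices, not part of the course material; it applies the "first F(x) that exceeds the position" rule as stated in Step 3.

```python
def quantile_from_table(table, fraction):
    """Return the first x whose cumulative frequency F(x) exceeds n * fraction.

    `table` is a list of (x, f) pairs sorted by x (hypothetical input format).
    """
    n = sum(f for _, f in table)
    target = n * fraction              # Step 2: the position
    cumulative = 0
    for x, f in table:
        cumulative += f                # Step 1: running F(x)
        if cumulative > target:        # Step 3: first time F(x) passes the target
            return x
    return table[-1][0]                # fraction = 1: nothing exceeds n, use the maximum

coffee = [(0, 10), (1, 16), (2, 12), (3, 27), (4, 55), (5, 48), (6, 15), (7, 17)]

q1 = quantile_from_table(coffee, 0.25)
q3 = quantile_from_table(coffee, 0.75)
print(q1, q3)  # 3 5
```

When F(x) lands exactly on the position, some textbooks average two neighbouring values instead; this sketch keeps the strict "exceeds" rule used in these notes.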


🔢 Calculating Quartiles - Continuous Variable

Example: Test scores (117 students)

| Scores (x) | f(x) | F(x) |
|---|---|---|
| 40-60 | 5 | 5 |
| 60-70 | 31 | 36 |
| 70-75 | 25 | 61 |
| 75-85 | 42 | 103 |
| 85-100 | 14 | 117 |

Calculate positions:

  • Qโ‚ position: n/4 = 117/4 = 29.25

  • Qโ‚ƒ position: 3n/4 = 3(117)/4 = 87.75

Find the classes:

Qโ‚:

  • F(60-70) = 36 > 29.25 โœ“

  • Qโ‚ is in class 60-70

Qโ‚ƒ:

  • F(75-85) = 103 > 87.75 โœ“

  • Qโ‚ƒ is in class 75-85

๐Ÿ’ก Memory hack: For continuous, just identify the CLASS, not exact value (unless you interpolate).
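
If an exact value is wanted, one common option is linear interpolation inside the class. The sketch below is an assumption layered on top of the notes: it supposes scores are spread evenly within each class, and the function name `grouped_quantile` is made up for illustration.

```python
def grouped_quantile(classes, fraction):
    """Find the class holding the quantile, then interpolate linearly inside it.

    `classes` is a list of (lower, upper, f) tuples in ascending order.
    Assumes an even spread of values within each class.
    """
    n = sum(f for _, _, f in classes)
    target = n * fraction
    below = 0                                   # cumulative frequency before this class
    for lower, upper, f in classes:
        if below + f > target:                  # first class whose F exceeds the position
            estimate = lower + (target - below) / f * (upper - lower)
            return (lower, upper), estimate
        below += f
    raise ValueError("fraction must be less than 1")

scores = [(40, 60, 5), (60, 70, 31), (70, 75, 25), (75, 85, 42), (85, 100, 14)]

q1_class, q1_value = grouped_quantile(scores, 0.25)
q3_class, q3_value = grouped_quantile(scores, 0.75)
print(q1_class, round(q1_value, 2))  # (60, 70) 67.82
print(q3_class, round(q3_value, 2))  # (75, 85) 81.37
```

The class answers match the ones found by hand; the interpolated values are estimates only, since the table does not record individual scores.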


📊 Deciles (P₁₀, P₂₀, ..., P₉₀)

Definition: Values that divide data into 10 equal parts (10% each)

Key Deciles:

| Decile | Symbol | Meaning |
|---|---|---|
| First Decile | P₁₀ | 10% below, 90% above |
| Second Decile | P₂₀ | 20% below, 80% above |
| Ninth Decile | P₉₀ | 90% below, 10% above |

Connection to Quartiles:

  • P₂₅ = Q₁

  • P₅₀ = Q₂ = Median

  • P₇₅ = Q₃


📊 Percentiles (Pz)

Definition: For any percentage z, Pz is the value where:

  • z% of data is ≤ Pz

  • (100-z)% of data is ≥ Pz

🔢 Calculating Percentiles - Discrete Variable

Example: Find P₆₅ (65th percentile) for coffee data

Step 1: F(x) already calculated ✓

Step 2: Calculate position

$$n \times \frac{z}{100} = 200 \times \frac{65}{100} = 200 \times 0.65 = 130$$

Step 3: Find where F(x) first exceeds 130

| Cups (x) | f(x) | F(x) |
|---|---|---|
| 4 | 55 | 120 |
| 5 | 48 | 168 |

P₆₅ = 5 cups

💡 Memory hack: Formula is n × (z/100) or just n × z%
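
Another way to look at the same rule, sketched under the assumption that the table can be expanded back into its 200 individual observations: the first x with F(x) above the position is simply the value sitting at that (0-based) index in the sorted list. The function name `percentile` is illustrative.

```python
coffee = {0: 10, 1: 16, 2: 12, 3: 27, 4: 55, 5: 48, 6: 15, 7: 17}

# Expand the frequency table into the sorted list of all observations.
observations = [cups for cups, f in sorted(coffee.items()) for _ in range(f)]

def percentile(sorted_values, z):
    position = len(sorted_values) * z / 100              # n × z/100
    index = min(int(position), len(sorted_values) - 1)   # clamp so z = 100 gives the maximum
    return sorted_values[index]

p65 = percentile(observations, 65)
print(p65)  # 5
```

The same function reproduces the quartiles found earlier: `percentile(observations, 25)` gives 3 and `percentile(observations, 75)` gives 5.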


🔢 Calculating Percentiles - Continuous Variable

Example: Find P₃₅ (35th percentile) for test scores (n = 117)

Step 1: Calculate position

$$\frac{35}{100} \times 117 = 0.35 \times 117 = 40.95$$

Step 2: Find the class

| Scores (x) | f(x) | F(x) |
|---|---|---|
| 40-60 | 5 | 5 |
| 60-70 | 31 | 36 |
| 70-75 | 25 | 61 |

F(60-70) = 36 < 40.95 and F(70-75) = 61 > 40.95, so P₃₅ is in class 70-75


Part 3: Dispersion Metrics

🎯 What is Dispersion?

Question: How SPREAD OUT is the data?

Two datasets, same mean = 9:

  • Dataset 1: 9, 9, 9, 9, 9 (no spread!)

  • Dataset 2: 1, 4, 9, 12, 19 (lots of spread!)

💡 Memory hack: Dispersion = "How scattered is the data?"


📊 Measure 1: Range (R)

Definition: Difference between max and min

Formula:

$$R = x_{max} - x_{min}$$

Example: Coffee data

  • xₘₐₓ = 7 cups

  • xₘᵢₙ = 0 cups

  • R = 7 - 0 = 7 cups

🔍 Characteristics:

✓ Very simple to calculate

✓ Easy to understand

✗ Affected by extreme outliers (one extreme value changes everything!)

✗ Ignores all middle values

💡 Memory hack: Range looks at the "edges" only, ignores the "middle"


📊 Measure 2: Interquartile Range (IQR or Q)

Definition: Distance between Q₃ and Q₁ (covers the middle 50% of data)

Formula:

$$IQR = Q = Q_3 - Q_1$$

Example: Coffee data

  • Q₃ = 5 cups

  • Q₁ = 3 cups

  • IQR = 5 - 3 = 2 cups

🔍 Characteristics:

✓ Not affected by extremes (only looks at middle 50%)

✓ Better than range for skewed data

✗ Ignores outer 50% of data

💡 Memory hack: IQR = "The middle stretch" where the typical values sit
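
Both measures for the coffee data can be checked with a short sketch. It rebuilds F(x) with `itertools.accumulate` and reuses the "first F(x) above the position" rule; the helper name `first_exceeding` is made up for this example.

```python
from itertools import accumulate

cups = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [10, 16, 12, 27, 55, 48, 15, 17]
cum = list(accumulate(freqs))      # F(x): 10, 26, 38, 65, 120, 168, 183, 200
n = cum[-1]

def first_exceeding(target):
    # First x whose cumulative frequency is strictly greater than target
    return next(x for x, F in zip(cups, cum) if F > target)

# Only values that actually occur (f > 0) count toward the range
observed = [x for x, f in zip(cups, freqs) if f > 0]
data_range = max(observed) - min(observed)

iqr = first_exceeding(3 * n / 4) - first_exceeding(n / 4)

print(data_range, iqr)  # 7 2
```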


🧩 Special Cases - Range vs IQR

Case 1: Range = 0, what about IQR?

IQR must be 0 too! Range = 0 means every value is identical, so Q₁ = Q₃.

Case 2: IQR = 0, what about Range?

Can be 0 or greater!

If all middle values are the same but extremes differ:

  • IQR = 0

  • Range > 0

Data: 50, 70, 70, 70, 70, 70, 70, 70, 70, 90

  • Range = 90 - 50 = 40

  • Q₁ = 70, Q₃ = 70

  • IQR = 0 (even though range ≠ 0!)

Why? Most data is at 70, so the middle 50% has no spread.

💡 Key insight: IQR focuses on middle, Range looks at extremes!
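
A quick numeric check of the ten-value example, as a sketch that reads quartiles straight off the sorted list (index ⌊k·n/4⌋, which matches the "first F(x) above the position" rule). The `spread` helper is hypothetical.

```python
def spread(values):
    data = sorted(values)
    n = len(data)

    def quartile(k):
        # Value at 0-based index floor(k * n / 4), clamped to the last element
        return data[min(int(k * n / 4), n - 1)]

    return data[-1] - data[0], quartile(3) - quartile(1)   # (range, IQR)

mixed_range, mixed_iqr = spread([50, 70, 70, 70, 70, 70, 70, 70, 70, 90])
flat_range, flat_iqr = spread([70] * 10)

print(mixed_range, mixed_iqr)  # 40 0
print(flat_range, flat_iqr)    # 0 0
```

The second call shows the other direction: when the range is 0, the IQR has no room to be anything but 0.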


📊 IQR and Distribution Shape

Normal Distribution:

$$Q_3 - Q_2 = Q_2 - Q_1$$

(Symmetric - equal distances)

Positive Skew:

$$Q_3 - Q_2 > Q_2 - Q_1$$

(Upper half more spread than lower half)

Negative Skew:

$$Q_2 - Q_1 > Q_3 - Q_2$$

(Lower half more spread than upper half)
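
These comparisons can be turned into a rough shape check. The sketch below uses `statistics.quantiles` with `method="inclusive"`, which interpolates between data points and so is not identical to the F(x) rule used in these notes; the three sample datasets and the function name `quartile_shape` are made up for illustration.

```python
from statistics import quantiles

def quartile_shape(data):
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    upper, lower = q3 - q2, q2 - q1      # the two halves of the IQR
    if upper > lower:
        return "positive skew"
    if upper < lower:
        return "negative skew"
    return "symmetric"

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_tail = [1, 2, 3, 4, 5, 7, 10, 15, 30]
left_tail = [1, 16, 21, 24, 26, 27, 28, 29, 30]

print(quartile_shape(symmetric))   # symmetric
print(quartile_shape(right_tail))  # positive skew
print(quartile_shape(left_tail))   # negative skew
```

Treat the result as a hint about shape rather than a proof: real samples rarely give exactly equal distances even when the underlying distribution is symmetric.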


📊 Measure 3: Variance (s²)

Definition: Average of squared deviations from the mean

Why squared? So negative deviations don't cancel positive ones!

Formula (Raw Data):

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

Plain English:

  1. Find mean (x̄)

  2. For each value: (xᵢ - x̄)²

  3. Sum all squared differences

  4. Divide by n

💡 Memory hack: "Square the differences so negatives don't cancel!"


Example 1: No Spread

Data: 9, 9, 9, 9, 9

Step 1: Mean

$$\bar{x} = \frac{9+9+9+9+9}{5} = 9$$

Step 2: Calculate (xᵢ - x̄)²

| xᵢ | xᵢ - x̄ | (xᵢ - x̄)² |
|---|---|---|
| 9 | 0 | 0 |
| 9 | 0 | 0 |
| 9 | 0 | 0 |
| 9 | 0 | 0 |
| 9 | 0 | 0 |
| Sum | | 0 |

Step 3: Variance

$$s^2 = \frac{0}{5} = 0$$

Interpretation: No spread = variance is 0!


Example 2: With Spread

Data: 1, 4, 9, 12, 19

Step 1: Mean

$$\bar{x} = \frac{1+4+9+12+19}{5} = \frac{45}{5} = 9$$

Step 2: Calculate (xᵢ - x̄)²

| xᵢ | xᵢ - x̄ | (xᵢ - x̄)² |
|---|---|---|
| 1 | -8 | 64 |
| 4 | -5 | 25 |
| 9 | 0 | 0 |
| 12 | +3 | 9 |
| 19 | +10 | 100 |
| Sum | | 198 |

Step 3: Variance

$$s^2 = \frac{198}{5} = 39.6$$

Interpretation: Lots of spread = high variance!
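
Both examples can be reproduced with a few lines of Python. This is a minimal sketch of the divide-by-n formula above; `population_variance` is an illustrative name, and `statistics.pvariance` from the standard library is used only as a cross-check.

```python
from statistics import pvariance

def population_variance(values):
    mean = sum(values) / len(values)                   # Step 1
    squared_devs = [(x - mean) ** 2 for x in values]   # Step 2
    return sum(squared_devs) / len(values)             # Step 3: divide by n

flat = [9, 9, 9, 9, 9]
spread_out = [1, 4, 9, 12, 19]

var_flat = population_variance(flat)
var_spread = population_variance(spread_out)

print(var_flat, var_spread)       # 0.0 39.6
print(pvariance(spread_out))      # same result from the standard library
```

Note that `statistics.variance` (without the `p`) divides by n - 1 instead of n, so it would return 49.5 for the second dataset. These notes use the divide-by-n version throughout.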


Formula (Frequency Table):

$$s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i}$$

Plain English: Weight each squared difference by its frequency!

💡 Memory hack: "If a value appears 5 times, its deviation counts 5 times!"
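
The notes do not work this formula through, so here is a hedged sketch applying it to the coffee table from Part 2. The function name `stats_from_table` is hypothetical; the mean is also frequency-weighted, which the formula assumes implicitly.

```python
def stats_from_table(table):
    """Return (mean, variance) for a list of (x, f) pairs, dividing by n = sum of f."""
    n = sum(f for _, f in table)
    mean = sum(x * f for x, f in table) / n
    variance = sum(f * (x - mean) ** 2 for x, f in table) / n   # each deviation counted f times
    return mean, variance

coffee = [(0, 10), (1, 16), (2, 12), (3, 27), (4, 55), (5, 48), (6, 15), (7, 17)]

coffee_mean, coffee_var = stats_from_table(coffee)
print(round(coffee_mean, 2), round(coffee_var, 4))  # 3.95 3.1975
```

So the 200 patients drink 3.95 cups on average, with a variance of about 3.2 cups².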


📊 Measure 4: Standard Deviation (s)

Definition: Square root of variance

Formula:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

Why do we need it?

  • Variance is in "squared units" (e.g., cups²)

  • Standard deviation is in original units (e.g., cups)

  • Easier to interpret!

Example: From Example 2

  • s² = 39.6

  • s = √39.6 ≈ 6.29

💡 Memory hack: "Standard deviation = variance in units I can understand!"
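
A two-line check of the square root step, with `statistics.pstdev` (the divide-by-n standard deviation in the standard library) as a cross-check on the raw data from Example 2:

```python
import math
from statistics import pstdev

s = math.sqrt(39.6)                    # square root of the variance from Example 2
s_direct = pstdev([1, 4, 9, 12, 19])   # computed straight from the data

print(round(s, 2), round(s_direct, 2))  # 6.29 6.29
```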


🔍 Characteristics of Variance & Standard Deviation:

✓ Uses ALL data points

✓ Most widely used dispersion measure

✓ Foundation for advanced statistics

✗ Heavily affected by outliers (because we square deviations!)

Note: Always ≥ 0 (equals 0 only if all values are identical)


🔄 Linear Transformations & Dispersion

Rule 1: Adding/Subtracting a Constant (±a)

What happens?

If we add/subtract the same amount to every value:

  • Mean changes: x̄' = x̄ ± a

  • Variance UNCHANGED: s'² = s²

  • Standard deviation UNCHANGED: s' = s

Why? Adding a constant shifts everything together - doesn't change spread!

Proof:

$$s'^2 = \frac{\sum_{i=1}^{n} [(x_i \pm a) - (\bar{x} \pm a)]^2}{n} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = s^2$$

💡 Memory hack: "Shift everyone together = spread stays same!"

Visual:

Original: [1, 2, 3, 4, 5] → variance = 2

Add 10: [11, 12, 13, 14, 15] → variance still = 2

Rule 2: Multiplying/Dividing by a Constant (×b or ÷b)

What happens?

If we multiply/divide every value by the same amount:

  • Mean changes: x̄' = b × x̄

  • Variance changes: s'² = b² × s²

  • Standard deviation changes: s' = |b| × s (which is just b × s when b > 0)

Proof:

$$s'^2 = \frac{\sum_{i=1}^{n} [b x_i - b\bar{x}]^2}{n} = \frac{b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = b^2 s^2$$

$$s' = \sqrt{b^2 s^2} = |b| \cdot s$$

💡 Memory hack:

  • Variance gets multiplied by b²

  • Standard deviation gets multiplied by |b|


📋 Transformation Summary:

| Transformation | Mean | Variance | Std Dev |
|---|---|---|---|
| x + a | x̄ + a | s² | s |
| x - a | x̄ - a | s² | s |
| b × x | b × x̄ | b² × s² | ∣b∣ × s |
| x ÷ b | x̄ ÷ b | s² ÷ b² | s ÷ ∣b∣ |
| a + bx | a + bx̄ | b² × s² | ∣b∣ × s |
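
Both rules are easy to confirm numerically. The sketch below reuses the data from Example 2 with two arbitrary constants (a = 10 and a negative b = -2, chosen here to show why the standard deviation needs the absolute value).

```python
import math
from statistics import pvariance, pstdev

data = [1, 4, 9, 12, 19]
a, b = 10, -2                       # illustrative constants, not from the notes

shifted = [x + a for x in data]     # Rule 1: add a constant
scaled = [b * x for x in data]      # Rule 2: multiply by a constant

var, sd = pvariance(data), pstdev(data)

print(pvariance(shifted), var)              # identical: shifting leaves spread alone
print(pvariance(scaled), b ** 2 * var)      # variance scales by b squared
print(pstdev(scaled), abs(b) * sd)          # std dev scales by |b|, never negative
```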

🧮 Example: Grade Transformation

Original grades (7 students): 91, 77, 65, 83, 88, 71, 98

Given (round values assumed for easy arithmetic; the seven listed grades actually give x̄ ≈ 81.86 and s² ≈ 115.55):

  • Mean: x̄ = 80 (assume)

  • Variance: s² = 120 (assume)

  • Standard deviation: s = √120 ≈ 10.95

Scenario 1: Add 2 points to everyone

New statistics:

  • Mean: x̄' = 80 + 2 = 82

  • Variance: s'² = 120 (unchanged!)

  • Std Dev: s' = 10.95 (unchanged!)

Scenario 2: Multiply all grades by 1.05 (5% bonus)

New statistics:

  • Mean: x̄' = 1.05 × 80 = 84

  • Variance: s'² = (1.05)² × 120 = 1.1025 × 120 = 132.3

  • Std Dev: s' = 1.05 × 10.95 ≈ 11.50

💡 Memory hack: "Add = shift only, Multiply = stretch everything!"
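
The worked example uses round assumed values for the mean and variance. As a cross-check, this sketch computes the statistics directly from the seven listed grades (which come out slightly different from 80 and 120) and then applies both scenarios; the transformation rules hold either way.

```python
import math
from statistics import fmean, pvariance

grades = [91, 77, 65, 83, 88, 71, 98]

mean, var = fmean(grades), pvariance(grades)
print(round(mean, 2), round(var, 2))            # 81.86 115.55

plus_two = [g + 2 for g in grades]              # Scenario 1: add 2 points
bonus = [1.05 * g for g in grades]              # Scenario 2: 5% bonus

print(round(fmean(plus_two), 2), round(pvariance(plus_two), 2))   # mean + 2, same variance
print(round(fmean(bonus), 2), round(pvariance(bonus), 2))         # mean × 1.05, variance × 1.1025
```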


📊 Quick Comparison Chart

| Measure | Formula | Units | Affected by Outliers? | Interpretation |
|---|---|---|---|---|
| Range | xₘₐₓ - xₘᵢₙ | Same as data | YES (very!) | Total spread |
| IQR | Q₃ - Q₁ | Same as data | NO | Middle 50% spread |
| Variance | Σ(xᵢ - x̄)²/n | Squared units | YES | Average squared deviation |
| Std Dev | √Variance | Same as data | YES | Typical deviation from mean |

🎯 Decision Guide: Which Measure?

General rules:

  • Symmetric data, no outliers: Standard deviation

  • Skewed or outliers: IQR

  • Quick overview: Range

  • Academic/scientific: Almost always variance/std dev
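
To make the outlier advice concrete, the sketch below compares all three measures on a made-up dataset before and after one value is replaced by an extreme one. The data and the `spread_summary` helper are hypothetical, and the quartiles come from `statistics.quantiles` (inclusive method), which interpolates rather than using the F(x) rule from these notes.

```python
from statistics import pstdev, quantiles

def spread_summary(values):
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "std": pstdev(values),
    }

clean = [62, 65, 68, 70, 71, 73, 75, 78, 80]
with_outlier = clean[:-1] + [180]     # swap the largest value for an extreme one

clean_summary = spread_summary(clean)
outlier_summary = spread_summary(with_outlier)

print(clean_summary)
print(outlier_summary)
```

The range jumps from 18 to 118 and the standard deviation grows sharply, while the IQR stays at 7 in both cases.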


🎓 Key Formulas Summary

Location Metrics:

| Metric | Position Formula | Finding Method |
|---|---|---|
| Q₁ | n/4 | First F(x) > n/4 |
| Q₂ | n/2 | Median |
| Q₃ | 3n/4 | First F(x) > 3n/4 |
| Pz | n × z/100 | First F(x) > n × z% |

Dispersion Metrics:

| Metric | Formula |
|---|---|
| Range | R = xₘₐₓ - xₘᵢₙ |
| IQR | Q = Q₃ - Q₁ |
| Variance | $s^2 = \frac{\sum (x_i - \bar{x})^2}{n}$ |
| Std Dev | $s = \sqrt{s^2}$ |
| Variance (freq) | $s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}$ |

Transformations:

| Transform | Variance | Std Dev |
|---|---|---|
| ±a | No change | No change |
| ×b | ×b² | ×∣b∣ |

💡 Master Memory Hacks

  1. Quartiles = Quarter marks at 25%, 50%, 75%, 100%

  2. Percentiles = Percent-tiles - the z% mark

  3. Range = Edge to edge (min to max)

  4. IQR = The middle crowd (ignores extremes)

  5. Variance = Squared differences (so negatives don't cancel)

  6. Std Dev = Variance in real units (take √ to fix squaring)

  7. Adding = shifts, no spread change

  8. Multiplying = stretches the spread

  9. Mean chases tail, outliers affect it

  10. IQR ignores tail, robust to outliers


⚠️ Common Exam Mistakes

❌ Computing the IQR as Q₃ - Q₂ (it should be Q₃ - Q₁)

❌ Forgetting to square the deviations in variance

❌ Thinking adding a constant changes variance (it doesn't!)

❌ Using Range when outliers are present

❌ Multiplying variance by b instead of b² in a transformation (s'² = b² × s², not b × s²)

❌ Thinking positive skew means more above average (it's the opposite!)


🏆 Pro Exam Tips

  1. Quartiles? Calculate n/4 and 3n/4, then find in F(x)

  2. Percentiles? Calculate n × z%, then find in F(x)

  3. Variance question? Check if it's asking for raw or transformed data

  4. Linear transformation?

    • Adding/subtracting: dispersion unchanged

    • Multiplying: variance ×b², std dev ×|b|

  5. Outliers present? Use IQR, not range or std dev

  6. Skewed distribution? Remember the order: Mode < Median < Mean for positive skew, Mean < Median < Mode for negative skew

  7. Always check units: variance in squared units, std dev in original units

Final tip: Distribution shape tells you about outliers AND about which measures to trust!