Frequency Tables and Data Visualization

🎯 What We're Learning Today

Main Topics:

  1. How to organize messy data into frequency tables

  2. How to visualize data with graphs

  3. Different methods for different variable types


🗂️ What is a Frequency Table?

Imagine: You have 50 test scores written in random order on paper. How do you make sense of them?

Frequency Table = A table that lists each value (or group of values) in the data next to the number of times it occurs

Why Bother?

✓ Turns a large, messy data set into a clean, organized summary

✓ Helps you see patterns ("Most students scored 80-90")

✓ Foundation for later statistical analysis

✓ Helps you choose the right statistical methods

💡 Memory hack: Think of it like organizing your closet - instead of clothes everywhere, you sort by type (shirts, pants, shoes) and count how many of each you have.


🔀 The Golden Rule

Remember: Different variables → Different tables → Different graphs!


📊 Method 1: Qualitative Variables

Example: "Which social network would you keep?"

Raw data: 73 students answered: Instagram, Facebook, Instagram, Twitter, Instagram...

The Frequency Table:

| Social Network (x) | f(x) - Frequency | p(x) - Relative Frequency |
| --- | --- | --- |
| Instagram | 43 | 59% |
| Facebook | 16 | 22% |
| Twitter | 4 | 5% |
| TikTok | 2 | 3% |
| LinkedIn | 1 | 1% |
| None | 7 | 10% |
| Total | 73 | 100% |

📐 The Key Formula:

$$p(x) = \frac{f(x)}{n}$$

Where:

  • p(x) = relative frequency (percentage)

  • f(x) = absolute frequency (count)

  • n = total number of observations

Example calculation:

  • Instagram: p(x) = 43/73 ≈ 0.589 ≈ 59%

  • Facebook: p(x) = 16/73 ≈ 0.219 ≈ 22%

💡 Memory hack: p(x) is just "what PORTION out of everyone" - that's why it's always between 0% and 100%!
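
The calculation above can be sketched in a few lines of Python. This is only an illustration: the helper name `frequency_table` is made up here, and the list of 73 answers is rebuilt from the counts in the table because the full raw data is not shown.

```python
from collections import Counter

# Counts taken from the social-network table; the raw list of answers is
# reconstructed from these counts purely for illustration.
counts = {"Instagram": 43, "Facebook": 16, "Twitter": 4,
          "TikTok": 2, "LinkedIn": 1, "None": 7}
answers = [network for network, f in counts.items() for _ in range(f)]


def frequency_table(observations):
    """Return {category: (f, p)} where f is the count and p = f / n."""
    n = len(observations)
    freq = Counter(observations)
    # most_common() lists the categories from largest to smallest count
    return {x: (f, f / n) for x, f in freq.most_common()}


table = frequency_table(answers)
for x, (f, p) in table.items():
    print(f"{x:<10} f(x) = {f:>2}   p(x) = {p:.1%}")
```

The relative frequencies are kept as fractions and only formatted as percentages when printed, so they always sum to exactly 1 before rounding.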

📈 Graphs for Qualitative Variables:

Two options:

  1. Bar Chart - Each category gets its own bar

  2. Pie Chart - Shows proportions of a whole

When to use which?

  • Bar chart: When comparing categories (which is bigger?)

  • Pie chart: When showing parts of a whole (Instagram = 59% of the pie)

⚠️ Important: For qualitative variables, there is NO cumulative frequency! The categories have no natural order, so "up to Facebook" means nothing.


📊 Method 2: Discrete Quantitative Variables

Example: "Number of people in 10 families"

Raw data: 2, 2, 6, 5, 3, 5, 5, 4, 3, 2

Step 1: Count the frequencies

| People in Family (x) | f(x) - Frequency | p(x) - Relative Frequency | F(x) - Cumulative Frequency | F(x)/n - Relative Cumulative Frequency |
| --- | --- | --- | --- | --- |
| 2 | 3 | 30% | 3 | 30% |
| 3 | 2 | 20% | 5 | 50% |
| 4 | 1 | 10% | 6 | 60% |
| 5 | 3 | 30% | 9 | 90% |
| 6 | 1 | 10% | 10 | 100% |
| Total | 10 | 100% | - | - |

🔑 New Concept: Cumulative Frequency F(x)

What is F(x)? The number of observations with a value up to and including x.

Calculation:

  • F(2) = 3 (three families have 2 people)

  • F(3) = 3 + 2 = 5 (five families have up to 3 people)

  • F(4) = 3 + 2 + 1 = 6 (six families have up to 4 people)

  • And so on...

💡 Memory hack: F(x) is like climbing stairs - at each step you ADD the current frequency to the total from the steps below!
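
The "running total" idea maps directly onto `itertools.accumulate` from Python's standard library. The sketch below rebuilds the family-size table from the raw data; the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

family_sizes = [2, 2, 6, 5, 3, 5, 5, 4, 3, 2]  # raw data from the example

n = len(family_sizes)
freq = Counter(family_sizes)
values = sorted(freq)            # x must be in increasing order for F(x)
f = [freq[x] for x in values]    # absolute frequencies f(x)
F = list(accumulate(f))          # running total = cumulative frequency F(x)

for x, fx, Fx in zip(values, f, F):
    print(f"x = {x}  f = {fx}  p = {fx / n:.0%}  F = {Fx:>2}  F/n = {Fx / n:.0%}")
```

Sorting the values first matters: a running total only has the "up to and including x" meaning when the x values are in increasing order.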

Practice Questions:

Q1: How many families have at most 3 people?

  • Answer: F(3) = 5 families

Q2: How many families have at least 3 people?

  • Answer: Total - F(2) = 10 - 3 = 7 families

  • (Or: everyone EXCEPT those with only 2)
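
Both practice questions can be checked with a small sketch. The function names `at_most` and `at_least` are made up for this illustration.

```python
family_sizes = [2, 2, 6, 5, 3, 5, 5, 4, 3, 2]  # raw data from the example


def at_most(data, x):
    """F(x): number of observations less than or equal to x."""
    return sum(1 for value in data if value <= x)


def at_least(data, x):
    """Total minus the observations strictly below x."""
    return len(data) - sum(1 for value in data if value < x)


print(at_most(family_sizes, 3))   # Q1: families with at most 3 people
print(at_least(family_sizes, 3))  # Q2: families with at least 3 people
```

For whole-number data like this, "strictly below 3" is the same as "at most 2", which is why the hand calculation subtracts F(2).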

📈 Graphs for Discrete Quantitative:

Two main options:

  1. Bar Chart - Similar to qualitative, but order matters!

  2. Frequency Polygon - Connect the tops of bars with lines

Frequency Polygon shows: The general trend in the data (is it increasing? decreasing? where's the peak?)


📊 Method 3: Continuous Quantitative Variables

🎯 Why Group into Classes?

Problem: You have 72 students' heights: 167.3 cm, 172.8 cm, 169.5 cm...

If you list every single height, your table would be HUGE and useless!

Solution: Group into classes (ranges)

💡 Memory hack: Like organizing books by "100-200 pages", "200-300 pages" instead of listing every exact page count.


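Example 1: Student Heights (Uniform Class Width)

| Height in cm (x) | f(x) | Class Width (l) | Density (d) | p(x) | F(x) | F(x)/n |
| --- | --- | --- | --- | --- | --- | --- |
| 150-155 | 2 | 5 | 0.4 | 2.8% | 2 | 2.8% |
| 155-160 | 3 | 5 | 0.6 | 4.2% | 5 | 6.9% |
| 160-165 | 9 | 5 | 1.8 | 12.5% | 14 | 19.4% |
| 165-170 | 7 | 5 | 1.4 | 9.7% | 21 | 29.2% |
| 170-175 | 13 | 5 | 2.6 | 18.1% | 34 | 47.2% |
| 175-180 | 16 | 5 | 3.2 | 22.2% | 50 | 69.4% |
| 180-185 | 13 | 5 | 2.6 | 18.1% | 63 | 87.5% |
| 185-190 | 6 | 5 | 1.2 | 8.3% | 69 | 95.8% |
| 190-195 | 3 | 5 | 0.6 | 4.2% | 72 | 100% |
| Total | 72 | - | - | 100% | - | - |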

📐 Key Formulas for Continuous Variables:

1. Class Width (l):

$$l = l_1 - l_0$$

Where:

  • l₁ = upper limit of class

  • l₀ = lower limit of class

Example: Class 170-175

  • l = 175 - 170 = 5 cm

💡 Memory hack: Class width = how WIDE is the range?


2. Density (d):

$$d = \frac{f(x)}{l}$$

What is density? Frequency per unit of class width

Why do we need it? Because when class widths are different, you can't compare frequencies directly!

Example: Class 170-175

  • d = 13/5 = 2.6 students per cm

💡 Memory hack: Density = how PACKED/CROWDED is the class? Think of population density in cities!


3. Percentage Density (d%):

$$d\% = \frac{p(x)}{l}$$

Example: Class 170-175

  • d% = 18.1%/5 = 3.62% per cm
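
The three formulas can be applied to every class at once. The sketch below uses the class limits and frequencies from the student-heights table; the function name `describe` and the tuple layout are assumptions made for this illustration.

```python
# (lower limit, upper limit, frequency) for each class in the heights table
classes = [(150, 155, 2), (155, 160, 3), (160, 165, 9), (165, 170, 7),
           (170, 175, 13), (175, 180, 16), (180, 185, 13), (185, 190, 6),
           (190, 195, 3)]

n = sum(f for _, _, f in classes)  # total number of observations


def describe(lower, upper, f, n):
    """Return (l, d, p, d%) for one class."""
    width = upper - lower            # l = l1 - l0
    density = f / width              # d = f(x) / l
    p = f / n                        # p(x) = f(x) / n, as a fraction
    pct_density = 100 * p / width    # d% = p(x) / l, in % per cm
    return width, density, p, pct_density


for lower, upper, f in classes:
    l, d, p, d_pct = describe(lower, upper, f, n)
    print(f"{lower}-{upper}: l = {l}, d = {d:.1f}, p = {p:.1%}, d% = {d_pct:.2f}% per cm")
```

Note that the script divides the unrounded p(x) by l, so it reports 3.61% per cm for class 170-175, while the hand calculation starts from the rounded 18.1% and gets 3.62%.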

🎨 Histogram - The Main Graph for Continuous Variables

CRITICAL RULE: When class widths are DIFFERENT, you MUST use density (d) for the y-axis!

Why? Look at this example:

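Example 2: Test Scores (UNEVEN Class Width)

| Scores (x) | f(x) | l | d | p(x) | d% |
| --- | --- | --- | --- | --- | --- |
| 40-60 | 5 | 20 | 0.25 | 11.1% | 0.556% |
| 60-70 | 5 | 10 | 0.5 | 11.1% | 1.11% |
| 70-75 | 10 | 5 | 2 | 22.2% | 4.44% |
| 75-85 | 10 | 10 | 1 | 22.2% | 2.22% |
| 85-100 | 15 | 15 | 1 | 33.3% | 2.22% |
| Total | 45 | - | - | 100% | - |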

What happens if we use f(x) instead of d?

❌ WRONG: 85-100 would be drawn as the tallest bar (15 students), only because the class is wide

✓ CORRECT: 70-75 is actually the most DENSE class (2 students per point)

With density on the y-axis, the area of each rectangle (d × l) equals the class frequency f(x)!
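
A short sketch makes the difference concrete. It uses only the class limits and frequencies from the test-score example; the dictionary keys are illustrative names.

```python
# (lower, upper, frequency) from the test-score example with uneven widths
score_classes = [(40, 60, 5), (60, 70, 5), (70, 75, 10), (75, 85, 10), (85, 100, 15)]

rows = []
for lower, upper, f in score_classes:
    width = upper - lower
    density = f / width  # d = f(x) / l
    rows.append({"class": f"{lower}-{upper}", "f": f, "l": width, "d": density})

# Which class would be the tallest bar under each choice of y-axis?
tallest_by_f = max(rows, key=lambda r: r["f"])["class"]
tallest_by_d = max(rows, key=lambda r: r["d"])["class"]
print("Largest frequency:", tallest_by_f)
print("Largest density:  ", tallest_by_d)

# Bar area = height * width = d * l, which recovers f(x) for every class.
for r in rows:
    assert abs(r["d"] * r["l"] - r["f"]) < 1e-9
```

The two answers differ: the widest class collects the most students, but the narrow 70-75 class has the most students per point.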


📊 Histogram vs Frequency Polygon

| Feature | Histogram | Frequency Polygon |
| --- | --- | --- |
| Shape | Rectangles (bars) | Connected line |
| Shows | Exact frequencies by class | Overall trend/pattern |
| Use when | Want to see distribution | Want to see shape of data |

Frequency Polygon: Connect the midpoints of the top of each histogram bar

💡 Memory hack: Polygon = the "skyline" of your histogram!


📈 Cumulative Frequency Graph

What is it? A line graph of F(x) - how many observations are up to and including x

Key feature: Never goes down, and ends at n (or at 100% when the relative cumulative frequency F(x)/n is plotted)

Use case: "What percentage of students scored below 80?"
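
The points of a cumulative graph are just the class upper limits paired with F(x)/n. The sketch below uses the student-heights classes and reads off the share of students below 180 cm. It assumes the question falls exactly on a class limit; a value inside a class would need interpolation, which is not covered here.

```python
from itertools import accumulate

# Upper class limits and frequencies from the student-heights table
upper_limits = [155, 160, 165, 170, 175, 180, 185, 190, 195]
f = [2, 3, 9, 7, 13, 16, 13, 6, 3]

n = sum(f)
F = list(accumulate(f))          # cumulative frequency at each upper limit
rel_F = [Fx / n for Fx in F]     # relative cumulative frequency F(x)/n

# The curve never decreases and ends at 100%
assert all(a <= b for a, b in zip(F, F[1:]))
assert rel_F[-1] == 1.0

share_below_180 = rel_F[upper_limits.index(180)]
print(f"Share of students below 180 cm: {share_below_180:.1%}")
```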


🎯 Quick Decision Tree

[Decision tree diagram not available in this text version]


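📝 Complete Example Walkthrough

Problem: Manhattan temperatures over 24 days in October

| Temperature (°C) | f(x) | l | d | p(x) | d% |
| --- | --- | --- | --- | --- | --- |
| 5-8 | 4 | 3 | 1.33 | 16.7% | 5.6% |
| 8-11 | 8 | 3 | 2.67 | 33.3% | 11.1% |
| 11-14 | 6 | 3 | 2 | 25% | 8.3% |
| 14-17 | 5 | 3 | 1.67 | 20.8% | 6.9% |
| 17-20 | 1 | 3 | 0.33 | 4.2% | 1.4% |
| Total | 24 | - | - | 100% | - |

Step-by-step solution: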

Step 1: Calculate class width

  • All classes: l = 3°C (uniform!)

Step 2: Calculate p(x)

  • 5-8: p(x) = 4/24 = 16.7%

  • 8-11: p(x) = 8/24 = 33.3%

  • etc.

Step 3: Calculate density

  • 5-8: d = 4/3 = 1.33

  • 8-11: d = 8/3 = 2.67

  • etc.

Step 4: Calculate d%

  • 5-8: d% = 16.7%/3 = 5.6%

  • 8-11: d% = 33.3%/3 = 11.1%

  • etc.

Step 5: Draw histogram

  • Since widths are equal, can use f(x) OR d for y-axis

  • Highest bar: 8-11°C (8 days)

Step 6: Draw frequency polygon

  • Connect midpoints: 6.5°C, 9.5°C, 12.5°C, 15.5°C, 18.5°C
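
Steps 1 to 6 can be reproduced with a short script. This is a sketch using the class limits and frequencies from the temperature table; the dictionary keys are illustrative names, and it stops short of drawing the graphs.

```python
# (lower, upper, frequency) for each temperature class, in °C
temperature_classes = [(5, 8, 4), (8, 11, 8), (11, 14, 6), (14, 17, 5), (17, 20, 1)]

n = sum(f for _, _, f in temperature_classes)

table = []
for lower, upper, f in temperature_classes:
    l = upper - lower                        # Step 1: class width
    p = f / n                                # Step 2: relative frequency
    table.append({
        "class": f"{lower}-{upper}",
        "f": f,
        "l": l,
        "p": p,
        "d": f / l,                          # Step 3: density
        "d_pct": 100 * p / l,                # Step 4: percentage density
        "midpoint": (lower + upper) / 2,     # Step 6: x-position of polygon point
    })

# Step 5: with equal widths, f(x) and d give the same histogram shape
equal_widths = len({row["l"] for row in table}) == 1
highest_bar = max(table, key=lambda row: row["d"])
midpoints = [row["midpoint"] for row in table]

for row in table:
    print(f'{row["class"]:>5}  f = {row["f"]}  p = {row["p"]:.1%}  '
          f'd = {row["d"]:.2f}  d% = {row["d_pct"]:.1f}%')
print("Equal widths:", equal_widths)
print("Highest bar:", highest_bar["class"])
print("Polygon midpoints:", midpoints)
```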

🎓 Key Formulas Summary

| Formula | What it calculates | When to use |
| --- | --- | --- |
| p(x) = f(x)/n | Relative frequency | Any variable type |
| l = l₁ - l₀ | Class width | Continuous variables |
| d = f(x)/l | Density | Continuous with classes |
| d% = p(x)/l | Percentage density | Continuous with classes |
| F(x) = Σ f(xᵢ) over all xᵢ ≤ x | Cumulative frequency | Quantitative variables only |

💡 Memory Hacks Summary

  1. Frequency table = organized closet - everything sorted and counted

  2. p(x) = PORTION of the whole - always 0-100%

  3. F(x) = climbing STAIRS - keep adding as you go up

  4. Density = how PACKED/CROWDED - like population density

  5. At most = ≤ (use F(x) directly)

  6. At least = ≥ (use Total - F(previous value), e.g. Total - F(x-1) for whole-number data)

  7. Different widths = MUST use density! - Can't compare different-sized boxes


⚠️ Common Mistakes to Avoid

❌ Using cumulative frequency for qualitative variables

❌ Forgetting to use density when class widths differ

❌ Mixing up "at most" and "at least"

❌ Drawing histogram with f(x) when widths are unequal

❌ Forgetting that total p(x) must equal 100%


🎯 Quick Reference Chart

[Reference chart not available in this text version]


🏆 Pro Exam Tips

  1. First question: What type of variable? (This determines EVERYTHING)

  2. Check class widths: Equal or not? (Affects histogram)

  3. Density is your friend: When in doubt with continuous variables, calculate it

  4. Double-check totals: f(x) should sum to n, p(x) should sum to 100%

  5. Cumulative = running total: Keep adding!

  6. Area = frequency: In histograms, the area of bars represents frequency


Remember: The type of variable is the boss - it tells you what table and graph to use!