Research Methods Formulas
Complete reference guide for all formulas covered in the Research Methods course.
Frequency Tables and Data Organization
Relative Frequency
$$p(x) = \frac{f(x)}{n}$$
Explanation: The proportion of observations that fall into a specific category. Multiply by 100 to get percentage.
- p(x) = relative frequency (proportion)
- f(x) = absolute frequency (count)
- n = total number of observations
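As a quick illustration of the definition above (count divided by total), here is a minimal Python sketch. The function name `relative_frequencies` and the sample data are made up for this example.

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each distinct value x to p(x) = f(x) / n."""
    n = len(observations)
    counts = Counter(observations)  # f(x): absolute frequency of each value
    return {value: count / n for value, count in counts.items()}

grades = ["A", "B", "B", "C", "B", "A", "C", "B"]  # made-up sample, n = 8
p = relative_frequencies(grades)
for value in sorted(p):
    print(f"{value}: p(x) = {p[value]:.3f}, i.e. {p[value] * 100:.1f}%")
```

The relative frequencies always add up to 1 (100%), which is a handy check on a hand-built table.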
Class Width
$$l = l_2 - l_1$$
Explanation: The range covered by each class interval in a frequency table.
- l = class width
- l₂ = upper limit of class
- l₁ = lower limit of class
Density
$$d = \frac{f(x)}{l}$$
Explanation: Frequency per unit width. Essential when comparing classes with different widths in histograms.
- d = density
- f(x) = frequency
- l = class width
Percentage Density
$$d_{\%} = \frac{p(x)}{l}$$
Explanation: Relative frequency per unit width. Used for percentage-based density calculations.
- d% = percentage density
- p(x) = relative frequency
- l = class width
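The three quantities above (class width, density, percentage density) can be computed together. The sketch below is illustrative only: the class limits and frequencies are invented, and percentage density is taken as relative frequency divided by width, as the explanation describes.

```python
# Each class: (lower limit, upper limit, absolute frequency). Made-up data.
classes = [(0, 10, 20), (10, 20, 30), (20, 50, 50)]
n = sum(f for _, _, f in classes)  # total number of observations

def describe(lower, upper, f, n):
    width = upper - lower          # class width: upper limit minus lower limit
    density = f / width            # frequency per unit width
    pct_density = (f / n) / width  # relative frequency per unit width (assumed form)
    return width, density, pct_density

rows = [describe(lo, hi, f, n) for lo, hi, f in classes]
for (lo, hi, f), (width, d, d_pct) in zip(classes, rows):
    print(f"[{lo}, {hi}): f = {f}, l = {width}, d = {d:.3f}, d% = {d_pct:.4f}")
```

Note that the widest class has the largest frequency but the smallest density, which is why histograms with unequal widths plot density rather than frequency.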
Cumulative Frequency
$$F(x) = \sum_{x_i \le x} f(x_i)$$
Explanation: The running total of frequencies up to and including value x. Shows how many observations are "at most" x.
- F(x) = cumulative frequency
- f(x) = frequency at each value
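A running total like this is easy to check in code. The following sketch uses `itertools.accumulate` from the Python standard library on an invented frequency table.

```python
from itertools import accumulate

values = [1, 2, 3, 4, 5]   # sorted distinct values (made-up)
freqs = [3, 5, 8, 3, 1]    # f(x) for each value

cumulative = list(accumulate(freqs))  # F(x): running total of f(x)
for x, f, F in zip(values, freqs, cumulative):
    print(f"x = {x}: f(x) = {f}, F(x) = {F}")
```

The last cumulative frequency must equal n, the total number of observations.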
Measures of Central Location
Mean (Arithmetic Average)
Raw Data:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
From Frequency Table:
$$\bar{x} = \frac{1}{n} \sum_{i} f_i x_i = \sum_{i} p_i x_i$$
Explanation: The sum of all values divided by the count. The most common measure of central tendency. Uses all data points but is sensitive to outliers.
- x̄ = sample mean
- xᵢ = individual value
- n = number of observations
- fᵢ = frequency
- pᵢ = relative frequency
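The raw-data and frequency-table versions of the mean give the same number. A small sketch with made-up data shows the three equivalent computations (raw values, frequencies, relative frequencies).

```python
from collections import Counter

data = [2, 3, 3, 5, 5, 5, 7]  # made-up raw data
n = len(data)

mean_raw = sum(data) / n                                  # sum of values over n

freq = Counter(data)                                      # value -> frequency
mean_from_f = sum(f * x for x, f in freq.items()) / n     # weighted by frequency
mean_from_p = sum((f / n) * x for x, f in freq.items())   # weighted by relative frequency

print(mean_raw, mean_from_f, mean_from_p)
```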
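Median
For Odd n:
$$\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$$
For Even n:
$$\tilde{x} = \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right)$$
Explanation: The middle value when the data are sorted, where x_(k) denotes the k-th smallest value. Splits the data 50-50. Robust to outliers: not affected by extreme values.
- x̃ = median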
- n = number of observations
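The odd and even cases can be sketched as follows. The helper name `median` is ours; the result is compared against `statistics.median` from the standard library as a sanity check.

```python
import statistics

def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle values

print(median([5, 1, 3]))            # odd n
print(median([4, 1, 3, 2]))         # even n
print(median([1, 2, 3, 4, 1000]))   # an extreme value does not move the median
```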
Mode
Definition: The value (or category) that appears most frequently.
For Continuous Variables: Use the class with highest density (d), not frequency!
Explanation: Most common value. Can have multiple modes (bimodal, multimodal). Not affected by outliers but may not represent center well.
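To see why density rather than frequency decides the modal class for grouped continuous data, consider this sketch. The classes are invented and `modal_class` is a hypothetical helper name.

```python
# Each class: (lower limit, upper limit, absolute frequency). Made-up data.
classes = [(0, 10, 20), (10, 20, 30), (20, 50, 50)]

def modal_class(classes):
    """Return the class with the highest density (frequency / width)."""
    return max(classes, key=lambda c: c[2] / (c[1] - c[0]))

by_density = modal_class(classes)
by_frequency = max(classes, key=lambda c: c[2])  # the tempting but wrong choice

print("Highest density:  ", by_density)
print("Highest frequency:", by_frequency)
```

Here the class from 20 to 50 has the most observations only because it is three times as wide; per unit of width, the class from 10 to 20 is the most crowded.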
Measures of Dispersion
Range
$$R = x_{\max} - x_{\min}$$
Explanation: The difference between maximum and minimum values. Simple but heavily affected by outliers.
- R = range
- x_max = maximum value
- x_min = minimum value
Interquartile Range (IQR)
$$\mathrm{IQR} = Q_3 - Q_1$$
Explanation: The spread of the middle 50% of data. Robust to outliers: only uses values between the first and third quartiles.
- IQR = interquartile range
- Q₃ = third quartile (75th percentile)
- Q₁ = first quartile (25th percentile)
Variance
Raw Data:
From Frequency Table:
Explanation: Average of squared deviations from the mean. Measures spread but in squared units. Always ≥ 0. Squaring ensures negative deviations don't cancel positive ones.
- s² = sample variance
- xᵢ = individual value
- x̄ = sample mean
- n = number of observations
- fᵢ = frequency
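The variance formulas are not legible in this copy, so the divisor the course uses (n or n − 1) is an open question here. The sketch below computes both conventions on made-up data and checks them against `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n − 1), so either can be matched to the lecture notes.

```python
import statistics
from collections import Counter

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean (raw data)
squared_dev = sum((x - mean) ** 2 for x in data)

var_div_n = squared_dev / n                  # "average squared deviation"
var_div_n_minus_1 = squared_dev / (n - 1)    # unbiased sample variance

# Same sum from a frequency table: each squared deviation weighted by f
freq = Counter(data)
squared_dev_from_table = sum(f * (x - mean) ** 2 for x, f in freq.items())

print(var_div_n, statistics.pvariance(data))
print(var_div_n_minus_1, statistics.variance(data))
```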
Standard Deviation
$$s = \sqrt{s^2}$$
Explanation: Square root of variance. Returns to original units, making it easier to interpret than variance. Most widely used measure of dispersion.
- s = standard deviation
- s² = variance
Coefficient of Variation
$$CV = \frac{s}{\bar{x}}$$
Explanation: Relative measure of spread. Allows comparison between datasets with different units or scales. Lower CV = more homogeneous (less spread).
- CV = coefficient of variation
- s = standard deviation
- x̄ = mean
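Because CV is unit-free, it can compare spread across variables measured in different units. A minimal sketch with invented heights and weights follows; it uses `statistics.stdev` (the n − 1 convention) as an assumption, and `statistics.pstdev` can be swapped in if the course divides by n.

```python
import statistics

heights_cm = [160, 170, 180]  # made-up data
weights_kg = [50, 70, 90]     # made-up data

def cv(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

print(f"CV heights: {cv(heights_cm):.3f}")
print(f"CV weights: {cv(weights_kg):.3f}")
```

The weights are relatively more spread out than the heights, even though the two standard deviations (in kg and cm) cannot be compared directly.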
Location Metrics
Quartile Positions
First Quartile (Q₁):
Third Quartile (Q₃):
Explanation: Quartiles divide data into four equal parts. Q₂ is the median (50th percentile). Find the value where cumulative frequency F(x) first exceeds these positions.
- n = number of observations
Percentile Position
Explanation: Finds the position of the z-th percentile. The value where z% of data falls below and (100-z)% falls above.
- n = number of observations
- z = percentile (0-100)
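The exact position formula is not legible in this copy, so the sketch below ASSUMES the position of the z-th percentile is n · z / 100 and then applies the "first value whose cumulative frequency exceeds the position" rule. Other textbooks use slightly different position rules, so treat this as one possible reading rather than the course's definition.

```python
def percentile_value(values, z):
    """Value at the z-th percentile under the ASSUMED position rule n * z / 100."""
    ordered = sorted(values)
    n = len(ordered)
    position = n * z / 100  # assumption: the source's position formula is not shown
    # In sorted data the k-th value (1-based) has cumulative frequency of at least k,
    # so the first k with k > position is where F(x) first exceeds the position.
    for k, value in enumerate(ordered, start=1):
        if k > position:
            return value
    return ordered[-1]  # z = 100: nothing exceeds n, so return the maximum

data = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up data
q1 = percentile_value(data, 25)
q3 = percentile_value(data, 75)
iqr = q3 - q1
print(q1, q3, iqr)
```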
⭐ Standardization
Z-Score (Standard Score)
$$z = \frac{x_i - \bar{x}}{s}$$
Explanation: Number of standard deviations a value is from the mean. Standardizes data to mean=0 and SD=1, allowing comparison across different scales.
- z = z-score
- xᵢ = individual value
- x̄ = mean
- s = standard deviation
Properties:
- Mean of all z-scores = 0
- Standard deviation of all z-scores = 1
Reverse Z-Score
$$x_i = \bar{x} + z \cdot s$$
Explanation: Converts a z-score back to the original value. Useful for finding values at specific percentile positions.
- xᵢ = original value
- z = z-score
- x̄ = mean
- s = standard deviation
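The two properties of z-scores and the round trip back to the original values can be verified numerically. This sketch uses made-up data and `statistics.pstdev` (divide by n) as an assumption; the properties hold under either convention as long as the same one is used throughout.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
mean = statistics.mean(data)
s = statistics.pstdev(data)      # assumption: divide-by-n standard deviation

z = [(x - mean) / s for x in data]       # standardize each value
restored = [mean + zi * s for zi in z]   # reverse z-score

print("mean of z:", statistics.mean(z))
print("SD of z:  ", statistics.pstdev(z))
print("restored: ", restored)
```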
Linear Transformations
Adding/Subtracting a Constant
If zᵢ = xᵢ ± a:
$$\bar{z} = \bar{x} \pm a, \qquad s_z^2 = s_x^2, \qquad s_z = s_x$$
Explanation: Shifting all values by a constant changes the mean but NOT the variance or standard deviation. The spread remains unchanged.
- a = constant
- zᵢ = transformed values
Multiplying/Dividing by a Constant
If zᵢ = b × xᵢ:
$$\bar{z} = b\,\bar{x}, \qquad s_z^2 = b^2 s_x^2, \qquad s_z = |b|\, s_x$$
Explanation: Scaling all values by b multiplies the mean by b, the standard deviation by |b| (spread is never negative), and the variance by b².
- b = constant
- zᵢ = transformed values
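Both transformation rules can be checked on a small made-up dataset. A negative b is used on purpose to show that the standard deviation scales by the absolute value of the constant.

```python
import statistics

x = [1, 2, 3, 4, 5]  # made-up data
a, b = 10, -3        # arbitrary constants for the illustration

shifted = [xi + a for xi in x]  # add a constant
scaled = [b * xi for xi in x]   # multiply by a constant

print(statistics.mean(shifted), statistics.pvariance(shifted))  # mean shifts, variance unchanged
print(statistics.mean(scaled), statistics.pvariance(scaled))    # mean times b, variance times b squared
print(statistics.pstdev(scaled), abs(b) * statistics.pstdev(x)) # SD times |b|
```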
Relationships Between Variables
Covariance
Explanation: Measures how two variables move together. Positive = both increase together, negative = one increases while other decreases, zero = no linear relationship. Depends on units.
- cov(x,y) = covariance
- xᵢ, yᵢ = paired observations
- x̄, ȳ = means
Pearson Correlation Coefficient
$$r = \frac{\operatorname{cov}(x,y)}{s_x s_y}$$
Explanation: Standardized covariance. Always between -1 and +1. Sign shows direction, absolute value shows strength. Independent of units.
- r = correlation coefficient (-1 to +1)
- cov(x,y) = covariance
- sₓ, sᵧ = standard deviations
Interpretation:
- |r| = 0.00-0.10: Negligible
- |r| = 0.10-0.39: Weak
- |r| = 0.40-0.69: Medium
- |r| = 0.70-0.89: Strong
- |r| = 0.90-1.00: Very Strong
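A short sketch of covariance and correlation on made-up paired data follows. The covariance is shown under both divisor conventions (n and n − 1) because the course's choice is not legible here; the correlation is the same either way, since the divisor cancels. The `strength` helper encodes the interpretation bands above, with the boundary handling (for example, treating 0.395 as weak) being our assumption.

```python
import math

x = [1, 2, 3, 4, 5]  # made-up paired observations
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # cross-deviations
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

cov_div_n = s_xy / n                  # covariance, divide-by-n convention
cov_div_n_minus_1 = s_xy / (n - 1)    # covariance, divide-by-(n-1) convention
r = s_xy / math.sqrt(s_xx * s_yy)     # the divisor cancels in r

def strength(r):
    size = abs(r)
    if size < 0.10:
        return "Negligible"
    if size < 0.40:
        return "Weak"
    if size < 0.70:
        return "Medium"
    if size < 0.90:
        return "Strong"
    return "Very Strong"

print(cov_div_n, cov_div_n_minus_1, round(r, 4), strength(r))
```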
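Variance of a Sum
$$\operatorname{var}(x + y) = \operatorname{var}(x) + \operatorname{var}(y) + 2\operatorname{cov}(x,y)$$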
Explanation: When adding two variables, their combined variance includes individual variances plus twice their covariance. Important for portfolio risk analysis.
- var(x), var(y) = individual variances
- cov(x,y) = covariance
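The identity can be confirmed numerically. This sketch uses made-up data and the divide-by-n convention for both variance and covariance (an assumption; the identity also holds with n − 1 as long as both use the same divisor).

```python
x = [1, 2, 3, 4, 5]  # made-up paired observations
y = [2, 4, 5, 4, 5]
n = len(x)

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Covariance with divisor n; cov(u, u) is the variance of u."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)

total = [xi + yi for xi, yi in zip(x, y)]  # the summed variable x + y

lhs = cov(total, total)                          # var(x + y)
rhs = cov(x, x) + cov(y, y) + 2 * cov(x, y)      # var(x) + var(y) + 2 cov(x, y)
print(lhs, rhs)
```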
Regression Analysis
Simple Linear Regression Equation
$$y = \beta_0 + \beta_1 x$$
Explanation: Predicts dependent variable y from independent variable x using a straight line.
- y = dependent variable (predicted)
- β₀ = intercept (y when x=0)
- β₁ = slope (change in y per unit change in x)
- x = independent variable
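Slope (β₁)
$$\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Alternative form:
$$\beta_1 = \frac{\operatorname{cov}(x,y)}{\operatorname{var}(x)}$$
Explanation: Rate of change: how much y changes for each 1-unit increase in x. Positive = upward trend, negative = downward trend.
- β₁ = slope coefficient
- xᵢ, yᵢ = paired observations
- x̄, ȳ = means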
Intercept (β₀)
$$\beta_0 = \bar{y} - \beta_1 \bar{x}$$
Explanation: The predicted value of y when x = 0. Ensures the regression line passes through the point (x̄, ȳ).
- β₀ = intercept
- ȳ = mean of y
- β₁ = slope
- x̄ = mean of x
Residual (Error)
$$\varepsilon_i = y_i - \hat{y}_i$$
Where:
$$\hat{y}_i = \beta_0 + \beta_1 x_i$$
Explanation: The difference between actual and predicted values. Positive = point above line, negative = point below line. Sum of squared residuals is minimized in least squares regression.
- εᵢ = residual for observation i
- yᵢ = actual value
- ŷᵢ = predicted value
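Putting slope, intercept, and residuals together, a least squares fit can be sketched in a few lines. The data are made up; the slope is computed as the sum of cross-deviations over the sum of squared x-deviations, which equals cov(x,y)/var(x) under either divisor convention.

```python
x = [1, 2, 3, 4, 5]  # made-up paired observations
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Slope: cross-deviations over squared x-deviations
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
# Intercept: forces the line through (mean_x, mean_y)
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

print(f"y = {intercept:.2f} + {slope:.2f} x")
print("residuals:", [round(e, 2) for e in residuals])
```

Two useful checks: the residuals of a least squares line with an intercept sum to zero, and plugging in the mean of x returns the mean of y.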
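Sigma (Summation) Rules
Rule 1: Sum of a Constant
$$\sum_{i=1}^{n} c = n \cdot c$$
Explanation: Adding the same constant n times equals n multiplied by that constant.
Rule 2: Constant Times Variable
$$\sum_{i=1}^{n} c\,x_i = c \sum_{i=1}^{n} x_i$$
Explanation: You can factor out a constant from a summation.
Rule 3: Sum of Addition
$$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$$
Explanation: Sum of sums equals sum of each separately.
Rule 4: Sum of Multiplication (⚠️ Cannot Split!)
$$\sum_{i=1}^{n} x_i y_i \ne \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$$
Explanation: In general you CANNOT split multiplication! Must multiply first, then sum.
Rule 5: Sum of Squares (⚠️ Cannot Split!)
$$\sum_{i=1}^{n} x_i^2 \ne \left(\sum_{i=1}^{n} x_i\right)^2$$
Explanation: Square each value first, THEN sum. Not the other way around!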
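The five rules can be checked on tiny made-up lists. The first three are equalities; the last two show that splitting gives a different (wrong) number.

```python
x = [1, 2, 3]  # made-up values
y = [4, 5, 6]
c = 2
n = len(x)

rule1 = sum(c for _ in range(n)) == n * c                        # sum of a constant
rule2 = sum(c * xi for xi in x) == c * sum(x)                    # factor out a constant
rule3 = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)  # split an addition

sum_of_products = sum(xi * yi for xi, yi in zip(x, y))  # multiply first, then sum
product_of_sums = sum(x) * sum(y)                       # NOT the same thing

sum_of_squares = sum(xi ** 2 for xi in x)               # square first, then sum
square_of_sum = sum(x) ** 2                             # NOT the same thing

print(rule1, rule2, rule3)
print(sum_of_products, product_of_sums)
print(sum_of_squares, square_of_sum)
```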
💰 Financial Applications
Return Calculation
$$\text{Return} = \frac{\text{Price}(t+1) - \text{Price}(t) + \text{Dividend}}{\text{Price}(t)}$$
Explanation: Percentage gain or loss from an investment. Includes both price change and dividends.
- Price(t) = price at time t
- Price(t+1) = price at time t+1
- Dividend = dividend payment
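As an illustration, a one-period return can be computed as price change plus dividend, relative to the starting price. The function name `simple_return` and the numbers are invented; the result is a proportion, so multiply by 100 for a percentage.

```python
def simple_return(price_t, price_t1, dividend=0.0):
    """One-period return: (price change + dividend) / starting price."""
    return (price_t1 - price_t + dividend) / price_t

r = simple_return(price_t=100, price_t1=105, dividend=2)
print(f"Return: {r:.2%}")

loss = simple_return(price_t=50, price_t1=45)  # no dividend, price fell
print(f"Return: {loss:.2%}")
```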
🎯 Quick Reference Summary
| Category | Key Formulas |
|---|---|
| Central Location | Mean: x̄ = Σx/n, Median: middle value, Mode: most frequent |
| Dispersion | Range: max-min, IQR: Q₃-Q₁, Variance: s², SD: √s² |
| Standardization | Z-score: z = (x-x̄)/s, CV: s/x̄ |
| Relationships | Covariance: cov(x,y), Correlation: r = cov/(sₓsᵧ) |
| Regression | y = β₀ + β₁x, Slope: β₁ = cov/var(x), Intercept: β₀ = ȳ - β₁x̄ |
| Transformations | Add ±a: mean changes, variance unchanged. Multiply ×b: mean ×b, variance ×b² |
💡 Important Notes
- Always check units: Variance is in squared units, standard deviation in original units
- Use density (d) for continuous variables when class widths differ
- Correlation ≠ Causation: r measures association, not cause
- Extrapolation warning: Don't predict outside your data range
- Outliers affect: Mean and variance are sensitive, median and IQR are robust
Last Updated: Based on Lectures 1-7 of Research Methods course