
Research Methods Formulas

Complete reference guide for all formulas covered in the Research Methods course.


📊 Frequency Tables and Data Organization

Relative Frequency

$$p(x) = \frac{f(x)}{n}$$

Explanation: The proportion of observations that fall into a specific category. Multiply by 100 to get a percentage.

  • p(x) = relative frequency (proportion)
  • f(x) = absolute frequency (count)
  • n = total number of observations
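As a quick check of the formula, here is a minimal Python sketch that counts each category with the standard library's `collections.Counter` and divides by n. The function name `relative_frequencies` and the survey answers are hypothetical, chosen only for illustration.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return {category: p(x)} where p(x) = f(x) / n."""
    n = len(observations)              # total number of observations
    counts = Counter(observations)     # absolute frequency f(x) per category
    return {category: f / n for category, f in counts.items()}

# Hypothetical survey answers
answers = ["yes", "no", "yes", "yes", "undecided"]
p = relative_frequencies(answers)
print(p)                                             # {'yes': 0.6, 'no': 0.2, 'undecided': 0.2}
print({k: round(100 * v, 1) for k, v in p.items()})  # the same values as percentages
```

The relative frequencies always sum to 1 (or 100%), which is a useful sanity check on a hand-built table.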

Class Width

$$l = l_1 - l_0$$

Explanation: The range covered by each class interval in a frequency table.

  • l = class width
  • l₁ = upper limit of class
  • l₀ = lower limit of class

Density

$$d = \frac{f(x)}{l}$$

Explanation: Frequency per unit width. Essential when comparing classes with different widths in histograms.

  • d = density
  • f(x) = frequency
  • l = class width

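Percentage Density

$$d\% = \frac{p(x)}{l}$$

Explanation: Relative frequency per unit width. With p(x) expressed as a percentage, d% gives the percentage of observations per unit of width, so classes of unequal width can be compared on a relative scale.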

  • d% = percentage density
  • p(x) = relative frequency
  • l = class width
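The class width, density and percentage density can be computed together for a grouped table. This is a minimal sketch assuming each class is a hypothetical `(lower limit, upper limit, frequency)` tuple; the function name `class_densities` and the data are illustrative only.

```python
def class_densities(classes):
    """For each (l0, l1, f) class return (l, d, d%)."""
    n = sum(f for _, _, f in classes)   # total number of observations
    rows = []
    for l0, l1, f in classes:
        width = l1 - l0                 # class width l = l1 - l0
        d = f / width                   # density: frequency per unit width
        p_pct = 100 * f / n             # relative frequency p(x) as a percentage
        rows.append((width, d, p_pct / width))  # d%: percentage per unit width
    return rows

# Hypothetical classes of unequal width: (lower limit, upper limit, frequency)
classes = [(0, 10, 20), (10, 20, 30), (20, 50, 45)]
for (l0, l1, f), (width, d, d_pct) in zip(classes, class_densities(classes)):
    print(f"[{l0}, {l1}): f={f}, l={width}, d={d:.2f}, d%={d_pct:.2f}")
```

The widest class has the highest frequency (45) but the lowest density (1.5), which is why a histogram with unequal class widths should use density on the vertical axis.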

Cumulative Frequency

$$F(x) = \sum_{x_i \le x} f(x_i)$$

Explanation: The running total of frequencies up to and including value x. Shows how many observations are "at most" x.

  • F(x) = cumulative frequency
  • f(x) = frequency at each value
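The running total is exactly what `itertools.accumulate` produces. A minimal sketch, assuming the values are already sorted in ascending order; the household data are hypothetical.

```python
from itertools import accumulate

values = [0, 1, 2, 3]                  # hypothetical: children per household, ascending
freqs = [5, 8, 4, 3]                   # f(x) for each value
cumulative = list(accumulate(freqs))   # F(x): running total of the frequencies

for x, F in zip(values, cumulative):
    print(f"F({x}) = {F}")             # households with at most x children
```

The last cumulative frequency always equals n, the total number of observations.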

📈 Measures of Central Location

Mean (Arithmetic Average)

Raw Data: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$

From Frequency Table: $\bar{x} = \frac{\sum_{i=1}^{k} x_i \times f_i}{\sum_{i=1}^{k} f_i} = \sum_{i=1}^{k} x_i \times p_i$

Explanation: The sum of all values divided by the count. Most common measure of central tendency. Uses all data points but is sensitive to outliers.

  • x̄ = sample mean
  • xᵢ = individual value
  • n = number of observations
  • k = number of distinct values (rows) in the frequency table
  • fᵢ = frequency
  • pᵢ = relative frequency
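Both forms of the mean give the same result when the frequency table is built from the same data. A minimal sketch with hypothetical function names and data:

```python
def mean_raw(xs):
    """x̄ = (x1 + ... + xn) / n."""
    return sum(xs) / len(xs)

def mean_from_table(values, freqs):
    """x̄ = Σ x_i·f_i / Σ f_i (equivalently Σ x_i·p_i)."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)

raw = [2, 3, 3, 5, 7]                                 # hypothetical raw data
print(mean_raw(raw))                                  # 4.0
print(mean_from_table([2, 3, 5, 7], [1, 2, 1, 1]))    # 4.0 (same data as a table)
```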

Median

For Odd n: $\tilde{x} = x_{\frac{n+1}{2}}$

For Even n: $\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$

Explanation: The middle value when the data are sorted in ascending order; the subscripts above are positions in that sorted list. Splits the data 50-50. Robust to outliers: it does not depend on how extreme the smallest or largest values are.

  • x̃ = median
  • n = number of observations
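The two cases translate directly into code once the data are sorted. In this minimal sketch (hypothetical function name `median`), the positions from the formulas are shifted by one because Python lists are 0-indexed.

```python
def median(xs):
    s = sorted(xs)                           # the formulas assume sorted data
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]           # x_{(n+1)/2}, shifted to 0-based
    return (s[n // 2 - 1] + s[n // 2]) / 2   # average of x_{n/2} and x_{n/2+1}

print(median([7, 1, 3]))          # 3
print(median([7, 1, 3, 100]))     # 5.0
print(median([7, 1, 3, 1000]))    # 5.0: a more extreme outlier does not move it
```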

Mode

Definition: The value (or category) that appears most frequently.

For Continuous Variables: Use the class with highest density (d), not frequency!

Explanation: Most common value. Can have multiple modes (bimodal, multimodal). Not affected by outliers but may not represent center well.
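For discrete data, Python's `statistics.multimode` (Python 3.8+) returns every value tied for the highest count, which handles bimodal data. For grouped continuous data, the sketch below picks the class with the highest density instead; the classes are hypothetical `(lower limit, upper limit, frequency)` tuples.

```python
from statistics import multimode

# Discrete or categorical data: every value tied for the highest count
print(multimode(["red", "blue", "red", "green", "blue"]))   # ['red', 'blue'] (bimodal)
print(multimode([4, 4, 1, 9]))                              # [4]

# Grouped continuous data: hypothetical classes (lower limit, upper limit, frequency)
classes = [(0, 10, 20), (10, 20, 30), (20, 50, 45)]
modal_class = max(classes, key=lambda c: c[2] / (c[1] - c[0]))   # highest density d = f / l
print(modal_class)   # (10, 20, 30), although (20, 50, 45) has the highest frequency
```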


📏 Measures of Dispersion

Range

$$R = x_{max} - x_{min}$$

Explanation: The difference between maximum and minimum values. Simple but heavily affected by outliers.

  • R = range
  • x_max = maximum value
  • x_min = minimum value

Interquartile Range (IQR)

$$IQR = Q = Q_3 - Q_1$$

Explanation: The spread of the middle 50% of the data. Robust to outliers: it depends only on the first and third quartiles, so extreme values in the tails do not affect it.

  • IQR = interquartile range
  • Q₃ = third quartile (75th percentile)
  • Q₁ = first quartile (25th percentile)

Variance

Raw Data: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$

From Frequency Table: $s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i}$

Explanation: Average of squared deviations from the mean. Measures spread, but in squared units. Always ≥ 0. Squaring ensures negative deviations don't cancel positive ones.

  • s² = sample variance (divisor n, as in the formulas above)
  • xᵢ = individual value
  • x̄ = sample mean
  • n = number of observations
  • fᵢ = frequency
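A minimal sketch of both variance formulas, using the divisor n exactly as this guide does. The function names and data are hypothetical. For comparison, the standard library's `statistics.pvariance` also divides by n, while `statistics.variance` divides by n − 1, so the two give different answers on the same data.

```python
from statistics import pvariance

def variance_raw(xs):
    """s² = Σ(x_i - x̄)² / n (divisor n, as in this guide)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def variance_from_table(values, freqs):
    """s² = Σ f_i (x_i - x̄)² / Σ f_i for a frequency table."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]                                # hypothetical raw data
print(variance_raw(data))                                      # 4.0
print(variance_from_table([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))   # 4.0 (same data, grouped)
print(pvariance(data))                                         # stdlib, also divisor n
```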

Standard Deviation

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

Explanation: Square root of variance. Returns to original units, making it easier to interpret than variance. Most widely used measure of dispersion.

  • s = standard deviation
  • s² = variance

Coefficient of Variation

$$CV = \frac{s}{\bar{x}}$$

Explanation: Relative measure of spread. Because s and x̄ share the same units, CV is unitless, which allows comparison between datasets with different units or scales. Lower CV = more homogeneous (less spread relative to the mean).

  • CV = coefficient of variation
  • s = standard deviation
  • x̄ = mean
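The sketch below computes the standard deviation (divisor n) and the CV for two hypothetical datasets on different scales. It shows that a larger s does not necessarily mean more relative spread. The function names and data are illustrative assumptions.

```python
import math

def std_dev(xs):
    """s = sqrt(s²), with divisor n as in this guide."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def coefficient_of_variation(xs):
    return std_dev(xs) / (sum(xs) / len(xs))    # CV = s / x̄

# Two hypothetical datasets on different scales
group_a = [2, 4, 4, 4, 5, 5, 7, 9]           # mean 5, s = 2
group_b = [x * 10 + 50 for x in group_a]     # mean 100, s = 20
print(coefficient_of_variation(group_a))     # 0.4
print(coefficient_of_variation(group_b))     # 0.2: larger s, but less spread relative to the mean
```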

📍 Location Metrics

Quartile Positions

First Quartile (Q₁): $\text{Position} = \frac{n}{4}$

Third Quartile (Q₃): $\text{Position} = \frac{3n}{4}$

Explanation: Quartiles divide the data into four equal parts. Q₂ is the median (50th percentile). Find the value where the cumulative frequency F(x) first exceeds these positions.

  • n = number of observations

Percentile Position

$$\text{Position} = n \times \frac{z}{100}$$

Explanation: Gives the position of the z-th percentile in the sorted data. The z-th percentile is the value below which z% of the data fall and above which (100 − z)% fall.

  • n = number of observations
  • z = percentile (0-100)
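The position rule can be applied to a frequency table by scanning the cumulative frequencies. This is a minimal sketch of the rule as stated in this guide (first F(x) that exceeds n × z / 100); the function name `percentile_by_position` and the data are hypothetical. Software such as `statistics.quantiles` or NumPy's `percentile` uses interpolation rules, so its results can differ from this hand method.

```python
from itertools import accumulate

def percentile_by_position(values, freqs, z):
    """First x whose cumulative frequency F(x) exceeds the position n * z / 100."""
    n = sum(freqs)
    position = n * z / 100
    for x, F in zip(values, accumulate(freqs)):
        if F > position:
            return x
    return values[-1]          # z = 100: no F(x) exceeds n, so use the largest value

values = [1, 2, 3, 4, 5]       # hypothetical scores, sorted ascending
freqs = [3, 5, 6, 4, 2]        # n = 20, so F(x) = 3, 8, 14, 18, 20
q1 = percentile_by_position(values, freqs, 25)   # position 5
q3 = percentile_by_position(values, freqs, 75)   # position 15
print(q1, q3, q3 - q1)                           # Q1, Q3 and IQR: 2 4 2
```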

โญ Standardizationโ€‹

Z-Score (Standard Score)โ€‹

z=xiโˆ’xห‰sz = \frac{x_i - \bar{x}}{s}

Explanation: Number of standard deviations a value is from the mean. Standardizes data to mean=0 and SD=1, allowing comparison across different scales.

  • z = z-score
  • xᵢ = individual value
  • x̄ = mean
  • s = standard deviation

Properties:

  • Mean of all z-scores = 0
  • Standard deviation of all z-scores = 1
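Both properties can be checked numerically. A minimal sketch using the divisor n for s, as this guide does (with n − 1 the z-scores' SD computed with divisor n would not come out as exactly 1); the function name and data are hypothetical.

```python
import math

def z_scores(xs):
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)   # divisor n, as in this guide
    return [(x - mean) / s for x in xs]

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data: mean 5, s = 2
z = z_scores(data)
print(z)                                 # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(sum(z) / len(z))                   # mean of the z-scores: 0.0
print(sum(v * v for v in z) / len(z))    # variance (and SD) of the z-scores: 1.0
```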

Reverse Z-Score

$$x_i = \bar{x} + (z \times s)$$

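Explanation: Converts a z-score back to the original value. Useful for finding the value that lies a given number of standard deviations from the mean; when the data are approximately normal, this also gives the value at a specific percentile.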

  • xᵢ = original value
  • z = z-score
  • x̄ = mean
  • s = standard deviation
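A one-line worked example of the reverse conversion, assuming a hypothetical exam with mean 60 and standard deviation 8; `from_z` is an illustrative name.

```python
def from_z(z, mean, s):
    """x = x̄ + z * s: convert a z-score back to the original scale."""
    return mean + z * s

print(from_z(1.5, 60, 8))    # 72.0: 1.5 standard deviations above the mean
print(from_z(-2, 60, 8))     # 44: 2 standard deviations below the mean
```

Applying the z-score formula to the result, (72 − 60) / 8 = 1.5, recovers the original z.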

🔄 Linear Transformations

Adding/Subtracting a Constant

If zᵢ = xᵢ ± a:

$$\bar{z} = \bar{x} \pm a$$

$$s_z^2 = s_x^2$$

$$s_z = s_x$$

Explanation: Shifting all values by a constant changes the mean but NOT the variance or standard deviation. The spread remains unchanged.

  • a = constant
  • zᵢ = transformed values

Multiplying/Dividing by a Constant

If zᵢ = b × xᵢ (dividing by c is the case b = 1/c):

$$\bar{z} = b \times \bar{x}$$

$$s_z^2 = b^2 \times s_x^2$$

$$s_z = |b| \times s_x$$

Explanation: Scaling all values by b multiplies the mean by b and the standard deviation by |b| (a spread cannot be negative), while the variance is multiplied by b².

  • b = constant
  • zᵢ = transformed values
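Both transformation rules can be verified on a small dataset. A minimal sketch with a hypothetical helper `mean_sd` (divisor n), a shift of a = 10 and a scale factor of b = −3, chosen negative to show why the standard deviation uses |b|.

```python
import math

def mean_sd(xs):
    n = len(xs)
    m = sum(xs) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / n)   # divisor n

x = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical data: mean 5, s = 2
shifted = [xi + 10 for xi in x]          # z = x + a with a = 10
scaled = [-3 * xi for xi in x]           # z = b * x with b = -3

print(mean_sd(shifted))   # (15.0, 2.0): mean shifts by a, SD unchanged
print(mean_sd(scaled))    # (-15.0, 6.0): mean times b, SD times |b|, variance 36 = 9 * 4
```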

🔗 Relationships Between Variables

Covariance

$$cov(x,y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$

Explanation: Measures how two variables move together. Positive: both tend to increase together; negative: one tends to increase while the other decreases; zero: no linear relationship. Its magnitude depends on the units of x and y, so it is hard to interpret on its own.

  • cov(x,y) = covariance
  • xᵢ, yᵢ = paired observations
  • x̄, ȳ = means
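A minimal sketch of the covariance formula (divisor n) on hypothetical study-hours and exam-score data. Re-expressing the hours in minutes multiplies the covariance by 60, which illustrates the unit dependence described above.

```python
def covariance(xs, ys):
    """cov(x, y) with divisor n, as in this guide."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

hours = [1, 2, 3, 4, 5]          # hypothetical study hours
scores = [52, 55, 61, 64, 68]    # hypothetical exam scores
print(covariance(hours, scores))                      # 8.2: positive, they move together
print(covariance([h * 60 for h in hours], scores))    # 492.0: same data with hours in minutes
```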

Pearson Correlation Coefficient

$$r(x,y) = \frac{cov(x,y)}{s_x \times s_y}$$

Explanation: Standardized covariance. Always between -1 and +1. Sign shows direction, absolute value shows strength. Independent of units.

  • r = correlation coefficient (-1 to +1)
  • cov(x,y) = covariance
  • sₓ, sᵧ = standard deviations

Interpretation:

  • |r| = 0.00-0.10: Negligible
  • |r| = 0.10-0.39: Weak
  • |r| = 0.40-0.69: Medium
  • |r| = 0.70-0.89: Strong
  • |r| = 0.90-1.00: Very Strong
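A minimal sketch that computes r and labels it with the bands listed above. The handling of band boundaries (for example, treating exactly 0.10 as Weak) is an assumption, and the function names and data are hypothetical. Because the divisor n appears in both the covariance and the standard deviations, it cancels, so r is the same with n − 1.

```python
import math

def pearson_r(xs, ys):
    """r = cov(x, y) / (s_x * s_y), all with divisor n."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def strength(r):
    """Label |r| with the bands above (boundary handling is an assumption)."""
    a = abs(r)
    if a < 0.10:
        return "Negligible"
    if a < 0.40:
        return "Weak"
    if a < 0.70:
        return "Medium"
    if a < 0.90:
        return "Strong"
    return "Very Strong"

hours = [1, 2, 3, 4, 5]          # hypothetical study hours
scores = [52, 55, 61, 64, 68]    # hypothetical exam scores
r = pearson_r(hours, scores)
print(round(r, 3), strength(r))                               # 0.994 Very Strong
print(round(pearson_r([h * 60 for h in hours], scores), 3))   # unchanged when hours become minutes
```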

Variance of a Sum

$$var(x + y) = var(x) + var(y) + 2 \times cov(x,y)$$

Explanation: When adding two variables, their combined variance includes individual variances plus twice their covariance. Important for portfolio risk analysis.

  • var(x), var(y) = individual variances
  • cov(x,y) = covariance
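The identity can be checked numerically on two hypothetical series of returns (in percent). Here the covariance is negative, so the combined series is much less variable than either series alone, which is the intuition behind diversification.

```python
def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)          # divisor n

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)

# Hypothetical returns (%) of two assets over five periods
x = [2, -1, 3, 0, 1]
y = [1, 2, -2, 3, 1]
lhs = var([a + b for a, b in zip(x, y)])
rhs = var(x) + var(y) + 2 * cov(x, y)
print(lhs, rhs)   # equal up to floating-point rounding (about 0.8), while var(x) + var(y) is 4.8
```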

📈 Regression Analysis

Simple Linear Regression Equation

$$y = \beta_0 + \beta_1 x$$

Explanation: Predicts dependent variable y from independent variable x using a straight line.

  • y = dependent variable (predicted)
  • β₀ = intercept (y when x = 0)
  • β₁ = slope (change in y per unit change in x)
  • x = independent variable

Slope (β₁)

$$\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Alternative form: $\beta_1 = \frac{cov(x,y)}{var(x)}$

Explanation: Rate of change - how much y changes for each 1-unit increase in x. Positive = upward trend, negative = downward trend.

  • β₁ = slope coefficient
  • xᵢ, yᵢ = paired observations
  • x̄, ȳ = means

Intercept (β₀)

$$\beta_0 = \bar{y} - \beta_1 \bar{x}$$

Explanation: The predicted value of y when x = 0. Ensures the regression line passes through the point (x̄, ȳ).

  • β₀ = intercept
  • ȳ = mean of y
  • β₁ = slope
  • x̄ = mean of x
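A minimal least-squares sketch that computes the slope from the sums of deviations and then the intercept from the means. The function name `fit_line` and the data are hypothetical. Using cov(x, y) / var(x) instead gives the same slope, because the divisor n cancels.

```python
def fit_line(xs, ys):
    """Least-squares intercept β0 and slope β1 for y = β0 + β1·x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # Σ(x_i - x̄)(y_i - ȳ)
    sxx = sum((x - mx) ** 2 for x in xs)                      # Σ(x_i - x̄)²
    b1 = sxy / sxx          # slope
    b0 = my - b1 * mx       # intercept: puts (x̄, ȳ) on the line
    return b0, b1

hours = [1, 2, 3, 4, 5]            # hypothetical study hours (x)
scores = [52, 55, 61, 64, 68]      # hypothetical exam scores (y)
b0, b1 = fit_line(hours, scores)
print(f"y = {b0:.1f} + {b1:.1f}x")   # y = 47.7 + 4.1x
print(b0 + b1 * 3.5)                 # prediction for 3.5 hours, inside the data range
```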

Residual (Error)

$$\varepsilon_i = y_i - \hat{y}_i$$

Where: $\hat{y}_i = \beta_0 + \beta_1 x_i$

Explanation: The difference between actual and predicted values. Positive = point above line, negative = point below line. Sum of squared residuals is minimized in least squares regression.

  • εᵢ = residual for observation i
  • yᵢ = actual value
  • ŷᵢ = predicted value
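The sketch below fits a line to hypothetical data, computes the residuals, and checks that changing the slope raises the sum of squared residuals. The helper names are illustrative. For this data the residuals are about 0.2, −0.9, 1.0, −0.1 and −0.2, and they sum to zero (up to rounding), as least-squares residuals always do when the model has an intercept.

```python
def fit_line(xs, ys):
    """Least-squares intercept β0 and slope β1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1

def residuals(xs, ys, b0, b1):
    """ε_i = y_i - ŷ_i with ŷ_i = β0 + β1·x_i."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sse(xs, ys, b0, b1):
    """Sum of squared residuals: the quantity least squares minimizes."""
    return sum(e ** 2 for e in residuals(xs, ys, b0, b1))

hours = [1, 2, 3, 4, 5]            # hypothetical x values
scores = [52, 55, 61, 64, 68]      # hypothetical y values
b0, b1 = fit_line(hours, scores)
res = residuals(hours, scores, b0, b1)
print([round(e, 2) for e in res])
print(sse(hours, scores, b0, b1) < sse(hours, scores, b0, b1 + 0.1))   # True: another slope fits worse
```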

📝 Sigma (Summation) Rules

Rule 1: Sum of a Constant

$$\sum_{i=1}^{n} a = n \times a$$

Explanation: Adding the same constant n times equals n multiplied by that constant.


Rule 2: Constant Times Variable

$$\sum_{i=1}^{n} a \times x_i = a \times \sum_{i=1}^{n} x_i$$

Explanation: You can factor out a constant from a summation.


Rule 3: Sum of Addition

$$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$$

Explanation: The sum of a sum equals the sum of each part taken separately.


Rule 4: Sum of Multiplication (⚠️ Cannot Split!)

$$\sum_{i=1}^{n} (x_i \times y_i) \neq \sum_{i=1}^{n} x_i \times \sum_{i=1}^{n} y_i$$

Explanation: You CANNOT split multiplication! Must multiply first, then sum.


Rule 5: Sum of Squares (⚠️ Cannot Split!)

$$\sum_{i=1}^{n} x_i^2 \neq \left(\sum_{i=1}^{n} x_i\right)^2$$

Explanation: Square each value first, THEN sum. Not the other way around!
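All five rules can be checked on a tiny hypothetical dataset. Rules 1 to 3 hold as equalities, while the last two lines show how far apart the two sides of Rules 4 and 5 can be.

```python
x = [1, 2, 3]      # hypothetical values
y = [4, 5, 6]
a = 10             # a constant
n = len(x)

assert sum(a for _ in range(n)) == n * a                          # Rule 1: 30 == 30
assert sum(a * xi for xi in x) == a * sum(x)                      # Rule 2: 60 == 60
assert sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)    # Rule 3: 21 == 21
print(sum(xi * yi for xi, yi in zip(x, y)), sum(x) * sum(y))      # Rule 4: 32 vs 90
print(sum(xi ** 2 for xi in x), sum(x) ** 2)                      # Rule 5: 14 vs 36
```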


💰 Financial Applications

Return Calculation

$$\text{Return} = \frac{\text{Price}(t+1) + \text{Dividend} - \text{Price}(t)}{\text{Price}(t)}$$

Explanation: Proportional gain or loss from holding an investment from t to t+1, including both the price change and any dividend. Multiply by 100 to express it as a percentage.

  • Price(t) = price at time t
  • Price(t+1) = price at time t+1
  • Dividend = dividend payment
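A worked example of the return formula, assuming a hypothetical stock bought at 50 that trades at 53 one period later and paid a dividend of 1.00 in between. The function name `simple_return` is illustrative.

```python
def simple_return(price_t, price_t1, dividend=0.0):
    """Return = (Price(t+1) + Dividend - Price(t)) / Price(t)."""
    return (price_t1 + dividend - price_t) / price_t

r = simple_return(50, 53, 1.0)
print(f"{r:.2%}")                  # 8.00%: 3 from the price change plus 1 from the dividend, over 50
print(simple_return(100, 90))      # -0.1: a 10% loss with no dividend
```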

🎯 Quick Reference Summary

| Category | Key Formulas |
|---|---|
| Central Location | Mean: x̄ = Σx/n, Median: middle value, Mode: most frequent |
| Dispersion | Range: max − min, IQR: Q₃ − Q₁, Variance: s², SD: √s² |
| Standardization | Z-score: z = (x − x̄)/s, CV: s/x̄ |
| Relationships | Covariance: cov(x,y), Correlation: r = cov/(sₓsᵧ) |
| Regression | y = β₀ + β₁x, Slope: β₁ = cov/var(x), Intercept: β₀ = ȳ − β₁x̄ |
| Transformations | Add ±a: mean changes, variance unchanged. Multiply ×b: mean ×b, variance ×b² |

💡 Important Notes

  1. Always check units: Variance is in squared units, standard deviation in original units
  2. Use density (d) for continuous variables when class widths differ
  3. Correlation ≠ Causation: r measures association, not cause
  4. Extrapolation warning: Don't predict outside your data range
  5. Outliers affect: Mean and variance are sensitive, median and IQR are robust

Last Updated: Based on Lectures 1-7 of Research Methods course