Regression and Big Data Basics

Simple linear regression output, assumption checks, and introductory big-data concepts for Level I.

Level I regression is not a machine-learning competition. It is a structured way to describe how one variable changes with another and how much confidence you should place in that relationship. The exam usually tests interpretation: what the slope says, what the intercept means, whether the fit is useful, and what kind of assumption problem is present.

The Core Model

In simple linear regression, the basic model is:

$$ Y = b_0 + b_1 X + \varepsilon $$

You do not need to treat this as abstract notation. It simply says:

  • $b_0$ is the intercept, the model's estimate of $Y$ when $X = 0$
  • $b_1$ is the estimated change in $Y$ for a one-unit change in $X$
  • $\varepsilon$ is the error term: the part of $Y$ the model leaves unexplained, which the fitted residuals estimate
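The two coefficients above can be estimated with the usual least-squares formulas. A minimal sketch, using made-up illustrative data (the `x` and `y` values are hypothetical, not from the curriculum):

```python
# Ordinary least squares for y = b0 + b1*x, closed-form.

def ols_simple(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    b1 = cov_xy / var_x
    b0 = mean_y - b1 * mean_x  # the fitted line passes through the means
    return b0, b1

# Hypothetical sample: y rises by roughly 2 per unit of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_simple(x, y)
print(round(b0, 3), round(b1, 3))  # → 0.05 1.99
```

The slope comes out near 2 and the intercept near 0, matching how the sample was constructed.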

What The Output Is Really Saying

| Output item | Practical interpretation | Common trap |
| --- | --- | --- |
| Intercept | Estimated value of $Y$ when $X = 0$ | Assuming it is always economically meaningful |
| Slope coefficient | Direction and magnitude of the estimated relationship | Calling it causal automatically |
| $R^2$ | Fraction of variation in $Y$ explained by the model | Reading a high $R^2$ as proof of correctness |
| Standard error of estimate | Typical size of residual error | Ignoring it when the fit looks visually neat |
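Both fit statistics in the table come straight from the residuals. A sketch, assuming a hypothetical fitted line with $b_0 = 0.05$ and $b_1 = 1.99$ (all numbers illustrative):

```python
# R^2 and standard error of estimate (SEE) from observed vs fitted values.

y     = [2.1, 3.9, 6.2, 7.8, 10.1]       # observed (hypothetical)
y_hat = [2.04, 4.03, 6.02, 8.01, 10.00]  # fitted from b0=0.05, b1=1.99
n, k = len(y), 1                          # k = one independent variable

mean_y = sum(y) / n
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
sst = sum((yi - mean_y) ** 2 for yi in y)              # total variation
r_squared = 1 - sse / sst
see = (sse / (n - k - 1)) ** 0.5  # residual standard error, df = n - 2
print(round(r_squared, 4), round(see, 4))
```

Even with $R^2$ near 1, the SEE still tells you the typical size of a miss, which is why the table warns against ignoring it when the fit looks visually neat.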

Level I is especially likely to test the difference between association and causation. A regression can describe a relationship without proving why it exists.

Assumptions Matter Because Residual Problems Change The Conclusion

When assumptions break, the result may become less reliable even if the coefficient sign looks attractive. At Level I, the important habit is to recognize the type of problem:

  • heteroskedasticity means error variance is not constant
  • serial correlation means residuals are related across time
  • omitted-variable bias means the model left out a relevant driver
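Serial correlation, for example, is commonly screened with a Durbin-Watson style statistic on the residuals: values near 2 suggest little first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. A minimal sketch with a made-up residual series:

```python
# Durbin-Watson statistic on a residual series.

def durbin_watson(residuals):
    """Ratio of summed squared first differences to summed squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Alternating residuals (hypothetical) push the statistic well above 2,
# a sign of negative serial correlation.
dw = durbin_watson([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
print(round(dw, 2))  # → 3.33
```

The point for Level I is diagnostic, not corrective: a statistic far from 2 flags that the usual standard errors, and hence the reported significance, may be unreliable.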

You are usually not being asked to fix the full model. You are being asked to identify why the reported inference may be weaker than it first appears.

Where Big Data Enters At Level I

The curriculum’s big-data introduction is intentionally light. The exam usually wants vocabulary and judgment, not production data engineering. You should be ready to distinguish:

  • structured from unstructured data
  • supervised from unsupervised learning at a high level
  • more data from better data
  • prediction accuracy from economic usefulness

The strongest answers stay skeptical. Large data sets can reveal patterns, but they can also amplify bad assumptions, poor labels, and overfitting.
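Overfitting, the last item on that list, is easy to see in a toy setting. A sketch with entirely synthetic data: a "model" that memorizes its training sample fits it perfectly but generalizes poorly, while a plain linear rule does fine out of sample.

```python
# Toy overfitting illustration; all data points are synthetic.

train = {1.0: 2.0, 2.0: 4.1, 3.0: 5.9}  # x -> y pairs seen in training
test  = {4.0: 8.0, 5.0: 10.1}           # unseen x -> y pairs

def memorizer(x):
    # Looks up the memorized training answer; returns 0 for anything new.
    return train.get(x, 0.0)

def simple_rule(x):
    # Plain linear rule y ~ 2x, applied everywhere.
    return 2.0 * x

def mse(model, data):
    """Mean squared error of a model over a dict of x -> y pairs."""
    return sum((model(x) - y) ** 2 for x, y in data.items()) / len(data)

print(mse(memorizer, train), mse(simple_rule, train))  # memorizer wins in-sample
print(mse(memorizer, test), mse(simple_rule, test))    # but loses badly out of sample
```

In-sample error alone rewards memorization; judging a model on data it has not seen is what exposes the problem.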

How CFA-Style Questions Usually Test This

  • by asking what the slope coefficient means in plain language
  • by checking whether you confuse explanatory power with causation
  • by presenting a residual problem and asking what reliability concern it creates
  • by contrasting big-data promise with model-risk reality

Mini-Case

An analyst regresses stock returns on market returns and reports a positive slope with a high $R^2$. A weak candidate concludes the market return fully causes the stock return. A stronger candidate says the market return explains a meaningful share of observed variation in the sample, but that statement alone does not prove a complete causal mechanism.
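A useful fact behind this case: in simple linear regression, $R^2$ is just the squared sample correlation between $X$ and $Y$, a symmetric measure that carries no direction of causation. A sketch with hypothetical return series:

```python
# R^2 in simple regression equals the squared correlation of X and Y.

def correlation(x, y):
    """Sample correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical periodic returns; the stock tracks the market closely.
market = [0.01, -0.02, 0.03, 0.015, -0.01]
stock  = [0.012, -0.025, 0.035, 0.02, -0.008]
r = correlation(market, stock)
print(round(r ** 2, 3))  # squared correlation = the regression R^2
```

The statistic is identical whether you regress stock on market or market on stock, which is exactly why a high $R^2$ cannot by itself establish a causal direction.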

Common Traps

  • turning correlation into causation without justification
  • treating a high $R^2$ as proof that the model is correct
  • ignoring residual-pattern warnings because the coefficient sign matches intuition
  • assuming that more data automatically means less model risk

Sample CFA-Style Question

A candidate says a regression with a statistically significant positive slope proves that increases in $X$ cause increases in $Y$. What is the strongest reply?

Best answer: Statistical significance shows evidence of an association in the sample, not automatic proof of causation.

Why: Level I often checks whether you can interpret regression output without overstating what the model demonstrates.
