Sampling, confidence intervals, test decisions, p-values, and independence logic for Level I.
Inference is what turns a sample into a decision. Level I is usually not asking whether you can memorize a long test catalog. It is asking whether you understand the decision structure: what is being tested, what evidence you observed, and what conclusion the evidence actually supports.
A sample mean is not the population mean. It is an estimate, and the exam wants you to respect the uncertainty around that estimate.
A common building block is the standard error:
$$ SE = \frac{s}{\sqrt{n}} $$
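Here s is the sample standard deviation and n is the sample size. The interpretation matters more than the formula itself. Holding s fixed, a larger sample shrinks the standard error, but only with the square root of n: quadrupling the sample size halves the standard error. More uncertainty means a wider confidence interval and a weaker basis for sharp conclusions.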
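To make the link between the standard error and interval width concrete, here is a minimal Python sketch. The return series is hypothetical, the function names are illustrative, and the critical value 2.365 is the tabulated two-sided 95% t-value for 7 degrees of freedom, supplied by hand rather than computed.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(sample))


def confidence_interval(sample, critical_value):
    """Two-sided interval: sample mean +/- critical_value * SE."""
    mean = statistics.mean(sample)
    half_width = critical_value * standard_error(sample)
    return mean - half_width, mean + half_width


# Hypothetical monthly returns, for illustration only.
returns = [0.02, -0.01, 0.03, 0.01, 0.00, 0.02, -0.02, 0.04]

# Assumed critical value: t-table entry for 95% confidence, n - 1 = 7 df.
low, high = confidence_interval(returns, critical_value=2.365)
print(f"SE = {standard_error(returns):.4f}")
print(f"95% CI = ({low:.4f}, {high:.4f})")
```

A larger critical value (higher confidence) or a larger standard error both widen the interval, which is the trade-off the exam expects you to recognize.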
| Piece | What it means | Frequent trap |
|---|---|---|
| Null hypothesis | The baseline claim being challenged | Treating it as the claim you want to prove true |
| Alternative hypothesis | The competing claim | Forgetting the direction in one-tailed tests |
| Significance level | The tolerated Type I error rate | Calling it the probability the null is true |
| p-value | The probability, assuming the null is true, of a result at least as extreme as the one observed | Reading it as the probability the null is correct |
| Type I error | Rejecting a true null | Confusing it with power |
| Type II error | Failing to reject a false null | Treating “fail to reject” as “accept” |
The test statistic logic is conceptually simple:
$$ \text{test statistic} = \frac{\text{estimate} - \text{hypothesized value}}{SE} $$
You compare the observed evidence against a threshold: the test statistic against a critical value, or equivalently the p-value against the significance level. If the evidence is strong enough, you reject the null. If it is not, you fail to reject. That wording matters. Level I often penalizes candidates who overstate the conclusion.
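The decision structure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a full testing routine: the function names and the numbers are hypothetical, and the critical value is passed in from a table rather than computed.

```python
def compute_test_statistic(estimate, hypothesized_value, se):
    """(estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized_value) / se


def decide_two_tailed(stat, critical_value):
    """Reject only if the statistic falls beyond the critical value in either tail."""
    if abs(stat) > critical_value:
        return "reject the null"
    return "fail to reject the null"


# Hypothetical inputs: estimate of 1.2%, hypothesized value of 0, SE of 0.5%.
stat = compute_test_statistic(estimate=0.012, hypothesized_value=0.0, se=0.005)

# Assumed critical value: 1.96, the two-tailed 5% cutoff for a z-test.
print(f"test statistic = {stat:.2f}")
print(decide_two_tailed(stat, critical_value=1.96))
```

Note that the second branch returns "fail to reject," never "accept": the function encodes the same wording discipline the exam rewards.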
When the curriculum moves into parametric and non-parametric tests of independence, the exam is usually asking whether two variables move together in a statistically meaningful way or whether category membership appears related across groups.
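The important habit is to identify the data structure first: paired numeric observations point toward a correlation-based test, while counts of observations in categories point toward a contingency-table test of independence.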
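One way to see why the data structure drives the choice is to compute both statistics side by side. The sketch below assumes the standard textbook forms: a t-statistic for a sample correlation with n - 2 degrees of freedom, and a chi-square statistic built from observed and expected cell counts. The data and function names are hypothetical, and the comparison against a critical value is left to a table.

```python
import math


def pearson_r(x, y):
    """Sample correlation between two paired numeric series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_stat(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def chi_square_stat(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column membership were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Paired numeric data (hypothetical): use the correlation test.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
r = pearson_r(x, y)
print(f"r = {r:.2f}, t = {correlation_t_stat(r, len(x)):.2f}")

# Category counts (hypothetical 2x2 table): use the chi-square test.
stat, df = chi_square_stat([[10, 20], [20, 10]])
print(f"chi-square = {stat:.2f}, df = {df}")
```

In both cases the final step is the same as for any other test: compare the statistic with the critical value for the chosen significance level and word the conclusion accordingly.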
Suppose an analyst tests whether a manager’s mean excess return is greater than zero. The p-value is larger than the chosen significance level. A weak candidate says the manager has no skill. A stronger candidate says the sample did not provide enough evidence to reject the null at that significance level.
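The manager scenario can be sketched numerically. Everything below is hypothetical: the mean excess return, standard deviation, and sample size are invented for illustration, and the p-value uses a large-sample normal approximation instead of the t-distribution so that the example needs only the Python standard library.

```python
import math
from statistics import NormalDist


def one_tailed_p_value(sample_mean, hypothesized_mean, s, n):
    """Right-tail p-value for H0: mean <= hypothesized, Ha: mean > hypothesized.

    Assumption: a normal approximation to the sampling distribution.
    """
    se = s / math.sqrt(n)
    z = (sample_mean - hypothesized_mean) / se
    return 1.0 - NormalDist().cdf(z)


def conclusion(p_value, alpha):
    """Word the result the way the evidence supports, no stronger."""
    if p_value < alpha:
        return f"Reject the null at the {alpha:.0%} significance level."
    return (
        "The sample did not provide enough evidence to reject the null "
        f"at the {alpha:.0%} significance level."
    )


# Hypothetical inputs: 0.4% mean monthly excess return, 2.5% standard
# deviation, 36 monthly observations.
p = one_tailed_p_value(sample_mean=0.4, hypothesized_mean=0.0, s=2.5, n=36)
print(f"p-value = {p:.3f}")
print(conclusion(p, alpha=0.05))
```

With these inputs the p-value exceeds 5%, so the sketch reports a failure to reject. It does not, and cannot, report that the manager has no skill.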
That is the Level I habit: draw the conclusion the evidence supports, not the conclusion you wish the test had proven.
A test result is not statistically significant at the 5% level. Which interpretation is strongest?
Best answer: The sample does not provide enough evidence to reject the null hypothesis at the 5% significance level.
Why: Level I routinely tests the language of inference. “Fail to reject” is deliberately weaker than “accept.”