How this works
Standard deviation measures the spread of a dataset around its mean — how far, on average, individual values lie from that mean. A small standard deviation means the data is tightly clustered around the mean; a large one means it's spread out. Two datasets can have the same mean but very different shapes, and standard deviation is the most common single number used to describe that difference. The classic example: salaries at a startup with mean $80,000 — if the standard deviation is $5,000, everyone earns close to $80k; if it's $40,000, the founders make $200k while the early employees scrape by at $50k. The mean alone hides the inequality.
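A minimal sketch of that intuition in Python, using Python's built-in statistics module and two made-up salary lists with the same mean but very different spread (the numbers are invented for illustration):

```python
import statistics

# Hypothetical salary lists: both have a mean of $80,000.
tight = [75_000, 78_000, 80_000, 82_000, 85_000]
skewed = [50_000, 50_000, 50_000, 50_000, 200_000]

print(statistics.mean(tight), statistics.mean(skewed))  # 80000 80000
print(statistics.stdev(tight))   # ~3,808: everyone earns close to the mean
print(statistics.stdev(skewed))  # ~67,082: the mean hides the inequality
```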
There are two flavours of standard deviation, and choosing the right one matters. Population standard deviation (σ, divides by n) is used when you have data on every member of the group you care about — e.g. the heights of every player on one specific basketball team. Sample standard deviation (s, divides by n−1) is used when your dataset is a random sample from a larger population and you want to estimate the population's spread — e.g. measuring the heights of 100 randomly selected people to estimate the spread in the whole country. The n−1 (Bessel's correction) compensates for the fact that the sample mean is itself an estimate, which means naively dividing by n slightly understates the true population variance. Use the sample stdev for almost any inferential statistics work; use the population stdev when you genuinely have the whole population (rare in practice).
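As a quick illustration, Python's statistics module exposes the two flavours as separate functions:

```python
import statistics

heights_cm = [183, 190, 175, 198, 185]  # hypothetical heights

# Treat the list as the entire population (e.g. every player on one team):
print(statistics.pstdev(heights_cm))  # divides by n

# Treat the list as a random sample from a larger population:
print(statistics.stdev(heights_cm))   # divides by n - 1, slightly larger
```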
Variance is just standard deviation squared — same information, different units. Standard deviation is in the same units as the original data (dollars, seconds, kg), which makes it interpretable; variance is in squared units (dollars², seconds², kg²), which is mathematically convenient for many proofs and computations but harder to reason about. The empirical rule (68-95-99.7) gives an intuition for "normal" distributions: about 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3. So in our salary example, if salaries are normally distributed with mean $80k and stdev $10k, you'd expect ~68% of employees to make between $70k and $90k. Real-world data is often not perfectly normal (especially income, which is right-skewed), but the rule is a useful sanity check.
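Here is a small simulation of the empirical rule on normally distributed data (simulated salaries, not real figures), assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
salaries = rng.normal(loc=80_000, scale=10_000, size=100_000)  # simulated

mean, sd = salaries.mean(), salaries.std()
for k in (1, 2, 3):
    within = np.mean(np.abs(salaries - mean) <= k * sd)
    print(f"within {k} sd: {within:.1%}")  # roughly 68%, 95%, 99.7%
```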
The formula
Population standard deviation: σ = √( Σ(xᵢ − μ)² / n ). Sample standard deviation: s = √( Σ(xᵢ − x̄)² / (n − 1) ). Here xᵢ are the individual data points and n is the count of data points. μ (mu) is the population mean; x̄ (x-bar) is the sample mean — they're computed identically but conventionally use different symbols depending on whether the data is treated as a population or a sample. Σ means sum across all data points. The n−1 in the sample formula is Bessel's correction.
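A direct translation of the two formulas into Python (a sketch for clarity; in practice you would normally reach for the statistics module instead):

```python
import math

def population_stdev(xs):
    """sigma = sqrt( sum((x_i - mu)^2) / n )"""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

def sample_stdev(xs):
    """s = sqrt( sum((x_i - xbar)^2) / (n - 1) )"""
    xbar = sum(xs) / len(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))
```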
Example calculation
- Dataset: 4, 8, 6, 5, 3, 7. n = 6.
- Mean = (4 + 8 + 6 + 5 + 3 + 7) / 6 = 33 / 6 = 5.5.
- Squared deviations from the mean: (4−5.5)² = 2.25, (8−5.5)² = 6.25, (6−5.5)² = 0.25, (5−5.5)² = 0.25, (3−5.5)² = 6.25, (7−5.5)² = 2.25. Sum = 17.5.
- Population variance = 17.5 / 6 ≈ 2.917; population stdev = √2.917 ≈ 1.708. Sample variance = 17.5 / 5 = 3.5; sample stdev = √3.5 ≈ 1.871. The sample value is always slightly larger because of the n−1 denominator.
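The same numbers fall out of Python's statistics module, which serves as a check on the hand calculation above:

```python
import statistics

data = [4, 8, 6, 5, 3, 7]

print(statistics.mean(data))       # 5.5
print(statistics.pvariance(data))  # ~2.917
print(statistics.pstdev(data))     # ~1.708
print(statistics.variance(data))   # 3.5
print(statistics.stdev(data))      # ~1.871
```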
Frequently asked questions
Should I use the sample (n−1) or population (n) version?
Default to sample (n−1) unless you genuinely have data on every member of the group you care about. The n−1 formula compensates for the fact that the sample mean is itself an estimate, which means dividing by n slightly understates the true population variance. In practice: scientific analysis, A/B test results, polling data, machine-learning feature engineering, finance return analysis — all sample. Population formulas only apply if your dataset literally is the entire population (a class of 30 students, every game played by one team, every transaction in a closed period). When in doubt, use sample — the difference vanishes for large n anyway, and using sample when you should've used population is much less wrong than the reverse.
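A quick way to see how little the choice matters at large n (a sketch using randomly generated data; the ratio depends only on n):

```python
import random
import statistics

random.seed(42)

for n in (5, 50, 5000):
    xs = [random.gauss(0, 1) for _ in range(n)]
    s, sigma = statistics.stdev(xs), statistics.pstdev(xs)
    print(f"n={n}: sample stdev is {(s - sigma) / sigma:.2%} larger")
```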
How is standard deviation different from standard error?
Standard deviation describes the spread of individual data points within a single sample. Standard error describes the spread of sample means across many hypothetical samples — it answers "how much would the mean change if I drew a different random sample?". Mathematically, standard error = standard deviation / √n, so it shrinks as your sample gets bigger (more data, more confidence in the mean). They're used for different purposes: standard deviation when you want to describe variability in your data; standard error when you want to express uncertainty about an estimate (e.g. "the population mean is 50 ± 2", where 2 is the standard error). Confidence intervals and p-values are built from standard errors, not standard deviations.
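In code, the relationship is a one-liner (hypothetical measurement data):

```python
import math
import statistics

sample = [48, 52, 51, 47, 50, 49, 53, 50]  # hypothetical measurements

sd = statistics.stdev(sample)      # spread of the individual points (2.0)
se = sd / math.sqrt(len(sample))   # uncertainty about the mean (~0.71)
print(f"mean = {statistics.mean(sample)} +/- {se:.2f} (standard error)")
```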
Can standard deviation be negative?
No. Standard deviation is the square root of variance, and variance is a sum of squared deviations divided by a positive count — both intermediate quantities are always non-negative, so the result must be ≥ 0. Standard deviation is exactly 0 when every data point equals the mean (i.e. all values are identical, no variability). Negative results from a calculator always indicate an input or computation error. The "deviation" of an individual point from the mean (xᵢ − μ) can be negative or positive, but the standard deviation is the typical magnitude of those deviations, expressed as a positive number.
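Two tiny checks of this property (nothing special about the numbers chosen):

```python
import statistics

print(statistics.pstdev([7, 7, 7, 7]))  # 0.0: all values identical, no spread
print(statistics.pstdev([3, 5, 7, 9]))  # positive; the result is never < 0
```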