A2oz

How to Calculate Outliers Using Standard Deviation?

Published in Data Analysis 2 mins read

You can calculate outliers using standard deviation by employing the z-score method. This method involves calculating the z-score for each data point and then identifying those that fall outside a predefined range.

Understanding Z-Scores

A z-score represents how many standard deviations a data point is away from the mean. A positive z-score indicates the data point is above the mean, while a negative z-score indicates it is below the mean.
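Written as a formula, the definition above looks like the following. The symbols are notation assumed here for illustration: x is a data point, μ is the mean of the data set, and σ is its standard deviation.

```latex
z = \frac{x - \mu}{\sigma}
```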

Calculating Z-Scores

  1. Calculate the mean (average) of the data set.
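  2. Calculate the standard deviation of the data set. Use the population formula if the data covers the whole group of interest, or the sample formula if it is a sample from a larger group, and apply the same choice throughout.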
  3. For each data point, subtract the mean from the data point and then divide the result by the standard deviation. This gives you the z-score for that data point.
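The three steps above can be sketched in Python using the standard statistics module. This is a minimal illustration, not a definitive implementation: the function name z_scores and the sample values are made up for the example, and it assumes the population standard deviation (swap pstdev for stdev to use the sample formula).

```python
from statistics import mean, pstdev


def z_scores(data):
    """Return the z-score of every value in data (hypothetical helper)."""
    mu = mean(data)        # step 1: mean of the data set
    sigma = pstdev(data)   # step 2: population standard deviation (an assumption)
    if sigma == 0:
        # All values are identical, so the division in step 3 is undefined.
        raise ValueError("standard deviation is zero; z-scores are undefined")
    # Step 3: subtract the mean from each point and divide by the standard deviation.
    return [(x - mu) / sigma for x in data]


# Illustrative values only, not taken from the article.
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
```

The zero check matters in practice: a data set where every value is the same has no spread, so no z-score can be computed for it.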

Identifying Outliers

  1. Define a threshold. A common threshold is ±2 standard deviations from the mean. This means any data point with a z-score greater than 2 or less than -2 is considered an outlier.
  2. Compare z-scores to the threshold. Any data point with a z-score outside the defined threshold is identified as an outlier.
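The thresholding step can be sketched as a small Python function. The name find_outliers, the default threshold of 2, and the readings list are all hypothetical choices for this example, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev


def find_outliers(data, threshold=2.0):
    """Return the values whose absolute z-score exceeds threshold (hypothetical helper)."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # no spread in the data, so nothing can be flagged
    return [x for x in data if abs((x - mu) / sigma) > threshold]


# Illustrative values only, not taken from the article.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 40]
print(find_outliers(readings))                 # threshold of 2
print(find_outliers(readings, threshold=3.0))  # stricter threshold of 3
```

With a threshold of 2, the value 40 is flagged. With a threshold of 3, nothing is flagged, because the z-score of 40 in this small data set is just under 3. This shows how much the result depends on the threshold you choose.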

Example

Let's say you have a data set of test scores: 70, 75, 80, 85, 90, 95, 100, 105, 110, 120.

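  1. Mean: 93
  2. Standard Deviation: approximately 15.2 (population standard deviation)
  3. Z-score for 120: (120 - 93) / 15.2 ≈ 1.78

Since the z-score for 120 is about 1.78, which is within the ±2 threshold, it is not considered an outlier. In fact, no score in this data set is flagged at this threshold.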
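The figures for this data set can be checked by computing them directly from the listed scores. The following is a minimal sketch using Python's standard statistics module. The variable names are illustrative, and both the population and the sample standard deviation are computed so the two conventions can be compared.

```python
from statistics import mean, pstdev, stdev

scores = [70, 75, 80, 85, 90, 95, 100, 105, 110, 120]

mu = mean(scores)
sigma_pop = pstdev(scores)    # population standard deviation (divides by n)
sigma_sample = stdev(scores)  # sample standard deviation (divides by n - 1)

# Z-score of the highest score, 120, under each convention.
z_pop = (120 - mu) / sigma_pop
z_sample = (120 - mu) / sigma_sample

print(f"mean = {mu}")
print(f"population SD = {sigma_pop:.2f}, z-score of 120 = {z_pop:.2f}")
print(f"sample SD = {sigma_sample:.2f}, z-score of 120 = {z_sample:.2f}")
```

The sample standard deviation is always slightly larger than the population one, so it gives a slightly smaller z-score for the same data point.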

Practical Insights

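  • The choice of threshold depends on the specific data and application. In normally distributed data, about 95% of values fall within ±2 standard deviations of the mean and about 99.7% within ±3, so a threshold of ±3 flags far fewer points and may be more appropriate for large data sets.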
  • Outlier detection is an important step in data analysis, as outliers can significantly affect statistical calculations and model performance.
  • It's crucial to investigate the cause of outliers before removing them from the data set. Some outliers are due to errors in data collection or entry and can be corrected or removed, while others are genuine extreme values that may carry important information.
