Measurement of Dispersion of Data using Range, Variance and Standard Deviation

Vishal Kumar
3 min read · Jul 17, 2021

In data science, when we look at data we first measure its central tendency. As a second step we measure its dispersion; in other words, we try to find out how spread out the data is. There are a few techniques available for this. Before learning them you should know about the measures of central tendency (mean, median and mode). If you are not familiar with them, read up on those first.

Range: Range is the difference between the highest and lowest values of data.

E.g. Find the range of x = [12, 18, 13, 14, 15, 16, 13, 14, 21, 13]

Range = 21 - 12 = 9
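The same calculation can be sketched in a few lines of Python. This is only an illustration using the built-in max and min functions; the variable name data_range is my own choice.

```python
# Data from the example above
x = [12, 18, 13, 14, 15, 16, 13, 14, 21, 13]

# Range = highest value - lowest value
data_range = max(x) - min(x)

print(data_range)  # 9
```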

Variance: Variance is used to measure the dispersion of data points around their mean value. There are two kinds of variance:

Population Variance: Population variance is equal to the sum of squared differences between the observed values and the population mean, divided by the total number of observations. It is denoted by sigma squared (σ²).

σ² = Σ (x − μ)² / N

Formula courtesy: 365datascience

Here,
σ² represents population variance
x represents observed value
μ represents mean of observed values
N represents total number of observed values

E.g. Find the population variance of x = [21, 42, 37, 16, 31, 28, 33, 41, 12]
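Working this through by hand: the mean is 261 / 9 = 29, the squared differences from the mean add up to 920, and so the population variance is 920 / 9 ≈ 102.22. Below is a minimal Python sketch of the same steps. The variable names are my own, and statistics.pvariance from the standard library is used only as a cross-check.

```python
import math
import statistics

# Data from the example above, treated as the whole population
x = [21, 42, 37, 16, 31, 28, 33, 41, 12]

# Step 1: population mean
mu = sum(x) / len(x)

# Step 2: squared difference of every value from the mean
squared_diffs = [(xi - mu) ** 2 for xi in x]

# Step 3: divide the sum of squared differences by N
pop_var = sum(squared_diffs) / len(x)

print(mu)                 # 29.0
print(round(pop_var, 2))  # 102.22

# Cross-check against the standard library
assert math.isclose(pop_var, statistics.pvariance(x))
```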

Sample Variance: Sometimes the data set is so large that measuring the population variance becomes impractical. For such cases statisticians use the sample variance, which gives a good estimate of the dispersion of the data from a sample drawn from the whole population. Sample variance is equal to the sum of squared differences between the observed sample values and the sample mean, divided by the number of sample observations minus 1. It is denoted by s squared (s²).

s² = Σ (x − x̄)² / (n − 1)

Formula courtesy: 365datascience

Here,
s² represents sample variance
x represents observed values from sample
x̄ represents mean of observed values from sample
n represents total number of observed values from sample

E.g. Find the sample variance of x = [131, 148, 139, 142, 152]
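Working this through by hand: the sample mean is 712 / 5 = 142.4, the squared differences from the mean add up to 265.2, and dividing by n - 1 = 4 gives a sample variance of 66.3. Here is a small Python sketch of those steps. The variable names are my own, and statistics.variance from the standard library (which also divides by n - 1) is used only as a cross-check.

```python
import math
import statistics

# Data from the example above, treated as a sample
x = [131, 148, 139, 142, 152]

# Step 1: sample mean
x_bar = sum(x) / len(x)

# Step 2: squared difference of every value from the sample mean
squared_diffs = [(xi - x_bar) ** 2 for xi in x]

# Step 3: divide by n - 1 (not n) because this is a sample
sample_var = sum(squared_diffs) / (len(x) - 1)

print(x_bar)                 # 142.4
print(round(sample_var, 2))  # 66.3

# Cross-check against the standard library
assert math.isclose(sample_var, statistics.variance(x))
```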

Standard Deviation: Variance is a common measure of data dispersion, but in most cases the figure you obtain is fairly large. Moreover, it is hard to interpret because its unit is the square of the data's unit. To make it easier, we can take its square root and obtain a statistic known as the Standard Deviation, which is expressed in the same unit as the data.

Standard Deviation makes it easier to understand the data dispersion. As with variance, there are two kinds of standard deviation:

Population Standard Deviation: Population Standard Deviation is calculated by taking the square root of the population variance.

σ = √( Σ (x − μ)² / N )

Formula courtesy: 365datascience

Sample Standard Deviation: Sample Standard Deviation is calculated by taking the square root of the sample variance.

s = √( Σ (x − x̄)² / (n − 1) )

Formula courtesy: 365datascience
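As a rough illustration, both standard deviations can be computed in Python by taking the square root of the matching variance. The sketch below reuses the two data sets from the variance examples in this article; the variable names are my own, and the standard library functions statistics.pstdev and statistics.stdev are used only to confirm the results.

```python
import math
import statistics

# Data sets from the variance examples in this article
population = [21, 42, 37, 16, 31, 28, 33, 41, 12]
sample = [131, 148, 139, 142, 152]

# Standard deviation = square root of the corresponding variance
pop_std = math.sqrt(statistics.pvariance(population))
sample_std = math.sqrt(statistics.variance(sample))

print(round(pop_std, 2))     # 10.11
print(round(sample_std, 2))  # 8.14

# The library also provides both directly
assert math.isclose(pop_std, statistics.pstdev(population))
assert math.isclose(sample_std, statistics.stdev(sample))
```

Note that both results are in the same unit as the original data, which is what makes them easier to read than the variances (about 102.22 and 66.3).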

Thanks for reading the article. Give it a clap if it helped you understand the measurement of dispersion 😊

Vishal Kumar

Data Scientist, Data Science Enthusiast working on NLP, Knowledge Graphs and deep learning.