Statistics 101 統計學入門 --2


In this post, I'll share how to measure central tendency (集中趨勢).

When we order the values of a variable from least to greatest, we get a distribution of our values. As researchers, we want to tell others what the data look like. One aspect we are interested in is the central tendency, which can be described by the mean, the mode, and the median, among other measures.

Central Tendency

Median (中位數): The value that sits at the 50th percentile, meaning that half of the values are larger than the median and half are smaller.

Mode (眾數): Probably the statistic we use the least, since it does not provide much information. It only tells us which value appears most frequently, that is, which value has the highest frequency.

Mean (均值): Probably the statistic people find easiest to use. It summarizes our values in a single figure, but it tells us neither how the values are dispersed nor how many of them are close to the mean.
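To make the last point concrete, here is a small sketch with two made-up samples (the numbers are invented for illustration). Both have the same mean, yet one is tightly packed and the other is widely spread, so the mean alone cannot tell them apart.

```python
from statistics import mean

# Two hypothetical samples, invented for this illustration.
tight = [49, 50, 50, 50, 51]   # values packed close together
wide = [0, 25, 50, 75, 100]    # values spread far apart

# Both samples share the same mean...
mean_tight = mean(tight)
mean_wide = mean(wide)

# ...but the gap between the largest and smallest value is very different.
spread_tight = max(tight) - min(tight)
spread_wide = max(wide) - min(wide)

print("means:", mean_tight, mean_wide)
print("max - min:", spread_tight, spread_wide)
```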

Before we start calculating the mean, let's recall the difference between a population and a sample.

Remember population and sample?

We use different symbols for a statistic, which we gather from a sample, and a parameter, which describes a population. By convention, the sample mean is written x̄ (x-bar) and the population mean is written μ (mu).

How we calculate the Mean:

μ = ΣX / N (population mean)

x̄ = ΣX / n (sample mean)

N: the number of values in a population.

n: the number of values in a sample.

X: an individual value gathered from the population or the sample.
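As a minimal sketch of the calculation, the mean is the sum of the individual values X divided by how many there are (N for a population, n for a sample). The data and the function name `mean_of` below are made up for illustration.

```python
def mean_of(values):
    """Add up every individual value X, then divide by the count."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)

# Hypothetical data: a small population and a sample taken from it.
population = [2, 4, 4, 5, 7, 8]   # N = 6
sample = [4, 5, 7, 8]             # n = 4

mu = mean_of(population)   # population mean (a parameter)
x_bar = mean_of(sample)    # sample mean (a statistic)

print(mu, x_bar)
```

The arithmetic is identical in both cases. Only the symbol and the count (N versus n) change, and the sample mean will usually differ somewhat from the population mean.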

More on Outliers(異常值):

An outlier is a value in a distribution that is much larger or much smaller than the rest. Now that we understand how to calculate the mean, we can briefly introduce skewness.

Suppose we add an outlier that is much too large. Because every value enters the sum, the new mean will be larger than the original mean. We call such a distribution right-skewed or positively skewed (正偏).

Conversely, an outlier that is much too small pulls the mean down, and we call the distribution left-skewed or negatively skewed (負偏).

Note that the values must be ordered from least to greatest, so that small values sit on the left and large values on the right, for the inference above to work.
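The pull that an outlier has on the mean can be sketched with a few made-up numbers. The values below are invented for illustration. Adding one very large value raises the mean, and adding one very small value lowers it.

```python
from statistics import mean

# Hypothetical values with no outlier.
scores = [10, 12, 13, 15, 15]

# The same values plus one extreme value on each side.
with_high_outlier = scores + [95]    # one very large value
with_low_outlier = scores + [-70]    # one very small value

original_mean = mean(scores)
high_mean = mean(with_high_outlier)
low_mean = mean(with_low_outlier)

print("original mean:", original_mean)
print("with a large outlier:", round(high_mean, 2))
print("with a small outlier:", round(low_mean, 2))
```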

How to find the median and mode?


The median has its own symbol. A common choice is x̃ (x-tilde), though notation varies between textbooks.

If the number of values in a distribution is odd, then the value in the middle of the ordered values is our median.

If the number of values in a distribution is even, then there is no single middle value, and the average of the two values on either side of the middle is our median.
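The two rules can be sketched as a short function. This is an illustrative version with a made-up name, `median_of`. Python's standard library already provides `statistics.median`, which follows the same rules.

```python
def median_of(values):
    """Return the middle value of the ordered data."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)          # order from least to greatest first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle.
        return ordered[mid]
    # Even count: average the two values around the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([7, 1, 5]))      # odd count: middle of 1, 5, 7
print(median_of([7, 1, 5, 3]))   # even count: average of 3 and 5
```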


Finding the mode is pretty easy: just count how many times each value appears in the distribution. The value with the highest count is the mode.

If there is only one mode, we say the distribution is unimodal (單峰).

If there are two modes, we say the distribution is bimodal (雙峰).

If there are more than two modes, we say the distribution is multimodal (多峰).
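Counting frequencies is easy to sketch in code. The helper names `modes_of` and `describe_modes` below are made up for illustration, and the labels simply follow the one, two, or more rule described above.

```python
from collections import Counter

def modes_of(values):
    """Return every value that shares the highest frequency, in sorted order."""
    counts = Counter(values)              # value -> number of appearances
    highest = max(counts.values())        # raises ValueError on empty input
    return sorted(v for v, c in counts.items() if c == highest)

def describe_modes(values):
    """Label the data by how many modes it has."""
    k = len(modes_of(values))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    # Assumption: anything with more than two tied values counts as multimodal,
    # which includes data where every value appears equally often.
    return "multimodal"

print(modes_of([1, 2, 2, 3]), describe_modes([1, 2, 2, 3]))
print(modes_of([1, 1, 2, 2, 3]), describe_modes([1, 1, 2, 2, 3]))
print(modes_of([1, 1, 2, 2, 3, 3]), describe_modes([1, 1, 2, 2, 3, 3]))
```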