# Statistics 101 統計學入門, Part 1

When we first learn about statistics, it can be a bit confusing, so I'm just gonna cut the crap and get into it.

#### Some terms to know before we start

Population（總體）: The group of all items of interest.

Parameter（參數）: A numerical property of a population, e.g. the population mean.

Sample（樣本）: A subset of the population.

Statistic（統計量）: A numerical property computed from the data of a sample, e.g. the sample mean.

##### Example 1:

Suppose there are 50,000 students on campus and we want to know the average number of bottles of beer they drink each week. We then conduct a survey by asking 500 students how many bottles of beer they drink each week.

(**Population**: the 50,000 students; **Parameter**: the average number of bottles of beer drunk by all 50,000 students each week; **Sample**: the 500 surveyed students; **Statistic**: the average number of bottles of beer drunk by those 500 students each week)
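To make the population/sample distinction concrete, here is a minimal Python simulation. The 50,000 weekly beer counts are made-up random numbers (an assumption purely for illustration, not survey data); the sketch computes the parameter from the whole population and the statistic from a random sample of 500.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: weekly bottles of beer for 50,000 students (made-up values).
population = [random.randint(0, 10) for _ in range(50_000)]

# Parameter: computed from the whole population (usually unknown in practice).
parameter = statistics.mean(population)

# Sample: 500 students drawn at random, without replacement.
sample = random.sample(population, 500)

# Statistic: the same quantity, computed from the sample only.
statistic = statistics.mean(sample)

print(f"Parameter (population mean): {parameter:.3f}")
print(f"Statistic (sample mean):     {statistic:.3f}")
```

Drawing a different sample of 500 from the same population would give a slightly different statistic, while the parameter stays the same.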

#### Descriptive statistics and Inferential statistics （描述性統計、推論性統計）

Descriptive statistics uses only the data we collected to describe a sample or a population, e.g. the mean, the standard deviation, and so on.

Inferential statistics assumes that the sample is representative of the population, and therefore uses statistics computed from the sample to draw conclusions about the characteristics (parameters) of the population. (This is what we do in Example 1.)
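A minimal sketch of the difference, assuming ten hypothetical survey answers (made-up numbers): the descriptive part only summarizes those ten values, while the inferential part assumes they represent all 50,000 students from Example 1 and uses the sample mean as an estimate for the whole campus.

```python
import statistics

# Hypothetical answers from 10 surveyed students (bottles of beer per week).
sample = [0, 2, 3, 1, 5, 0, 4, 2, 6, 2]

# Descriptive statistics: summarize only the data we actually collected.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Inferential statistics: assume the sample represents the population, then use
# the sample mean as an estimate of the population mean and scale it up to the
# (assumed) 50,000 students to estimate total weekly consumption on campus.
POPULATION_SIZE = 50_000
estimated_population_mean = sample_mean
estimated_weekly_total = estimated_population_mean * POPULATION_SIZE

print(f"Sample mean: {sample_mean}, sample SD: {sample_sd:.2f}")
print(f"Estimated total bottles per week on campus: {estimated_weekly_total:,.0f}")
```

The inferential estimate is only as good as the assumption that these ten students represent the campus, which is why the sampling method matters.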

#### Sampling

Random sampling（隨機抽樣）: Random means that every subject in the population has the same probability of being selected into the sample.

Representative sampling（典型抽樣）: The researchers intentionally choose subjects so that the sample matches the characteristics of the population.

Convenience sampling（方便抽樣）: Researchers choose their subjects according to how easily available the subjects are.
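The definition of random sampling can be checked empirically. This sketch uses the standard-library `random.sample` on a hypothetical population of 20 subjects, repeatedly draws samples of 5, and counts how often each subject is picked; under random sampling every subject's selection rate should be close to 5/20 = 0.25.

```python
import random
from collections import Counter

random.seed(0)

population = list(range(20))  # 20 hypothetical subjects labelled 0..19
SAMPLE_SIZE = 5
TRIALS = 20_000

# Draw many random samples (without replacement) and tally who gets picked.
counts = Counter()
for _ in range(TRIALS):
    counts.update(random.sample(population, SAMPLE_SIZE))

# Under random sampling each subject's selection rate should be near n / N.
expected_rate = SAMPLE_SIZE / len(population)
rates = {subject: counts[subject] / TRIALS for subject in population}

print(f"expected {expected_rate:.3f}, "
      f"observed min {min(rates.values()):.3f}, max {max(rates.values()):.3f}")
```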

##### Example 2:

Suppose that we are looking at the average income of people living in Hong Kong (HK), and we want 500 people in our sample. *Random sampling* randomly draws 500 subjects from the population and calculates the results from them. *Representative sampling* may first identify the characteristics of the population (e.g. its subgroups: ethnic groups, adults, the elderly, gender), and then try to make our 500 subjects match those characteristics as closely as possible. *Convenience sampling* is much simpler: say our research is based in Kowloon, then we may just approach 500 people on the streets of Kowloon.
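The three strategies in Example 2 can be sketched in Python as follows. Everything here is an assumption for illustration: the population is 10,000 made-up people with a random district and age group (not real Hong Kong data, and the toy data deliberately gives Kowloon more elderly residents), and representative sampling is implemented as proportional quota sampling on a single characteristic (age group), which is one simple way to match the population, not the only one.

```python
import random

random.seed(1)

# Hypothetical population of 10,000 people (made-up, NOT real Hong Kong data).
DISTRICTS = ["Hong Kong Island", "Kowloon", "New Territories"]
# Assumption: in this toy data Kowloon has a larger share of elderly residents.
ELDERLY_WEIGHT = {"Hong Kong Island": 1, "Kowloon": 2, "New Territories": 1}

def make_person(person_id):
    district = random.choice(DISTRICTS)
    age_group = random.choices(["adult", "elderly"],
                               weights=[3, ELDERLY_WEIGHT[district]])[0]
    return {"id": person_id, "district": district, "age_group": age_group}

population = [make_person(i) for i in range(10_000)]
N_SAMPLE = 500

def random_sampling(pop, n):
    """Every person has the same chance of being selected."""
    return random.sample(pop, n)

def representative_sampling(pop, n, key):
    """Proportional quota sampling: each group gets the same share of the
    sample as it has in the population (quotas are rounded)."""
    groups = {}
    for person in pop:
        groups.setdefault(person[key], []).append(person)
    chosen = []
    for members in groups.values():
        quota = round(n * len(members) / len(pop))
        chosen.extend(random.sample(members, quota))
    return chosen

def convenience_sampling(pop, n, district="Kowloon"):
    """Take the first n people we happen to find in one district."""
    nearby = [p for p in pop if p["district"] == district]
    return nearby[:n]

def elderly_share(group):
    return sum(p["age_group"] == "elderly" for p in group) / len(group)

print(f"population      elderly share={elderly_share(population):.3f}")
for name, chosen in [
    ("random", random_sampling(population, N_SAMPLE)),
    ("representative", representative_sampling(population, N_SAMPLE, "age_group")),
    ("convenience", convenience_sampling(population, N_SAMPLE)),
]:
    print(f"{name:<15} n={len(chosen)}, elderly share={elderly_share(chosen):.3f}")
```

Because quotas are rounded, the representative sample can end up one or two people away from 500 when there are several groups. The convenience sample contains only Kowloon residents, so in this toy data it overstates the elderly share of the whole population.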

#### Variable, observation and data

Variable（變量）: A variable is a characteristic of a population or a sample.

Observation/case/subject/participant: Interchangeable names for the individual items or people we collect data on.

Data（資料）: The observed value of a variable.
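In code, a data set is often held as a table of records. In this minimal sketch (hypothetical names and values), each dictionary is one observation, each key is a variable, and each value is a datum.

```python
# Hypothetical data set: each dict is one observation (case/subject/participant),
# each key is a variable, and each value is a datum.
observations = [
    {"name": "Ali", "year_of_study": 1, "bottles_per_week": 3},
    {"name": "Thomas", "year_of_study": 2, "bottles_per_week": 0},
    {"name": "Matt", "year_of_study": 3, "bottles_per_week": 5},
]

# The variables are the columns shared by every observation.
variables = list(observations[0])

# The data for one variable: the observed values in that column.
bottles_data = [obs["bottles_per_week"] for obs in observations]

print(variables)
print(bottles_data)
```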

#### Types of variables and data

###### Variables:

**Categorical variable (Qualitative)**: A column in a table that records a category or attribute (e.g. Gender, Brand...)

Nominal variable: Colors of a flag (groups with no rank)

Ordinal variable: Finishing place in a contest (groups that are ranked in an order)

Binary variable: Heads/Tails in a coin flip (only two possible groups, like Yes or No)

**Numerical variable (Quantitative)**: A column in a table that records numbers (e.g. the number of bottles of beer a student drinks each week)

Discrete variable: Number of people in a place (represents counts of individual items)

Continuous variable: Distance from A to B (represents measurements that can take any value within a range)
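As a rough illustration of how these types differ in practice, here is a hypothetical heuristic (`guess_variable_type` is made up for this sketch, not a standard library function) that guesses a variable's type from its values. The values alone cannot reveal whether categories are ranked, so the caller has to say so; that is exactly the nominal/ordinal distinction above. Treating any two distinct labels as binary is also an assumption.

```python
def guess_variable_type(values, order=None):
    """Hypothetical heuristic: guess a variable's type from its observed values.

    `order` is an optional ranking of the categories. Values alone cannot show
    whether categories are ranked, so the caller must supply it for ordinal data.
    """
    if not values:
        raise ValueError("need at least one value")
    if all(isinstance(v, bool) for v in values):
        return "binary"
    if all(isinstance(v, str) for v in values):
        if order is not None:
            return "ordinal"
        # Assumption: exactly two distinct labels are treated as binary.
        return "binary" if len(set(values)) == 2 else "nominal"
    numbers = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if len(numbers) != len(values):
        raise ValueError("mixed or unsupported value types")
    # Whole-number counts are discrete; anything with floats is a measurement.
    if all(isinstance(v, int) for v in numbers):
        return "discrete"
    return "continuous"

print(guess_variable_type(["red", "blue", "white"]))
print(guess_variable_type(["1st", "2nd", "3rd"], order=["1st", "2nd", "3rd"]))
print(guess_variable_type(["Head", "Tail", "Head"]))
print(guess_variable_type([5, 12, 0]))
print(guess_variable_type([51.6, 3.2]))
```

A count that happens to be stored as a float (say `5.0`) would be reported as continuous, which is one reason such heuristics are only a starting point.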

###### Data:

**Categorical data (Qualitative)**: The recorded values of a categorical variable (e.g. the entries in a Gender or Brand column)

Nominal data: Red, Blue, White (groups with no rank)

Ordinal data: Ali (1st), Thomas (2nd), Matt (3rd) (groups that are ranked in an order)

Binary data: Heads or Tails, 0 or 1 (only two possible values, like Yes or No)

**Numerical data (Quantitative)**: The recorded values of a numerical variable (e.g. 3 bottles of beer per week)

Discrete data: 5 people in a restaurant (represents counts of individual items)

Continuous data: 51.6 km from A to B (represents measurements that can take any value within a range)
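One way to keep these data types apart in Python is to pick a matching representation: a plain `Enum` for nominal data (labels only, so `<` is not defined), an `IntEnum` for ordinal data (labels with a rank), `bool` for binary data, `int` for counts and `float` for measurements. This is a sketch of one convention, not the only one; the class and variable names are made up for illustration.

```python
from enum import Enum, IntEnum

class FlagColor(Enum):
    """Nominal: labels with no rank."""
    RED = "red"
    BLUE = "blue"
    WHITE = "white"

class Place(IntEnum):
    """Ordinal: labels ranked in an order (lower value = better place)."""
    FIRST = 1
    SECOND = 2
    THIRD = 3

finishers = {"Ali": Place.FIRST, "Thomas": Place.SECOND, "Matt": Place.THIRD}
coin_flips = [True, False, True]   # binary: True = Heads, False = Tails
people_in_restaurant = 5           # discrete: a count, stored as int
distance_km = 51.6                 # continuous: a measurement, stored as float

# Ordinal data can be ordered, so "who finished best?" is a meaningful question.
winner = min(finishers, key=finishers.get)

# Nominal data can only be compared for equality; ordering raises TypeError.
try:
    FlagColor.RED < FlagColor.BLUE
    nominal_is_ordered = True
except TypeError:
    nominal_is_ordered = False

# Binary data coded as 0/1 can be averaged to get the proportion of Heads.
heads_share = sum(coin_flips) / len(coin_flips)

print(winner, nominal_is_ordered, round(heads_share, 3))
```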
