Inferential Statistics
Introduction: Often we can only afford to collect data from a sample, because it is too difficult or expensive to collect data from the whole population we are interested in. Descriptive statistics can only summarize a sample's characteristics, whereas inferential statistics use the sample to make reasonable guesses about the larger population. This process is called inference, which is where the name inferential statistics comes from. When using inferential statistics, it's important to use random and unbiased sampling methods. If your sample isn't representative of your population, you can't draw valid inferences about the population.
Example: Suppose you want to find the average salary of a typical Indian. Computing the population mean directly, as descriptive statistics would, requires the salary information of almost 1.3 billion people. The cost of that task is huge, and collecting such a vast amount of information is practically impossible. For this kind of problem we can use inferential statistics, which uses sample data to make estimates and draw inferences about the whole population.
Sampling Error:
In inferential statistics, we use sample data, which is usually far smaller than the actual population. This creates sampling error: the difference between the true population values (parameters) and the measured sample values (statistics).
- A measure of population data is called a parameter. E.g.: population mean, population standard deviation.
- A measure of sample data is called a statistic. E.g.: sample mean.
There are two important types of estimates you can make about the population:
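To make the difference between a parameter and a statistic concrete, here is a minimal simulation sketch. The population is made-up data drawn from a log-normal distribution (an assumption chosen only because it is skewed like salaries often are; it is not real salary data), and the sample size of 1,000 is arbitrary.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 made-up "salaries" (not real data).
population = [rng.lognormvariate(10, 0.5) for _ in range(100_000)]

# Parameter: a measure computed from the whole population.
population_mean = statistics.fmean(population)

# Statistic: the same measure computed from a simple random sample.
sample = rng.sample(population, 1_000)
sample_mean = statistics.fmean(sample)

# Sampling error: how far the statistic lands from the parameter.
sampling_error = sample_mean - population_mean

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
print(f"sampling error:              {sampling_error:.2f}")
```

In practice the population mean is unknown, so the sampling error cannot be computed directly like this; the simulation only works because the whole population was generated up front.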
- A point estimate is a single value estimate of a parameter. For instance, a sample mean is a point estimate of a population mean.
- An interval estimate gives you a range of values where the parameter is expected to lie. A confidence interval is the most common type of interval estimate.
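Central Limit Theorem: Let X1, ..., Xn be a random sample from some population with mean μ and variance σ². Then for large n, the sample mean X̄ = (X1 + ... + Xn)/n is approximately normally distributed with mean μ and variance σ²/n:
X̄ ≈ N(μ, σ²/n)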
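One way to see the theorem at work is a small simulation. The sketch below assumes an exponential population with rate 1 (so μ = 1 and σ = 1), which is clearly not normal, and checks that the means of many samples of size n = 50 still behave roughly like a normal distribution with mean μ and standard deviation σ/√n. The population, sample size, and number of repetitions are illustrative choices, not part of the theorem.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

mu, sigma = 1.0, 1.0    # mean and sd of an exponential population with rate 1
n = 50                  # size of each sample
num_samples = 5_000     # how many samples we draw

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
standard_error = sigma / math.sqrt(n)  # what the theorem predicts for the sd

# Share of sample means that fall within 1.96 standard errors of mu.
within = sum(abs(m - mu) <= 1.96 * standard_error for m in sample_means)
fraction_within = within / num_samples

print(f"mean of sample means: {mean_of_means:.3f} (theory: {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {standard_error:.3f})")
print(f"within 1.96 SE of mu: {fraction_within:.1%} (normal theory: about 95%)")
```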
Confidence Interval:
A confidence interval is a range of plausible values for a parameter. It gives an interval estimate for the parameter of interest.
The margin of error is calculated by multiplying the z-score by the standard error of the mean, σ/√n.
Margin of error = (z · σ) / √n
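The confidence interval is defined as:
Confidence interval = x̄ ± (z · σ) / √n
where x̄ is the sample mean, z is the z-score for the chosen confidence level, σ is the population standard deviation, and n is the sample size.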
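As a minimal sketch, the margin-of-error formula can be turned into a confidence interval by subtracting it from and adding it to the sample mean. The function name `z_confidence_interval`, the nine data points, and the assumed known population standard deviation σ = 3 are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math
import statistics


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Return (lower, upper, margin) for the mean, assuming sigma is known."""
    n = len(sample)
    sample_mean = statistics.fmean(sample)
    # Two-sided z-score: e.g. 95% confidence leaves 2.5% in each tail.
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)  # margin of error = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# Made-up measurements; their mean is exactly 50 and n = 9.
data = [52, 48, 50, 51, 49, 53, 47, 50, 50]
lower, upper, margin = z_confidence_interval(data, sigma=3)

print(f"margin of error: {margin:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

With these numbers σ/√n = 3/3 = 1, so the margin of error is simply the z-score (about 1.96) and the interval is roughly 48.04 to 51.96. When σ is unknown and the sample is small, a t-based interval is usually used instead; that is outside what this sketch covers.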
The normal distribution curve in the image above shows how much of the distribution lies within a given distance of the mean.
- About 68.27% of the values (often quoted as 68.26% from rounded z-tables) lie within one standard deviation of the mean.
- Similarly, about 95.45% of the values (often quoted as 95.44%) lie within two standard deviations of the mean.
This is why a z-score of about 2 (more precisely 1.96) corresponds to a confidence level of roughly 95%.
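These percentages can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `coverage` is just an illustrative choice. Values read from rounded z-tables can differ from these in the last digit.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal distribution


def coverage(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage(k):.2%}")
```

Because every normal distribution is a shifted and rescaled copy of the standard normal, the same proportions hold for any mean and standard deviation.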