Difference between Population and Sampling with Example
In everyday language, a population is the number of people living in a defined geographical area, and a sample is a part of that population. In statistics, a population includes all members of a defined group that we are studying or collecting information about in order to make data-driven decisions, for example, all the votes cast in an election. A sample is a subset drawn from that population, and it can be biased or unbiased; an unbiased sample is usually obtained by random selection and is known as a random sample. For example, an "Exit Poll" surveys a portion of the voters in order to predict the election outcome.
Let’s take a very basic example to understand population vs. sample: calculating the average height of men in India. How should we approach this? Collecting data from every man in India is not practical, so we start by collecting samples. We can divide India into four zones (north, south, east, and west) and then subdivide further by geography, such as plateau, plain, and river regions. Each of these groups is a subset of the population. But now the question arises: is it enough to simply pick whoever is convenient within these groups? The answer is no, so we take random samples from each one, for example the heights of 100 randomly chosen people from every region. Are we now sure that we have the exact answer? No, but we can state our estimate with a chosen level of confidence, for example 90%. So for the north, I can report the average height of the 100 sampled people together with a range that I am about 90% confident contains the true average, and on this basis I can give you the answer.
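To make the height example concrete, here is a minimal Python sketch of drawing 100 random people from each zone and reporting an average with an approximate 90% range. Everything in it is an assumption made for illustration: the heights are synthetic, the zone averages and spread are made-up numbers (not real data for India), and the names `zone_means`, `sample_zone`, and `mean_with_interval` are hypothetical. The range uses the normal approximation with z = 1.645, which is one common choice and not the only one.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: synthetic heights in cm for four zones.
# The means and the spread (7 cm) are made-up numbers, not real data.
zone_means = {"north": 168.0, "south": 165.0, "east": 164.0, "west": 166.0}
population = {
    zone: [random.gauss(mean, 7.0) for _ in range(10_000)]
    for zone, mean in zone_means.items()
}


def sample_zone(heights, n=100):
    """Draw n people at random, without replacement, from one zone."""
    return random.sample(heights, n)


def mean_with_interval(sample, z=1.645):
    """Return the sample mean and an approximate 90% confidence interval."""
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / len(sample) ** 0.5
    return mean, (mean - z * std_err, mean + z * std_err)


results = {}
for zone, heights in population.items():
    results[zone] = mean_with_interval(sample_zone(heights))
    mean, (low, high) = results[zone]
    print(f"{zone}: mean={mean:.1f} cm, 90% range=({low:.1f}, {high:.1f})")
```

Each printed line is the answer for one zone: an estimate from 100 people plus a range, rather than a claim about every person in that zone.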
Whenever we have a problem statement about a whole population, we usually cannot answer it by measuring the entire population; we answer it on the basis of a sample. A sample is a part of the population: whenever we take a subset of the entire data, we call it a sample. It is also a fact that we cannot rely on a single sample. We have to take a certain number of samples so that different cases are covered; otherwise, a single sample may show an outlier pattern, meaning an abnormal or extreme pattern that does not reflect the population. When we take a number of samples, we can be more confident that the result represents the population at the level we chose, such as 90%. This is the first part of statistics: whenever we come across any data, we first start by sampling our observations. We can set the confidence level in any statistical tool, such as SAS, R, Python, or Stata, no matter which one we use to apply statistics. Sampling is an essential topic in data analytics and is covered in Data Science training.
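The point about not relying on a single sample can be sketched in Python as well. This is an illustration under stated assumptions, not a prescribed method: the population is synthetic with made-up parameters, and the helper name `sample_means` is hypothetical. It draws 200 small random samples, then compares the most extreme single sample with the average across all of them.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 20,000 synthetic heights (cm); made-up parameters.
population = [random.gauss(166.0, 7.0) for _ in range(20_000)]
true_mean = statistics.mean(population)


def sample_means(data, n_samples, sample_size):
    """Return the mean of each of n_samples independent random samples."""
    return [
        statistics.mean(random.sample(data, sample_size))
        for _ in range(n_samples)
    ]


means = sample_means(population, n_samples=200, sample_size=30)

# The single sample whose mean lies farthest from the population mean.
worst_single = max(means, key=lambda m: abs(m - true_mean))
# The average over all 200 sample means.
combined = statistics.mean(means)

print(f"population mean:         {true_mean:.2f}")
print(f"most extreme one sample: {worst_single:.2f}")
print(f"average of 200 samples:  {combined:.2f}")
```

Any one of the 200 samples could have been the extreme one, which is why a single sample is risky; the average across many samples sits much closer to the population value.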