## Population, Sample, Central limit theorem


A population is the complete set (the superset) of entities being studied, for example the population of Delhi or of Mumbai. Suppose we were asked to find out how many people in Delhi use a tablet. Since it is not physically possible to visit every house, we would take a random survey of, say, 50 to 100 people in different localities of the city. If the sample reflects a particular trend, we can conclude that the same trend holds for the rest of the population. The important criterion is that the people surveyed are selected at random; the selection cannot be biased.

A population has three attributes, and so does a sample. When we work with data we usually deal with two of them, the mean and the standard deviation, and take the third, the number of items, for granted. Even the population of Delhi is a finite number of items: it is a superset, not an infinite set. The number of items in the population is represented by capital N, the mean of those N items by μ (mu), and the standard deviation of those N items by σ (sigma).

By contrast, the number of items in a sample is represented by lowercase n.

The mean of these items is represented by x̄ (x bar), and their standard deviation by s.
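A minimal Python sketch of the six symbols, using the standard-library `statistics` and `random` modules. The ten-value population is made up purely for illustration. Note that `pstdev` divides by N (the population σ), while `stdev` divides by n - 1 (the sample s).

```python
import random
import statistics

# Hypothetical population of 10 values (illustrative only, not survey data).
population = [12, 15, 17, 20, 22, 25, 9, 30, 14, 6]

N = len(population)                     # capital N: population size
mu = statistics.mean(population)        # μ: population mean
sigma = statistics.pstdev(population)   # σ: population SD (divides by N)

random.seed(1)                          # fixed seed so the sketch is repeatable
sample = random.sample(population, 5)   # random sample, drawn without replacement

n = len(sample)                         # lowercase n: sample size
x_bar = statistics.mean(sample)         # x̄: sample mean
s = statistics.stdev(sample)            # s: sample SD (divides by n - 1)

print(f"N={N}, mu={mu}, sigma={sigma:.2f}")
print(f"n={n}, x_bar={x_bar}, s={s:.2f}")
```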

Suppose we have collected data from five different people representing five different areas of Delhi.

It is also important that a sample is not so small that it fails to give a correct picture of the population it represents.

Taken together, these samples reflect the actual population. The population has its own mean, but all we have are the samples gathered through separate random experiments. Each sample has its own mean, which may differ from the mean of any other sample.

The central limit theorem says that if you conduct many random surveys, you will collect many samples, each with its own mean. If you plot all of these sample means, their distribution is approximately normal, and the approximation improves as the sample size grows (provided the population has a finite variance). That is, if we treat each sample mean as a data point, the result will look like the figure below.

This is a very powerful theorem: with its help we can carry out hypothesis testing and assess statistical significance.

The link above points to a popular website that explains the theorem well.

Suppose we pick a sample with 5 items randomly.

Now suppose we take another random sample of 5 items, and the result looks like this:

Now we have two averages, as we have done two experiments.

Every time we run an experiment we get a different result. Suppose we run 10,000 experiments and get the result shown in the figure below.
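To see the effect without the figure, here is a small simulation sketch in Python. The population (10,000 integers drawn uniformly from 1 to 33) is a hypothetical stand-in, chosen because it is clearly not normal. Each of the 10,000 experiments draws 5 items and records their mean. The crude text histogram of those means should come out roughly bell-shaped and centred near the population mean.

```python
import random
import statistics

random.seed(42)

# Hypothetical, deliberately non-normal population: 10,000 integers
# drawn uniformly from 1 to 33 (an assumption for illustration).
population = [random.randint(1, 33) for _ in range(10_000)]
mu = statistics.mean(population)

# 10,000 "experiments": each picks 5 items at random and records their mean.
sample_means = [statistics.mean(random.sample(population, 5)) for _ in range(10_000)]

print(f"population mean      = {mu:.2f}")
print(f"mean of sample means = {statistics.mean(sample_means):.2f}")

# Crude text histogram of the sample means, in bins of width 2.
counts = {}
for m in sample_means:
    b = int(m // 2) * 2
    counts[b] = counts.get(b, 0) + 1
for b in sorted(counts):
    print(f"{b:>3}-{b + 2:<3} {'#' * (counts[b] // 50)}")
```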

Suppose one team is asked to survey 10 people at a time and another team 25 people at a time. The results would look like this:

The mean of the population data is 17.00, while the two teams' sample means average out slightly differently: 17.04 for the surveys of 10 people and 17.07 for the surveys of 25 people. The standard deviation of the population is 8.21, but the spread of the sample means is different for each team. The spreads follow a simple rule: the standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size. This is why the sample size should be large: the bigger the sample, the more tightly the sample means cluster around the population mean.
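In symbols, the rule is the standard error of the mean. Plugging in the population standard deviation of 8.21 gives the spread the theory predicts for each team's sample means (these are predicted values, not measurements read from the figure):

$$
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}, \qquad
n = 10:\ \frac{8.21}{\sqrt{10}} \approx 2.60, \qquad
n = 25:\ \frac{8.21}{\sqrt{25}} = 1.642
$$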

For example, take two teams that each collect 10,000 samples. One team surveys 16 people at a time, whereas the other surveys 25 people at a time.

You will notice that for both teams the mean of the sample means is nearly the same as the population mean, but the standard deviations differ noticeably: the team with the larger samples has the narrower spread.
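A simulation sketch of the two teams, assuming a hypothetical population of 20,000 values drawn uniformly between 0 and 34 (any population with a finite standard deviation would do). For each sample size it runs 10,000 random samples and compares the observed standard deviation of their means with the predicted σ/√n. `spread_of_means` is a helper name introduced here for illustration.

```python
import math
import random
import statistics

random.seed(7)

# Hypothetical population (assumption): 20,000 values uniform on [0, 34).
population = [random.uniform(0, 34) for _ in range(20_000)]
sigma = statistics.pstdev(population)

def spread_of_means(sample_size, experiments=10_000):
    """Draw repeated random samples and return the SD of their means."""
    means = [statistics.fmean(random.sample(population, sample_size))
             for _ in range(experiments)]
    return statistics.pstdev(means)

for n in (16, 25):
    print(f"n={n}: observed SD of means = {spread_of_means(n):.3f}, "
          f"sigma/sqrt(n) = {sigma / math.sqrt(n):.3f}")
```

With n = 16 the predicted spread is σ/4 and with n = 25 it is σ/5, so the team surveying 25 people at a time should see the tighter cluster.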

For any normally distributed data, remember that if you move 3.4 steps below and 3.4 steps above the mean, practically all of the data (more than 99.9%) falls within these steps. One step means one standard deviation forwards or backwards. In the example shown, the mean is 50 and one standard deviation is 11, so if you are asked how many standard deviations away 39 is, the answer is -1 step; similarly, 61 is +1 step away.

The number of steps you need to move from the mean to a value is known as its Z-score: z = (x - μ) / σ. For example, the Z-score of 83 is 3, and the Z-score of 55.5 is 0.5. In all the normal distribution diagrams we have seen so far, the probabilities for these Z-scores were already known.
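A minimal Python sketch of the Z-score formula. The mean of 50 and standard deviation of 11 are inferred from the example (39 is -1 step and 61 is +1 step); `z_score` is a helper name introduced here for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations ("steps") from the mean to x."""
    return (x - mean) / sd

# Mean 50 and SD 11, inferred from 39 being -1 step and 61 being +1 step.
MEAN, SD = 50, 11
for x in (39, 61, 83, 55.5):
    print(x, z_score(x, MEAN, SD))
```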

In our example, this means the probability of a value lying anywhere from negative infinity up to 17 (3 steps below the mean) is only about 0.1%, and from negative infinity up to 28 (2 steps below the mean) it is about 2.3%.

Similarly, the probability of a value lying anywhere from negative infinity up to the mean is 50%, and from negative infinity up to 61 it is 84.1%.
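These cumulative probabilities can be checked with Python's standard-library `statistics.NormalDist`, whose `cdf(x)` returns the probability from negative infinity up to x. The sketch uses the example distribution (mean 50, standard deviation 11) and the values 3, 2, 0 and 1 steps from the mean.

```python
from statistics import NormalDist

# Example distribution: mean 50, standard deviation 11.
dist = NormalDist(mu=50, sigma=11)

# cdf(x) is the probability from negative infinity up to x.
for x in (17, 28, 50, 61):
    print(f"P(X <= {x}) = {dist.cdf(x):.4f}")
```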

To find the probability from negative infinity up to a particular value, calculate the value's Z-score and look it up in a Z table (a table of standard normal probabilities), which can be downloaded.

What is the population in the case above? The 36 boxes are the sample, not the population. If we plotted the means of all such samples together, we would get a normal distribution. Here the population mean weight is 72 kg and the population standard deviation is 3 kg.

If we divide the maximum permissible weight by the number of boxes, we get the average weight permitted per box, which in this case is 73.05 kg, while the centre of our normal distribution is 72 kg. Each box's weight varies with a standard deviation of 3 kg, so we have to check the probability that the average weight of the 36 boxes stays at or below 73.05 kg. For the average of 36 boxes, the standard deviation is 3 / √36 = 0.5 kg, so the Z-score is (73.05 - 72) / 0.5 = 2.1.
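The same Z-score calculation written as a formula, with the standard deviation of the mean of n = 36 boxes in the denominator:

$$
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
  = \frac{73.05 - 72}{3 / \sqrt{36}}
  = \frac{1.05}{0.5}
  = 2.1
$$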

Now we will look at the Z Table.


**STANDARD NORMAL PROBABILITIES**

In a Z table, the left-hand column gives the Z-score to one decimal place, and the column headings across the top give the second decimal place. For example, to look up -3.45, find the row for -3.4 and the column headed .05; the probability there is .0003.
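A printed Z table can be approximated in Python with the standard normal `statistics.NormalDist()` (mean 0, standard deviation 1), rounding to four decimal places as tables usually do. `z_table` is a helper name introduced here for illustration; the sketch rebuilds the -3.4 row (columns .00 to .09) and looks up -3.45.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_table(z):
    """Cumulative probability from negative infinity to z, rounded like a printed table."""
    return round(STANDARD_NORMAL.cdf(z), 4)

# Row -3.4 of the table: columns .00 to .09 give z = -3.40 ... -3.49.
row = [z_table(-3.4 - col / 100) for col in range(10)]
print(row)
print(z_table(-3.45))
```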

Note that every probability in the table is cumulative, measured from negative infinity up to the given Z-score.

Therefore, for a Z-score of 2.1 the table gives .9821: the probability that the average box weight is 73.05 kg or less is 98.21%. In other words, we can be 98.21% confident that the shipment of 36 boxes will stay within the weight limit.
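A short Python sketch of the whole shipment calculation using `statistics.NormalDist`, with the figures from the text (mean 72 kg, standard deviation 3 kg, 36 boxes, 73.05 kg permitted per box). The average weight of 36 boxes is modelled as normal with standard deviation σ/√n.

```python
import math
from statistics import NormalDist

mu, sigma, n = 72, 3, 36    # box weight mean and SD (kg); boxes in the shipment
limit_per_box = 73.05       # permitted average weight per box (kg)

# Distribution of the average weight of n boxes: mean mu, SD sigma / sqrt(n).
mean_weight = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))
z = (limit_per_box - mu) / (sigma / math.sqrt(n))
p_ok = mean_weight.cdf(limit_per_box)   # probability the average stays within the limit
print(f"z = {z:.2f}, P(within limit) = {p_ok:.4f}")
```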

The first step in problems like this is to identify the population and the sample.

Looking up the Z-score in the Z table then gives us the probability.

To make $20,000 in a week from 100 trades, the team needs to average $200 or more per trade.

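So we subtract the mean, 95.7, from 200 and divide by the standard deviation, 124.7, which gives a Z-score of about 0.836. Rounding this to 0.84, the Z table gives .7995, which is the probability of averaging less than $200 per trade. So there is a 1 - .7995 = .2005, or about 20.05%, probability that your team will make $20,000 or more in any given week.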
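The same calculation as a Python sketch, assuming (as the text does) that 95.7 and 124.7 are the mean and standard deviation of the distribution being used. `NormalDist.cdf` works with the unrounded Z-score, so it gives about 0.2015, slightly different from a table lookup that rounds the Z-score to 0.84.

```python
from statistics import NormalDist

# Figures from the text: mean 95.7 and standard deviation 124.7 (dollars);
# the target is an average of $200 per trade.
profit = NormalDist(mu=95.7, sigma=124.7)
target = 200

z = (target - profit.mean) / profit.stdev
p_target = 1 - profit.cdf(target)   # probability of averaging $200 or more
print(f"z = {z:.3f}, P(average >= ${target}) = {p_target:.4f}")
```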

Now we come back to the first question: what is the probability of your team making a loss in any given week?

A loss means a net profit below zero, so zero is the cut-off, and we start by finding the Z-score of zero: (0 - 95.7) / 124.7 = -0.767.

Rounding this to -0.77, the Z table gives .2206, which means there is roughly a 22% probability that your team will make a loss in any given week. Calculations like these are the basis of hypothesis testing.
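A matching Python sketch for the probability of a loss, again assuming 95.7 and 124.7 are the mean and standard deviation of weekly profit per trade as used in the text. The exact `cdf` value comes out near 0.221; a printed table lookup differs slightly depending on how the Z-score is rounded.

```python
from statistics import NormalDist

profit = NormalDist(mu=95.7, sigma=124.7)    # same figures as the text
z_zero = (0 - profit.mean) / profit.stdev    # Z-score of zero profit
p_loss = profit.cdf(0)                       # probability that profit is below zero
print(f"z = {z_zero:.3f}, P(loss) = {p_loss:.4f}")
```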