Business Statistics
“Quick refresher”
By
Muhammad
Usman Qazi
This guide is for beginners and for anyone who needs a quick refresher.
Table of Contents
- Basic Concepts
- What is Statistics?
- Kinds of Statistics
- Constant
- Variable
- Section 1 - Collection Of Data
- Section 2 - Presentation Of Data
- Section 3 - Analysis Of Data
- Section 4 - Interpretation Of Data
Today statistics has become an important tool in the work of many academic disciplines such as Medicine, Psychology, Education, Sociology, Engineering and Physics, just to name a few. Statistics is also important in many aspects of society such as Business, Industry and Government. Because of the increasing use of statistics in so many areas of our lives, it has become very desirable to understand and practice statistical thinking. This is important even if you don't use statistical methods directly.
Here we will not discuss in detail the history of statistics or how its definition has changed in different eras. We will just go through the basic concepts.
Basic Concepts:
Before going further, we first need to understand a few basic concepts to develop a better understanding of the subject.
What is Statistics?
Statistics is the science of collection, presentation, analysis and interpretation of numerical data.
Kinds of Statistics
There are two kinds of statistics:
- Descriptive Statistics
- Inferential Statistics
- Descriptive Statistics: describes the data in various forms, e.g. tables, graphs, diagrams and other tools that help describe the data.
- Inferential Statistics: makes inferences about a population using data drawn from that population. Instead of using the entire population to gather the data, the statistician collects a sample or samples (for example, from millions of residents) and makes inferences about the entire population using the sample.
Note: This quick refresher is all about Descriptive Statistics.
Constant: a quantity that keeps an identical value from person to person, place to place or time to time.
Variable: a quantity that attains different values from person to person, place to place or time to time, e.g. the weight of individuals or the price of rice. Variables fall into the following two classifications:
- Qualitative Variable: takes on values that are names or labels. The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be examples of qualitative or categorical variables.
- Quantitative Variable: is numeric and represents a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city, a measurable attribute of the city. Therefore, population would be a quantitative variable. Quantitative variables have the following two further classifications:
- Discrete Variable: If a variable cannot take on every value between its minimum value and its maximum value, i.e. its values come from a counting process, it is called a discrete variable, e.g. the number of family members (five brothers, three sisters) or an amount of ten rupees.
- Continuous Variable: If a variable can take on any value between its minimum value and its maximum value, i.e. every possible value, it is called a continuous variable, e.g. the height of an individual in feet.
The above definition of Statistics (What is Statistics?) points to the following four phases, which are required in any statistical examination / investigation:
- Collection of data
- Presentation of data, i.e. the collected data are presented in a readable, understandable form
- Analysis of data, i.e. the presented data are analysed
- Interpretation of data, i.e. determining the conclusions / findings
The above phases are discussed in detail in the following four sections.
Section-1
Collection Of Data

There are two sources for the collection of data:
(1) Primary Data
(2) Secondary Data
(1) Primary Data:
Primary data are first-hand information collected, compiled and published by an organization for some purpose. They are the most original data in character and have not undergone any sort of statistical treatment.
Example: Population census reports are primary data because they are collected, compiled and published by the population census organization.
Methods of Collecting Primary Data:
Primary data are collected by the following methods:
1. Personal Investigation: The researcher conducts the survey him/herself and collects data from it. The data collected in this way are usually accurate and reliable. This method of collecting data is only applicable to small research projects.
2. Through Investigators: Trained investigators are employed to collect the data. These investigators contact the individuals and fill in questionnaires after asking for the required information. Most organizations employ this method.
3. Collection through Questionnaire: The researchers get the data from local representatives or agents, who report on the basis of their own experience. This method is quick but gives only a rough estimate.
4. Through Telephone: The researchers get information by telephone. This method is quick and gives accurate information.
(2) Secondary Data:
Secondary data are second-hand information that has already been collected by someone (an organization) for some purpose and is available for the present study. Secondary data are not pure in character and have undergone some treatment at least once.
Example: The Economic Survey of England is secondary data because its figures are collected by more than one organization, such as the Bureau of Statistics, the Board of Revenue, the banks, etc.
Methods of Collecting Secondary Data:
Secondary data are collected from the following sources:
1. Official: e.g. the publications of the Statistical Division, Ministry of Finance, the Federal Bureaus of Statistics, Ministries of Food, Agriculture, Industry, Labor, etc.
2. Semi-Official: e.g. State Bank, Railway Board, Central Cotton Committee, Boards of Economic Enquiry, etc.
3. Publications of Trade Associations, Chambers of Commerce, etc.
4. Technical and Trade Journals and Newspapers.
5. Research Organizations such as Universities and other institutions.
Difference between Primary and Secondary Data:
The difference between primary and secondary data is only a change of hand. Primary data are first-hand information collected directly from one source. They are the most original data in character and have not undergone any sort of statistical treatment. Secondary data are obtained from some other source or agency. They are not pure in character and have undergone some treatment at least once.
For Example: Suppose we are interested in finding the average age of MS students. We can collect the age data by two methods: either directly from each student in person, or from the university record. The data collected by direct personal investigation are called primary data, and the data obtained from the university record are called secondary data.
Editing of Data:
After collecting the data from either a primary or a secondary source, the next step is editing. Editing means examining the collected data to discover any errors and mistakes before presenting it. It has to be decided beforehand what degree of accuracy is wanted and what extent of error can be tolerated in the inquiry. The editing of secondary data is simpler than that of primary data.
Section-2
Presentation Of Data

- Array
- Tables
- Graphs, diagrams and charts
Array Form:
In this form, collected data are presented in ascending or descending order of magnitude.
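As a minimal sketch, an array can be produced in Python by sorting the raw observations. The list `raw` below holds made-up illustrative values and is not taken from any table in this refresher.

```python
# Illustrative raw observations (assumed values, for demonstration only).
raw = [12, 7, 19, 3, 15, 7, 10]

# Array form: the same observations arranged by magnitude.
ascending = sorted(raw)                 # smallest to largest
descending = sorted(raw, reverse=True)  # largest to smallest

print("Ascending: ", ascending)
print("Descending:", descending)
```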
Tables:
Collected data are presented in rows and columns. This form is used in almost all subjects, such as Economics, Accounting, etc. To manage data in this form, there are two procedures, which are linked with each other:
- Classification
- Tabulation
1) Classification:
In classification, collected data are distributed into different classes or groups according to their resemblances. In other words, classification is “the process of arranging things in groups or classes according to their resemblances and affinities and gives expression to the unity of attributes that may subsist amongst a diversity of individuals”.
The raw data, collected in real situations and arranged haphazardly, do not give a clear picture. Thus, to locate similarities and reduce mental strain, we resort to classification. Classification condenses the data by dropping out unnecessary details. It facilitates comparison between different sets of data by clearly showing the different points of agreement and disagreement.
During a population census, for example, people in the country are classified according to sex (males/females), marital status (married/unmarried), place of residence (rural/urban), age (0–5 years, 6–10 years, 11–15 years, etc.), profession (agriculture, production, commerce, transport, doctor, others), city, etc.
Usually there are the following four bases for classifying data:
- Spatial classification: Data are classified on the basis of location, e.g. city-wise prices of sugar.
- Temporal classification: Data are arranged according to time of occurrence, e.g. yearly imports of Pakistan.
- Qualitative classification: Data are classified on the basis of some quality, e.g. colour of eyes, sex, etc.
- Quantitative classification: Data, whether discrete or continuous, are classified on the basis of magnitude, e.g. height of individuals, number of brothers.
2) Tabulation:
Tabulation is the process of arranging data in tabular form according to their classes, i.e. data in rows and columns.
Tables are a standard method of presenting qualitative or categorical data, but they can also be used to summarize quantitative data (see Table 1).
For example:
Table 1. Birth Rate and Death Rate in Different Countries for 2017
Country      Birth Rate   Death Rate
China        33           24
Australia    25           14
USA          40           10
Japan        35           19
Frequency Distribution:
Frequency is how often something occurs. For example, the number of dogs that people own in a neighborhood is a frequency.
A distribution refers to the pattern of these frequencies.
In other words, "a frequency distribution is a tabular arrangement of quantitative data into different classes along with the class frequencies, i.e. the number of values in each class".
In our example above, you might do a survey of your neighborhood to see how many dogs each household owns.
Let’s say you obtain the following set of scores from your sample:
1, 0, 1, 4, 1, 2, 0, 3, 0, 2, 1, 1, 2, 0, 1, 1, 3
The first step in turning this into a frequency distribution is to create a table. Label one column with the items you are counting, in this case the number of dogs in households in your neighborhood.
Next, create a column where you can tally the responses. Place a line for each instance the number occurs.
Finally, total your tallies and add the final number to a third column.
Number of Dogs in Household    Tally       Frequency
0                              ||||        4
1                              ||||| ||    7
2                              |||         3
3                              ||          2
4 or more                      |           1
Using
a frequency distribution, you can look for patterns in the data. Looking at the
table above you can quickly see that out of the 17 households surveyed, 7
families had one dog while 4 families did not have a dog.
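The tallying procedure above can be sketched in Python with `collections.Counter`. This is only an illustration using the 17 survey scores listed earlier; the variable names are our own.

```python
from collections import Counter

# Number of dogs reported by each of the 17 surveyed households.
dogs = [1, 0, 1, 4, 1, 2, 0, 3, 0, 2, 1, 1, 2, 0, 1, 1, 3]

# Counter maps each distinct value to how often it occurs (its frequency).
frequency = Counter(dogs)

print("Dogs  Tally     Frequency")
for value in sorted(frequency):
    count = frequency[value]
    tally = "|" * count  # one stroke per occurrence
    print(f"{value:<5} {tally:<9} {count}")

print("Total households:", sum(frequency.values()))
```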
Another Example of a Frequency Distribution
For example, let’s suppose that you are collecting data on how many hours of sleep college students get each night.
After conducting a survey of 29 of your classmates, you are left with the following set of scores:
7, 5, 8, 9, 4, 10, 7, 9, 9, 6, 5, 11, 6, 5, 9, 9, 8, 6, 9, 7, 8, 4, 7, 8, 7, 6, 10, 4, 8
In order to make sense of this information, you need to find a way to organize the data. A frequency distribution is commonly used to categorize information so that it can be interpreted quickly in a visual way.
In our example above, the number of hours of sleep each night serves as the categories, and the occurrences of each number are then tallied.
The above information could be presented in a table:
Hours of Sleep    Tally      Frequency
4                 |||        3
5                 |||        3
6                 ||||       4
7                 |||||      5
8                 |||||      5
9                 ||||| |    6
10                ||         2
11                |          1
Looking at the table, you can quickly see that 6 people reported sleeping for 9 hours while only 3 people reported sleeping for 4 hours.
Un-grouped Data: Any untreated primary data is called un-grouped data.
Grouped Data: Secondary data (in frequency distribution form) is called grouped data.
Example (Un-grouped Data/Grouped Data):
Let us consider the marks obtained by 100 students of a class in Economics.
Table 1.2: Marks of 100 Students of a Class in Economics
If the raw data of Table 1.2 are arranged in either ascending or descending order of magnitude, we get a better way of presentation, usually called an “array” (Table 1.3).
Now let us present the above data in the form of a simple (or ungrouped) frequency distribution using tally marks. A tally mark is an upward slanted stroke (/) which is put against a value each time it occurs in the raw data. The fifth occurrence of the value is represented by a cross tally mark (\) drawn across the first four tally marks.
Finally, the tally marks are counted and the total of the tally marks against each value is its frequency.
Let us now represent the data in Table 1.3 as a simple (or ungrouped) frequency distribution.
Grouped Frequency Distribution:
The data in Table 1.3 can be further condensed by putting them into smaller groups, or classes, called “class-intervals”. The number of items which fall in a class-interval is called its “class frequency”.
The tabulation of raw data by dividing the whole range of observations into a number of classes, and indicating the corresponding class-frequencies against the class-intervals, is called a “grouped frequency distribution”.
Let us now represent the data in Table 1.3 as a grouped frequency distribution. We find that the lowest value is 56 and the highest value is 73. Thus, aiming for approximately 10 classes, the width of each class will be (73 − 56)/10 = 17/10 = 1.7, rounded up to 2, which gives nine class-intervals: 56–57, 58–59, ..., 72–73 (Table 1.5).
Thus the steps in preparing the grouped frequency distribution are:
1. Determining the class intervals.
2. Recording the data using tally marks.
3. Finding frequency of each class by counting the tally marks.
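The three steps can be sketched in Python as follows. The `marks` list is hypothetical data chosen to span 56 to 73, since the values of Table 1.3 are not reproduced here; the target of 10 classes follows the text above.

```python
import math

# Hypothetical marks between 56 and 73 (assumed data, not the real Table 1.3).
marks = [56, 57, 58, 58, 60, 61, 61, 62, 63, 63,
         64, 64, 65, 66, 67, 68, 70, 71, 72, 73]

# Step 1: determine the class intervals.
lowest, highest = min(marks), max(marks)
target_classes = 10
width = math.ceil((highest - lowest) / target_classes)  # 1.7 rounded up to 2
intervals = [(start, start + width - 1)
             for start in range(lowest, highest + 1, width)]

# Steps 2 and 3: tally each mark into its class, then count the tallies.
frequency = {interval: 0 for interval in intervals}
for mark in marks:
    index = (mark - lowest) // width  # position of the class containing mark
    frequency[intervals[index]] += 1

for (lower, upper), count in frequency.items():
    print(f"{lower}-{upper}  {'|' * count:<6} {count}")
```

With a width of 2 the code produces nine intervals, from 56-57 up to 72-73, which matches the class-intervals described in the text.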
Several Important Terms:
(a) Class-limits: The maximum and minimum values of a class-interval are called upper class limit and lower class-limit respectively. In Table 1.5 the lower class-limits of nine classes are 56, 58, 60, 62, 64, 66, 68, 70, 72 and the upper class-limits are 57, 59, 61, 63, 65, 67, 69, 71, 73.
(b) Class-mark, or, Mid-value: The class-mark, or mid-value, of a class-interval lies exactly at the middle of the class-interval and is given by:
Class-mark, or, Mid-value = (lower class-limit + upper class-limit) / 2
or, (lower class boundary + upper class boundary) / 2
or, lower class-limit + 1/2 (upper class-limit − lower class-limit)
(c) Class boundaries: Class boundaries are the true limits of a class-interval. They are used with grouped frequency distributions in which there is a gap between the upper class-limit of one class and the lower class-limit of the next class.
Class intervals (Marks)    Class boundaries
56-57                      55.5-57.5
58-59                      57.5-59.5
60-61                      59.5-61.5
62-63                      61.5-63.5
64-65                      63.5-65.5
66-67                      65.5-67.5
68-69                      67.5-69.5
70-71                      69.5-71.5
72-73                      71.5-73.5
Here 57 is the upper class-limit of the first class and 58 is the lower class-limit of the next class.
The common difference between the upper class-limit of a class-interval and the lower class-limit of the next higher class-interval is 58 − 57 = 1. This difference is denoted by d.
The class boundaries above can be determined by using the following formulas:
Lower class boundary = lower class-limit − (1/2) d, e.g. lower class boundary = 56 − (1/2)(1) = 55.5
Upper class boundary = upper class-limit + (1/2) d, e.g. upper class boundary = 57 + (1/2)(1) = 57.5
The class-boundaries of the class-intervals of Table 1.5 will be 55.5 – 57.5; 57.5 – 59.5; 59.5 – 61.5; etc., as above, since d = 1. The class-boundaries convert a grouped frequency distribution (inclusive type) into a continuous frequency distribution.
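As a rough sketch, the boundary and class-mark formulas can be applied to the nine class-intervals in Python. The variable names are our own; the class limits are those listed in the text.

```python
# Class limits (lower, upper) of the nine class-intervals.
class_limits = [(56, 57), (58, 59), (60, 61), (62, 63), (64, 65),
                (66, 67), (68, 69), (70, 71), (72, 73)]

# d = gap between one class's upper limit and the next class's lower limit.
d = class_limits[1][0] - class_limits[0][1]  # 58 - 57 = 1

rows = []
for lower, upper in class_limits:
    lower_boundary = lower - d / 2
    upper_boundary = upper + d / 2
    class_mark = (lower + upper) / 2
    rows.append((lower_boundary, upper_boundary, class_mark))
    print(f"{lower}-{upper}: boundaries {lower_boundary}-{upper_boundary}, "
          f"class-mark {class_mark}")
```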
Graphs
Graphs and charts can quickly convey to the reader the essential points or trends in the data. Graphs and charts are particularly useful when data are being presented to an audience, because information has to be conveyed in a limited time period.
There are some general common sense recommendations to follow when presenting
data:
i) The presentation should be as simple as possible. Avoid the trap of adding too much information. The aim is not to include all the information you have but only a summary of the essential feature(s) you are trying to illustrate. A good rule of thumb is to present only one idea, or to have only one purpose, for each graph or chart you create.
ii) The presentation should be self-explanatory. A chart or graph is not serving its purpose if the reader cannot comprehend the legends or has to refer to the text in order to understand it. There is a careful balance between too much information which makes the graph or chart too complicated and too little information that makes the chart difficult to comprehend or worse misleading.
iii) The title should be clear and concise, indicating what data were obtained, and when and where they were obtained.
iv) Codes, legends and labels should be clear and concise, following standard formats if possible.
v) The use of footnotes is advised to explain essential features of the data that are critical for the correct interpretation of the graph or chart.
Section-3
Analysis Of Data
In the previous section we discussed the 2nd phase, i.e. presentation of data in tabular form. This section discusses a basic technique of data analysis: the study of central tendency, meaning a numerical value that mostly falls in the centre of the data and also represents the whole data. In other words, central tendency means an average value of the data. The following averages are commonly used:
- Arithmetic Mean
- Mode
- Median
The above averages are computed by one of the following methods, according to the nature of the data, i.e. Un-grouped / Grouped Data:
Averages          Un-grouped data                        Grouped data
Arithmetic Mean   x̄ = ∑x / n                             x̄ = ∑fx / ∑f
Mode              Most frequently occurring value        Discrete variable: choose the maximum frequency and pick the corresponding value
                                                         Continuous variable: L + [(fm − f1) / ((fm − f1) + (fm − f2))] × h
Median            ((n + 1) / 2)th term                   L + (h / f) × (n/2 − c)
                  (for discrete / continuous variable)   (for continuous variable)
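Please Note:
- x denotes the values of the variable X
- n denotes the number of values of X
- f denotes the frequency of the different groups (in the median formula, the frequency of the median class)
- L denotes the lower class boundary of the modal or median class
- fm denotes the frequency of the modal class
- f1 denotes the frequency of the previous class
- f2 denotes the frequency of the next class
- h denotes the size of the class
- c denotes the cumulative frequency of the class before the median class
- n/2 denotes ∑f / 2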
Examples:
Arithmetic Mean (Un-grouped data):
Example 1: Marks obtained by ten students of first year in Economics are: 65, 60, 57, 40, 38, 50, 51, 44, 62, 53. Calculate the mean.
Sol:
Let the marks be denoted by x.
x
65
60
57
40
38
50
51
44
62
53
∑x = 520
Arithmetic Mean: x̄ = ∑x / n = 520 / 10 = 52
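The same calculation can be checked with a few lines of Python. This is a minimal sketch that applies ∑x / n directly to the ten marks of Example 1.

```python
# Marks of the ten students from Example 1.
marks = [65, 60, 57, 40, 38, 50, 51, 44, 62, 53]

total = sum(marks)   # sum of x
n = len(marks)       # number of values
mean = total / n     # arithmetic mean = sum of x / n

print(f"Sum = {total}, n = {n}, mean = {mean}")
```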
Arithmetic Mean (Grouped data):
Example 2: The weights of 125 students are given below. Calculate the A.M.
Weight     Frequency
95-100     7
100-105    17
105-110    29
110-115    35
115-120    22
120-125    15
Sol:
Weight     Frequency (f)   x* (mid point)   fx
95-100     7               97.5             682.5
100-105    17              102.5            1742.5
105-110    29              107.5            3117.5
110-115    35              112.5            3937.5
115-120    22              117.5            2585
120-125    15              122.5            1837.5
           ∑f = 125                         ∑fx = 13902.5
*x = mid point = (lower class limit + upper class limit) / 2 = (95 + 100) / 2 = 97.5, and so on.
Arithmetic Mean = A.M = x̄ = ∑fx / ∑f = 13902.5 / 125 = 111.22
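A short Python sketch of the grouped-data mean, using the weight classes and frequencies of Example 2. Each class is represented by its mid point, as in the table above; the tuple layout is our own choice.

```python
# (lower limit, upper limit, frequency) for each weight class in Example 2.
classes = [(95, 100, 7), (100, 105, 17), (105, 110, 29),
           (110, 115, 35), (115, 120, 22), (120, 125, 15)]

sum_f = 0     # total frequency
sum_fx = 0.0  # total of frequency * mid point
for lower, upper, f in classes:
    x = (lower + upper) / 2  # mid point of the class
    sum_f += f
    sum_fx += f * x

mean = sum_fx / sum_f
print(f"Sum f = {sum_f}, sum fx = {sum_fx}, mean = {mean:.2f}")
```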
Mode (Un-grouped data):
Example 3: Daily wages received by 8 child labourers: Rs. 20, 25, 50, 30, 35, 50, 35, 50. Calculate the mode.
Sol: The mode is the most frequently occurring value. Here the most frequently occurring value is Rs. 50 (it occurs three times), so the mode is Rs. 50 per day.
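One way to find the most frequent value in Python is `collections.Counter`. The sketch below reads the wage list of Example 3 as eight separate values (20, 25, 50, 30, 35, 50, 35, 50), one per child labourer.

```python
from collections import Counter

# Daily wages (Rs.) of the 8 child labourers in Example 3.
wages = [20, 25, 50, 30, 35, 50, 35, 50]

counts = Counter(wages)
# most_common(1) returns a list holding the (value, frequency) pair
# with the highest frequency.
mode_value, mode_count = counts.most_common(1)[0]

print(f"Mode = Rs. {mode_value} (occurs {mode_count} times)")
```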
Mode (Grouped data - for discrete variable):
Example 4: A shoe shop sold ladies' shoes. The following are the shoe sizes and the pairs sold. Calculate the mode.
Size of ladies' shoe    No. of pairs
5                       20
5.5                     35
6                       15
4                       18
Sol: Choose the maximum frequency and pick the corresponding value as the mode.
Size of ladies' shoe (x)    No. of pairs (f)
5                           20
5.5 (corresponding value)   35 (maximum frequency)
6                           15
4                           18
Thus the mode of the ladies' shoe sizes is 5.5.
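For a discrete frequency table, picking the value with the maximum frequency can be sketched in Python as follows, using the shoe sizes of Example 4. The dictionary layout is just one convenient representation.

```python
# Shoe size -> number of pairs sold (Example 4).
pairs_sold = {5: 20, 5.5: 35, 6: 15, 4: 18}

# max() with key=pairs_sold.get returns the size whose frequency is largest.
mode_size = max(pairs_sold, key=pairs_sold.get)
max_frequency = pairs_sold[mode_size]

print(f"Mode = size {mode_size} ({max_frequency} pairs sold)")
```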
Mode (Grouped data - for continuous variable):
Example 5: The time taken (in minutes) by different labourers to complete a task is given below. Find the mode.
Time     Labours
10-15    20
15-20    30
20-25    40
25-30    25
Sol:
Time (C.B)    Labours
10-15         20
15-20         30 = f1
20-25         40 = fm    (modal class: L = 20, h = 25 − 20 = 5)
25-30         25 = f2
Mode = L + [(fm − f1) / ((fm − f1) + (fm − f2))] × h = 20 + [(40 − 30) / ((40 − 30) + (40 − 25))] × 5 = 20 + (10 / 25) × 5 = 20 + 2 = 22 minutes
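The grouped-mode formula can be sketched in Python as below, using the classes of Example 5. Treating a missing neighbouring class as frequency 0 is our own assumption for the edge cases; it does not arise in this example.

```python
# (lower boundary, upper boundary, frequency) for each class in Example 5.
classes = [(10, 15, 20), (15, 20, 30), (20, 25, 40), (25, 30, 25)]

# The modal class is the class with the highest frequency.
i = max(range(len(classes)), key=lambda k: classes[k][2])
L, upper, fm = classes[i]
h = upper - L  # size of the modal class

# Frequencies of the previous and next classes (0 if there is none: assumption).
f1 = classes[i - 1][2] if i > 0 else 0
f2 = classes[i + 1][2] if i < len(classes) - 1 else 0

mode = L + (fm - f1) / ((fm - f1) + (fm - f2)) * h
print(f"L = {L}, fm = {fm}, f1 = {f1}, f2 = {f2}, h = {h}")
print(f"Mode = {mode:.2f} minutes")
```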
Median (Un-grouped data - for discrete / continuous variable):
Example 6: The numbers of brothers and sisters of the employees of a company are noted below. Find the median.
6, 5, 3, 4, 2, 1, 4, 3, 7, 5, 5
Sol: Rearrange the data as an array: 1, 2, 3, 3, 4, 4, 5, 5, 5, 6, 7
Median = ((n + 1) / 2)th term = ((11 + 1) / 2)th term = (12 / 2)th term = 6th term in the arranged data, which is 4.
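A minimal Python sketch of the same steps, using the data of Example 6. The handling of an even number of values (averaging the two middle terms) is a common convention and an assumption here, since the text only covers the odd case; the standard library's `statistics.median` is used as a cross-check.

```python
import statistics

# Number of brothers and sisters of each employee (Example 6).
data = [6, 5, 3, 4, 2, 1, 4, 3, 7, 5, 5]

ordered = sorted(data)  # arrange the data as an array
n = len(ordered)

if n % 2 == 1:
    position = (n + 1) // 2          # ((n + 1) / 2)th term, counted from 1
    median = ordered[position - 1]   # Python lists are indexed from 0
else:
    # Assumed convention for even n: average of the two middle terms.
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print("Array: ", ordered)
print("Median:", median)
print("Cross-check:", statistics.median(data))
```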
Median (Grouped data - for continuous variable):
Example 7: The ages of the employees of a company are noted below. Find the median.
Age         No. of Employees
Below 20    11
20-25       25
25-30       40
30-35       50
35-40       100
40-45       80
Above 45    60
Sol:
Age         No. of Employees (f)   Cumulative frequency (c.f)
Below 20    11                     11
20-25       25                     36
25-30       40                     76
30-35       50                     126 = c
35-40       100 = f                226    (median class: L = 35, h = 40 − 35 = 5)
40-45       80                     306
Above 45    60                     366
n = ∑f = 366
n/2 = ∑f / 2 = 366 / 2 = 183
Median = L + (h / f) × (n/2 − c) = 35 + (5 / 100) × (183 − 126) = 35 + (5 × 57) / 100 = 35 + 2.85 = 37.85 years
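The grouped-median formula can be sketched in Python as follows, using the age classes above. Open-ended classes are stored with `None` for the missing boundary; the sketch assumes the median class is not one of them, which holds for this data.

```python
# (lower boundary, upper boundary, frequency); None marks an open-ended class.
rows = [(None, 20, 11), (20, 25, 25), (25, 30, 40), (30, 35, 50),
        (35, 40, 100), (40, 45, 80), (45, None, 60)]

n = sum(f for _, _, f in rows)
half = n / 2

# Walk down the cumulative frequencies until the class containing n/2 is found.
c = 0  # cumulative frequency of the classes before the median class
for lower, upper, f in rows:
    if c + f >= half:
        break  # lower, upper and f now describe the median class
    c += f

h = upper - lower  # size of the median class
median = lower + (h / f) * (half - c)

print(f"n = {n}, n/2 = {half}, L = {lower}, h = {h}, f = {f}, c = {c}")
print(f"Median = {median:.2f} years")
```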
Section-4
Interpretation Of Data

Data interpretation is the decision-making process that follows the analysis of collected data and drives future action. It is often the final stage of decision-making strategies.
Data interpretation relies on data gathered during the collection period to make informed decisions about how to proceed.