Business Statistics

“Quick refresher”

Muhammad Usman Qazi

This refresher is for beginners and for anyone who wants a brief review of the basics.

Basic Concepts

What is Statistics?

Kinds of Statistics

Constant

Variable

Section 1 - Collection Of Data

Section 2 - Presentation Of Data

Section 3 - Analysis Of Data

Section 4 - Interpretation Of Data

Today statistics has become an important tool in the work of many academic disciplines such as Medicine, Psychology, Education, Sociology, Engineering and Physics, just to name a few.

Statistics is also important in many aspects of society, such as Business, Industry and Government. Because statistics is used in so many areas of our lives, it has become very desirable to understand and practice statistical thinking. This is important even if you do not use statistical methods directly.

Here we will not discuss in detail the history of statistics or how its definition has changed in different eras. We will just go through the basic concepts.

Basic Concepts:

Before going further, we first need to understand a few basic concepts to develop a better understanding of the subject.

What is Statistics?

Statistics is the science of the collection, presentation, analysis and interpretation of numerical data.

Kinds of Statistics

There are two kinds of statistics:

Descriptive Statistics

Inferential Statistics

Descriptive Statistics: Descriptive statistics describe the data in various forms, e.g. tables, graphs, diagrams and other tools that help to summarize the data.

Inferential Statistics: Inferential statistics make inferences about a population using data drawn from that population. Instead of gathering data from the entire population, for example millions of residents, the statistician collects a sample or samples and makes inferences about the entire population from the sample.

Note: This quick refresher is all about Descriptive Statistics.

Constant: A constant is a value that stays the same from person to person, place to place or time to time.

Variable: A variable is a characteristic that takes different values from person to person, place to place or time to time, e.g. the weight of individuals or the price of rice. Variables can be classified into the following two types:

Qualitative Variable: Qualitative variables take on values that are names or labels. The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be examples of qualitative or categorical variables.
Quantitative Variable: Quantitative variables are numeric. They represent a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city, a measurable attribute of the city. Therefore, population would be a quantitative variable. Quantitative variables are further classified into the following two types:

Discrete Variable: If a variable cannot take every value between its minimum value and its maximum value, i.e. its values come from a counting process, it is called a discrete variable, e.g. the number of family members (five brothers, three sisters) or an amount of ten rupees.

Continuous Variable: If a variable can take on any value between its minimum value and its maximum value, i.e. every possible value, it is called a continuous variable, e.g. the height of an individual in feet.

The above definition of statistics (What is Statistics?) describes the following four phases, which are required in any statistical examination or investigation:

Collection of data
Presentation of data, i.e. the collected data are presented in a readable, understandable form
Analysis of data, i.e. the presented data are analysed
Interpretation of data, i.e. the conclusions or findings are determined

These phases are discussed in detail in the following four sections.

Section-1

Collection Of Data

The first step in any enquiry (investigation) is the collection of data. The data may be collected for the whole population or for a sample only; mostly they are collected on a sample basis. Collection of data is a very difficult job. The investigator who collects the statistical data should be a well-trained person.

There are two sources for the collection of data:

(1) Primary Data (2) Secondary Data

(1) Primary Data:

Primary data are first-hand information collected, compiled and published by an organization for some purpose. They are the most original data in character and have not undergone any sort of statistical treatment.

Example: Population census reports are primary data because they are collected, compiled and published by the population census organization.

Methods of Collecting Primary Data:

Primary data are collected by the following methods:

1. Personal Investigation: The researcher conducts the survey him/herself and collects the data from it. Data collected in this way are usually accurate and reliable. This method is only practical for small research projects.

2. Through Investigation: Trained investigators are employed to collect the data. These investigators contact the individuals and fill in the questionnaire after asking for the required information. Most organizations employ this method.

3. Collection through Questionnaire: The researchers get the data from local representatives or agents, who answer based on their own experience. This method is quick but gives only a rough estimate.

4. Through Telephone: The researchers get the information by telephone. This method is quick and gives accurate information.

(2) Secondary Data:

Secondary data are second-hand information that has already been collected by someone (an organization) for some purpose and is available for the present study. Secondary data are not pure in character and have undergone some treatment at least once.

Example: The Economic Survey of England is secondary data because its figures are collected by more than one organization, such as the Bureau of Statistics, the Board of Revenue, the banks, etc.

Methods of Collecting Secondary Data:

Secondary data are collected from the following sources:

1. Official: e.g. the publications of the Statistical Division, Ministry of Finance, the Federal Bureaus of Statistics, Ministries of Food, Agriculture, Industry, Labor, etc.

2. Semi-Official: e.g. State Bank, Railway Board, Central Cotton Committee, Boards of Economic Enquiry, etc.

3. Publications of Trade Associations, Chambers of Commerce, etc.

4. Technical and Trade Journals and Newspapers.

5. Research Organizations such as Universities and other institutions.

Difference between Primary and Secondary Data:

The difference between primary and secondary data is only a change of hand. Primary data are first-hand information collected directly from one source. They are the most original data in character and have not undergone any sort of statistical treatment, while secondary data are obtained from some other sources or agencies. They are not pure in character and have undergone some treatment at least once.

For Example: Suppose we are interested in finding the average age of MS students. We can collect the age data by two methods: either directly from each student personally, or from the university record. The data collected by direct personal investigation are called primary data, and the data obtained from the university record are called secondary data.

Editing of Data:

After collecting the data from either a primary or a secondary source, the next step is editing. Editing means examining the collected data to discover any errors and mistakes before presenting them. It has to be decided beforehand what degree of accuracy is wanted and what extent of error can be tolerated in the inquiry. The editing of secondary data is simpler than that of primary data.

Section-2

Presentation Of Data

In the previous section we discussed the first phase of a statistical enquiry, i.e. collection of data from either a primary or a secondary source. After collection, the data need to be sorted into an easily understandable form. The next phase of a statistical enquiry is therefore the presentation of the collected data, usually in one of the following forms:

Array
Tables
Graphs, diagrams and charts

Array Form:

In this form, the collected data are presented in ascending or descending order of magnitude.
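As a small illustration of the array form, the sketch below sorts a handful of raw values in both directions. The values are hypothetical and chosen only for the example; they do not come from any table in this refresher.

```python
# Hypothetical raw observations, used only to illustrate an array.
raw = [62, 57, 70, 58, 65]

ascending = sorted(raw)                  # smallest to largest
descending = sorted(raw, reverse=True)   # largest to smallest

print(ascending)   # [57, 58, 62, 65, 70]
print(descending)  # [70, 65, 62, 58, 57]
```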

Tables:

The collected data are presented in rows and columns. This form is used in almost all subjects, such as Economics, Accounting, etc. To arrange data in this form, there are two procedures, which are interlinked with each other:

Classification
Tabulation

1) Classification:

In classification, the collected data are distributed into different classes or groups according to their resemblances.

In other words, classification is “the process of arranging things in groups or classes according to their resemblances and affinities, and gives expression to the unity of attributes that may subsist amongst a diversity of individuals”.

The raw data, collected in real situations and arranged haphazardly, do not give a clear picture. Thus, to locate similarities and reduce mental strain, we resort to classification. Classification condenses the data by dropping unnecessary details. It facilitates comparison between different sets of data by clearly showing the points of agreement and disagreement.

During a population census, people in the country are classified according to sex (males/females), marital status (married/unmarried), place of residence (rural/urban), age (0–5 years, 6–10 years, 11–15 years, etc.), profession (agriculture, production, commerce, transport, doctor, others) in different cities, etc.

Usually there are the following four bases for classifying data:

Spatial classification

Data are classified on the basis of location, e.g. city-wise prices of sugar.

Temporal classification

Data are arranged according to their time of occurrence, e.g. yearly imports of Pakistan.

Qualitative classification

Data are classified on the basis of some quality, e.g. colour of eyes, sex, etc.

Quantitative classification

Data are classified on the basis of magnitude, as discrete or continuous, e.g. number of brothers (discrete) or height of an individual (continuous).

2) Tabulation:

Tabulation is the process of arranging data in tabular form according to their classes, i.e. data in rows and columns.

Tables are a standard method of presenting qualitative or categorical data, but they can also be used to summarize quantitative data (see Table 1).

For example:

Table 1. Birth Rate and Death Rate in Different Countries, 2017

Country	Birth Rate	Death Rate
China	33	24
Australia	25	14
USA	40	10
Japan	35	19

Frequency Distribution:

Frequency is how often something occurs. For example, the number of households in a neighborhood that own exactly one dog is a frequency.

A distribution refers to the pattern of these frequencies.

So a frequency distribution looks at how frequently each value occurs within a sample of values.

In other words, “a frequency distribution is a tabular arrangement of quantitative data into different classes along with the class frequencies, i.e. the number of values in each class”.

In our example above, you might do a survey of your neighborhood to see how many dogs each household owns.

Let’s say you obtain the following set of scores from your sample:

1, 0, 1, 4, 1, 2, 0, 3, 0, 2, 1, 1, 2, 0, 1, 1, 3

The first step in turning this into a frequency distribution is to create a table. Label one column the items you are counting, in this case the number of dogs in households in your neighborhood.

Next, create a column where you can tally the responses. Place a line for each instance the number occurs.

Finally, total your tallies and add the final number to a third column.

Number of Dogs in Household	Tally	Frequency
0	||||	4
1	||||| ||	7
2	|||	3
3	||	2
4 or more	|	1

Using a frequency distribution, you can look for patterns in the data. Looking at the table above you can quickly see that out of the 17 households surveyed, 7 families had one dog while 4 families did not have a dog.
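The tallying steps above can also be done by a short program. The sketch below is one possible way to do it in Python, using the 17 survey scores from the dog example; the variable names are illustrative only.

```python
from collections import Counter

# The 17 household scores from the dog survey above.
dogs = [1, 0, 1, 4, 1, 2, 0, 3, 0, 2, 1, 1, 2, 0, 1, 1, 3]

# Counter maps each distinct value to the number of times it occurs.
freq = Counter(dogs)

print("Dogs\tTally\tFrequency")
for value in sorted(freq):
    tally = "|" * freq[value]   # one stroke per occurrence
    print(f"{value}\t{tally}\t{freq[value]}")
```

The frequencies it produces (4, 7, 3, 2 and 1) add up to the 17 households surveyed, which is a quick check that no score was missed.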

Another Example of a Frequency Distribution

For example, let’s suppose that you are collecting data on how many hours of sleep college students get each night.

After conducting a survey of 29 of your classmates, you are left with the following set of scores:

7, 5, 8, 9, 4, 10, 7, 9, 9, 6, 5, 11, 6, 5, 9, 9, 8, 6, 9, 7, 8, 4, 7, 8, 7, 6, 10, 4, 8

In order to make sense of this information, you need to find a way to organize the data. A frequency distribution is commonly used to categorize information so that it can be interpreted quickly in a visual way.

In our example above, the number of hours of sleep each night serves as the categories, and the occurrences of each number are then tallied.

The above information could be presented in a table:

Hours of Sleep	Tally	Frequency
4	|||	3
5	|||	3
6	||||	4
7	|||||	5
8	|||||	5
9	||||| |	6
10	||	2
11	|	1

Looking at the table, you can quickly see that 6 people reported sleeping for 9 hours while only 3 people reported sleeping for 4 hours.

Un-grouped Data: Untreated primary data, i.e. raw data that have not been arranged into classes, are called un-grouped data.

Grouped Data: Data that have been arranged in the form of a frequency distribution are called grouped data.

Example (Un-grouped Data/Grouped Data):

Let us consider the marks obtained by 100 students of a class in Economics.
Table 1.2: Marks of 100 Students of a Class in Economics

If the raw data of Table 1.2 are arranged in either ascending or descending order of magnitude, we get a better way of presentation, usually called an “array” (Table 1.3).

Now let us present the above data in the form of a simple (or ungrouped) frequency distribution using tally marks. A tally mark is an upward slanted stroke (/) which is put against a value each time it occurs in the raw data. The fifth occurrence of the value is represented by a cross tally mark (\) drawn across the first four tally marks. Finally, the tally marks are counted, and the total of the tally marks against each value is its frequency.
Let us now represent the data in Table 1.3 as a simple (or ungrouped) frequency distribution.

Grouped Frequency Distribution:

The data in Table 1.3 can be further condensed by putting them into smaller groups, or classes, called “class-intervals”. The number of items which fall in a class-interval is called its “class frequency”.
The tabulation of raw data by dividing the whole range of observations into a number of classes and indicating the corresponding class frequencies against the class-intervals is called a “grouped frequency distribution”.
Let us now represent the data in Table 1.3 as a grouped frequency distribution. We find that the lowest value is 56 and the highest value is 73. Thus, for approximately 10 classes, the width of each class will be (73 − 56)/10 = 17/10 = 1.7, which is rounded to 2, and the nine class-intervals will be 56–57, 58–59, ..., 72–73 (Table 1.5).
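The class-width arithmetic can be sketched in a few lines of Python. This uses only the lowest and highest values quoted in the text (56 and 73) and the target of roughly 10 classes. Rounding the width up with `math.ceil` is an assumption made here; the text only says that 1.7 becomes 2.

```python
import math

lowest, highest = 56, 73   # extremes quoted in the text
target_classes = 10        # approximate number of classes wanted

# (73 - 56) / 10 = 1.7; rounding up (an assumption) gives a width of 2.
width = math.ceil((highest - lowest) / target_classes)

# Build inclusive class-intervals such as (56, 57), (58, 59), ...
intervals = []
lower = lowest
while lower <= highest:
    intervals.append((lower, lower + width - 1))
    lower += width

print(width)
print(intervals)
```

With a width of 2 the loop yields nine intervals, from 56–57 up to 72–73, which is why the text ends up with nine classes even though it aimed for about ten.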

Thus the steps in preparing the grouped frequency distribution are:
1. Determining the class intervals.
2. Recording the data using tally marks.
3. Finding frequency of each class by counting the tally marks.

Several Important Terms:

(a) Class-limits: The maximum and minimum values of a class-interval are called the upper class-limit and the lower class-limit respectively. In Table 1.5 the lower class-limits of the nine classes are 56, 58, 60, 62, 64, 66, 68, 70, 72 and the upper class-limits are 57, 59, 61, 63, 65, 67, 69, 71, 73.
(b) Class-mark, or Mid-value: The class-mark, or mid-value, of a class-interval lies exactly at the middle of the class-interval and is given by:

Class-mark, or Mid-value = (lower class-limit + upper class-limit) / 2
or (lower class boundary + upper class boundary) / 2
or lower class-limit + 1/2 (upper class-limit − lower class-limit)

(c) Class boundaries: Class boundaries are the true limits of a class-interval. They are used with a grouped frequency distribution in which there is a gap between the upper class-limit of one class and the lower class-limit of the next class.

Class intervals (Marks)	Class Boundaries
56-57	55.5-57.5
58-59	57.5-59.5
60-61	59.5-61.5
62-63	61.5-63.5
64-65	63.5-65.5
66-67	65.5-67.5
68-69	67.5-69.5
70-71	69.5-71.5
72-73	71.5-73.5

Here 57 is the upper class-limit of the first class and 58 is the lower class-limit of the second class.

The common difference between the upper class-limit of a class-interval and the lower class-limit of the next higher class-interval is 58 − 57 = 1. This difference is denoted by d.

The class boundaries can now be determined using the following formulas:

Lower class boundary = lower class-limit − 1/2 d, e.g. lower class boundary = 56 − 1/2 (1) = 55.5
Upper class boundary = upper class-limit + 1/2 d, e.g. upper class boundary = 57 + 1/2 (1) = 57.5

The class boundaries of the class-intervals of Table 1.5 will be 55.5–57.5; 57.5–59.5; 59.5–61.5; etc., as above, since d = 1. The class boundaries convert a grouped frequency distribution (inclusive type) into a continuous frequency distribution.
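The boundary and class-mark formulas can be checked with a short sketch. It uses the first three class-intervals listed above (56–57, 58–59, 60–61); the variable names are illustrative.

```python
# First three inclusive class-intervals from the text: (lower limit, upper limit).
limits = [(56, 57), (58, 59), (60, 61)]

# d is the gap between one class's upper limit and the next class's lower limit.
d = limits[1][0] - limits[0][1]   # 58 - 57 = 1

# Lower boundary = lower limit - d/2, upper boundary = upper limit + d/2.
boundaries = [(lo - d / 2, hi + d / 2) for lo, hi in limits]

# Class-mark = (lower limit + upper limit) / 2.
marks = [(lo + hi) / 2 for lo, hi in limits]

print(boundaries)  # [(55.5, 57.5), (57.5, 59.5), (59.5, 61.5)]
print(marks)       # [56.5, 58.5, 60.5]
```

Note that the upper boundary of each class equals the lower boundary of the next one, which is what makes the distribution continuous.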

Graphs

Graphs and charts can quickly convey to the reader the essential points or trends in the data. Graphs and charts are particularly useful when data are being presented to an audience, because information has to be conveyed in a limited time period.

There are some general common-sense recommendations to follow when presenting data:

i) The presentation should be as simple as possible. Avoid the trap of adding too much information. The aim is not to include all the information you have but only a summary of the essential feature(s) you are trying to illustrate. A good rule of thumb is to present only one idea, or to have only one purpose, for each graph or chart you create.

ii) The presentation should be self-explanatory. A chart or graph is not serving its purpose if the reader cannot comprehend the legends or has to refer to the text in order to understand it. There is a careful balance between too much information, which makes the graph or chart too complicated, and too little information, which makes the chart difficult to comprehend or, worse, misleading.

iii) The title should be clear and concise, indicating what data were obtained, and when and where they were obtained.

iv) Codes, legends and labels should be clear and concise, following standard formats if possible.

v) The use of footnotes is advised to explain essential features of the data that are critical for the correct interpretation of the graph or chart.

Section-3

Analysis Of Data

In the previous section we discussed the second phase, i.e. presentation of data in tabular form. This section discusses a basic technique of data analysis, the study of central tendency: a numerical value that mostly falls in the centre of the data and represents the whole data set. In other words, a measure of central tendency is an average value of the data. The following are the common averages:

Arithmetic Mean
Mode
Median

Each of these averages is computed by one of the following methods, according to the nature of the data, i.e. un-grouped or grouped:

Arithmetic Mean
Un-grouped data: x̄ = ∑x / n
Grouped data: x̄ = ∑fx / ∑f

Mode
Un-grouped data: the most frequently occurring value
Grouped data (for discrete variable): choose the maximum frequency and pick the corresponding value
Grouped data (for continuous variable): L + [(fm − f1) / ((fm − f1) + (fm − f2))] × h

Median
Un-grouped data (for discrete / continuous variable): the ((n + 1)/2)th term
Grouped data (for continuous variable): L + (h / f)(n/2 − c)

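Please Note:

x denotes the values of the variable X (for grouped data, the class mid-points)
n denotes the number of values of X
f denotes the frequency of a class (in the median formula, the frequency of the median class)
L denotes the lower class boundary of the modal (or median) class
fm denotes the frequency of the modal class
f1 denotes the frequency of the class before the modal class
f2 denotes the frequency of the class after the modal class
h denotes the size (width) of the class
c denotes the cumulative frequency of the class before the median class
n/2 denotes ∑f / 2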

Examples:

Arithmetic Mean (Un-grouped data):

Example 1: The marks obtained by ten first-year students in economics are: 65, 60, 57, 40, 38, 50, 51, 44, 62, 53. Calculate the mean.

Sol:

Let the marks be denoted by x.

∑x = 520

Arithmetic Mean: x̄ = ∑x / n = 520 / 10 = 52
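As a quick check of Example 1, the same calculation can be written as a minimal Python sketch using the ten marks given above.

```python
# The ten marks from Example 1.
marks = [65, 60, 57, 40, 38, 50, 51, 44, 62, 53]

total = sum(marks)   # the sum of x
n = len(marks)       # the number of values
mean = total / n     # arithmetic mean = sum of x / n

print(total, n, mean)  # 520 10 52.0
```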

Arithmetic Mean (Grouped data):

Example 2: The weights of 125 students are given below. Calculate the A.M.

Weight	Frequency
95-100	7
100-105	17
105-110	29
110-115	35
115-120	22
120-125	15

Sol:

Weight	Frequency (f)	x (mid-point)*	fx
95-100	7	97.5	682.5
100-105	17	102.5	1742.5
105-110	29	107.5	3117.5
110-115	35	112.5	3937.5
115-120	22	117.5	2585
120-125	15	122.5	1837.5
Total	∑f = 125		∑fx = 13902.5

*x = mid-point = (lower class limit + upper class limit) / 2 = (95 + 100) / 2 = 97.5, and so on.

Arithmetic Mean = A.M = x̄ = ∑fx / ∑f = 13902.5 / 125 = 111.22
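The grouped-data calculation in Example 2 can be reproduced with the sketch below. Each class is written as a hypothetical (lower limit, upper limit, frequency) tuple; that layout is just a convenient choice for the example.

```python
# Classes from Example 2 as (lower limit, upper limit, frequency).
classes = [
    (95, 100, 7),
    (100, 105, 17),
    (105, 110, 29),
    (110, 115, 35),
    (115, 120, 22),
    (120, 125, 15),
]

# Sum of f: total number of students.
total_f = sum(f for _, _, f in classes)

# Sum of f*x, where x is the class mid-point (lower + upper) / 2.
total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)

mean = total_fx / total_f

print(total_f, total_fx, round(mean, 2))  # 125 13902.5 111.22
```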

Mode (Un-grouped data):

Example 3: Daily wages received by 8 child labourers: Rs. 20, 25, 50, 30, 35, 50, 35, 50. Calculate the mode.

Sol: The most frequently occurring value is Rs. 50, so the mode is Rs. 50 per day.
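A minimal sketch of the same idea in Python follows. It assumes the wage list in Example 3 is read as the eight values 20, 25, 50, 30, 35, 50, 35, 50, and it returns every value that shares the highest count, since a data set can have more than one mode.

```python
from collections import Counter

# Daily wages (Rs.) from Example 3, read as eight separate values (an assumption).
wages = [20, 25, 50, 30, 35, 50, 35, 50]

counts = Counter(wages)          # value -> number of occurrences
highest = max(counts.values())   # the largest count

# Keep every value whose count equals the largest count.
modes = [value for value, count in counts.items() if count == highest]

print(modes)  # [50]
```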

Mode (Grouped data - for discrete variable):

Example 4: A shoe shop sold ladies' shoes. The shoe sizes and the number of pairs sold are given below. Calculate the mode:

Size of ladies' shoe	No. of pairs
5	20
5.5	35
6	15
4	18

Sol: Choose the maximum frequency and pick the corresponding value as the mode.

Size of ladies' shoe (x)	No. of pairs (f)
5	20
5.5 (corresponding value)	35 (maximum frequency)
6	15
4	18

Thus the mode of the ladies' shoe sizes is 5.5.
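For a discrete frequency table like the one in Example 4, picking the mode amounts to finding the value with the largest frequency. The sketch below stores the table as a dictionary, which is just one convenient representation.

```python
# Shoe size -> number of pairs sold, from Example 4.
pairs_sold = {5: 20, 5.5: 35, 6: 15, 4: 18}

# The mode is the size whose frequency is the largest.
mode_size = max(pairs_sold, key=pairs_sold.get)

print(mode_size, pairs_sold[mode_size])  # 5.5 35
```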

Mode (Grouped data - for continuous variable):

Example 5: The time taken by different labourers to complete a task is given below. Find the mode.

Time	Labourers
10-15	20
15-20	30
20-25	40
25-30	25

Sol:

Time (C.B)	Labourers
10-15	20
15-20	30 = f1
20-25	40 = fm
25-30	25 = f2

The modal class is 20-25, so L = 20 and h = 25 − 20 = 5.

Mode = L + [(fm − f1) / ((fm − f1) + (fm − f2))] × h = 20 + [(40 − 30) / ((40 − 30) + (40 − 25))] × 5 = 20 + (10 / 25) × 5 = 20 + 2 = 22 minutes
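The grouped-mode formula can be traced step by step in code. The sketch below uses the four classes of Example 5. Treating a missing neighbouring class as having frequency 0 is an assumption added here for completeness; it does not arise in this example.

```python
# Classes from Example 5 as (lower boundary, upper boundary, frequency).
classes = [(10, 15, 20), (15, 20, 30), (20, 25, 40), (25, 30, 25)]

# The modal class is the one with the highest frequency.
i = max(range(len(classes)), key=lambda k: classes[k][2])
L, upper, fm = classes[i]
h = upper - L   # class size

# Frequencies of the classes before and after the modal class
# (taken as 0 if the modal class is first or last: an assumption).
f1 = classes[i - 1][2] if i > 0 else 0
f2 = classes[i + 1][2] if i < len(classes) - 1 else 0

mode = L + (fm - f1) / ((fm - f1) + (fm - f2)) * h

print(L, h, fm, f1, f2, mode)
```

Here the modal class is 20-25, so the sketch works with L = 20, h = 5, fm = 40, f1 = 30 and f2 = 25, the same values used in the worked solution.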

Median (Un-grouped data - for discrete / continuous variable):

Example 6: The number of brothers and sisters of the employees of a company is noted below. Find the median:

6, 5, 3, 4, 2, 1, 4, 3, 7, 5, 5

Sol: Rearranging the data as an array: 1, 2, 3, 3, 4, 4, 5, 5, 5, 6, 7

Median = ((n + 1)/2)th term = ((11 + 1)/2)th term = (12/2)th term = 6th term in the arranged data, which is 4.
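The median rule for un-grouped data can be sketched as follows, using the eleven values of Example 6. The branch for an even number of values (averaging the two middle terms) is the usual convention and is an addition here; the example itself only needs the odd case.

```python
# The eleven values from Example 6.
siblings = [6, 5, 3, 4, 2, 1, 4, 3, 7, 5, 5]

ordered = sorted(siblings)   # arrange the data as an array
n = len(ordered)

if n % 2 == 1:
    # Odd n: the ((n + 1) / 2)th term; subtract 1 because lists are 0-indexed.
    median = ordered[(n + 1) // 2 - 1]
else:
    # Even n (not needed in Example 6): average of the two middle terms.
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(ordered)
print(median)  # 4
```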

Median (Grouped data - for continuous variable):

Example 7: The ages of the employees of a company are noted below. Find the median:

Age	No. of Employees
Below 20	11
20-25	25
25-30	40
30-35	50
35-40	100
40-45	80
Above 45	60

Sol:

Age	No. of Employees (f)	Cumulative frequency (c.f)
Below 20	11	11
20-25	25	36
25-30	40	76
30-35	50	126 = c
35-40	100 = f	226
40-45	80	306
Above 45	60	366

n = ∑f = 366

n/2 = ∑f / 2 = 366 / 2 = 183, which falls in the class 35-40 (the median class), so L = 35 and h = 40 − 35 = 5.

Median = L + (h / f)(n/2 − c) = 35 + (5 / 100)(183 − 126) = 35 + (5 × 57) / 100 = 35 + 2.85 = 37.85 years
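The grouped-median steps (accumulate frequencies, locate the class containing the n/2-th value, then interpolate) can be sketched as below with the age data from this example. Representing the open-ended classes with `None` is a choice made for this sketch; it works here only because the median class, 35-40, has both boundaries.

```python
# Age classes as (lower boundary, upper boundary, frequency).
# None marks the open ends "Below 20" and "Above 45".
classes = [
    (None, 20, 11),
    (20, 25, 25),
    (25, 30, 40),
    (30, 35, 50),
    (35, 40, 100),
    (40, 45, 80),
    (45, None, 60),
]

n = sum(f for _, _, f in classes)   # total frequency
half = n / 2

c = 0   # cumulative frequency of the classes before the current one
for lower, upper, f in classes:
    if c + f >= half:
        # This is the median class: interpolate inside it.
        h = upper - lower
        median = lower + (h / f) * (half - c)
        break
    c += f

print(n, half, c, round(median, 2))  # 366 183.0 126 37.85
```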

Section-4

Interpretation Of Data

Data interpretation is the decision-making process that follows the analysis of collected data and drives future action. It is often the final stage of decision-making strategies.

Data interpretation relies on data gathered during the collection period to make informed decisions about how to proceed.
