Business Statistics


“Quick refresher”





By
Muhammad Usman Qazi

This quick refresher is for beginners and for anyone who needs to revisit the basics.


Table of Contents



  • Basic Concepts

    • What is Statistics?

    • Kinds of Statistics

    • Constant

    • Variable

  • Section 1 - Collection Of Data

  • Section 2 - Presentation Of Data

  • Section 3 - Analysis Of Data

  • Section 4 - Interpretation Of Data



Today statistics has become an important tool in the work of many academic disciplines such as Medicine, Psychology, Education, Sociology, Engineering and Physics, just to name a few.

Statistics is also important in many aspects of society such as Business, Industry and Government. Because of the increasing use of statistics in so many areas of our lives, it has become very desirable to understand and practice statistical thinking. This is important even if you don't use statistical methods directly.

Here we are not going to discuss the history of statistics or how its definition has changed over different eras. We will just go through the basic concepts.


Basic Concepts:

Before going further, we first need to understand a few basic concepts in order to develop a better understanding of the subject.

What is Statistics?

Statistics is the science of the collection, presentation, analysis and interpretation of numerical data.

Kinds of Statistics

There are two kinds of statistics:

  • Descriptive Statistics
  • Inferential Statistics

  • Descriptive Statistics: Descriptive statistics summarizes and describes data in forms such as tables, graphs, diagrams and other tools that help describe the data.
  • Inferential Statistics: Inferential statistics makes inferences about a population using data drawn from that population. Instead of gathering data from the entire population, say millions of residents, the statistician collects one or more samples and uses them to make inferences about the entire population.
          Note: This quick refresher is all about Descriptive Statistics.




Constant: A quantity whose value is identical from person to person, place to place or time to time is called a constant.

Variable: A quantity that takes different values from person to person, place to place or time to time is called a variable, e.g. the weight of individuals or the price of rice. Variables fall into the following two classes:
  • Qualitative Variable: Qualitative variables take on values that are names or labels. The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be examples of qualitative or categorical variables.
  • Quantitative Variable: Quantitative variables are numeric. They represent a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city, a measurable attribute of the city. Therefore, population would be a quantitative variable. Quantitative variables are further classified as follows:
    • Discrete Variable: If a variable cannot take on every value between its minimum value and its maximum value but only separate values obtained by counting, it is called a discrete variable, e.g. the number of family members (five brothers, three sisters) or an amount of ten rupees.
    • Continuous Variable: If a variable can take on any value between its minimum value and its maximum value, i.e. every possible value, it is called a continuous variable, e.g. the height of an individual in feet.







    The above definition of statistics (What is Statistics?) describes the following four phases, which are required in any statistical investigation:

    1. Collection of data
    2. Presentation of data, i.e. the collected data are presented in a readable, understandable form
    3. Analysis of data, i.e. the presented data are analysed
    4. Interpretation of data, i.e. conclusions and findings are drawn

    These phases are discussed in detail in the following four sections.



    Section-1

    Collection Of Data

    The first step in any enquiry (investigation) is the collection of data. The data may be collected for the whole population or for a sample only; mostly they are collected on a sample basis. Collecting data is a very difficult job, so the investigator who collects statistical data should be a well-trained person.

    There are two sources for the collection of data

    (1) Primary Data (2) Secondary Data

    (1) Primary Data:

    Primary data are first-hand information collected, compiled and published by an organization for some purpose. They are the most original data in character and have not undergone any sort of statistical treatment.

    Example: Population census reports are primary data because they are collected, compiled and published by the population census organization.

    Methods of Collecting Primary Data:

    Primary data are collected by the following methods:

    1. Personal Investigation: The researcher conducts the survey him/herself and collects the data from it. Data collected in this way are usually accurate and reliable. This method is only practical for small research projects.

    2. Through Investigators: Trained investigators are employed to collect the data. These investigators contact the individuals and fill in questionnaires after asking for the required information. Most organizations employ this method.

    3. Collection through Questionnaire: The researchers get the data from local representatives or agents, who answer on the basis of their own experience. This method is quick but gives only a rough estimate.

    4. Through Telephone: The researchers get the information by telephone. This method is quick and gives accurate information.

    (2) Secondary Data:

    Secondary data are second-hand information that has already been collected by someone (an organization) for some purpose and is available for the present study. Secondary data are not pure in character and have undergone some treatment at least once.

    Example: The Economic Survey of England is secondary data because its figures are collected by more than one organization, such as the Bureau of Statistics, the Board of Revenue, the banks, etc.

    Methods of Collecting Secondary Data:

    The secondary data are collected by the following sources:

    1. Official: e.g. the publications of the Statistical Division, Ministry of Finance, the Federal Bureau of Statistics, Ministries of Food, Agriculture, Industry, Labor, etc.

    2. Semi-Official: e.g. State Bank, Railway Board, Central Cotton Committee, Boards of Economic Enquiry, etc.

    3. Publications of Trade Associations, Chambers of Commerce, etc.

    4. Technical and Trade Journals and Newspapers.

    5. Research Organizations such as Universities and other institutions.


    Difference between Primary and Secondary Data:

    The difference between primary and secondary data is only a change of hands. Primary data are first-hand information collected directly from the source. They are the most original data in character and have not undergone any sort of statistical treatment. Secondary data are obtained from other sources or agencies. They are not pure in character and have undergone some treatment at least once.

    For Example: Suppose we are interested in finding the average age of MS students. We can collect the age data by two methods: either directly from each student personally, or from the university record. The data collected by direct personal investigation are called primary data, and the data obtained from the university record are called secondary data.

    Editing of Data:

    After the data have been collected, from either a primary or a secondary source, the next step is editing. Editing means examining the collected data to discover any errors and mistakes before presenting them. It has to be decided beforehand what degree of accuracy is wanted and what extent of error can be tolerated in the inquiry. The editing of secondary data is simpler than that of primary data.



    Section-2

    Presentation Of Data



    In the previous section we discussed the first phase of a statistical enquiry, i.e. the collection of data from either a primary or a secondary source. After collection, the data need to be arranged in an easily understandable form, so the next phase of a statistical enquiry is the presentation of the collected data. Collected data are usually presented in one of the following forms:


    • Array
    • Tables
    • Graphs, diagrams and charts

    Array Form:
    In this form the collected data are presented in ascending or descending order of magnitude.
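
    As a minimal sketch of the array form, the snippet below sorts a handful of raw observations in both directions. The values in `raw` are hypothetical and are not taken from this refresher.

```python
# Hypothetical raw observations (illustrative values only).
raw = [62, 57, 73, 60, 56, 68]

# An "array" is simply the raw data arranged by magnitude.
ascending = sorted(raw)
descending = sorted(raw, reverse=True)

print("Ascending: ", ascending)
print("Descending:", descending)
```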

    Tables:
    In this form the collected data are presented in rows and columns. Tables are used in almost all subjects, such as Economics and Accounting. Arranging data in this form involves the following two procedures, which are interlinked with each other:


    1. Classification
    2. Tabulation
    1) Classification:

    In classification the collected data are distributed into different classes or groups according to their resemblances. In other words, classification is “the process of arranging things in groups or classes according to their resemblances and affinities and gives expression to the unity of attributes that may subsist amongst a diversity of individuals”.
    Raw data, collected in real situations and arranged haphazardly, do not give a clear picture. Thus, to locate similarities and reduce mental strain, we resort to classification. Classification condenses the data by dropping unnecessary details. It facilitates comparison between different sets of data by clearly showing the points of agreement and disagreement.
    During a population census, for example, the people of a country are classified according to sex (male/female), marital status (married/unmarried), place of residence (rural/urban), age (0–5 years, 6–10 years, 11–15 years, etc.) and profession (agriculture, production, commerce, transport, doctor, others) in different cities.
    Data are usually classified on the following four bases:

    • Spatial classification
          Data are classified on the basis of location, e.g. city-wise prices of sugar.
    • Temporal classification
          Data are arranged according to their time of occurrence, e.g. the yearly imports of Pakistan.
    • Qualitative classification
          Data are classified on the basis of some quality, e.g. colour of eyes, sex, etc.
    • Quantitative classification
          Data are classified on the basis of magnitude, whether continuous or discrete, e.g. height of individuals, number of brothers.
    2) Tabulation:

    Tabulation is the process of arranging data in tabular form according to their classes, i.e. in rows and columns.
    Tables are a standard method of presenting qualitative or categorical data, but they can also be used to summarize quantitative data (see Table 1).

     For example: 

    Table 1. Birth Rate and Death Rate in Different Countries for 2017

    Country        Birth Rate     Death Rate
    China          33             24
    Australia      25             14
    USA            40             10
    Japan          35             19

    Frequency Distribution:
    Frequency is how often something occurs. For example, the number of dogs that people own in a neighborhood is a frequency.
    A distribution refers to the pattern of these frequencies.
    So a frequency distribution looks at how frequently certain things happen within a sample of values.

    In other words, "a frequency distribution is a tabular arrangement of quantitative data into different classes along with the class frequencies, i.e. the number of values in each class".
    In our example above, you might do a survey of your neighborhood to see how many dogs each household owns.

    Let’s say you obtain the following set of scores from your sample:

    1, 0, 1, 4, 1, 2, 0, 3, 0, 2, 1, 1, 2, 0, 1, 1, 3

    The first step in turning this into a frequency distribution is to create a table. Label the first column with the item you are counting, in this case the number of dogs in each household in your neighborhood.

    Next, create a column where you can tally the responses. Place a line for each instance the number occurs.

    Finally, total your tallies and add the final number to a third column.

    Number of Dogs in Household     Tally         Frequency
    0                               ||||          4
    1                               ||||| ||      7
    2                               |||           3
    3                               ||            2
    4 or more                       |             1

    Using a frequency distribution, you can look for patterns in the data. Looking at the table above you can quickly see that out of the 17 households surveyed, 7 families had one dog while 4 families did not have a dog.
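
    The tallying procedure described above can be sketched in a few lines of Python. This is only an illustration using the standard-library `Counter`; the variable names are our own, and the list is the 17 survey scores from the dog example.

```python
from collections import Counter

# The 17 survey responses: number of dogs per household.
dogs = [1, 0, 1, 4, 1, 2, 0, 3, 0, 2, 1, 1, 2, 0, 1, 1, 3]

# Counter tallies how often each value occurs.
frequency = Counter(dogs)

# Print value, tally marks and frequency, one row per distinct value.
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:<6}{'|' * count:<10}{count}")

print("Households surveyed:", sum(frequency.values()))
```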

    Another Example of a Frequency Distribution

    For example, let’s suppose that you are collecting data on how many hours of sleep college students get each night.

    After conducting a survey of 29 of your classmates, you are left with the following set of scores:

    7, 5, 8, 9, 4, 10, 7, 9, 9, 6, 5, 11, 6, 5, 9, 9, 8, 6, 9, 7, 8, 4, 7, 8, 7, 6, 10, 4, 8

    In order to make sense of this information, you need to find a way to organize the data. A frequency distribution is commonly used to categorize information so that it can be interpreted quickly in a visual way.

    In our example above, the number of hours of sleep each night serves as the categories, and the occurrences of each number are then tallied.

    The above information could be presented in a table:

    Hours of Sleep     Tally         Frequency
    4                  |||           3
    5                  |||           3
    6                  ||||          4
    7                  |||||         5
    8                  |||||         5
    9                  ||||| |       6
    10                 ||            2
    11                 |             1

    Looking at the table, you can quickly see that 6 people reported sleeping for 9 hours while only 3 people reported sleeping for 4 hours.

    Un-grouped Data: Primary data that have not undergone any treatment are called un-grouped data.

    Grouped Data: Data arranged in the form of a frequency distribution (secondary data) are called grouped data.

    Example (Un-grouped Data/Grouped Data):

    Let us consider the marks obtained by 100 students of a class in Economics.
    Table 1.2: Marks of 100 Students of a Class in Economics



    If the raw data of Table 1.2 are arranged in ascending or descending order of magnitude, we get a better presentation, usually called an “array” (Table 1.3).

    Now let us present the data of Table 1.3 in the form of a simple (or ungrouped) frequency distribution using tally marks. A tally mark is an upward slanted stroke (/) which is put against a value each time it occurs in the raw data. The fifth occurrence of a value is represented by a cross tally mark (\) drawn across the first four tally marks. Finally, the tally marks are counted, and the total of the tally marks against each value is its frequency.

    Grouped Frequency Distribution: 

    The data in Table 1.3 can be further condensed by putting them into smaller groups, or classes, called “class-intervals”. The number of items which fall in a class-interval is called its “class frequency”.
    The tabulation of raw data by dividing the whole range of observations into a number of classes and indicating the corresponding class frequencies against the class-intervals is called a “grouped frequency distribution”.
    Let us now represent the data in Table 1.3 as a grouped frequency distribution. We find that the lowest value is 56 and the highest value is 73. Thus, for approximately 10 classes, the width of each class will be (73 − 56) / 10 = 17 / 10 = 1.7, which is rounded up to 2. This gives nine class-intervals: 56–57, 58–59, ..., 72–73 (Table 1.5).

    Thus the steps in preparing the grouped frequency distribution are:
    1. Determining the class intervals.
    2. Recording the data using tally marks.
    3. Finding frequency of each class by counting the tally marks.
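
    The three steps above can be sketched as follows. Because the marks of Table 1.3 are not reproduced here, the `marks` list is a small hypothetical sample that merely shares the same lowest (56) and highest (73) values; the class width of 2 follows the calculation above.

```python
# Hypothetical marks between 56 and 73 (NOT the data of Table 1.3).
marks = [56, 57, 58, 58, 60, 61, 61, 62, 63, 64, 64, 65, 66, 67, 68, 70, 71, 73]
lowest, width = 56, 2

# Step 1: determine the class intervals 56-57, 58-59, ..., 72-73.
intervals = [(lo, lo + width - 1) for lo in range(lowest, max(marks) + 1, width)]

# Steps 2 and 3: place each mark in its class and count the tallies.
frequency = {interval: 0 for interval in intervals}
for m in marks:
    index = (m - lowest) // width  # position of the class containing m
    frequency[intervals[index]] += 1

for (lo, hi), f in frequency.items():
    print(f"{lo}-{hi}  {'|' * f:<6}{f}")
```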

    Several Important Terms:

    (a) Class-limits: The maximum and minimum values of a class-interval are called upper class limit and lower class-limit respectively. In Table 1.5 the lower class-limits of nine classes are 56, 58, 60, 62, 64, 66, 68, 70, 72 and the upper class-limits are 57, 59, 61, 63, 65, 67, 69, 71, 73.
    (b) Class-mark, or Mid-value: The class-mark, or mid-value, lies exactly at the middle of the class-interval and is given by:

    Class-mark = (lower class-limit + upper class-limit) / 2
    or (lower class boundary + upper class boundary) / 2
    or lower class-limit + 1/2 (upper class-limit − lower class-limit)

    (c) Class boundaries: Class boundaries are the true limits of a class-interval. They are used in a grouped frequency distribution where there is a gap between the upper class-limit of one class and the lower class-limit of the next class.

    Class intervals (Marks)     Class Boundaries
    56-57                       55.5-57.5
    58-59                       57.5-59.5
    60-61                       59.5-61.5
    62-63                       61.5-63.5
    64-65                       63.5-65.5
    66-67                       65.5-67.5
    68-69                       67.5-69.5
    70-71                       69.5-71.5
    72-73                       71.5-73.5

    For example, 57 is the upper class-limit of the first class and 58 is the lower class-limit of the second class.

    The common difference between the upper class-limit of a class-interval and the lower class-limit of the next higher class-interval is 58 − 57 = 1. This difference is denoted by d.

    The class boundaries can now be determined by using the following formulas:


    Lower class boundary = lower class-limit − 1/2 d, e.g. lower class boundary = 56 − 1/2 (1) = 55.5
    Upper class boundary = upper class-limit + 1/2 d, e.g. upper class boundary = 57 + 1/2 (1) = 57.5

    The class boundaries of the class-intervals of Table 1.5 will therefore be 55.5 – 57.5; 57.5 – 59.5; 59.5 – 61.5; etc., as above, since d = 1. The class boundaries convert a grouped frequency distribution (inclusive type) into a continuous frequency distribution.
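
    As a small illustrative sketch, the boundary and class-mark formulas can be applied to the first three class-intervals of the marks example. The variable names are our own choice.

```python
# First three class-intervals (lower limit, upper limit) from the marks example.
intervals = [(56, 57), (58, 59), (60, 61)]

# d = gap between one class's upper limit and the next class's lower limit.
d = intervals[1][0] - intervals[0][1]  # 58 - 57 = 1

# Boundaries: subtract d/2 from the lower limit, add d/2 to the upper limit.
boundaries = [(lo - d / 2, hi + d / 2) for lo, hi in intervals]

# Class-mark: midpoint of the two class limits.
class_marks = [(lo + hi) / 2 for lo, hi in intervals]

for (lo, hi), (lb, ub), mark in zip(intervals, boundaries, class_marks):
    print(f"{lo}-{hi}: boundaries {lb}-{ub}, class-mark {mark}")
```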

    Graphs

    Graphs and charts can quickly convey to the reader the essential points or trends in the data. Graphs and charts are particularly useful when data are being presented to an audience, because information has to be conveyed in a limited time period.  
    There are some general common-sense recommendations to follow when presenting data:

    i) The presentation should be as simple as possible. Avoid the trap of adding too much information. The aim is not to include all the information you have but only a summary of the essential feature(s) you are trying to illustrate. A good rule of thumb is to present only one idea, or to have only one purpose, for each graph or chart you create.

    ii) The presentation should be self-explanatory. A chart or graph is not serving its purpose if the reader cannot comprehend the legends or has to refer to the text in order to understand it. There is a careful balance between too much information, which makes the graph or chart too complicated, and too little information, which makes the chart difficult to comprehend or, worse, misleading.

    iii) The title should be clear and concise, indicating what the data are and when and where they were obtained.

    iv) Codes, legends and labels should be clear and concise, following standard formats if possible.

    v) The use of footnotes is advised to explain essential features of the data that are critical for the correct interpretation of the graph or chart.



    Section-3

    Analysis Of Data


    In the previous section we discussed the second phase, i.e. the presentation of data in tabular form. This section discusses a basic technique of data analysis: the study of central tendency. A measure of central tendency is a numerical value that mostly falls in the centre of the data and represents the data as a whole; in other words, it is an average value of the data. The common averages are:


    • Arithmetic Mean
    • Mode
    • Median
    Each of these averages is computed by one of the following methods, according to the nature of the data, i.e. un-grouped or grouped:

    Average            Un-grouped data                      Grouped data

    Arithmetic Mean    x̄ = ∑x / n                           x̄ = ∑fx / ∑f

    Mode               Most frequently occurring value      Discrete variable: choose the maximum frequency and pick the corresponding value
                                                            Continuous variable: L + [(fm − f1) / ((fm − f1) + (fm − f2))] × h

    Median             ((n + 1) / 2)th term                 Continuous variable: L + (h / f) (n/2 − c)
                       (discrete / continuous variable)

                                                                                                       
    Please Note:


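    • x denotes the values of the variable X (for grouped data, the mid-point of each class)
    • n denotes the number of values of X
    • f denotes the frequency of each group (in the median formula, the frequency of the median class)
    • L denotes the lower class boundary of the modal or median class
    • fm denotes the frequency of the modal class, i.e. the maximum frequency
    • f1 denotes the frequency of the class before the modal class
    • f2 denotes the frequency of the class after the modal class
    • h denotes the size (width) of the class
    • n/2 denotes ∑f / 2
    • c denotes the cumulative frequency of the class before the median class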

    Examples:


    Arithmetic Mean (Un-grouped data):

    Example 1: The marks obtained by ten first-year students in Economics are 65, 60, 57, 40, 38, 50, 51, 44, 62, 53. Calculate the mean.
    Sol:
    Let the marks be denoted by x.

    x
    65
    60
    57
    40
    38
    50
    51
    44
    62
    53
    ∑x = 520
    Arithmetic Mean: x̄ = ∑x / n = 520 / 10 = 52
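
    The same calculation can be checked with a short Python sketch, which adds up the ten marks and divides by their count.

```python
# Marks of the ten students from Example 1.
marks = [65, 60, 57, 40, 38, 50, 51, 44, 62, 53]

total = sum(marks)   # sum of x
n = len(marks)       # number of values
mean = total / n     # arithmetic mean = sum of x / n

print(f"sum = {total}, n = {n}, mean = {mean}")
```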

    Arithmetic Mean (Grouped data):

    Example 2: The weights of 125 students are given below. Calculate the A.M.

    Weight        Frequency
    95-100        7
    100-105       17
    105-110       29
    110-115       35
    115-120       22
    120-125       15

    Sol:

    Weight        Frequency (f)     Mid-point (x)*     fx
    95-100        7                 97.5               682.5
    100-105       17                102.5              1742.5
    105-110       29                107.5              3117.5
    110-115       35                112.5              3937.5
    115-120       22                117.5              2585
    120-125       15                122.5              1837.5
    Total         ∑f = 125          -                  ∑fx = 13902.5

    * x = mid-point = (lower class boundary + upper class boundary) / 2, e.g. (95 + 100) / 2 = 97.5, and so on.

    Arithmetic Mean = A.M = x̄ = ∑fx / ∑f = 13902.5 / 125 = 111.22
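
    A minimal sketch of the grouped-data calculation is given below. Each class is written as a (lower boundary, upper boundary, frequency) tuple, which is simply our own way of laying out the weight table.

```python
# (lower boundary, upper boundary, frequency) for each weight class in Example 2.
classes = [
    (95, 100, 7),
    (100, 105, 17),
    (105, 110, 29),
    (110, 115, 35),
    (115, 120, 22),
    (120, 125, 15),
]

sum_f = sum(f for _, _, f in classes)
# fx uses the class mid-point x = (lower + upper) / 2.
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
mean = sum_fx / sum_f

print(f"sum f = {sum_f}, sum fx = {sum_fx}, mean = {mean:.2f}")
```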


    Mode (Un-grouped data):

    Example 3: The daily wages received by 8 child labourers are Rs. 20, 25, 50, 30, 35, 50, 35, 50. Calculate the mode.

    Sol: The mode is the most frequently occurring value. Here Rs. 50 occurs most often (three times), so the mode is Rs. 50 per day.
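
    As an illustrative check, the mode of the eight wages can be found by counting occurrences. The sketch reads the wage list as the eight values 20, 25, 50, 30, 35, 50, 35, 50 and uses the standard-library `Counter`.

```python
from collections import Counter

# Daily wages (Rs.) of the 8 child labourers in Example 3.
wages = [20, 25, 50, 30, 35, 50, 35, 50]

# most_common(1) returns the value with the highest count and that count.
mode, occurrences = Counter(wages).most_common(1)[0]

print(f"mode = Rs. {mode} (occurs {occurrences} times)")
```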
                                                 
    Mode (Grouped data -  for discrete variable):

    Example 4: A shoe shop sold ladies' shoes. The shoe sizes and the number of pairs sold are given below. Calculate the mode.

    Size of ladies shoe                  No. of pairs
                        

    5                                               20

    5.5                                            35

    6                                               15

    4                                               18 


    Sol: Choose the maximum frequency and pick the corresponding value as the mode.

    Size of ladies shoe (x)     No. of pairs (f)
    5                           20
    5.5                         35  = maximum frequency, so 5.5 is the corresponding value
    6                           15
    4                           18

    Thus the mode of the ladies' shoe sizes is 5.5.
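
    For a discrete variable the rule "pick the value with the maximum frequency" is a one-liner in Python. The dictionary below is just our own encoding of the shoe table (size mapped to pairs sold).

```python
# Shoe size -> number of pairs sold (Example 4).
pairs_sold = {5: 20, 5.5: 35, 6: 15, 4: 18}

# The mode is the size whose frequency is largest.
mode_size = max(pairs_sold, key=pairs_sold.get)

print(f"mode = size {mode_size} ({pairs_sold[mode_size]} pairs)")
```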

    Mode (Grouped data -  for continuous variable):

    Example 5: The time taken by different labourers to complete a task is given below. Find the mode.


    Time                                 Labours 

    10-15                                   20
       
    15-20                                   30

    20-25                                  40

    25-30                                  25


    Sol:

    Time (C.B)     Labours
    10-15          20
    15-20          30 = f1
    20-25          40 = fm     (modal class: L = 20, h = 25 − 20 = 5)
    25-30          25 = f2

    Mode = L + [(fm − f1) / ((fm − f1) + (fm − f2))] × h = 20 + [(40 − 30) / ((40 − 30) + (40 − 25))] × 5 = 20 + (10 / 25) × 5 = 20 + 2 = 22 minutes
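
    The grouped-mode formula can be sketched in Python as below. The tuple layout and variable names are our own, and treating a missing neighbouring class as frequency 0 is an assumption that this example does not actually need, since the modal class here has a class on both sides.

```python
# (lower boundary, upper boundary, frequency) for each time class in Example 5.
classes = [(10, 15, 20), (15, 20, 30), (20, 25, 40), (25, 30, 25)]

# The modal class is the one with the maximum frequency.
i = max(range(len(classes)), key=lambda k: classes[k][2])
L, upper, fm = classes[i]

# Frequencies of the classes before and after the modal class
# (assumed 0 if there is no such class).
f1 = classes[i - 1][2] if i > 0 else 0
f2 = classes[i + 1][2] if i < len(classes) - 1 else 0
h = upper - L  # class size

mode = L + (fm - f1) / ((fm - f1) + (fm - f2)) * h
print(f"L = {L}, fm = {fm}, f1 = {f1}, f2 = {f2}, h = {h}, mode = {mode:.2f}")
```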

    Median (Un-grouped data -  for discrete / continuous variable):

    Example 6: The numbers of brothers and sisters of the employees of a company are noted below. Find the median.

    6, 5, 3, 4, 2, 1, 4, 3, 7, 5, 5

    Sol: Rearrange the data as an array: 1, 2, 3, 3, 4, 4, 5, 5, 5, 6, 7

    Median = ((n + 1) / 2)th term = ((11 + 1) / 2)th term = (12 / 2)th term = 6th term of the arranged data, which is 4.
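
    A short sketch of the same steps follows: sort the data, then take the ((n + 1) / 2)th term. The position formula is applied as written, which assumes n is odd, as it is here; the standard-library `statistics.median` is used only as a cross-check.

```python
import statistics

# Numbers of brothers and sisters from Example 6.
siblings = [6, 5, 3, 4, 2, 1, 4, 3, 7, 5, 5]

ordered = sorted(siblings)        # arrange as an array
n = len(ordered)
position = (n + 1) // 2           # (n + 1) / 2 th term; n is odd here
median = ordered[position - 1]    # Python lists are 0-indexed

print(f"array = {ordered}")
print(f"n = {n}, position = {position}, median = {median}")
print("cross-check:", statistics.median(siblings))
```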

    Median (Grouped data - for continuous variable):

    Example 7: The ages of the employees of a company are noted below. Find the median.

    Age                                                 No.of Employees

    Below 20                                        11

    20-25                                              25

    25-30                                              40     

    30-35                                              50

    35-40                                              100

    40-45                                              80

    Above 45                                       60

    Sol:

    Age          No. of Employees (f)     Cumulative frequency (c.f)
    Below 20     11                       11
    20-25        25                       36
    25-30        40                       76
    30-35        50                       126 = c
    35-40        100 = f                  226     (median class: L = 35, h = 40 − 35 = 5)
    40-45        80                       306
    Above 45     60                       366
                 n = ∑f = 366

    n/2 = ∑f / 2 = 366 / 2 = 183

    Median = L + (h / f) (n/2 − c) = 35 + (5 / 100) (183 − 126) = 35 + (5 × 57) / 100 = 35 + 2.85 = 37.85 years
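
    The grouped-median steps can be sketched as a running cumulative frequency. The tuple layout is our own; the open-ended classes ("Below 20", "Above 45") are written with `None` for the missing boundary, and the sketch assumes the median does not fall in one of them, which holds for this data.

```python
# (lower boundary, upper boundary, frequency); None marks an open-ended class.
classes = [
    (None, 20, 11),
    (20, 25, 25),
    (25, 30, 40),
    (30, 35, 50),
    (35, 40, 100),
    (40, 45, 80),
    (45, None, 60),
]

n = sum(f for _, _, f in classes)
half = n / 2

c = 0  # cumulative frequency of the classes before the current one
for lower, upper, f in classes:
    if c + f >= half:
        # Median class found: apply L + (h / f) * (n/2 - c).
        L, h = lower, upper - lower
        median = L + h / f * (half - c)
        break
    c += f

print(f"n = {n}, n/2 = {half}, L = {L}, h = {h}, f = {f}, c = {c}")
print(f"median = {median:.2f} years")
```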


    Section-4

    Interpretation Of Data 

    Data interpretation is the decision-making process that follows the analysis of collected data and drives future action. It is often the final stage of decision-making strategies.

    Data interpretation relies on data gathered during the collection period to make informed decisions about how to proceed. 








