Number crunching

The very power of statistics has led some to dismiss them as damned lies. Robert Russell talks to Dr Ioannis Kyriakou about how to use them to analyse business performance.

Assessing divisional performance may be easy with limited data to look at, but finance executives comparing sizable sets of data – also known as populations – may want to consider using statistics to quickly carry out their analysis. To do this, it’s useful to have a quick reminder of some basic stats.

Measures of central tendency - mode, median and mean

A good start is finding the average of the data being looked at. When assessing a single data point against the entire population of data, we need a midpoint against which we can determine its relative performance. We call this the measure of central tendency – a single value which is representative in some way of the data as a whole. Measures of central tendency in common use include the mode, the median and the mean.
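As a rough illustration, the three measures can be worked out in a few lines of Python using its standard statistics module. The profit figures below are invented for the example and do not come from the article.

```python
from statistics import mean, median, mode

# Hypothetical monthly profits (in £000) for nine divisions.
profits = [12, 15, 15, 18, 20, 22, 15, 30, 33]

average = mean(profits)       # sum of the values divided by how many there are
middle = median(profits)      # middle value once the figures are sorted
most_common = mode(profits)   # value that appears most often

print(average, middle, most_common)  # prints: 20 18 15
```

Even in this small set the three measures disagree, which is why it matters which one is being quoted.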

Measures of spread - standard deviation and variance

Comparing a specific data point to the mean of the entire population allows you to appraise performance relative to the average, but this reference lacks detail, as the mean does not reveal the range of the data. The standard deviation quantifies the spread of data around the mean; it gives a quick idea of the spread of numbers and helps to better understand how the results fit around this mean. A standard deviation close to zero implies a low spread of results around the mean; a large standard deviation implies a wide spread of results around the mean, along with a lack of conformity in results. The standard deviation is calculated by taking the square root of the variance, which, for a full population, is the average of the squared differences of the data points from the mean.
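The calculation can be sketched step by step in Python. The eight values are made up for illustration, and the data set is treated as a full population, so the squared differences are divided by the number of values.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

# Hypothetical population of eight results.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)                                   # 5
squared_diffs = [(x - mu) ** 2 for x in data]     # squared distance of each point from the mean
variance = sum(squared_diffs) / len(data)         # population variance: average squared difference
std_dev = sqrt(variance)                          # standard deviation: square root of the variance

print(variance, std_dev)  # prints: 4.0 2.0

# The library functions for a full population should agree with the manual steps.
print(pvariance(data), pstdev(data))
```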

Dr Ioannis Kyriakou Finance & Management Magazine, July/August 2015

Distribution

Numbers in a population are said to be distributed across a spectrum of values – we call this pattern of results the distribution. This can take many forms, but we are focusing on continuous data sets. It would be unusual for any data sets that finance would be working with to fall within the definition of a “normal distribution”, but the concept creates a useful base for comparison. In a “normal” distribution the median, mode and mean are all equal and the values follow a symmetrical, bell-shaped pattern around the mean, although equal averages alone do not make a distribution normal. Another property is that approximately 68% of the population’s values within a normal distribution will be within one standard deviation of the mean; about 95% of the values will fall within two standard deviations; and about 99.7% lie within three standard deviations.
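These percentages can be checked with Python's statistics.NormalDist, which models a normal distribution. The sketch below uses a standard normal (mean 0, standard deviation 1); the function name share_within is simply a label chosen for this example.

```python
from statistics import NormalDist

# A standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal population lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The results are close to 0.68, 0.95 and 0.997, and they hold only when the data really are normally distributed.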

This sounds fine, but how does it help me when all my data sets fall outside the definition of a “normal population”? This is where Chebyshev’s theorem comes in.

Chebyshev's theorem

Chebyshev’s inequality is one of the most widely used results in statistics. It guarantees that no more than 1/x² of the population will be more than x standard deviations from the mean in any data set, ie, no more than ¼ of a data set will be outside two standard deviations of the mean, no more than 11% will be outside three standard deviations of the mean, etc. Put the other way round, at least 75% of any data set lies within two standard deviations of the mean and at least 89% lies within three. Anything found outside three standard deviations therefore belongs to a small minority, which rapidly identifies the weakest (and strongest) in a population. The beauty of this theorem is that it applies to all data sets, irrespective of their standard deviation or distribution.
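A minimal sketch of the bound in Python follows. The function names and the lopsided data set are invented for this example; the point is only that the observed share of outlying values stays at or below 1/x², however the data are distributed.

```python
from statistics import mean, pstdev


def chebyshev_max_outside(k):
    """Largest share of any data set that can lie k or more standard deviations from the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k greater than 1")
    return 1 / k ** 2


def share_outside(data, k):
    """Observed share of values lying at least k standard deviations from the mean."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation
    return sum(abs(x - mu) >= k * sigma for x in data) / len(data)


# Hypothetical, deliberately lopsided data set: nothing like a normal distribution.
skewed = [1] * 18 + [2, 50]

for k in (2, 3):
    print(k, share_outside(skewed, k), chebyshev_max_outside(k))
```

For this data set only one value in 20 (5%) lies outside two or three standard deviations, comfortably under the limits of 25% and roughly 11%.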

Applying this to, say, the profitability of businesses would enable us to produce a relative performance score enabling us to identify underperforming divisions. Businesses would be able to say with confidence that at least 89% of any population would lie inside three standard deviations of the mean. In an audit of output or performance-related pay exercise, a business would be able to better reassure itself of the range of output or to estimate the costs of a programme in advance. Excel can even work out all of these formulae – listed in the example below.
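One way such a relative performance score might be built is to express each division's profit as a number of standard deviations above or below the mean. The sketch below assumes this approach; the division names, the profit figures and the cut-off of 1.5 standard deviations are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical annual profits (in £000) by division.
division_profit = {"North": 120, "South": 135, "East": 128, "West": 131, "Central": 60}

profits = list(division_profit.values())
mu = mean(profits)
sigma = pstdev(profits)  # the divisions are treated as the full population

# Score: how many standard deviations each division sits above or below the mean.
scores = {name: (profit - mu) / sigma for name, profit in division_profit.items()}

# Illustrative cut-off only: flag divisions more than 1.5 standard deviations below the mean.
flagged = [name for name, score in scores.items() if score < -1.5]

for name, score in scores.items():
    print(f"{name}: {score:+.2f}")
print("Flagged:", flagged)
```

With so few divisions the scores are only indicative; the approach is more useful on the large populations the article has in mind.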

Example

Acorn Limited packages cement for retail sale in 50kg bags. Acorn audited all of its production during a 10-minute period from each production line and gathered the weights in a table as below.

The mean of these results is 49.965 – the Excel formula is =AVERAGE(a:z) where a:z is the data set. The median score is 50.02 – the Excel formula is =MEDIAN(a:z) – and the mode is 50.05. Like most data sets this is not a normal distribution.

All three are measures of the “average” weight of a bag of cement, and the company can choose which of them to publish. Reports to customers may want to include all three measures, but there would be nothing statistically wrong about quoting the average weight of each bag as 50.02kg (the median), even though the mean bag weight is marginally below 50kg.

We should look at the next step to discover more about the population.

The standard deviation is the square root of the average squared deviation from the mean, and this is 0.188981 for this population – the Excel formula is =STDEVP(a:z), where a:z is the data set. Please note that =STDEV(a:z) will work out the standard deviation as if the set were only a sample and the result will be different. You should use =STDEVP for full data sets.
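The difference between the two calculations can be seen in a short Python sketch. Python's pstdev divides by the number of values, as STDEVP does, while stdev divides by one fewer, as STDEV does. The five weights are hypothetical and are not Acorn's audit data.

```python
from statistics import pstdev, stdev

# Hypothetical bag weights in kg.
weights = [49.8, 50.1, 50.0, 49.9, 50.2]

population_sd = pstdev(weights)  # treats the data as the full population (divides by n)
sample_sd = stdev(weights)       # treats the data as a sample (divides by n - 1)

print(round(population_sd, 4), round(sample_sd, 4))
```

The sample figure is always the larger of the two, and the gap narrows as the number of data points grows.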

The mean of 49.965 and the standard deviation of 0.189 (to three decimal places) imply that any weight between 49.776 and 50.154 would be within one standard deviation of the mean, and those between 49.587 and 50.343 would be within two standard deviations of the mean.
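The bands are simply the mean plus or minus a multiple of the standard deviation, as this short Python sketch shows using the two figures quoted above. The function name band is a label chosen for the example.

```python
mean_weight = 49.965  # mean bag weight in kg, from the example
std_dev = 0.189       # standard deviation, rounded to three decimal places


def band(k):
    """Lower and upper limits of the range within k standard deviations of the mean."""
    lower = round(mean_weight - k * std_dev, 3)
    upper = round(mean_weight + k * std_dev, 3)
    return lower, upper


print(band(1))  # prints: (49.776, 50.154)
print(band(2))  # prints: (49.587, 50.343)
```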

What we know with certainty from Chebyshev’s theorem is that at least 75% of the bagged cement in this population will be between 49.587kg and 50.343kg a bag, ie, within two standard deviations of the mean.

The visual form can be seen in Figure 2. Please note that this example is limited for simplicity. We would normally expect any population to have thousands of data points.

Figure 1: Acorn Limited Audit

Figure 2: A series of values based on Acorn Limited Audit

Statistics refresher

MODE
This is the value seen most often in a population. For unordered qualitative (descriptive, non-numerical) data – such as the types of vehicles passing a census point – the mode is the only representative value. For continuous quantitative (numerical) data, which is what we will be using more frequently, the mode may be meaningless.

MEDIAN
Numerical data sets can be rearranged in order – normally ascending. The middle of the set is the median, and this point can represent the data set, although it is a physical mid-point in a re-ordered population.

MEAN
This is the numerical mid-point of a population and is normally referred to as the average. Although the mean and the median each define, in some sense, the centre of the data, the mean is sensitive to the magnitude of the values on either side of it, whereas the median is sensitive only to the number of values on either side of it. Unusually large or small values affect the mean more than the median.
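The effect of an unusually large value can be illustrated with a small Python sketch. The figures are hypothetical.

```python
from statistics import mean, median

# Hypothetical results, first without and then with one unusually large value.
typical = [48, 50, 51, 52, 49]
with_outlier = typical + [500]

print(mean(typical), median(typical))            # prints: 50 50
print(mean(with_outlier), median(with_outlier))  # prints: 125 50.5
```

A single extreme value more than doubles the mean, while the median barely moves.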

About the author

Dr Ioannis Kyriakou is a senior lecturer in actuarial science at Cass Business School, City University

Finance & Management Magazine, Issue 234, July/August 2015