Windows 7

Microsoft Excel 2010 : Calculating the Mode (part 1)

2/6/2014 3:14:30 AM

The mean gives you a measure of central tendency by taking all the actual values in a group into account. The median measures central tendency differently, by giving you the midpoint of a ranked group of values. The mode takes yet another tack: It tells you which one of several categories occurs most frequently.

You can get this information from the FREQUENCY() function. But the MODE() function returns the most frequently occurring observation only, and it’s a little quicker to use. Furthermore, as you’ll see in this section, a little work can get MODE() to work with data on a nominal scale—that’s also possible with FREQUENCY() but it’s a lot more work.

Suppose you have a set of numbers in a range of cells, as shown in Figure 1. The following formula returns the numeric value that occurs most frequently in that range (in Figure1, the formula is entered in cell C1):

=MODE(A2:A21)

Figure 1. Excel’s MODE() function works only with numeric values.

The pivot chart in Figure 1 provides the same information graphically. Notice that the mode returned by the function in cell C1 is the same value as the most frequently occurring value shown in the pivot chart.

The problem is that you don’t usually care about the mode of numeric values. It’s possible that you have at hand a list of the ages of the people who live on your block, or the weight of each player on your favorite football team, or the height of each student in your daughter’s fourth grade class. It’s even conceivable that you have a good reason to know the most frequently occurring age, weight, or height in a group of people.

Among other purposes, numeric measures are good for recording small distinctions: Joe is 33 years old and Jane is 34; Dave weighs 230 pounds and Don weighs 232; Jake is 47 inches tall and Judy stands 48 inches. In a group of 18 or 20 people, it’s quite possible that everyone is of a different age, or a different weight or a different height. The same is true of most objects and numeric measurements that you can think of.

In that case, it is not plausible that you would want to know the modal age, or weight, or height. The mean, yes, or the median, but why would you want to know that the most frequently occurring age is 47 years, when the next most frequently occurring age is 46 and the next is 48?

The mode is seldom a useful statistic when the variable being studied is numeric and ungrouped. It’s when you are interested in nominal data, categories such as brands of cars or children’s given names or political preferences—that the mode is of interest. It’s worth noting that the mode is the only sensible measure of central tendency when you’re dealing with nominal data. The modal boy’s name for newborns in 2009 was Jacob; that statistic is interesting to some people in some way. But what’s the mean of Jacob, Michael, and Ethan? The median of Emma, Isabella, and Emily? The mode is the only sensible measure of central tendency for nominal data.

But Excel’s MODE() function doesn’t work with nominal data. If you present to it, as its argument, a range that contains exclusively text data such as names, MODE() returns the #N/A error value. If one or more text values are included in a list of numeric values, MODE() simply ignores the text values.

I’ll take this opportunity to complain that it doesn’t make a lot of sense for Excel to provide analytic support for a situation that seldom occurs (for example, caring about the modal height of a group of fourth graders) while it fails to support situations that occur all the time (“Which model of car did we sell most of last week?”).

Figure 2 shows a couple of solutions to the problem with MODE().

Figure 2. MODE() is much more useful with categories than with interval or ordinal scales of measurement.

In contrast to the pivot chart shown in Figure 1, where just one value pokes up above the others because it occurs twice instead of once, the frequency distribution in Figure 2 is more informative. You can see that Ford, the modal value, leads Toyota by a slim margin and GM by somewhat more. (This report is genuine and was exported to Excel by a used car dealer from a popular small business accounting package.)

To create a pivot chart that looks like the one in Figure 2, follow these steps:

1.
Arrange your raw data in an Excel list format: the field name in the first column (such as A1) and the values in the cells below the field name (such as A2:A21).

2.
Select a cell that has several empty columns to its right and several empty rows below it. This is to avoid overwriting any important data with the pivot table.

3.
Click the Ribbon’s Insert tab, and click the PivotTable drop-down in the Tables group. Choose PivotChart from the drop-down list. The dialog box shown in Figure 3 appears.

Figure 3. In this dialog box you indicate where Excel can find the underlying data set for the analysis, and where you want to pivot table to start.

4.
Identify the range that contains your raw data (refer to step 1) by dragging through it with your mouse pointer, by typing its range address, or by typing its name if it’s a named table or range. The location of the data should now appear in the Table/Range edit box. Click OK to get the layout shown in Figure 4.

Figure 4. The PivotTable Field List pane appears automatically.

5.
In the PivotTable Field List, drag the field or fields you’re interested in down from the list and into the appropriate area at the bottom. In this example, you would drag Make down into the Axis Fields area and also drag it into the Values area.

The pivot chart and the pivot table that the pivot chart is based on both update as soon as you’ve dropped a field into an area. If you started with the data shown in Figure 2, you should get a pivot chart that’s identical, or nearly so, to the pivot chart in that figure.

Note

Excel makes one of two assumptions, depending on whether the cell that’s active when you begin to create the pivot table contains data.

One, if you started by selecting an empty cell, Excel assumes that’s where you want to put the pivot table’s upper-left corner. Excel puts the active cell’s address in the Location edit box, as shown in Figure 3.

Two, if you started by selecting a cell that contains a value or formula, Excel assumes that cell is part of the source data for the pivot table or pivot chart. Excel finds the boundaries of the contiguous, filled cells and puts the resulting address in the Table/Range edit box.

A few comments on this analysis:

• The mode is quite a useful statistic when it’s applied to categories: political parties, consumer brands, days of the week, states in a region, and so on. Excel really should have a built-in worksheet function that returns the mode for text values. But it doesn’t, and the next section shows you how to write your own worksheet formula for the mode, one that will work for both numeric and text values.

• When you have just a few distinct categories, consider building a pivot chart to show how many instances there are of each. A pivot chart that shows the number of instances of each category is an appealing way to present your data to an audience. (There is no type of chart that communicates well when there are many categories to consider. The visual clutter obscures the message. In that sort of situation, consider combining categories or omitting some.)

• Standard Excel charts do not show the number of instances per category without some preliminary work. You would have to get a count of each category before creating the chart, and that’s the purpose of the pivot table that underlies the pivot chart. The pivot chart, based on the pivot table, is simply a faster way to complete the analysis than creating your own table to count category membership and then basing a standard Excel chart on that table.

• The mode is the only sensible measure of central tendency when you’re working with nominal data such as category names. The median requires that you rank order things in some way: shortest to tallest, least expensive to priciest, or slowest to fastest.