Practical Data Analysis with JMP, Third Edition. Robert Carver
At this point, you might wisely pause and say, “Wait a second. Can there possibly be 432 countries in the East Asia and the Pacific region?” And you would be right. Remember that we have stacked data, with 12 rows representing 12 years of data devoted to each country. Therefore, there are 432/12 = 36 countries in the region.
Even though JMP handles the heavy computational or graphical tasks, always think about the data and its context and ask yourself if the results make sense to you.
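The same sanity check can be sketched outside JMP. The short Python example below is only an illustration: the country names and the year index are placeholders, not values from the Life Expectancy table. It builds a small stacked table with one row per country per year and shows why the row count overstates the number of countries.

```python
# Hypothetical stacked table: one row per (country, year) pair.
# The 36 country names and the 12 year indexes are placeholders.
countries = [f"Country {i}" for i in range(1, 37)]  # 36 placeholder names
year_indexes = range(1, 13)                         # 12 placeholder years

rows = [(country, year) for country in countries for year in year_indexes]

n_rows = len(rows)                                   # rows in the stacked table
n_countries = len({country for country, _ in rows})  # distinct countries
rows_per_country = n_rows // n_countries

print(n_rows)            # 432
print(n_countries)       # 36
print(rows_per_country)  # 12
```

Counting distinct values of the identifying column, rather than counting rows, is the step that corresponds to dividing 432 by 12.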
Using the Data Filter to Temporarily Narrow the Focus
Because we know each country appears repeatedly in this data table, let’s choose just one year’s data to obtain a clearer picture of regional variation. We can specify which rows to display in a graph by using the Data Filter. This tool lets us select rows that satisfy specific conditions, such as displaying only the rows from the year 2010.
This chapter illustrates the use of the Data Filter to temporarily select rows in a data table for all active analyses. This is known as the global Data Filter. Alternatively, when you click the red triangles in most analysis reports, you will find a Script option with a local Data Filter that applies only to the current report. The local Data Filter is illustrated in later chapters, but curious readers should explore it at any time.
1. To see the effects of the Data Filter, we will instruct JMP to automatically update the graph and recalculate the frequencies. Click the red triangle next to Distributions and choose Redo ► Automatic Recalc.
2. Select Rows ► Data Filter. In the list of Columns, select year and click the Add button.
3. The dialog box takes on a new appearance (Figure 3.3). It now displays a list of years contained in the table. Near the top of the dialog box, check Show and Include so that only the rows that we select will appear in all graphs and be included in any computations. Other rows will be hidden and excluded.
Figure 3.3: Choosing 2010 in the Data Filter
4. Scroll down the list of year levels and highlight 2015. As noted in the dialog box, this selects 215 rows and temporarily suppresses the others.
5. Minimize the Data Filter. If you look in the data table of Life Expectancy, you will see that most rows now have two icons, marking them as hidden and excluded.
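The logic behind the Data Filter steps above can be sketched in a few lines of Python. This is not how JMP implements the filter; it is a minimal illustration, using made-up rows, of what it means to keep only one year's rows and then recompute the frequencies. The function name filter_rows is hypothetical.

```python
from collections import Counter

# Made-up rows of (country, region, year); not taken from the Life Expectancy table.
rows = [
    ("A", "South Asia", 2010),
    ("A", "South Asia", 2015),
    ("B", "South Asia", 2010),
    ("B", "South Asia", 2015),
    ("C", "North America", 2010),
    ("C", "North America", 2015),
]

def filter_rows(rows, year):
    """Keep only the rows for one year; every other row is left out of the analysis."""
    return [row for row in rows if row[2] == year]

# Frequencies by region before and after filtering.
all_counts = Counter(region for _, region, _ in rows)
filtered_counts = Counter(region for _, region, _ in filter_rows(rows, 2015))

print(all_counts["South Asia"], all_counts["North America"])            # 4 2
print(filtered_counts["South Asia"], filtered_counts["North America"])  # 2 1
```

Before filtering, each country is counted once per year. After filtering, each country is counted once, which is the picture of regional variation we want.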
Using Graph Builder to Explore Categorical Data Visually
In Chapter 1, we met the Graph Builder, and we will use it throughout this book. It is most useful when working with multiple variables, but even with a single nominal variable, it provides a quick way to generate multiple views of the same data. Because interactivity is such an important feature of the tool, this section of the chapter provides few step-by-step directions. You should interact with the tool and think about the extent to which different graphing formats and options communicate the information content of the variable called region.
1. Select Graph ► Graph Builder. The region column identifies groups of countries. Drag it to the X drop zone.
Within Graph Builder, you can freely reposition a column from one drop zone to another. Hover the cursor over the column name until the cursor changes to the hand shape, and then drag the column to the new drop zone.
With Region on the X axis, you will see seven clumps of black points above the seven region names. This is not very informative.
At the top of Graph Builder is a selector bar of icons (see Figure 3.4) representing different graph types. The graphing options available depend on the type of data we have placed on the graph. Hence, some icons are dimmed, but with Region on the X axis, we can opt for any of the highlighted options.
Figure 3.4: Graphing Options for a Nominal Column
2. Spend some time using different graphing formats. Which ones do you think do the best job of clearly and fully summarizing the number of countries within each region?
3. For this example, let’s use a bar chart (seventh option from the left). There is considerable research demonstrating that most people find this simple graph type easy to interpret accurately. Then click Done.
It is always good practice to help a reader by giving a graph an informative descriptive title. The default title “region,” though accurate, is not very helpful. In JMP, it is easy to alter the titles of graphs and other results.
4. Move your cursor to the title region just above the graph and double-click. You can now customize the title of this chart to make it more informative. Type Observations per Region, replacing “region” as the title.
5. We have done a bit of work on our project. Let’s save it now as Chap_03.
With most categorical data, JMP automatically reports values in alphabetical sequence (East Asia & Pacific, Europe & Central Asia, and so on). We can revise the order of values to suit our purposes as well. Suppose that we want to list the regions approximately from West to East, North to South. In that case, we might prefer a sequence as follows:
North America
Latin America & Caribbean
Europe & Central Asia
Middle East & North Africa
Sub-Saharan Africa
South Asia
East Asia & Pacific
To change the default sequence of categorical values (whether nominal or ordinal), we return to the Life Expectancy data table.
6. Select region from the data grid or the columns panel, right-click, and select Column Info.
7. Click Column Properties and select Value Order.
8. Select a value name and use the Move Up and Move Down buttons to revise the value order to match what we have chosen. Then, click OK.
Now return to Graph Builder and look at the bar chart. You will see that customizing the value order within the data table re-orders the X axis. The effect should speak for itself.
9. Experiment with the other charting options by clicking the red triangle and choosing Show Control Panel and then selecting various graph types.
With categorical data, your choices are limited. Still, it’s worth a few minutes to become familiar with them. When you are through exploring, restore the graphic to a bar chart and leave it open. We will return to this graph in a few pages.
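The difference between the default alphabetical sequence and a custom value order, as set in steps 6 through 8, can be sketched in Python. This is only an illustration of the idea, not JMP's mechanism: it ranks each region by its position in the preferred list given earlier and sorts by that rank. The names preferred_order and rank are hypothetical.

```python
# The seven regions in the West-to-East, North-to-South order chosen in the text.
preferred_order = [
    "North America",
    "Latin America & Caribbean",
    "Europe & Central Asia",
    "Middle East & North Africa",
    "Sub-Saharan Africa",
    "South Asia",
    "East Asia & Pacific",
]

# Default behavior: plain alphabetical sorting of the category labels.
alphabetical = sorted(preferred_order)

# Custom behavior: sort by each label's position in the preferred list.
rank = {name: position for position, name in enumerate(preferred_order)}
custom = sorted(alphabetical, key=lambda name: rank[name])

print(alphabetical[0])  # East Asia & Pacific
print(custom[0])        # North America
```

The category labels themselves never change. Only the rule that decides their sequence on the axis does.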
Distribution of a Quantitative Variable
The standard graphing choices expand considerably when we have quantitative data—particularly for continuous variables or discrete variables with many possible values. We will want to summarize a large collection of values in a way that shows where observations tend to cluster.
As a way of visualizing the distribution of a continuous variable, the most commonly used graph is a histogram. A histogram is basically a bar chart with values of the variable on one axis and frequency on the other. Let’s illustrate.
In our data set, we have estimated life expectancy at birth for each country for 12 different years. We just used the Data Filter to isolate the data for 2015, so let’s continue to explore the state of the world in 2015.
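To make the idea of a histogram concrete, here is a minimal Python sketch of the counting that lies behind one. The values are made up for illustration and are not drawn from the Life Expectancy table, and the bin start, bin width, and number of bins are arbitrary assumptions. The function name bin_counts is hypothetical.

```python
# Made-up continuous values; not taken from the Life Expectancy table.
values = [52.1, 58.4, 61.0, 66.3, 68.9, 70.2, 71.5,
          73.8, 74.1, 76.6, 79.0, 81.2, 82.7]

def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each equal-width bin.

    Bin i covers the interval [start + i*width, start + (i+1)*width).
    Values outside all bins are ignored.
    """
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

counts = bin_counts(values, start=50, width=10, n_bins=4)

# Print a text version of the histogram: one bar of asterisks per bin.
for i, count in enumerate(counts):
    low = 50 + 10 * i
    print(f"{low}-{low + 10}: {'*' * count}")
```

Each bar's length is the frequency for that interval, so the longest bar shows where the observations cluster. With these made-up values, that is the bin from 70 to 80.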