Using Stata for Quantitative Analysis. Kyle C. Longest

FIGURE 1.7 • ORIGINAL DATA

Appending Data

As an example of the first combination scenario, consider that the current 10 cases were the first set of respondents who completed your survey. You may have begun your analyses assuming that these cases were the only ones who had decided to respond to the survey. But, as is often the case, after a few more weeks you notice that 10 new respondents submitted their survey information. You would of course want to include these tardy completers in your final analyses.

In this scenario, you would have a second data set of these 10 new cases, and it would look like that presented in Figure 1.8.

FIGURE 1.8 • NEW OBSERVATIONS WITH SIMILAR VARIABLES

As you can see, these new data contain exactly the same variables as the original data, but the set has 10 new respondents, as indicated by their 10 unique ids values. The type of data combination you would want to conduct in this situation is referred to as “appending” in Stata because you are adding new cases to an existing data set. The command to complete this data combination is therefore –append-. For the specifics of this command, see the Stata Help Files section of Chapter 8, which explains how to use the help file to teach yourself the full details of the –append- process. To help you see the end product, Figure 1.9 displays what the data would look like if you used the –append- command to join the two example data sets.

FIGURE 1.9 • NEW APPENDED DATA SET

You can see that the new data set now contains 20 cases, which are the original 10 respondents plus the 10 more recent survey completers, and each has information on all of the same variables.
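As a rough illustration of the append step, the sketch below builds two tiny stand-in data sets and stacks them. The values, the reduced variable list, and the temporary file names (original, newcases) are invented for this example and are not the book's data; only the ids variable name and the –append- command come from the text.

```stata
* Stand-in for the original respondents (invented values)
clear
input ids age gender
1 16 0
2 17 1
3 15 1
end
tempfile original
save `original'

* Stand-in for the late respondents: same variables, new ids values
clear
input ids age gender
11 16 0
12 18 1
end
tempfile newcases
save `newcases'

* Load the original data, then stack the new cases underneath
use `original', clear
append using `newcases'

* Check the number of cases and look at the combined data
count
list, clean
```

Run as a do-file, this should leave five cases in memory: the three original rows followed by the two new ones, each with the same three variables.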

Merging Data

There are two typical variants of the second combination scenario. The first occurs when you conduct a follow-up survey on the same set of cases, such as in a pre-test–post-test design. Here you would want to add the new variables (i.e., post-test responses) to the initial data set from the pre-test. In this situation, your new data would look like those presented in Figure 1.10.

FIGURE 1.10 • NEW VARIABLES FOR ORIGINAL RESPONDENTS

If you compare these new data to the original, you will notice that these are the same 10 cases, as shown by their identical ids values. None of their gender identifications have changed, but all of their ages have increased by a year (as if this follow-up survey were conducted 1 year after the initial survey), and several of their employment status and religion responses have changed. In this case, you would want to attach this new information to the original data so that you could examine, for example, why some respondents shift employment categories or religions.

A second variant that involves the same data combination process would be where you would like to include new variables for existing cases that correspond to some other information about these cases. For example, perhaps you have a set of survey responses from adults who recently visited a hospital. You may want to bring in new variables that involve information about the particular hospital each case visited. Or following the example we have been using, you may want to bring in information about the religious denomination with which they affiliate. In this situation, your new data might look like those shown in Figure 1.11.

FIGURE 1.11 • DENOMINATION SPECIFIC DATA

In these data, you will notice that the information pertains to the particular religion, not the respondents. The variables therefore hold information such as how many total Baptists there are, or whether Mormonism would be considered an evangelical denomination. Of course, in a real situation, you could have a great deal more information about each denomination that may be useful in analyzing your survey data. Notice that these new data do not include every denomination that is present in your original data. This situation can occur with this type of combination and will not cause a problem for Stata.

Both of these situations are referred to as “merging” in Stata because you are bringing in new information about the existing cases. As you may have guessed, then, the command to complete the combination is –merge-. One key difference between the two types of merges is what exactly you are merging on. Understanding this difference is the key to completing the merge correctly. In the first merge example, you would be adding new information about the cases, which means you would merge on the ids variable. It is the ids variable that links the original data to the new data. The second situation, however, would require that you merge on the religoth variable because it is the link between the two data sets. You may have realized that doing the latter means that several cases in your combined data will have the exact same values for the new denomination-based variables. That is, every respondent who identifies as Baptist will receive the exact same value for the totalmembers and evangelical variables. This commonality is exactly what you are looking for when you incorporate this type of information.

Once you have identified the variable on which you will merge the two data sets (i.e., the variable that allows you to link the two data sets), the –merge- command is relatively straightforward. Again, following along with the Stata Help Files section of Chapter 8 will help you understand exactly how to complete this combination for your particular needs. It may also be helpful here to see what the final product looks like, to get a better sense of exactly what the –merge- command does and whether it may be what you need. Figures 1.12 and 1.13 display the final data after completing a merge first with the post-test data shown in Figure 1.10 and then completing a different merge with the denomination data from Figure 1.11.

FIGURE 1.12 • NEW MERGED DATA WITH NEW OBSERVATIONS

FIGURE 1.13 • NEW MERGED DATA WITH DENOMINATION INFORMATION

In the first merge (Figure 1.12), you can see that the final data set still contains the original 10 cases, but now the information from their follow-up survey is connected to their original responses. Again, some information (i.e., gender) has remained constant, whereas other data have changed as the respondents' lives have presumably changed.
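A merge on ids with follow-up data might be sketched as follows. The values, the variable name employ, the _w2 suffix, and the temporary file names are all hypothetical, and the 1:1 syntax assumes Stata 11 or later. The sketch renames the follow-up variables before merging because, by default, –merge- keeps the values already in memory when both files contain a variable with the same name.

```stata
* Stand-in for the pre-test data (invented values)
clear
input ids age employ
1 16 1
2 17 2
3 15 1
end
tempfile pretest
save `pretest'

* Stand-in for the follow-up data on the same cases
clear
input ids age employ
1 17 2
2 18 2
3 16 3
end
* Rename so the follow-up answers do not collide with the originals
rename age age_w2
rename employ employ_w2
tempfile posttest
save `posttest'

* Each ids value appears once in each file, so this is a 1:1 merge
use `pretest', clear
merge 1:1 ids using `posttest'

* _merge records whether each case was found in one file or both
tabulate _merge
list, clean
```

Here every case should be matched (a _merge value of 3), leaving three rows that carry both the original and the follow-up responses side by side.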

In the second merge (Figure 1.13), the same original cases are present, but information that pertains to their response on the religoth variable is now included. Because the data set with information about each denomination did not include some of the particular denominations that the respondents reported, several cases now have missing information on these new variables. But the new information that is provided may be helpful in analyzing why belonging to specific denominations may be related to particular behaviors or trajectories.
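The denomination merge differs in that many respondents link to a single denomination row. A minimal sketch, assuming religoth is stored as a string and using invented denominations and values (only the variable names ids, religoth, totalmembers, and evangelical come from the text; the m:1 syntax assumes Stata 11 or later):

```stata
* Stand-in for the denomination-level data: one row per denomination
clear
input str12 religoth totalmembers evangelical
"Baptist" 1000 1
"Mormon"   500 0
end
tempfile denoms
save `denoms'

* Stand-in for the respondent data: cases can share a denomination
clear
input ids str12 religoth
1 "Baptist"
2 "Baptist"
3 "Mormon"
4 "Quaker"
end

* Many respondents map to one denomination row, so this is an m:1 merge
merge m:1 religoth using `denoms'

* Cases whose denomination is absent from the using file stay unmatched
tabulate _merge
list, clean
```

In this sketch, both Baptist respondents should receive identical totalmembers and evangelical values, while the respondent whose denomination is missing from the denomination file is kept with missing values on the new variables.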

Types of Variables in Data Files

At this point, you should feel comfortable with the basic structure of data files. Each row holds the information for one case and each column is a different variable. With this knowledge, you are almost ready to start analyzing your data. There is, however, one distinction in the types of variables included in data that is important to understand.

To help illustrate this difference, consider the NSYR variable gender in the Chapter 1 Data.dta file. This variable came from the following question asked of all respondents:

Are
