Fundamentals of Programming in SAS. James Blum
HHIncome;
run;
Output 1.8.1: Expected Result from Program 1.8.1 (Colors and Fonts May Differ)
Analysis Variable : HHINCOME Total household income
MortgageStatus | N Obs | N | Mean | Std Dev | Minimum | Maximum |
N/A | 303342 | 303342 | 37180.59 | 39475.13 | -19998.00 | 1070000.00 |
No, owned free and clear | 300349 | 300349 | 53569.08 | 63690.40 | -22298.00 | 1739770.00 |
Yes, contract to purchase | 9756 | 9756 | 51068.50 | 46069.11 | -7599.00 | 834000.00 |
Yes, mortgaged/ deed of trust or similar debt | 545615 | 545615 | 84203.70 | 72997.92 | -29997.00 | 1407000.00 |
Case Study
For additional practice, multiple case studies are available in addition to the IPUMS CPS case study used in subsequent chapters. See Section 8.1 to apply the skills from this chapter to the Clinical Trials Case Study. For additional case studies, including extensions to the IPUMS CPS case study, see the author pages.
Chapter 2: Foundations for Analyzing Data and Reading Data from Other Sources
2.3 Getting Started with Data Exploration in SAS
2.3.1 Assigning Labels and Using SAS Formats
2.3.2 PROC SORT and BY-Group Processing
2.4 Using the MEANS Procedure for Quantitative Summaries
2.4.1 Choosing Analysis Variables and Statistics in PROC MEANS
2.4.2 Using the CLASS Statement in PROC MEANS
2.5.2 Permanent Storage and Inspection of Defined Formats
2.6 Subsetting with the WHERE Statement
2.7 Using the FREQ Procedure for Categorical Summaries
2.7.1 Choosing Analysis Variables in PROC FREQ
2.7.2 Multi-Way Tables in PROC FREQ
2.8.1 Introduction to Reading Delimited Files
2.8.3 Introduction to Reading Fixed-Position Data
2.9 Details of the DATA Step Process
2.9.1 Introduction to the Compilation and Execution Phases
2.9.2 Building Blocks of a Data Set: Input Buffers and Program Data Vectors
2.11 Wrap-Up Activity
2.12 Chapter Notes
2.13 Exercises
2.1 Learning Objectives
At the conclusion of this chapter, mastery of the concepts covered in the narrative includes the ability to:
Apply the MEANS procedure to produce a variety of quantitative summaries, potentially grouped across several categories
Apply the FREQ procedure to produce frequency and relative frequency tables, including cross-tabulations
Categorize data for analyses in either the MEANS or FREQ procedures using internal SAS formats or user-defined formats
Formulate a strategy for selecting only the necessary rows when processing a SAS data set
Apply the DATA step to read data from delimited or fixed-position raw text files
Describe the operations carried out during the compilation and execution phases of the DATA step
Compare and contrast the input buffer and program data vector
Apply DATA step statements to assist in debugging
Apply the COMPARE procedure to compare and validate a data set against a standard
Use the concepts of this chapter to solve the problems in the wrap-up activity. Additional exercises and case studies are also available to test these concepts.
2.2 Case Study Activity
This section introduces a case study that is used as a basis for most of the concepts and associated activities in this book. The data comes from the Current Population Survey by the Integrated Public Use Microdata Series (IPUMS CPS). IPUMS CPS contains a wide variety of information, but only a subset of the data collected from 2001 through 2015 is included in the examples here. Further, the data is introduced in segments, starting with simple sets of variables and eventually adding more information that must be assembled to achieve the objectives of each section.
This chapter works with data that includes household-level information from the 2005 and 2010 IPUMS CPS data sets of over one million observations each. Included are variables on state, county, metropolitan area/city, household income, home value, mortgage status, ownership status, and mortgage payment. Outputs 2.2.1 through 2.2.4 show tabular summaries from the 2010 data, including quantitative statistics, frequencies, and/or percentages. Reproducing these tables in the wrap-up activity in Section 2.11 is the primary objective for this chapter.
The first sample output, Output 2.2.1, presents a set of six statistics on mortgage payments across metropolitan status for mortgages of $100 per month or more. To make this table, and the slightly more complicated Output 2.2.2, several components of the MEANS procedure must be understood.
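As a rough orientation before those components are covered in detail, a summary of this kind might be requested along the following lines. This is only a sketch under stated assumptions: the data set name work.ipums2010 is hypothetical, the variable names MortgagePayment and Metro are taken from the headings of Output 2.2.1, and the statistic list assumes that minimum and maximum complete the six statistics. The NONOBS option is included on the guess that the table omits the N Obs column.

```sas
/* Sketch only: work.ipums2010 is a hypothetical data set name, and the
   last two statistics (min, max) are an assumption about Output 2.2.1. */
proc means data=work.ipums2010 nonobs n mean median std min max;
  where MortgagePayment ge 100;  /* keep mortgages of $100 per month or more */
  class Metro;                   /* one summary row per metropolitan status */
  var MortgagePayment;           /* the analysis variable being summarized */
run;
```

Each statement maps to one part of the description above: WHERE selects the rows, CLASS defines the groups, VAR names the analysis variable, and the keywords on the PROC MEANS statement choose the statistics.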
Output 2.2.1: Basic Statistics on Mortgage Payments Grouped on Metropolitan Status
Analysis Variable : MortgagePayment Mortgage Payment
Metro | N | Mean | Median | Std |