Fundamentals of Programming in SAS. James Blum
from Input Data 2.8.4—the City and MortgageStatus variables are truncated. This truncation occurs due to the default length of 8 assigned to character variables; therefore, SAS did not allocate enough memory to store the values in their entirety. Only the first five records are shown; however, further investigation reveals this truncation occurs for the variable State as well.
Output 2.8.4: Reading the 2005 Basic IPUMS CPS Data (Partial Listing).
Obs | Serial | State | City | CityPop | Metro | CountyFIPS | Ownership |
1 | 2 | Alabama | Not in i | 0 | 4 | 73 | Rented |
2 | 3 | Alabama | Not in i | 0 | 1 | 0 | Rented |
3 | 4 | Alabama | Not in i | 0 | 4 | 73 | Owned |
4 | 5 | Alabama | Not in i | 0 | 1 | 0 | Rented |
5 | 6 | Alabama | Not in i | 0 | 3 | 97 | Owned |
Obs | MortgageStatus | MortgagePayment | HHIncome | HomeValue |
1 | N/A | 0 | 12000 | 9999999 |
2 | N/A | 0 | 17800 | 9999999 |
3 | Yes, mor | 900 | 185000 | 137500 |
4 | N/A | 0 | 2000 | 9999999 |
5 | No, owne | 0 | 72600 | 95000 |
Program 2.8.4 uses the DSD option in the INFILE statement to change three default behaviors:
1. Change the default delimiter from a blank to a comma
2. Treat two consecutive delimiters as a missing value
3. Treat delimiters inside quoted strings as part of a character value and strip off the quotation marks
For Input Data 2.8.4, the first and third actions are necessary to successfully match the structure of the delimiters in the data since (a) the file uses commas as delimiters and (b) commas are included in the quoted strings in the data for the MortgageStatus variable. Because the file does not contain consecutive delimiters, the second modification has no effect.
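The three DSD behaviors can be seen in a small, self-contained sketch. The data set name, variable names, and in-stream records below are hypothetical and are not taken from the IPUMS file. The character values are deliberately kept to eight characters or fewer so that the default length does not come into play.

```sas
/* Hypothetical in-stream data illustrating the three effects of DSD */
data work.DsdDemo;
  infile datalines dsd;          /* comma-delimited, DSD rules apply      */
  input Id Status $ Payment;
  datalines;
1,"Yes, own",900
2,,0
;
run;

proc print data = work.DsdDemo;
run;
```

In the first record the comma inside the quoted string is kept as part of Status and the quotation marks are removed. In the second record the two consecutive commas produce a missing value for Status rather than being treated as a single delimiter.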
Of course, it might be necessary to produce the second and third effects while using blanks—or any other character—as the delimiter. It is also often necessary to change the delimiter without making the other modifications included with the DSD option. In those cases, use the DLM= option to specify one or more delimiters by placing them in a single set of quotation marks, as shown in the following examples.
1. DLM = '/' causes SAS to move to a new field when it encounters a forward slash
2. DLM = ',' causes SAS to move to a new field when it encounters a comma
3. DLM = ',/' causes SAS to move to a new field when it encounters either a comma or forward slash
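As a minimal sketch of the third case, the following DATA step reads hypothetical in-stream records in which either a comma or a forward slash separates the fields. The data set name, variable names, and values are assumptions made for illustration only.

```sas
/* Hypothetical records: either ',' or '/' ends a field */
data work.DlmDemo;
  infile datalines dlm = ',/';
  input Id Name $ Score;
  datalines;
1,Ann/90
2/Bob,85
;
run;

proc print data = work.DlmDemo;
run;
```

Because DSD is not in effect here, consecutive delimiters would still be treated as a single delimiter and quotation marks would not be stripped; only the set of delimiter characters has changed.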
Introduction to Variable Attributes
In SAS, the amount of memory allocated to a variable is called the variable’s length; length is one of several attributes that each variable possesses. Other attributes include the name of the variable, its position in the data set (1st column, 2nd column, ...), and its type (character or numeric). As with all the variable attributes, the length is set either by use of a default value or by explicitly setting a value.
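One way to inspect these attributes is with the CONTENTS procedure. The sketch below assumes the Sashelp.Class sample data set is available, as it is in standard SAS installations; any other data set name could be substituted.

```sas
/* List variable attributes; VARNUM orders the list by position in the data set */
proc contents data = sashelp.class varnum;
run;
```

The resulting table includes each variable's position, name, type, and length.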
By default, both numeric and character variables have a length of eight bytes. For character variables, one byte of memory can hold one character in the English language. Thus, the DATA step truncates several values of State, City, and MortgageStatus from Input Data 2.8.4 since they exceed the default length of eight bytes. For numeric variables, the default length of eight bytes is sufficient to store up to 16 decimal digits (commonly known as double-precision). When using the Microsoft Windows® operating system, numeric variables have a minimum allowable length of three bytes and a maximum length of eight bytes. Character variables may have a minimum length of one byte and a maximum length of 32,767 bytes.
While there are many options and statements that affect the length of a variable implicitly, the LENGTH statement allows for explicit declaration of the length and type attributes for any variables. Program 2.8.5 demonstrates the usage of the LENGTH statement.
Program 2.8.5: Using the LENGTH Statement
data work.Ipums2005Basic;
  length state $ 20 City$ 25 MortgageStatus$50;
  infile RawData("IPUMS2005basic.csv") dsd;
  input Serial State City CityPop Metro
        CountyFIPS Ownership $ MortgageStatus$
        MortgagePayment HHIncome HomeValue;
run;
proc print data = work.Ipums2005Basic(obs = 5);
run;
The LENGTH statement sets the lengths of State, City, and MortgageStatus to 20, 25, and 50 characters, respectively, with the dollar sign indicating these are character variables. Separating the dollar sign from the variable name or length value is optional, though good programming practices dictate using a consistent style to improve readability.
Type (character or numeric) is an attribute that cannot be changed in the DATA step once it has been established. Because the LENGTH statement sets these variables as character, the dollar sign is optional in the INPUT statement. However, good programming practices generally dictate including it for readability and so that removal of the LENGTH statement does not lead to a data type mismatch. (This would be an execution-time error.)
As in the LENGTH statement, the spacing between the dollar sign and variable name is optional in the INPUT statement as well. Good programming practices still dictate selecting a consistent spacing style.
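The same behavior can be reproduced without the external file. The following sketch uses hypothetical in-stream records and a hypothetical data set name. City is declared in a LENGTH statement and therefore needs no dollar sign in the INPUT statement, while Status is left at the default length.

```sas
/* Hypothetical records; only City is given an explicit length */
data work.LengthDemo;
  length City $ 25;              /* sets type (character) and length      */
  infile datalines dsd;
  input Id City Status $;        /* Status keeps the default length of 8  */
  datalines;
1,Not in identifiable city,"Yes, mortgaged"
2,Birmingham,N/A
;
run;

proc print data = work.LengthDemo;
run;
```

In the printed listing, City is stored in full, whereas the Status value in the first record is truncated to its first eight characters.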
Output 2.8.5 shows the results of explicitly setting the length of the State, City, and MortgageStatus variables. In addition to the lengths of these three variables changing, their column position in the SAS data set has changed as well. Variables are added to the data set based on the order they are encountered during compilation of the DATA step, so since the LENGTH statement precedes the INPUT statement, it has actually changed two attributes—length and position—for these three variables (while also defining the type attribute as character).
Output 2.8.5: Using the LENGTH Statement (Partial Listing)
Obs | state | City | MortgageStatus | Serial | CityPop | Metro | CountyFIPS |
1 | Alabama | Not in identifiable city | N/A | 2 | 0 | 4 | 73 |
2 | Alabama | Not in identifiable city | N/A | 3 | 0 | 1 | 0 |
3 | Alabama | Not in identifiable city | Yes, mortgaged/ deed of trust or similar debt | 4 | 0 | 4 | 73 |
4 | Alabama | Not in identifiable city | N/A | 5 | 0 | 1 | 0 |
5 | Alabama | Not in identifiable city | No, owned free and clear | 6 | 0 | 3 | 97 |
Obs | Ownership | MortgagePayment | HHIncome | HomeValue |
1 | Rented | 0 | 12000 | 9999999 |
2 | Rented | 0 | 17800 | 9999999 |
3 | Owned | 900 | 185000 | 137500 |
4 | Rented | 0 | 2000 | 9999999 |
5 | Owned | 0 | 72600 | 95000 |
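If the change in column order is unwanted, one possible approach is to name every variable in the LENGTH statement in the desired order, since position is determined by the first appearance of each variable during compilation. The sketch below is an illustration under that assumption and uses a hypothetical data set name and a reduced set of variables with in-stream records.

```sas
/* Declaring all variables in file order keeps the original positions */
data work.OrderDemo;
  length Serial 8 State $ 20 City $ 25 CityPop 8;
  infile datalines dsd;
  input Serial State $ City $ CityPop;
  datalines;
2,Alabama,Not in identifiable city,0
3,Alabama,Not in identifiable city,0
;
run;

/* VARNUM lists the variables by position so the order can be checked */
proc contents data = work.OrderDemo varnum;
run;
```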
Like the type attribute, SAS does not allow the position and length attributes to change after their initial values are set. Attempting to change the length attribute after the INPUT statement, as shown in Program 2.8.6, results in a warning in the Log.
Program 2.8.6: Using the LENGTH Statement After the INPUT Statement
data work.Ipums2005Basic;
  infile RawData("IPUMS2005basic.csv") dsd;
  input Serial State $ City $ CityPop Metro
        CountyFIPS Ownership $ MortgageStatus $
        MortgagePayment HHIncome HomeValue;
  length state $20 City $25 MortgageStatus $50;