Digital Forensic Science. Vassil Roussev
or it could be a massive reverse engineering effort, if a (near-)complete picture is needed.
The hypotheses in part (c) concern the rules defining the mappings between higher-level and lower-level events. Identifying these rules is an inherently difficult task, and Carrier proposed only one type of technique with very limited applicability—development tool and process analysis. It analyzes the programming tools and development process to determine how complex events are defined.
Complex state and event definition. This category of techniques defines the complex states that existed (hcs) and the complex events that occurred (hce). It includes eight classes of analysis techniques and each has a directional component (Figure 3.3). Two concern individual states and events, two are forward- and backward-based, and four are upward- and downward-based.
Complex state and event system capabilities methods use the capabilities of the complex system to formulate and test state and event hypotheses based on what is possible. The main utility of this approach is that it can show that another hypothesis is impossible because it is outside of the system’s capabilities.
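As a minimal sketch of this idea, a hypothesis can be rejected when it requires a capability the system does not have. The `Hypothesis` class, the capability names, and the `is_possible` helper below are illustrative assumptions, not part of Carrier's model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    description: str
    required_capabilities: frozenset


def is_possible(hypothesis, system_capabilities):
    """A hypothesis is feasible only if the system has every capability it needs."""
    return hypothesis.required_capabilities <= system_capabilities


# Hypothetical capability inventory for the system under examination.
system_capabilities = frozenset({"usb_storage", "local_disk_write"})

h1 = Hypothesis("File was copied to a USB drive", frozenset({"usb_storage"}))
h2 = Hypothesis("File was sent over Bluetooth", frozenset({"bluetooth"}))

# Keep only the hypotheses that fall within the system's capabilities.
feasible = [h.description for h in (h1, h2) if is_possible(h, system_capabilities)]
print(feasible)
```

Note that passing this check only means a hypothesis is not ruled out; it says nothing about whether the event actually occurred.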
Complex state and event sample data techniques use sample data from observations of similar systems or from previous executions. The results include metrics on the occurrence of events and states and would show which states and events are most likely. This class of techniques is employed in practice in an ad hoc manner; for example, if a desktop computer is part of the investigation, an analyst would have a hypothesis about what type of content might be present.
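One simple way to picture such metrics is as relative frequencies computed over previously observed systems. The sketch below assumes the observations are already reduced to content-type labels; the labels and the `state_likelihoods` helper are hypothetical.

```python
from collections import Counter


def state_likelihoods(observations):
    """Return the relative frequency of each observed state, most common first."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.most_common()}


# Hypothetical content types seen on similar desktop systems in earlier cases.
sample = ["documents", "photos", "documents", "email", "documents", "photos"]
likelihoods = state_likelihoods(sample)
print(likelihoods)
```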
Complex state and event reconstruction methods use a state to formulate and test hypotheses about the previous complex event and state. This approach is frequently employed, although the objective is rarely to reconstruct the state immediately preceding a known one; more often, the target is an earlier state. Common examples include analyzing web browser history, or most-recently-used records, to determine what the user has recently done.
Figure 3.3: The classes of analysis techniques for defining complex states and events have directional components [27].
Complex state and event construction techniques use a known state to formulate and test hypotheses about the next event and state. Similarly to the corresponding techniques at the primitive level, complex-level construction techniques are rarely used to define the event and the immediately following state. Instead, they are employed to predict what events may have occurred. For example, the content of a user document, or an installed program, can be the basis for a hypothesis on what other events and states may have occurred afterward.
The final four classes of methods either abstract low-level data and events to higher-level ones, or perform the reverse—materialize higher-level data and events to lower levels. Data abstraction is a bottom-up approach to define complex storage locations (data structures) using lower-level data and data abstraction transformation rules. For example, given a disk volume, we can use knowledge about the filesystem layout to transform the volume into a set of files.
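The volume-to-files example can be sketched with a toy layout. The format below (a count byte, fixed-size directory entries, then file data) is an invented stand-in for a real filesystem, and `build_volume` and `abstract_files` are hypothetical helpers; the point is only that the abstraction rule is knowledge of the layout.

```python
import struct

# Toy directory entry: 8-byte NUL-padded name, 2-byte offset, 2-byte length.
ENTRY = struct.Struct("<8sHH")


def build_volume(files):
    """Pack {name: bytes} into the toy layout: count byte, directory, then data."""
    header_size = 1 + ENTRY.size * len(files)
    directory, data = b"", b""
    for name, content in files.items():
        offset = header_size + len(data)
        directory += ENTRY.pack(name.encode("ascii"), offset, len(content))
        data += content
    return bytes([len(files)]) + directory + data


def abstract_files(volume):
    """Apply the abstraction rule: raw volume bytes -> {file name: file content}."""
    count = volume[0]
    files = {}
    for i in range(count):
        raw_name, offset, length = ENTRY.unpack_from(volume, 1 + i * ENTRY.size)
        name = raw_name.rstrip(b"\x00").decode("ascii")
        files[name] = volume[offset:offset + length]
    return files


volume = build_volume({"a.txt": b"hello", "b.bin": b"\x01\x02\x03"})
recovered = abstract_files(volume)
print(recovered)
```

Here `build_volume` plays the role of the reverse transformation, turning higher-level files back into lower-level bytes.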
Data materialization is the reverse of data abstraction, transforming higher-level storage locations into lower-level ones using materialization rules, and has limited practical applications.
Event abstraction is the bottom-up approach to define complex events based on a sequence of lower-level events and abstraction rules. This has limited applicability in practice because low-level events tend to be too numerous to log; however, event abstraction can be used in the process of analyzing program behavior.
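A rough sketch of event abstraction over a short trace follows. The rule table and event names are assumptions chosen for illustration (loosely modeled on system calls), and the greedy left-to-right matching is just one possible strategy.

```python
# Hypothetical abstraction rules: a sequence of low-level events -> one complex event.
RULES = {
    ("open", "read", "close"): "file_read",
    ("open", "write", "close"): "file_write",
}


def abstract_events(low_level):
    """Greedily replace known low-level sequences with their complex event."""
    result, i = [], 0
    while i < len(low_level):
        for pattern, name in RULES.items():
            if tuple(low_level[i:i + len(pattern)]) == pattern:
                result.append(name)
                i += len(pattern)
                break
        else:
            # No rule applies at this position; keep the raw event as-is.
            result.append(low_level[i])
            i += 1
    return result


trace = ["open", "read", "close", "stat", "open", "write", "close"]
complex_events = abstract_events(trace)
print(complex_events)
```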
Event materialization techniques are the reverse of event abstraction, where high-level events and materialization rules are used to formulate and test hypotheses about lower-level complex and primitive events. For example, if a user is believed to have performed a certain action, then the presence, or absence, of lower-level traces of their action can confirm, or disprove, the hypothesis.
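The check described above can be sketched as a comparison between the traces a hypothesized event should leave and the traces actually observed. The rule table, trace names, and the `evaluate_hypothesis` helper are hypothetical, and treating any missing trace as lack of support is a simplifying assumption.

```python
# Hypothetical materialization rules: complex event -> lower-level traces it should leave.
MATERIALIZATION_RULES = {
    "user_downloaded_file": {
        "browser_history_entry",
        "file_in_downloads",
        "zone_identifier",
    },
}


def evaluate_hypothesis(event, observed_traces):
    """Compare expected traces for a hypothesized event with what was observed."""
    expected = MATERIALIZATION_RULES[event]
    found = expected & observed_traces
    missing = expected - observed_traces
    # Simplifying assumption: the hypothesis is supported only if nothing is missing.
    return {"found": found, "missing": missing, "supported": not missing}


observed = {"browser_history_entry", "file_in_downloads", "prefetch_entry"}
result = evaluate_hypothesis("user_downloaded_file", observed)
print(result)
```

In practice an analyst would weigh which missing traces matter, since some traces can legitimately be absent.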
3.3.3 COGNITIVE TASK MODEL
The differential analysis technique presented in Section 3.3.1 is a basic building block of the investigative process, one that is applied at varying levels of abstraction and to a wide variety of artifacts. However, it does not provide an overall view of how forensic experts actually perform an investigation. This is particularly important in order to build forensic tools that properly support the cognitive processes.
Unfortunately, digital forensics has not attracted serious interest from cognitive scientists, and there have been no coherent efforts to document forensic investigations. Therefore, we adopt the sense-making process originally developed by Pirolli and Card [142] to describe intelligence analysis, a cognitive task that is very similar to forensic analysis. The Pirolli–Card cognitive model is derived from an in-depth cognitive task analysis (CTA), and provides a reasonably detailed view of the different aspects of an intelligence analyst’s work. Although many of the tools are different, forensic and intelligence analysis are very similar in nature: in both cases, analysts have to go through a mountain of raw data to identify (relatively few) relevant facts and put them together in a coherent story. The benefits of using this model are that: (a) it provides a fairly accurate description of the investigative process in its own right, and allows us to map the various tools to the different phases of the investigation; (b) it provides a suitable framework for explaining the relationships among the various models developed within the area of digital forensics; and (c) it can seamlessly incorporate information from other sources into the investigation.
The overall process is shown in Figure 3.4. The rectangular boxes represent different stages in the information-processing pipeline, starting with raw data and ending with presentable results. Arrows indicate transformational processes that move information from one box to another. The x-axis approximates the overall level of effort needed to move information from its raw form to a given processing stage. The y-axis shows the amount of structure (with respect to the investigative process) in the processed information at each stage. Thus, the overall trend is to move the relevant information from the lower left to the upper right corner of the diagram. In reality, the processing can both meander through multiple iterations of local loops and jump over phases (for routine cases handled by an experienced investigator).
Figure 3.4: Notional model of sense-making loop for analysts derived from cognitive task analysis [185, p. 44].
External data sources include all potential evidence sources for the specific investigation, such as disk images, memory snapshots, and network captures, as well as reference databases, such as hashes of known files. The shoebox is a subset of all the data that has been identified as potentially relevant, such as all the email communication between two persons of interest. At any given time, the contents of the shoebox can be viewed as the analyst’s approximation of the information content potentially relevant to the case. The evidence file contains only the parts that directly speak to the case, such as specific email exchanges on topics of interest.
The schema contains a more organized version of the evidence, such as a timeline of events, or a graph of relationships, which allows higher-level reasoning over the evidence. A hypothesis is a tentative conclusion that explains the observed evidence in the schema and, by extension, could form the final conclusion. Once the analyst is satisfied that the hypothesis is supported by the evidence, the hypothesis turns into a presentation, which is the final product of the process. The presentation usually takes the form of an investigator’s report that both speaks to the high-level conclusions relevant to the legal case, and documents the low-level technical steps on which those conclusions are based.
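The first stages of this pipeline can be sketched on the email example used above. The `Email` record, the sample messages, and the three helper functions are hypothetical; real tools would work on parsed mailbox data and far richer relevance criteria.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Email:
    sent: datetime
    sender: str
    recipient: str
    subject: str


def build_shoebox(source, persons):
    """Keep messages exchanged between the persons of interest."""
    return [m for m in source if m.sender in persons and m.recipient in persons]


def build_evidence_file(shoebox, keywords):
    """Keep only messages whose subject mentions a topic of interest."""
    return [m for m in shoebox if any(k in m.subject.lower() for k in keywords)]


def build_schema(evidence):
    """Organize the evidence as a timeline, ordered by time sent."""
    return sorted(evidence, key=lambda m: m.sent)


# Hypothetical external data source: messages extracted from a mailbox.
external_data = [
    Email(datetime(2020, 1, 3, 9, 0), "alice", "bob", "Invoice transfer"),
    Email(datetime(2020, 1, 1, 8, 0), "bob", "alice", "Lunch?"),
    Email(datetime(2020, 1, 2, 7, 30), "bob", "alice", "Re: transfer details"),
    Email(datetime(2020, 1, 2, 12, 0), "carol", "alice", "Transfer request"),
]

shoebox = build_shoebox(external_data, {"alice", "bob"})
evidence = build_evidence_file(shoebox, ["transfer"])
timeline = build_schema(evidence)
print([m.subject for m in timeline])
```

Each step reduces the volume of data while adding structure, which mirrors the movement from the lower left to the upper right of the diagram.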
The overall analytical process is split into two main activity loops: a foraging loop that involves actions taken to find potential sources of information, query them, and filter them for relevance; and a sense-making loop in which the analyst develops—in an iterative fashion—a conceptual model that is