Digital Forensic Science. Vassil Roussev
processes in the two loops can be classified into bottom-up (organizing data to build a theory) or top-down (finding data based on a theory). In practice, analysts apply these in an opportunistic fashion with many iterations.
Bottom-up Processes
Bottom-up processes are synthetic—they build higher-level (more abstract) representations of the information from more specific pieces of evidence.
• Search and filter: External data sources, hard disks, network traffic, etc., are searched for relevant data based on keywords, time constraints, and others in an effort to eliminate the vast majority of the data that are irrelevant.
• Read and extract: Collections in the shoebox are analyzed to extract individual facts and relationships that can support or disprove a theory. The resulting pieces of artifacts (e.g., individual email messages) are usually annotated with their relevance to the case.
• Schematize: At this step, individual facts and simple implications are organized into a schema that can help organize and help identify the significance and relationship among a growing number of facts and events. Timeline analysis is one of the basic tools of the trade; however, any method of organizing and visualizing the facts—graphs, charts, etc.—can greatly speed up the analysis. This is not an easy process to formalize, and most forensic tools do not directly support it. Therefore, the resulting schemas may exist on a piece of paper, on a whiteboard, or only in the mind of the investigator. Since the overall case could be quite complicated, individual schemas may cover only specific aspects of it, such as the sequence of events discovered.
• Build case: Out of the analysis of the schemas, the analyst eventually comes up with testable theories that can explain the evidence. A theory is a tentative conclusion and often requires more supporting evidence, as well as testing against alternative explanations.
• Tell story: The typical result of a forensic investigation is a final report and, perhaps, an oral presentation in court. The actual presentation may only contain the part of the story that is strongly supported by the digital evidence; weaker points may be established by drawing on evidence from other sources.
Top-down Processes
Top-down processes are analytical—they provide context and direction for the analysis of less structured data search and organization of the evidence. Partial, or tentative conclusions, are used to drive the search of supporting and contradictory pieces of evidence.
• Re-evaluate: Feedback from clients may necessitate re-evaluations, such as the collection of stronger evidence, or the pursuit of alternative theories.
• Search for support: A hypothesis may need more facts to be of interest and, ideally, would be tested against all (reasonably) possible alternative explanations.
• Search for evidence: Analysis of theories may require the re-evaluation of evidence to ascertain its significance/provenance, or may trigger the search for more/better evidence.
• Search for relations: Pieces of evidence in the file can suggest new searches for facts and relations on the data.
• Search for information: The feedback loop from any of the higher levels can ultimately cascade into a search for additional information; this may include new sources, or the reexamination of information that was filtered out during previous passes.
Foraging Loop
It has been observed [138] that analysts tend to start with a high-recall/low-selectivity query, which encompassed a fairly large set of documents—many more than the analyst can afford to read. The original set is then successively modified and narrowed down before the documents are read and analyzed.
The foraging loop is a balancing act between three kinds of processing that an analyst can perform—explore, enrich, and exploit. Exploration effectively expands the shoebox by including larger amounts of data; enrichment shrinks it by providing more specific queries that include fewer objects for consideration; exploitation is the careful reading and analysis of an artifact to extract facts and inferences. Each of these options has varying cost and potential rewards and, according to information foraging theory [141], analysts seek to optimize their cost/benefit trade-off.
Sense-making Loop
Sense-making is a cognitive term and, according to Klein’s [102] widely quoted definition, is the ability to make sense of an ambiguous situation. It is the process of creating situational awareness and understanding to support decision making under uncertainty; it involves the understanding of connections among people, places, and events in order to anticipate their trajectories and act effectively.
There are three main processes that are involved in the sense-making loop: problem structuring—the creation and exploration of hypotheses, evidentiary reasoning—the employment of evidence to support/disprove hypothesis, and decision making—selecting a course of action from a set of available alternatives.
Data Extraction vs. Analysis vs. Legal Interpretation
Considering the overall process from Figure 3.4, we gain a better understanding of the relationships among the different actors. At present, forensics researchers and tool developers primarily provide the means to extract data from the forensic targets (step 1), and the basic means to search and filter it. Although some data analytics and natural language processing methods (like entity extraction) are starting to appear in dedicated forensic software, these capabilities are still fairly rudimentary in terms of their ability to automate parts of the sense-making loop.
The role of the legal experts is to support the upper right corner of the process in terms of building/disproving legal theories. Thus, the investigator’s task can be described as the translation of highly specific technical facts into a higher-level representation and theory that explains them. The explanation is almost always tied to the sequence of actions of humans involved in the case.
In sum, investigators need not be software engineers but must have enough proficiency to understand the significance of the artifacts extracted from the data sources, and be able to competently read the relevant technical literature (peer-reviewed articles). Similarly, analysts must have a working understanding of the legal landscape and must be able to produce a competent report, and properly present their findings on the witness stand, if necessary.
1A more detailed definition and discussion of traditional forensics is beyond our scope.
CHAPTER 4
System Analysis
Modern computer systems in general use still follow the original von Neumann architecture [192], which models a computer system as consisting of three main functional units—CPU, main memory, and secondary storage—connected via data buses. This chapter explores the means by which forensic analysis is applied to these subsystems. To be precise, the actual investigative targets are the respective operating system modules controlling the different hardware subsystems.
System analysis is one of the cornerstones of digital forensics. The average user has very little understanding of what kind of information operating systems maintain about their activities, and frequently do not have the knowledge and/or privilege level to tamper with system records. In effect, this creates a “Locard world” where their actions leave a variety of traces that allow their actions to be tracked.
System analysis provides insight with a lot of leverage and high pay-offs in that, once an extraction, or analytical method, is developed, it can be directly applied to artifacts created by different applications.
4.1 STORAGE FORENSICS
Persistent storage in the form of hard disk drives (HDDs), solid state drives (SSDs), optical disks, external (USB-connected) media, etc., is the primary source of evidence for most digital forensic investigations. Although the importance of