Digital Forensic Science. Vassil Roussev
of the time was to provide locksmith and disaster recovery services. Accordingly, the main thrust of the efforts was reverse-engineering/brute-forcing of the (weak) application-level encryption techniques employed by various vendors, and filesystem “undelete,” which enabled the (partial) recovery of ostensibly deleted information.
2.2 GOLDEN AGE (1997–2007)
Around 1997, we saw the emergence of the first commercial tools, like EnCase Forensic, that specifically targeted law enforcement use cases and provided an integrated environment for managing a case. This marked the beginning of a decade (1997–2007) of rapid expansion of forensic capabilities, both commercial and open source, against a backdrop of growing use of the Internet for business transactions, and a relatively weak understanding and use of privacy and security mechanisms. Garfinkel refers to this period as a “Golden Age” of digital forensics [72]. During this time, we saw the establishment of the first academic conference—DFRWS (dfrws.org) in 2001—with the exclusive focus on basic and applied digital forensic research.
The most important source of forensic data during the period became local storage in the form of internal HDDs and removable media—CD/DVD and USB-attached flash/disk drives. This reflects an IT environment in which most computations were performed on workstations by standalone applications. Although the importance of the Internet was increasing dramatically, most of the related evidence could still be found in the local email client, or (web) browser caches. Thus, filesystem analysis and reconstruction (Section 4.1.4) became the main focus of forensic tools and investigations, and Carrier’s definitive book on the subject [23] can be seen as a symbol of the age.
At the time, RAM was not considered a worthy source of evidence and there were no analytical tools beyond grep and hexdump to make sense of it. This began to change in 2005 with the first DFRWS memory forensics challenge [53], which led to the development of a number of tools for Microsoft Windows (discussed in Section 4.2); a follow-up challenge [76] focused research efforts on developing Linux memory analysis tools.
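To give a sense of how limited this style of analysis was, the following is a minimal sketch of what a grep/hexdump pass over a raw memory image amounts to: scanning for runs of printable ASCII and dumping the surrounding bytes. The function names and the sample buffer are illustrative assumptions, not part of any tool discussed in the text.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) pairs for runs of printable ASCII in a raw buffer.

    Roughly what a strings-plus-grep pass yields: text with no structure
    attached (no process, no owner, no timestamps).
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


def hexdump_line(data: bytes, offset: int, width: int = 16) -> str:
    """Format one hexdump-style line: offset, hex bytes, ASCII column."""
    chunk = data[offset:offset + width]
    hex_part = " ".join(f"{b:02x}" for b in chunk)
    ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
    return f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}"


if __name__ == "__main__":
    # Hypothetical stand-in for a fragment of a memory image.
    sample = b"\x00\x01password=hunter2\x00\xff\xfeabc\x00user@example.com\x00"
    for offset, text in printable_strings(sample):
        print(offset, text)
    print(hexdump_line(sample, 0))
```

The output is a flat list of strings and offsets; recovering which process or data structure they belonged to is exactly what the later memory analysis tools added.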
Between 2004 and 2007, several technology developments hinted that a new chapter in the history of the field was getting started. In 2004, Google announced the Gmail service [79]; its main significance was to show that a web application can be deployed at Internet scale. Web apps are an implementation of the software as a service (SaaS) delivery model in which the client device needs no application-specific installation locally; most of the computation is performed on the provider’s server infrastructure and only a small amount of user interface (UI) code is downloaded on the fly to manage the interaction with the user. Forensically, this is a big shift as most of the artifacts of interest are resident on the server side.
In 2006, Amazon announced its public cloud service [3], which greatly democratized access to large-scale computational resources. It suddenly became possible for any web app—not just the ones from companies with big IT infrastructure—to work at scale; there was no conceptual impediment for all software vendors to go the SaaS route. In practice, it took several years for this movement to become mainstream but, with the benefit of hindsight, it is easy to identify this as a critical moment in IT development.
In 2007, it was Apple’s turn to announce a major technology development, the first smartphone [6]; this was quickly followed by a Google-led effort to build a competing device using open source code, and the first Android device was announced in 2008 [183]. Mobile computing had been around for decades, but the smartphone combined a pocket-portable form factor with a general-purpose compute platform and ubiquitous network communication to become, in less than a decade, the indispensable daily companion for the vast majority of people. Accordingly, it has become a witness to their actions and a major source of forensic evidence.
2.3 PRESENT (2007–)
The current period is likely to be viewed as transitional. On the one hand, we have very mature techniques for analyzing persistent storage (Section 4.1) and main memory (Section 4.2) for all three of the main operating systems (OS) for the desktop/server environments—Microsoft Windows, MacOS, and Linux. Similarly, there are well-developed forensic tools for analyzing the two main mobile OS environments—Android and iOS.
On the other hand, we see exponential growth in the volume of forensic data in need of processing (Section 6.1) and the accelerating transition to cloud-centric IT (Section 4.6). As our discussion will show, the latter presents a qualitatively new target and requires a new set of tools to be developed. Separately, we are also seeing a maturing use of security and privacy techniques, such as encryption and media sanitization, that eliminate some traditional sources of evidence, and make access to others problematic.
It is difficult to predict what will be the event(s) that will mark the logical beginning of the next period, but one early candidate is the announcement of the AWS Lambda platform [9]. There is a broad consensus that the next major technology shift (over the next 10–15 years) will be the widespread adoption of the Internet of Things (IoT) [7]. It is expected that it will bring online between 10 and 100 times more Internet-connected devices of all kinds. The fast growing adoption of AWS Lambda as a means of working with these devices suggests that it could have a similar impact on the IT landscape to that of the original introduction of AWS.
Lambda provides a platform in which customers write event-handling functions that require no explicit provisioning of resources. In a typical workflow, a device uploads a piece of data to a storage service, like AWS S3. This triggers an event, which is automatically dispatched to an instance of a user-defined handler; the result may be the generation of a series of subsequent events in a processing pipeline. From a forensic perspective, such an IT model renders existing techniques obsolete, as there is no meaningful data to be extracted from the embedded device itself.
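The workflow described above can be sketched as a small Python handler. This is an illustration under stated assumptions, not a definitive implementation: the handler name, bucket name, and object key are hypothetical, and the sample event reproduces only the few fields of an S3 notification that the handler reads. The handler is exercised locally with a synthetic event, so no AWS account or SDK is required.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Hypothetical handler: report which objects an S3 notification refers to.

    In a real pipeline this is where the uploaded data would be fetched and
    processed, possibly emitting further events for downstream handlers.
    """
    uploaded = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3_info["object"]["key"])
        uploaded.append({"bucket": bucket, "key": key})
    return {"uploaded": uploaded}


# Synthetic event with assumed names, trimmed to the fields used above.
SAMPLE_EVENT = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "sensor-uploads"},
                "object": {"key": "device+42/reading.json"},
            },
        }
    ]
}

if __name__ == "__main__":
    print(handler(SAMPLE_EVENT, None))
```

Note that nothing in the handler persists on the device or on a machine the investigator can seize; the only traces are whatever the service itself records, which is the forensic point the paragraph makes.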
2.4 SUMMARY
The main point of this brief walk through the history of digital forensics is to link the predominant forensic methods to the predominant IT environment. Almost all techniques in widespread use today are predicated on access to the full environment in which the relevant computations were performed. This started with standalone personal computers, which first became connected to the network, then became mobile, and eventually became portable and universally connected. Although each step introduced incremental challenges, the overall approach continued to work well.
However, IT is undergoing a rapid and dramatic shift from using software products to employing software services. Unlike prior developments, this one has major forensic implications; in simple terms, tools no longer have access to the full compute environment of the forensic target, which is a service hosted somewhere in a shared data center. Complicating things further is the fact that most computations are ephemeral (and do not leave the customary traces) and storage devices are routinely sanitized.
We will return to this discussion in several places throughout the text, especially in Chapter 6. For now, the main takeaway is that the future of forensics is likely to be different from its past and present. That being said, the bulk of the content will naturally focus on systematizing what we already know, but we will also point out the new challenges that may require completely new solutions.
CHAPTER 3
Definitions and Models
Forensic science is the application of scientific methods to collect, preserve, and analyze evidence related to legal cases. Historically, this involved the systematic analysis of (samples of) physical material in order to establish causal relationships among various events, as well as to address issues of provenance and authenticity.1 The rationale behind it—Locard’s exchange principle—is that physical contact between objects inevitably results in the exchange of matter leaving traces that can be analyzed to (partially) reconstruct