Earth Observation Using Python. Rebekah B. Esmaili
steps are required to make the data usable for scientific analysis. This is particularly evident for data fusion, where two datasets with different resolutions must first be mapped to the same grid before they are compared. Because many data users are not satellite scientists or professional programmers but rather members of other research and professional communities, these barriers can be too great to overcome. Even to a technical user, the nuances can be frustrating. At worst, obstacles in coding and data visualization can lead to data misuse, which can tarnish the work of an entire community.
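As a rough illustration of what mapping two datasets to the same grid can involve, the sketch below averages a fine‐resolution field down to a coarser grid before differencing the two. It is a minimal example under strong assumptions: both arrays are hypothetical, they cover the same domain, and the fine grid divides evenly into the coarse one. The helper name block_average is invented for this illustration. Real satellite data usually also need coordinate matching, area weighting, and handling of missing values.

```python
import numpy as np


def block_average(fine, factor):
    """Average non-overlapping factor x factor blocks of a 2-D array.

    Hypothetical helper for illustration only; assumes the fine grid
    nests evenly inside the coarse grid.
    """
    rows, cols = fine.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    # Split each axis into (coarse index, position within block), then
    # average over the two within-block axes.
    blocks = fine.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# Made-up data: a 4 x 4 "fine" field and a 2 x 2 "coarse" field
# that are assumed to cover the same region.
fine = np.arange(16, dtype=float).reshape(4, 4)
coarse = np.array([[2.0, 5.0], [10.0, 13.0]])

# Bring the fine field onto the coarse grid, then compare cell by cell.
fine_on_coarse = block_average(fine, 2)
difference = fine_on_coarse - coarse
```

Averaging down to the coarser grid, rather than interpolating the coarse field up, avoids implying detail that the coarse dataset never measured. Which direction is appropriate depends on the application.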
The purpose of this text is to provide an overview of the common preparatory work and visualization techniques that are applied to environmental satellite data using the Python language. This book is highly example‐driven, and all the examples are available online. The exercises are primarily based on hands‐on tutorial workshops that I have developed. The motivation for producing this book is to make the contents of the workshops accessible to more Earth scientists, as very few Python books currently available target the Earth science community.
This book is written to be a practical workbook and not a theoretical textbook. For example, readers will be able to run prewritten code interactively alongside the text to guide them through the code examples. Exercises in each section build on one another, with incremental steps folded in. Readers with minimal coding experience can follow each “baby step” to become “spun up” quickly, while more experienced coders have the option of working with the code directly and spending more time on building a workflow as described in Section III.
The exercises and solutions provided in this book use Jupyter Notebook, a highly interactive, web‐based development environment. Using Jupyter Notebook, code can be run one line at a time or in short blocks, and the results are generated within an interactive, documented format. This allows the student to view both the Python commands and comments alongside the expected results. Jupyter notebooks can also be easily converted to programs or scripts that can be executed on Linux machines for high‐performance computing. Together, these features provide a friendly work environment for new Python users. Students are also welcome to develop code in any environment they wish, such as the Spyder IDE or IPython.
While the material builds on concepts learned in other chapters, the book references the location of earlier discussions of the material. Within each chapter, the examples are progressive. This design allows students to build on their understanding (and learn where to find answers when they need guidance) rather than memorizing syntax or a “recipe.” Professionally, I have worked with many datasets and I have found that the skills and strategies that I apply to satellite data are fairly universal. The examples in this book are intended to help readers become familiar with some of the characteristic quirks that they may encounter when analyzing various satellite datasets in their careers. In this regard, students are also strongly encouraged to submit requests for improvements in future editions.
As with many technical texts, there is a risk that the solutions presented will become outdated as new tools and techniques are developed. The sizable user community already contributing to Python implies it is actively advancing; it is a living language in contrast to compiled, more slowly evolving legacy languages like Fortran and C/C++. A drawback of printed media is that it tends to be static, and Python is evolving more rapidly than the typical production schedule of a book. To mitigate this, this book intends to teach fluency in a few well‐established packages by detailing the steps and thought processes that a user needs to carry out more advanced studies. The text focuses on discipline‐agnostic packages that are widely used, such as NumPy, Pandas, and xarray, as well as plotting packages such as Matplotlib and Cartopy.
I have chosen to highlight Python primarily because it is a general‐purpose language, rather than being discipline‐ or task‐specific. Python programmers can script, process, analyze, and visualize data. Python’s popularity does not diminish the usefulness and value of other languages and techniques. As with all interpreted programming languages, Python may run more slowly than compiled languages like Fortran and C++, the traditional tools of the trade. For instance, some steps in data analysis could be done more succinctly and with greater computational efficiency in other languages. However, underlying packages in Python often rely on compiled languages, so an advanced Python programmer can develop very computationally efficient programs with popular packages that are built with speed‐optimized algorithms. While not explicitly covered in this book, emerging packages such as Dask can help process data in parallel, so more advanced scientific programmers can learn to optimize the speed of their code. Python interfaces with a variety of languages, so advanced scientific programmers can compile computationally expensive processing components and run them using Python. Then, simpler parts of the code can be written in Python, which is easier to use and debug.
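To make the point about compiled underlying packages concrete, the sketch below computes the same mean two ways: with an explicit Python loop and with a single NumPy call, which hands the work to NumPy's compiled routines. The array of values is made up for illustration, and the function names mean_loop and mean_vectorized are hypothetical. No timings are given here because they depend on the machine and on the array size.

```python
import numpy as np


def mean_loop(values):
    """Mean computed one element at a time in interpreted Python."""
    total = 0.0
    for value in values:
        total += value
    return total / len(values)


def mean_vectorized(values):
    """Same mean, delegated to NumPy's compiled implementation."""
    return np.mean(values)


# Made-up sample: 1001 evenly spaced values between 250 and 300
# (for example, brightness temperatures in kelvin).
temps = np.linspace(250.0, 300.0, 1001)

loop_result = mean_loop(temps)
vectorized_result = mean_vectorized(temps)
```

Both functions should agree to within floating‐point rounding. On large arrays the vectorized form is typically much faster, which can be checked in a notebook with the %timeit magic.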
This book encourages readers to share their final code online with the broader community, a practice more common among software developers than scientists. It is also good practice to write code and software in a thoughtful and carefully documented manner so that it is usable by others. For instance, well‐written code is general purpose, lacks redundancy, and is intuitively organized so that it may be revised or updated if necessary. Many scientific programmers are self‐learners with a background in procedural programming, and thus their Python code will tend to resemble the flow of a Fortran or IDL program. This text uses Jupyter Notebook, which is designed to promote good programming habits in establishing a “digestible code” mindset; this approach organizes code into short chunks. This book also focuses on clear documentation of science algorithms and code. It covers version control, virtual environments, how to structure a usable README file, and what to include in inline comments.
For most environmental science endeavors, data and code sharing are part of the research‐to‐operations feedback loop. “Operations” refers to continuous data collection for scientific research and hazard monitoring. By sharing these tools with other researchers, datasets are more fully and effectively utilized. Satellite data providers can upgrade existing datasets if there is a demand. Globally, satellite data are provided through data portals by NASA, NOAA, EUMETSAT, ESA, JAXA, and other international agencies. However, the value of these datasets is often only visible through scientific journal articles, which represent only a small subset of potential users. For instance, if the applications of satellite observations used for routine disaster mitigation and planning in a disadvantaged nation are not published in a scientific journal, needs specific to disaster mitigation may never be met.
Further, there may be unexpected or novel uses of datasets that can drive scientific inquiry, but if the code that brings those uses to life is hastily written and not easily understood, it is effectively a waste of time for colleagues to attempt to employ such applications. By sharing clearly written code and corresponding documentation for satellite data applications, users can alert colleagues in their community of the existence of scientific breakthrough efforts and expand the potential value of satellite datasets within and beyond their community. Moreover, public knowledge of those efforts can help justify the versatility and value of satellite missions and provide a return on investment for organizations that fund them. In the end, the dissemination of code and data analysis tools will only benefit the scientific community as a whole.
1 A TOUR OF CURRENT SATELLITE MISSIONS AND PRODUCTS
There are thousands of datasets containing observations of the Earth. This chapter describes some satellite types, orbits, and missions, which benefit a variety of fields within Earth sciences, including atmospheric science, oceanography, and hydrology. Data are received on the ground through receiver stations and processed with retrieval algorithms. However, the raw data require further manipulation to be useful, and Python is a good choice for analysis and visualization of these datasets.
At present, there are over 13,000 satellite‐based Earth observations freely and openly listed on www.data.gov.