Statistical Approaches for Hidden Variables in Ecology. Nathalie Peyrard
1.3. Case study: masked booby, Sula dactylatra (originals)
The data used in this section were collected by Sophie Bertrand (IRD), Guilherme Tavares (UFRGS), Christophe Barbraud and Karine Delord (CNRS). The authors wish to thank the IRD Tabasco JEAI (Jeune Equipe Associée Internationale) for permission to use these data.
Figure 1.5. Masked Booby (Sula dactylatra) Photo: Sophie Bertrand. For a color version of this figure, see www.iste.co.uk/peyrard/ecology.zip
1.3.1. Data
This case study concerns the behavior of the masked booby (Sula dactylatra).
The data used here consist of three trajectories, one for each of three masked boobies, recorded around the Ilha do Meio, an island in the Fernando de Noronha archipelago (Brazil) in the Atlantic Ocean (Figure 1.6). Positions were recorded by GPS, with one position collected every 10 s.
Figure 1.6. Area of study (shown in red on the map) and three trajectories obtained by tracking three different masked boobies. Data were acquired in time increments of 10 s. For a color version of this figure, see www.iste.co.uk/peyrard/ecology.zip
1.3.2. Projection
The recorded data for the boobies were provided in the form of latitude and longitude measurements, that is, as angles with respect to an origin point on the Earth’s surface. The methods presented earlier rely on a notion of distance (such as step length). While it is possible to calculate distances traveled over the Earth’s surface directly from latitude and longitude coordinates, this requires specific formulas for movement on a sphere. Instead, data are often projected onto a plane, enabling the use of Euclidean distance. Because the globe is spherical, no single projection is suitable everywhere, and the projection used depends on the zone of interest. In this case, projection is carried out using UTM coordinates for zone 25 south. In R, the sf package may be used to facilitate geographical data processing (and, notably, projection).
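The difference between distances on the sphere and distances after projection can be illustrated with a short sketch. It is written in Python rather than R and does not reproduce the UTM projection used in this case study: it compares the haversine great-circle distance with the Euclidean distance obtained after a simple local (equirectangular) projection. The function names, the spherical Earth radius and the two coordinates are assumptions made for illustration only.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean radius of a spherical Earth


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def local_plane_xy(lat, lon, lat0, lon0):
    """Equirectangular projection around (lat0, lon0).

    This is NOT the UTM projection; it is only accurate close to the origin.
    """
    x = EARTH_RADIUS_M * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS_M * math.radians(lat - lat0)
    return x, y


# Hypothetical positions (degrees), chosen only to have a realistic magnitude
lat0, lon0 = -3.82, -32.39
p1 = (-3.8200, -32.3900)
p2 = (-3.8210, -32.3880)

d_sphere = haversine_m(*p1, *p2)                 # distance on the sphere
x1, y1 = local_plane_xy(*p1, lat0, lon0)
x2, y2 = local_plane_xy(*p2, lat0, lon0)
d_plane = math.hypot(x2 - x1, y2 - y1)           # Euclidean distance after projection

print(f"sphere: {d_sphere:.2f} m, plane: {d_plane:.2f} m")
```

Over a few hundred metres the two distances are almost identical, which is why Euclidean step lengths computed on suitably projected coordinates are generally considered adequate at this scale.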
1.3.3. Data smoothing
In this case, the frequency of data acquisition was high (one point every 10 s). While there were few errors in the data (obtained using GPS), the temporal proximity of observations may result in a somewhat erratic-looking trajectory. This erratic effect is even more pronounced in the movement metrics used to detect different activities.
To correct these errors, let us take a Gaussian linear hidden Markov model, as described in section 1.2.1. Taking the equations in model [1.1], matrices A and B are taken as known and equal to the identity, while vectors μ and ν are known and equal to 0. Matrices Σm and Σo are presumed to be diagonal, but unknown. The unknown parameters of this model are estimated using the EM algorithm implemented in the MARSS package. The estimated parameters are then used to reconstruct the hidden variables, which represent the actual positions, by means of Kalman smoothing.
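The smoothing step can be sketched for a single coordinate. With A and B equal to the identity, μ and ν equal to 0 and diagonal covariance matrices, each coordinate follows a random walk observed with noise, so a scalar Kalman filter followed by a backward (Rauch-Tung-Striebel) pass is sufficient. The sketch below is in Python and is not the MARSS implementation: the two variances q and r are assumed known here, whereas in the text they are estimated by EM, and the data are simulated.

```python
import random


def kalman_smooth(y, q, r, p0=1e6):
    """Smoothed means for x_t = x_{t-1} + N(0, q), y_t = x_t + N(0, r)."""
    n = len(y)
    m_pred, p_pred = [0.0] * n, [0.0] * n   # one-step-ahead mean and variance
    m_filt, p_filt = [0.0] * n, [0.0] * n   # filtered mean and variance
    m_prev, p_prev = y[0], p0               # vague prior centred on the first observation
    for t in range(n):
        # Prediction: the mean is carried over because A is the identity
        m_pred[t] = m_prev
        p_pred[t] = p_prev + (q if t > 0 else 0.0)
        # Update with observation y[t] (B is the identity)
        gain = p_pred[t] / (p_pred[t] + r)
        m_filt[t] = m_pred[t] + gain * (y[t] - m_pred[t])
        p_filt[t] = (1.0 - gain) * p_pred[t]
        m_prev, p_prev = m_filt[t], p_filt[t]
    # Backward pass: revise each filtered mean using the following smoothed mean
    m_smooth = m_filt[:]
    for t in range(n - 2, -1, -1):
        c = p_filt[t] / p_pred[t + 1]
        m_smooth[t] = m_filt[t] + c * (m_smooth[t + 1] - m_pred[t + 1])
    return m_smooth


def mse(a, b):
    """Mean squared difference between two sequences of equal length."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Simulated coordinate: hypothetical variances, not values estimated from the booby data
random.seed(0)
q, r = 1.0, 25.0
truth, obs = [], []
x = 0.0
for _ in range(200):
    x += random.gauss(0.0, q ** 0.5)
    truth.append(x)
    obs.append(x + random.gauss(0.0, r ** 0.5))

smooth = kalman_smooth(obs, q, r)
print(mse(obs, truth), mse(smooth, truth))
```

In this simulation the smoothed sequence lies closer to the simulated true positions than the raw observations do, and its successive increments are far less erratic.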
Figure 1.7 shows an example of trajectory smoothing. This smoothing process greatly reduces the irregularities present in the trajectory. It is important to note that we are now working with processed data. This transformation is shown here for illustrative purposes, although its relevance in this specific case is somewhat debatable.
Figure 1.7. Result of Kalman smoothing on part of the booby trajectories. Smoothing clearly removes many of the erratic aspects of the trajectory. For a color version of this figure, see www.iste.co.uk/peyrard/ecology.zip
1.3.4. Identification of different activities through movement
We shall begin by using a three-state model. The choice of the number of states in this case will be discussed later.
1.3.4.1. Definition of metrics
Our aim is to identify different activities within a trajectory. Here, we wish to distinguish between foraging behaviors (associated with rapid changes in direction) and, for example, direct movement toward a point of interest, which results in a straighter trajectory. Turning angle and step length therefore appear to be the most relevant metrics, and biological knowledge concerning the movement of these birds supports their use to distinguish between different behaviors.
In this example, we have chosen to fit two models, which differ in the way they treat step length and turning angle (and thus in the associated emission distributions). The two pairs of metrics considered here are as follows:
– Step length and turning angle: a classic choice, as presented by Morales et al. (2004): the emission distributions in this case are a gamma distribution for step length and a circular (von Mises) distribution for angles. This model will be labeled length/angle in our figures.
– Bivariate velocity change metric (Gurarie et al. 2009): the emission distributions in this case are two independent normal distributions. This model will be labeled bivariate speed in our figures.
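Both pairs of metrics can be computed from projected positions, as in the following Python sketch. The function names are hypothetical. The step length and turning angle follow their usual definitions. The decomposition of speed into a persistence component (speed times the cosine of the turning angle) and a turning component (speed times the sine) is an assumption based on one reading of Gurarie et al. (2009), and is not necessarily the exact definition used in the models presented here.

```python
import math


def steps_and_turns(xy):
    """Step lengths and turning angles (radians, in [-pi, pi]) of a planar track."""
    steps, headings = [], []
    for (x0, y0), (x1, y1) in zip(xy[:-1], xy[1:]):
        dx, dy = x1 - x0, y1 - y0
        steps.append(math.hypot(dx, dy))
        headings.append(math.atan2(dy, dx))
    turns = []
    for h0, h1 in zip(headings[:-1], headings[1:]):
        d = h1 - h0
        # Wrap the heading difference back into [-pi, pi]
        turns.append(math.atan2(math.sin(d), math.cos(d)))
    return steps, turns


def bivariate_speed(steps, turns, dt):
    """Assumed decomposition: (speed * cos(turn), speed * sin(turn)) for each turn."""
    out = []
    for step, turn in zip(steps[1:], turns):
        v = step / dt
        out.append((v * math.cos(turn), v * math.sin(turn)))
    return out


# Toy track in metres: two steps straight east, then a left turn to the north
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
steps, turns = steps_and_turns(track)
speeds = bivariate_speed(steps, turns, dt=10.0)  # 10 s between positions
print(steps, turns, speeds)
```

A track of n positions yields n - 1 step lengths but only n - 2 turning angles, since an angle requires three consecutive positions.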
1.3.4.2. Defining the starting point of the algorithm
These models do not include any covariates, and the initial distribution will not be estimated. Each model is made up of 18 parameters (12 emission distribution parameters and six transition matrix parameters). Iterative optimization over a space of this type (using the EM algorithm, for example) may be affected by the chosen starting point. In both cases, the choice of a suitable starting point for the algorithm is crucial. One relatively generic approach is to apply k-means clustering to the selected metrics. This rapid classification can be used to identify plausible parameters for the different regimes; nevertheless, it is still important to ensure that the result obtained from the algorithm has not been affected by the choice of starting point.
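The parameter count and the k-means initialization can be sketched as follows, in Python and on simulated step lengths. This is a minimal illustration under stated assumptions, not the initialization performed by the packages used in this chapter: the helper names are hypothetical, the clustering is run on step length alone, and the gamma starting values are obtained by matching the mean and variance of each cluster.

```python
import random
import statistics


def n_params(k, emission_per_state=4):
    """k(k-1) free transition probabilities plus the emission parameters."""
    return k * (k - 1) + emission_per_state * k


def kmeans_1d(values, k, n_iter=50):
    """Lloyd iterations in one dimension, with centres started at spread-out quantiles."""
    srt = sorted(values)
    centres = [srt[int((i + 0.5) * len(srt) / k)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        new_centres = [statistics.fmean(g) if g else centres[i]
                       for i, g in enumerate(groups)]
        if new_centres == centres:   # assignments no longer change
            break
        centres = new_centres
    return centres, groups


def gamma_start(group):
    """Moment-matched (shape, rate) starting values for a gamma distribution."""
    m = statistics.fmean(group)
    v = statistics.variance(group)
    return m * m / v, m / v


# Simulated step lengths from three hypothetical regimes (not the booby data)
random.seed(1)
step_lengths = ([random.gammavariate(2.0, 0.5) for _ in range(200)]
                + [random.gammavariate(20.0, 2.0) for _ in range(200)]
                + [random.gammavariate(100.0, 1.5) for _ in range(200)])

centres, groups = kmeans_1d(step_lengths, k=3)
starts = [gamma_start(g) for g in groups]
print(n_params(3), centres, starts)
```

For three states, the count is 3 × 2 = 6 transition parameters plus 3 × 4 = 12 emission parameters, which gives the 18 parameters stated above. Running the optimization from several perturbed versions of these starting values is one way to check that the result does not depend on the starting point.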
1.3.5. Results
1.3.5.1. Characterization of hidden states
In the two packages used here, the parameters of the HMM are estimated by maximum likelihood, and the most probable sequence of hidden states is recovered using the Viterbi algorithm.
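The decoding step can be illustrated with a compact Viterbi sketch in Python, working with log-probabilities. It is not the code of the packages mentioned above: the two-state model, its Gaussian emission distributions and the observations are hypothetical, and the parameters are taken as known rather than estimated.

```python
import math


def viterbi(log_init, log_trans, log_emis):
    """Most probable state sequence; log_emis[t][k] is the log-density of observation t in state k."""
    n, k = len(log_emis), len(log_init)
    delta = [log_init[j] + log_emis[0][j] for j in range(k)]
    back = []                                   # back-pointers for t = 1..n-1
    for t in range(1, n):
        new_delta, ptr = [], []
        for j in range(k):
            best = max(range(k), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best)
            new_delta.append(delta[best] + log_trans[best][j] + log_emis[t][j])
        delta = new_delta
        back.append(ptr)
    # Trace back from the best final state
    state = max(range(k), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def log_normal_pdf(x, mean, sd):
    """Log-density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


# Hypothetical two-state model: state 0 centred on 0, state 1 centred on 5
means, sd = [0.0, 5.0], 1.0
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
observations = [0.1, -0.3, 0.2, 5.1, 4.8, 5.3, 0.0]
log_emis = [[log_normal_pdf(y, m, sd) for m in means] for y in observations]

path = viterbi(log_init, log_trans, log_emis)
print(path)  # [0, 0, 0, 1, 1, 1, 0]
```

The Viterbi algorithm returns the jointly most probable sequence, which may differ from the sequence obtained by selecting the most probable state at each time step separately.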
The hidden states in this model characterize the distribution of speeds and turning angles. In terms of trajectories, this implies that a hidden state characterizes a segment between two positions (10 s apart, in this case). States on the trajectory are thus represented on these segments. In this unsupervised classification approach, the labels assigned to hidden states (interpreted as behaviors)