Statistical Approaches for Hidden Variables in Ecology. Nathalie Peyrard
a uniform distribution over {1, . . . , J}.
– The transition distribution: in the case of a homogeneous Markov chain, the transition distribution is fully characterized by the matrix Π, of size J × J, each row of which is a probability vector.
– The emission distribution: the observation is taken to be a random variable whose distribution depends, via its parameters, on the activity. The nature of the distribution depends on the nature of the observations. Note that the observations are considered to be independent conditionally on Z.
This model is shown in graphical form in Figure 1.3.
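The three components listed above (initial, transition and emission distributions) are enough to simulate from the model. The following is a minimal sketch rather than the authors' implementation: the Gaussian emission, the numerical values and the names `simulate_hmm` and `gaussian_emission` are assumptions made purely for illustration.

```python
import random


def simulate_hmm(n_steps, initial, transition, emit, seed=0):
    """Simulate hidden states Z and observations Y from a homogeneous HMM.

    initial    : list of J probabilities (distribution of the first state)
    transition : J x J list of lists, each row a probability vector
    emit       : function (state, rng) -> observation (hypothetical sampler)
    """
    rng = random.Random(seed)
    states = list(range(len(initial)))
    z = rng.choices(states, weights=initial)[0]
    hidden, observed = [], []
    for _ in range(n_steps):
        hidden.append(z)
        # Given Z_t, the observation Y_t is drawn independently of the rest.
        observed.append(emit(z, rng))
        # The next state depends only on the current one (Markov property).
        z = rng.choices(states, weights=transition[z])[0]
    return hidden, observed


# Illustrative values only (assumed, not taken from the text).
J = 2
initial = [1.0 / J] * J                 # uniform over the J activities
transition = [[0.9, 0.1], [0.2, 0.8]]   # each row sums to 1
means, sds = [0.5, 3.0], [0.2, 1.0]


def gaussian_emission(state, rng):
    """Assumed emission: one Gaussian per activity."""
    return rng.gauss(means[state], sds[state])


hidden, observed = simulate_hmm(200, initial, transition, gaussian_emission)
```

Because each row of the transition matrix is used directly as sampling weights, a row that does not sum to 1 would be silently renormalized by `random.choices`; checking the rows beforehand is advisable.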
1.2.2.2. Choice of observation metric
This general framework offers many possibilities in terms of modeling. The underlying activity may influence different aspects of the trajectory. For example, the trajectory of an individual looking for food will include multiple changes in direction. On the other hand, when an individual is traveling, in the context of migration, for example, its trajectory tends to be relatively straight, with only minor changes in direction. In this example, changes in direction are strong activity markers.
Most of the metrics encountered in the existing literature are based on the speed and direction of the animal in question.
Figure 1.3. Graphical model. For a color version of this figure, see www.iste.co.uk/peyrard/ecology.zip
Starting from the positions {Pt}t≥0 (with values in ℝ², obtained at times 0, Δ, 2Δ, . . .), the process of speeds {Vt}t≥1 is defined by

$$V_t = \frac{P_t - P_{t-1}}{\Delta}, \qquad t \geq 1.$$
From these speeds, we can define the direction {ψt}t≥1 (with values in [−π, π[) as the angle between Vt and a reference vector (typically the vector (1, 0), pointing east). From these quantities, we deduce the step length (or scalar speed) process, denoted as {Lt}t≥1, and the turning angle process, denoted as {φt}t≥1 (with values in ]−π, π], using the convention φ1 = 0), as follows:

$$L_t = \lVert V_t \rVert, \qquad \varphi_t = \psi_t - \psi_{t-1} \pmod{2\pi}, \quad t \geq 2.$$
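The quantities defined above can be computed directly from a list of positions. The sketch below assumes regularly sampled planar positions, takes the turning angle to be the difference between successive directions wrapped to ]−π, π], and uses hypothetical function names (`wrap_angle`, `movement_metrics`).

```python
import math


def wrap_angle(a):
    """Wrap an angle (in radians) to the interval ]-pi, pi]."""
    a = math.fmod(a, 2.0 * math.pi)      # now strictly between -2*pi and 2*pi
    if a <= -math.pi:
        a += 2.0 * math.pi
    elif a > math.pi:
        a -= 2.0 * math.pi
    return a


def movement_metrics(positions, delta):
    """Return speeds V_t, step lengths L_t, directions psi_t and turning angles phi_t.

    positions : list of (x, y) tuples recorded at times 0, delta, 2*delta, ...
    delta     : sampling time step
    """
    speeds = [((x1 - x0) / delta, (y1 - y0) / delta)
              for (x0, y0), (x1, y1) in zip(positions, positions[1:])]
    step_lengths = [math.hypot(vx, vy) for vx, vy in speeds]
    # Direction: angle between V_t and the reference vector (1, 0).
    directions = [math.atan2(vy, vx) for vx, vy in speeds]
    # Convention phi_1 = 0, then successive differences of directions.
    turning = [0.0] + [wrap_angle(b - a)
                       for a, b in zip(directions, directions[1:])]
    return speeds, step_lengths, directions, turning[:len(directions)]


# Example: three unit steps along a square, with a left turn before steps 2 and 3.
path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
speeds, step_lengths, directions, turning = movement_metrics(path, delta=1.0)
```

In this example, the three step lengths equal 1 and the last two turning angles equal π/2, which matches the two left turns of the path.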
The step length and turning angle metrics were the first to be used in behavior, or activity, analysis based on HMMs (Morales et al. 2004) and have been widely used since (Patterson et al. 2008). In this way, we obtain the model illustrated in Figure 1.3, where Yt is a bivariate vector with coordinates (Lt, φt) (often considered to be independent). One drawback of this method is the need to define an emission distribution that is compatible with angles in order to model (φt). In practice, the von Mises (Jammalamadaka and Sengupta 2001) and wrapped Cauchy distributions are the most widely used.
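For reference, the two circular densities mentioned above have simple closed forms. The sketch below writes them with the standard library only; the parameter names (`mu`, `kappa`, `rho`) and the series-based helper `bessel_i0` are assumptions made for this illustration, and a dedicated statistics library would normally be preferred in practice.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    half = x / 2.0
    t, total = 1.0, 0.0
    for k in range(terms):
        total += t * t          # adds ((x/2)^k / k!)^2
        t *= half / (k + 1)
    return total


def von_mises_pdf(phi, mu, kappa):
    """von Mises density on the circle; mu is the mean direction, kappa >= 0 the concentration."""
    return math.exp(kappa * math.cos(phi - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def wrapped_cauchy_pdf(phi, mu, rho):
    """Wrapped Cauchy density on the circle; mu is the mean direction, 0 <= rho < 1."""
    return (1.0 - rho ** 2) / (
        2.0 * math.pi * (1.0 + rho ** 2 - 2.0 * rho * math.cos(phi - mu)))


# Both densities are 2*pi-periodic, so any angle convention can be used.
density_straight = von_mises_pdf(0.0, mu=0.0, kappa=4.0)   # peaked around "no turn"
density_flat = von_mises_pdf(0.0, mu=0.0, kappa=0.0)       # uniform on the circle
```

A large concentration (`kappa` or `rho`) centered on 0 describes nearly straight movement, while values close to 0 describe turning angles that are almost uniform, as expected for tortuous trajectories.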
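A different set of equivalent metrics may be used in order to avoid working with circular distributions, as proposed in Gurarie et al. (2009) and Gloaguen et al. (2015); these are persistence velocity and turning velocity, defined by

$$V^{p}_t = L_t \cos(\varphi_t), \qquad V^{r}_t = L_t \sin(\varphi_t).$$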
An observation Yt is thus a vector made up of these two components. As these components are signed, it is natural to model Yt using a bivariate normal distribution. Where relevant, this model allows the introduction of a dependency between the two movement components, something that is difficult to achieve when using the pair (Lt, φt).
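The equivalence between the two sets of metrics can be checked numerically. The sketch below assumes the usual definitions from Gurarie et al. (2009), namely a persistence component L cos(φ) and a turning component L sin(φ); the function names are hypothetical.

```python
import math


def persistence_turning(step_lengths, turning_angles):
    """Map (L_t, phi_t) to signed components (L_t*cos(phi_t), L_t*sin(phi_t))."""
    return [(l * math.cos(a), l * math.sin(a))
            for l, a in zip(step_lengths, turning_angles)]


def back_to_polar(components):
    """Inverse map: recover (L_t, phi_t) from the two signed components."""
    return [(math.hypot(vp, vr), math.atan2(vr, vp)) for vp, vr in components]


lengths = [2.0, 1.0, 0.5]
angles = [0.0, math.pi / 2, -3.0]
components = persistence_turning(lengths, angles)
recovered = back_to_polar(components)
```

The persistence component is positive when the animal keeps heading in roughly the same direction and negative when it doubles back, while the sign of the turning component distinguishes left turns from right turns. Both components take values on the whole real line, which is why a bivariate normal emission is a convenient choice.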
Ecological expertise concerning the effects of different activities on movement can also contribute to the choice of an appropriate metric. In the case study presented at the end of this chapter, the two classic metrics were used to illustrate the difference between the two approaches, in terms of both results and practical implementation.
Figure 1.4. Illustration of the quantities present in equations [1.5]–[1.8]. Pt denotes the successive positions occupied by the tracked individual. The series of speed vectors is denoted as (Vt), and (Lt) denotes the series of step lengths as defined by equation [1.5]. The series of directions is denoted as (ψt), while (φt) is the series of turning angles as defined by equation [1.6]. For a color version of this figure, see www.iste.co.uk/peyrard/ecology.zip
1.2.2.3. Covariates inclusion
A further question concerns the extent to which activity is influenced by covariates (distance from a point of interest, time of day, etc.). One way of including covariates is to model their impact on the transition between activities (Calenge et al. 2009; Morales et al. 2004; Michelot et al. 2016).
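For example, in the model presented here, ℙ(Zt = j | Zt−1 = i) is independent of t and takes the value Π(i, j). Let us suppose that at each time t, p covariates are measured and stored in a row vector xt. The transition probabilities can be linked to these variables using a multiclass logistic regression approach:

$$\mathbb{P}(Z_t = j \mid Z_{t-1} = i) = \frac{\exp\left(x_t\, \beta(i, j)\right)}{1 + \sum_{k \neq i} \exp\left(x_t\, \beta(i, k)\right)}, \qquad j \neq i,$$

$$\mathbb{P}(Z_t = i \mid Z_{t-1} = i) = 1 - \sum_{j \neq i} \mathbb{P}(Z_t = j \mid Z_{t-1} = i).$$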
The first equation indicates that the probability of switching from the current activity i to a different activity j is connected to external conditions via a linear combination of the covariates at time t. β(i, j) is the column vector (of dimension p) of the coefficients corresponding to the influence of each covariate on this probability. The second equation is a constraint that ensures that the vector (ℙ(Zt = 1 | Zt−1 = i), . . . , ℙ(Zt = J | Zt−1 = i)) is a probability vector.
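As an illustration, a covariate-dependent transition matrix can be built as follows. This is a sketch under a stated assumption: it uses the common multinomial logit parameterization in which staying in the current activity is the reference category (linear predictor fixed at 0), so that each row sums to 1 by construction. The function name and the coefficient values are hypothetical.

```python
import math


def transition_matrix(x_t, beta):
    """Transition matrix at time t given the covariate row vector x_t.

    x_t  : list of p covariate values
    beta : J x J nested list; beta[i][j] is a list of p coefficients for i != j
           (the diagonal entries are ignored: staying in i is the reference)
    """
    J = len(beta)
    matrix = []
    for i in range(J):
        # Linear predictor x_t . beta(i, j); fixed at 0 for the reference j = i.
        scores = [0.0 if j == i
                  else sum(x * b for x, b in zip(x_t, beta[i][j]))
                  for j in range(J)]
        top = max(scores)                        # subtracted for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


# Two activities and two covariates (illustrative values only).
beta = [[None, [1.0, 0.0]],
        [[-0.5, 0.2], None]]
x_t = [1.0, 5.0]
Pi_t = transition_matrix(x_t, beta)
```

With these values, the linear predictor for leaving activity 0 is 1, so the probability of switching from activity 0 to activity 1 is e / (1 + e), roughly 0.73; changing `x_t` changes the matrix, which is how the model becomes non-homogeneous in time.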
It is thus possible to take account of notions such as the fact that an individual will spend a longer period of time actively foraging in a location that is rich in food sources, while in a less favorable environment, it will rapidly switch to a traveling