Introduction to Spatiotemporal Kriging: A Powerful Tool for Space-Time Data Imputation

Introduction to Spatiotemporal Kriging: A Powerful Tool for Space-Time Data Imputation

In many real-world applications, data are collected across both space and time, think of rainfall measurements across a network of weather stations over several months, or air pollution levels recorded hourly at various urban locations. Analyzing such data requires methods that can account for both spatial and temporal dependencies. This is where spatiotemporal kriging comes in, a geostatistical technique designed to interpolate or predict missing values in datasets that vary across space and time.

In this article, I will guide you step by step through the process of performing spatio-temporal kriging for imputing missing values.

Consider a spatiotemporal process denoted as:

${Z (s, t) : (s, t) \in D \subseteq R^{d} \times R}$

Here, $\mathbf{s} \in \mathbb{R}^d$ , with $d = 2$ , represents the spatial location (typically latitude and longitude), and $t \in R denotes time.$

This formulation allows us to explore how a variable of interest changes both geographically and temporally.

To perform spatiotemporal kriging for missing data imputation, two main strategies are commonly employed:

Direct Method: A variogram is directly fitted to the original data, and kriging is applied to estimate the missing values. This method treats the raw observations as a realization of the underlying spatiotemporal process.
Residual-Based Method: First, a multiple linear regression model is fitted to explain the variation in the data using relevant covariates. Then, the variogram is constructed using the residuals from this model. Kriging is performed on these residuals, and the final predictions are obtained by combining the regression fit with the kriged residuals.

These approaches offer flexibility in modeling complex space-time patterns, especially in environmental and geophysical studies. In the sections that follow, we'll delve deeper into how spatiotemporal kriging works, how variograms are constructed, and how these methods perform in practical applications.

1. Directly Fitting the Variogram to the Original Data

In the context of space-time modeling, rainfall can be described as a spatiotemporal process:

$\{Z(\mathbf{s}, t): (\mathbf{s}, t) \in D \subseteq \mathbb{R}^d \times \mathbb{R}\}$

where $\mathbf{s} \in \mathbb{R}^d$ (with $d = 2$ ) represents the spatial dimensions, typically latitude and longitude, and $t \in R represents time.$

This formulation allows us to model how rainfall varies not just geographic locations, but also over time.

To capture the dependence structure of this spatiotemporal process, we use a spatiotemporal variogram. The covariance is defined as the expected squared difference between observations separated by spatial and temporal lags $(h_s, h_t)$

$Cov (h_{s}, h_{t}) = E [(Z (s, t) - Z (s + h_{s}, t + h_{t}))^{2}] (1)$

From this, the experimental semi-variogram is defined as half of the covariance:

$γ_{s t} (h_{s}, h_{t}) = \frac{1}{2} E [(Z (s, t) - Z (s + h_{s}, t + h_{t}))^{2}] (2)$

Here, the spatial lag $h_s = \mathbf{s} - \mathbf{s}'$ and temporal lag $h_t = t - t'$ represent the separation between any two space-time points $(\mathbf{s}, t)$ and $(\mathbf{s}', t')$ within the domain $D$ .

To estimate the semi-variogram empirically, we compute:

${\hat{γ}}_{s t} (r_{s}, r_{t}) = \frac{1}{2 ∣ L (r_{s}, r_{t}) ∣} \sum_{L (r_{s}, r_{t})} [Z (s + h_{s}, t + h_{t}) - Z (s, t)]^{2} (3)$

where $L(r_s, r_t)$ is the set of all point pairs that are approximately $r_s$ spatial units and $r_t$ temporal units apart, and $|L(r_s, r_t)|$ is the number of such pairs.

Once the empirical variogram is estimated, the next step is to fit a theoretical variogram model. There are several options available for selecting a theoretical variogram model depending on the characteristics of the spatial or spatiotemporal data. Some commonly used models include:

1. Separable Model

This model assumes that spatial and temporal dependencies can be modeled separately and then combined multiplicatively.

Covariance Function:

$C_{sep} (h, u) = C_{s} (h) \cdot C_{t} (u)$

Variogram:

$γ_{sep} (h, u) = sill \cdot ({\overset{ˉ}{γ}}_{s} (h) + {\overset{ˉ}{γ}}_{t} (u) - {\overset{ˉ}{γ}}_{s} (h) {\overset{ˉ}{γ}}_{t} (u))$

2. Product-Sum Model

Allows for interaction between space and time, combining spatial, temporal, and joint components.

Covariance Function:

$C_{\text{ps}}(h, u) = k \cdot C_s(h)C_t(u) + C_s(h) + C_t(u)$

Variogram:

$γ_{ps} (h, u) = (k \cdot {sill}_{t} + 1) γ_{s} (h) + (k \cdot {sill}_{s} + 1) γ_{t} (u) - k γ_{s} (h) γ_{t} (u)$

3. Metric Model

Treats space and time on equal footing using a metric distance, after scaling time to spatial units.

Metric Distance:

$d = \sqrt{h^{2} + (κ u)^{2}}$

Variogram:

$γ_{m} (h, u) = γ_{joint} (\sqrt{h^{2} + (κ u)^{2}})$

4. Sum-Metric Model

Combines individual spatial, temporal, and joint components additively.

Covariance Function:

$C_{\text{sm}}(h, u) = C_s(h) + C_t(u) + C_{\text{joint}}\left( \sqrt{h^2 + (\kappa u)^2} \right)$

Variogram:

$γ_{sm} (h, u) = γ_{s} (h) + γ_{t} (u) + γ_{joint} (\sqrt{h^{2} + (κ u)^{2}})$

5. Simple Sum-Metric Model

A restricted version of the sum-metric model with a single spatio-temporal nugget and nugget-free spatial, temporal, and joint components.

Variogram:

$γ_{ssm} (h, u) = nug \cdot 1_{h > 0 \lor u > 0} + γ_{s} (h) + γ_{t} (u) + γ_{joint} (\sqrt{h^{2} + (κ u)^{2}})$

In this article, I use the metric model to explain the procedure further, which assumes that the spatial and temporal dependencies can be combined using a metric distance:

$γ_{metric} (h_{s}, h_{t}) = γ_{joint} (\sqrt{h_{s}^{2} + (k \cdot h_{t})^{2}}) (4)$

Here, $k$ is a scaling parameter that adjusts the influence of temporal lag relative to spatial lag, and $\gamma_{\text{joint}}$ is a suitable isotropic variogram model (such as spherical, exponential, or Gaussian).

Once the model parameters (including $k$ ) are estimated by minimizing the mean squared error (MSE) between the empirical and fitted variograms, we can proceed with spatiotemporal kriging to predict missing values.

The kriging predictor for an unobserved space-time point $(\mathbf{s}, t)$ is a weighted sum of the known data values:

$\hat{Z} (s, t) = \sum_{i = 1}^{n} λ_{i} Z (s_{i}, t_{i}) (5)$

where $\lambda_i$ are the kriging weights, determined by solving the following system of equations derived using Lagrange multipliers:

$\sum_{i=1}^{n} \lambda_i \gamma\left[(\mathbf{s}_i, t_i) - (\mathbf{s}_j, t_j)\right] + \mu = \gamma\left[(\mathbf{s}_j, t_j) - (\mathbf{s}_0, t_0)\right] \quad \text{(11)}$ $\sum_{i=1}^{n} \lambda_i = 1 \quad \text{(7)}$

Here, $(\mathbf{s}_0, t_0)$ is the target point for prediction, $\mu$ is the Lagrange multiplier ensuring unbiasedness, and $n$ is the number of neighboring points used for kriging. Solving this system provides the optimal weights $\lambda_i$ that minimize prediction error.

Fitting the Variogram to Residuals from the Multiple Linear Regression Model

The spatio-temporal variation $Z(\mathbf{s}, t)$ , as introduced above, is modeled by decomposing it into a deterministic component $m(\mathbf{s}, t)$ and a stochastic residual component $\varepsilon(\mathbf{s}, t)$ , as expressed in Equation (8):

$\begin{matrix} Z (s, t) = m (s, t) + ε (s, t) & (8) \end{matrix}$

This formulation assumes that the process $Z(\mathbf{s}, t)$ possesses both first- and second-order moments. The deterministic component $m(\mathbf{s}, t) = \mathbb{E}[Z(\mathbf{s}, t)]$ represents the expected value of the process and is estimated using a multiple linear regression model based on ordinary least squares (OLS).

The residual term $\varepsilon(\mathbf{s}, t)$ captures the unexplained variability in the data and is assumed to consist of three mutually independent and second-order stationary components:

A spatial component,
A temporal component, and
A spatio-temporal interaction component.

It is also assumed that these components are spatially isotropic.

The deterministic part of the model, representing the trend, is fitted using the following OLS formulation:

$\begin{matrix} \hat{Z} (s, t) = \sum_{k = 0}^{p} {\hat{β}}_{k} X_{k} (s, t) & (9) \end{matrix}$

where $X_k(\mathbf{s}, t)$ are the covariates (predictors), and $\hat{\beta}_k$ are the estimated regression coefficients. This modeling approach incorporates spatial, temporal, and spatio-temporal variability through the selected covariates.

While the regression model can effectively capture broad trends in the data, it may not fully account for all the underlying spatial and temporal complexities. The residuals, therefore, may still exhibit structured spatio-temporal dependencies.

To capture these dependencies, a spatio-temporal variogram is estimated from the regression residuals. This step allows the characterization of remaining correlation structures not explained by the deterministic model. The estimated variogram of the residuals is then used in a kriging framework to interpolate the residual field, following the same procedure outlined in the section ''Directly Fitting the Variogram to the Original Data''.

This two-stage modeling approach, trend estimation via regression followed by kriging of residuals, enables more accurate and nuanced spatial and temporal predictions by combining deterministic and stochastic modeling strategies.

Interpretation of the Sample Variogram and Variogram Parameters

The Below figure presents the sample variogram, which is computed based on Equation (2). This empirical variogram illustrates how spatial and temporal correlations in the data decay with increasing separation in space and time.

On the variogram plot, the axes represent separation distances; typically, the horizontal axes correspond to spatial distances (in kilometers) and temporal lags (in units of time). According to spatial theory, the greater the distance between pairs of points, the lower the correlation between their values. The same principle applies in the temporal dimension: as the time gap between observations increases, the temporal correlation decreases.

The color gradient in the sample variogram plot indicates the magnitude of the semi-variance, which measures dissimilarity between point pairs. Cooler colors (such as blue) indicate lower semi-variance, meaning that those points are highly correlated in both space and time. Conversely, warmer colors (such as red) suggest higher dissimilarity, indicating lower correlation.

After constructing the sample variogram, the next step is to fit a theoretical variogram model that best captures the structure observed in the empirical variogram. The figure below illustrates such a model and highlights several key parameters that need to be estimated. These parameters play an essential role in defining the shape and behavior of the variogram:

The Figure above shows an experimental variogram with a variogram model fitted to it. Each red square is a lag of the experimental variogram. The x-axis represents the distance between pairs of points, and the y-axis represents the calculated value of the variogram, where a greater value indicates less correlation between pairs of points.

Nugget: The nugget is the y-intercept of the variogram. In practical terms, the nugget represents the small-scale variability of the data. A portion of that short-range variability can be the result of measurement error.

Sill: The sill is the total variance contribution, or the maximum variability between pairs of points.

Range: The range is the distance after which the variogram levels off. The physical meaning of the range is that pairs of points that are this distance (in time or in distance) are not spatially and temporally correlated.

Anisotropy: This parameter quantifies how many spatial units are equivalent to one time unit. It essentially allows you to convert time differences into "effective" spatial distances.

Spatio-temporal kriging offers a powerful and flexible framework for imputing missing values in datasets that vary across both space and time. By capturing the underlying spatial and temporal dependencies, through careful modeling of variograms and trend components, it enables more accurate, informed, and reliable predictions at unobserved locations and times. Whether you're working with environmental data, health metrics, or climate observations, incorporating spatio-temporal structure into your imputation process can significantly enhance the quality of your analysis. As sensors and monitoring systems continue to generate increasingly rich datasets, mastering techniques like spatio-temporal kriging will become essential for any data scientist or researcher dealing with incomplete spatio-temporal records.

The Stat Path

Search This Blog