1 Introduction

The ultimate goal of causal inference is understanding relationships between random variables and predicting the outcomes resulting from their modification or manipulation (Peters et al. 2017). Causal inference finds utility across a wide array of scientific domains. In medicine, for instance, it aids in comprehending the propagation of epileptic seizures across distinct brain regions (Imbens and Rubin 2015). In climate science, it facilitates the prediction of variables such as temperature and rainfall (Naveau et al. 2020) while unraveling the causes behind sudden changes in river discharges (Mhalla et al. 2020). Extensive efforts have been invested in establishing its mathematical foundation (Pearl 2009). Structural causal models (SCMs) serve as mathematical tools for representing causal relationships between variables, especially in scenarios where temporal order is unavailable. Recent advancements include estimating causal mechanisms within structural causal models (Peters et al. 2014; Mooij et al. 2016; Glymour et al. 2019). However, causal discovery in time series often necessitates alternative models (Peters et al. 2017, Chapter 10) and faces different problems (Runge et al. 2019a).

A commonly used concept for describing causality in time series is Granger causality (Granger 1969, 1980). Though various definitions of causality, such as Sims causality (Sims 1972), structural causality (White and Lu 2010), and interventional causality (Eichler and Didelez 2010), coexist, Granger causality holds a prominent position in the literature (Berzuini et al. 2012, Chapter 22.2). It rests upon two key principles: the precedence of cause over effect and the unique information contained in the cause that is otherwise unattainable. In this paper, we primarily consider strong Granger causality, which employs conditional independence (see Section 1.3). Variations include linear Granger causality (Hosoya 1977) and Granger causality in mean (Ho 2015). In the autoregressive models considered in this paper, all previously mentioned notions are closely related, and their differences are typically not of practical relevance (Berzuini et al. 2012, Chapter 22.8).

Several approaches exist for learning a causal structure in time series. State-of-the-art methods for causal discovery in time series use consecutive conditional independence testing (Zhang and Zhang 2008) or fit vector autoregressive (VAR) models (Eichler 2012). Runge et al. (2019b) introduce the PCMCI method, which combines the PC method (named after its inventors, Peter Spirtes and Clark Glymour; Spirtes et al. (1993)) and MCI (momentary conditional independence; Pompe and Runge (2011)) to identify and adjust for many potential confounders. However, there is still a need for methods more robust to a variety of hidden confounders. Gerhardus and Runge (2021) introduce the LPCMCI algorithm, which is adapted to handle latent confounders. Another approach leverages Shannon’s information theory, utilizing entropy and mutual information to probe causal aspects of dynamic and complex systems (Hlaváčková-Schindler et al. 2007).

Typically, causal inference methods in time series describe causality in the body of the distribution (causality in the mean). Several articles deal with second-order causality (causality in variance; Gemici and Polat (2021)) using, e.g., GARCH (generalized autoregressive conditional heteroskedastic) modeling (Hafner and Herwartz 2008). However, causality in extremes is a new field of research. A different perspective emerges when one focuses on the tails of the distributions (Coles 2001). Many causal mechanisms are present only during extreme events, and interventions often carry information that is likely to be causal (Cox and Wermuth 1996). While the connection between extreme value theory and time series has been thoroughly studied (Yang and Hongzhi 2005; Mikosch and Wintenberger 2015; Kulik and Soulier 2020), causal inference at this juncture remains unexplored.

Recent work intertwines extreme value theory and causality in SCM. Engelke and Hitz (2020) propose graphical models in the context of extremes. Kiriliouk and Naveau (2020) study probabilities of necessary and sufficient causation as defined in the counterfactual theory using the multivariate generalized Pareto distributions. Deuber et al. (2021) develop a method for estimating extremal quantiles of treatment effects. Further approaches involve recursive max-linear models on directed acyclic graphs (Gissibl and Klüppelberg 2018; Klüppelberg and Krali 2021).

Our paper aims to establish a framework of causality in extremes of time series. This work builds on the work of Gnecco et al. (2021), who first introduced the concept of causal tail coefficient in the context of SCM, followed by Pasche et al. (2022), who extended this coefficient by incorporating possible covariates into the model. We aim to move the theory of causality in extremes from SCMs into a context of Granger-type causality in time series.

The paper is organized as follows. The subsequent section provides a concise overview of the causal tail coefficient’s existing developments, presents a motivating time series example, and introduces preliminary results and notation. Section 2 contains the main results and provides an illustrative model example. Section 3 extends the proposed method and discusses outcomes under assumption violations. The section also addresses scenarios involving hidden confounders and a choice of the minimal time delay. Section 4 addresses the estimation problem, examining estimator properties and employing simulations on synthetic datasets. In Section 5, we apply the method to two real-world datasets. First, we consider an application concerning space weather and geomagnetic storms, corroborating prior findings that employ conditional mutual information. Lastly, we employ our methodology on a hydrometeorological dataset, investigating six distinct weather and climate phenomena. To maintain brevity, proofs are relocated to the Appendix. Appendix A introduces auxiliary propositions used in the proofs, while the proofs themselves can be found in Appendix B.

1.1 SCM and existing work on the causal tail coefficient

Gnecco et al. (2021) established the groundwork for the causal tail coefficient within linear SCMs (Pearl 2009). A linear SCM over real random variables \(X_1,\dots ,X_p\) is a collection of p assignments

$$\begin{aligned} X_j = \sum _{k\in pa(j)}\beta _{jk}X_k +\varepsilon _j, \quad j\in V, \end{aligned}$$
(1)

where \(pa(j)\subseteq V=\{1, \dots , p\}\) denotes the set of parents of j, \(\varepsilon _1, \dots , \varepsilon _p\) are jointly independent random variables and \(\beta _{jk}\in \mathbb {R}\setminus \{0\}\) are the causal weights. We suppose that the associated graph \(\mathcal {G}=(V,E)\), with the directed edge \((i,j)\) present if and only if \(i\in pa(j)\), is a directed acyclic graph (DAG) with nodes V and edges E. We say that \(X_i\) causes \(X_j\) if a directed path exists from i to j in \(\mathcal {G}\). Structural causal models describe not only observational distributions but also distributions under interventions (or manipulations) on the variables. By intervening on \(X_j\), we understand a new SCM with p assignments, all of them identical with (1) except the assignment for \(X_j\) (Pearl 2009).

Let \(X_i, X_j\) be a pair of random variables from linear SCM. We assume that \(X_i, X_j\) are heavy-tailed with respective distributions \(F_i, F_j\). The causal (upper) tail coefficient of \(X_i\) on \(X_j\) is defined in Gnecco et al. (2021) as

$$\Gamma _{i,j}:= \lim _{u\rightarrow 1^-}\mathbb {E}[ F_j(X_j)\mid F_i(X_i)>u ],$$

if the limit exists. This coefficient lies between zero and one and captures the causal influence of \(X_i\) on \(X_j\) in the upper tail since, intuitively, if \(X_i\) has a monotonically increasing causal influence on \(X_j\), we expect \(\Gamma _{i,j}\) to be close to unity. The coefficient is asymmetric, as extremes of \(X_j\) need not lead to extremes of \(X_i\), and in that case, \(\Gamma _{j,i}\) will be appreciably smaller than \(\Gamma _{i,j}\). In Section 1.2, we discuss the intuition of the choice of this coefficient in more detail (in the context of time series).

Under certain assumptions on the tails of \(\varepsilon _i\) and the linear SCM, the values of \(\Gamma _{i,j}\) and \(\Gamma _{j,i}\) allow us to discover the causal relationship between \(X_i\) and \(X_j\). These relations are summarized in Table 1.

Table 1 Causal relationship between a pair of random variables \(X_i, X_j\) following appropriately restricted linear SCM under different values of the causal tail coefficients. Blank entries and cases when \(\Gamma _{i,j} < 0.5\) or \(\Gamma _{j,i}<0.5\) cannot occur (Gnecco et al. 2021, Theorem 1)

Gnecco et al. (2021) proposed a consistent non-parametric estimator of \(\Gamma _{i,j}\). Without loss of generality, consider \(i = 1\) and \(j = 2\). If \((x_{1,1},x_{1,2}), \dots ,(x_{n,1},x_{n,2})\) are \(n\in \mathbb {N}\) independent replicates of \((X_1, X_2)\), the estimator of \(\Gamma _{1,2}\) is defined as

$$\begin{aligned} \hat{\Gamma }_{1,2}:=\frac{1}{k}\sum _{i=1}^n\hat{F}_2(x_{i,2})\mathbb {1}(x_{i,1}>x_{(n-k+1),1}), \end{aligned}$$
(2)

where \(x_{(n-k+1),1}\) is the k-th largest value of \(x_{1,1}, \dots , x_{n,1}\), \(\hat{F}_2(t)=\frac{1}{n}\sum _{i=1}^n \mathbb {1}(x_{i,2}\le t)\), and \(\mathbb {1}(\cdot )\) is the indicator function.
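The estimator (2) can be sketched in a few lines of code (a hypothetical implementation, assuming numpy; the function name `gamma_hat` is ours, not from the paper):

```python
import numpy as np

def gamma_hat(x1, x2, k):
    """Sketch of the non-parametric estimator (2) of Gamma_{1,2}.

    x1, x2 : 1-d arrays of n paired observations of (X_1, X_2)
    k      : number of upper order statistics of x1 used
    """
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n = x1.size
    # empirical CDF of X_2 at each observation: F2_hat(x_{i,2}) = rank_i / n
    f2 = (np.argsort(np.argsort(x2)) + 1) / n
    # x_{(n-k+1),1}: the k-th largest value of x1
    threshold = np.sort(x1)[n - k]
    # average F2_hat over observations whose x1-coordinate is extreme
    return float(f2[x1 > threshold].sum() / k)
```

On comonotone data the estimate is close to one, and on anti-monotone data close to zero, mirroring the intuition behind \(\Gamma _{i,j}\) described above.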

Pasche et al. (2022) adapted the estimation process to account for potential confounding factors. The authors modified (2) by replacing \(\hat{F}_2(x)\) by a covariate-dependent estimator, where the upper tail of \(\hat{F}_2(x)\) is modeled by a Pareto approximation (Coles 2001). Moreover, Pasche et al. (2022) implemented a permutation test to formally test the hypothesis that the causal tail coefficient is equal to 1.

1.2 Motivating example and the main idea in the context of time series

The following example illustrates a typical case considered in this paper. Let \((\textbf{X},\textbf{Y})^\top = ((X_t,Y_t)^\top , t\in \mathbb {Z})\) be a bivariate strictly stationary time series described by the following recurrent relations

$$\begin{aligned} X_t&=\frac{1}{2}X_{t-1}+\varepsilon _t^X,\\ Y_t&=\frac{1}{2}Y_{t-1}+ \sqrt{X_{t-5}} + \varepsilon _t^Y, \end{aligned}$$

where \(\varepsilon _t^X, \varepsilon _t^Y\overset{iid}{\sim }\) Pareto(1, 1).

Fig. 1 A sample realisation of \((\textbf{X},\textbf{Y})^\top\) from Section 1.2 (\(\textbf{X}\) is the cause and \(\textbf{Y}\) is the effect). The delay represents the time lag between the two time series, in this case equal to 5

This scenario is depicted in Fig. 1. Here, \(\textbf{X}\) causes \(\textbf{Y}\) (in Granger sense, see Definition 3 given later), simply because the knowledge of \(\textbf{X}\) improves the prediction of \(\textbf{Y}\). However, the converse is not true.

Consider the data shown in Fig. 1. Our goal is to identify any causal relationship between these time series. There is (at least in this realization) an evident asymmetry between the two time series in the extremes. If the cause (\(\textbf{X}\)) is extremely large, then the effect (\(\textbf{Y}\)) will also be extremely large (see the second and third “jump”). However, if \(\textbf{Y}\) is extremely large, then \(\textbf{X}\) will not necessarily be extremely large (as evident in the first “jump”). This indicates that an extreme value of \(\textbf{X}\) tends to trigger an extreme value of \(\textbf{Y}\), implying a causal link in an intuitive sense. Yet, a crucial factor is the presence of a time delay (or time lag). The extremes do not have to be concurrent; there is a time lag during which the influence of \(\textbf{X}\) on \(\textbf{Y}\) becomes apparent. In this specific example, this lag is exactly 5 time units.
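A realisation like the one in Fig. 1 can be generated directly from the recursions in this section (a minimal sketch, assuming numpy; the function name and the burn-in length are our own choices, the latter an arbitrary proxy for stationarity):

```python
import numpy as np

def simulate_example(n, burn=1000, seed=0):
    """Simulate X_t = X_{t-1}/2 + eps_t^X and
    Y_t = Y_{t-1}/2 + sqrt(X_{t-5}) + eps_t^Y with iid Pareto(1,1) noise,
    discarding the first `burn` observations."""
    rng = np.random.default_rng(seed)
    m = n + burn
    eps_x = rng.pareto(1.0, m) + 1.0   # Pareto(1,1): P(eps > x) = 1/x, x >= 1
    eps_y = rng.pareto(1.0, m) + 1.0
    x, y = np.zeros(m), np.zeros(m)
    for t in range(m):
        # at t = 0, x[t-1] wraps to the still-zero end of the array
        x[t] = 0.5 * x[t - 1] + eps_x[t]
        y[t] = 0.5 * y[t - 1] + np.sqrt(x[t - 5]) + eps_y[t] if t >= 5 else eps_y[t]
    return x[burn:], y[burn:]
```

Plotting such a sample reproduces the asymmetry discussed above: peaks of \(\textbf{X}\) reappear in \(\textbf{Y}\) five steps later, while peaks of \(\textbf{Y}\) driven by \(\varepsilon ^Y\) do not propagate back to \(\textbf{X}\).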

To encapsulate this idea mathematically, we introduce the causal tail coefficient for time series

$$\begin{aligned} \Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p):=\lim _{u\rightarrow 1^-}\mathbb {E}[\max \{F_Y(Y_0), \dots , F_Y(Y_{p})\}\mid F_X(X_0)>u], \end{aligned}$$
(3)

where \(F_X, F_Y\) are the marginal distribution functions of \(\textbf{X}, \textbf{Y}\), respectively. This coefficient mathematically expresses natural questions: Does an extreme value in \(\textbf{X}\) invariably lead to an extreme value in \(\textbf{Y}\)? How large will \(\textbf{Y}\) be in the next p steps if \(\textbf{X}\) is extremely large (in their respective scales)? In our example, we can consider \(p=5\). If \(X_0\) is extremely large, then \(Y_5\) will surely also be extremely large (large cause implies large effect), but not the other way around (large effect does not imply large cause). Hence intuitively, the following should hold: \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)=1\), but \(\Gamma ^{time}_{\textbf{Y}\rightarrow \textbf{X}}(p)<1\). The main part of the paper consists of determining the assumptions under which this is true.
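A naive finite-sample analogue of (3) replaces \(F_X, F_Y\) by empirical distribution functions and the limit \(u\rightarrow 1^-\) by a high quantile (a hypothetical sketch assuming numpy, not the estimator studied later in the paper):

```python
import numpy as np

def gamma_time_hat(x, y, p, q_level=0.99):
    """Naive empirical analogue of Gamma^time_{X->Y}(p): the average of
    max{F_Y(Y_t), ..., F_Y(Y_{t+p})} over times t at which X_t exceeds
    its q_level empirical quantile."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    f_y = (np.argsort(np.argsort(y)) + 1) / n        # empirical CDF of Y at Y_t
    # running maximum of F_Y over the windows [t, t + p]
    window_max = np.array([f_y[t:t + p + 1].max() for t in range(n - p)])
    extreme = x[:n - p] > np.quantile(x, q_level)
    return float(window_max[extreme].mean())
```

For a pair where \(\textbf{Y}\) is simply \(\textbf{X}\) lagged by five steps, the estimate in the causal direction is close to one, in line with the intuition above.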

1.3 Preliminaries and notation

We use bold capital letters to represent notation for time series and random vectors. A time series, or a random process, consists of a set of random variables \(\textbf{Z} = (Z_t, t \in \mathbb {Z})\) defined on the same probability space. We exclusively work with time series defined over integers.

We will use the standard notion of regular variation (Resnick 1987; Embrechts et al. 1997). A real random variable X is regularly varying with tail index \(\theta >0\), if its distribution function has a form \(F_X(x)=1-x^{-\theta } L(x)\) for some slowly varying function L, i.e., a function satisfying \(\lim _{x\rightarrow \infty }\frac{L(\alpha x)}{L(x)} = 1\) for every \(\alpha > 0\) (Kulik and Soulier 2020, Section 1.3). This property is denoted by \(X\sim \textrm{RV}(\theta )\). Examples of regularly varying distributions include the Pareto, Cauchy, and Fréchet distributions, to name a few (Mikosch 1999). For real functions \(f, g\), we denote \(f(x)\sim g(x) \iff \lim _{x\rightarrow \infty }\frac{f(x)}{g(x)}=1\).

The main principle that we aim to use is the so-called max-sum equivalence, that is, for two random variables \(X, Y\) we have \(\mathbb {P}(X+Y>x)\sim \mathbb {P}(X>x)+\mathbb {P}(Y>x)\sim \mathbb {P}(\max (X,Y)>x)\) as \(x\rightarrow \infty\). This is satisfied when \(X,Y\overset{iid}{\sim }\ \textrm{RV}(\theta )\) (Embrechts et al. 1997, Section 1.3.1). Similar results hold even if we deal with finite or infinite sums of random variables (Resnick 1987, Section 4.5).
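The max-sum equivalence can be illustrated numerically (a quick Monte-Carlo sketch, assuming numpy; the threshold \(u=200\) and the sample size are arbitrary choices):

```python
import numpy as np

# Monte-Carlo illustration of max-sum equivalence for iid Pareto(1,1) variables:
# for large u, P(X+Y > u), P(X > u) + P(Y > u) and P(max(X,Y) > u) nearly agree.
rng = np.random.default_rng(1)
n, u = 2_000_000, 200.0
x = rng.pareto(1.0, n) + 1.0   # classical Pareto(1,1), support [1, infinity)
y = rng.pareto(1.0, n) + 1.0

p_sum = np.mean(x + y > u)                  # P(X + Y > u)
p_marg = np.mean(x > u) + np.mean(y > u)    # P(X > u) + P(Y > u)
p_max = np.mean(np.maximum(x, y) > u)       # P(max(X, Y) > u)
```

All three estimated probabilities agree up to a few percent at this threshold, and the agreement improves as u grows.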

Consider a bivariate process \((\textbf{X},\textbf{Y})^\top =( (X_t, Y_t)^\top , t\in \mathbb {Z})\). The concept of strong Granger causality (Berzuini et al. 2012, Chapter 22.2) can be expressed as follows: The process \(\textbf{X}\) is said to cause \(\textbf{Y}\) (denoted as \(\textbf{X}\rightarrow \textbf{Y}\)) if \(Y_{t+1}\) is not independent of the past of \(\textbf{X}\) given all relevant variables in the universe up to time t except the past values of \(\textbf{X}\); that is,

$$Y_{t+1} \not \!\perp \!\!\!\perp \textbf{X}_{past(t)}\mid \mathcal {C}_{t}\setminus \textbf{X}_{past(t)},$$

where \(past(t) = (t, t-1, t-2, \dots )\) and \(\mathcal {C}_t\) represents all relevant variables in the universe up to time t. The philosophical notion of \(\mathcal {C}_{t}\) is typically replaced by only a finite set of relevant variables. To illustrate with an example, given a three-dimensional process \((\textbf{X},\textbf{Y},\textbf{Z})^\top =( (X_t, Y_t, Z_t)^\top , t\in \mathbb {Z})\), we replace the information set \(\mathcal {C}_t\) by \((\textbf{X}_{past(t)}, \textbf{Y}_{past(t)}, \textbf{Z}_{past(t)})^\top\) and say that the process \(\textbf{X}\) causes \(\textbf{Y}\) with respect to \((\textbf{X},\textbf{Y},\textbf{Z})^\top\) if \(Y_{t+1} \not \!\perp \!\!\!\perp \textbf{X}_{past(t)}\mid (\textbf{Y}_{past(t)}, \textbf{Z}_{past(t)})\). Note that such an \(\textbf{X}\) has to be seen only as a potential cause, since enlarging the information set can lead to a change in the causal structure.

In certain models, the definition of causality can be simplified (Palachy 2019). For instance, Sims (1972) demonstrated that the Granger causality definition is equivalent to certain parameter restrictions within a linear framework. We provide the formal definition in Section 2.1 for a specific class of autoregressive models.

2 The causal tail coefficient for time series

The central concept introduced in this paper is the causal tail coefficient for time series \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) with extremal delay \(p\in \mathbb {N}\) defined in (3). In scenarios where we exclude instantaneous effects (such as when \(X_0\) directly causes \(Y_0\)), we can directly employ the following coefficient:

$$\begin{aligned} \Gamma ^{time, -0}_{\textbf{X}\rightarrow \textbf{Y}}(p):=\lim _{u\rightarrow 1^-}\mathbb {E}[\max \{F_Y(Y_1), \dots , F_Y(Y_{p})\}\mid F_X(X_0)>u], \end{aligned}$$

where, as usual, \(F_X, F_Y\) are the marginal distribution functions of \(\textbf{X}, \textbf{Y}\), respectively.

Notice that \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\in [0,1]\), and \(\Gamma ^{time,-0}_{\textbf{X}\rightarrow \textbf{Y}}(p)\le \Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\le \Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p+1)\). Furthermore, for any increasing functions \(h_1\) and \(h_2:\mathbb {R}\rightarrow \mathbb {R}\), we observe \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p) = \Gamma ^{time}_{h_1(\textbf{X})\rightarrow h_2(\textbf{Y})}(p)\), where \(h_1(\textbf{X})=(h_1(X_t), t\in \mathbb {Z})\), as \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) depends solely on rescaled margins \(F_X(X_i)\) and \(F_Y(Y_i)\).

2.1 Models

In this paper, we work with two time series models: the \(\textrm{VAR}(q)\) process (vector autoregressive process of order q; Lütkepohl 2005, Section 2.3.1) and the \(\textrm{NAAR}(q)\) model (nonlinear additive autoregressive model of order q). We now introduce the notation.

Definition 1

We say that \((\textbf{X},\textbf{Y})^\top =((X_t,Y_t)^\top , t\in \mathbb {Z})\) follows the bivariate \(\textrm{VAR}(q)\) model, if it has the following representation:

$$\begin{aligned} X_t&= \alpha _{1}X_{t-1}+\dots + \alpha _{q}X_{t-q}+ \gamma _{1}Y_{t-1}+\dots +\gamma _{q}Y_{t-q} + \varepsilon _t^X,\\ Y_t&=\beta _{1}Y_{t-1}+\dots + \beta _{q}Y_{t-q} + \delta _{1}X_{t-1}+\dots +\delta _{q}X_{t-q} + \varepsilon _t^Y, \end{aligned}$$

where \(\alpha _i, \beta _i, \gamma _i, \delta _i\in \mathbb {R}\), \(i=1,\dots ,q\), are real constants, and \((\varepsilon _t^X, t\in \mathbb {Z})\), \((\varepsilon _t^Y, t\in \mathbb {Z})\) are white noises. We say that the model satisfies the stability condition if \(\det (I_{d}-A_1z-\dots -A_qz^q)\ne 0\) for all \(|z|\le 1\), where \(I_d\) denotes the d-dimensional identity matrix (here \(d=2\)) and \(A_i = \begin{pmatrix} \alpha _i & \gamma _i \\ \delta _i & \beta _i \end{pmatrix}\).

We say that \(\textbf{X}\) (Granger) causes \(\textbf{Y}\) (notation \(\textbf{X}\rightarrow \textbf{Y}\)) if there exists \(i\in \{1, \dots , q\}\) such that \(\delta _i\ne 0\).

Let \(i,j\in \mathbb {Z}\) with \(1\le j-i\le q\). We can specify that \(X_i\) causes \(Y_j\) if \(\delta _{j-i}\ne 0\). Note that \(\textbf{X}\) causes \(\textbf{Y}\) if and only if there exist \(i< j\) such that \(X_i\) causes \(Y_j\). In this paper, our focus is on exploring whether \(\textbf{X}\) causes \(\textbf{Y}\). It is also important to highlight that a situation where \(\textbf{X}\) causes \(\textbf{Y}\) and vice versa simultaneously is admissible.

Under the stability assumption, we can rewrite these time series using causal representations (Lütkepohl 2005, Section 2.1.3):

$$\begin{aligned} X_t=\sum _{i=0}^\infty a_i\varepsilon ^X_{t-i}+\sum _{i=0}^\infty c_i\varepsilon ^Y_{t-i};\qquad Y_t=\sum _{i=0}^\infty b_i\varepsilon ^Y_{t-i}+\sum _{i=0}^\infty d_i\varepsilon ^X_{t-i}, \end{aligned}$$
(4)

with suitable constants \(a_i, b_i, c_i, d_i\in \mathbb {R}\). Thus, \(\textbf{X}\) causes \(\textbf{Y}\) if and only if there exists \(i: d_i\ne 0\) (Kuersteiner 2010).
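Both the stability check and the coefficients \(a_i, b_i, c_i, d_i\) of (4) can be computed numerically. Writing the model as \((X_t,Y_t)^\top = \sum _{i=1}^q A_i (X_{t-i},Y_{t-i})^\top + (\varepsilon _t^X,\varepsilon _t^Y)^\top\), the coefficient matrices \(\Phi _s\) of the causal representation satisfy \(\Phi _0 = I\) and \(\Phi _s=\sum _{i=1}^{\min (s,q)}A_i\Phi _{s-i}\), the standard VAR machinery from Lütkepohl (2005). A sketch, assuming numpy (function names are ours):

```python
import numpy as np

def is_stable(A_list):
    """VAR(q) stability: all eigenvalues of the companion matrix
    [[A_1 ... A_q], [I 0 ... 0]] lie strictly inside the unit circle,
    equivalent to det(I - A_1 z - ... - A_q z^q) != 0 for |z| <= 1."""
    q, d = len(A_list), A_list[0].shape[0]
    top = np.hstack(A_list)
    bottom = np.eye(d * (q - 1), d * q)   # [I | 0] block; empty when q = 1
    companion = np.vstack([top, bottom])
    return bool(np.all(np.abs(np.linalg.eigvals(companion)) < 1))

def ma_coefficients(A_list, n_terms):
    """First n_terms matrices Phi_s of the causal (MA-infinity) representation:
    Phi_0 = I, Phi_s = sum_{i=1}^{min(s,q)} A_i Phi_{s-i}.  Entry (1,1) of
    Phi_s is a_s, (1,2) is c_s, (2,1) is d_s and (2,2) is b_s in (4)."""
    q, d = len(A_list), A_list[0].shape[0]
    phis = [np.eye(d)]
    for s in range(1, n_terms):
        phis.append(sum(A_list[i] @ phis[s - 1 - i] for i in range(min(s, q))))
    return phis
```

For the model of Example 1 below, this recursion reproduces \(a_s = 2^{-s}\) and \(d_s = s\,2^{-s}\).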

Now, we state our model assumptions.

Definition 2

(Heavy-tailed VAR model) Let \((\textbf{X},\textbf{Y})^\top\) follow the stable \(\textrm{VAR}(q)\) model as defined above with its causal representation given by (4). We introduce the assumptions:

  • \(\varepsilon _t^X, \varepsilon _t^Y\overset{\text {iid}}{\sim }\textrm{RV}(\theta )\) for some \(\theta >0\),

  • \(\alpha _i, \beta _i, \gamma _i, \delta _i\ge 0\),

  • \(\exists \delta >0\) such that \(\sum _{i=0}^\infty a_i^{\theta -\delta }<\infty\), \(\sum _{i=0}^\infty b_i^{\theta -\delta }<\infty\), \(\sum _{i=0}^\infty c_i^{\theta -\delta }<\infty\), \(\sum _{i=0}^\infty d_i^{\theta -\delta }<\infty\).

Under these assumptions, we refer to \((\textbf{X},\textbf{Y})^\top\) as following the heavy-tailed \(\textrm{VAR}(q, \theta )\) model (\(\textrm{hVAR}(q,\theta )\) for short).

The first assumption is crucial, as it ensures the regular variation of our time series. The second assumption can be relaxed and can be replaced by the extremal causal condition discussed in Section 3.1. The third assumption is aimed at ensuring the a.s. summability of the sums \(\sum _{i=0}^\infty a_i\varepsilon ^X_{t-i}\). Moreover, this assumption guarantees the stationarity of \(\textbf{X}\) and \(\textbf{Y}\). It also establishes a crucial max-sum equivalence relationship, namely \(\mathbb {P}(\sum _{i=0}^\infty \alpha _i\varepsilon _i^\cdot>u)\sim [\sum _{i=0}^\infty \alpha _i^\theta ] \mathbb {P}(\varepsilon _1^\cdot >u)\) (Mikosch and Samorodnitsky 2000, Lemma A.3).

We consider a nonlinear generalization in order to minimize the assumptions on the body of the time series. First, we introduce the nonlinear counterpart of Definition 1.

Definition 3

We define \((\textbf{X},\textbf{Y})^\top =((X_t,Y_t)^\top , t\in \mathbb {Z})\) to follow the bivariate \(\textrm{NAAR}(q)\) model, specified by the equations:

$$\begin{aligned} X_t=f_{1}(X_{t-1}) + f_{2}(Y_{t-q}) + \varepsilon _t^X; \qquad Y_t=g_{1}(Y_{t-1}) + g_{2}(X_{t-q}) + \varepsilon _t^Y. \end{aligned}$$

We say that \(\textbf{X}\) (Granger) causes \(\textbf{Y}\) if \(g_2\) is a non-constant function on the support of \(X_{t-q}\) (a.s.).

It is interesting to note that in the univariate case \(d=1\), the often used condition \(\lim _{|x|\rightarrow \infty }\frac{|f_1(x)|}{|x|}<1\) is “almost” necessary for stationarity (Bhattacharya and Lee 1995, Corollary 2.2; Andel 1989, Theorem 2.2).

Definition 4

(Heavy-tailed NAAR model) Let \((\textbf{X},\textbf{Y})^\top\) follow the stationary \(\textrm{NAAR}(q)\) model from Definition 3. We require functions \(f_1, f_2, g_1, g_2\) to either be zero functions or continuous non-negative functions satisfying \(\lim _{x\rightarrow \infty }h(x)=\infty\) and \(\lim _{x\rightarrow \infty }\frac{h(x)}{x}<1\) for \(h=f_1, f_2, g_1, g_2\). Moreover, let \(\varepsilon _t^X, \varepsilon _t^Y\overset{\text {iid}}{\sim }\textrm{RV}(\theta )\) be non-negative for some \(\theta >0\). Then, we say that \((\textbf{X},\textbf{Y})^\top\) follows the heavy-tailed \(\textrm{NAAR}(q,\theta )\) model (\(\textrm{hNAAR}(q,\theta )\) for short).

It is important to emphasize that our restrictions on the functions \(f_1, f_2, g_1, g_2\) are minimal in the body of the distribution; we have only imposed conditions on their tails. By imposing the constraint \(\lim _{x\rightarrow \infty }\frac{h(x)}{x}<1\) for \(h=f_2, g_2\), we ensure that the tail of \(f_2(Y_{t-q})\) (resp. \(g_2(X_{t-q})\)) is not heavier than the tail of \(Y_{t-q}\) (resp. \(X_{t-q}\)). The assumption \(\lim _{x\rightarrow \infty }\frac{h(x)}{x}<1\) for \(h=f_1, g_1\) is closely related to the stationarity of the time series. In the \(\textrm{hNAAR}\) models, our assumption of regular variation for the noise variables directly implies that \(X_t, Y_t\sim \textrm{RV}(\theta )\) (Yang and Hongzhi 2005, Theorem 2.3). Moreover, our framework does not assume the existence of any moments; it accommodates cases like \(\theta <1\), where the expectation \(\mathbb {E}(X_t)\) does not exist.

At first sight, \(\textrm{hNAAR}(q,\theta )\) models encompass a broader class of time series models than \(\textrm{hVAR}(q,\theta )\) models. However, the two classes are not nested. In the \(\textrm{hNAAR}(q,\theta )\) case, we assumed that \(X_t\) and \(Y_t\) are functions of only two previous values (\(X_t\) being a function of \(X_{t-1},Y_{t-q}\) and \(Y_t\) being a function of \(Y_{t-1}, X_{t-q}\)), whereas in the VAR case, \(X_t, Y_t\) can depend on more than two previous values. Moreover, the NAAR case carries the additional assumption \(\varepsilon _t^X, \varepsilon _t^Y\ge 0\). Nevertheless, up to these two differences, the class of \(\textrm{hVAR}(q,\theta )\) models lies inside the class of \(\textrm{hNAAR}(q,\theta )\) models.

2.2 Causal direction

The subsequent two theorems constitute the core of this paper, connecting the classical concept of causality with causality in extreme events.

Theorem 1

Let \((\textbf{X},\textbf{Y})^\top\) be a bivariate time series which follows either the \(\textrm{hVAR}(q,\theta )\) model or the \(\textrm{hNAAR}(q,\theta )\) model. If \(\textbf{X}\) causes \(\textbf{Y}\), then \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(q)=1\).

The proof can be found in Appendix B. Intriguingly, the regular variation condition is not used within the proof. We assume that we know the exact (correct) order q. Nevertheless, for every \(p\ge q\), we have \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\ge \Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(q) =1\). The choice of an appropriate delay p will be discussed in Section 3.3.

Theorem 2

Let \((\textbf{X},\textbf{Y})^\top\) be a bivariate time series which follows either the \(\textrm{hVAR}(q, \theta )\) model or the \(\textrm{hNAAR}(q, \theta )\) model. If \(\textbf{Y}\) does not cause \(\textbf{X}\), then \(\Gamma ^{time}_{\textbf{Y}\rightarrow \textbf{X}}(p)<1\) for all \(p\in \mathbb {N}\).

The proof can be found in Appendix B. The primary step of the proof stems from Proposition 2, presented in Appendix A. The core idea is that large sums of independent, regularly varying random variables tend to be driven by only a single large value. Consequently, if \(Y_0\) is large, it could be attributed to the largeness of an \(\varepsilon _i^Y\), which does not affect \(\textbf{X}\).

Note that distinct notation is employed for the time series order (denoted \(q\in \mathbb {N}\)) and the extremal delay (denoted \(p\in \mathbb {N}\)). Although these two quantities need not coincide, Theorem 1 prompts our primary focus on scenarios where \(p\ge q\).

Example 1

Consider the bivariate time series \((\textbf{X},\textbf{Y})^\top\) described by the equations:

$$\begin{aligned} X_t&=0.5X_{t-1}+ \varepsilon _t^X;\qquad Y_t=0.5Y_{t-1}+0.5X_{t-1}+ \varepsilon _t^Y, \end{aligned}$$

where \(\varepsilon _t^X, \varepsilon _t^Y\overset{\text {iid}}{\sim }\textrm{Pareto}(1,1)\), with a tail index \(\theta =1\). A causal representation of this \(\textrm{hVAR}(1,1)\) model is given by:

$$\begin{aligned} X_t=\sum _{i=0}^\infty \frac{1}{2^i}\varepsilon ^X_{t-i}; \qquad Y_t=\sum _{i=0}^\infty \frac{1}{2^i}\varepsilon ^Y_{t-i}+\sum _{i=0}^\infty \frac{i}{2^i}\varepsilon ^X_{t-i}. \end{aligned}$$

In this case, the order is \(q=1\), and it is sufficient to take only

$$\Gamma ^{time, -0}_{\textbf{X}\rightarrow \textbf{Y}}(1)=\lim _{u\rightarrow 1^-}\mathbb {E}[F_Y(Y_1)\mid F_X(X_0)>u],$$

as discussed in Section 3.3. Let us give a heuristic computation of this coefficient. From Theorem 1, we know that \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(1)=1\). For the other direction, rewrite

$$\begin{aligned} \lim _{u\rightarrow 1^-}\mathbb {E}[F_X(X_1)\mid F_Y(Y_0)>u]= \lim _{v\rightarrow \infty }\mathbb {E}[F_X(X_1)\mid \sum _{i=0}^\infty \frac{1}{2^i}\varepsilon ^Y_{-i}+\sum _{i=0}^\infty \frac{i}{2^i}\varepsilon ^X_{-i}>v]. \end{aligned}$$

First, note the following relations (the first follows from the independence and the second from Lemma 5 in Appendix A):

$$\begin{aligned}&\lim _{v\rightarrow \infty }\mathbb {E}[F_X(X_1)\mid \sum _{i=0}^\infty \frac{1}{2^i}\varepsilon ^Y_{-i}>v]=\mathbb {E}[F_X(X_1)]=1/2,\\&\lim _{v\rightarrow \infty }\mathbb {E}[F_X(X_1)\mid \sum _{i=0}^\infty \frac{i}{2^i}\varepsilon ^X_{-i}>v]=1. \end{aligned}$$

Furthermore, we know that \(\lim _{v\rightarrow \infty }\mathbb {P}(X_1<\lambda \mid \sum _{i=0}^\infty \frac{1}{2^i}\varepsilon ^Y_{-i}+\sum _{i=0}^\infty \frac{i}{2^i}\varepsilon ^X_{-i}>v)= \frac{\mathbb {P}(X_1<\lambda )}{2}\) for every constant \(\lambda \in \mathbb {R}\) (using Proposition 2). This implies that, with a probability of 1/2, \(X_1\mid {F_Y(Y_0)>u}\) has asymptotically the same distribution as the unconditional \(X_1\), and with complementary probability, it tends to \(\infty\) as \(u\rightarrow 1^-\). Combining these results, we obtain:

$$\Gamma ^{time, -0}_{\textbf{Y}\rightarrow \textbf{X}}(1) = \lim _{u\rightarrow 1^-}\mathbb {E}[F_X(X_1)\mid F_Y(Y_0)>u]=\frac{1}{2}\cdot \frac{1}{2} + \frac{1}{2}\cdot 1=\frac{3}{4}.$$

The order q is usually unknown. If we take \(p=2\), then the value of

$$\Gamma ^{time}_{\textbf{Y}\rightarrow \textbf{X}}(2)=\lim _{u\rightarrow 1^-}\mathbb {E}[\max \{F_X(X_0), F_X(X_1), F_X(X_2)\}\mid F_Y(Y_0)>u]$$

will be slightly larger than \(\frac{3}{4}\). More precisely, it equals \(\frac{1}{2}\cdot \mathbb {E}[\max \{F_X(X_0), F_X(X_1),F_X(X_2)\}] + \frac{1}{2}\cdot 1\). The true value is around 0.80, as determined by simulation.
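These numbers can be checked by simulation (a rough sketch, assuming numpy; the quantile level 0.995 is an arbitrary finite-sample proxy for the limit \(u\rightarrow 1^-\), and the naive empirical version of (3) below is ours, not the estimator studied later in the paper):

```python
import numpy as np

# Simulate Example 1 and estimate the coefficients empirically in both directions.
rng = np.random.default_rng(7)
n, burn = 100_000, 1_000
eps_x = rng.pareto(1.0, n + burn) + 1.0   # Pareto(1,1) noise
eps_y = rng.pareto(1.0, n + burn) + 1.0
x = np.zeros(n + burn)
y = np.zeros(n + burn)
for t in range(1, n + burn):
    x[t] = 0.5 * x[t - 1] + eps_x[t]
    y[t] = 0.5 * y[t - 1] + 0.5 * x[t - 1] + eps_y[t]
x, y = x[burn:], y[burn:]

def gamma_time(cause, effect, p, q_level=0.995):
    """Naive empirical version of (3) at a finite quantile level."""
    m = cause.size
    f = (np.argsort(np.argsort(effect)) + 1) / m
    window = np.array([f[t:t + p + 1].max() for t in range(m - p)])
    hot = cause[:m - p] > np.quantile(cause, q_level)
    return float(window[hot].mean())

g_xy = gamma_time(x, y, p=1)   # theory: tends to 1 (X causes Y)
g_yx = gamma_time(y, x, p=1)   # theory: stays visibly below 1
```

The asymmetry between the two directions is clearly visible even at this moderate sample size.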

3 Properties and extensions

In this section, we exclusively focus on heavy-tailed VAR models. While we believe that similar results also apply to heavy-tailed NAAR models, the formulation and proof of such results are beyond the scope of this paper.

3.1 Real-valued coefficients

We discuss the extension to potentially negative coefficients (relaxing the second assumption in Definition 2) and to dependencies that are not directly proportional. Until now, we have assumed that a large positive \(\textbf{X}\) causes a large positive \(\textbf{Y}\); in other words, that all coefficients in the \(\textrm{hVAR}\) model are non-negative.

The most straightforward modification arises when a large positive \(\textbf{X}\) causes large negative \(\textbf{Y}\) (where the gain for one causes a loss for others). In such cases, it suffices to consider the maxima of the pair \((\textbf{X}, -\textbf{Y})\). While we apply this simple modification in our application, it cannot be universally applied.

The concept behind extending the causal tail coefficient for time series involves utilizing the absolute values \(|\textbf{X}|\) and \(|\textbf{Y}|\) instead of \(\textbf{X}\) and \(\textbf{Y}\). However, to implement this extension, we need to restrict the family of \(\textrm{VAR}(q)\) models in a specific manner: Theorems 1 and 2 do not hold if we allow arbitrary coefficients \(\alpha _i\), \(\beta _i\), \(\gamma _i\), \(\delta _i \in \mathbb {R}\), \(i=1, \dots , q\), whether positive or negative. The following example illustrates such a problematic scenario.

Example 2

Consider a vector \((\textbf{X}, \textbf{Y})^\top\) following a \(\textrm{VAR}(2)\) model defined as:

$$\begin{aligned} X_t&=0.5X_{t-1}+ \varepsilon _t^X; \qquad Y_t=X_{t-1} - 0.5 X_{t-2}+ \varepsilon _t^Y. \end{aligned}$$

Its causal representation can be written as

$$\begin{aligned} X_t&=\sum _{i=0}^\infty \frac{1}{2^i}\varepsilon ^X_{t-i};\qquad Y_t=\varepsilon _t^Y + \varepsilon ^X_{t-1}. \end{aligned}$$
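To verify the second expression, substitute \(X_{t-1} = 0.5X_{t-2} + \varepsilon ^X_{t-1}\) into the defining equation for \(Y_t\); the terms involving \(X_{t-2}\) cancel:

$$\begin{aligned} Y_t&=X_{t-1} - 0.5 X_{t-2}+ \varepsilon _t^Y =\left( 0.5 X_{t-2}+\varepsilon ^X_{t-1}\right) - 0.5 X_{t-2}+ \varepsilon _t^Y =\varepsilon _t^Y + \varepsilon ^X_{t-1}. \end{aligned}$$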

Detecting extreme causal relationships is challenging in this model: although \(\textbf{X}\) is a cause of \(\textbf{Y}\), an extreme value of \(X_{t-1}\) does not necessarily result in an extreme value of \(Y_t\). Indeed, if \(X_{t-1}\) is large only because \(X_{t-2}\) was large, the terms \(X_{t-1} - 0.5 X_{t-2}\) largely cancel, and \(Y_t\) need not be extreme; \(Y_t\) is extreme essentially only when \(\varepsilon ^X_{t-1}\) itself is. To address this, we impose the following restriction on our model.

Definition 5

(Extremal causal condition) Consider the pair \((\textbf{X},\textbf{Y})^\top\), which follows the stable \(\textrm{VAR}(q)\) model, with its causal structure taking the form:

$$\begin{aligned} X_t=\sum _{i=0}^\infty a_i\varepsilon ^X_{t-i}+\sum _{i=0}^\infty c_i\varepsilon ^Y_{t-i};\qquad Y_t=\sum _{i=0}^\infty b_i\varepsilon ^Y_{t-i}+\sum _{i=0}^\infty d_i\varepsilon ^X_{t-i}. \end{aligned}$$

Let \(\textbf{X}\) cause \(\textbf{Y}\). We say that \((\textbf{X},\textbf{Y})^\top\) satisfies the extremal causal condition if there exists an integer \(r \le q\) such that, for every \(i\in \mathbb {N}\cup \{0\}\):

$$a_i\ne 0 \implies d_{i+r} \ne 0.$$

Lemma 1

The extremal causal condition holds in the \(\textrm{hVAR}(q, \theta )\) models (i.e., where the coefficients are non-negative) when \(\textbf{X}\) causes \(\textbf{Y}\).

The proof can be found in Appendix B. Clearly, the aforementioned condition holds in models where \(d_i \ne 0\) for all \(i > 0\).

The extremal causal condition embodies the following idea: take \(k\ge r\). If \(\varepsilon _t^X\) is “extreme”, this extreme event also influences \(Y_{t+k}\). This holds even when some of the coefficients are negative: an “extremely positive” \(\varepsilon _t^X\) may then make \(Y_{t+k}\) “extremely negative”, but under the extremal causal condition an “extreme” \(\varepsilon _t^X\) still produces an “extreme” \(Y_{t+k}\).

Formulating a similar condition for \(\textrm{NAAR}\) models is challenging, since it would require an appropriate combination of functions to be non-zero. For this reason, our focus in this section remains exclusively on \(\textrm{VAR}\) models, excluding \(\textrm{NAAR}\) models. The following theorem demonstrates that, thanks to the extremal causal condition, the principles outlined in Section 2 remain applicable even when negative coefficients are introduced.

Theorem 3

Let \((\textbf{X},\textbf{Y})^\top\) be a time series which follows the \(\textrm{hVAR}(q, \theta )\) model, with possibly negative coefficients, satisfying the extremal causal condition. Moreover, let \(\varepsilon _t^X, \varepsilon _t^Y\) have full support on \(\mathbb {R}\), and \(|\varepsilon _t^X|, |\varepsilon _t^Y|\sim \textrm{RV}(\theta )\). If \(\textbf{X}\) causes \(\textbf{Y}\), and \(\textbf{Y}\) does not cause \(\textbf{X}\), then \(\Gamma ^{time}_{|\textbf{X}|\rightarrow |\textbf{Y}|}(q)=1\), and \(\Gamma ^{time}_{|\textbf{Y}|\rightarrow |\textbf{X}|}(q)<1\).

The proof can be found in Appendix B.

3.2 Common cause and different tail behavior

Reichenbach’s common cause principle (Reichenbach 1956) states that for every given pair of random variables \((\textbf{X}, \textbf{Y})\), precisely one of the following statements holds true: \(\textbf{X}\) causes \(\textbf{Y}\); \(\textbf{Y}\) causes \(\textbf{X}\); they are independent; or there exists \(\textbf{Z}\) causing both \(\textbf{X}\) and \(\textbf{Y}\). The challenge lies in distinguishing between true causality and dependence stemming from a common cause. The subsequent theorem demonstrates that our methodology allows us to distinguish between causality and a correlation due to a common cause.

Theorem 4

Let \((\textbf{X}, \, \textbf{Y}, \, \textbf{Z})^\top = ((X_t,Y_t,Z_t)^\top , t\in \mathbb {Z})\) follow the three-dimensional stable \(\textrm{VAR}(q)\) model, with iid regularly varying noise variables. Let \(\textbf{Z}\) be a common cause of both \(\textbf{X}\) and \(\textbf{Y}\), and neither \(\textbf{X}\) nor \(\textbf{Y}\) cause \(\textbf{Z}\). If \(\textbf{Y}\) does not cause \(\textbf{X}\), then \(\Gamma ^{time}_{\textbf{Y}\rightarrow \textbf{X}}(p)<1\) for all \(p\in \mathbb {N}\).

The proof can be found in Appendix B. Theorem 4 assumes that the tail indexes of \(\textbf{X}, \textbf{Y},\) and \(\textbf{Z}\) are equal; however, it can easily be extended to the case where \(\textbf{Z}\) has lighter tails (see the discussion in Section 4.5). In practical scenarios, complete observation of all pertinent data is often unattainable. Nevertheless, Theorem 4 remains valid even when the common cause is unobserved, provided its noise is regularly varying with tails no heavier than those of \(\textbf{X}\) and \(\textbf{Y}\). We cannot check this assumption in practice.

Example 3

Let \((\textbf{X}, \, \textbf{Y}, \, \textbf{Z})^\top\) follow the three-dimensional \(\textrm{VAR}(1)\) model, specified by

$$\begin{aligned} Z_t&=0.5Z_{t-1}+\varepsilon _t^Z,\\ X_t&=0.5X_{t-1}+0.5Z_{t-1} + \varepsilon _t^X,\\ Y_t&=0.5Y_{t-1}+0.5Z_{t-1} +0.5X_{t-1} + \varepsilon _t^Y, \end{aligned}$$

where \(\varepsilon _t^X, \varepsilon _t^Y\overset{\text {iid}}{\sim }\textrm{Pareto}(2,2)\) and \(\varepsilon _t^Z \overset{\text {iid}}{\sim }\textrm{Pareto}(1,1)\) (i.e., \(\varepsilon _t^Z\) has a heavier tail than \(\varepsilon _t^X, \varepsilon _t^Y\)). Then \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(1)=\Gamma ^{time}_{\textbf{Y}\rightarrow \textbf{X}}(1)=1\), even though \(\textbf{Y}\) does not cause \(\textbf{X}\).
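The behavior described in Example 3 can be reproduced numerically. The sketch below is our Python illustration, not part of the supplementary R package; it simulates the model above and evaluates the estimator \(\hat{\Gamma }^{time}\) of Definition 7 (Section 4.1) in both directions. With the heavier-tailed confounder, both estimates come out close to 1.

```python
import numpy as np

def gamma_hat(x, y, p, k):
    # Estimator of Gamma^time_{X->Y}(p) from Definition 7 (Section 4.1).
    n = len(x)
    F_y = np.searchsorted(np.sort(y), y, side="right") / n  # empirical CDF ranks
    tau = np.sort(x)[n - k]                                 # k-th largest value of x
    return sum(F_y[i:i + p + 1].max() for i in range(n - p) if x[i] >= tau) / k

rng = np.random.default_rng(0)
n, burn = 10_000, 200
# Pareto(1,1) noise for Z (heavier tail); Pareto(2,2) noise for X and Y.
eps_z = 1.0 / rng.uniform(size=n + burn)
eps_x = 2.0 / np.sqrt(rng.uniform(size=n + burn))
eps_y = 2.0 / np.sqrt(rng.uniform(size=n + burn))

z = np.zeros(n + burn)
x = np.zeros(n + burn)
y = np.zeros(n + burn)
for t in range(1, n + burn):
    z[t] = 0.5 * z[t - 1] + eps_z[t]
    x[t] = 0.5 * x[t - 1] + 0.5 * z[t - 1] + eps_x[t]
    y[t] = 0.5 * y[t - 1] + 0.5 * z[t - 1] + 0.5 * x[t - 1] + eps_y[t]
x, y = x[burn:], y[burn:]

k = int(np.sqrt(n))
g_xy = gamma_hat(x, y, 1, k)  # both values land close to 1,
g_yx = gamma_hat(y, x, 1, k)  # even though Y does not cause X
```

Since \(\textbf{X}\) and \(\textbf{Y}\) receive the heavy-tailed shock \(\varepsilon ^Z\) through the same lag, almost every extreme of one series coincides with an extreme of the other, which is exactly what drives both coefficients towards 1.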

Even in the unconfounded scenario (or when \(\varepsilon _t^Z\) possesses tails that are lighter than those of \(\varepsilon _t^X\) and \(\varepsilon _t^Y\)), a distinct challenge emerges when the tail behaviors of \(\varepsilon _t^X\) and \(\varepsilon _t^Y\) differ. Results from Section 2 remain applicable if \(\varepsilon _t^X\) has lighter tails than \(\varepsilon _t^Y\). However, if \(\varepsilon _t^X\) has heavier tails than \(\varepsilon _t^Y\), then \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(1)=\Gamma ^{time}_{\textbf{Y}\rightarrow \textbf{X}}(1)=1\) and we cannot distinguish between the cause and the effect. A more detailed discussion can be found in Section 4.5.

3.3 Estimating the extremal delay p

Up to this point, we have assumed prior knowledge of the exact order q of our time series, with p set equal to q. However, what if this order is unknown? If p is too small, accurate causal relationships cannot be obtained (see Lemma 2). Conversely, as p becomes larger, \(\Gamma ^{time}_{\textbf{Y}\rightarrow \textbf{X}}(p)\) approaches 1, posing challenges for empirical inference.

One practical approach to this challenge is the extremogram (Davis and Mikosch 2009). Similar to the role of correlograms in the classical setting, extremograms help us select a “reasonable” value of p from a plot. Alternatively, we can examine the values of \(\Gamma ^{time}_{\textbf{Y}\rightarrow \textbf{X}}(p)\) for a range of values of p. An illustrative example can be found in Section 4.4.

We now turn to another consideration: estimating the temporal synchronization between two time series \((\textbf{X}, \textbf{Y})^\top\), where \(\textbf{X}\) is the cause of \(\textbf{Y}\). The objective is to determine the time required for information originating from \(\textbf{X}\) to impact \(\textbf{Y}\): if an intervention is applied to \(\textbf{X}\), when will it influence \(\textbf{Y}\)? A practical instance arises in economics, for example with two time series representing milk and cheese prices over time. If the government imposes a large tax increase on milk prices at a specific moment, when can we expect to observe a subsequent rise in cheese prices? Mathematically, we aim to estimate a parameter known as the minimal delay.

Definition 6

(Minimal delay) Let \((\textbf{X},\textbf{Y})^\top\) follow the stable \(\textrm{VAR}(q)\) model specified in Definition 2. We call \(s\in \mathbb {N}\) the minimal delay if \(\gamma _1=\dots = \gamma _{s-1} = \delta _1=\dots = \delta _{s-1}=0\) and either \(\delta _s\ne 0\) or \(\gamma _s\ne 0\). If no such s exists, we define the minimal delay as \(+\infty\).

The subsequent lemma establishes connections between the minimal delay and the causal tail coefficient for time series.

Lemma 2

Let \((\textbf{X},\textbf{Y})^\top\) follow the \(\textrm{hVAR}(q, \theta )\) model, where \(\textbf{X}\) causes \(\textbf{Y}\). Let s be the minimal delay. Then, \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(r)<1\) for all \(r<s\), and \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(r)=1\) for all \(r\ge s\).

The proof can be found in Appendix B. An approach to estimating the minimal delay s involves identifying the smallest value of s for which \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(s)=1\).

Estimating the minimal delay s through extreme values is relevant in various applications. One such application is the assessment of synchrony between two time series. Synchrony, as described in Cheong (2020), quantifies the interdependence of two time series, seeking an optimal lag that aligns them most closely. Conventional methodologies include time-lagged cross-correlation and dynamic time warping. The causal tail coefficient for time series offers a way to proceed using extreme values, suggesting a notion of synchronization of extremes. This may pave the way for future research and methodological developments.

4 Estimations and simulations

All the methodologies presented in this section have been implemented in the R programming language (R Core Team 2022). The corresponding code can be accessed either in the supplementary package or through the online repository available at https://github.com/jurobodik/Causality-in-extremes-of-time-series.git.

4.1 Non-parametric estimator

We propose an estimator of the causal tail coefficient \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) based on a finite sample \((x_1, y_1)^\top , \dots , (x_n, y_n)^\top , n\in \mathbb {N}\). The estimator uses only the values of \(y_i\) where the corresponding \(x_i\) surpasses a given threshold.

Definition 7

We define

$$\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p):=\frac{1}{k}\sum _{\begin{array}{c} i = 1, \dots , n-p \\ x_i\ge \tau _k^X \end{array}}\max \{\hat{F}_Y(y_i), \dots , \hat{F}_Y(y_{i+p})\},$$

where \(\tau _k^X=x_{(n-k+1)}\) is the k-th largest value of \(x_1, \dots , x_n\), where \(k\le n\), and \(\hat{F}_Y(t)=\frac{1}{n}\sum _{j=1}^n \mathbb {1}(y_j\le t)\).
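For concreteness, the estimator admits a direct transcription into code. The following is a minimal Python sketch of Definition 7; the reference implementation accompanying the paper is the R code in the repository above, and the function and variable names here are ours.

```python
import numpy as np

def gamma_hat(x, y, p, k=None):
    """Sketch of the estimator of Gamma^time_{X->Y}(p) from Definition 7."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if k is None:
        k = int(np.sqrt(n))  # pragmatic choice k_n = sqrt(n) used in the paper
    # Empirical CDF of Y at each observation: F_hat_Y(y_i) = #{j : y_j <= y_i} / n.
    F_y = np.searchsorted(np.sort(y), y, side="right") / n
    # Threshold tau_k^X = x_{(n-k+1)}, the k-th largest value among x_1, ..., x_n.
    tau = np.sort(x)[n - k]
    total = 0.0
    for i in range(n - p):  # i = 1, ..., n - p in the paper's 1-based notation
        if x[i] >= tau:
            total += F_y[i:i + p + 1].max()
    return total / k
```

Applying the function in both directions, `gamma_hat(x, y, p)` versus `gamma_hat(y, x, p)`, exposes the asymmetry between cause and effect exploited in the simulations below.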

The value k denotes the number of extremes considered in the estimator. In what follows, k will depend on n, so we write \(k_n\) instead of k. A fundamental condition in extreme value theory is expressed as:

$$\begin{aligned} k_n\rightarrow \infty , \; \frac{k_n}{n}\rightarrow 0, \quad \text { as } n\rightarrow \infty . \end{aligned}$$
(5)

Note that \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) always lies between zero and one. The next theorem shows that this statistic is “reasonable” by establishing that it is asymptotically unbiased. The result does not require any stringent model assumptions; it relies only on a certain rate of convergence of the empirical distribution function.

Theorem 5

Let \((\textbf{X},\textbf{Y})^\top =((X_t,Y_t)^\top , t\in \mathbb {Z})\) be a stationary bivariate time series, whose marginal distributions are absolutely continuous with support on some neighborhood of infinity. Let \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) exist. Let \(k_n\) satisfy (5) and

$$\begin{aligned} \frac{n}{k_n}\mathbb {P}\left( \frac{n}{k_n} \sup _{x\in \mathbb {R}}|\hat{F}_X(x)-F_X(x)|>\delta \right) \overset{n\rightarrow \infty }{\longrightarrow }0, \,\,\, \forall \delta >0. \end{aligned}$$
(6)

Then, \(\mathbb {E}\,\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\overset{n\rightarrow \infty }{\rightarrow }\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\).

Consequence 1

Let \(\textbf{X}\rightarrow \textbf{Y}\). Under the assumptions of Theorems 1 and 5, the proposed estimator is consistent; that is, \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\overset{P}{\rightarrow }\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) as \(n\rightarrow \infty\).

The proofs of Theorem 5 and Consequence 1 can be found in Appendix B. To establish asymptotic properties of \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\), it is reasonable to assume the consistency of \(\hat{F}_X\). However, this assumption is accompanied by an additional requirement for the rate of its convergence, as specified in condition (6). While this condition is not overly restrictive, we believe that the condition (6) can be improved. For iid random variables, condition (6) holds if \(k_n^2/n \rightarrow \infty\) (follows from the Dvoretzky–Kiefer–Wolfowitz inequality). The following lemma establishes that condition (6) is also fulfilled within a certain subset of autoregressive models.

Lemma 3

Let \(\textbf{X} = (X_t, t\in \mathbb {Z})\) have the form

$$X_t = \sum _{k=0}^\infty a_k \varepsilon _{t-k},$$

where \((\varepsilon _t, t \in \mathbb {Z})\) are iid random variables with density \(f_\varepsilon\). Assume \(|a_k|\le \gamma k^{-\beta }\) for some \(\beta >1\), \(\gamma >0\) and all \(k \in \mathbb {N}\). Let \(f^\star = \max (1, |f_\varepsilon |_{\infty }, |f'_{\varepsilon }|_{\infty }) < \infty\), where \(|f|_{\infty } = \sup _{x\in \mathbb {R}}|f(x)|\) is the supremum norm. Assume \(\mathbb {E}\,|\varepsilon _0|^q<\infty\) for \(q>2\) and \(\mathbb {P}(|\varepsilon _0|>x) \le L(\log x)^{-r_0}x^{-q}\) for some constants \(L>0, r_0>1\) and for every \(x>1\). If the sequence \((k_n)\) satisfies (5) and

$$\begin{aligned} \exists c>\max \left\{ \frac{1}{2},\frac{2}{1+q\beta }\right\} : \frac{k_n}{n^c}\rightarrow \infty , \text { as }n\rightarrow \infty , \end{aligned}$$
(7)

then the condition (6) is satisfied.

The proof can be found in Appendix B. It is based on Proposition 13 in Chen and Wu (2018). Several concentration bounds exist that describe the convergence speed of the empirical distribution functions, and different assumptions can be made to guarantee the validity of the condition (6). According to Theorem 1.2 in Kontorovich and Weiss (2014), condition (6) holds for geometrically ergodic Markov chains. Regrettably, this result was originally presented solely for \(\mathbb {N}\)-valued random variables. The Dvoretzky–Kiefer–Wolfowitz inequality provides a similar outcome, but we did not come across a specific adaptation of this inequality for autoregressive time series.

Lemma 3 requires a finite second moment of \(\varepsilon _i\), a technical condition that follows from the findings in Chen and Wu (2018). In cases of heavy-tailed time series characterized by a tail index \(\theta\), this condition necessitates \(\theta >2\). Notably, the coefficient \(\beta\) in Lemma 3 encapsulates the strength of dependence in the time series. Larger values of \(\beta\) correspond to faster decay of dependence, thereby allowing for milder assumptions on the intermediate sequence \((k_n)\). Throughout the paper, we opt for the choice \(k_n=\sqrt{n}\), a decision briefly explored in Section 4.3.

4.2 Some insight using simulations

We will simulate the behavior of the estimates \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}\) for a series of models. Initially, we employ the Monte Carlo principle to estimate the distributions of \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}\) and \(\hat{\Gamma }_{\textbf{Y}\rightarrow \textbf{X}}^{time}\) for the following model.

Model 1

Let \((\textbf{X},\textbf{Y})^\top\) follow the \(\textrm{VAR}(2)\) model

$$\begin{aligned} X_t&=0.5X_{t-1}+ \varepsilon _t^X; \qquad Y_t=0.5Y_{t-1}+ \delta X_{t-2}+\varepsilon _t^Y, \end{aligned}$$

where \(\varepsilon _t^X, \varepsilon _t^Y\) are independent noise variables and \(\delta \in \mathbb {R}\).

Figure 2 presents the histograms of \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}(2)\) and \(\hat{\Gamma }_{\textbf{Y}\rightarrow \textbf{X}}^{time}(2)\) from 1000 simulated instances of time series, each with a length of \(n=5000\), following Model 1. The parameter \(\delta\) is set to 0.5, and \(\varepsilon _t^X\) and \(\varepsilon _t^Y\) are drawn from the Cauchy distribution. Figure 2 illustrates that within the causal direction, the values of \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}\) predominantly fall within the range of (0.9, 1). Conversely, in the non-causal direction, the values of \(\hat{\Gamma }_{\textbf{Y}\rightarrow \textbf{X}}^{time}\) typically lie below 0.9. This underscores the asymmetry between the cause and the effect.

Fig. 2

The histograms provide an approximate visualization of the distributions of \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}(2)\) (in blue) and \(\hat{\Gamma }_{\textbf{Y}\rightarrow \textbf{X}}^{time}(2)\) (in red) based on Model 1 with a correct causal direction \(\textbf{X}\rightarrow \textbf{Y}\) and a sample size \(n=5000\). The threshold \(k_n\) is chosen as \(k_n=70=\lfloor {\sqrt{5000}}\rfloor\)

We proceed by simulating \((\textbf{X},\textbf{Y})^\top\) according to Model 1. We explore three different values for the parameter \(\delta\) (0.1, 0.5, and 0.9) and three distinct sample sizes (\(n = 100\), 1000, and 10000). The random variables \(\varepsilon _t^X\) and \(\varepsilon _t^Y\) are generated from either the standard normal distribution (thus violating the RV assumption) or the standard Pareto distribution. For each value of \(\delta\), noise distribution, and sample size, we calculate the estimators \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}:= \hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(2)\). This procedure is repeated 200 times, and we compute the means and quantiles of these estimators. The summarized outcomes are presented in Table 2, where each cell corresponds to a specific model defined by \(\delta\), noise distribution, and sample size.
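A scaled-down version of this experiment (fewer repetitions and a single cell of Table 2) can be sketched in Python. The paper's own simulations use the R code referenced at the start of Section 4, so the following is merely illustrative; the estimator is repeated here to keep the snippet self-contained.

```python
import numpy as np

def gamma_hat(x, y, p, k):
    # Estimator of Gamma^time_{X->Y}(p) from Definition 7.
    n = len(x)
    F_y = np.searchsorted(np.sort(y), y, side="right") / n
    tau = np.sort(x)[n - k]
    return sum(F_y[i:i + p + 1].max() for i in range(n - p) if x[i] >= tau) / k

def simulate_model1(n, delta, rng, pareto=True):
    # Model 1 with standard Pareto noise (pareto=True) or standard normal
    # noise (pareto=False, violating the regular-variation assumption).
    if pareto:
        ex, ey = 1.0 / rng.uniform(size=n), 1.0 / rng.uniform(size=n)
    else:
        ex, ey = rng.standard_normal(n), rng.standard_normal(n)
    x, y = np.zeros(n), np.zeros(n)
    for t in range(2, n):
        x[t] = 0.5 * x[t - 1] + ex[t]
        y[t] = 0.5 * y[t - 1] + delta * x[t - 2] + ey[t]
    return x, y

rng = np.random.default_rng(0)
reps, n, delta = 50, 1000, 0.5          # scaled down from 200 repetitions
k = int(np.sqrt(n))
est_xy, est_yx = [], []
for _ in range(reps):
    x, y = simulate_model1(n, delta, rng)
    est_xy.append(gamma_hat(x, y, 2, k))
    est_yx.append(gamma_hat(y, x, 2, k))
mean_xy, q_xy = np.mean(est_xy), np.quantile(est_xy, 0.95)
mean_yx = np.mean(est_yx)
```

Comparing `mean_xy` with `mean_yx` illustrates the asymmetry reported in Table 2; `q_xy` mirrors the 95% empirical quantile used there.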

Table 2 The results obtained from 200 simulated time series following Model 1 where \(\textbf{X}\) causes \(\textbf{Y}\). Each cell in the table corresponds to a distinct coefficient \(\delta\), a specific number of data-points n, and a particular noise distribution. The value \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}=\cdot \pm \cdot\) represents the mean of all 200 estimated coefficients \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}\), along with the difference between the \(95\%\) empirical quantile and the mean, based on all 200 simulations

Conclusion

The method’s performance is unexpectedly robust even when the assumption of regular variation is violated. Although we lack theoretical justification for this observation, particularly with Gaussian noise where the true theoretical value is \(\Gamma _{\textbf{X}\rightarrow \textbf{Y}}^{time}(p) = \Gamma _{\textbf{Y}\rightarrow \textbf{X}}^{time}(p) = 1\), empirical results suggest that \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}(p)\) converges faster than \(\hat{\Gamma }_{\textbf{Y}\rightarrow \textbf{X}}^{time}(p)\). This sub-asymptotic behavior reveals an inherent asymmetry. Further investigation is required to determine whether this observation holds as a general result.

Conversely, when the tails of the noise variables differ between the cause and the effect, our method does not perform well. If \(\varepsilon _t^X\) possesses heavier tails than \(\varepsilon _t^Y\), for large n, both estimates tend to converge close to 1. Alternatively, if \(\varepsilon _t^X\) has lighter tails than \(\varepsilon _t^Y\), both \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}\) and \(\hat{\Gamma }_{\textbf{Y}\rightarrow \textbf{X}}^{time}\) tend to be very small, significantly deviating from 1. Consequently, different tail behaviors of the noise variables appear to be the primary challenge. This issue is explored in greater detail in Section 4.5.

4.3 Choice of a threshold

A common challenge in extreme value theory is the selection of an appropriate threshold. In our case, this corresponds to the choice of the parameter k in Definition 7. There exists a trade-off between bias and variance; smaller values of k lead to reduced bias (but increased variance), and larger values of k result in higher bias (but lower variance). Unfortunately, there is no universally applicable method for threshold selection.

To provide insight into this behavior, consider Model 1 with \(n=1000\) and \(\delta = 0.5\). Figure 3 illustrates the estimators \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(2)\) and \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}(2)\) for various values of k. The variance of \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}(2)\) is very large for small k; conversely, the bias of \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(2)\) increases as k grows.

Based on this example and several others, the choice \(k=\sqrt{n}\) appears to be a reasonable option. It is crucial to emphasize that while this choice may be pragmatic, it might not necessarily be optimal.

Visual inspection of Fig. 3 suggests that \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(2)\) approaches one as k decreases, while the same trend does not hold for \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}(2)\).

Fig. 3

The figure illustrates the behavior of the estimators \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(2)\) (in blue) and \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}(2)\) (in red) under varying selections of the parameter k. The time series are generated based on Model 1, utilizing \(n=1000\) and \(\delta =0.5\). The thick line denotes the mean value across 100 realizations, while the thin lines correspond to the \(5\%\) and \(95\%\) empirical pointwise quantiles

4.4 Choice of the extremal delay

To provide an example of how \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) behaves for various selections of p, we consider the following model.

Model 2

Let \((\textbf{X},\textbf{Y})^\top\) follow the \(\textrm{hVAR}(6,1)\) model

$$\begin{aligned} X_t&=0.5X_{t-1}+ \varepsilon _t^X; \qquad Y_t=0.5Y_{t-1}+ 0.5 X_{t-6}+\varepsilon _t^Y, \end{aligned}$$

where \(\varepsilon _t^X, \varepsilon _t^Y\overset{\text {iid}}{\sim }\textrm{Cauchy}\).

Notice that the minimal delay is equal to 6. Similar to the approach in Section 4.3, we simulate data from Model 2 with a size of \(n=1000\). Subsequently, we compute \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) and \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}(p)\) for different values of p. This process is repeated 100 times. Figure 4 shows the mean, \(5\%\) and \(95\%\) empirical quantiles of these estimates.
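A single run of this experiment, together with a crude minimal-delay estimate, can be sketched as follows. This is our illustrative Python transcription (the estimator is repeated to keep the snippet self-contained), and the threshold 0.9 anticipates the discussion in Section 4.6.

```python
import numpy as np

def gamma_hat(x, y, p, k):
    # Estimator of Gamma^time_{X->Y}(p) from Definition 7.
    n = len(x)
    F_y = np.searchsorted(np.sort(y), y, side="right") / n
    tau = np.sort(x)[n - k]
    return sum(F_y[i:i + p + 1].max() for i in range(n - p) if x[i] >= tau) / k

# Simulate Model 2: minimal delay 6, iid standard Cauchy noise.
rng = np.random.default_rng(0)
n = 1000
eps_x, eps_y = rng.standard_cauchy(n), rng.standard_cauchy(n)
x, y = np.zeros(n), np.zeros(n)
for t in range(6, n):
    x[t] = 0.5 * x[t - 1] + eps_x[t]
    y[t] = 0.5 * y[t - 1] + 0.5 * x[t - 6] + eps_y[t]

k = int(np.sqrt(n))
gammas = [gamma_hat(x, y, p, k) for p in range(13)]  # p = 0, ..., 12
# Crude estimate of the minimal delay: smallest p whose coefficient
# crosses a high threshold (here 0.9).
s_hat = next((p for p, g in enumerate(gammas) if g >= 0.9), None)
```

Plotting `gammas` against p reproduces the qualitative shape of the blue curve in Fig. 4: a sharp rise up to the true minimal delay, then values staying near 1.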

Fig. 4

The figure visualizes the behavior of the estimators \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) (in blue) and \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}(p)\) (in red) for various selections of the extremal delay p. This is done using the \(\textrm{VAR}(6)\) model (Model 2) with \(n=1000\). The thick line represents the mean value across 100 realizations, while the thin lines correspond to the \(5\%\) and \(95\%\) empirical quantiles of these estimates

The coefficient \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) in the correct causal direction rises much faster than in the other direction until it reaches the “correct” minimal delay. Following this, the coefficient approaches 1, remaining in close proximity even for larger p, aligning with theoretical expectations. Conversely, \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}(p)\) experiences a slower ascent, gradually converging towards 1.

Nevertheless, practical scenarios often demand the selection of a specific p. In cases involving a small number of time series, like the application in Section 5.1, it is feasible to calculate \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) for several choices of p and create visualizations, as exemplified in Fig. 8 further ahead. In instances involving a greater number of time series, such as the application in Section 5.2, a pragmatic choice of p should align with the maximum expected physical time delay within the system. The challenge of selecting a suitable time delay is not exclusive to the extreme value context and is encountered in non-extreme scenarios as well (Hacker and Hatemi-J 2008; Runge et al. 2019b).

4.5 Hidden confounder and different tail behavior of the noise

The examples above have exclusively addressed scenarios where \(\textbf{X}\) causes \(\textbf{Y}\). However, it is important to investigate how the methodology fares when the correlation stems from a shared confounder, and how it responds when the tail indexes of the noise variables differ. To address these questions, we analyze the following model.

Model 3

Let \((\textbf{X}, \, \textbf{Y}, \, \textbf{Z})^\top\) follow the \(\textrm{VAR}(3)\) model

$$\begin{aligned} Z_t&=0.5Z_{t-1}+\varepsilon _t^Z,\\ X_t&=0.5X_{t-1}+ 0.5Z_{t-2}+ \delta _YY_{t-3}+\varepsilon _t^X,\\ Y_t&=0.5Y_{t-1}+ 0.5Z_{t-1}+ \delta _XX_{t-3}+\varepsilon _t^Y, \end{aligned}$$

with independent noise variables \(\varepsilon _t^X\sim \textrm{t}_{\theta _{X}}, \varepsilon _t^Y\sim \textrm{t}_{\theta _{Y}}, \varepsilon _t^Z\sim \textrm{t}_{\theta _{Z}}\) (notation \(\textrm{t}_{\theta }\) corresponds to Student’s t-distribution with \(\theta\) degrees of freedom) and constants \(\delta _X, \delta _Y\in \mathbb {R}\).

The process \(\textbf{Z}\) represents an unobserved common cause, illustrating various scenarios: non-causal (\(\delta _X=\delta _Y=0\)), \(\textbf{X}\) causing \(\textbf{Y}\) (\(\delta _X\ne 0\)), \(\textbf{Y}\) causing \(\textbf{X}\) (\(\delta _Y\ne 0\)), and bidirectional causality (\(\delta _X,\delta _Y\ne 0\), notation \(\textbf{X}\leftrightarrow \textbf{Y}\)). The parameters \(\theta _X\), \(\theta _Y\), and \(\theta _Z\) characterize the tail behavior of the corresponding time series. If \(\theta _X\) is large, the distribution of \(\varepsilon _t^X\) is close to Gaussian; the case \(\theta _X < \theta _Y\) indicates that \(\varepsilon _t^X\) exhibits heavier tails than \(\varepsilon _t^Y\).

Simulating time series according to Model 3 with \(n=1000\), \(\delta _X, \delta _Y\in \{0,0.5, 1\}\), and various tail index combinations, we compute \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}:= \hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(3)\). This procedure is repeated 500 times to compute means and empirical quantiles of the estimators. The difference \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}} - \hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}\) is also calculated alongside \(95\%\) empirical quantiles. The outcomes are displayed in Table 3.

The results indicate that whenever either \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}\) or \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}\) is smaller than one, we can detect the correct causal direction. When \(\textbf{X}\) does not cause \(\textbf{Y}\) and vice versa, \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}\) and \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}\) are very similar and smaller than 1. When \(\textbf{X}\) causes \(\textbf{Y}\), \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}\) is generally much larger than \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}\). When \(\textbf{X}\leftrightarrow \textbf{Y}\), both \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}\) and \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}\) closely approach 1. However, if \(\varepsilon _t^Z\) exhibits heavier tails than \(\varepsilon _t^X\) and \(\varepsilon _t^Y\), both \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}\) and \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}\) are in close proximity to 1, regardless of the true causal relationship between \(\textbf{X}\) and \(\textbf{Y}\). Consequently, if \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}\) and \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}\) are close to 1, we cannot distinguish whether this is due to \(\textbf{X}\leftrightarrow \textbf{Y}\) or due to the presence of a hidden confounder with heavier tails.

An interesting scenario arises when \(\theta _X\ne \theta _Y\). Theory suggests that the method should work well as long as \(\theta _X\ge \theta _Y\) (heavier tails of \(\varepsilon _t^Y\)) and should no longer work if \(\theta _X<\theta _Y\) (a supporting argument for this claim can be found in Appendix B, Claim 1). However, it is important to note that these results pertain to a population (limiting) context. For small sample sizes, the situation where \(\theta _X<\theta _Y\) appears to “work well”, as evident from the bold values in Tables 3 and 4. There is an evident asymmetry between the cause and the effect, but it vanishes for large n. Both cases (\(\theta _X<\theta _Y\) and \(\theta _X>\theta _Y\)) can create problems for the estimation, and further investigation is needed to better understand the method’s behavior under different tail indexes.

To summarize, the most challenging situation occurs when a heavier-tailed noise variable acts as a common cause. This situation necessitates careful consideration, particularly when both \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}},\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}\approx 1\). This ambiguity can be attributed to either true bidirectional causality or the influence of a hidden confounder.

Table 3 The estimation of \(\hat{\Gamma }^{time}(3)\) is performed on time series generated from Model 3. The tail indexes \(\theta _X\), \(\theta _Y\), and \(\theta _Z\) correspond to the tail index of the respective time series, where a large \(\theta _X\) signifies lighter tails and a small \(\theta _X\) denotes heavy tails in \(\varepsilon _X\). Each value of \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}=\cdot \pm \cdot\) represents the mean of the estimated coefficients \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}\), along with the difference between this mean and the \(95\%\) empirical quantile out of all 500 simulations

4.6 Testing

One approach to a formal methodology for detecting the causal direction between two time series involves a threshold: we say that \(\textbf{X}\) causes \(\textbf{Y}\) if and only if \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\ge \tau\), where \(\tau =0.9\) or 0.95. Our goal is to evaluate the hypothesis \(H_0: \Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p) < 1\) against the alternative \(H_A: \Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p) = 1\). This requires an understanding of the distribution of \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\), whose computation lies beyond the scope of our current study. We considered estimating confidence intervals using the block bootstrap technique (Section 5.4 in Hesterberg (2014) and the accompanying code in the supplementary package), but opted not to incorporate this methodology in this paper, as its performance in our specific scenario proved suboptimal. Future research may reveal a better-working methodology. Additionally, visual aids such as Figs. 3 and 4 can serve as valuable tools for enhancing understanding. In what follows, we infer \(\textbf{X}\rightarrow \textbf{Y}\) if and only if the value of \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) meets or exceeds the designated threshold \(\tau\).

In the following, we provide insight into selecting an appropriate threshold value \(\tau\). This choice should depend on the sample size n and the extremal delay p. Opting for a small \(\tau\) risks erroneous conclusions, while selecting a large \(\tau\) curtails statistical power. Analogous methodologies are commonly employed within extreme value theory, for instance when distinguishing between extremal dependence and independence (Haan and Zhou 2011). To develop an intuitive understanding, we focus on Model 1 with \(\delta =0.5\) and \(\varepsilon _t^X, \varepsilon _t^Y\overset{\text {iid}}{\sim }\textrm{Pareto}(1,1)\).

We track two objectives as functions of \(\tau\): the percentage of runs in which we correctly conclude that \(\textbf{X}\rightarrow \textbf{Y}\), and the percentage of runs in which we correctly conclude that \(\textbf{Y}\not \rightarrow \textbf{X}\) (that is, that \(\textbf{Y}\) does not cause \(\textbf{X}\)).

Figures 5 and 6 illustrate the sensitivity of the results to the choice of the threshold \(\tau =\tau (n,p)\) across a range of (np) values. It is reasonable to expect that a larger value of p should correspond to a larger value of \(\tau\). To ensure that \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)< \tau\) occurs in less than \(5\%\) of cases, simulations suggest values of \(\tau (500, 6)=0.90\) and \(\tau (500, 11)=0.93\). For larger values of \(p>11\), \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\) and \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}(p)\) exhibit substantial similarity.

It appears that as n increases, the accuracy of \(\hat{\Gamma }^{time}_{\textbf{Y}\rightarrow \textbf{X}}(p)\) improves. The \(5\%\) significance level seems consistent across all n values, with the main enhancement lying in the power to detect causality.

In conclusion, the appropriate selection of \(\tau\) exhibits minimal variation with changes in n and p. In all cases, opting for \(\tau \approx 0.9\) appears reasonable, unless p becomes exceedingly large.

Fig. 5

The figures depict the performance of our methodology for various \(\tau (500,p)\) selections in Model 3. In the left figure, we illustrate the percentage of instances where we accurately infer the relation \(\textbf{X}\rightarrow \textbf{Y}\). The right figure represents the percentage of cases in which we correctly identify the absence of the relation \(\textbf{Y}\not \rightarrow \textbf{X}\). If \(\tau\) is large, we rarely find the correct relation \(\textbf{X}\rightarrow \textbf{Y}\), but we almost always correctly find \(\textbf{Y}\not \rightarrow \textbf{X}\). If \(\tau\) is small, we almost always find the correct relation \(\textbf{X}\rightarrow \textbf{Y}\), but we almost never correctly find \(\textbf{Y}\not \rightarrow \textbf{X}\). A red horizontal line marks the \(95\%\) success rate. Note that the scenario where \(p=1\) signifies an erroneous selection of the extremal delay, specifically a situation where p is smaller than the minimum delay

Fig. 6

Choice of the threshold as a function of n and fixed \(p=6\). The left figure represents the percentage of cases when we correctly inferred \(\textbf{X}\rightarrow \textbf{Y}\). The right figure represents the percentage of cases when we correctly inferred \(\textbf{Y}\not \rightarrow \textbf{X}\). A red horizontal line marks the \(95\%\) success rate

Table 4 A comparison of four methods for the causal inference on two time series models, a simple \(\textrm{VAR}(2)\) model and a more complex nonlinear model with a hidden common cause. The percentage obtained after 100 repetitions indicates how many times the estimation was correct

4.7 Comparison with state-of-the-art methods

We conduct a comparative analysis of our method with three classical techniques found in the literature. The initial approach is the classical Granger test utilizing default parameters and a significance level of \(\alpha =0.05\). The second and third approaches encompass the PCMCI and LPCMCI methods, both employing default parameters and featuring a robust partial correlation independence test. For comprehensive insights into their implementations, please refer to the supplementary materials. Our method corresponds to estimating \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(3)\) and concluding that \(\textbf{X}\) causes \(\textbf{Y}\) if and only if \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(3)\ge \tau\) with the choice \(\tau =0.9\).

We examine two distinct models. The first is the VAR Model 1, characterized by \(\delta =0.5\) and noise following a \(\textrm{Pareto}(1,1)\) distribution. Notably, this noise has an infinite expected value, which can pose challenges for classical methodologies. The second is a more complex NAAR model with a hidden confounder, defined below.

Model 4

Let \((\textbf{X}, \, \textbf{Y}, \, \textbf{Z})^\top\) follow the \(\textrm{hNAAR}(3,1)\) model

$$\begin{aligned} Z_t&=0.5Z_{t-1}+\varepsilon _t^Z,\\ X_t&=0.5X_{t-1}+ 0.5Z_{t-2} +\varepsilon _t^X,\\ Y_t&=0.5Y_{t-1}+ 0.5Z_{t-1}+ f_X(X_{t-3})+\varepsilon _t^Y, \end{aligned}$$

with independent noise variables \(\varepsilon _t^X, \varepsilon _t^Y, 2\varepsilon _t^Z\sim \textrm{Pareto}(1,1)\) and \(f_X(x) = x^{\frac{3}{4}}\mathbb {1}(x>50)\).

The function \(f_X\) in Model 4 represents a case when the causality presents itself only in the tail, not in the body of the distribution.
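Model 4 is straightforward to simulate. The sketch below implements the recursions verbatim; the burn-in period is our own addition, used to wash out the zero initial conditions.

```python
import numpy as np

def simulate_model4(n, burn=200, rng=None):
    """Simulate Model 4: Z is a hidden common cause of X and Y, and X causes
    Y only through the tail function f_X (active only when X exceeds 50)."""
    rng = np.random.default_rng(rng)
    m = n + burn
    # Classical Pareto(1,1) noise: numpy's pareto() samples a Lomax, so shift by 1.
    eX = rng.pareto(1.0, m) + 1.0
    eY = rng.pareto(1.0, m) + 1.0
    eZ = (rng.pareto(1.0, m) + 1.0) / 2.0      # so that 2*eps_Z ~ Pareto(1,1)
    f_X = lambda x: x ** 0.75 * (x > 50)       # causality present only in the tail
    Z, X, Y = np.zeros(m), np.zeros(m), np.zeros(m)
    for t in range(3, m):
        Z[t] = 0.5 * Z[t - 1] + eZ[t]
        X[t] = 0.5 * X[t - 1] + 0.5 * Z[t - 2] + eX[t]
        Y[t] = 0.5 * Y[t - 1] + 0.5 * Z[t - 1] + f_X(X[t - 3]) + eY[t]
    return X[burn:], Y[burn:], Z[burn:]
```

Because \(f_X\) vanishes below 50, any method that looks only at the bulk of the distribution sees \(\textbf{X}\) and \(\textbf{Y}\) linked solely through the confounder \(\textbf{Z}\).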

We conduct simulations on the aforementioned time series, considering two data set sizes: \(n=500\) and \(n=5000\). Subsequently, employing the four methodologies, we compute the count of simulations (out of 100) where the causal relationships \(\textbf{X}\rightarrow \textbf{Y}\) and \(\textbf{Y}\not \rightarrow \textbf{X}\) are accurately inferred. The outcome percentages are presented in Table 4.

The results suggest that state-of-the-art methods behave well within the context of the simple model. Although these methods inherently assume finite expected values, it appears that the \(\textrm{Pareto}(1,1)\) noise distribution does not introduce significant challenges. Our method demonstrates satisfactory performance but exhibits slightly reduced power when compared with other approaches.

On the other hand, state-of-the-art methods encounter difficulties when confronted with the more complex model. PCMCI and LPCMCI frequently miscalculate the directionality, erroneously estimating that \(\textbf{Y}\) causes \(\textbf{X}\). This misjudgment arises from the presence of a hidden common cause \(\textbf{Z}\), which creates a spurious effect that even LPCMCI cannot handle well. In contrast, our method successfully discerns genuine causality from causal connections attributed to a shared cause. To conclude, the outcomes presented in Table 4 imply that our method is particularly well-suited for large sample sizes, where it outperforms state-of-the-art techniques.

5 Applications

We apply our methodology to two real datasets. The first one pertains to geomagnetic storms in the field of space weather science, while the second one is related to earth science and explores causal connections between hydro-meteorological phenomena. All data and a detailed R code are available in the supplementary package.

5.1 Space weather

In the subsequent sections, we address a problem within the realm of space weather studies. The term “space weather” refers to the variable conditions on the Sun and in space that can influence the performance of technology we use on Earth. Extreme space weather could potentially cause damage to critical infrastructures – especially the electric grid. In order to protect people and systems that might be at risk from space weather effects, we need to understand the causes of space weather.Footnote 6

Fig. 7

Space weather. The first plot represents AE (substorm index), the second one BZ (vertical component of an interplanetary magnetic field) and the last one SYM (magnetic storm index). Data were measured in 5-minute intervals for the year 2000 by NASA. The unit of measurement is nanotesla (nT)

Geomagnetic storms and substorms serve as indicators of geomagnetic activity. A substorm manifests as a sudden intensification and heightened motion of auroral arcs, potentially leading to magnetic field disturbances in the auroral zones of up to 1000 nT (nanotesla). The origin of this geomagnetic activity lies within the Sun itself. More precisely, a substantial correlation exists between this geomagnetic activity, the solar wind (a flow of charged particles emitted by the Sun), and the interplanetary magnetic field (which constitutes a portion of the solar magnetic field carried outward by the solar wind).

A fundamental challenge in this field revolves around the determination and prediction of specific characteristics: the magnetic storm index (SYM) and the substorm index (AE). It may seem that AE is a driving factor (cause) of SYM because the accumulation of successive substorms usually precedes the occurrence of magnetic storms. However, a recent article (Manshour et al. 2021) suggests otherwise. It proposes that a vertical component of the interplanetary magnetic field (BZ) serves as a shared trigger for both indices. Our approach involves applying our methodology to validate this finding and ascertain whether this causal influence becomes evident in extreme scenarios.

Our dataset comprises three distinct time series (SYM, AE, BZ), encompassing approximately \(100\,000\) measurements, each captured at 5-minute intervals over the entire year 2000. These data, in addition to being available in the supplementary package, can also be accessed online through the NASA website.Footnote 7 A visual representation of the data is presented in Fig. 7. From the nature of the data, we focus on analyzing extremes characterized by extremely small SYM, extremely large AE, and extremely small BZ (that is, we consider −SYM, \(+\)AE, −BZ and compare maxima). Given the inherent characteristics of the data, an appropriate delay will be smaller than \(p=24\) (2 hours).

First, we discuss whether the assumptions for our method are fulfilled. We estimate the tail indexes of our data.Footnote 8 The results are as follows: SYM has the estimated tail index \(0.25 \,\,(0.015, 0.5)\), AE has \(0.18 \,\,(0.08, 0.28)\) and BZ has \(0.30 \,\,(0.12, 0.46)\). Therefore, the assumption of regular variation with a common tail index appears reasonable. Furthermore, none of the confidence intervals contain zero, although SYM comes quite close; this suggests that our time series can be regarded as regularly varying. We also require stationarity of our time series, which seems to hold: the variables are believed to contain neither a seasonal pattern nor a trend (at least not on a horizon of a few years), and the Dickey–Fuller test also suggests stationarity, although tests for stationarity are known to be unreliable (Dickey 2005). In summary, all the assumptions seem to be reasonably satisfied.
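Footnote 8 details the estimation procedure used for these tail indexes; as a hedged stand-in, the classical Hill estimator of the extreme value index can be sketched as follows (the number of order statistics k is a tuning parameter, and the choice below is illustrative only).

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator of the extreme value index, based on the k largest
    order statistics of the (positive) sample x."""
    x = np.sort(np.asarray(x, dtype=float))
    return np.mean(np.log(x[-k:])) - np.log(x[-k - 1])

# Sanity check on simulated Pareto(1,1) data, whose extreme value index is 1.
rng = np.random.default_rng(1)
sample = rng.pareto(1.0, 100_000) + 1.0
xi = hill_estimator(sample, k=1000)
```

Confidence intervals such as those quoted above can then be obtained, for instance, from the asymptotic normality of the Hill estimator.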

Lastly, we proceed to calculate the causal tail coefficient across varying extreme delays p and a range of considered extremes k. The numerical outcomes are depicted in Fig. 8. These findings impart the following insights:

  • BZ and SYM have strong asymmetry (\(\hat{\Gamma }_{BZ\rightarrow SYM}^{time}(p)\approx 1\), and \(\hat{\Gamma }_{SYM\rightarrow BZ}^{time}(p)\ll 1\), see blue lines),

  • BZ and AE have an asymmetry (\(\hat{\Gamma }_{BZ\rightarrow AE}^{time}(p)\approx 1\), and \(\hat{\Gamma }_{AE\rightarrow BZ}^{time}(p)<1\), see green lines),

  • SYM and AE have no asymmetry (\(\hat{\Gamma }_{AE\rightarrow SYM}^{time}(p),\hat{\Gamma }_{SYM\rightarrow AE}^{time}(p)<1\) and both \(\hat{\Gamma }_{SYM\rightarrow AE}^{time}(p)\), \(\hat{\Gamma }_{AE\rightarrow SYM}^{time}(p)\) are very similar, see brown lines).

Fig. 8

The first figure represents all values of \(\hat{\Gamma }^{time}_{\cdot \rightarrow \cdot }(p)\) across a range of extreme delays \(p\in [1, 24]\) with k set to \(k=\sqrt{n}\), for all possible pairs of: SYM (magnetic storm index), AE (substorm index) and BZ (interplanetary magnetic field). An evident asymmetry in the causal influence between the time series BZ-SYM and BZ-AE is visible from the plot. The second figure represents all values of \(\hat{\Gamma }^{time}_{\cdot \rightarrow \cdot }(1)\) for different numbers of extremes k. The red vertical line corresponds to \(k=\sqrt{n}\)

These results align with the hypothesis that BZ is a common cause of SYM and AE, with no causal relation between them. Note that our methodology has the capability to accommodate scenarios involving a common cause, at least in theory. While conventional methods might suggest a causal relationship from AE to SYM, our analysis reveals instances of extreme events where AE exhibits extremity, yet SYM does not.

5.2 Hydrometeorology

Numerous significant phenomena have an impact on weather and climate, even though their importance and effects are not yet well understood. Within this application, we focus on a selection of the most intriguing phenomena to explore potential causal relationships among them. By delving into these relationships, we aim to gain insights into the intricate interplay among these variables, contributing to a deeper understanding of their influence on weather and climate dynamics.

  • El Niño Southern Oscillation (ENSO) is a climate pattern that describes the unusual warming of surface waters in the eastern tropical Pacific Ocean. It has an impact on ocean temperatures, the speed and strength of ocean currents, the health of coastal fisheries, and local weather from Australia to South America and beyond.

  • North Atlantic Oscillation (NAO) is a weather phenomenon describing fluctuations in the difference of atmospheric pressure over the North Atlantic Ocean.

  • Indian Dipole Index (DMI, sometimes referred to as the Dipole Mode Index) describes the difference between sea surface temperature anomalies in two specific regions of the tropical Indian Ocean.

  • Pacific Decadal Oscillation (PDO) is a recurring pattern of ocean–atmosphere climate variability centred over the mid-latitude Pacific basin.

  • East Asian Summer Monsoon Index (EASMI) is defined as an area-averaged seasonally dynamically normalized seasonality at 850 hPa within the East Asian monsoon domain.

  • Amount of rainfall in a region in India (AoR) is a data set recording the amount of rainfall in a region in the centre of India.

For all six variables, we possess monthly measurements starting from 1 January 1953. Following preliminary data manipulation, such as mitigating seasonality and ensuring stationarity (for comprehensive specifics, please refer to the supplementary package), we turn our attention to assessing the assumption of heavy-tailedness.
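The exact preprocessing is documented in the supplementary package; as an assumed illustration only, a minimal way to mitigate seasonality in monthly data is to subtract each calendar month's mean:

```python
import numpy as np

def deseasonalize_monthly(x, period=12):
    """Remove a monthly seasonal cycle by subtracting, from every
    observation, the mean of its calendar month."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for m in range(period):
        out[m::period] -= x[m::period].mean()  # center each calendar month
    return out
```

After this step, every calendar month has zero mean, so annual cycles no longer masquerade as temporal dependence between the series.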

It seems that the assumption of identical tail indexes across all time series does not hold. Employing the same methodology as in the preceding subsection, we calculate the estimated tail indexes along with their corresponding confidence intervals, as presented in Table 5. It is evident that ENSO and AoR exhibit notably larger tail indexes in comparison to the other variables. On the other hand, variables NAO, DMI, PDO, and EASMI appear to share similar tail indexes (with NAO possibly having a slightly smaller tail index). Still, we will proceed keeping in mind that these relations between the first and second group may be misleading.

Finally, we proceed to compute the causal tail coefficients for each pair of time series. In this application, we conclude that \(\textbf{X}\) causes \(\textbf{Y}\) if and only if both \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}(p)>0.9\) and \(\hat{\Gamma }_{\textbf{Y}\rightarrow \textbf{X}}^{time}(p)<0.8\). This selection is motivated by the outcomes discussed in Section 4.5. Simulation results indicate that if both \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}(p), \hat{\Gamma }_{\textbf{Y}\rightarrow \textbf{X}}^{time}(p)\) are close to 1, we cannot distinguish between the case when \(\textbf{X}\leftrightarrow \textbf{Y}\) and the case when a confounder with heavier tails causes both \(\textbf{X}\) and \(\textbf{Y}\). In this application, the absence of a confounder with heavier tails cannot be assumed, given the distinct tail indexes observed in the relevant time series. Hence, if both \(\hat{\Gamma }_{\textbf{X}\rightarrow \textbf{Y}}^{time}(p), \hat{\Gamma }_{\textbf{Y}\rightarrow \textbf{X}}^{time}(p)\) are large, we cannot conclude any causal relation; we are therefore only interested in asymmetric extremal behavior. We chose the threshold 0.9 following the reasoning discussed in Section 4.6. A similar line of thought guides the choice of the threshold 0.8: a suitable threshold ought to reside within the interval (0.7, 0.9), given that most coefficients \(\hat{\Gamma }_{\textbf{Y}\rightarrow \textbf{X}}^{time}(p)\) from simulations (and the results of this application in Table 6) lie within this interval. Furthermore, the results remain the same if we choose any threshold in the interval (0.77, 0.86), since no arrows in Fig. 9 would change. Consequently, the chosen threshold logically fits within this gap in the computed coefficients.
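The two-threshold rule just described can be written as a small helper; the symmetric branch for the reverse direction is our natural extension of the rule.

```python
def extremal_cause(gamma_xy, gamma_yx, upper=0.9, lower=0.8):
    """Decision rule of this application: declare a causal direction only
    when the coefficient is high in one direction and clearly lower in the
    other, so symmetric (possibly confounded) pairs yield no conclusion."""
    if gamma_xy > upper and gamma_yx < lower:
        return "X -> Y"
    if gamma_yx > upper and gamma_xy < lower:
        return "Y -> X"
    return "no conclusion"
```

For instance, the pair DMI and NAO from Table 6, with coefficients 0.89 and 0.83, yields no conclusion under this rule.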

We opt to use an extremal delay of \(p=12\), equivalent to one year, for all pairs in our analysis.Footnote 9 The resulting causal relationships are outlined in Table 6. Subsequently, upon constructing the corresponding graph, we identify specific extremal causal relationships, as depicted in Fig. 9. Here, directed arrows indicate the pairs with an apparent asymmetry in the corresponding \(\hat{\Gamma }^{time}(p)\) coefficients (bold numbers in Table 6). The red arrows correspond to a pair of time series with significantly different tail indexes.

Concluding that the discovered causal relations are surely true would be a very bold statement. These highly complex datasets may not conform to a straightforward time series model, and their tail indexes differ. Additionally, the absence of certain causal relationships does not imply their nonexistence; for instance, a causal connection might exist without exerting significant influence on extreme events. While this methodology does not offer definitive conclusions, it does identify asymmetries in extreme behavior and provides a preliminary notion of directions that might warrant further exploration.

Table 5 Estimations of the tail indexes and their \(95\%\) confidence intervals
Table 6 Estimation of the causal tail coefficient for time series for each pair of variables. Concerning the order of subscripts, the column comes first, then the row. For example, \(\hat{\Gamma }_{DMI\rightarrow NAO}^{time}(p)=0.89\) and \(\hat{\Gamma }_{NAO\rightarrow DMI}^{time}(p)=0.83\)
Fig. 9

The estimated causal directions, determined using the causal tail coefficient for time series, involve the following variables: AoR (Amount of rainfall in India), DMI (Indian Dipole Index), PDO (Pacific Decadal Oscillation), NAO (North Atlantic Oscillation), EASMI (East Asian Summer Monsoon Index) and ENSO (El Niño Southern Oscillation). The presence of arrows denotes asymmetries in extreme events. In cases where the corresponding tail indexes significantly differ (indicating that the assumptions are not satisfied), the arrow is depicted in red

As a means of comparison with existing research in the field of earth science, we offer a selection of pertinent results and articles that tend to concur with our findings. It’s important to note that this list is not exhaustive.

  • ENSO \(\rightarrow\) EASMI (Pothapakula et al. 2020) (using information theory approach), (Chan and Zhou 2005),

  • ENSO \(\rightarrow\) AoR (Sarkar et al. 2004) (although rigorous causal methodology was not used),

  • ENSO \(\rightarrow\) NAO (Mokhov and Smirnov 2006) (using a phase dynamics modeling which can give different results than looking at extremes),

  • DMI \(\rightarrow\) ENSO (Le et al. 2020),

  • DMI \(\rightarrow\) EASMI (Pothapakula et al. 2020) (using information theory approach),

  • PDO \(\rightarrow\) AoR (Sarkar et al. 2004) (although rigorous causal methodology was not used),

  • PDO \(\rightarrow\) EASMI (Chan and Zhou 2005) (although rigorous causal methodology was not used),

  • NAO is known to influence air temperature and precipitation in Europe, Northern America and a part of Asia (Wang et al. 2019). However, whether there are relations between NAO and the other investigated variables is not yet known.

Our findings align with the majority of results found in the literature. However, there are three exceptions worth noting: our results suggest DMI \(\rightarrow\) PDO (we could not find any support for this causal relation in the literature), and our results do not suggest DMI \(\rightarrow\) ENSO and ENSO \(\rightarrow\) NAO, although these causal relations seem to hold (Mokhov and Smirnov 2006; Le et al. 2020).

6 Conclusion

Causal inference in time series presents a recurring challenge within the literature. In numerous real-world scenarios, the causal mechanisms become apparent during extreme periods rather than within the bulk of the distribution. We introduce an innovative method that leverages extreme events for detecting causal relationships. This method is formalized within a mathematical framework, and several of its properties are systematically demonstrated under specific model assumptions.

This work sheds light on some connections between causality and extremes. Many scientific disciplines use causal inference as a baseline of their work. A method that can detect a causal direction in complex heavy-tailed datasets can be very useful in such domains.

This subject offers an extensive array of prospects for future research. For instance, what is the underlying distribution of \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\)? Could we devise a better (and consistent) statistic that mitigates the negative bias inherent in \(\hat{\Gamma }^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p)\)? Is there a means to establish a rigorous testing methodology for the hypothesis \(\Gamma ^{time}_{\textbf{X}\rightarrow \textbf{Y}}(p) = 1\)? Furthermore, can we incorporate observed covariates into the causal inference between \(\textbf{X}\) and \(\textbf{Y}\), moving beyond reliance solely on pairwise discovery? Does this method retain its efficacy even when applied to light-tailed time series? Can a similar approach be adapted to a more generalized framework? These questions open promising avenues for future research.