Article

Bootstrapping Not Independent and Not Identically Distributed Data

1 Department of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, Charles University, Sokolovská 49/83, 18675 Prague, Czech Republic
2 Department of Statistical Modelling, Institute of Computer Science, Czech Academy of Sciences, Pod Vodárenskou věží 271/2, 18207 Prague, Czech Republic
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(24), 4671; https://doi.org/10.3390/math10244671
Submission received: 15 November 2022 / Revised: 1 December 2022 / Accepted: 6 December 2022 / Published: 9 December 2022
(This article belongs to the Special Issue Probability Distributions and Their Applications)

Abstract: Classical normal asymptotics can bring serious pitfalls into statistical inference, because some parameters appearing in the limit distributions are unknown and, moreover, complicated to estimate (from a theoretical as well as a computational point of view). Consequently, plenty of stochastic approaches for constructing confidence intervals and testing hypotheses cannot be applied directly. The bootstrap seems to be a plausible alternative. A methodological framework for bootstrapping not independent and not identically distributed data is presented, together with theoretical justification of the proposed procedures. Among others, bootstrap laws of large numbers and central limit theorems are provided. The developed methods are utilized in insurance and psychometry.

1. Introduction and Motivation

Stochastic inference frequently relies on the limiting normal distribution, which characterizes the large sample (asymptotic) behavior of the considered statistics. However, some quantities of the asymptotic distribution, such as the variance-covariance matrix, quite often depend on parameters or other quantities that cannot be estimated directly from the data; moreover, such estimation may be intractable both theoretically and computationally. Bootstrap methods, a class of resampling techniques, serve as suitable alternatives. Additional problems arise when the underlying data do not form a random sample, simply because they cannot be considered independent or identically distributed. Thus, the bootstrap techniques developed for independent and identically distributed (IID) observations need to be adjusted and tailored to not independent and not identically distributed (NINID) data. The main goal of this paper is to postulate a theoretical setup and a methodological framework, which enable us to show how the bootstrap procedures should be accommodated for NINID observations and, simultaneously, to formally justify the validity of such approaches. It is a prerequisite to realize that there are two sources of randomness when bootstrapping: the first comes from the data themselves, while the second is brought in by random resampling. Therefore, we need to assess, quantify, and control both sources.
However, it is not sufficient to give just the algorithmic procedures for bootstrapping the parameters of interest. It is also desirable to possess fundamental mathematical guarantees that a particular bootstrap technique provides reasonable results. A possible way to establish such validity is to formulate proper stochastic convergences, assuring that the bootstrapped parameter is distributionally close to the estimator of the unknown parameter of interest.

1.1. State of the Art

The bootstrap was introduced by Efron in [1]. This theoretical concept was subsequently investigated by [2] in the case of independent and identically distributed data. The monographs [3,4] provide a first-course overview of the bootstrap. A comprehensive theoretical summary of bootstrap methods can be found in [5]. Various types of dependent bootstrap procedures have been proposed, and [6] provided an extensive summary. Subsampling within the bootstrap framework was given by [7]. The block bootstrap approach was independently suggested by [8,9] for the case of the sample mean. References [10,11] extended these results, but still considered only strictly stationary processes. Reference [12] generalized the previous approaches to non-stationary processes. Bootstrapping panel data with non-stationary, heteroscedastic, and outlying observations was elaborated by [13]. The permutation bootstrap for exchangeable observations was summarized by [14]. An application of the dependent block bootstrap approach to the regression setup was performed in [15,16]. In contrast to these theoretical asymptotic results, the practical finite sample properties of the dependent bootstrap were studied by [17,18,19].

1.2. Structure of the Paper

The organization of the paper is as follows. The bootstrap procedures for IID and NINID data are summarized in Section 2. Consequently, Section 3 postulates asymptotic relations and closeness in the bootstrap world. Various bootstrap laws of large numbers are derived in Section 4. The validity of the considered bootstrap methods is guaranteed by the bootstrap central limit theorems provided in Section 5. Practical applications to problems from insurance and psychometry are shown in Section 6. Conclusions are drawn in Section 7, while all the proofs are collected in Appendix A.

2. Bootstrap Methods for NINID Data

First, a brief summary of bootstrapping IID data is provided. After that, an extension to NINID observations is given.

2.1. Independent Bootstrap

The independent bootstrap requires no distributional assumptions; however, independence of the input multivariate observations $\{\xi_i\}_{i=1}^n$ is assumed while the resampling is performed. It is also often called the nonparametric bootstrap and refers to the simplest scheme of independent resampling from the original observations $[\xi_1, \ldots, \xi_n]$. The main idea behind the independent bootstrap lies in resampling the independent column data $\xi_i$'s with replacement in order to obtain the bootstrapped data $\xi_i^*$. Then, an entity of interest is calculated from the new "starred" data $\xi_i^*$'s, e.g., a test statistic or an estimator of an unknown parameter. It is desired that the new distribution of the bootstrapped entity mimics the original distribution of the concerned statistic. From now on, we concentrate on the bootstrap for the mean parameter. However, the presented approaches can be generalized to other parameters of interest as well, although this goes beyond the scope of this paper.
Assuming identical distribution and finite variance of the $\xi_i$'s, the central limit theorem holds for the sample mean
$$\bar\xi_n := \frac{1}{n}\sum_{i=1}^n \xi_i.$$
Thus, $\sqrt{n}(\bar\xi_n - \mathsf{E}\xi_1)$ has an asymptotic multivariate normal distribution. The bootstrapped version of $\bar\xi_n$ becomes
$$\bar\xi_n^* := \frac{1}{n}\sum_{i=1}^n \xi_i^*.$$
Afterwards, it is necessary to compare the distributions of $\sqrt{n}(\bar\xi_n - \mathsf{E}\xi_1)$ and $\sqrt{n}(\bar\xi_n^* - \bar\xi_n)$ in a proper mathematical way, to be sure that the empirical distribution of the bootstrap estimate $\bar\xi_n^*$ of the mean can be used instead of the distribution of $\bar\xi_n$. The asymptotic closeness of $\sqrt{n}(\bar\xi_n - \mathsf{E}\xi_1)$ and $\sqrt{n}(\bar\xi_n^* - \bar\xi_n)$ will be clarified and proved later on in Section 5. An algorithm for the independent bootstrap is shown in Procedure 1 and its validity will be proved in Theorem 6.
Procedure 1 Independent bootstrap for the sample mean.
Input: Data consisting of $n$ IID vectors of observations $\xi_i$.
Output: Empirical bootstrap distribution of $\bar\xi_n$, i.e., the empirical distribution where the probability mass $1/B$ concentrates at each of ${}^{(1)}\bar\xi_n^*, \ldots, {}^{(B)}\bar\xi_n^*$.
1: calculate the sample mean $\bar\xi_n$
2: for $b = 1$ to $B$ do  // repeat in order to obtain the empirical distribution of $\bar\xi_n$
3:  $[{}^{(b)}\xi_1^*, \ldots, {}^{(b)}\xi_n^*] \leftarrow$ resampled with replacement from the columns of $[\xi_1, \ldots, \xi_n]$
4:  re-calculate ${}^{(b)}\bar\xi_n^* \leftarrow n^{-1}\sum_{i=1}^n {}^{(b)}\xi_i^*$
5: end for
It is silently supposed that the bootstrapped sample is of the same size as the original one, i.e., $\{\xi_i\}_{i=1}^n$ and $\{\xi_i^*\}_{i=1}^n$. Generally, one may consider resampled bootstrapped data $\{\xi_i^*\}_{i=1}^m$ consisting of $m$ resamples from the data $\{\xi_i\}_{i=1}^n$ of sample size $n$. Thereby, an additional condition needs to be postulated on the rate of the sample sizes:
$$m = O(n),\ n\to\infty \quad\text{and}\quad n = O(m),\ m\to\infty.$$
There is, however, no need to distinguish between the case of the same sample size and the case of two different sample sizes (the original one and the bootstrap one), which are asymptotically equivalent, at least from the theoretical asymptotic point of view. There could nevertheless be some computational improvements when considering various sample sizes of the resampled data, which is not the target of this research. A minimal sketch of Procedure 1 in code follows.
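The following Python sketch implements Procedure 1 under the assumption that the observations are stored as the rows of an array `xi` (whereas the paper stacks them as columns of $[\xi_1, \ldots, \xi_n]$); the function name, the default `B`, and the use of NumPy are illustrative choices, not part of the paper.

```python
import numpy as np

def independent_bootstrap_mean(xi, B=1000, rng=None):
    """Procedure 1: return a (B x p) array of bootstrapped sample means."""
    rng = np.random.default_rng() if rng is None else rng
    xi = np.asarray(xi, dtype=float).reshape(len(xi), -1)  # n rows = observations
    n = xi.shape[0]
    means = np.empty((B, xi.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # resample indices with replacement
        means[b] = xi[idx].mean(axis=0)    # re-calculate the "starred" sample mean
    return means
```

The empirical distribution of the returned rows then places mass $1/B$ at each bootstrapped mean, exactly as in the output of Procedure 1.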

2.2. Moving Block Bootstrap

Intuitively, a problem arises when considering dependent and/or not identically distributed observations. A generalization of the independent bootstrap (case sampling with replacement), the block bootstrap, can be used. Instead of sampling individual cases, blocks of adjacent observations are resampled with replacement. Stacking adjacent cases together into one solid block partly preserves the dependence between consecutive observations. Since "weak" dependence can be seen as asymptotic independence, the blocks can be resampled independently. This is a way to ensure that the dependence between faraway observations vanishes.
A plethora of block bootstrap techniques have been suggested; a comprehensive summary is provided by [6]. In general, the way of drawing the blocks of observations defines the difference among the block bootstrap versions. In the non-overlapping block bootstrap, only blocks that do not overlap are resampled. Since some observations are not allowed to be joined into the same block, this approach is less efficient for estimation [20]. Therefore, the moving block bootstrap (MBB) is considered here. The key idea of the MBB is forming each consecutive block of observations from the previous one by shifting the "stacking window" one observation ahead. The MBB technique for the sample mean of multivariate observations is described in detail in Procedure 2.
Procedure 2 Moving block bootstrap for the sample mean.
Input: Data consisting of $n$ NINID vectors of observations $\xi_i$ and $n = mb$.
Output: Empirical bootstrap distribution of the sample mean $\bar\xi_n$, i.e., the empirical distribution where the probability mass $1/D$ concentrates at each of ${}^{(1)}\bar\xi_n^*, \ldots, {}^{(D)}\bar\xi_n^*$.
1: define $\mathcal{B}_j$ as the block of $b$ consecutive $\xi_i$'s starting from $\xi_j$, that is $\mathcal{B}_j = [\xi_j, \ldots, \xi_{j+b-1}]$ for $j = 1, \ldots, q$, where $q := n - b + 1$
2: for $d = 1$ to $D$ do  // repeat in order to obtain the empirical distribution of $\bar\xi_n$
3:  resample with replacement ${}^{(d)}\mathcal{C}_1, \ldots, {}^{(d)}\mathcal{C}_m$ independently from $\{\mathcal{B}_1, \ldots, \mathcal{B}_q\}$ with equal probability $1/q$, where each ${}^{(d)}\mathcal{C}_i$, $i = 1, \ldots, m$, is a block of size $b$ with ${}^{(d)}\mathcal{C}_i = [{}^{(d)}c_{i1}, \ldots, {}^{(d)}c_{ib}]$
4:  the MBB resample of size $n$, denoted by ${}^{(d)}\xi_1^*, \ldots, {}^{(d)}\xi_n^*$, is formed by joining ${}^{(d)}\mathcal{C}_1, \ldots, {}^{(d)}\mathcal{C}_m$ into one big block, i.e., ${}^{(d)}\xi_i^* = {}^{(d)}c_{\tau\nu}$ for $\tau = \lfloor (i-1)/b \rfloor + 1$, $\nu = i - b(\tau - 1)$, and $i = 1, \ldots, n$  // ${}^{(d)}\Xi^* \equiv [{}^{(d)}\xi_1^*, \ldots, {}^{(d)}\xi_n^*]$ is called the MBB version of $\Xi \equiv [\xi_1, \ldots, \xi_n]$
5:  let the resample average be ${}^{(d)}\bar\xi_n^* \leftarrow n^{-1}\sum_{i=1}^n {}^{(d)}\xi_i^*$
6: end for
Let $\mathsf{P}^*$ be the (bootstrap) distribution of ${}^{(d)}\mathcal{C}_i$ conditional on the sample $\{\xi_1, \ldots, \xi_n\}$. So, given $\xi_1, \ldots, \xi_n$, the $m$ random blocks ${}^{(d)}\mathcal{C}_1, \ldots, {}^{(d)}\mathcal{C}_m$ are IID distributed according to $\mathsf{P}^*$. The length of the blocks, the blocksize, is denoted by $b \in \mathbb{N}$. Without loss of generality with respect to the asymptotic properties, let us suppose that $b \mid n$, i.e., there exists $m \in \mathbb{N}$ such that $n = mb$. In other words, we just neglect an integer division problem. For practical and computational purposes, if $b \nmid n$, then we can truncate the quotient $n/b$ to an integer value, see [21]. A minimal code sketch of Procedure 2 is given below.
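The following Python sketch implements Procedure 2 under the assumption $b \mid n$; as before, the rows of `xi` play the role of the observations $\xi_1, \ldots, \xi_n$, and the names and defaults are illustrative only.

```python
import numpy as np

def moving_block_bootstrap_mean(xi, b, D=1000, rng=None):
    """Procedure 2: return a (D x p) array of MBB sample means with blocksize b."""
    rng = np.random.default_rng() if rng is None else rng
    xi = np.asarray(xi, dtype=float).reshape(len(xi), -1)
    n = xi.shape[0]
    m, q = n // b, n - b + 1   # blocks per resample; number of admissible blocks B_1,...,B_q
    means = np.empty((D, xi.shape[1]))
    for d in range(D):
        starts = rng.integers(0, q, size=m)               # starting indices of C_1,...,C_m
        idx = (starts[:, None] + np.arange(b)).ravel()    # join the blocks into one resample
        means[d] = xi[idx].mean(axis=0)
    return means
```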
An extension of the MBB is a circular block bootstrap, where the observations are not ordered on a single line, but they are put into a circle. The order of the observations is preserved with the only exception that the last observation on the circle is followed by the first one. Hence, the stacking window can join the first and the last observations into one block. The application of the circular block bootstrap as an extension of the MBB is postponed for some further work and is not considered in this paper.

2.3. Blocksize

The length of the blocks (the blocksize $b$) in the MBB procedure is a crucial choice. It indeed affects the bootstrapped statistics and the consequent statistical inference. The blocksize can therefore be considered a nuisance, but also a tuning, parameter. It will be derived in the forthcoming theory that $b = o(n^{1/2})$ as $n \to \infty$. However, for practical and computational purposes, this asymptotic choice of $b$ is rather cumbersome. With respect to the simulation studies from [12], it may be concluded that the blocksize choice $b = O(n^{1/3})$ as $n \to \infty$ could be asymptotically optimal according to the minimal mean square error of the moving block bootstrap variance estimator.
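As a hedged illustration of this practical rule, the blocksize used later in Section 6 can be computed as follows; the rounding convention (nearest integer, floored at one) is our own choice and is not prescribed by the asymptotic theory.

```python
def blocksize(n):
    """Practical blocksize rule b = O(n^(1/3)); rounding is an illustrative choice."""
    return max(1, round(n ** (1 / 3)))

blocksize(34)  # -> 3, the value used for both real data examples in Section 6
```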

3. Types of Bootstrap Convergences

We would like to show that $\sqrt{n}(\bar\xi_n - \mathsf{E}\xi_1)$ and $\sqrt{n}(\bar\xi_n^* - \bar\xi_n)$ asymptotically coincide. This means that using the bootstrap distribution is no worse than using the asymptotic normal approximation. However, it does not mean that the bootstrap distribution better approximates the finite sample distribution of $\sqrt{n}(\bar\xi_n - \mathsf{E}\xi_1)$. To state this mathematically, we first need some formalization. For the rest of this paper, $\|\cdot\|$ denotes any unitarily invariant matrix norm, e.g., the Frobenius matrix norm (and, in the case of vectors, the Euclidean norm).
Suppose that $\{\xi_n, \xi_n^*, \zeta_n, \zeta_n^*, \chi_n\}_{n=1}^\infty$ are sequences of random vectors/matrices whose elements are defined on a probability space $(\Omega, \mathcal{F}, \mathsf{P})$. The components of these sequences do not necessarily have to have the same dimension, e.g., $\zeta_n$ and $\zeta_{n+1}$ can have different dimensions for some $n \in \mathbb{N}$. Let us define the conditional probability given $\zeta_n$ as
$$\mathsf{P}^*_{\zeta_n}[\cdot] := \mathsf{E}_\mathsf{P}\left[\mathcal{I}(\cdot) \,\middle|\, \zeta_n\right].$$
Definition 1.
Conditional weak convergence almost surely and in probability to each other. Let $\{\xi_n, \xi_n^*, \zeta_n\}_{n=1}^\infty$ be sequences of random vectors/matrices. If for every real-valued bounded continuous function $f$ it holds that
$$\mathsf{E} f(\xi_n^*) - \mathsf{E} f(\xi_n) \xrightarrow[n\to\infty]{} 0,$$
then $\xi_n^*$ and $\xi_n$ are said to be approaching each other in distribution. In short, we write
$$\xi_n^* \xrightarrow[n\to\infty]{\mathcal{D}} \xi_n.$$
If for every real-valued bounded continuous function $f$ it holds that
$$\mathsf{E}\left[f(\xi_n^*) \,\middle|\, \zeta_n\right] - \mathsf{E} f(\xi_n) \xrightarrow[n\to\infty]{[\mathsf{P}]\text{-a.s.}} 0,$$
then $\xi_n^*$ conditioned on $\zeta_n$ and $\xi_n$ are said to be approaching each other in distribution $[\mathsf{P}]$-almost surely along $\zeta_n$. In short, we write
$$\xi_n^* \,|\, \zeta_n \xrightarrow[n\to\infty]{\mathcal{D}([\mathsf{P}]\text{-a.s.})} \xi_n.$$
If for every real-valued bounded continuous function $f$ it holds that
$$\mathsf{E}\left[f(\xi_n^*) \,\middle|\, \zeta_n\right] - \mathsf{E} f(\xi_n) \xrightarrow[n\to\infty]{\mathsf{P}} 0,$$
then $\xi_n^*$ conditioned on $\zeta_n$ and $\xi_n$ are said to be approaching each other in distribution in probability $\mathsf{P}$ along $\zeta_n$. In short, we write
$$\xi_n^* \,|\, \zeta_n \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} \xi_n.$$
In the same manner as above, we may define the distributional convergence on the "conditional" (resampled) level to a random variable $\xi_0$ ("constant" law).
Definition 2.
Conditional weak convergence almost surely and in probability to a constant law. Let $\{\xi_n^*, \zeta_n\}_{n=1}^\infty$ be sequences of random vectors/matrices and $\xi_0$ be a random vector/matrix. If for every real-valued bounded continuous function $f$ it holds that
$$\mathsf{E}\left[f(\xi_n^*) \,\middle|\, \zeta_n\right] \xrightarrow[n\to\infty]{[\mathsf{P}]\text{-a.s.}} \mathsf{E} f(\xi_0),$$
then $\xi_n^*$ conditioned on $\zeta_n$ is said to converge to $\xi_0$ in distribution $[\mathsf{P}]$-almost surely along $\zeta_n$. In short, we write
$$\xi_n^* \,|\, \zeta_n \xrightarrow[n\to\infty]{\mathcal{D}([\mathsf{P}]\text{-a.s.})} \xi_0.$$
If for every real-valued bounded continuous function $f$ it holds that
$$\mathsf{E}\left[f(\xi_n^*) \,\middle|\, \zeta_n\right] \xrightarrow[n\to\infty]{\mathsf{P}} \mathsf{E} f(\xi_0),$$
then $\xi_n^*$ conditioned on $\zeta_n$ is said to converge to $\xi_0$ in distribution in probability $\mathsf{P}$ along $\zeta_n$. In short, we write
$$\xi_n^* \,|\, \zeta_n \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} \xi_0.$$
Approaching each other in distribution is often called weakly approaching each other (almost surely or in probability along some sequence). The portmanteau lemma [22] ensures that Definition 1 is indeed appropriate, since it gives equivalent characterizations of weak convergence in distribution. To define convergence in probability $\mathsf{P}^*$, that is, convergence on the conditional level, two types are introduced: convergence in probability $\mathsf{P}$ and convergence $[\mathsf{P}]$-almost surely.
Definition 3.
Convergence in conditional probability. Let $\{\xi_n, \xi_n^*, \zeta_n\}_{n=1}^\infty$ be sequences of random vectors/matrices. To say that $\xi_n^* - \xi_n$ converges in probability $\mathsf{P}^*_{\zeta_n}$ to zero $[\mathsf{P}]$-almost surely as $n$ tends to infinity, i.e.,
$$\xi_n^* - \xi_n \xrightarrow[n\to\infty]{\mathsf{P}^*_{\zeta_n}([\mathsf{P}]\text{-a.s.})} 0,$$
means
$$\forall \epsilon > 0: \quad \mathsf{P}\left[\lim_{n\to\infty} \mathsf{P}^*_{\zeta_n}\left[\|\xi_n^* - \xi_n\| \geq \epsilon\right] = 0\right] = 1. \quad (1)$$
To say that $\xi_n^* - \xi_n$ converges in probability $\mathsf{P}^*_{\zeta_n}$ to zero in probability $\mathsf{P}$ as $n$ tends to infinity, i.e.,
$$\xi_n^* - \xi_n \xrightarrow[n\to\infty]{\mathsf{P}^*_{\zeta_n}(\mathsf{P})} 0,$$
means
$$\forall \epsilon > 0,\ \forall \tau > 0: \quad \lim_{n\to\infty} \mathsf{P}\left[\mathsf{P}^*_{\zeta_n}\left[\|\xi_n^* - \xi_n\| \geq \epsilon\right] \geq \tau\right] = 0. \quad (2)$$
Alternatively, (1) and (2) from Definition 3 can be read as
$$\forall \epsilon > 0: \quad \mathsf{P}^*_{\zeta_n}\left[\|\xi_n^* - \xi_n\| \geq \epsilon\right] \xrightarrow[n\to\infty]{[\mathsf{P}]\text{-a.s.}} 0$$
and
$$\forall \epsilon > 0: \quad \mathsf{P}^*_{\zeta_n}\left[\|\xi_n^* - \xi_n\| \geq \epsilon\right] \xrightarrow[n\to\infty]{\mathsf{P}} 0,$$
respectively.
In the same manner as above, we may define the convergence in probability $\mathsf{P}^*$ on the resampled level to a random variable $\xi_0$ (not depending on $n$).
Definition 4.
Convergence in conditional probability to a variable. Let $\{\xi_n^*, \zeta_n\}_{n=1}^\infty$ be sequences of random vectors/matrices and let $\xi_0$ be a random vector/matrix. To say that $\xi_n^*$ converges to $\xi_0$ in probability $\mathsf{P}^*_{\zeta_n}$ $[\mathsf{P}]$-almost surely as $n$ tends to infinity, i.e.,
$$\xi_n^* \xrightarrow[n\to\infty]{\mathsf{P}^*_{\zeta_n}([\mathsf{P}]\text{-a.s.})} \xi_0,$$
means that $\xi_n^* - \xi_0$ converges in probability $\mathsf{P}^*_{\zeta_n}$ to zero $[\mathsf{P}]$-almost surely as $n$ tends to infinity. To say that $\xi_n^*$ converges to $\xi_0$ in probability $\mathsf{P}^*_{\zeta_n}$ in probability $\mathsf{P}$ as $n$ tends to infinity, i.e.,
$$\xi_n^* \xrightarrow[n\to\infty]{\mathsf{P}^*_{\zeta_n}(\mathsf{P})} \xi_0,$$
means that $\xi_n^* - \xi_0$ converges in probability $\mathsf{P}^*_{\zeta_n}$ to zero in probability $\mathsf{P}$ as $n$ tends to infinity.

3.1. Properties of the Bootstrap Convergences

Important results concerning the previously defined types of convergence are summarized below and will play a crucial role later on. Prokhorov's theorem can be extended to our setup.
Lemma 1.
Assume that $\{\xi_n\}_{n=1}^\infty$ is tight. Then the following statements are equivalent:
(i) $\xi_n^* \,|\, \zeta_n \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} \xi_n$.
(ii) For each subsequence $\{n_i\}_{i=1}^\infty$ such that
$$\xi_{n_i} \xrightarrow[i\to\infty]{\mathcal{D}} \xi_0$$
for some random vector/matrix $\xi_0$,
$$\xi_{n_i}^* \,|\, \zeta_{n_i} \xrightarrow[i\to\infty]{\mathcal{D}(\mathsf{P})} \xi_0$$
too.
(iii) For each subsequence $\{n_i\}_{i=1}^\infty$ there exists a subsequence $\{n_{i_k}\}_{k=1}^\infty$ such that $\xi_{n_{i_k}}^*$ conditional on $\zeta_{n_{i_k}}$ converges in distribution in probability $\mathsf{P}$ to the distributional limit of $\xi_{n_{i_k}}$ as $k \to \infty$.
We need to extend Slutsky’s theorem for our “bootstrap world”, i.e., to have a stability property for conditional distributions.
Theorem 1.
Slutsky's extended theorem. Suppose that $\{\xi_n^*, \zeta_n^*, \chi_n\}_{n=1}^\infty$ are sequences of random vectors/matrices. Then,
$$\xi_n^* \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}([\mathsf{P}]\text{-a.s.})} \xi_0 \quad (3)$$
and
$$\zeta_n^* \xrightarrow[n\to\infty]{\mathsf{P}^*_{\chi_n}([\mathsf{P}]\text{-a.s.})} \zeta_0, \quad (4)$$
where $\xi_0$ is a random vector/matrix and $\zeta_0$ is a non-random element, imply (for suitable vector/matrix dimensions):
(i) $[\xi_n^*, \zeta_n^*] \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}([\mathsf{P}]\text{-a.s.})} [\xi_0, \zeta_0]$;
(ii) $[\zeta_n^*, \xi_n^*] \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}([\mathsf{P}]\text{-a.s.})} [\zeta_0, \xi_0]$;
(iii) $\xi_n^* + \zeta_n^* \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}([\mathsf{P}]\text{-a.s.})} \xi_0 + \zeta_0$;
(iv) $\xi_n^* \zeta_n^* \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}([\mathsf{P}]\text{-a.s.})} \xi_0 \zeta_0$;
(v) $\zeta_n^* \xi_n^* \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}([\mathsf{P}]\text{-a.s.})} \zeta_0 \xi_0$;
(vi) $(\zeta_n^*)^{-1} \xi_n^* \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}([\mathsf{P}]\text{-a.s.})} \zeta_0^{-1} \xi_0$, provided that $\zeta_n^*$ and $\zeta_0$ are invertible;
(vii) $\xi_n^* (\zeta_n^*)^{-1} \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}([\mathsf{P}]\text{-a.s.})} \xi_0 \zeta_0^{-1}$, provided that $\zeta_n^*$ and $\zeta_0$ are invertible.
Moreover,
$$\xi_n^* \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} \xi_0 \quad (5)$$
and
$$\zeta_n^* \xrightarrow[n\to\infty]{\mathsf{P}^*_{\chi_n}(\mathsf{P})} \zeta_0, \quad (6)$$
where $\xi_0$ is a random vector/matrix and $\zeta_0$ is a non-random element, imply (for suitable vector/matrix dimensions):
(viii) $[\xi_n^*, \zeta_n^*] \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} [\xi_0, \zeta_0]$;
(ix) $[\zeta_n^*, \xi_n^*] \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} [\zeta_0, \xi_0]$;
(x) $\xi_n^* + \zeta_n^* \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} \xi_0 + \zeta_0$;
(xi) $\xi_n^* \zeta_n^* \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} \xi_0 \zeta_0$;
(xii) $\zeta_n^* \xi_n^* \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} \zeta_0 \xi_0$;
(xiii) $(\zeta_n^*)^{-1} \xi_n^* \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} \zeta_0^{-1} \xi_0$, provided that $\zeta_n^*$ and $\zeta_0$ are invertible;
(xiv) $\xi_n^* (\zeta_n^*)^{-1} \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} \xi_0 \zeta_0^{-1}$, provided that $\zeta_n^*$ and $\zeta_0$ are invertible.

3.2. Weak Dependence

In order to overcome the assumption of independent observations, the dependence between the data needs to be specified. It is assumed that $\{\xi_n\}_{n=1}^\infty$ is a sequence of random elements on a probability space $(\Omega, \mathcal{F}, \mathsf{P})$. For sub-$\sigma$-fields $\mathcal{A}, \mathcal{B} \subseteq \mathcal{F}$, we define
$$\alpha(\mathcal{A}, \mathcal{B}) := \sup_{A\in\mathcal{A},\, B\in\mathcal{B}} \left|\mathsf{P}(A\cap B) - \mathsf{P}(A)\mathsf{P}(B)\right|, \qquad \varphi(\mathcal{A}, \mathcal{B}) := \sup_{A\in\mathcal{A},\, B\in\mathcal{B},\, \mathsf{P}(A)>0} \left|\mathsf{P}(B\,|\,A) - \mathsf{P}(B)\right|.$$
Intuitively, $\alpha$ and $\varphi$ measure the dependence of the events in $\mathcal{B}$ on those in $\mathcal{A}$. Henceforth, let us define the filtration $\mathcal{F}_m^n := \sigma(\xi_i,\ m \leq i \leq n)$.
There are many ways to describe weak dependence or, in other words, asymptotic independence of random variables [23]. Here, we concentrate on two approaches; however, other ways of defining dependence can be involved [24]. A sequence $\{\xi_n\}_{n=1}^\infty$ of random elements (e.g., variables) is said to be strong mixing ($\alpha$-mixing) if
$$\alpha(n) := \sup_{k\in\mathbb{N}} \alpha\left(\mathcal{F}_1^k, \mathcal{F}_{k+n}^\infty\right) \to 0, \quad n\to\infty;$$
moreover, it is said to be uniformly strong mixing ($\varphi$-mixing) if
$$\varphi(n) := \sup_{k\in\mathbb{N}} \varphi\left(\mathcal{F}_1^k, \mathcal{F}_{k+n}^\infty\right) \to 0, \quad n\to\infty.$$
Uniformly strong mixing, introduced by [25], implies strong mixing [26], which was presented by [27]. The dependence coefficients $\alpha(n)$ and $\varphi(n)$ measure how much dependence exists between events separated by at least $n$ observations or time periods [28].
In [29], a class of $m$-dependent processes was comprehensively and intensively analyzed. These types of time series are $\varphi$-mixing, as are finite order ARMA processes with innovations satisfying Doeblin's condition, ([30], p. 168) or ([31], p. 192). Hence, the ARMA processes with continuously distributed stationary innovations and bounded variance are $\varphi$-mixing and, thus, $\alpha$-mixing. Finite order processes that do not satisfy Doeblin's condition can be shown to be $\alpha$-mixing ([32], pp. 312–313). Reference [33] provides general conditions under which stationary Markov processes are $\alpha$-mixing. Since functions of mixing processes are themselves mixing [23], time-varying functions of any of the processes just mentioned are mixing as well.
It has to be emphasized that no form of stationarity of the errors is assumed. Omitting this sometimes restrictive assumption strengthens our results. It is obvious that $\alpha(\mathcal{A}, \mathcal{B}) = \alpha(\mathcal{B}, \mathcal{A})$ for arbitrary sub-$\sigma$-fields $\mathcal{A}, \mathcal{B} \subseteq \mathcal{F}$. This type of symmetry does not hold for $\varphi$-dependence. Indeed, ([33], pp. 213–214) constructed strictly stationary Markov chains that are $\varphi$-mixing but not "time-reversed" $\varphi$-mixing. Therefore, it is not possible to interchange the past with the future in the definition of the $\varphi$-mixing coefficient.
A strong law of large numbers (SLLN) for α -dependent non-identically distributed variables needs to be recalled.
Lemma 2.
Strong law of large numbers for α-mixing. Let $\{\xi_n\}_{n=1}^\infty$ be a sequence of α-mixing random variables satisfying
$$\sup_{n\in\mathbb{N}} \mathsf{E}|\xi_n|^q < \infty$$
for some $q > 1$. Suppose that there exists $\delta > 0$ such that, as $n \to \infty$,
$$\alpha(n) = \begin{cases} O\left(n^{-q/(2q-2)-\delta}\right) & \text{if } 1 < q < 2, \\ O\left(n^{-2/q-\delta}\right) & \text{if } q \geq 2. \end{cases}$$
Then
$$\lim_{n\to\infty} \frac{\sum_{i=1}^n \left(\xi_i - \mathsf{E}\xi_i\right)}{n} = 0 \quad \text{a.s.}$$
Furthermore, a SLLN for φ -dependent non-identically distributed variables is desired as well.
Lemma 3.
Strong law of large numbers for φ-mixing. Let $\{\xi_n\}_{n=1}^\infty$ be a sequence of zero mean φ-mixing random variables satisfying
$$\sum_{n=1}^\infty \varphi(n) < \infty$$
and let $\{b_n\}_{n=1}^\infty$ be a non-decreasing unbounded sequence of positive numbers. Assume that
$$\sum_{n=1}^\infty \frac{\mathsf{E}\xi_n^2}{b_n^2} < \infty.$$
Then
$$\lim_{n\to\infty} \frac{\sum_{i=1}^n \xi_i}{b_n} = 0 \quad \text{a.s.}$$
For a given sequence $\xi \equiv \{\xi_n\}_{n=1}^\infty$ of random elements, the dependence coefficients $\alpha(n)$ will be denoted by $\alpha(\xi, n)$. Analogous notation is used for $\varphi$-mixing sequences. Moreover, an auxiliary lemma for the later application of the SLLN for non-identically distributed random variables is stated. The following lemma describes the asymptotic behavior of the $\alpha$- and $\varphi$-mixing coefficients of the corresponding random sequences after a transformation. More precisely, a Borel transformation preserves the property of $\alpha$- and $\varphi$-mixing and, moreover, sustains the rate of the mixing coefficients.
Lemma 4.
Suppose that for each $m = 1, 2, \ldots$, $\xi^{(m)} := \{\xi_k^{(m)}\}_{k\in\mathbb{Z}}$ is a sequence of random variables. Suppose the sequences $\xi^{(m)}$, $m = 1, 2, \ldots$ are independent of each other. Suppose that for each $k \in \mathbb{Z}$, $h_k: \mathbb{R}\times\mathbb{R}\times\cdots \to \mathbb{R}$ is a Borel function. Define the sequence $\xi := \{\xi_k\}_{k\in\mathbb{Z}}$ of random variables by
$$\xi_k := h_k\left(\xi_k^{(1)}, \xi_k^{(2)}, \ldots\right), \quad k\in\mathbb{Z}.$$
Then for each $n \geq 1$, the following statements hold:
(i) $\alpha(\xi, n) \leq \sum_{m=1}^\infty \alpha(\xi^{(m)}, n)$;
(ii) $\varphi(\xi, n) \leq \sum_{m=1}^\infty \varphi(\xi^{(m)}, n)$.
Let $S_n := \sum_{i=1}^n \xi_i$ and $\varsigma_n^2 := \mathrm{Var}\,S_n$. For a random element of the Skorokhod space $D[0,1]$,
$$W_n(t) := \frac{S_{[nt]}}{\varsigma_n}, \quad 0 \leq t \leq 1,$$
where $[\cdot]$ denotes the integer part function, a functional central limit theorem, also called a weak invariance principle, can be applied. This principle, now for α-mixing variables, will be postulated.
Lemma 5.
WIP for α-mixing. Let $\{\xi_n\}_{n=1}^\infty$ be a sequence of zero mean α-mixing random variables with
$$\sup_{n\in\mathbb{N}} \mathsf{E}|\xi_n|^{2+\omega} < \infty$$
and
$$\sum_{n=1}^\infty \alpha(n)^{\omega/(2+\omega)} < \infty$$
for some $\omega > 0$. Suppose that
$$\frac{\mathsf{E}S_n^2}{n} \to \varsigma^2 > 0, \quad n\to\infty,$$
is satisfied. Then
$$W_n \xrightarrow[n\to\infty]{\mathcal{D}[0,1]} W,$$
where $W$ stands for the standard Wiener process.
Since the central limit theorem is just a special case of the weak invariance principle, a corollary of Lemma 5 can be stated.
Corollary 1.
Central limit theorem for α-mixing. Suppose that all the assumptions of Lemma 5 on a sequence of zero mean α-mixing random variables $\{\xi_n\}_{n=1}^\infty$ are satisfied. Then
$$\frac{S_n}{\varsigma_n} \xrightarrow[n\to\infty]{\mathcal{D}} \mathcal{N}(0, 1).$$
Lemma 6.
Lindeberg central limit theorem for φ-mixing. Let $\{\xi_n\}_{n=1}^\infty$ be a sequence of zero mean φ-mixing random variables having finite variance. Suppose that the Lindeberg condition
$$\forall \delta > 0: \quad \lim_{n\to\infty} \frac{1}{\varsigma_n^2}\sum_{i=1}^n \mathsf{E}\left[\xi_i^2\, \mathcal{I}\{|\xi_i| > \delta\varsigma_n\}\right] = 0 \quad (19)$$
is satisfied. Then
$$\frac{S_n}{\varsigma_n} \xrightarrow[n\to\infty]{\mathcal{D}} \mathcal{N}(0, 1).$$
The Lindeberg condition (19) can be replaced by a stronger Lyapunov-type condition. This fact leads to the following corollary, which is more convenient from the point of view of applicability.
Corollary 2.
Central limit theorem for φ-mixing. Let $\{\xi_n\}_{n=1}^\infty$ be a sequence of zero mean φ-mixing random variables such that
$$\sup_{n\in\mathbb{N}} \mathsf{E}|\xi_n|^{2+\omega} < \infty \quad (20)$$
for some $\omega > 0$ and
$$\frac{\mathsf{E}S_n^2}{n} \to \varsigma^2 > 0, \quad n\to\infty. \quad (21)$$
Then
$$\frac{S_n}{\varsigma_n} \xrightarrow[n\to\infty]{\mathcal{D}} \mathcal{N}(0, 1).$$
Assumption (21) may even be replaced by a weaker one:
$$\liminf_{n\to\infty} \frac{\mathsf{E}S_n^2}{n} = \varsigma^2 > 0,$$
where the limit inferior is used instead of the original limit. The assumption that a sequence of random variables is φ-mixing implies that this sequence is α-mixing. On the other hand, the central limit theorem for φ-mixing (Corollary 2) has weaker assumptions than the central limit theorem for α-mixing (Corollary 1). Indeed, Corollary 2 does not require any assumption on the mixing rate $\varphi(n)$ such as Assumption (15) on the α-mixing rates.
The previous theoretical results contain the variance $\varsigma_n^2$ of the underlying process of random variables. One may additionally assume that
$$\lim_{n\to\infty} \frac{\varsigma_n^2}{n} = \varsigma^2,$$
where $\varsigma^2$ is the so-called long run variance, which sometimes needs to be estimated. If the $\{\xi_n\}_{n=1}^\infty$ are zero mean and stationary, then this long run variance can be decomposed in the following way:
$$\varsigma^2 = \mathsf{E}\xi_1^2 + 2\sum_{i=1}^\infty \mathsf{E}\xi_1\xi_{i+1}.$$
Often, the Bartlett estimator is used to estimate the long run variance, i.e.,
$$\hat\varsigma_n^2(m) = \hat{R}(0) + 2\sum_{1\leq k\leq m}\left(1 - \frac{k}{m}\right)\hat{R}(k), \quad m < n,$$
where
$$\hat{R}(k) = \frac{1}{n}\sum_{1\leq i\leq n-k}\left(\xi_i - \bar\xi_n\right)\left(\xi_{i+k} - \bar\xi_n\right), \quad 0 \leq k < n.$$
The consistency properties of the above described Bartlett estimator and of its modification are studied in [34]. Other similar types of estimators can be used instead, for instance, Parzen kernels [35].
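A direct Python transcription of the Bartlett estimator defined above may be sketched as follows; `xi` is a one-dimensional array of observations and `m < n` is the bandwidth (the term $k = m$ carries zero weight, so the sum effectively runs over $k = 1, \ldots, m-1$).

```python
import numpy as np

def bartlett_lrv(xi, m):
    """Bartlett estimator of the long run variance with bandwidth m < n."""
    xi = np.asarray(xi, dtype=float)
    n = xi.size
    xc = xi - xi.mean()                      # center by the sample mean

    def R_hat(k):                            # empirical autocovariance R^(k)
        return np.dot(xc[: n - k], xc[k:]) / n

    return R_hat(0) + 2 * sum((1 - k / m) * R_hat(k) for k in range(1, m))
```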

4. Bootstrap Laws of Large Numbers

A theoretical intermediate step for showing the validity of the bootstrap methods is constituted by the bootstrap laws of large numbers. From now on, we mean by a bootstrap version of $\xi \equiv [\xi_1, \ldots, \xi_n]$ its (randomly) resampled sequence with replacement, denoted by $\xi^* \equiv [\xi_1^*, \ldots, \xi_n^*]$, of the same length, where for each $i \in \{1, \ldots, n\}$ it holds that $\mathsf{P}^*_\xi[\xi_i^* = \xi_j] = 1/n$, $j = 1, \ldots, n$. So, $\xi_i^*$ has a discrete uniform distribution on $\{\xi_1, \ldots, \xi_n\}$ for every $i = 1, \ldots, n$.

4.1. Bootstrap Weak LLN for Independent Data

First of all, a cornerstone of the consistency of the independent bootstrap lies in the bootstrap weak law of large numbers (BWLLN).
Theorem 2.
Bootstrap weak law of large numbers. Let $\{\xi_n\}_{n=1}^\infty$ be a sequence of independent random variables. If
$$\sup_{n\in\mathbb{N}} \mathsf{E}_\mathsf{P}\xi_n^2 < \infty, \quad (23)$$
then
$$n^{-1}\sum_{i=1}^n \xi_i^* - n^{-1}\sum_{i=1}^n \xi_i \xrightarrow[n\to\infty]{\mathsf{P}^*_\xi(\mathsf{P})} 0,$$
where $\xi^* \equiv [\xi_1^*, \ldots, \xi_n^*]$ is the bootstrapped version of $\xi \equiv [\xi_1, \ldots, \xi_n]$.

4.2. Bootstrap Weak LLNs for NINID

Now, the BWLLN for non-stationary $\alpha$-mixing sequences is given.
Theorem 3.
Bootstrap weak law of large numbers for α-mixing. Let $\{\xi_n\}_{n=1}^\infty$ be a sequence of zero mean α-mixing random variables satisfying
$$\sup_{n\in\mathbb{N}} \mathsf{E}\xi_n^2 < \infty. \quad (25)$$
Assume that there exists $\delta > 0$ such that
$$\alpha(n) = O\left(n^{-1-\delta}\right), \quad n\to\infty.$$
If $b \to \infty$ and $b = o(n^{1/2})$ as $n \to \infty$, then, under the MBB Procedure 2,
$$n^{-1}\sum_{i=1}^n \xi_i^* - n^{-1}\sum_{i=1}^n \xi_i \xrightarrow[n\to\infty]{\mathsf{P}^*_\xi(\mathsf{P})} 0,$$
where $\xi^* \equiv [\xi_1^*, \ldots, \xi_n^*]$ is the MBB version of $\xi \equiv [\xi_1, \ldots, \xi_n]$.
Similarly, the BWLLN is stated for the φ -mixing sequences.
Theorem 4.
Bootstrap weak law of large numbers for φ-mixing. Let $\{\xi_n\}_{n=1}^\infty$ be a sequence of zero mean φ-mixing random variables satisfying
$$\sum_{n=1}^\infty \frac{\mathsf{E}\xi_n^2}{n^2} < \infty$$
and
$$\sum_{n=1}^\infty \varphi(n) < \infty.$$
If $b \to \infty$ and $b = o(n^{1/2})$ as $n \to \infty$, then, under the MBB Procedure 2,
$$n^{-1}\sum_{i=1}^n \xi_i^* - n^{-1}\sum_{i=1}^n \xi_i \xrightarrow[n\to\infty]{\mathsf{P}^*_\xi(\mathsf{P})} 0,$$
where $\xi^* \equiv [\xi_1^*, \ldots, \xi_n^*]$ is the MBB version of $\xi \equiv [\xi_1, \ldots, \xi_n]$.

5. Bootstrap Central Limit Theorems

If a statistic converges to a normal distribution, then the aim is to asymptotically compare this original limiting distribution with the bootstrap one. The main theoretical results for the asymptotic validity of the bootstrap methods lie in showing that the bootstrap distribution properly mimics the asymptotic one.

5.1. Bootstrap CLT for Independent Data

A pillar of the proof of the central limit theorem for the bootstrapped sample is an extension of the well-known Berry–Esseen theorem.
Theorem 5.
Berry–Esseen–Katz theorem. Let $g$ be a non-negative, even, non-decreasing function on $[0, \infty)$ satisfying:
(i) $\lim_{x\to\infty} g(x) = \infty$,
(ii) $x/g(x)$ is defined for all $x \in \mathbb{R}$ and non-decreasing on $[0, \infty)$.
Assume that $\{\xi_n\}_{n=1}^\infty$ are IID random variables such that $\mathsf{E}\xi_1 = 0$ and $\mathrm{Var}\,\xi_1 = \varsigma^2 > 0$. If
$$\mathsf{E}\left[\xi_1^2\, g(\xi_1)\right] < \infty,$$
then there exists a constant $C > 0$ such that for all $n \in \mathbb{N}$,
$$\sup_{x\in\mathbb{R}}\left|\mathsf{P}\left[\frac{1}{\sqrt{n\varsigma^2}}\sum_{i=1}^n \xi_i \leq x\right] - \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{t^2}{2}\right\}\mathrm{d}t\right| \leq \frac{C\,\mathsf{E}\left[\xi_1^2\, g(\xi_1/\varsigma)\right]}{\varsigma^2\, g(\sqrt{n})}.$$
A bootstrap central limit theorem provides the desired approximate distributional closeness and, thus, appropriateness of the bootstrap.
Theorem 6.
Bootstrap central limit theorem for independent variables. Let $\{\xi_n\}_{n=1}^\infty$ be a sequence of zero mean independent random variables satisfying
$$\sup_{n\in\mathbb{N}} \mathsf{E}_\mathsf{P}\xi_n^4 < \infty. \quad (28)$$
Suppose that $\xi^* \equiv [\xi_1^*, \ldots, \xi_n^*]$ is the bootstrapped version of $\xi \equiv [\xi_1, \ldots, \xi_n]$ and denote
$$\bar\xi_n := n^{-1}\sum_{i=1}^n \xi_i, \quad \bar\xi_n^* := n^{-1}\sum_{i=1}^n \xi_i^*, \quad\text{and}\quad \varsigma_n^2 := \sum_{i=1}^n \mathrm{Var}_\mathsf{P}\,\xi_i.$$
If
$$\liminf_{n\to\infty} \frac{\varsigma_n^2}{n} = \varsigma^2 > 0, \quad (29)$$
then
$$\sup_{x\in\mathbb{R}}\left|\mathsf{P}^*_\xi\left[\frac{n}{\sqrt{\varsigma_n^2}}\left(\bar\xi_n^* - \bar\xi_n\right) \leq x\right] - \mathsf{P}\left[\frac{n}{\sqrt{\varsigma_n^2}}\,\bar\xi_n \leq x\right]\right| \xrightarrow[n\to\infty]{\mathsf{P}} 0. \quad (30)$$
Assumption (28) may seem a little restrictive. It may be weakened to
$$\sum_{n=1}^\infty \frac{\mathsf{E}_\mathsf{P}\xi_n^4}{n^2} < \infty \quad\text{and}\quad \sup_{n\in\mathbb{N}} \mathsf{E}_\mathsf{P}|\xi_n|^{2+\omega} < \infty, \quad\text{for some } \omega > 0,$$
as can be found out by going through the proof of Theorem 6. On the other hand, the previous weaker premises are rather complicated to verify for transformed errors in a proof of bootstrap consistency, as will be noticed later on. Reference [36] strengthened Assumption (28) and replaced it by
$$\sup_{n\in\mathbb{N}} \mathsf{E}_\mathsf{P}\left|\sqrt{n}\,\xi_n\right|^{4+\omega} < \infty \quad (31)$$
for some $\omega > 0$. Afterwards, the bootstrap CLT (Theorem 6) provides a stronger result:
$$\sup_{x\in\mathbb{R}}\left|\mathsf{P}^*_\xi\left[\frac{n}{\sqrt{\varsigma_n^2}}\left(\bar\xi_n^* - \bar\xi_n\right) \leq x\right] - \mathsf{P}\left[\frac{n}{\sqrt{\varsigma_n^2}}\,\bar\xi_n \leq x\right]\right| \xrightarrow[n\to\infty]{[\mathsf{P}]\text{-a.s.}} 0, \quad (32)$$
where the convergence in distribution in probability is replaced by convergence in distribution almost surely. In spite of this, Assumption (31) can be considered too restrictive.
Our situation would become much easier if IID variables were assumed. Let us have a look at the proof of Theorem 6. If $\mathsf{E}_\mathsf{P}|\xi_1|^{2+\omega} < \infty$ for some $\omega > 0$ is additionally assumed, then the right-hand side of (A7) converges to zero $[\mathsf{P}]$-almost surely. In addition, a finite $(2+\omega)$-th moment would be enough to prove (ii) and, thus, Relation (32) holds under such conditions.
The equiboundedness of the fourth moments (28) in the bootstrap CLT is needed, because the second conditional moment is necessary for the existence of $\mathrm{Var}_{\mathsf{P}^*_\xi}\,\xi_1^*$ and, consequently, the equiboundedness of the second moment of the second conditional moment is used for the $[\mathsf{P}]$-almost sure convergence of $\mathrm{Var}_{\mathsf{P}^*_\xi}\,\xi_1^*$. The equiboundedness of the fourth moments in the BCLT can be weakened when bearing in mind identically distributed variables.
Let us concentrate on the conditional variance of the sum of the resampled data. The conditional variance $\mathrm{Var}_{\mathsf{P}^*_\xi}\,\xi_1^*$ used for the normalization in (30) may simply be replaced by $n^{-1}\varsigma_n^2$, see the proof of the BCLT (Theorem 6). Therefore, it makes the assertion of the BCLT even stronger and, hence, more applicable.
A utilization of the Cramér–Wold device helps us to derive a bootstrap version of the CLT for random vectors.
Theorem 7.
Bootstrap multivariate central limit theorem for independent vectors. Let $\{\xi_n\}_{n=1}^\infty$ be a sequence of zero mean independent $q$-dimensional random vectors satisfying
$$\sup_{n\in\mathbb{N}} \mathsf{E}_\mathsf{P}|\xi_{j,n}|^4 < \infty, \quad j \in \{1, \ldots, q\}, \quad (33)$$
where $\xi_n \equiv [\xi_{1,n}, \ldots, \xi_{q,n}]^\top \in \mathbb{R}^q$, $n \in \mathbb{N}$. Assume that $\Xi^* \equiv [\xi_1^*, \ldots, \xi_n^*]$ is the bootstrapped version of $\Xi \equiv [\xi_1, \ldots, \xi_n]$. Denote
$$\bar\xi_n := n^{-1}\sum_{i=1}^n \xi_i, \quad \bar\xi_n^* := n^{-1}\sum_{i=1}^n \xi_i^*, \quad\text{and}\quad \Gamma_n := \sum_{i=1}^n \mathrm{Var}_\mathsf{P}\,\xi_i.$$
If
$$\lim_{n\to\infty} \frac{1}{n}\Gamma_n = \Gamma > 0, \quad (34)$$
then
$$n\,\Gamma_n^{-1/2}\left(\bar\xi_n^* - \bar\xi_n\right) \,\Big|\, \Xi \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} n\,\Gamma_n^{-1/2}\,\bar\xi_n \quad (35)$$
and, moreover,
$$\sqrt{n}\left(\bar\xi_n^* - \mathsf{E}_{\mathsf{P}^*}\xi_1^*\right) \,\Big|\, \Xi \xrightarrow[n\to\infty]{\mathcal{D}(\mathsf{P})} \sqrt{n}\,\bar\xi_n.$$

5.2. Bootstrap CLTs for NINID

Lastly, the central limit theorems for the bootstrapped sample mean from non-stationary strong mixing or uniformly strong mixing sequences are stated.
Theorem 8.
Bootstrap central limit theorem for α-mixing. Let $\{\xi_n\}_{n=1}^\infty$ be a sequence of zero mean α-mixing random variables with
$$\sup_{n\in\mathbb{N}} \mathsf{E}|\xi_n|^{4+\omega} < \infty$$
and
$$\alpha(n) = O\left(n^{-1-\delta}\right), \quad n\to\infty,$$
for some $\omega > 0$ and $\delta > 0$ such that
$$\frac{4}{\omega} < \delta.$$
Denote
$$\bar\xi_n := n^{-1}\sum_{i=1}^n \xi_i \quad\text{and}\quad \bar\xi_n^* := n^{-1}\sum_{i=1}^n \xi_i^*.$$
Suppose that
$$\liminf_{n\to\infty} \frac{\mathsf{E}S_n^2}{n} = \varsigma^2 > 0$$
is satisfied. If $b \to \infty$ and $b = o(n^{1/2})$ as $n \to \infty$, then, under the MBB Procedure 2,
$$\sup_{x\in\mathbb{R}}\left|\mathsf{P}^*_\xi\left[\frac{n}{\sqrt{\mathsf{E}S_n^2}}\left(\bar\xi_n^* - \bar\xi_n\right) \leq x\right] - \mathsf{P}\left[\frac{n}{\sqrt{\mathsf{E}S_n^2}}\,\bar\xi_n \leq x\right]\right| \xrightarrow[n\to\infty]{\mathsf{P}} 0,$$
where $\xi^* \equiv [\xi_1^*, \ldots, \xi_n^*]$ is the MBB version of $\xi \equiv [\xi_1, \ldots, \xi_n]$.
A similar theorem for IID random variables was proved by [37]. Furthermore, a version of this BCLT in the case of stationary α-mixing is given by [11].
Theorem 9.
Bootstrap central limit theorem for φ-mixing. Let $\{\xi_n\}_{n=1}^\infty$ be a sequence of zero mean φ-mixing random variables with
$$\sup_{n\in\mathbb{N}} \mathsf{E}|\xi_n|^4 < \infty$$
and
$$\varphi(n) = O\left(n^{-2-\delta}\right), \quad n\to\infty,$$
for some $\delta > 0$. Denote
$$\bar\xi_n := n^{-1}\sum_{i=1}^n \xi_i \quad\text{and}\quad \bar\xi_n^* := n^{-1}\sum_{i=1}^n \xi_i^*.$$
Suppose that
$$\liminf_{n\to\infty} \frac{\mathsf{E}S_n^2}{n} = \varsigma^2 > 0$$
is satisfied. If $b \to \infty$ and $b = o(n^{1/2})$ as $n \to \infty$, then, under the MBB Procedure 2,
$$\sup_{x\in\mathbb{R}}\left|\mathsf{P}^*_\xi\left[\frac{n}{\sqrt{\mathsf{E}S_n^2}}\left(\bar\xi_n^* - \bar\xi_n\right) \leq x\right] - \mathsf{P}\left[\frac{n}{\sqrt{\mathsf{E}S_n^2}}\,\bar\xi_n \leq x\right]\right| \xrightarrow[n\to\infty]{\mathsf{P}} 0,$$
where $\xi^* \equiv [\xi_1^*, \ldots, \xi_n^*]$ is the MBB version of $\xi \equiv [\xi_1, \ldots, \xi_n]$.
With respect to Theorem 7, one may also postulate multivariate versions of the central limit theorems for α- and φ-mixing accordingly, based on the univariate bootstrap CLTs in Theorems 8 and 9 and using the Cramér–Wold device.

6. Real Data Analyses

If the data are indeed not independent and/or not identically distributed, using a proper resampling method can lead, for instance, to an improvement in precision. We are going to deal with two real problems: the first comes from psychometric evaluation and the second has its grounds in non-life insurance. Handling the data as NINID provides narrower confidence intervals for the mean parameter in both cases, compared to the incorrect approach in which the data are falsely considered IID.

6.1. Psychometry

Psychometric evaluation through a test with many binary (i.e., 0-1) answers from one subject naturally raises the question of the probability of a "correct" answer. This parameter of interest for the underlying Bernoulli distribution is nothing else than its mean. Thus, the sample mean is the consistent and asymptotically normal estimator of the probability of a correct answer. If the data are dependent, one cannot rely on normal asymptotics when constructing the confidence interval for the mean of the Bernoulli distribution, because the corresponding limiting variance is unknown [38,39,40]. Therefore, bootstrapping can provide a suitable solution. However, a proper bootstrap technique needs to be chosen and applied.
The issue described above is illustrated on the dataset "Programme for International Student Assessment (PISA) 2012 U.S. Math Assessment", which can be downloaded from https://tmsalab.github.io/edmdata/ (accessed on 12 November 2022). This dataset was discussed in [41]. We concentrated on one randomly selected subject (ID = 15), who responded to n = 34 items. The dependence between the subject's answers comes from the fact that all the items are answered by the same subject, having the same abilities, knowledge, and state of mind during the examination.
We utilize the independent bootstrap (ignoring the dependence among the responses) and the moving block bootstrap with the blocksize $b = \lfloor n^{1/3} \rfloor = 3$. The results for resampling the sample mean are shown in Table 1 and in Figure 1.
One may notice, from Table 1 as well as from Figure 1, that the difference between the 97.5th and the 2.5th percentiles, which serves as a 95% confidence interval, is narrower in the case of the moving block bootstrap approach. Indeed, the 95% confidence interval based on the independent bootstrap is [0.2647, 0.6176], whereas the 95% confidence interval based on the moving block bootstrap is [0.3125, 0.5938]. Hence, the lengths of the confidence intervals are 0.3529 and 0.2813 for the independent and the moving block bootstrap, respectively, which yields approximately a 20% reduction in the confidence interval's length and, thus, an increase in estimation precision. Finally, the standard deviation for the independent bootstrap is 0.0851, whereas the standard deviation for the moving block bootstrap drops to 0.0764, which also demonstrates the increase in precision gained by taking into account the underlying dependence among the data.
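A minimal sketch of how the reported intervals can be obtained from either resampling procedure above is given next; `means` stands for the array of bootstrapped sample means, and the percentile method used here matches the 2.5th/97.5th percentile construction described in the text.

```python
import numpy as np

def percentile_ci(means, level=0.95):
    """Return (lower, upper, length, std) of the percentile bootstrap interval."""
    means = np.asarray(means, dtype=float)
    lo, hi = np.percentile(means, [100 * (1 - level) / 2, 100 * (1 + level) / 2])
    return lo, hi, hi - lo, means.std(ddof=1)
```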

6.2. Insurance

In non-life insurance, the claims are traditionally split into attritional and large claims, see, for example, [42]. One of the most fundamental tasks in non-life insurance, conducted on a regular basis, is the risk reserving assessment analysis, which amounts to predicting the overall loss reserves to cover possible future claims. In order to do that, one first needs to estimate stochastically the distributional characteristics of the historical claims [43]. Here, we concentrate on estimation of the unknown mean for a reinsurance capped layer of large claims. Since the considered reinsurance layer is bounded, there is no issue with assuming that the expectation of the underlying claims belonging to this layer exists.
We use the large fire insurance claims in Denmark from Thursday 3rd January 1980 until Monday 31st December 1990. The data, called "Danish Fire Insurance Claims", were supplied by Mette Rytgaard of Copenhagen Re. They were described in [44] and can be freely downloaded from https://vincentarelbundock.github.io/ (accessed on 12 November 2022). Note that these data form an irregular time series, which does not contradict the notion of weak dependence. On the contrary, the concept of mixing is very suitable for unequally spaced time series [45]. We consider the reinsurance capped layer covering the large claims from 20M DKK (Danish crowns) up to 100M DKK, which consists of n = 33 fire claims. The parameter of interest for the underlying unknown distribution is the mean. Again, it exists due to the fact that the layer is bounded from below as well as from above. Hence, the sample mean is the consistent and asymptotically normal estimator of the mean parameter. If the data are dependent and non-stationary (and, thus, not identically distributed), one cannot rely on normal asymptotics when constructing the confidence interval for the mean, because the corresponding limiting variance becomes unknown [46,47,48]. Therefore, a proper bootstrap method can provide a suitable solution.
The independent bootstrap (ignoring the dependence and non-stationarity among the fire claims) and the moving block bootstrap with the blocksize $b = \lfloor n^{1/3} \rfloor = 3$ are applied. The results for resampling the sample mean are numerically displayed in Table 2. In addition, the empirical distributions of the sample mean based on both resampling techniques, together with the empirical distributional quantities from Table 2, are visualized in Figure 2.
Table 2 as well as Figure 2 clearly reveal that the 95% confidence interval, given by the difference between the 97.5th and the 2.5th percentiles, is narrower in the case of the moving block bootstrap procedure. Truly, the 95% confidence interval based on the independent bootstrap is [27.94, 35.90], whereas the 95% confidence interval based on the moving block bootstrap is [29.39, 35.38]. Hence, the lengths of the confidence intervals are 7.96 and 5.99 for the independent and the moving block bootstrap, respectively, which gives approximately a 25% reduction in the confidence interval's length. This provides an increase in estimation precision, which can be incorporated and employed in the reserving techniques for not independent and not identically distributed data, see [49,50,51,52]. On top of that, the standard deviation for the independent bootstrap is 2.054, whereas the standard deviation for the moving block bootstrap drops to 1.521, which again confirms an increase in accuracy achieved by accounting for the underlying dependence in the data.

7. Conclusions and Discussion

Asymptotic normality of the estimators or test statistics might be computationally unattainable and, hence, can fail in practical applications. The discussed disadvantageous asymptotic properties lead to the usage of distribution-free resampling methods even for IID observations. On top of that, the bootstrap procedures for a class of not independent and not identically distributed data are given together with their theoretical validity. There are three main methodological contributions to the bootstrap inference developed in this paper: (i) the asymptotic closeness of the unknown stochastic quantities and their bootstrap counterparts is properly mathematically formalized; (ii) laws of large numbers for the bootstrapped α-mixing and φ-mixing random variables and vectors are postulated and proved; (iii) central limit theorems for the bootstrapped NINID observations are provided, which serve as justifications of these popular computer intensive techniques. Lastly, the theoretically valid approaches are applied to practical problems from psychometry and insurance.
The concept of weak dependence provides a very flexible framework and allows us to handle any autocorrelation structure of the observations, provided the mixing conditions are satisfied. Furthermore, if the parameter of interest is not the mean, our developed methodology can still be useful. Suppose that the corresponding estimator of the unknown parameter can be linearized and, thus, rewritten as a continuous functional of a mean of random variables plus a remainder that is negligible in probability. This can be achieved, for instance, through a stochastic Taylor expansion. Then, one may apply the derived bootstrap machinery to the mean inside the continuous functional and, additionally, use the continuous mapping theorem to obtain validity for bootstrapping the parameter of interest.

Author Contributions

All the authors (M.H., M.M., B.P., and M.P.) contributed equally. All authors have read and agreed to the published version of the manuscript.

Funding

The research of Martin Hrba, Matúš Maciak, and Michal Pešta was funded by the Czech Science Foundation project GAČR No. 21-13323S. The work of Barbora Peštová was supported by the Czech Science Foundation project GAČR No. 21-03658S.

Data Availability Statement

There are two datasets used in this paper and both are publicly available. The first dataset–Programme for International Student Assessment (PISA) 2012 U.S. Math Assessment–can be downloaded from https://tmsalab.github.io/edmdata/, accessed on 12 November 2022 [41]. The second dataset–Danish Fire Insurance Claims–can be downloaded from https://vincentarelbundock.github.io/, accessed on 12 November 2022 [44].

Acknowledgments

The authors would like to thank three anonymous referees for their constructive comments and valuable remarks that have significantly improved this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IID    Independent and identically distributed
NINID  Not independent and not identically distributed
LLN    Law of large numbers
SLLN   Strong law of large numbers
WLLN   Weak law of large numbers
BWLLN  Bootstrap weak law of large numbers
CLT    Central limit theorem
BCLT   Bootstrap central limit theorem

Appendix A. Proofs

Proof of Lemma 1. 
A simple generalization of [53] (Lemma 1), where the conditional law of $\xi_n^* \,|\, \zeta_n$ instead of $\xi_n^*$ is considered in the proof, as proposed by [54] (Proof of Theorem 4.1). □
Proof of Theorem 1. 
First, we show that
$$[\xi_n^*, \zeta_0] \,|\, \chi_n \xrightarrow[n\to\infty]{\mathcal{D}([\mathsf{P}]\text{-a.s.})} [\xi_0, \zeta_0], \quad (A1)$$
i.e., that for an arbitrary bounded continuous function $f$,
$$\mathsf{E}_{\mathsf{P}^*_{\chi_n}} f([\xi_n^*, \zeta_0]) - \mathsf{E}_\mathsf{P} f([\xi_0, \zeta_0]) \xrightarrow[n\to\infty]{[\mathsf{P}]\text{-a.s.}} 0. \quad (A2)$$
Let $f([\cdot, \cdot])$ be such an arbitrary bounded continuous function. Now consider the function of a single argument $g(\cdot) := f([\cdot, \zeta_0])$. This is obviously a bounded and continuous non-random function as well. By Assumption (3), we have
$$\mathsf{E}_{\mathsf{P}^*_{\chi_n}}\, g(\xi_n^*) - \mathsf{E}_\mathsf{P}\, g(\xi_0) \xrightarrow[n\to\infty]{[\mathsf{P}]\text{-a.s.}} 0.$$
However, the latter expression is equivalent to (A2). Therefore, we now know that $[\xi_n^*, \zeta_0]$ conditioned on $\chi_n$ and $[\xi_0, \zeta_0]$ approach each other in distribution $[\mathsf{P}]$-almost surely along $\chi_n$.
Secondly, consider $\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\| = \|\zeta_n^* - \zeta_0\|$. This expression converges in probability $\mathsf{P}^*_{\chi_n}$ to zero $[\mathsf{P}]$-almost surely due to Assumption (4). Thus, we have demonstrated two facts: (A1) and
$$\left\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\right\| \xrightarrow[n\to\infty]{\mathsf{P}^*_{\chi_n}([\mathsf{P}]\text{-a.s.})} 0. \quad (A3)$$
Since weak convergence may equivalently be characterized via bounded Lipschitz functions, consider any bounded Lipschitz function $h(\cdot, \cdot)$:
$$\exists K, M > 0,\ \forall x_1, x_2, y_1, y_2: \quad \left|h([x_1, x_2])\right| \leq M \quad\text{and}\quad \left|h([x_1, x_2]) - h([y_1, y_2])\right| \leq K\left\|[x_1, x_2] - [y_1, y_2]\right\|.$$
Take some arbitrary $\epsilon > 0$ and majorize
$$\begin{aligned}
\left|\mathsf{E}_{\mathsf{P}^*_{\chi_n}} h([\xi_n^*, \zeta_n^*]) - \mathsf{E}_{\mathsf{P}^*_{\chi_n}} h([\xi_n^*, \zeta_0])\right|
&\leq \mathsf{E}_{\mathsf{P}^*_{\chi_n}}\left|h([\xi_n^*, \zeta_n^*]) - h([\xi_n^*, \zeta_0])\right| \\
&= \mathsf{E}_{\mathsf{P}^*_{\chi_n}}\left[\left|h([\xi_n^*, \zeta_n^*]) - h([\xi_n^*, \zeta_0])\right|\, \mathcal{I}\{\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\| < \epsilon\}\right] \\
&\quad + \mathsf{E}_{\mathsf{P}^*_{\chi_n}}\left[\left|h([\xi_n^*, \zeta_n^*]) - h([\xi_n^*, \zeta_0])\right|\, \mathcal{I}\{\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\| \geq \epsilon\}\right] \\
&\leq \mathsf{E}_{\mathsf{P}^*_{\chi_n}}\left[K\left\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\right\|\, \mathcal{I}\{\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\| < \epsilon\}\right] \\
&\quad + \mathsf{E}_{\mathsf{P}^*_{\chi_n}}\left[2M\, \mathcal{I}\{\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\| \geq \epsilon\}\right] \\
&\leq K\epsilon\, \mathsf{P}^*_{\chi_n}\left[\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\| < \epsilon\right] + 2M\, \mathsf{P}^*_{\chi_n}\left[\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\| \geq \epsilon\right] \\
&\leq K\epsilon + 2M\, \mathsf{P}^*_{\chi_n}\left[\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\| \geq \epsilon\right] \quad [\mathsf{P}]\text{-a.s.} \quad (A4)
\end{aligned}$$
Hence,
$$\begin{aligned}
\left|\mathsf{E}_{\mathsf{P}^*_{\chi_n}} h([\xi_n^*, \zeta_n^*]) - \mathsf{E}_\mathsf{P} h([\xi_0, \zeta_0])\right|
&\leq \left|\mathsf{E}_{\mathsf{P}^*_{\chi_n}} h([\xi_n^*, \zeta_n^*]) - \mathsf{E}_{\mathsf{P}^*_{\chi_n}} h([\xi_n^*, \zeta_0])\right| + \left|\mathsf{E}_{\mathsf{P}^*_{\chi_n}} h([\xi_n^*, \zeta_0]) - \mathsf{E}_\mathsf{P} h([\xi_0, \zeta_0])\right| \\
&\leq K\epsilon + 2M\, \mathsf{P}^*_{\chi_n}\left[\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\| \geq \epsilon\right] + \left|\mathsf{E}_{\mathsf{P}^*_{\chi_n}} h([\xi_n^*, \zeta_0]) - \mathsf{E}_\mathsf{P} h([\xi_0, \zeta_0])\right| \quad [\mathsf{P}]\text{-a.s.}
\end{aligned}$$
We take the limit in this expression as $n \to \infty$. Since (A4) holds for arbitrary $\epsilon > 0$, the second term goes to zero $[\mathsf{P}]$-almost surely due to (A3) and Definition 3. The third term (which does not depend on $\epsilon$) also converges to zero $[\mathsf{P}]$-almost surely by (A2). Thus,
$$\limsup_{n\to\infty}\left|\mathsf{E}_{\mathsf{P}^*_{\chi_n}} h([\xi_n^*, \zeta_n^*]) - \mathsf{E}_\mathsf{P} h([\xi_0, \zeta_0])\right| \leq K\epsilon \quad [\mathsf{P}]\text{-a.s.}$$
Since $\epsilon$ was arbitrary, we conclude that the limit must in fact be equal to zero $[\mathsf{P}]$-almost surely. Therefore, (i) is proved.
In order to prove result (viii), consider again $\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\| = \|\zeta_n^* - \zeta_0\|$. This expression converges in probability $\mathsf{P}^*_{\chi_n}$ to zero in probability $\mathsf{P}$ due to Assumption (6). Thus,
$$\left\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\right\| \xrightarrow[n\to\infty]{\mathsf{P}^*_{\chi_n}(\mathsf{P})} 0,$$
which can be rewritten according to Definition 3 as
$$\forall \epsilon > 0: \quad \mathsf{P}^*_{\chi_n}\left[\left\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\right\| \geq \epsilon\right] \xrightarrow[n\to\infty]{\mathsf{P}} 0. \quad (A5)$$
Let us fix $\epsilon > 0$. Similarly as in the proof of part (i), it can be demonstrated that
$$\mathsf{E}_{\mathsf{P}^*_{\chi_n}} f([\xi_n^*, \zeta_0]) - \mathsf{E}_\mathsf{P} f([\xi_0, \zeta_0]) \xrightarrow[n\to\infty]{\mathsf{P}} 0, \quad (A6)$$
where the $[\mathsf{P}]$-almost sure convergence is just replaced by convergence in probability $\mathsf{P}$. Indeed, using Inequality (A4), we get for arbitrary $\tau > 0$ (and the above chosen fixed and sufficiently small $\epsilon > 0$)
$$\begin{aligned}
\mathsf{P}\left[\left|\mathsf{E}_{\mathsf{P}^*_{\chi_n}} h([\xi_n^*, \zeta_n^*]) - \mathsf{E}_\mathsf{P} h([\xi_0, \zeta_0])\right| \geq \tau\right]
&\leq \mathsf{P}\left[K\epsilon \geq \tau\right] + \mathsf{P}\left[2M\, \mathsf{P}^*_{\chi_n}\left[\|[\xi_n^*, \zeta_n^*] - [\xi_n^*, \zeta_0]\| \geq \epsilon\right] \geq \tau\right] \\
&\quad + \mathsf{P}\left[\left|\mathsf{E}_{\mathsf{P}^*_{\chi_n}} h([\xi_n^*, \zeta_0]) - \mathsf{E}_\mathsf{P} h([\xi_0, \zeta_0])\right| \geq \tau\right].
\end{aligned}$$
If we take the limit in the previous inequality as $n$ goes to infinity, the first term is zero and the second term goes to zero due to (A5) and Definition 3. The third term also converges to zero by (A6). Thus,
$$\lim_{n\to\infty} \mathsf{P}\left[\left|\mathsf{E}_{\mathsf{P}^*_{\chi_n}} h([\xi_n^*, \zeta_n^*]) - \mathsf{E}_\mathsf{P} h([\xi_0, \zeta_0])\right| \geq \tau\right] = 0.$$
Since $\tau$ was arbitrary, (viii) is proved.
Assertions (ii)-(vii) are just corollaries of (i) when the continuous mapping theorem (see, e.g., Theorems 2 and 3 of [55]) is applied. Similarly, assertions (ix)-(xiv) are consequences of (viii). □
Proof of Lemma 2. 
See [56] (Theorem 1). □
Proof of Lemma 3. 
See [57] (Theorem 4.1). □
Proof of Lemma 4. 
See [23] (Theorem 5.2). □
Proof of Lemma 5. 
See [58] or [26] (Corollary 3.2.1). □
Proof of Corollary 1. 
Since a functional distributional limit (a convergence in Skorokhod space) implies the pointwise distributional limit, this corollary is just a special case of Lemma 5, when W n ( 1 ) is considered. □
Proof of Lemma 6. 
See [59] (Corollary 4). □
Proof of Corollary 2. 
We show that Assumptions (20) and (21) imply the Lindeberg Condition (19) from Lemma 6.
The first step is to show that Conditions (20) and (21) imply the so-called Lyapunov condition; having fixed $\omega > 0$:
$$\frac{1}{\varsigma_n^{2+\omega}}\sum_{i=1}^n \mathsf{E}|\xi_i|^{2+\omega} \leq \frac{1}{\varsigma_n^{2+\omega}}\sum_{i=1}^n \sup_{\iota\in\mathbb{N}}\mathsf{E}|\xi_\iota|^{2+\omega} = \frac{n}{\varsigma_n^{2+\omega}}\sup_{\iota\in\mathbb{N}}\mathsf{E}|\xi_\iota|^{2+\omega} \to 0, \quad n\to\infty,$$
where the convergence follows from (21), since $\varsigma_n^2 \geq cn$ for some $c > 0$ and all sufficiently large $n$, so that $n/\varsigma_n^{2+\omega} = O(n^{-\omega/2})$.
Now, the Lyapunov condition $\lim_{n\to\infty} \varsigma_n^{-2-\omega}\sum_{i=1}^n \mathsf{E}|\xi_i|^{2+\omega} = 0$ holds and we fix $\delta > 0$. Since $|\xi_i| > \delta\varsigma_n$ implies $|\xi_i/(\delta\varsigma_n)|^\omega > 1$, we obtain
$$\frac{1}{\varsigma_n^2}\sum_{i=1}^n \mathsf{E}\left[\xi_i^2\, \mathcal{I}\{|\xi_i| > \delta\varsigma_n\}\right] \leq \frac{1}{\delta^\omega\varsigma_n^{2+\omega}}\sum_{i=1}^n \mathsf{E}\left[|\xi_i|^{2+\omega}\, \mathcal{I}\{|\xi_i| > \delta\varsigma_n\}\right] \leq \frac{1}{\delta^\omega\varsigma_n^{2+\omega}}\sum_{i=1}^n \mathsf{E}|\xi_i|^{2+\omega} \to 0, \quad n\to\infty. \qquad \square$$
Proof of Theorem 2. 
The SLLN applied to independent random variables together with Assumption (23) leads to
$$n^{-1}\sum_{i=1}^n \left(\xi_i - \mathsf{E}_\mathsf{P}\xi_i\right) \xrightarrow[n\to\infty]{[\mathsf{P}]\text{-a.s.}} 0.$$
Markov's inequality with (23) implies uniform equiboundedness in probability $\mathsf{P}$ of $\xi_n^2$. The conditional variance of the bootstrapped sample mean goes to zero as $n$ increases to infinity, because
$$\mathrm{Var}_{\mathsf{P}^*_\xi}\left(n^{-1}\sum_{i=1}^n \xi_i^*\right) = n^{-1}\,\mathrm{Var}_{\mathsf{P}^*_\xi}\,\xi_1^* = n^{-1}\left[\mathsf{E}_{\mathsf{P}^*_\xi}\xi_1^{*2} - \left(\mathsf{E}_{\mathsf{P}^*_\xi}\xi_1^*\right)^2\right] = n^{-1}\left[\sum_{k=1}^n n^{-1}\xi_k^2 - \left(\sum_{k=1}^n n^{-1}\xi_k\right)^2\right] = O_\mathsf{P}(n^{-1}), \quad n\to\infty.$$
Hence, the weak law of large numbers in the "starred" world (for the resampled variables) provides
$$n^{-1}\sum_{i=1}^n \xi_i^* - \mathsf{E}_{\mathsf{P}^*_\xi}\,\xi_1^* = n^{-1}\sum_{i=1}^n \xi_i^* - n^{-1}\sum_{i=1}^n \xi_i \xrightarrow[n\to\infty]{\mathsf{P}^*_\xi(\mathsf{P})} 0,$$
because the $\xi_i^*$ are conditionally IID. □
Proof of Theorem 3. 
The sequence $\{\xi_n\}_{n=1}^\infty$ is uniformly bounded in probability $\mathsf{P}$, because (25) is assumed. By the assumptions on $\{\xi_n\}_{n=1}^\infty$ and Lemma 2, the SLLN holds for the sample mean $n^{-1}\sum_{i=1}^n \xi_i$. By Corollary A2 of [12], it follows that
$$\mathrm{Var}_{\mathsf{P}^*_\xi}\left(n^{-1}\sum_{i=1}^n \xi_i^*\right) = O_\mathsf{P}(n^{-1}), \quad n\to\infty.$$
Thus, the claim holds due to Lemma A1 of [12], where
$$\mathsf{E}_{\mathsf{P}^*_\xi}\left(n^{-1}\sum_{i=1}^n \xi_i^*\right) = n^{-1}\sum_{i=1}^n \xi_i + o_\mathsf{P}(n^{-1/2}), \quad n\to\infty. \qquad \square$$
Proof of Theorem 4. 
The proof is the same as the proof of Theorem 3 except for one detail: the SLLN for φ-mixing (Lemma 3) has to be applied instead of Lemma 2 for α-mixing. □
Proof of Theorem 5. 
See [60]. □
Proof of Theorem 6. 
The Lyapunov condition for the sequence of random variables $\{\xi_n\}_{n=1}^\infty$ is satisfied due to (28) and (29), i.e., for fixed $\omega > 0$:
$$\frac{1}{\varsigma_n^{2+\omega}}\sum_{i=1}^n \mathsf{E}|\xi_i|^{2+\omega} \leq \frac{1}{\varsigma_n^{2+\omega}}\sum_{i=1}^n \sup_{\iota\in\mathbb{N}}\mathsf{E}|\xi_\iota|^{2+\omega} = \frac{n}{\varsigma_n^{2+\omega}}\sup_{\iota\in\mathbb{N}}\mathsf{E}|\xi_\iota|^{2+\omega} \to 0, \quad n\to\infty.$$
Thereupon, the CLT for $\{\xi_n\}_{n=1}^\infty$ holds and
$$\sup_{x\in\mathbb{R}}\left|\mathsf{P}\left[\frac{n}{\sqrt{\varsigma_n^2}}\,\bar\xi_n \leq x\right] - \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{t^2}{2}\right\}\mathrm{d}t\right| \xrightarrow[n\to\infty]{} 0.$$
Henceforth, to prove this theorem, it suffices to show the following three statements:
(i) $\displaystyle\sup_{x\in\mathbb{R}}\left|\mathsf{P}^*_\xi\left[\sqrt{\frac{n}{\mathrm{Var}_{\mathsf{P}^*_\xi}\xi_1^*}}\left(\bar\xi_n^* - \mathsf{E}_{\mathsf{P}^*_\xi}\bar\xi_n^*\right) \leq x\right] - \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{t^2}{2}\right\}\mathrm{d}t\right| \xrightarrow[n\to\infty]{\mathsf{P}} 0$;
(ii) $\mathrm{Var}_{\mathsf{P}^*_\xi}\,\xi_1^* - n^{-1}\varsigma_n^2 \xrightarrow[n\to\infty]{[\mathsf{P}]\text{-a.s.}} 0$;
(iii) $\mathsf{E}_{\mathsf{P}^*_\xi}\,\bar\xi_n^* = \bar\xi_n$ $[\mathsf{P}]$-a.s.
Proving (iii) is trivial, because the bootstrapped variables $\{\xi_n^*\}_{n=1}^\infty$ are conditionally IID and, therefore,
$$\mathsf{E}_{\mathsf{P}^*_\xi}\,\bar\xi_n^* = \mathsf{E}_{\mathsf{P}^*_\xi}\,\xi_1^* = n^{-1}\sum_{i=1}^n \xi_i = \bar\xi_n \quad [\mathsf{P}]\text{-a.s.}$$
Let us calculate the conditional variance of the bootstrapped $\xi_1^*$:
$$\mathrm{Var}_{\mathsf{P}^*_\xi}\,\xi_1^* = \mathsf{E}_{\mathsf{P}^*_\xi}\xi_1^{*2} - \left(\mathsf{E}_{\mathsf{P}^*_\xi}\xi_1^*\right)^2 = n^{-1}\sum_{i=1}^n \xi_i^2 - \left(n^{-1}\sum_{i=1}^n \xi_i\right)^2 \quad [\mathsf{P}]\text{-a.s.}$$
The strong law of large numbers for independent non-identically distributed random variables together with (28) provides
$$\bar\xi_n - n^{-1}\sum_{i=1}^n \mathsf{E}_\mathsf{P}\xi_i = \bar\xi_n \xrightarrow[n\to\infty]{[\mathsf{P}]\text{-a.s.}} 0$$
and
$$\mathrm{Var}_{\mathsf{P}^*_\xi}\,\xi_1^* - n^{-1}\varsigma_n^2 = n^{-1}\sum_{i=1}^n \xi_i^2 - \left(n^{-1}\sum_{i=1}^n \xi_i\right)^2 - n^{-1}\sum_{i=1}^n \mathsf{E}_\mathsf{P}\xi_i^2 \xrightarrow[n\to\infty]{[\mathsf{P}]\text{-a.s.}} 0.$$
The latter application of the SLLN is justified, because (28) implies
$$\sum_{n=1}^\infty \frac{\mathrm{Var}_\mathsf{P}\,\xi_n^2}{n^2} \leq \sum_{n=1}^\infty \frac{\mathsf{E}_\mathsf{P}\xi_n^4}{n^2} \leq \sup_{\iota\in\mathbb{N}}\mathsf{E}_\mathsf{P}\xi_\iota^4 \sum_{n=1}^\infty n^{-2} < \infty.$$
Thus (ii) is proved.
The Berry–Esseen–Katz Theorem 5 with $g(x) = |x|^\epsilon$, $\epsilon > 0$, applied to the bootstrapped sequence of IID (with respect to $\mathsf{P}^*$) random variables $\{\xi_n^*\}_{n=1}^\infty$ results in
$$\sup_{x\in\mathbb{R}}\left|\mathsf{P}^*_\xi\left[\sqrt{\frac{n}{\mathrm{Var}_{\mathsf{P}^*_\xi}\xi_1^*}}\left(\bar\xi_n^* - \mathsf{E}_{\mathsf{P}^*_\xi}\bar\xi_n^*\right) \leq x\right] - \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{t^2}{2}\right\}\mathrm{d}t\right| \leq \frac{C}{n^{\epsilon/2}}\,\frac{\mathsf{E}_{\mathsf{P}^*_\xi}\left|\xi_1^* - \mathsf{E}_{\mathsf{P}^*_\xi}\xi_1^*\right|^{2+\epsilon}}{\left(\mathrm{Var}_{\mathsf{P}^*_\xi}\,\xi_1^*\right)^{(2+\epsilon)/2}} \quad [\mathsf{P}]\text{-a.s.}, \quad (A7)$$
where $C > 0$ is an absolute constant.
The Minkowski and Jensen inequalities provide an upper bound for the numerator on the right-hand side of (A7):
$$\mathsf{E}_{\mathsf{P}^*_\xi}\left|\xi_1^* - \mathsf{E}_{\mathsf{P}^*_\xi}\xi_1^*\right|^{2+\epsilon} = n^{-1}\sum_{i=1}^n \left|\xi_i - n^{-1}\sum_{j=1}^n \xi_j\right|^{2+\epsilon} \leq \left[\left(n^{-1}\sum_{i=1}^n |\xi_i|^{2+\epsilon}\right)^{1/(2+\epsilon)} + \left|n^{-1}\sum_{j=1}^n \xi_j\right|\right]^{2+\epsilon} \leq 2^{1+\epsilon}\, n^{-1}\sum_{i=1}^n |\xi_i|^{2+\epsilon} + 2^{1+\epsilon}\left|n^{-1}\sum_{i=1}^n \xi_i\right|^{2+\epsilon} \quad [\mathsf{P}]\text{-a.s.}$$
The right-hand side of the derived upper bound is uniformly bounded in probability $\mathsf{P}$, because of Markov's inequality and (28). Indeed, for fixed $\tau > 0$,
$$\mathsf{P}\left[n^{-1}\sum_{i=1}^n |\xi_i|^{2+\epsilon} \geq \tau\right] \leq \tau^{-1} n^{-1}\sum_{i=1}^n \mathsf{E}_\mathsf{P}|\xi_i|^{2+\epsilon} \leq \tau^{-1}\sup_{\iota\in\mathbb{N}}\mathsf{E}_\mathsf{P}|\xi_\iota|^{2+\epsilon} < \infty, \quad \forall n\in\mathbb{N},$$
and
$$\mathsf{P}\left[\left|n^{-1}\sum_{i=1}^n \xi_i\right| \geq \tau\right] \leq \tau^{-1} n^{-1}\,\mathsf{E}_\mathsf{P}\left|\sum_{i=1}^n \xi_i\right| \leq \tau^{-1}\sup_{\iota\in\mathbb{N}}\mathsf{E}_\mathsf{P}|\xi_\iota| < \infty, \quad \forall n\in\mathbb{N}.$$
Since $\mathsf{E}_{\mathsf{P}^*_\xi}|\xi_1^* - \mathsf{E}_{\mathsf{P}^*}\xi_1^*|^{2+\epsilon}$ is bounded in probability $\mathsf{P}$ uniformly over $n$ and the denominator on the right-hand side of (A7) is uniformly bounded away from zero due to (29), the left-hand side of (A7) converges in probability $\mathsf{P}$ to zero as $n$ tends to infinity. So, (i) is proved as well. □
Proof of Theorem 7. 
According to Cramér–Wold theorem, it is sufficient to ensure that all the assumptions of one-dimensional bootstrap CLT 6 are valid for any linear combination of the elements of random vector ξ n , n N .
For arbitrary fixed t R q using Jensen’s inequality, we get
sup n N E P | t ξ n | 4 q 3 sup n N j = 1 q t j 4 E P | ξ j , n | 4 q 4 max j = 1 , , q t j 4 sup n N E P | ξ j , n | 4 < .
Hence, Assumption (33) implies Assumption (28) for random variables t ξ n , n N .
Similarly, Assumption (34) implies Assumption (29) for such an arbitrary linear combination, i.e., the positive definiteness of the matrix $\boldsymbol{\Gamma}$ yields
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\mathsf{Var}_{\mathsf{P}}\,\mathbf{t}^{\top}\boldsymbol{\xi}_i=\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\mathbf{t}^{\top}\left(\mathsf{Var}_{\mathsf{P}}\,\boldsymbol{\xi}_i\right)\mathbf{t}=\mathbf{t}^{\top}\left(\lim_{n\to\infty}\frac{1}{n}\boldsymbol{\Gamma}_n\right)\mathbf{t}=\mathbf{t}^{\top}\boldsymbol{\Gamma}\,\mathbf{t}>0.$$
Finally, we need to realize that (34) holds, (35) has already been proved above, $\{\boldsymbol{\xi}_i^*\}_{i=1}^{n}$ are conditionally IID, and
$$\mathsf{E}_{\mathsf{P}_{\Xi^*}}\boldsymbol{\xi}_1^*=n^{-1}\sum_{i=1}^{n}\boldsymbol{\xi}_i=\bar{\boldsymbol{\xi}}_n.\qquad\Box$$
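Computationally, the Cramér–Wold reduction means that a multivariate bootstrap CLT can be probed one projection at a time: fix any $\mathbf{t}\in\mathbb{R}^q$ and study the scalar statistics $\mathbf{t}^{\top}\bar{\boldsymbol{\xi}}_n^*$. A minimal sketch under the same illustrative assumptions as the previous snippets:

```python
import numpy as np

rng = np.random.default_rng(11)
n, q, B = 1_000, 3, 1_000

# Independent, zero-mean random vectors with non-identical covariances.
scales = (1.0 + 0.5 * np.sin(np.arange(n)))[:, None]
xi = rng.normal(size=(n, q)) * scales

t = np.array([1.0, -2.0, 0.5])       # arbitrary fixed projection direction

idx = rng.integers(0, n, size=(B, n))
boot_means = xi[idx].mean(axis=1)    # B bootstrap mean vectors, shape (B, q)

# Projected, centered, and standardized one-dimensional statistics;
# their empirical quantiles should be close to those of N(0, 1).
proj = boot_means @ t
z = (proj - xi.mean(axis=0) @ t) / proj.std()
print(np.quantile(z, [0.025, 0.5, 0.975]))  # compare with -1.96, 0, 1.96
```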
Proof of Theorem 8. 
The proof of this theorem goes along the lines of the proof of Theorem 6, except for three facts: the bootstrap WLLN for α-mixing (Lemma 3) is used; the CLT for α-mixing (Corollary 1) is employed; and another covariance inequality is applied, i.e., Lemma 1.2.5 of [26]. □
Proof of Theorem 9. 
The proof of this theorem remains almost the same as the proof of Theorem 8, with the only exception that the following tools have to be used instead: the CLT for φ-mixing observations, i.e., Corollary 2; a different covariance inequality, i.e., Lemma 1.2.8 of [26]; and a suitable bootstrap WLLN, i.e., Lemma 4. □

References

1. Efron, B. Bootstrap methods: Another look at the jackknife. Ann. Stat. 1979, 7, 1–26.
2. Bickel, P.J.; Freedman, D.A. Some asymptotic theory for the bootstrap. Ann. Stat. 1981, 9, 1196–1217.
3. Efron, B.; Tibshirani, R. An Introduction to the Bootstrap; Chapman and Hall: London, UK, 1990.
4. Efron, B.; Hastie, T. Computer Age Statistical Inference, Student ed.; Cambridge University Press: Cambridge, UK, 2013.
5. Hall, P. The Bootstrap and Edgeworth Expansion; Springer: New York, NY, USA, 2013.
6. Lahiri, S.N. Resampling Methods for Dependent Data; Springer: New York, NY, USA, 2003.
7. Politis, D.N.; Romano, J.P.; Wolf, M. Subsampling; Springer: New York, NY, USA, 1999.
8. Künsch, H.R. The jackknife and the bootstrap for general stationary observations. Ann. Stat. 1989, 17, 1217–1241.
9. Liu, R.Y.; Singh, K. Moving blocks jackknife and bootstrap capture weak dependence. In Exploring the Limits of Bootstrap; LePage, R., Billard, L., Eds.; Wiley: New York, NY, USA, 1992; pp. 225–248.
10. Lahiri, S.N. Non-strong mixing autoregressive processes. Stat. Probab. Lett. 1991, 11, 335–341.
11. Politis, D.N.; Romano, J.P. A general resampling scheme for triangular arrays of α-mixing random variables with application to the problem of spectral density estimation. Ann. Stat. 1992, 20, 1985–2007.
12. Fitzenberger, B. The moving block bootstrap and robust inference for linear least squares and quantile regression. J. Econometr. 1998, 82, 235–287.
13. Maciak, M.; Peštová, B.; Pešta, M. Structural breaks in dependent, heteroscedastic, and extremal panel data. Kybernetika 2019, 54, 1106–1121.
14. Pesarin, F.; Salmaso, L. Permutation Tests for Complex Data: Theory, Applications and Software; Wiley: New York, NY, USA, 2010.
15. Pešta, M. Total least squares and bootstrapping with application in calibration. Statistics 2013, 47, 966–991.
16. Pešta, M. Block bootstrap for dependent errors-in-variables. Commun. Stat. A-Theory 2017, 46, 1871–1897.
17. Hall, P.; Horowitz, J.L.; Jing, B.Y. On blocking rules for the bootstrap with dependent data. Biometrika 1995, 82, 561–574.
18. Politis, D.N.; White, H. Automatic block-length selection for the dependent bootstrap. Econometr. Rev. 2004, 23, 53–70.
19. Lahiri, S.; Furukawa, K.; Lee, Y.D. A nonparametric plug-in rule for selecting optimal block lengths for block bootstrap methods. Stat. Methodol. 2007, 4, 292–321.
20. Kirch, C. Resampling Methods for the Change Analysis of Dependent Data. Ph.D. Thesis, University of Cologne, Cologne, Germany, 2006.
21. Peštová, B.; Pešta, M. Abrupt change in mean using block bootstrap and avoiding variance estimation. Comput. Stat. 2018, 33, 413–441.
22. Billingsley, P. Convergence of Probability Measures, 2nd ed.; John Wiley & Sons: New York, NY, USA, 1999.
23. Bradley, R.C. Basic properties of strong mixing conditions. A survey and some open questions. Probab. Surv. 2005, 2, 107–144.
24. Gijbels, I.; Omelka, M.; Pešta, M.; Veraverbeke, N. Score tests for covariate effects in conditional copulas. J. Multivar. Anal. 2017, 159, 111–133.
25. Rosenblatt, M. A central limit theorem and a strong mixing condition. Proc. Natl. Acad. Sci. USA 1956, 42, 43–47.
26. Lin, Z.; Lu, C. Limit Theory for Mixing Dependent Random Variables; Springer: New York, NY, USA, 1997.
27. Ibragimov, I.A. Some limit theorems for stochastic processes stationary in the strict sense. Dokl. Akad. Nauk SSSR 1959, 125, 711–714. (In Russian)
28. Pešta, M. Asymptotics for weakly dependent errors-in-variables. Kybernetika 2013, 49, 692–704.
29. Anderson, T.W. An Introduction to Multivariate Statistical Analysis; John Wiley & Sons: New York, NY, USA, 1958.
30. Billingsley, P. Convergence of Probability Measures, 1st ed.; John Wiley & Sons: New York, NY, USA, 1968.
31. Doob, J.L. Stochastic Processes; John Wiley & Sons: New York, NY, USA, 1953.
32. Ibragimov, I.A.; Linnik, Y.V. Independent and Stationary Sequences of Random Variables; Wolters-Noordhoff: Amsterdam, The Netherlands, 1971.
33. Rosenblatt, M. Markov Processes: Structure and Asymptotic Behavior; Springer: Berlin/Heidelberg, Germany, 1971.
34. Antoch, J.; Hušková, M.; Prášková, Z. Effect of dependence on statistics for determination of change. J. Stat. Plan. Infer. 1997, 60, 291–310.
35. Andrews, D.W.K. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 1991, 59, 817–858.
36. Belyaev, Y.K. Bootstrap, Resampling and Mallows Metric; Lecture Notes 1; Institute of Mathematical Statistics, Umeå University: Umeå, Sweden, 1995.
37. Singh, K. On the asymptotic accuracy of Efron's bootstrap. Ann. Stat. 1981, 9, 1187–1195.
38. Peštová, B.; Pešta, M. Testing structural changes in panel data with small fixed panel size and bootstrap. Metrika 2015, 78, 665–689.
39. Peštová, B.; Pešta, M. Erratum to: Testing structural changes in panel data with small fixed panel size and bootstrap. Metrika 2016, 79, 237–238.
40. Peštová, B.; Pešta, M. Change point estimation in panel data without boundary issue. Risks 2017, 5, 7.
41. Culpepper, S.A.; Balamuta, J.J. Inferring latent structure in polytomous data with a higher-order diagnostic model. Multivar. Behav. Res. 2021, 1–19.
42. Maciak, M.; Mizera, I.; Pešta, M. Functional profile techniques for claims reserving. ASTIN Bull. 2022, 52, 449–482.
43. Gerthofer, M.; Pešta, M. Stochastic claims reserving in insurance using random effects. Prague Econ. Pap. 2017, 26, 542–560.
44. McNeil, A.J. Estimating the tails of loss severity distributions using extreme value theory. ASTIN Bull. 1997, 27, 117–137.
45. Pešta, M. Changepoint in error-prone relations. Mathematics 2021, 9, 89.
46. Pešta, M.; Wendler, M. Nuisance-parameter-free changepoint detection in non-stationary series. Test 2020, 29, 379–408.
47. Maciak, M.; Pešta, M.; Peštová, B. Changepoint in dependent and non-stationary panels. Stat. Pap. 2020, 61, 1385–1407.
48. Pešta, M.; Peštová, B.; Maciak, M. Changepoint estimation for dependent and non-stationary panels. Appl. Math.-Czech. 2020, 65, 299–310.
49. Pešta, M.; Hudecová, Š. Asymptotic consistency and inconsistency of the chain ladder. Insur. Math. Econ. 2012, 51, 472–479.
50. Hudecová, Š.; Pešta, M. Modeling dependencies in claims reserving with GEE. Insur. Math. Econ. 2013, 53, 786–794.
51. Pešta, M.; Okhrin, O. Conditional least squares and copulae in claims reserving for a single line of business. Insur. Math. Econ. 2014, 56, 28–37.
52. Maciak, M.; Okhrin, O.; Pešta, M. Infinitely stochastic micro reserving. Insur. Math. Econ. 2021, 100, 30–58.
53. Belyaev, Y.K.; Sjöstedt-de Luna, S. Weakly approaching sequences of random distributions. J. Appl. Probab. 2000, 37, 807–822.
54. Zagdański, A. On the construction and properties of bootstrap-t prediction intervals for stationary time series. Probab. Math. Statist. 2005, 25, 133–153.
55. van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: New York, NY, USA, 1998.
56. Chen, X.; Wu, Y. Strong law for mixing sequence. Acta Math. Appl. Sin. 1989, 5, 367–371.
57. Xuejun, W.; Shuhe, H.; Yan, S.; Wenzhi, Y. Moment inequalities for φ-mixing sequences and its applications. J. Inequal. Appl. 2009, 2009, 12.
58. Herrndorf, N. A functional central limit theorem for strongly mixing sequences of random variables. Probab. Theory Relat. Fields 1985, 69, 541–550.
59. Utev, S.A. The central limit theorem for φ-mixing arrays of random variables. Theory Probab. Appl. 1990, 35, 131–139.
60. Katz, M.L. Note on the Berry–Esseen theorem. Ann. Math. Stat. 1963, 34, 1107–1108.
Figure 1. Empirical distributions of the sample mean based on the independent bootstrap (green—incorrect approach) against moving block bootstrap (orange—proper approach) for the correct/incorrect answers by a subject from the Programme for International Student Assessment (PISA) 2012—U.S. Math Assessment.
Figure 2. Empirical distributions of the sample mean based on the independent bootstrap (blue—incorrect approach) against moving block bootstrap (red—proper approach) for the Danish Fire Insurance Claims belonging to the reinsurance layer starting from 20M DKK (Danish crowns) up to 100M DKK.
Table 1. Empirical distributional quantities of the sample mean based on the independent bootstrap (incorrect approach) against moving block bootstrap (proper approach) for the correct/incorrect answers by a subject from the Programme for International Student Assessment (PISA) 2012–U.S. Math Assessment; number of observations n = 34.

Empirical Quantity     Independent Bootstrap    Moving Block Bootstrap
2.5th percentile       0.2647                   0.3125
First quartile         0.3824                   0.4062
Median                 0.4412                   0.4375
Third quartile         0.5000                   0.4999
97.5th percentile      0.6176                   0.5938
Table 2. Empirical distributional quantities of the sample mean based on the independent bootstrap (incorrect approach) against moving block bootstrap (proper approach) for the Danish Fire Insurance Claims belonging to the reinsurance layer starting from 20M DKK (Danish crowns) up to 100M DKK; number of observations n = 33.

Empirical Quantity     Independent Bootstrap    Moving Block Bootstrap
2.5th percentile       27.94                    29.39
First quartile         30.28                    31.20
Median                 31.64                    32.20
Third quartile         33.08                    33.29
97.5th percentile      35.90                    35.38
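The "proper approach" columns in Tables 1 and 2 (and Figures 1 and 2) rest on the moving block bootstrap. Below is a minimal sketch of such a scheme, assuming NumPy; the binary series is a synthetic stand-in for the PISA answers, and the block length is an ad hoc choice rather than the one used for the results above:

```python
import numpy as np

def mbb_means(x, block_len, B, rng):
    """Moving block bootstrap: draw overlapping blocks of length block_len
    with replacement, concatenate to (at least) len(x) values, truncate,
    and return B bootstrap replications of the sample mean."""
    n = len(x)
    blocks = np.lib.stride_tricks.sliding_window_view(x, block_len)
    k = int(np.ceil(n / block_len))                 # blocks per replication
    starts = rng.integers(0, blocks.shape[0], size=(B, k))
    return blocks[starts].reshape(B, -1)[:, :n].mean(axis=1)

rng = np.random.default_rng(2022)

# Synthetic serially dependent 0/1 answers of length n = 34.
latent = np.cumsum(rng.normal(size=34)) / np.sqrt(np.arange(1, 35))
x = (latent + rng.normal(scale=0.5, size=34) > 0).astype(float)

mbb = mbb_means(x, block_len=5, B=10_000, rng=rng)
iid = rng.choice(x, size=(10_000, 34), replace=True).mean(axis=1)
qs = [0.025, 0.25, 0.5, 0.75, 0.975]
print(np.quantile(mbb, qs))  # block bootstrap quantiles (dependence kept)
print(np.quantile(iid, qs))  # independent bootstrap quantiles
```

The blockwise resampling preserves the short-range serial dependence within each block, which is why its percentile intervals differ from the independent bootstrap ones in the tables above.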
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
