Some Facts about Bootstrapping

This post documents some facts about bootstrapping that I encountered during my PhD. In particular, it highlights some asymptotic results relating to bootstrapped estimates of means and standard deviations.

Consider the setting where we sample $T$ observations and create $B$ samples of size $T$ by sampling with replacement. In other words, we have a sample $S = \{X_t\}_{t \in [T]}$ and bootstrap samples $\{X_t^{(b)}\}_{t \in [T]}$ for $b \in [B]$, where each $X_t^{(b)}$ is drawn uniformly with replacement from $S$.
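As a minimal sketch of this setup (in NumPy; the values of $T$ and $B$ and the sampling distribution are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

T, B = 1_000, 5_000          # illustrative sample and batch sizes
S = rng.standard_normal(T)   # the original sample of T observations

# B bootstrap samples, each of size T, drawn with replacement from S
S_boot = rng.choice(S, size=(B, T), replace=True)
```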

Part 1: Normality

Mean Absolute Deviation and Standard Deviation

Geary, R. C. proved the following: if $X \sim N(\mu, \sigma)$, then

$$\mathbb{E}[|X - \mu|] = \sqrt{\frac{2}{\pi}} \sigma.$$

Geary, The Ratio of the Mean Deviation to the Standard Deviation as a Test of Normality, Biometrika, 27(3), 310–332, 1935.
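A quick Monte Carlo check of this identity (a sketch with arbitrary choices of $\mu$, $\sigma$, and the sample size):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 3.0                    # arbitrary mean and standard deviation

X = rng.normal(mu, sigma, size=1_000_000)
mad = np.abs(X - mu).mean()             # Monte Carlo estimate of E|X - mu|
print(mad, np.sqrt(2 / np.pi) * sigma)  # both are approximately 2.394
```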

Central limit theorem

The central limit theorem states that if $\{X_i\}_{i \in [T]}$ are $T$ samples drawn independently and identically distributed from a distribution with mean $\mu$ and variance $\sigma^2$, then $Z = \lim_{T \rightarrow \infty} \sqrt{T}\big(\frac{1}{T}\sum_{t \in [T]} X_t - \mu\big)/\sigma$ follows a standard normal distribution. The sample estimate of the mean for batch $b \in [B]$ is given by $\bar{\mu}^{(b)} := \frac{1}{T} \sum_{t \in [T]} X_t^{(b)}$; therefore, for large $T$:

$$\frac{1}{T} \sum_{t \in [T]} X_t^{(b)} \overset{d}{\rightarrow} N\Big(\mu, \frac{\sigma}{\sqrt{T}}\Big)$$

In our setting, the distribution is the empirical distribution defined by the sample $S$, implying that $\mu$ and $\sigma$ are given by

$$\mu = \mathbb{E}_S[X] = \frac{1}{T}\sum_{x \in S} x$$

and

$$\sigma^2 = \mathbb{E}_S[(X - \mu)^2] = \frac{1}{T}\sum_{x \in S} (x - \mu)^2$$

Therefore $\bar{\mu}^{(b)} = \frac{1}{T} \sum_{t \in [T]} X^{(b)}_t$ can be thought of as a sample from $N(\mu, \frac{\sigma}{\sqrt{T}})$.
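A sketch of this fact: the standard deviation of the bootstrapped means should be close to $\sigma/\sqrt{T}$ (the exponential distribution and the values of $T$ and $B$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
T, B = 1_000, 10_000
S = rng.exponential(scale=1.0, size=T)   # the underlying distribution is arbitrary

mu = S.mean()                            # empirical mean of S
sigma = S.std()                          # empirical standard deviation (1/T convention)

# one bootstrapped mean per batch b in [B]
mu_bar = rng.choice(S, size=(B, T), replace=True).mean(axis=1)

print(mu_bar.mean(), mu)                 # centered at the empirical mean
print(mu_bar.std(), sigma / np.sqrt(T))  # spread close to sigma / sqrt(T)
```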

Part 2: Quantile of Absolute Deviations

In the work by Ho, M., Sun, Z., and Xin, J., the optimal portfolio solves:

$$\begin{align} \min_{x} \quad & x^{\intercal}\Sigma x + x^{\intercal} \text{diag}\{\alpha_i\} x + \beta^{\intercal} |x| \notag \\ \textrm{s.t.} \quad & \mu^\intercal x \geq \bar{R},\ x \in \Delta_N \notag \end{align}$$

where $\beta \in \mathbb{R}_+^N$ and $\alpha_i \in \mathbb{R}_+\ \forall i \in [N]$ denote the penalty parameters. Ho, Sun, and Xin show that $\alpha_i$ and $\beta_i$ correspond to the sizes of box-uncertainty sets used in the robust counterpart of the mean-variance optimization problem. Specifically, $\alpha_i$ and $\beta_i$ correspond to uncertainty in the diagonal of the estimated covariance matrix $\Sigma_{ii}$ and the expected return $\mu_i$ respectively for asset $i = 1, 2, \ldots, N$. Ho, Sun, and Xin use a bootstrapping approach to select the values of $\beta_i$ and $\alpha_i$; they proceed as follows. Let $\{\boldsymbol{r}^{(t)} \in \mathbb{R}^N\}_{t \in [T]}$ denote $T$ observations of the random variable $\boldsymbol{r}$ used to estimate $\boldsymbol{\Sigma}$ and $\boldsymbol{\mu}$. The estimation errors are computed by re-sampling $T$ observations with replacement from the original set of observations $B$ times. Each re-sampling yields estimates of the mean and covariance $\bar{\boldsymbol{\mu}}^{(b)}, \bar{\boldsymbol{\Sigma}}^{(b)},\ b \in [B]$. Let $p_1$ and $p_2$ denote the investor's aversion to estimation risk of the variance and the mean, respectively. The values for $\alpha_i$ and $\beta_i$ are defined as the quantiles of the bootstrapped deviations:

$$\alpha_i = \inf \Big\{ t \ \Big|\ \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\Big[ |\bar{\Sigma}^{(b)}_{ii} - \Sigma_{ii}| < t \Big] > p_1 \Big\}$$

and

$$\beta_i = \inf \Big\{ t \ \Big|\ \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\Big[ |\bar{\mu}^{(b)}_{i} - \mu_{i}| < t \Big] > p_2 \Big\}$$
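A sketch of this selection procedure for a single hypothetical asset (the return distribution and the values of $p_1$ and $p_2$ are illustrative; `np.quantile` plays the role of the infimum above):

```python
import numpy as np

rng = np.random.default_rng(3)
T, B = 500, 2_000
p1, p2 = 0.9, 0.9                      # illustrative risk-aversion levels

r = rng.normal(0.05, 0.2, size=T)      # hypothetical returns of a single asset i
mu_i, Sigma_ii = r.mean(), r.var()

# bootstrapped estimates of the mean and variance
idx = rng.integers(0, T, size=(B, T))
mu_b = r[idx].mean(axis=1)
Sigma_b = r[idx].var(axis=1)

# alpha_i and beta_i as empirical quantiles of the absolute deviations
alpha_i = np.quantile(np.abs(Sigma_b - Sigma_ii), p1)
beta_i = np.quantile(np.abs(mu_b - mu_i), p2)
print(alpha_i, beta_i)
```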

Quantile of absolute deviation for standard normal

One can express the quantile of the absolute deviation of a normal variable in closed form. The result can be derived as follows: letting $Z \sim N(0,1)$, the quantile of the absolute deviation satisfies $P(|Z| \leq x) = p$, which is the same as $2\Phi(x) - 1 = p$, which implies

$$x = \Phi^{-1}\Big(\frac{p+1}{2}\Big)$$

This is useful because the sample means $\bar{\boldsymbol{\mu}}^{(b)}$ converge in distribution to a normal distribution, so the quantiles above admit a closed-form approximation. Ho, Sun & Xin, Weighted elastic net penalized mean-variance portfolio design and computation, SIAM Journal on Financial Mathematics, 6(1), 1220–1244, 2015.
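As a sanity check, one can compare the bootstrapped quantile of $|\bar{\mu}^{(b)} - \mu|$ against the closed-form value $\frac{\sigma}{\sqrt{T}}\Phi^{-1}\big(\frac{p+1}{2}\big)$ (a sketch with illustrative parameters; `scipy.stats.norm.ppf` computes $\Phi^{-1}$):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
T, B, p = 1_000, 20_000, 0.9

S = rng.normal(0.05, 0.2, size=T)
mu, sigma = S.mean(), S.std()

mu_bar = rng.choice(S, size=(B, T), replace=True).mean(axis=1)

empirical = np.quantile(np.abs(mu_bar - mu), p)
closed_form = sigma / np.sqrt(T) * norm.ppf((p + 1) / 2)
print(empirical, closed_form)  # the two agree for large T and B
```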

Part 3: Distribution of standard deviations can be obtained via the delta method

Several online references were useful in understanding the delta method and its application to the distribution of standard deviations.

The case of determining an approximation for the distribution of $\boldsymbol{\Sigma}$ is more complex. For simplicity, we consider the case where $\boldsymbol{\Sigma}$ is a scalar, namely the variance $\sigma^2$. An approach referred to as the delta method can be used to obtain the asymptotic distribution of $\Sigma = \frac{1}{T}\sum_{i \in [T]} X_i^2 - \big(\frac{1}{T} \sum_{i \in [T]} X_i\big)^2$.

Let $W^{(2)}$ and $W^{(1)}$ denote $\frac{1}{T} \sum_{i \in [T]} X_i^2$ and $\frac{1}{T} \sum_t X_t$ respectively. $W^{(1)}$ and $W^{(2)}$ converge to a joint normal distribution by the multivariate central limit theorem: $\sqrt{T}(\mathbf{W} - \boldsymbol{\mu}_{\mathbf{W}}) \overset{d}{\rightarrow} N(\mathbf{0}, \Xi)$, where $\boldsymbol{\mu}_{\mathbf{W}} = (\mathbb{E}[X], \mathbb{E}[X^2])^\intercal$ and $\Xi_{jk} = \mathbb{E}[X^j X^k] - \mathbb{E}[X^j]\,\mathbb{E}[X^k]$ is the covariance of the underlying terms $(X, X^2)$.

In general, for any differentiable $g(\mathbf{W})$ one can write the first-order Taylor expansion $g(\mathbf{W}) \approx g(\boldsymbol{\mu}_{\mathbf{W}}) + \nabla g(\boldsymbol{\mu}_{\mathbf{W}})^{\intercal}(\mathbf{W} - \boldsymbol{\mu}_{\mathbf{W}})$. One can therefore approximate the variance of $g(\mathbf{W})$ by $\mathbb{V}[g(\mathbf{W})] \approx \frac{1}{T}\nabla g(\boldsymbol{\mu}_{\mathbf{W}})^{\intercal} \Xi \nabla g(\boldsymbol{\mu}_{\mathbf{W}})$, and as such $\sqrt{T}\big(g(\mathbf{W}) - g(\boldsymbol{\mu}_{\mathbf{W}})\big) \overset{d}{\rightarrow} N\big(0, \nabla g(\boldsymbol{\mu}_{\mathbf{W}})^{\intercal} \Xi \nabla g(\boldsymbol{\mu}_{\mathbf{W}})\big)$.

One can set $g(W^{(1)}, W^{(2)}) = W^{(2)} - (W^{(1)})^2$ and use the first-order delta method as described above to obtain the asymptotic convergence result. The first-order delta method does not always work, however. Convergence to normality does not hold for Bernoulli variables with $p = 0.5$ because the first-order variance $\nabla g(\boldsymbol{\mu}_{\mathbf{W}})^{\intercal} \Xi \nabla g(\boldsymbol{\mu}_{\mathbf{W}})$ ends up being zero.

To circumvent this, the second-order Taylor approximation must be used. In the case of Bernoulli random variables, $g$ can be expressed as a function of a single random parameter:

$$\hat{p} := \frac{1}{T} \sum_t X_t,$$

since $\frac{1}{T} \sum_t X_t = \frac{1}{T} \sum_t X_t^2$ (as $X_t^2 = X_t$ for $X_t \in \{0, 1\}$). In this case,

$$\begin{align*} g(\hat{p}) &= \frac{1}{T} \sum_t X_t^2 - \Big(\frac{1}{T} \sum_t X_t\Big)^2 \\ &= \hat{p}(1-\hat{p}). \end{align*}$$

If the mean is $p = 1/2$, then taking the second-order Taylor expansion around the mean gives $\hat{p}(1-\hat{p}) = 1/4 + 0 \cdot (\hat{p} - 1/2) + \frac{1}{2}(-2)(\hat{p} - 1/2)^2 = 1/4 - (\hat{p} - 1/2)^2$. It follows that $\hat{p}(1-\hat{p})$ is asymptotically a (negated, shifted) $\chi^2$ variable and is not normal: $\hat{p} = \frac{1}{T} \sum_t X_t$ is asymptotically normal by the central limit theorem, and $\hat{p}(1-\hat{p})$ is a constant minus the square of an asymptotically normal variable (as shown in the Taylor expansion above); concretely, $T\big(1/4 - \hat{p}(1-\hat{p})\big) = T(\hat{p} - 1/2)^2 \overset{d}{\rightarrow} \frac{1}{4}\chi^2_1$. In the case where the first-order delta method is applicable and the first-order Taylor term introduces uncertainty in $g$, it follows:

$$\frac{1}{T}\sum_i X_i^2 - \Big(\frac{1}{T} \sum_t X_t\Big)^2 \overset{d}{\rightarrow} N\Big(\sigma^2,\ \frac{1}{\sqrt{T}}\sqrt{\text{CM}_4(X) - \sigma^4 + O(T^{-2})}\Big),$$

where $\text{CM}_4(X)$ denotes the 4th-order central moment of $X$.
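A sketch checking both claims by simulation (the exponential(1) example, for which $\sigma^2 = 1$ and $\text{CM}_4(X) = 9$, and the Bernoulli counterexample from above; all parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
T, reps = 2_000, 20_000

# non-degenerate case: exponential(1) has sigma^2 = 1 and CM4 = 9
X = rng.exponential(1.0, size=(reps, T))
sample_var = X.var(axis=1)                     # plug-in sample variance per replication
print(sample_var.std(), np.sqrt((9 - 1) / T))  # both close to sqrt(8 / T)

# degenerate case: Bernoulli(1/2) has CM4 - sigma^4 = 1/16 - 1/16 = 0, so the
# first-order term vanishes and T * (1/4 - p_hat(1 - p_hat)) = T * (p_hat - 1/2)^2
# converges to (1/4) * chi-squared(1), which has mean 1/4
X = rng.integers(0, 2, size=(reps, T))
p_hat = X.mean(axis=1)
print((T * (0.25 - p_hat * (1 - p_hat))).mean(), 0.25)
```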

Google Colab Notebook

The notebook located here demonstrates the facts highlighted above.
