
JOURNAL OF APPLIED ECONOMETRICS
J. Appl. Econ. 20: 873–889 (2005)
Published online 30 March 2005 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/jae.800

A FORECAST COMPARISON OF VOLATILITY MODELS: DOES ANYTHING BEAT A GARCH(1,1)?

PETER R. HANSEN(a)* AND ASGER LUNDE(b)
(a) Department of Economics, Brown University, Providence, USA
(b) Department of Information Science, Aarhus School of Business, Denmark

SUMMARY

We compare 330 ARCH-type models in terms of their ability to describe the conditional variance. The models are compared out-of-sample using DM–$ exchange rate data and IBM return data, where the latter is based on a new data set of realized variance. We find no evidence that a GARCH(1,1) is outperformed by more sophisticated models in our analysis of exchange rates, whereas the GARCH(1,1) is clearly inferior to models that can accommodate a leverage effect in our analysis of IBM returns. The models are compared with the test for superior predictive ability (SPA) and the reality check for data snooping (RC). Our empirical results show that the RC lacks power to an extent that makes it unable to distinguish 'good' and 'bad' models in our analysis. Copyright 2005 John Wiley & Sons, Ltd.

1. INTRODUCTION

The conditional variance of financial time series is important for pricing derivatives, calculating measures of risk, and hedging. This has sparked an enormous interest in modelling the conditional variance, and a large number of volatility models have been developed since the seminal paper of Engle (1982); see Poon and Granger (2003) for an extensive review and references.

The aim of this paper is to examine whether sophisticated volatility models provide a better description of financial time series than more parsimonious models. We address this question by comparing 330 GARCH-type models in terms of their ability to forecast the one-day-ahead conditional variance.
The models are evaluated out-of-sample using six different loss functions, where the realized variance is substituted for the latent conditional variance. We use the test for superior predictive ability (SPA) of Hansen (2001) and the reality check for data snooping (RC) by White (2000) to benchmark the 330 volatility models to the GARCH(1,1) of Bollerslev (1986). These tests have the advantage that they properly account for the full set of models, without the use of probability inequalities, such as the Bonferroni bound, that typically lead to conservative tests.

We compare the models using daily DM–$ exchange rate data and daily IBM returns. There are three main findings of our empirical analysis. First, in the analysis of the exchange rate data we find no evidence that the GARCH(1,1) is inferior to other models, whereas the GARCH(1,1) is clearly outperformed in the analysis of IBM returns. Second, our model space includes models with many distinct characteristics that are interesting to compare,(1) and some interesting details emerge from the out-of-sample analysis. The models that perform well in the IBM return data are primarily those that can accommodate a leverage effect, and the best overall performance is achieved by the A-PARCH(2,2) model of Ding et al. (1993). Other aspects of the volatility models are more ambiguous.

* Correspondence to: Peter R. Hansen, Brown University, Department of Economics, Box B, Providence, RI 02912, USA. E-mail: peter [email protected]
(1) However, we have by no means included all the volatility models that have been proposed in the literature. For a comparison of a smaller set of models that also includes stochastic volatility models and fractionally integrated GARCH models, see Hansen et al. (2003).

Received 8 November 2002. Revised 12 February 2004.
While the t-distributed specification of standardized returns generally leads to a better average performance than the Gaussian in the analysis of exchange rates, the opposite is the case in our analysis of IBM returns. The different mean specifications (zero mean, constant mean and GARCH-in-mean) result in almost identical performances. Third, our empirical analysis shows that the RC has less power than the SPA test. This makes an important difference in our application, because the RC cannot detect that the GARCH(1,1) is significantly outperformed by other models in the analysis of IBM returns. In fact, the RC even suggests that an ARCH(1) may be the best model in many cases, which does not conform with the existing empirical evidence. The SPA test always finds the ARCH(1) model to be inferior, which shows that the SPA test has power in these applications and is therefore more likely to detect superior models when such exist.

Ideally, we would evaluate the models' ability to forecast all aspects of the conditional distribution. However, it is not possible to extract precise information about the conditional distribution without making restrictive assumptions. Instead we focus on the central component of the models, the conditional variance, which can be estimated by the realized variance. Initially, it was common to substitute the squared return for the unobserved conditional variance in out-of-sample evaluations of volatility models. This typically resulted in a poor performance, which instigated a discussion of the practical relevance of volatility models. However, Andersen and Bollerslev (1998) showed that the 'poor' performance could be explained by the fact that the squared return is a noisy proxy for the conditional variance. By substituting the realized variance (instead of the squared return), Andersen and Bollerslev (1998) showed that volatility models perform quite well.
Hansen and Lunde (2003) provide another important argument for using the realized variance rather than the squared return. They show that substituting the squared return for the conditional variance can severely distort the comparison, in the sense that the empirical ranking of models may be inconsistent for the true (population) ranking. So an evaluation that is based on squared returns may select an inferior model as the 'best' with a probability that converges to one as the sample size increases. For this reason, our evaluation is based on the realized variance.

Comparing multiple models is a non-standard inference problem, and spurious results are likely to appear unless the inference controls for the multiple comparisons. An inferior model can be 'lucky' and perform better than all other models, and the more models that are being compared, the higher is the probability that the best model (in population) has a much smaller sample performance than some inferior model. It is therefore important to control for the full set of models and their interdependence when evaluating the significance of an excess performance. In our analysis we employ the SPA test and the RC, which are based on the work of Diebold and Mariano (1995) and West (1996). These tests can evaluate whether a particular model (the benchmark) is significantly outperformed by other models, while taking into account the large number of models that are being compared. In other words, these tests are designed to evaluate whether an observed excess performance is significant or could have occurred by chance.

This paper is organized as follows. Section 2 describes the 330 volatility models under consideration, and the loss functions are defined in Section 3. In Section 4 we describe our measures of realized variance, and Section 5 contains some details of the SPA test and its bootstrap implementation. We present our empirical results in Section 6, and Section 7 contains some concluding remarks.
2. THE GARCH UNIVERSE

Given a price process, p_t, we define the compounded daily return by r_t ≡ log p_t − log p_{t−1}, t = −R + 1, . . . , n. Later we split the sample into an estimation period (the first R observations) and an evaluation period (the last n observations).

The conditional density of r_t is denoted by f(r|F_{t−1}), where F_{t−1} is the σ-algebra induced by variables that are observed at time t − 1. We define the conditional mean by μ_t ≡ E(r_t|F_{t−1}) (the location parameter) and the conditional variance by σ²_t ≡ var(r_t|F_{t−1}) (the scale parameter), assuming that both are well defined. Subsequently we can define the standardized return, e_t ≡ (r_t − μ_t)/σ_t, and denote its conditional density by g(e|F_{t−1}). Following Hansen (1994) we consider a parametric specification, f(r|F_{t−1}; θ), where θ ∈ Θ ⊂ R^q is a vector of parameters. It now follows that the time-varying vector of parameters, θ_t ≡ θ(F_{t−1}; θ), can be divided into θ_t = (μ_t, σ²_t, η_t), where η_t is a vector of shape parameters for the conditional density of e_t. Thus we have a family of density functions for r_t, which is a location-scale family with (possibly time-varying) shape parameters, and we shall model μ_t, σ²_t and η_t individually. Most GARCH-type models can be formulated in this framework, and η_t typically does not depend on t.

The notation for our modelling of the conditional mean and variance is m_t = μ(F_{t−1}; ψ) and h²_t = σ²(F_{t−1}; ψ), respectively, and we employ two specifications for g(e|η_t) in our empirical analysis. One is a Gaussian specification that is free of parameters, g(e|η_t) = g(e), and the other is a t-specification that has the degrees of freedom, ν, as the only parameter, g(e|η_t) = g(e|ν).(2) Our specifications for the conditional mean are: m_t = μ_0 + μ_1 σ²_{t−1} (GARCH-in-mean), m_t = μ_0 (constant mean) and m_t = 0 (zero mean).

The conditional variance is the main object of interest, and our analysis includes a large number of parametric specifications for σ_t that are listed in Table I.
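To make the variance recursions concrete, the simplest member of this family, a GARCH(1,1) with constant mean, can be sketched in a few lines. This is our own illustrative code, not from the paper; the function name and the initialization of the recursion at the sample variance are our choices.

```python
import numpy as np

def garch11_filter(returns, omega, alpha, beta, mu=0.0):
    """Conditional variances h_t^2 for a GARCH(1,1) with constant mean mu:
    h_t^2 = omega + alpha * eps_{t-1}^2 + beta * h_{t-1}^2, eps_t = r_t - mu."""
    eps = np.asarray(returns, dtype=float) - mu
    h2 = np.empty(eps.size)
    h2[0] = eps.var()  # initialize at the sample variance (one common choice)
    for t in range(1, eps.size):
        h2[t] = omega + alpha * eps[t - 1] ** 2 + beta * h2[t - 1]
    return h2
```

In practice the parameters (ω, α, β, μ) would be obtained by maximum likelihood over the estimation sample, and the filter run forward to produce the one-step-ahead forecasts h²_t used below.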
The use of acronyms has not been fully consistent in the existing literature; for example, A-GARCH has been used to represent four different specifications. So to avoid any confusion we use 'A-GARCH' to refer to a model by Engle and Ng (1993) and use different acronyms for all other models; e.g., we use H-GARCH to refer to the model by Hentschel (1995). Several specifications nest other specifications, as is evident from Table I. In particular, the flexible specifications of the H-GARCH and the Aug-GARCH, see Duan (1997), nest many of the simpler specifications. An empirical comparison of several of the models that are nested in the Aug-GARCH model can be found in Loudon et al. (2000).

The evolution of volatility models has been motivated by empirical findings and economic interpretations. Ding et al. (1993) used Monte Carlo simulations to demonstrate that both the GARCH specification (a model for σ²_t) and the TS-GARCH specification(3) (a model for σ_t) are capable of producing the autocorrelation pattern that is seen in financial data. So in this respect there is no argument for modelling σ_t rather than σ²_t or vice versa. More generally, we can consider a modelling of σ^δ_t, where δ is a parameter to be estimated, and this motivated the Box–Cox transformations that involve σ_t and ε_t. The empirically observed leverage effect motivated the development of models with an asymmetric response in volatility to positive and negative shocks. The leverage effect was first noted by Black (1976) and is best illustrated by the news impact curve, which was introduced by Pagan and Schwert (1990) and named by Engle and Ng (1993).
This curve is a plot of σ²_t against ε_{t−1} that illustrates how the volatility reacts to good and bad news.

In our analysis we have included the four combinations of p, q ∈ {1, 2} for the lag length parameters, with the following exceptions: the ARCH is only estimated for q = 1; the H-GARCH and the Aug-GARCH are only estimated for (p, q) = (1, 1), because these are quite burdensome to estimate. It is well known that an ARCH(1) model is unable to fully capture the persistence in volatility, and this model is only included as a point of reference, and to verify that the tests, SPA and RC, have power. This is an important aspect of the analysis, because a test that is unable to reject that the ARCH(1) is the best model cannot be very informative about which is a better model. Restricting the models to have two lags (or less) should not affect the main conclusions of our empirical analysis, because it is unlikely that a model with more lags would outperform a simple benchmark in the out-of-sample comparison, unless the same model with two lags can outperform the benchmark. This aspect is also evident from our analysis, where a model with p = q = 2 rarely performs better (out-of-sample) than the same model with fewer lags, even though most parameters are found to be significant (in-sample).

(2) We do not restrict ν to be an integer.
(3) See Taylor (1986) and Schwert (1990).

Table I. Specifications for the conditional variance

ARCH:           σ²_t = ω + Σ_{i=1}^q α_i ε²_{t−i}
GARCH:          σ²_t = ω + Σ_{i=1}^q α_i ε²_{t−i} + Σ_{j=1}^p β_j σ²_{t−j}
IGARCH:         σ²_t = ω + ε²_{t−1} + Σ_{i=2}^q α_i (ε²_{t−i} − ε²_{t−1}) + Σ_{j=1}^p β_j (σ²_{t−j} − ε²_{t−1})
Taylor/Schwert: σ_t = ω + Σ_{i=1}^q α_i |ε_{t−i}| + Σ_{j=1}^p β_j σ_{t−j}
A-GARCH:        σ²_t = ω + Σ_{i=1}^q [α_i ε²_{t−i} + γ_i ε_{t−i}] + Σ_{j=1}^p β_j σ²_{t−j}
NA-GARCH:       σ²_t = ω + Σ_{i=1}^q α_i (ε_{t−i} + γ_i σ_{t−i})² + Σ_{j=1}^p β_j σ²_{t−j}
V-GARCH:        σ²_t = ω + Σ_{i=1}^q α_i (e_{t−i} + γ_i)² + Σ_{j=1}^p β_j σ²_{t−j}
Thr.-GARCH:     σ_t = ω + Σ_{i=1}^q α_i [(1 − γ_i) ε⁺_{t−i} − (1 + γ_i) ε⁻_{t−i}] + Σ_{j=1}^p β_j σ_{t−j}
GJR-GARCH:      σ²_t = ω + Σ_{i=1}^q [α_i + γ_i I_(ε_{t−i} > 0)] ε²_{t−i} + Σ_{j=1}^p β_j σ²_{t−j}
log-GARCH:      log σ_t = ω + Σ_{i=1}^q α_i |e_{t−i}| + Σ_{j=1}^p β_j log σ_{t−j}
EGARCH:         log σ²_t = ω + Σ_{i=1}^q [α_i e_{t−i} + γ_i (|e_{t−i}| − E|e_{t−i}|)] + Σ_{j=1}^p β_j log σ²_{t−j}
NGARCH:(a)      σ^δ_t = ω + Σ_{i=1}^q α_i |ε_{t−i}|^δ + Σ_{j=1}^p β_j σ^δ_{t−j}
A-PARCH:        σ^δ_t = ω + Σ_{i=1}^q α_i [|ε_{t−i}| − γ_i ε_{t−i}]^δ + Σ_{j=1}^p β_j σ^δ_{t−j}
GQ-ARCH:        σ²_t = ω + Σ_{i=1}^q α_i ε_{t−i} + Σ_{i=1}^p α_{ii} ε²_{t−i} + Σ_{i<j} α_{ij} ε_{t−i} ε_{t−j} + Σ_{j=1}^p β_j σ²_{t−j}
H-GARCH:        σ^δ_t = ω + Σ_{i=1}^q α_i σ^δ_{t−i} [|e_{t−i} − κ| − τ (e_{t−i} − κ)]^ν + Σ_{j=1}^p β_j σ^δ_{t−j}
Aug-GARCH:(b)   σ²_t = |δ φ_t − δ + 1|^{1/δ} if δ ≠ 0, and σ²_t = exp(φ_t − 1) if δ = 0, where
                φ_t = ω + Σ_{i=1}^q [α_{1i} |ε_{t−i} − κ| + α_{2i} max(0, κ − ε_{t−i})] φ_{t−i}
                        + Σ_{i=1}^q [α_{3i} f(|ε_{t−i} − κ|, δ) + α_{4i} f(max(0, κ − ε_{t−i}), δ)] φ_{t−i}
                        + Σ_{j=1}^p β_j φ_{t−j}

(a) This is the A-PARCH model without the leverage effect.
(b) Here f(x, δ) ≡ (x^δ − 1)/δ.

3. FORECAST EVALUATION

A popular way to evaluate volatility models out-of-sample is in terms of the R² from a

Mincer–Zarnowitz (MZ) regression, r²_t = a + b h²_t + u_t, where the squared returns are regressed on h²_t (the model forecast of σ²_t) and a constant; or the logarithmic version, log r²_t = a + b log h²_t + u_t, which is less sensitive to outliers, as was noted by Pagan and Schwert (1990) and Engle and Patton (2001).(4) However, the R² of a MZ regression is not an ideal criterion for comparing volatility models, because it does not penalize a biased forecast. For example, a poor biased forecast may achieve a higher R² than a good unbiased forecast, because the bias can be eliminated artificially through estimates of (a, b) that differ from (0, 1).

It is not obvious which loss function is more appropriate for the evaluation of volatility models, as discussed by Bollerslev et al. (1994), Diebold and Lopez (1996) and Lopez (2001). So rather than making a single choice we use the following six loss functions in our empirical analysis:

MSE₁ ≡ n⁻¹ Σ_{t=1}^n (σ_t − h_t)²                MSE₂ ≡ n⁻¹ Σ_{t=1}^n (σ²_t − h²_t)²
QLIKE ≡ n⁻¹ Σ_{t=1}^n [log h²_t + σ²_t h⁻²_t]    R²LOG ≡ n⁻¹ Σ_{t=1}^n [log(σ²_t h⁻²_t)]²
MAE₁ ≡ n⁻¹ Σ_{t=1}^n |σ_t − h_t|                 MAE₂ ≡ n⁻¹ Σ_{t=1}^n |σ²_t − h²_t|
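The six criteria are straightforward to compute from a series of variance proxies σ²_t and forecasts h²_t. The following sketch is our own illustrative code (the function name and dictionary keys are our choices):

```python
import numpy as np

def losses(sigma2, h2):
    """The six evaluation criteria; sigma2 is the (realized-variance) proxy for
    the conditional variance and h2 the model forecast, both length-n arrays."""
    sigma2, h2 = np.asarray(sigma2, float), np.asarray(h2, float)
    s, h = np.sqrt(sigma2), np.sqrt(h2)
    return {
        "MSE1": np.mean((s - h) ** 2),
        "MSE2": np.mean((sigma2 - h2) ** 2),
        "QLIKE": np.mean(np.log(h2) + sigma2 / h2),
        "R2LOG": np.mean(np.log(sigma2 / h2) ** 2),
        "MAE1": np.mean(np.abs(s - h)),
        "MAE2": np.mean(np.abs(sigma2 - h2)),
    }
```

Note that a perfect forecast drives every criterion except QLIKE to zero; QLIKE attains its minimum, n⁻¹ Σ (log σ²_t + 1), at h²_t = σ²_t.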

The criteria MSE₂ and R²LOG are similar to the R² of the MZ regressions,(5) and QLIKE corresponds to the loss implied by a Gaussian likelihood. The mean absolute error criteria, MAE₂ and MAE₁, are interesting because they are more robust to outliers than, say, MSE₂. Additional discussions of the MSE₂, QLIKE and R²LOG criteria can be found in Bollerslev et al. (1994).

(4) Engle and Patton (2001) also point out that heteroskedastic returns imply (even more) heteroskedasticity in the squared returns, r²_t. So parameters are estimated inefficiently and the usual standard errors are misleading.
(5) Provided that a = 0 and b = 1, which essentially requires the forecasts to be unbiased.

4. REALIZED VARIANCE

In our empirical analysis we substitute the realized variance for the latent σ²_t. The realized variance for a particular day is calculated from intraday returns, r_{t,i,m}, where r_{t,i,m} ≡ p_{t−(i−1)/m} − p_{t−i/m} for i = 1, . . . , m. Thus r_{t,i,m} is the return over a time interval with length 1/m on day t, and we note that r_t = Σ_{i=1}^m r_{t,i,m}. It will often be reasonable to assume that E(r_{t,i,m}|F_{t−1}) ≃ 0 and that the intraday returns are conditionally uncorrelated, cov(r_{t,i,m}, r_{t,j,m}|F_{t−1}) = 0 for i ≠ j, such that

σ²_t ≡ var(Σ_{i=1}^m r_{t,i,m}|F_{t−1}) = Σ_{i=1}^m var(r_{t,i,m}|F_{t−1}) ≃ Σ_{i=1}^m E(r²_{t,i,m}|F_{t−1}) = E[RV_t^(m)|F_{t−1}],

where we have defined the realized variance (at frequency m) RV_t^(m) ≡ Σ_{i=1}^m r²_{t,i,m}.
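Given a day's intraday returns, the realized variance is simply the sum of their squares; a minimal sketch (our own illustrative code):

```python
import numpy as np

def realized_variance(intraday_returns):
    """RV_t^(m): the sum of the m squared intraday returns of day t."""
    r = np.asarray(intraday_returns, dtype=float)
    return np.sum(r ** 2)
```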
Thus RV_t^(m) is approximately unbiased for σ²_t (given our assumptions above), and it can often be shown that E[RV_t^(m) − σ²_t]² is decreasing in m, such that RV_t^(m) is an increasingly more precise estimator of σ²_t as m increases.(6) Further, the RV_t^(m) is (by definition) consistent for the quadratic variation of p_t, which is identical to the conditional variance, σ²_t, for certain data generating processes (DGPs), such as the ARCH-type models considered in this paper.(7)

Several assets are not traded 24 hours a day, because the market is closed overnight and over weekends. In these situations we only observe f ≤ m (of the m possible) intraday returns. Assume for simplicity that we observe r_{t,1,m}, . . . , r_{t,f,m} and define RV_t^(f/m) ≡ Σ_{i=1}^f r²_{t,i,m}. Since RV_t^(f/m) only captures the volatility during the part of the day that the market is open, we need to extend RV_t^(f/m) to a measure of volatility for the full day. One resolution is to add the squared close-to-open return to RV_t^(f/m), but this leads to a noisy measure because 'overnight' returns are relatively noisy. A better solution is to scale RV_t^(f/m) and use the estimator

σ̂²_t ≡ ĉ · RV_t^(f/m),  where  ĉ ≡ [n⁻¹ Σ_{t=1}^n (r_t − μ̂_t)²] / [n⁻¹ Σ_{t=1}^n RV_t^(f/m)]    (1)

This yields an estimator that is approximately unbiased for σ²_t under fairly reasonable assumptions. See Martens (2002), Hol and Koopman (2002) and Fleming et al. (2003), who applied similar scaling estimators to obtain a measure of volatility for the whole day.

(6) In practice, m must be chosen moderately large, to avoid that intraday returns become correlated due to market microstructure effects. In the technical appendix (Hansen and Lunde, 2001), we list the R² values from two MZ regressions, r²_t = a + b h²_t + u_t and RV_t^(288) = a + b h²_t + u_t, where the realized variance, RV_t^(288), is defined in the next section. The R² of the former typically lies between 2 and 4%, whereas the R² of the latter lies between 35 and 45%. This strongly suggests that RV_t^(288) is a far more precise estimate of σ²_t than is r²_t.
(7) For other DGPs the RV_t^(m) is consistent for the integrated variance, see Meddahi (2002) and Barndorff-Nielsen and Shephard (2001), which need not equal σ²_t. However, this does not change our main argument for using the realized variance, which is that RV_t^(m) is a more precise estimator of σ²_t than is r²_t.

5. TEST FOR SUPERIOR PREDICTIVE ABILITY

We divide the observations into an estimation period and an evaluation period:

t = −R + 1, . . . , 0 (estimation period);  t = 1, 2, . . . , n (evaluation period)

The parameters of the volatility models are estimated using the first R interday observations, and these estimates are used to make one-step-ahead forecasts for the remaining n periods. During the evaluation period we calculate the realized variance from intraday returns and obtain σ̂²_t using (1). Thus model k yields a sequence of forecasts, h²_{k,1}, . . . , h²_{k,n}, that are compared to σ̂²_1, . . . , σ̂²_n using a loss function L. Let the first model, k = 0, be the benchmark model that is compared to models k = 1, . . . , l. Each model leads to a sequence of losses, L_{k,t} ≡ L(σ̂²_t, h²_{k,t}), t = 1, . . . , n, and we define the relative performance variables

X_{k,t} ≡ L_{0,t} − L_{k,t},  k = 1, . . . , l,  t = 1, . . . , n

Our null hypothesis is that the benchmark model is as good as any other model in terms of expected loss. This can be formulated as the hypothesis H₀: μ_k ≡ E(X_{k,t}) ≤ 0 for all k = 1, . . . , l, because μ_k > 0 corresponds to the case where model k is better than the benchmark. In order to apply the stationary bootstrap of Politis and Romano (1994) in our empirical analysis, we assume that X_t = (X_{1,t}, . . . , X_{l,t})′ is strictly stationary, that E|X_t|^{r+δ} < ∞ for some r > 2 and some δ > 0, and that X_t is α-mixing of order −r/(r − 2). These assumptions are due to Goncalves and de Jong (2003) and are weaker than those formulated in Politis and Romano (1994).
The stationarity of {X_t} would be satisfied if {r_t} is strictly stationary, because {X_t} is a function of {r_t}. Next, the moment condition is not alarming, because {X_t} measures the difference in performance of pairs of models, and it is unlikely that the predictions would be so different that the relative loss would violate the moment condition, since the models are quite similar and have the same information. Finally, the mixing condition for {X_t} is satisfied if it holds for r_t. It is important to note that we have not assumed that any of the volatility models are correctly specified. Nor is such an assumption needed, since our ranking of volatility models is entirely measured in terms of expected loss. The assumptions about {r_t} will suffice for the comparison and inference, and it is not necessary to make a reference to the true specification of the conditional variance. On the other hand, there is nothing preventing one of the volatility models being correctly specified.(8)

The bootstrap implementation can be justified under weaker assumptions than those above. For example, the stationarity assumption about {r_t} can be relaxed and replaced by a near-epoch condition for X_t; see Goncalves and de Jong (2003). This is valuable to have in mind in the present context, since the returns may not satisfy the strict stationarity requirement. A structural change in the DGP would be more critical for our analysis. While a structural change need not invalidate the bootstrap inference (if the break occurs prior to the evaluation period), it would make it very difficult to interpret the results, because the models are estimated using data that have different stochastic properties.

As stated above, the null hypothesis is given by H₀: μ ≤ 0, where μ ≡ (μ_1, . . . , μ_l)′. The SPA test is based on the test statistic T^SPA_n ≡ max_{k=1,...,l} n^{1/2} X̄_k / ω̂_kk, where X̄_k is the kth element of X̄ ≡ n⁻¹ Σ_{t=1}^n X_t and ω̂²_kk is a consistent estimator of ω²_kk ≡ lim_{n→∞} var(n^{1/2} X̄_k), k = 1, . . . , l. Thus T^SPA_n represents the largest t-statistic (of relative performance), and the relevant question is whether T^SPA_n is too large for it to be plausible that μ ≤ 0. This is precisely the question that the test for SPA is designed to answer, as it estimates the distribution of T^SPA_n under the null hypothesis and obtains the critical value for T^SPA_n.

A closely related test is the RC of White (2000), which employs the non-standardized test statistic T^RC_n ≡ max_{k=1,...,l} n^{1/2} X̄_k. The critical values of the SPA test and the RC are derived in different ways, and this causes the latter to be sensitive to the inclusion of poor and irrelevant models, and to be less powerful; see Hansen (2003) for details. Power is important for our application, because a more powerful test is more likely to detect superior volatility models, if such exist.

Given the assumptions stated earlier in this section, it holds that n^{1/2}(X̄ − μ) →_d N_l(0, Ω), where '→_d' denotes convergence in distribution, μ ≡ (μ_1, . . . , μ_l)′ and Ω ≡ lim_{n→∞} E[n(X̄ − μ)(X̄ − μ)′]. This result makes it possible to test the hypothesis H₀: μ ≤ 0.

5.1. Bootstrap Implementation

Unless n is large relative to l, it is not possible to obtain a precise estimate of the l × l covariance matrix, Ω. It is therefore convenient to use a bootstrap implementation, which does not require an explicit estimate of Ω, and the tests of White (2000) and Hansen (2001) can both be implemented with the stationary bootstrap of Politis and Romano (1994).(9) From the bootstrap resamples, X*_{b,1}, . . . , X*_{b,n}, b = 1, . . . , B, we can construct random draws of quantities of interest, which can be used to estimate the distributions of these quantities. In our setting we seek an estimate of ω²_kk and estimates of the distributions of T^SPA_n and T^RC_n.

(8) Even the IGARCH model produces a stationary returns series {r_t}; see Nelson (1990).
First we calculate the sample averages, X̄*_b ≡ n⁻¹ Σ_{t=1}^n X*_{b,t}, b = 1, . . . , B, and it follows from Goncalves and de Jong (2003) that the empirical distribution of n^{1/2}(X̄*_b − X̄) converges to the true asymptotic distribution of n^{1/2}(X̄ − μ). The resamples also allow us to calculate ω̂²_kk ≡ (n/B) Σ_{b=1}^B (X̄*_{b,k} − X̄_k)², which is consistent for ω²_kk. We seek the distribution of the test statistics, T^SPA_n and T^RC_n, under the null hypothesis, μ ≤ 0, so we must re-centre the bootstrap variables such that they satisfy the null hypothesis.(10) Ideally, the variables should be re-centred about the true value of μ, but since μ is unknown we must use an estimate, and Hansen (2001) proposed the estimates:

μ̂^l_k ≡ min(X̄_k, 0),  μ̂^c_k ≡ X̄_k · 1_{X̄_k ≤ −A_{k,n}}  and  μ̂^u_k ≡ 0

where A_{k,n} ≡ (1/4) n^{−1/4} ω̂_kk. Thus we define Z*,i_{b,k} ≡ X̄*_{b,k} − g_i(X̄_k) for i = l, c, u, where g_l(x) ≡ max(x, 0), g_c(x) ≡ x · 1_{x > −A_{k,n}} and g_u(x) ≡ x, and it follows that E[Z*,i_{b,k}|X_1, . . . , X_n] = μ̂^i_k ≤ 0 for i = l, c, u. This enables us to approximate the distribution of T^SPA_n by the empirical distribution of

T^SPA*,i_{b,n} ≡ max_{k=1,...,l} n^{1/2} Z*,i_{b,k} / ω̂_kk,  b = 1, . . . , B,  i = l, c, u    (2)

and we calculate the p-value p̂^i_SPA ≡ B⁻¹ Σ_{b=1}^B 1_{T^SPA*,i_{b,n} > T^SPA_n}, for i = l, c, u. The null hypothesis is rejected for small p-values. In the event that T^SPA_n ≤ 0, there is no evidence against the null hypothesis, and in this case we use the convention p̂_SPA ≡ 1.

The three choices for μ̂_k will typically yield three different p-values, and Hansen (2001) has shown that the p-value based on μ̂^c_k is consistent for the true p-value, whereas μ̂^l_k and μ̂^u_k provide a lower and an upper bound for the true p-value, respectively.(11) We denote the three resulting tests by SPA_l, SPA_c and SPA_u, where the subscripts refer to lower, consistent and upper. The purpose of the correction factor, A_{k,n}, that defines μ̂^c_k is to ensure that lim_{n→∞} P(μ̂^c_k = 0|μ_k = 0) = 1 and lim_{n→∞} P(Z*,c_{b,k} ≤ 0|μ_k < 0) = 1. This is important for the consistency, because the models with μ_k < 0 do not influence the asymptotic distribution of T^SPA_n; see Hansen (2001). However, the choice of A_{k,n} is not unique, and it is therefore useful to include the p-values of the two other tests, SPA_l and SPA_u, because they define the range of p-values that can be obtained by varying the choice of A_{k,n}. The p-values based on the test statistic T^RC_n are obtained similarly. These are denoted by RC_l, RC_c and RC_u, where RC_u corresponds to the original RC of White (2000).

(9) This procedure involves a dependence parameter, q, that serves to preserve possible time-dependence in X_t. We used q = 0.5 and generated B = 10,000 bootstrap resamples in our empirical analysis.
(10) The bootstrap variables are constructed such that E(X̄*_b|X_1, . . . , X_n) = X̄, and typically we have X̄ ≠ 0.
(11) The true p-value is defined as lim_{n→∞} P(T^SPA_n > t), where t is the observed value of the test statistic and the probability is evaluated using the true (but unknown) values of μ and Ω.
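The bootstrap scheme can be sketched compactly. The code below is our own illustrative implementation, not the authors' (the function names, default seed handling and small B are our choices), and it computes only the consistent variant based on the re-centring function g_c:

```python
import numpy as np

def stationary_bootstrap_indices(n, q, rng):
    """Politis-Romano stationary bootstrap: blocks of geometric length (mean 1/q)."""
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < q:              # start a new block at a random point
            idx[t] = rng.integers(n)
        else:                             # continue the current block (wrap around)
            idx[t] = (idx[t - 1] + 1) % n
    return idx

def spa_pvalue(X, B=1000, q=0.5, seed=0):
    """Consistent SPA p-value for H0: E[X_k] <= 0 for all k, where X is the
    n x l matrix of relative performances X_{k,t} = L_{0,t} - L_{k,t}."""
    rng = np.random.default_rng(seed)
    n, l = X.shape
    xbar = X.mean(axis=0)
    xbar_star = np.empty((B, l))          # bootstrap sample averages
    for b in range(B):
        xbar_star[b] = X[stationary_bootstrap_indices(n, q, rng)].mean(axis=0)
    omega = np.sqrt(n / B * ((xbar_star - xbar) ** 2).sum(axis=0))  # omega_hat_kk
    t_spa = np.max(np.sqrt(n) * xbar / omega)
    if t_spa <= 0:
        return 1.0                        # no evidence against the benchmark
    A = 0.25 * n ** (-0.25) * omega
    g_c = np.where(xbar > -A, xbar, 0.0)  # re-centring g_c(xbar_k)
    z = xbar_star - g_c                   # E[z | data] = mu_hat_c <= 0
    t_star = np.max(np.sqrt(n) * z / omega, axis=1)
    return np.mean(t_star > t_spa)
```

The paper used q = 0.5 and B = 10,000 (see footnote 9); the RC p-value would be obtained the same way from the non-standardized statistic, i.e. without dividing by ω̂_kk.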
6. DATA AND EMPIRICAL RESULTS

The models are estimated by maximum likelihood using the estimation sample, and the models' forecasts are compared to the realized variance in the evaluation sample.

The first data set consists of DM–$ spot exchange rate data, where the estimation sample spans the period from October 1, 1987 through September 30, 1992 (1254 observations) and the out-of-sample evaluation sample spans the period from October 1, 1992 through September 30, 1993 (n = 260). The realized variance data for the exchange rate have previously been analysed in Andersen and Bollerslev (1998) and are based on m = 288 intraday returns per day. See Andersen and Bollerslev (1997) for additional details. We adjust their measure of realized variance and use σ̂²_t ≡ ĉ · RV_t^(288), where ĉ = 0.8418 is defined in (1).

The second data set consists of IBM stock returns, where the estimation period spans the period from January 2, 1990 through May 28, 1999 (2378 days) and the evaluation period spans the period from June 1, 1999 through May 31, 2000 (n = 254). The realized variances were constructed from high-frequency data that were extracted from the Trade and Quote (TAQ) database. The intraday returns, r_{t,i,m}, were constructed artificially by fitting a cubic spline to all mid-quotes of a given trading day, using the time interval 9:30 EST–16:00 EST.(12) From the splines we extract f = 130 artificial three-minute returns per day (out of the hypothetical m = 480 three-minute returns) and calculate RV_t^(130/480). There are several other methods for constructing the realized variance, and several of these are discussed in Andersen et al. (2003).

(12) This is done by applying the Splus routine called smooth.spline, which is a one-dimensional cubic smoothing spline that has a basis of B-splines, as discussed in chapters 1–3 of Green and Silverman (1994).
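The adjustment coefficient in equation (1) is simple to compute once the daily returns and the open-hours realized variances are in hand; a sketch (our own illustrative code, which takes the estimate of the conditional mean, μ̂_t, to be a constant for simplicity):

```python
import numpy as np

def adjustment_coefficient(daily_returns, rv_open_hours, mu_hat=0.0):
    """c_hat from equation (1): the ratio of the average squared (demeaned)
    daily return to the average open-hours realized variance."""
    r = np.asarray(daily_returns, dtype=float)
    rv = np.asarray(rv_open_hours, dtype=float)
    return np.mean((r - mu_hat) ** 2) / np.mean(rv)
```

The full-day variance estimate for day t is then ĉ times that day's open-hours realized variance, σ̂²_t = ĉ · RV_t^(f/m).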
Later we verify that our empirical results are not influenced by our choice of estimator, as we reach the same conclusions using six other measures of the realized variance.

The estimate of the adjustment coefficient in (1) is ĉ = 4.4938, which exceeds 480/130 ≃ 3.7. This indicates that RV_t^(f/m) underestimates the daily variance by more than would be expected if the daily volatility were evenly spread over the 24 hours of the day. There are several possible explanations for the fact that we need to adjust the volatilities by a number different from 3.7. First of all, it could be the result of sampling variation, but this seems unlikely, as n is too large for sampling error to explain this large a difference. A second explanation is that our intraday returns are positively autocorrelated. The autocorrelation can arise from market microstructure effects or can be an artifact of the way the intraday returns are constructed. A third explanation is that returns are relatively more volatile between close and open than between open and close, measured per unit of time. This would require that more information arrives to the market while it is closed than while it is open, which contradicts the findings of French and Roll (1986) and Baillie and Bollerslev (1989), so we find this explanation to be unrealistic. Finally, a fourth factor that can
create a difference between squared interday returns and the sum of squared intraday returns is the omission of the conditional expected value E(r_{t,i,m}|F_{t−1}), i = 1, . . . , m, in the calculations. Suppose that E(r_{t,i,m}|F_{t−1}) = 0 for i = 1, . . . , f, but is positive during the time the market is closed. Then r²_t would, on average, be larger than (m/f) Σ_{i=1}^f r²_{t,i,m}, even if the intraday returns were independent and homoskedastic. Such a difference between expected returns during the time the market is open and closed could be explained as a compensation for the lack of opportunities to hedge against risk overnight. It is not important which of the four explanations causes the difference, as long as our adjustment does not favour some models over others. Because the adjustment is made ex post and does not depend on the model forecasts, it is unlikely that a particular model would benefit more than other models.

6.1. Results from the Model Comparison

Table II contains the results from the model comparisons in the form of p-values.(13) The p-values correspond to the hypothesis that the benchmark model, ARCH(1) or GARCH(1,1), is the best model. The naive p-value is the p-value that one would obtain by comparing the best performing model to the benchmark without controlling for the full set of models. So the naive p-value is not a valid p-value; it will often be too small and is therefore more likely to indicate an unjustified 'significance'. The p-values of the SPA test and the RC control for the full set of models. Those of SPA_c and RC_c are asymptotically valid p-values, whereas those with subscripts l and u provide lower and upper bounds for the p-values. Although the naive p-value is not valid, it can exceed that of the SPA_c, because the best performing model need not be the model that results in the largest t-statistic.

Panel A contains the results for the exchange rate data. The p-values clearly show that the ARCH(1) is outperformed by other models, although the MSE₂ criterion is a possible exception. However, there is no evidence that the GARCH(1,1) is outperformed, and a closer inspection of the models reveals that the GARCH(1,1) has one of the best sample performances.

Panels B and C contain the results from the IBM return data, based on the SPA test and the RC, respectively.
From Panel B it is evident that both the ARCH(1) and the GARCH(1,1) are significantly outperformed by other volatility models in terms of all loss functions, with the possible exception of the R2LOG loss function. Thus there is strong evidence that the GARCH(1,1) is inferior to alternative models. The p-values in Panel C are based on the (non-standardized) test statistic T_n^RC. The results in Panel C are alarmingly different from those in Panel B, because these p-values suggest the exact opposite conclusion in most cases. Panel C suggests that the GARCH(1,1) is not significantly outperformed, and even the ARCH(1) cannot be rejected as being superior to all other models for three of the six loss functions. The contradictory results are explained by the fact that T_n^RC is not properly standardized, and this causes the tests RCl, RCc and RCu to be sensitive to erratic models. The problem is that a model with a relatively large var(X̄_k) has a disproportionate effect on the distribution of T_n^RC, in particular the right tail, which defines the critical values; see Hansen (2003). The p-values in the right-most column (boldface) are those of the original RC by White (2000), and these provide little evidence against the two benchmarks. So the results in Table II confirm that the RC is less powerful than the SPA test.

Table II

Panel A: Exchange rate data (DM/USD), SPA p-values

                Benchmark: ARCH(1)                   Benchmark: GARCH(1,1)
Metric    Naive     SPAl     SPAc     SPAu      Naive     SPAl     SPAc     SPAu
MSE1      0.0077    0.0179   0.0179   0.0209    0.2911    0.3164   0.4589   0.7887
MSE2      0.0392    0.0695   0.0748   0.0797    0.2025    0.6006   0.7652   0.9279
QLIKE     0.0067    0.0169   0.0184   0.0194    0.2528    0.5831   0.7707   0.9639
R2LOG    <0.0001    0.0002   0.0002   0.0002    0.0708    0.2144   0.3269   0.6627
MAE1     <0.0001    0.0002   0.0002   0.0002    0.0636    0.2274   0.3296   0.6309
MAE2      0.0002    0.0011   0.0011   0.0012    0.1832    0.2177   0.2920   0.5663

Panel B: IBM data, SPA p-values

                Benchmark: ARCH(1)                   Benchmark: GARCH(1,1)
Metric    Naive     SPAl     SPAc     SPAu      Naive     SPAl     SPAc     SPAu
MSE1      0.0052    0.0002   0.0002   0.0002    0.0355    0.0245   0.0300   0.0358
MSE2      0.0061    0.0001   0.0001   0.0001    0.0409    0.0260   0.0288   0.0316
QLIKE     0.0003   <0.0001  <0.0001  <0.0001    0.0213    0.0379   0.0463   0.0528
R2LOG     0.0108    0.0011   0.0011   0.0014    0.0166    0.0526   0.0630   0.0741
MAE1      0.0012    0.0080   0.0086   0.0104    0.0026    0.0040   0.0051   0.0058
MAE2      0.0014    0.0097   0.0100   0.0115    0.0026    0.0054   0.0065   0.0078

Panel C: IBM data, RC p-values

                Benchmark: ARCH(1)                   Benchmark: GARCH(1,1)
Metric    Naive     RCl      RCc      RCu       Naive     RCl      RCc      RCu
MSE1      0.0052    0.0164   0.0164   0.0164    0.0355    0.1000   0.1499   0.2811
MSE2      0.0061    0.0205   0.0205   0.0205    0.0409    0.1053   0.1056   0.1472
QLIKE     0.0003    0.0017   0.0017   0.0017    0.0213    0.0943   0.1153   0.3750
R2LOG     0.0108    0.0601   0.0713   0.0713    0.0166    0.2908   0.3535   0.6039
MAE1      0.0012    0.0972   0.1227   0.1399    0.0026    0.0505   0.1144   0.1522
MAE2      0.0014    0.1219   0.1649   0.1941    0.0026    0.0644   0.1135   0.1734

Notes: The table presents p-values of the SPA test and the RC for two null hypotheses: that the benchmark model, ARCH(1) or GARCH(1,1), is the best model. Conclusions should be based on the SPAc test (boldface) in Panels A and B. The naive 'p-value' compares the best performing model to the benchmark but ignores the full set of models. So the naive 'p-value' is not a valid p-value, and the difference between it and that of SPAc (RCc) shows the effects of data mining. Panel C contains the p-values that are based on the non-standardized RC test statistic. The p-values of the original RC are in boldface. A comparison of the results of Panels B and C shows that the SPA test is more powerful than the RC; the latter is unable to detect the inferiority of the GARCH(1,1), and of the ARCH(1) in some cases.

13 Additional results are given in a technical appendix (Hansen and Lunde, 2001).

The realized variance can be constructed in many ways, and different measures of the realized variance could lead to different results. To verify that our results are not sensitive to our choice of RV measure, we repeat the empirical analysis of the IBM returns data using six other measures. These measures include: one based on a different spline method and sampling frequency; one based on the Fourier method of Barucci and Reno (2002); two based on the previous-tick method; and two based on the linear interpolation method. The p-values of the SPAc test for the seven different measures of the realized variance are presented in Table III.
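For reference, the six loss criteria used throughout compare a variance forecast h with a realized-variance proxy. The sketch below uses the standard parameterizations of these losses from the volatility-forecasting literature; the paper's exact definitions (given in an earlier section, not reproduced here) may differ by constants or scaling.

```python
import numpy as np

def volatility_losses(rv, h):
    """Average loss of a variance forecast series h against realized
    variance rv, for six common criteria.  Standard parameterizations;
    constants may differ from the paper's exact definitions."""
    rv = np.asarray(rv, dtype=float)
    h = np.asarray(h, dtype=float)
    s, hs = np.sqrt(rv), np.sqrt(h)                   # std.-deviation scale
    return {
        "MSE1":  float(np.mean((s - hs) ** 2)),       # MSE, std. deviations
        "MSE2":  float(np.mean((rv - h) ** 2)),       # MSE, variances
        "QLIKE": float(np.mean(np.log(h) + rv / h)),  # Gaussian quasi-likelihood
        "R2LOG": float(np.mean(np.log(rv / h) ** 2)), # squared log ratio
        "MAE1":  float(np.mean(np.abs(s - hs))),      # MAE, std. deviations
        "MAE2":  float(np.mean(np.abs(rv - h))),      # MAE, variances
    }
```

A perfect forecast (h = rv) gives zero loss under every criterion except QLIKE, which is minimized rather than zeroed; this is one reason the criteria can rank models differently.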
Fortunately, the p-values do not differ much across the various measures of the realized variance, although most of the alternative measures provide slightly stronger evidence that the GARCH(1,1) is outperformed in terms of the R2LOG loss function, and slightly weaker evidence in terms of the MAE1 and MAE2 loss functions.

Table III. Results for different measures of realized variance

            Method for estimating realized variance
Criterion   Spl-50   Spl-250   Fourier   Linear   Previous   Linear   Previous
            3 min    2 min     M = 85    5 min    5 min      1 min    1 min
MSE1        0.0271   0.0230    0.0134    0.0125   0.0133     0.0111   0.0103
MSE2        0.0280   0.0213    0.0135    0.0168   0.0181     0.0082   0.0082
QLIKE       0.0457   0.0350    0.0166    0.0178   0.0175     0.0112   0.0118
R2LOG       0.0651   0.0998    0.0462    0.0409   0.0505     0.0375   0.0340
MAE1        0.0039   0.0635    0.0476    0.0690   0.0662     0.0960   0.0881
MAE2        0.0056   0.0888    0.0724    0.0510   0.0600     0.0707   0.0749

Notes: This table reports p-values of the SPAc test from the analysis of IBM returns, where the GARCH(1,1) is used as the benchmark. The p-values are obtained for seven different measures of the realized variance that are constructed with different techniques (and sampling frequencies). Spl-50 and Spl-250 refer to a cubic spline method that uses 50 and 250 knot points, respectively; the third measure is based on the Fourier method; and the last four measures are based on the linear interpolation and previous-tick methods.

[Figure 1. Population of model performance: exchange rate data and MSE2 loss function. The x-axis is the negative value of average sample loss. Four panels: all models; Gaussian vs. t-distributed; leverage vs. no leverage; GARCH-in-mean vs. constant mean vs. zero mean. The ARCH(1) and GARCH(1,1) are marked in each panel.]

[Figure 2. Population of model performance: exchange rate data and MAE2 loss function. The x-axis is the negative value of average sample loss. Panels as in Figure 1.]

Figures 1–4 show the 'population' of model performances for the various loss functions (and the two data sets).14 The plots provide information about how similar or different the models' sample performances were, and show the location of the ARCH(1) and GARCH(1,1) relative to the full set of models. The x-axis is the (negative value of the) average sample loss, such that the right tail represents the model with the best sample performance. Each figure contains four panels. The upper left panel is the model density of all the models, whereas the other three panels show the performance densities for different 'types' of models. The models are divided into groups according to their type: Gaussian vs. t-distributed specification; models with and without a leverage effect; and the three mean specifications.

14 To save space, we have only included the figures that correspond to the MSE2 and MAE2 loss functions. The figures for all six loss functions are given in Hansen and Lunde (2001).

Figures 1 and 2, which display the results for the exchange rate data, show that the GARCH(1,1) is one of the best performing models, whereas the ARCH(1) has one of the worst sample performances. There are no major differences between the various types of models, although there is a small tendency for the t-distributed specification to lead to a better performance than a Gaussian specification in Figure 2.
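The grouping behind these figures can be reproduced mechanically: score each model by its negative average sample loss (so that larger is better, matching the figures' x-axis) and aggregate the scores by model type. A minimal sketch with hypothetical model names and loss series:

```python
import numpy as np

def performance_population(losses, groups):
    """Summarize the 'population of model performance'.
    losses: dict name -> 1-d array of per-period losses for that model.
    groups: dict name -> type label (e.g. 'leverage' / 'no leverage').
    Returns per-model scores (negative average loss, larger = better),
    per-group mean scores, and the ranking from best to worst."""
    scores = {name: float(-np.mean(l)) for name, l in losses.items()}
    by_group = {}
    for name, score in scores.items():
        by_group.setdefault(groups[name], []).append(score)
    group_means = {g: float(np.mean(v)) for g, v in by_group.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)  # best first
    return scores, group_means, ranking
```

The densities in the figures are then simply kernel estimates over the per-model scores, drawn separately for each group.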

[Figure 3. Population of model performance: IBM data and MSE2 loss function. The x-axis is the negative value of average sample loss. Panels as in Figure 1.]

[Figure 4. Population of model performance: IBM data and MAE2 loss function. The x-axis is the negative value of average sample loss. Panels as in Figure 1.]

The results for the IBM return data are displayed in Figures 3 and 4. From the SPA test we concluded that the GARCH(1,1) was significantly outperformed by other models, and the two figures also show that the GARCH(1,1) is ranked much lower in this sample. It now seems that the Gaussian specification does better than the t-distributed specification, on average. However, the very best performing model in terms of the MAE2 loss function is a model with a t-distributed specification. From our analysis of the IBM data it is evident that models that can accommodate a leverage effect are superior to those that cannot, particularly in Figure 4.

Although the conditional mean μ_t = E[r_t | F_{t-1}] is likely to be small, it cannot ex ante be ruled out that a more sophisticated specification for μ_t, such as the GARCH-in-mean, leads to better forecasts of volatility than the zero-mean specification. However, the performance is almost identical across the three mean specifications, as can be seen from Figures 1–4.

7. CONCLUSIONS

We have compared a large number of volatility models in terms of their ability to forecast the conditional variance in an out-of-sample setting.

Our analysis was limited to DM–$ exchange rates and IBM stock returns and a universe of models that consisted of 330 different ARCH-type models. The main finding is that there is no evidence that the GARCH(1,1) model is outperformed by other models when the models are evaluated using the exchange rate data. This cannot be explained by the SPA test lacking power, because the ARCH(1) model is clearly rejected and found to be inferior to other models. In the analysis of IBM stock returns we found conclusive evidence that the GARCH(1,1) is inferior, and our results strongly suggest that good out-of-sample performance requires a specification that can accommodate a leverage effect.

The performances of the volatility models were measured out-of-sample using six loss functions, where realized variance was used to construct an estimate of the unobserved conditional variance. The significance of relative performance was evaluated with the test for superior predictive ability of Hansen (2001) and the reality check for data snooping of White (2000). Our empirical analysis illustrated the usefulness of the SPA test and showed that the SPA test is more powerful than the RC.

The SPA test and the RC are not model selection criteria and are therefore not designed to identify the best volatility model (in population). It is also unlikely that our data contain sufficient information to conclude that the model with the best sample performance is significantly better than all other models. Nevertheless, the use of a significance test, such as the SPA test, has clear advantages over model selection criteria, because it allows us to make strong conclusions.
In our setting, the SPA test provided conclusive evidence that the GARCH(1,1) is inferior to other models in our analysis of IBM returns. However, in the analysis of the exchange rate data, there was no evidence against the claim that 'nothing beats a GARCH(1,1)'.

ACKNOWLEDGEMENTS

Financial support from the Danish Research Agency, grant no. 24-00-0363, and the Salomon Research Award at Brown University is gratefully acknowledged. We thank Professor M. Hashem Pesaran (editor) and two anonymous referees for many suggestions that improved our paper, and we thank Tim Bollerslev for sharing the realized variance data for the exchange rate and Roberto Renò for constructing some of the realized variance data for the IBM returns. We also thank Tim Bollerslev, Frank Diebold, Rob Engle and seminar participants at the Aarhus School of Business, Aarhus University, Brown University, Cornell University, the University of Pennsylvania and ESEM 2002 for valuable comments. We are responsible for all remaining errors.

REFERENCES

Andersen TG, Bollerslev T. 1997. Intraday periodicity and volatility persistence in financial markets. Journal of Empirical Finance 4: 115–158.
Andersen TG, Bollerslev T. 1998. Answering the skeptics: yes, standard volatility models do provide accurate forecasts. International Economic Review 39(4): 885–905.
Andersen TG, Bollerslev T, Diebold FX. 2003. Parametric and nonparametric volatility measurement. In Handbook of Financial Econometrics, Vol. I, Aït-Sahalia Y, Hansen LP (eds). Elsevier/North-Holland: Amsterdam.
Baillie RT, Bollerslev T. 1989. The message in daily exchange rates: a conditional variance tale. Journal of Business & Economic Statistics 7(4): 297–305.
Barndorff-Nielsen OE, Shephard N. 2001. Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics (with discussion). Journal of the Royal Statistical Society, Series B 63(2): 167–241.
Barucci E, Renò R. 2002. On measuring volatility of diffusion processes with high frequency data.
Economics Letters 74: 371–378.
Black F. 1976. Studies in stock price volatility changes. In Proceedings of the 1976 Business Meeting of the Business and Economics Section, American Statistical Association; 177–181.
Bollerslev T. 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31: 307–327.
Bollerslev T, Engle RF, Nelson D. 1994. ARCH models. In Handbook of Econometrics, Vol. IV, Engle RF, McFadden DL (eds). Elsevier Science: Amsterdam; 2961–3038.
Diebold FX, Lopez JA. 1996. Forecast evaluation and combination. In Handbook of Statistics, Vol. 14, Statistical Methods in Finance, Maddala GS, Rao CR (eds). North-Holland: Amsterdam; 241–268.
Diebold FX, Mariano RS. 1995. Comparing predictive accuracy. Journal of Business and Economic Statistics 13: 253–263.
Ding Z, Granger CWJ, Engle RF. 1993. A long memory property of stock market returns and a new model. Journal of Empirical Finance 1: 83–106.
Duan J. 1997. Augmented GARCH(p,q) process and its diffusion limit. Journal of Econometrics 79(1): 97–127.
Engle RF. 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation. Econometrica 50: 987–1007.
Engle RF, Ng V. 1993. Measuring and testing the impact of news on volatility. Journal of Finance 48: 1747–1778.
Engle RF, Patton AJ. 2001. What good is a volatility model? Quantitative Finance 1(2): 237–245.
Fleming J, Kirby C, Ostdiek B. 2003. The economic value of volatility timing using realised volatility. Journal of Financial Economics 67: 473–509.
French KR, Roll R. 1986. Stock return variances: the arrival of information and the reaction of traders. Journal of Financial Economics 17: 5–26.
Goncalves S, de Jong R. 2003. Consistency of the stationary bootstrap under weak moment conditions. Economics Letters 81: 273–278.
Green PJ, Silverman BW. 1994. Nonparametric Regression and Generalized Linear Models.
Chapman & Hall: London.
Hansen BE. 1994. Autoregressive conditional density estimation. International Economic Review 35(3): 705–730.
Hansen PR. 2001. A test for superior predictive ability. Brown University, Department of Economics Working Paper 2001-06 (http://www.econ.brown.edu/fac/Peter Hansen).
Hansen PR. 2003. Asymptotic tests of composite hypotheses. Brown University, Department of Economics Working Paper 2003-09 (http://www.econ.brown.edu/fac/Peter Hansen).
Hansen PR, Lunde A. 2001. Consistent ranking of volatility models (http://www.hha.dk/~alunde/academic/research/papers/vola-mod-appendix.pdf).
Hansen PR, Lunde A. 2003. Consistent preordering with an estimated criterion function, with an application to the evaluation and comparison of volatility models. Brown University Working Paper 2003-01 (http://www.econ.brown.edu/fac/Peter Hansen).
Hansen PR, Lunde A, Nason JM. 2003. Choosing the best volatility models: the model confidence set approach. Oxford Bulletin of Economics and Statistics 65: 839–861.
Hentschel L. 1995. All in the family: nesting symmetric and asymmetric GARCH models. Journal of Financial Economics 39: 71–104.
Hol E, Koopman SJ. 2002. Stock index volatility forecasting with high frequency data. Manuscript, Department of Econometrics, Free University of Amsterdam.
Lopez JA. 2001. Evaluating the predictive accuracy of volatility models. Journal of Forecasting 20(1): 87–109.
Loudon GF, Watt WH, Yadav PK. 2000. An empirical analysis of alternative parametric ARCH models. Journal of Applied Econometrics 15(1): 117–136.
Martens M. 2002. Measuring and forecasting S&P 500 index futures volatility using high-frequency data. Journal of Futures Markets 22(6): 497–518.
Meddahi N. 2002. A theoretical comparison between integrated and realized volatility. Journal of Applied Econometrics 17: 479–508.
Nelson DB. 1990. Stationarity and persistence in the GARCH(1,1) model. Econometric Theory 6: 318–334.
Pagan AR, Schwert GW. 1990. Alternative models for conditional stock volatility.
Journal of Econometrics 45: 267–290.
Politis DN, Romano JP. 1994. The stationary bootstrap. Journal of the American Statistical Association 89: 1303–1313.
Poon S-H, Granger C. 2003. Forecasting volatility in financial markets: a review. Journal of Economic Literature 41: 478–539.
Schwert GW. 1990. Stock volatility and the crash of '87. Review of Financial Studies 3(1): 77–102.
Taylor SJ. 1986. Modelling Financial Time Series. John Wiley & Sons: New York.
West KD. 1996. Asymptotic inference about predictive ability. Econometrica 64: 1067–1084.
White H. 2000. A reality check for data snooping. Econometrica 68: 1097–1126.
