Abstract
We first analyze some features of numerical weather predictions (NWP) for global solar radiation and notice that they are undersmooth. This finding opens a way to improvements via various smoothing strategies. Then we introduce a statistical modeling framework based on modern semiparametric regression. We use a numerical weather prediction (NWP) model output as one of the inputs for our statistical model. The statistical model is build on the modern regression formalism, utilizing nonparametric B-splines for nonlinear parts whose exact shape is unknown a priori (apart from physically motivated smoothness). Then we illustrate its abilities for systematic development of strategies for NWP calibration and further development. The results are useful both for practical forecasting and as a source of feedback for NWP modelers.
We’re sorry, something doesn't seem to be working properly.
Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.
We’re sorry, something doesn't seem to be working properly.
Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.
Keywords
- Generalized additive model
- Numerical weather forecast model
- Global solar radiation
- Calibration
- Semiparametric modeling
1 Introduction
Increasing importance of solar radiation forecasting is tightly related to the recent rapid development of photovoltaic (PV) electricity production facilities and markets. The PV expansion is really fast and global. EPIA report [1] illustrates the amazing speed of the PV increase in Europe alone. The installed capacity was there less 1.306 GW in 2003, 5.312 GW in 2007, 30.504 GW in 2010, 81.488 GW in 2010. The (medium scenario) projection for 2018 is about 137 GW.
The solar radiation forecasting is nowadays conducted routinely by many teams and organizations. Nevertheless, it is still difficult to achieve good quality of the short to medium forecasts (especially in the moderate latitudes where the changes of cloud cover are frequent), despite substantial effort exercised in the past (as demonstrated recently e.g. by the multi–team forecasting exercise comparison within the COST WIRE project, [2]). Practical and theoretical challenges for the global solar radiation prediction are related both to relatively complex physics of the light transmission and to the substantial variability of weather which modifies the transmission conditions in fundamental way. The error variability that this brings into solar and consequently also to PV production forecast is substantial. In order to alleviate problems the PV output variability might bring into the power grid systems, it is highly desirable to decrease the prediction error (so that grid operators can manage grid balance more smoothly and more easily).
Many different strategies and even different paradigms have been tried for the solar irradiance or PV output forecasting [3, 4]. The more widely used approaches include time-series techniques [5], neural networks [6] and various modern statistical semi–parametric regression techniques [7], among others. Prominent role is played by the forecasting based on the numerical weather prediction models (NWP) [8,9,10].
It is interesting to investigate details of the prediction errors of the NWP forecasts for (at least) two reasons. The analysis of the random errors and systematic biases can provide a valuable feedback for the numerical modelers, showing various deficiencies and motivating specific imporovements. But by far, such analyses are not of only academic interest. With importance and market share of the PV energy increasing, it is quite essential to improve the solar radiation forecasts for short and medium horizons as much as possible.
2 NWP Errors and Their Statistical Corrections
2.1 Data
In order to explore behavior of the NWP forecast errors in detail and to delineate specific possibilities of the improvement via statistical calibration of NWP output, we launched a relatively large study, based on professional measurements from the Czech Hydrometeorological Institute (CHMI) and output of several NWP models and model versions. The CHMI measurements come from 15 stations and cover the area Czech Republic from May 2011 to April 2012. We separately computed 4 models for the same period in hourly resolution. In particular, we worked with MM5 3.6, MM5 3.7, WRF 2.2 and WRF 3.4, running in 9 \(\times \) 9 km spatial and hourly temporal resolution. They were evaluated for different horizons up to 72 h ahead, but in this paper we will concentrate on D0 (i.e. horizons 1 to 24 h ahead) only. The resulting output was then postprocessed by simple spatial averaging of all grid points within a square centered near the CHMI station. We used averaging over the square with size 0 (no spatial averaging, direct use of the nearest gridpoint), 27 km, 54 km and 108 km.
2.2 Empirical Behavior of the NWP Forecasts
It is quite clear that even though the NWP models are based on a highly formalized physics, described by partial differential equations solved through a carefully selected numerically stable scheme, they cannot be errorless. This is true especially with respect to inherently complicated variables like global solar radiation coming to a horizontal surface (measured in \(\text {W.m}^{-2}\)). It is natural to ask to what extent the errors are random and to which degree they show systematic features which are potentially repairable by subsequent calibration. It is also interesting to compare the behavior of the four different models. Figure 1 shows that all four models are quite far from ideal represented by diagonal line and corresponding to identity mapping. Nevertheless, their properties differ both in terms of systematic features and in random variability. While MM5 3.6 tends to underestimate, WRF 3.4 overestimates. MM5 3.7 with WRF 2.2 are about best, but random variability tends to be a bit smaller for MM5 3.7. Quite interestingly, higher version is not always better (WRF 2.2 is better than 3.4 which overestimates severely)! It is quite traditional to aim at (partial) correction of the systematic under- or over-estimation by very simple techniques which are typically denoted by the generic name of postprocessing or as the model output statistics (MOS) [11]. Most often, just a linear regression (or quantile regression, [12]) is used to this end. Figure 2 compares the raw output of MM5 3.7 (which is about the best among the four NWP tested) with the linear calibrations via ordinary least squares (OLS) and via quantile regression (for median). The scatterplots illustrate that while the most obvious part of the systematic error is improved, there is a huge residual random error that is untouched (or even slightly enhanced) by the procedure. Now we want to investigate whether we can do more, perhaps after discovering specific features of the NWP prediction errors amenable to structured statistical modeling.
2.3 Spatial Averaging
A simple but often neglected approach to the NWP calibration might involve spatial averaging. The averaging over all gridpoints within a given square centered at the gridpoint closest to the measurement station which we used is just one example of a very general approach based on the functionals of the predicted field. This is in contrast with the traditional MOS approach which is based on linear (or other simple) transformation of the output just at the gridpoint closest to the location where the prediction is needed. We can certainly think of more complicated (and perhaps better motivated) functionals like weighted integrals over elliptical regions given by the prevailing wind directions, but the flat weight averaging over the square serves as a simple and illustrative motivation for more structured approaches to come later.
Figure 3 shows how RMSE changes with different amount of spatial averaging of raw NWP’s. The boxplots depict the distribution of prediction bias or RMSE over all available CHMI stations (for WRF 3.4). Linearly calibrated versions (not shown here) behave quite similarly, in principle. While bias is essentially untouched by the spatial averaging, the RMSE is influenced quite dramatically. The RME decrease brought by the averaging is consistent across different NWPs. Even closer look at the RMSE behavior is offered by kriging RME surface [13] (not shown here). It demonstrates quite clearly that the spatial averaging not only decreases RMSE on average but it also makes its spatial field flatter (closer to the ideal represented by small, spatially constant RMSE). All together, it is quite clear that the spatial averaging decreases random error variability of the NWP output without introducing substantial bias. This is certainly interesting, especially when taking into account the relatively large extent of the maximum averaging tested. From these results, we can clearly see that the spatial averaging improves the prediction performance! This might seem counterintuitive at first sight - instead of using the original output (obtained at the cost of substantial computational effort), a coarser (spatially averaged) version is taken with success. The explanation of the phenomenon lies in that the NWP output contains non-negligible noise. Even that the NWP output closely resembles what can be a real spatio-temporal behavior of the global radiation, non-negligible part is not at a right place in right time, destroying prediction performance. This part is (partially) smoothed out by the spatial averaging.
2.4 Statistical Modeling for NWP Calibration
What is the general lesson to be taken from the NWP behavior just observed? First, the results from Sect. 2.3 show that the NWP as a predictor is undersmooth. This is strikingly different from a typical situations in which statistical predictor or potential covariates are oversmooth. Second, this suggests smoothing as a general strategy for the improvement of prediction performance. Not every smoothing is good, however. For instance, simple smoothing of NWP time trajectory would give disastrous results (much worse than the original NWP). Similarly, smoothing obtained by running NWP in much coarser spatial and/or temporal resolution would be very bad either. Most likely it would lead to unrealistic shape of the predicted global radiance field in the long term.
Successful strategy for improvement has to carefully distinguish between what to smooth out and what to retain in the predictor. A natural way to proceed is to utilize structured statistical modeling. In particular, modern semiparametric regression modeling like the generalized additive model (GAM) [14] framework offers substantial flexibility both for modeling, model selection, testing and prediction. The framework has already been successfully applied in a different context of fully empirical solar radiation modeling [7].
While the simple linear calibration (quite extreme form of a parameter-driven smoothing) often underlies standard MOS approach, we do not need to take it for granted. Generalizing to a larger model class having linearity as a special case, we can check whether linearity is plausible or not. A simple yet flexible generalization of the linear model can be formulated as the following GAM model:
where \(Y_t\) is the solar radiation measured at time t, \(X_t\) is the output of MM5 3.7 NWP, \(\epsilon _t\) is an independent Gaussian homoscedastic error. \(\beta _0\) and f(.) are unknown parameters to be estimated via optimization of the (penalized) likelihood. f is actually a “functional parameter” and it is modeled via thin plate spline with roughness penalty coefficient estimated by crossvalidation. Figure 4 shows that requiring a strict linearity is a bit too stringent. The GAM model is essentially linear in most of the NWP output range, but it becomes curved at both lower and upper extremes. Nonlinearity is more pronounced at low NWP values. This shape, together with slope much smaller than 1 in the central part of the plot, suggests that the model differentiates between predictions of different magnitudes. It believes less and less in larger predictions in general. But it suppresses very large predicions progressively and shrinks down the small predictions aggressively, reflecting substantial positive bias for very small or very large NWP predictions. One of big attractions of statistical modeling in the current context is the ability to access uncertainty in a natural and yet fully formalized way. Based on that, we can construct (asymptotic) tests of hypotheses. For instance, the model (1) is significantly different (better) than the model linear in \(X_t\) (p-value < 0.0001). Nonlinear model also reduces Akaike information criterion (AIC) by 908.6, compared to the (traditional) linear model. When checking the model (2), we can compare its performance to different physically plausible alternatives. One interesting alternative is to postulate a model linear in \(X_t\) but with a non–constant, time–varying slope. That is, we use \(\beta _0+\beta _1(U_t).X_t\) instead of time-constant (traditional) variant \(\beta _0+\beta _1.X_t\) in the predictor. From computational point of view, it is important that such a model is still a member of GAM class. Such a formulation seems to be quite appealing since one might suspect that calibration might vary during the time (e.g. due to different optical properties of air masses or due to climatological differences in cloud amount and character). Nevertheless, when we compare AIC for the linear (constant coefficient) model, nonlinear model (1), day-of-year–varying slope model (with \(U_t\) being position of day within a year), we get 102868.7, 101960.1, 102693.7 establishing the nonlinear model (1) as the winner.
The calibration can vary also in space, but proper formulation of the space-varying model is much more demanding. Very often, the models of this kind are specified in a Bayesian framework [13]. Nevertheless, with a carefully selected penalty matrix, they can be also fitted as GAMs. Modeling of this kind can be useful as a precursor for capacity planning and optimal PV allocation in the sense of statistical design theory.
GAM modeling can be employed to answer very specific questions of substantial practical relevance. If we are interested in whether using two NWPs can improve solar radiation prediction substantially (e.g. adding WRF 3.4 which scored poorly in 1 to well performing MM5 3.7), we can use
where g(.) is another “functional parameter” to be estimated from data and \(X_t-Z_t\) is the difference between WRF 3.4 and MM5 3.7. Difference parametrization is motivated by reduction of concurvity. While the estimate of the f component is similar as before, it is interesting to look closely at estimate of g. in Fig. 5. g shape (and much smaller than unitary slope) suggests that the model does not believe in WRF 3.7 too much in general. But it is is of substantial interest that the model suppresses influence of the WRF 3.7 most substantially when discrepancy between the two NWPs is large. Certainly, the optimal model form is quite far from simple average of the two NWP’s (simple, constant weight averaging of available “ensemble“ is often employed in practice).
The GAM framework is useful even for much more complicated scenarios and analyses. For instance, we might want to know how useful is the NWP output for prediction of the so called ramps - that is the situations where solar radiation increases suddenly. This is a problem of substantial practical interest. Abrupt increase in solar radiation leads to fast increase in PV output which in turn can cause problems in power grid and difficulties for the transmission grid operator. At the same time, it is not easy to grasp the ramp problem with traditional techniques because of both nonlinear character of the problem and “probabilistic” nature of the event of interest. Nevertheless, it is relatively straightforward to approach this problem with a suitably formulated GAM. The following logistic regression model for ramp being defined as hour-to-hour increase larger than 200 \(\text {W.m}^{-2}\) can serve as an example
where \(D_t:=X_t-X_{t-1}\) is the difference of NWP predictions, \(Y_t\) is the 0/1 indicator of whether the actual solar radiation difference between hours t and \(t-1\) was larger than 200 \(\text {W.m}^{-2}\). I(.) is an indicator function assuming the value of 1 if its argument is true and value of 0 otherwise. The model (3) was selected after an extensive AIC-based screening among many variants. It is a logistic regression (with the canonical, logistic link). It assumes Bernoulli distribution of ramp indicator, given the NWP output. In the core of the model is the relationship between the NWP lag-1 difference and probability of the ramp. This is described by the function g, which is nonlinear even on the logistic scale. Intercept depends on whether NWP output indicates that the ramp event occurred or not. \(\pi _t\) is also adjusted by a smooth function of the NWP prediction. Parameters \(\beta _{0L}\), \(\beta _{0H}\) and “functional parameters” f(.), g(.) are estimated simultaneously via penalized version of the IRLS (iterative least squares) algorithm [14]. All of the effects are significant - so that not only difference but also endpoint NWP prediction is important. Close inspection of the function g (not shown here) illustrates from another viewpoint that the NWP output is a very noisy predictor - the probability of a ramp increases slowly with the (modified) NWP difference \(\max \{D_t,0\}\), far beyond the nominal cutoff 200 \(\text {W.m}^{-2}\) (Fig. 6).
For illustration purposes, we focused on solar radiation modeling, but experience shows that the GAM modeling is equally important and useful when one constructs a model for predicting PV output based on NWP output. Formulation of such models involves Sun and panel geometry and hence it brings an additional level of complexity.
The semiparametric regression models discussed in this paper can be fitted to data in R [15] relatively easily, for instance using the mgcv library [14] (frequentist style with some Bayesian elements) or using the computationally efficient Bayesian approach INLA (Integrated Nested Laplace Approximation) [16].
3 Conclusions
NWP output is an important predictor useful even for variables difficult to predict, like the solar radiation. It is important to realize that such an output is rarely usable as it is. It contains not only systematic biases but also a lot of noise which needs to be filtered/snoothed out. It is important that the smoothing procedure has to be designed very carefully since ad hoc approaches might easily lead to further deterioration. Semiparametric statistical models like those from GAM class are extremely useful both for detecting the problems and also for delineating the correction in a constructive way - leading to a calibrated model with properly smoothed NWP output.
References
EPIA 2014: European Photovoltaic Industry Association. Global Market Outlook for Photovoltaics 2014–2018. http://www.cleanenergybusinesscouncil.com/site/resources/files/reports/EPIA_Global_Market_Outlook_for_Photovoltaics_2014-2018_-_Medium_Res.pdf
COST Action ES1002: Weather Intelligence for Renewable Energies. http://www.wire1002.ch/
Diagne, H.M., David, M., Lauret, P., Boland, J.: Solar irradiance forecasting: state-of-the-art and proposition for future developments for small-scale insular grids. In: Proceedings of the World Renewable Energy Forum 2012, Denver, USA (2012)
Paulescu, M., Paulescu, E., Gravila, P., Badescu, V.: Weather Modeling and Forecasting of PV System Operation. Springer, London (2013)
Reikard, G.: Predicting solar radiation at high resolution: a comparison of time series forecasts. Sol. Energy 83, 342–349 (2009)
Kumar, R., Aggarwal, R.D., Sharma, J.D.: New regression model to estimated global solar radiation using artificial neural network. Adv. Energy Eng. 1, 66–73 (2013)
Brabec, M., Paulescu, M., Badescu, V.: Tailored vs black-box models for forecasting hourly average solar irradiance. Sol. Energy 111, 320–331 (2015)
Warner, T.T.: Weather Modeling and Forecasting of PV System Operation. Cambridge University Press, Cambridge (2011)
Mathiesen, P., Kleissl, J.: Evaluation of numerical weather prediction for intra-day solar forecasting in the continental United States. Sol. Energy 85, 967–977 (2011)
Perez, R., Lorenz, E., Pelland, S., Beauharnois, M., Knowe, G.V., Hemker, K., Heinemann, D., Remund, J., et al.: Comparison of numerical weather prediction solar irradiance forecasts in the US Canada and Europe. Sol. Energy 94, 305–326 (2013)
Glahn, B., Gilbert, K., Cosgrove, R., Ruth, D., Sheets, K.: The gridding of MOS. Weather Forecast. 24, 520–529 (2009)
Koenker, R.: Quantile Regression. Cambridge University Press, Cambridge (2005)
Banerjee, S., Carlin, B.P., Gelfand, A.E.: Hierarchical Modeling and Analysis for Spatial Data. Chapman and Hall/CRC Press, London (2004)
Wood, S.N.: Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC, London (2006)
R Core Team 2016. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
The R-INLA project 2016. http://www.r-inla.org/
Acknowledgments
The work on this article was partly supported by the long-term strategic development financing of the Institute of Computer Science (RVO:67985807) and by the Czech Science Foundation grant GA13-34856S, Advanced random field methods in data assimilation for short-term weather prediction.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Brabec, M., Eben, K., Pelikan, E., Krc, P., Resler, J., Jurus, P. (2018). Statistical Modeling for Improvement of Numerical-Model-Based Solar Radiation Forecasts. In: Abraham, A., Haqiq, A., Ella Hassanien, A., Snasel, V., Alimi, A. (eds) Proceedings of the Third International Afro-European Conference for Industrial Advancement — AECIA 2016. AECIA 2016. Advances in Intelligent Systems and Computing, vol 565. Springer, Cham. https://doi.org/10.1007/978-3-319-60834-1_26
Download citation
DOI: https://doi.org/10.1007/978-3-319-60834-1_26
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60833-4
Online ISBN: 978-3-319-60834-1
eBook Packages: EngineeringEngineering (R0)