1 Introduction

The increasing importance of solar radiation forecasting is tightly related to the recent rapid development of photovoltaic (PV) electricity production facilities and markets. The PV expansion is fast and global. The EPIA report [1] illustrates the remarkable speed of the PV increase in Europe alone: the installed capacity there was 1.306 GW in 2003, 5.312 GW in 2007, 30.504 GW in 2010 and 81.488 GW in 2013. The (medium scenario) projection for 2018 is about 137 GW.

Solar radiation forecasting is nowadays conducted routinely by many teams and organizations. Nevertheless, it is still difficult to achieve good quality of short- to medium-term forecasts (especially in the moderate latitudes, where changes of cloud cover are frequent), despite the substantial effort exercised in the past (as demonstrated recently, e.g., by the multi-team forecasting exercise comparison within the COST WIRE project [2]). Practical and theoretical challenges for global solar radiation prediction are related both to the relatively complex physics of light transmission and to the substantial variability of weather, which modifies the transmission conditions in a fundamental way. The error variability that this brings into the solar and, consequently, also the PV production forecast is substantial. In order to alleviate the problems that PV output variability might bring into power grid systems, it is highly desirable to decrease the prediction error (so that grid operators can manage grid balance more smoothly and easily).

Many different strategies and even different paradigms have been tried for solar irradiance or PV output forecasting [3, 4]. The more widely used approaches include time-series techniques [5], neural networks [6] and various modern statistical semi-parametric regression techniques [7], among others. A prominent role is played by forecasting based on numerical weather prediction (NWP) models [8,9,10].

It is interesting to investigate details of the prediction errors of the NWP forecasts for (at least) two reasons. First, the analysis of random errors and systematic biases can provide valuable feedback for the numerical modelers, showing various deficiencies and motivating specific improvements. Second, such analyses are far from being of purely academic interest: with the importance and market share of PV energy increasing, it is quite essential to improve the solar radiation forecasts for short and medium horizons as much as possible.

2 NWP Errors and Their Statistical Corrections

2.1 Data

In order to explore the behavior of the NWP forecast errors in detail and to delineate specific possibilities of improvement via statistical calibration of NWP output, we launched a relatively large study based on professional measurements from the Czech Hydrometeorological Institute (CHMI) and output of several NWP models and model versions. The CHMI measurements come from 15 stations and cover the area of the Czech Republic from May 2011 to April 2012. We separately ran four models for the same period in hourly resolution. In particular, we worked with MM5 3.6, MM5 3.7, WRF 2.2 and WRF 3.4, running in 9 \(\times \) 9 km spatial and hourly temporal resolution. They were evaluated for different horizons up to 72 h ahead, but in this paper we will concentrate on D0 (i.e. horizons 1 to 24 h ahead) only. The resulting output was then postprocessed by simple spatial averaging of all grid points within a square centered near the CHMI station. We used averaging over squares of size 0 (no spatial averaging; direct use of the nearest gridpoint), 27 km, 54 km and 108 km.
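
The flat spatial averaging can be sketched as follows. This is a hypothetical numpy illustration, not the actual processing chain: the toy field, grid and the `square_average` helper are our own constructions.

```python
import numpy as np

def square_average(field, i, j, half_width):
    """Flat-weight mean of a gridded NWP field over the
    (2*half_width + 1)^2 square of gridpoints centered at (i, j);
    half_width = 0 returns the nearest gridpoint value itself."""
    n, m = field.shape
    rows = slice(max(i - half_width, 0), min(i + half_width + 1, n))
    cols = slice(max(j - half_width, 0), min(j + half_width + 1, m))
    return float(field[rows, cols].mean())

# Toy radiation field: a smooth component plus gridpoint-scale noise
rng = np.random.default_rng(0)
x, y = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
field = (800 * np.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2))
         + rng.normal(0, 80, x.shape))

print(square_average(field, 25, 25, 0))  # no averaging (nearest gridpoint)
print(square_average(field, 25, 25, 1))  # 3 x 3 square (27 km at 9 km spacing)
```

With the 9 km grid spacing, half-widths of roughly 1, 3 and 6 gridpoints correspond to the 27, 54 and 108 km squares mentioned above.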

2.2 Empirical Behavior of the NWP Forecasts

It is quite clear that even though the NWP models are based on highly formalized physics, described by partial differential equations solved through a carefully selected numerically stable scheme, they cannot be errorless. This is true especially for inherently complicated variables like the global solar radiation arriving at a horizontal surface (measured in \(\text {W.m}^{-2}\)). It is natural to ask to what extent the errors are random and to what degree they show systematic features which are potentially repairable by subsequent calibration. It is also interesting to compare the behavior of the four different models. Figure 1 shows that all four models are quite far from the ideal represented by the diagonal line (corresponding to the identity mapping). Nevertheless, their properties differ both in terms of systematic features and in random variability. While MM5 3.6 tends to underestimate, WRF 3.4 overestimates. MM5 3.7 and WRF 2.2 are about the best, but random variability tends to be a bit smaller for MM5 3.7. Quite interestingly, a higher version is not always better (WRF 2.2 is better than WRF 3.4, which overestimates severely)!

It is quite traditional to aim at a (partial) correction of the systematic under- or over-estimation by very simple techniques, typically denoted by the generic name of postprocessing or model output statistics (MOS) [11]. Most often, just a linear regression (or quantile regression, [12]) is used to this end. Figure 2 compares the raw output of MM5 3.7 (which is about the best among the four NWPs tested) with linear calibrations via ordinary least squares (OLS) and via quantile regression (for the median). The scatterplots illustrate that while the most obvious part of the systematic error is improved, there is a huge residual random error that is untouched (or even slightly enhanced) by the procedure. Now we want to investigate whether we can do more, perhaps after discovering specific features of the NWP prediction errors amenable to structured statistical modeling.
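
The OLS flavor of such a linear MOS calibration can be sketched on synthetic data. The coefficients and noise level below are purely illustrative, not fitted to the CHMI data; the quantile variant would minimize the pinball loss instead of squared error.

```python
import numpy as np

# Synthetic "raw NWP vs. measurement" pairs with a systematic bias
rng = np.random.default_rng(1)
X = rng.uniform(0, 900, 1000)                    # raw NWP output, W.m^-2
Y = 0.85 * X + 30 + rng.normal(0, 60, X.size)    # "measured" radiation

# Ordinary least squares fit of Y on X (intercept + slope)
A = np.column_stack([np.ones_like(X), X])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)

def calibrate(x_new):
    """Apply the fitted linear MOS correction to new raw NWP output."""
    return beta[0] + beta[1] * x_new

print(beta)  # close to the generating values [30, 0.85]
```

As the scatterplots in Fig. 2 illustrate for the real data, such a correction removes the gross systematic error but leaves the residual random scatter essentially untouched.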

Fig. 1. Comparison of the prediction performance of the four raw NWP models. Hradec Kralove, D0 horizons. Prediction on the vertical axis, measurement on the horizontal axis.

Fig. 2. MM5 3.7, raw and simply postprocessed output: ordinary least squares (OLS) and quantile regression (QR) in different panels. Hradec Kralove, D0 horizons. Postprocessed NWP on the vertical axis, measurement on the horizontal axis.

2.3 Spatial Averaging

A simple but often neglected approach to NWP calibration might involve spatial averaging. The averaging we used, over all gridpoints within a given square centered at the gridpoint closest to the measurement station, is just one example of a very general approach based on functionals of the predicted field. This is in contrast with the traditional MOS approach, which is based on a linear (or other simple) transformation of the output at the single gridpoint closest to the location where the prediction is needed. We can certainly think of more complicated (and perhaps better motivated) functionals, like weighted integrals over elliptical regions given by the prevailing wind directions, but the flat-weight averaging over the square serves as a simple and illustrative motivation for more structured approaches to come later.

Fig. 3. Distribution of RMSE comparing different levels of spatial averaging of raw NWP, D0 horizons. Different panels correspond to combinations of NWP and level of spatial averaging. RMSE plotted along the horizontal axis.

Figure 3 shows how RMSE changes with different amounts of spatial averaging of the raw NWPs. The boxplots depict the distribution of the prediction bias or RMSE over all available CHMI stations (for WRF 3.4). Linearly calibrated versions (not shown here) behave quite similarly, in principle. While the bias is essentially untouched by the spatial averaging, the RMSE is influenced quite dramatically. The RMSE decrease brought by the averaging is consistent across different NWPs. An even closer look at the RMSE behavior is offered by a kriged RMSE surface [13] (not shown here). It demonstrates quite clearly that the spatial averaging not only decreases RMSE on average but also makes its spatial field flatter (closer to the ideal represented by a small, spatially constant RMSE). All together, it is quite clear that the spatial averaging decreases the random error variability of the NWP output without introducing substantial bias. This is certainly interesting, especially when taking into account the relatively large extent of the maximum averaging tested. From these results, we can clearly see that the spatial averaging improves the prediction performance! This might seem counterintuitive at first sight: instead of using the original output (obtained at the cost of substantial computational effort), a coarser (spatially averaged) version is used with success. The explanation of the phenomenon lies in the fact that the NWP output contains non-negligible noise. Even though the NWP output closely resembles what could be the real spatio-temporal behavior of the global radiation, a non-negligible part of it is not in the right place at the right time, degrading prediction performance. This part is (partially) smoothed out by the spatial averaging.
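
This mechanism can be illustrated on an entirely synthetic toy example (it mimics only the mechanism, not the real fields): if the "NWP" equals a smooth true field plus independent gridpoint-scale noise, averaging over a small square cuts the random error substantially while introducing only a small smoothing bias.

```python
import numpy as np

def rmse(errors):
    return float(np.sqrt(np.mean(np.square(errors))))

rng = np.random.default_rng(2)

# Smooth "true" radiation field on a 60 x 60 grid
x, y = np.meshgrid(np.linspace(0, 1, 60), np.linspace(0, 1, 60))
truth = 600 + 200 * np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y)

i = j = 30                      # "station" at the grid center
obs = truth[i, j]
raw_err, avg_err = [], []
for _ in range(2000):           # independent forecast "hours"
    nwp = truth + rng.normal(0, 100, truth.shape)   # noisy "NWP" field
    raw_err.append(nwp[i, j] - obs)                 # nearest gridpoint only
    avg_err.append(nwp[i - 2:i + 3, j - 2:j + 3].mean() - obs)  # 5x5 mean

print(rmse(raw_err), rmse(avg_err))  # averaging clearly wins
```

The 5 x 5 mean shrinks the noise standard deviation roughly fivefold here, while the smooth true field keeps the averaging bias negligible, which is exactly the trade-off observed in Fig. 3.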

2.4 Statistical Modeling for NWP Calibration

What is the general lesson to be taken from the NWP behavior just observed? First, the results from Sect. 2.3 show that the NWP as a predictor is undersmooth. This is strikingly different from typical situations in which a statistical predictor or potential covariates are oversmooth. Second, this suggests smoothing as a general strategy for improving prediction performance. Not every smoothing is good, however. For instance, simple smoothing of the NWP time trajectory would give disastrous results (much worse than the original NWP). Similarly, smoothing obtained by running the NWP at much coarser spatial and/or temporal resolution would also be very bad. Most likely, it would lead to an unrealistic shape of the predicted global radiation field in the long term.

A successful strategy for improvement has to carefully distinguish between what to smooth out and what to retain in the predictor. A natural way to proceed is to utilize structured statistical modeling. In particular, modern semiparametric regression modeling such as the generalized additive model (GAM) framework [14] offers substantial flexibility for modeling, model selection, testing and prediction. The framework has already been successfully applied in the different context of fully empirical solar radiation modeling [7].

While simple linear calibration (a rather extreme form of parameter-driven smoothing) often underlies the standard MOS approach, we do not need to take it for granted. Generalizing to a larger model class having linearity as a special case, we can check whether linearity is plausible or not. A simple yet flexible generalization of the linear model can be formulated as the following GAM:

$$\begin{aligned} Y_t = \beta _0 + f(X_t)+\epsilon _t, \end{aligned}$$
(1)

where \(Y_t\) is the solar radiation measured at time t, \(X_t\) is the output of the MM5 3.7 NWP, and \(\epsilon _t\) is an independent Gaussian homoscedastic error. \(\beta _0\) and f(.) are unknown parameters to be estimated via optimization of the (penalized) likelihood. f is actually a “functional parameter” and is modeled via a thin plate spline with the roughness penalty coefficient estimated by crossvalidation. Figure 4 shows that requiring strict linearity is a bit too stringent. The GAM fit is essentially linear in most of the NWP output range, but it becomes curved at both the lower and upper extremes. The nonlinearity is more pronounced at low NWP values. This shape, together with a slope much smaller than 1 in the central part of the plot, suggests that the model differentiates between predictions of different magnitudes. It believes less and less in larger predictions in general; it suppresses very large predictions progressively and shrinks the small predictions aggressively, reflecting substantial positive bias for very small or very large NWP predictions. One of the big attractions of statistical modeling in the current context is the ability to assess uncertainty in a natural and yet fully formalized way. Based on that, we can construct (asymptotic) tests of hypotheses. For instance, the model (1) is significantly different from (better than) the model linear in \(X_t\) (p-value < 0.0001). The nonlinear model also reduces the Akaike information criterion (AIC) by 908.6 compared to the (traditional) linear model. When checking the model (1), we can compare its performance to different physically plausible alternatives. One interesting alternative is to postulate a model linear in \(X_t\) but with a non-constant, time-varying slope. That is, we use \(\beta _0+\beta _1(U_t).X_t\) instead of the time-constant (traditional) variant \(\beta _0+\beta _1.X_t\) in the predictor. From a computational point of view, it is important that such a model is still a member of the GAM class.
Such a formulation seems quite appealing, since one might suspect that the calibration may vary over time (e.g. due to different optical properties of air masses or due to climatological differences in cloud amount and character). Nevertheless, when we compare the AIC of the linear (constant coefficient) model, the nonlinear model (1) and the day-of-year-varying slope model (with \(U_t\) being the position of the day within a year), we get 102868.7, 101960.1 and 102693.7, respectively, establishing the nonlinear model (1) as the winner.
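
Model (1) was fitted with thin plate splines in mgcv; the following standalone Python sketch uses a much cruder stand-in (a degree-1 truncated-power basis with a ridge penalty on the hinge coefficients, synthetic data, hand-picked knots and penalty), but it conveys the mechanics of penalized semiparametric calibration.

```python
import numpy as np

def spline_basis(x, knots):
    """Degree-1 truncated-power basis: intercept, x, and hinge terms."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def fit_penalized(x, y, knots, lam):
    """Penalized least squares with a ridge penalty on the hinge
    coefficients only, leaving the strictly linear part unpenalized."""
    B = spline_basis(x, knots)
    P = np.diag([0.0, 0.0] + [1.0] * len(knots))  # penalize wiggles only
    return np.linalg.solve(B.T @ B + lam * P, B.T @ y)

# Synthetic calibration data: measurement Y vs. NWP output X with a
# saturating (nonlinear) relationship, loosely echoing the shape in Fig. 4
rng = np.random.default_rng(3)
X = rng.uniform(0, 900, 1500)
Y = 500 * np.tanh(X / 400) + rng.normal(0, 40, X.size)

knots = np.linspace(50, 850, 9)
coef = fit_penalized(X, Y, knots, lam=1.0)
fitted = spline_basis(X, knots) @ coef
```

In a real application, the penalty coefficient would be chosen by crossvalidation (as in the paper) rather than fixed by hand.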

The calibration can also vary in space, but a proper formulation of the space-varying model is much more demanding. Very often, models of this kind are specified in a Bayesian framework [13]. Nevertheless, with a carefully selected penalty matrix, they can also be fitted as GAMs. Modeling of this kind can be useful as a precursor for capacity planning and optimal PV allocation in the sense of statistical design theory.

Fig. 4. Semiparametric estimate of the f(.) function of model (1) with MM5 3.7, D0 horizons (1 to 24 h ahead) at the Hradec Kralove site. Estimate (solid line) and pointwise 95% confidence limits (dashed lines).

GAM modeling can be employed to answer very specific questions of substantial practical relevance. If we are interested in whether using two NWPs can improve the solar radiation prediction substantially (e.g. adding WRF 3.4, which scored poorly in Fig. 1, to the well-performing MM5 3.7), we can use

$$\begin{aligned} Y_t = \beta _0 + f(X_t)+g(Z_t-X_t)+\epsilon _t, \end{aligned}$$
(2)

where g(.) is another “functional parameter” to be estimated from data and \(Z_t-X_t\) is the difference between the WRF 3.4 and MM5 3.7 outputs. The difference parametrization is motivated by reduction of concurvity. While the estimate of the f component is similar as before, it is interesting to look closely at the estimate of g in Fig. 5. The shape of g (and its much smaller than unit slope) suggests that the model does not put much trust in WRF 3.4 in general. It is of substantial interest, however, that the model suppresses the influence of WRF 3.4 most substantially when the discrepancy between the two NWPs is large. Certainly, the optimal model form is quite far from a simple average of the two NWPs (simple, constant-weight averaging of the available “ensemble” is often employed in practice).
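
A joint fit of two additive smooths, one in \(X_t\) and one in the difference \(Z_t-X_t\), can be sketched in the same penalized-basis style. This is again a simplified stand-in for the actual mgcv fit, with synthetic data, hand-picked knots and a fixed penalty.

```python
import numpy as np

def hinge_basis(x, knots):
    """Degree-1 truncated-power basis (no intercept): x plus hinge terms."""
    return np.column_stack([x] + [np.maximum(x - k, 0.0) for k in knots])

rng = np.random.default_rng(4)
n = 2000
X = rng.uniform(0, 900, n)            # stand-in for MM5 3.7 output
Z = X + rng.normal(0, 120, n)         # stand-in for WRF 3.4 output
D = Z - X                             # the difference, as in model (2)
Y = 0.8 * X + 0.2 * np.clip(D, -150, 150) + rng.normal(0, 40, n)

kx = np.linspace(50, 850, 8)          # knots for the smooth in X
kd = np.linspace(-250, 250, 8)        # knots for the smooth in Z - X
B = np.column_stack([np.ones(n), hinge_basis(X, kx), hinge_basis(D, kd)])

# Ridge penalty on the hinge coefficients only (linear parts unpenalized)
pen = np.array([0.0, 0.0] + [1.0] * len(kx) + [0.0] + [1.0] * len(kd))
coef = np.linalg.solve(B.T @ B + 1.0 * np.diag(pen), B.T @ Y)
fitted = B @ coef
print(float(np.sqrt(np.mean((Y - fitted) ** 2))))  # near the noise level
```

The single joint solve is what makes the additive structure convenient: both "smooths" share one design matrix and one penalized least-squares problem.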

Fig. 5. Semiparametric estimate of the g(.) function of model (2), D0 horizons (1 to 24 h ahead) at the Hradec Kralove site. Estimate (solid line) and pointwise 95% confidence limits (dashed lines).

The GAM framework is useful even for much more complicated scenarios and analyses. For instance, we might want to know how useful the NWP output is for predicting the so-called ramps, that is, situations where the solar radiation increases suddenly. This is a problem of substantial practical interest. An abrupt increase in solar radiation leads to a fast increase in PV output, which in turn can cause problems in the power grid and difficulties for the transmission grid operator. At the same time, it is not easy to grasp the ramp problem with traditional techniques because of both the nonlinear character of the problem and the “probabilistic” nature of the event of interest. Nevertheless, it is relatively straightforward to approach this problem with a suitably formulated GAM. The following logistic regression model, with a ramp defined as an hour-to-hour increase larger than 200 \(\text {W.m}^{-2}\), can serve as an example

$$\begin{aligned} Y_t &\sim Bin(\pi _t,1) \\ \log \left( \frac{\pi _t}{1-\pi _t} \right) &= \beta _{0L}.I(D_t\le 200) +\beta _{0H}.I(D_t>200) + f(X_t)+g(\max \{D_t,0\}), \end{aligned}$$
(3)

where \(D_t:=X_t-X_{t-1}\) is the difference of NWP predictions and \(Y_t\) is the 0/1 indicator of whether the actual solar radiation difference between hours t and \(t-1\) was larger than 200 \(\text {W.m}^{-2}\). I(.) is an indicator function assuming the value 1 if its argument is true and 0 otherwise. The model (3) was selected after an extensive AIC-based screening among many variants. It is a logistic regression (with the canonical, logistic link). It assumes a Bernoulli distribution of the ramp indicator, given the NWP output. At the core of the model is the relationship between the lag-1 NWP difference and the probability of the ramp. This is described by the function g, which is nonlinear even on the logistic scale. The intercept depends on whether the NWP output indicates that the ramp event occurred or not. \(\pi _t\) is also adjusted by a smooth function of the NWP prediction. The parameters \(\beta _{0L}\), \(\beta _{0H}\) and the “functional parameters” f(.), g(.) are estimated simultaneously via a penalized version of the IRLS (iteratively reweighted least squares) algorithm [14]. All of the effects are significant, so not only the difference but also the endpoint NWP prediction is important. A close inspection of the function g (not shown here) illustrates from another viewpoint that the NWP output is a very noisy predictor: the probability of a ramp increases slowly with the (modified) NWP difference \(\max \{D_t,0\}\), far beyond the nominal cutoff of 200 \(\text {W.m}^{-2}\) (Fig. 6).
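
The penalized fit itself is mgcv internals, but the unpenalized core of the algorithm is short enough to sketch: plain Newton/IRLS logistic regression on synthetic ramp data, with linear stand-ins for the smooths f and g. All coefficients and distributions below are invented for illustration.

```python
import numpy as np

def fit_logistic_irls(A, y, n_iter=25):
    """Unpenalized logistic regression fitted by iteratively reweighted
    least squares (Newton-Raphson on the Bernoulli log-likelihood)."""
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        eta = A @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        W = p * (1.0 - p)                       # IRLS weights
        # Normal equations for the working response z = eta + (y - p)/W;
        # note that W*z simplifies to W*eta + (y - p)
        beta = np.linalg.solve(A.T @ (W[:, None] * A),
                               A.T @ (W * eta + (y - p)))
    return beta

# Synthetic ramp data in the spirit of model (3)
rng = np.random.default_rng(5)
n = 5000
X = rng.uniform(0, 900, n)           # NWP prediction at hour t
D = rng.normal(0, 150, n)            # lag-1 NWP difference D_t
eta_true = -3.0 + 0.012 * np.maximum(D, 0.0) + 1.5 * (D > 200)
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

# Design: the two indicator intercepts of model (3) plus linear stand-ins
# for f(X_t) and g(max{D_t, 0})
A = np.column_stack([(D <= 200).astype(float), (D > 200).astype(float),
                     X, np.maximum(D, 0.0)])
beta_hat = fit_logistic_irls(A, y)
print(beta_hat)
```

With this design, a higher fitted intercept for the \(D_t>200\) group relative to the \(D_t\le 200\) group plays the role of the intercept shift in model (3).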

Fig. 6. Semiparametric estimate of the g(.) function of model (3), D0 horizons at the Hradec Kralove site. Estimate (solid line) and pointwise 95% confidence limits (dashed lines).

For illustration purposes, we focused on solar radiation modeling, but experience shows that GAM modeling is equally important and useful when one constructs a model for predicting PV output from NWP output. The formulation of such models involves the Sun and panel geometry and hence brings an additional level of complexity.

The semiparametric regression models discussed in this paper can be fitted to data in R [15] relatively easily, for instance using the mgcv library [14] (frequentist style with some Bayesian elements) or using the computationally efficient Bayesian approach INLA (Integrated Nested Laplace Approximation) [16].

3 Conclusions

NWP output is an important predictor, useful even for variables that are difficult to predict, like the solar radiation. It is important to realize that such an output is rarely usable as it is. It contains not only systematic biases but also a lot of noise which needs to be filtered/smoothed out. The smoothing procedure has to be designed very carefully, since ad hoc approaches might easily lead to further deterioration. Semiparametric statistical models like those from the GAM class are extremely useful both for detecting the problems and for delineating the correction in a constructive way, leading to a calibrated model with properly smoothed NWP output.