At the heart of the lmForc package is the
Forecast
class. Base R does not provide a good format for
working with forecasts, so lmForc addresses this by introducing
a new class for storing forecasts that is simple and rigorous. The
Forecast
class is paramount to the lmForc
philosophy of simple syntax and realistic tests. Forecast
is an S4 class that contains equal length vectors with the following
data:
origin
The time when the forecast was made.future
The time that is being forecasted.forecast
The forecast itself.realized
If available, the realized value at the time
being forecasted.The Forecast
class also includes an additional
length-one slot h_ahead
for representing how many periods
ahead are being forecasted. This slot is optional, but becomes useful
for documentation and performing out-of-sample forecast tests.
We demonstrate the Forecast
class by constructing a
simple Forecast
object. This forecast contains four
observations at the quarterly frequency.
my_forecast <- Forecast(
origin = as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31")),
future = as.Date(c("2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31")),
forecast = c(4.21, 4.27, 5.32, 5.11),
realized = c(4.40, 4.45, 4.87, 4.77),
h_ahead = 4L
)
Note that we have chosen Date objects at the quarterly frequency for
our origin
and future
slots. This forecast is
for four quarters ahead, so we fill in the h_ahead
slot
with the integer four.
The origin
, future
, and
h_ahead
slots can be filled with any values. This is where
the general nature of the Forecast
class shines. In the
origin
and future
slots we could use dates at
a daily or yearly frequency, the POSIXct
class to include
minutes and seconds, or integers to represent discrete periods. We can
also store different types of forecasts. In the example above, we have
forecasts made at different origin
times for a constant
four quarter ahead forecast horizon. We could also store a forecast made
at a single origin
time for a horizon of
future
times by setting all of the origin
values to one time and using future
to represent the
horizon of times that we are forecasting. In this case the
h_ahead
slot becomes irrelevant and it is left as
NULL
. The flexibility of these slots allows us to represent
any type of numeric forecast.
The forecast
and realized
slots take
numeric vectors. In the forecast
slot we see the forecast
that was made at each origin
time and in the
realized
slot we see the true value that was realized at
each future
time. The realized values may not exist yet, so
this slot may be partially populated or not populated at all. If the
realized
slot is set to NULL
then it will be
populated with a vector of NA
values.
The Forecast
class strikes a balance between simplicity
and rigor. It is simple enough to store any numeric forecast, but it is
rigorous enough to create a useful data structure. For example, we can
quickly calculate a number of forecast accuracy metrics for the
Forecast
object we created above using only one
argument.
mse(my_forecast)
#> [1] 0.09665
rmse(my_forecast)
#> [1] 0.3108858
mae(my_forecast)
#> [1] 0.29
mape(my_forecast)
#> [1] 0.06182814
R2(my_forecast)
#> [1] 0.9973145
Because the forecast
and realized
slots
must be numeric vectors, and all slots must be of the same length, we
can calculate forecast accuracy metrics without having to worry about
input validation or coercing multiple vectors to the correct format. The
forecast accuracy metrics available in the lmForc package are
calculated as follows where: $$
n = \text{forecast vector length}\\
\quad Y_i = \text{realized values}\\
\hat{Y_i} = \text{forecast values}
$$
MSE is calculated as: $$ MSE = \frac{1}{n} \sum_{i=1}^{n}{(Y_i - \hat{Y_i})^2} $$
RMSE is calculated as: $$ RMSE = \sqrt{MSE} $$ MAE is calculated as: $$ MAE = \frac{1}{n} \sum_{i=1}^{n}{|\hat{Y_i} - Y_i|} $$ MAPE is calculated as: $$ MAPE = \frac{1}{n} \sum_{i=1}^{n}\frac{|Y_i - \hat{Y_i}|}{{Y_i}} $$ R2 is calculated as: $$ R^2 = cor(\hat{Y_i}, \ Y_i)^2 $$
Note that these equations require two inputs, but because both inputs
are already stored in the Forecast
object we only need to
pass one argument to the mse()
and rmse()
functions. Calculating forecast accuracy is a simple use case of the
Forecast
class. More complex use cases exist in the
lmForc package, where many of the functions require inputs to
be of the Forecast
class. When weighting multiple forecasts
or testing a linear model that is conditional on another forecast, the
consistent structure of the class results in simple functions that
execute correctly, no matter the type of forecast passed to the
function. Furthermore, all lmForc functions return objects of
the Forecast
class which creates synergy between functions.
One can take two linear models, test their performance out-of-sample
with the oos_realized_forc()
function which returns
Forecast
objects, and then pass these two objects to the
performance_weighted_forc()
function to find the weighted
out-of-sample performance of both models.
One fear may be that the novel Forecast
class will not
play well with functions and packages that already exist in the R
language. The lmForc package provides methods for accessing all
of the vectors stored in a Forecast
object as well as the
forc2df()
function which returns one or multiple
Forecast
objects as a data frame. With these methods, one
can easily pass the data in a Forecast
object to other
functions.
forc2df(my_forecast)
#> origin future forecast realized
#> 1 2010-03-31 2011-03-31 4.21 4.40
#> 2 2010-06-30 2011-06-30 4.27 4.45
#> 3 2010-09-30 2011-09-30 5.32 4.87
#> 4 2010-12-31 2011-12-31 5.11 4.77
origin(my_forecast)
#> [1] "2010-03-31" "2010-06-30" "2010-09-30" "2010-12-31"
future(my_forecast)
#> [1] "2011-03-31" "2011-06-30" "2011-09-30" "2011-12-31"
forc(my_forecast)
#> [1] 4.21 4.27 5.32 5.11
realized(my_forecast)
#> [1] 4.40 4.45 4.87 4.77
Examples throughout the rest of the vignette will use a stylized
dataset with a date
column of ten quarterly dates, a
dependent variable y
, and two independent variables
x1
and x2
. Equations are also written in terms
of the variables y
, x1
, and
x2
.
date <- as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
"2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
"2012-03-31", "2012-06-30"))
y <- c(1.09, 1.71, 1.09, 2.46, 1.78, 1.35, 2.89, 2.11, 2.97, 0.99)
x1 <- c(4.22, 3.86, 4.27, 5.60, 5.11, 4.31, 4.92, 5.80, 6.30, 4.17)
x2 <- c(10.03, 10.49, 10.85, 10.47, 9.09, 10.91, 8.68, 9.91, 7.87, 6.63)
data <- data.frame(date, y, x1, x2)
head(data)
#> date y x1 x2
#> 1 2010-03-31 1.09 4.22 10.03
#> 2 2010-06-30 1.71 3.86 10.49
#> 3 2010-09-30 1.09 4.27 10.85
#> 4 2010-12-31 2.46 5.60 10.47
#> 5 2011-03-31 1.78 5.11 9.09
#> 6 2011-06-30 1.35 4.31 10.91
To demonstrate the _general
set of functions which
extend standard lmForc functions to accommodate non-linear
models, we will use a logit regression and a modified version of the
above dataset in which the dependent variable y
is
binary.
date <- as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
"2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
"2012-03-31", "2012-06-30", "2012-09-30", "2012-12-31",
"2013-03-31", "2013-06-30", "2013-09-30", "2013-12-31"))
y <- c(1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0)
x1 <- c(8.22, 3.86, 4.27, 3.37, 5.88, 3.34, 2.92, 1.80, 3.30, 7.17, 3.22, 3.86, 4.27, 3.37, 5.88, 3.34)
x2 <- c(4.03, 2.46, 2.04, 2.44, 6.09, 2.91, 1.68, 2.91, 3.87, 1.63, 4.03, 2.46, 2.04, 2.44, 6.09, 2.91)
dataLogit <- data.frame(date, y, x1, x2)
head(dataLogit)
#> date y x1 x2
#> 1 2010-03-31 1 8.22 4.03
#> 2 2010-06-30 0 3.86 2.46
#> 3 2010-09-30 0 4.27 2.04
#> 4 2010-12-31 0 3.37 2.44
#> 5 2011-03-31 1 5.88 6.09
#> 6 2011-06-30 1 3.34 2.91
The is_forc()
function produces an in-sample forecast
based on a linear model. The function takes a linear model call and an
optional vector of time data associated with the linear model as inputs.
The linear model is estimated once over the entire sample period and the
coefficients are multiplied by the realized values in each period of the
sample. This is functionally identical to the predict()
function in the stats package, but it returns a
Forecast
object instead of a numeric vector.
For all observations i in the sample, coefficients are estimated as:
Yi = β0 + β1X1i + β2X2i + ϵi for all i And forecasts are estimated as:
forecasti = β0 + β1X1i + β2X2i
is_forc(
lm_call = lm(y ~ x1 + x2, data),
time_vec = data$date
)
#> h_ahead = 0
#>
#> origin future forecast realized
#> 1 2010-03-31 2010-03-31 1.394370 1.09
#> 2 2010-06-30 2010-06-30 1.138708 1.71
#> 3 2010-09-30 2010-09-30 1.423339 1.09
#> 4 2010-12-31 2010-12-31 2.358107 2.46
#> 5 2011-03-31 2011-03-31 2.024964 1.78
#> 6 2011-06-30 2011-06-30 1.450924 1.35
#> 7 2011-09-30 2011-09-30 1.894861 2.89
#> 8 2011-12-31 2011-12-31 2.502394 2.11
#> 9 2012-03-31 2012-03-31 2.867846 2.97
#> 10 2012-06-30 2012-06-30 1.384488 0.99
Note that because we are creating an in-sample forecast,
h_ahead
is set to 0 and the origin
time equals
the future
time. This test evaluates how well a linear
forecast model fits the historical data.
The is_forc_general()
function produces an in-sample
forecast based on a model function and prediction function specified by
the user. The is_forc_general
function takes a model
function, a prediction function, input data for estimating the model,
and an optional vector of time data associated with the model. The model
is estimated once over the entire sample period using the input data and
model function. Model parameters are then combined with the input data
using the prediction function to generate in-sample forecasts. This is
functionally similar to the is_forc()
function, but allows
for usage of non-linear models such as GLMs, logit regressions, decision
trees, or any custom model.
For all observations i in the sample, coefficients are estimated as:
Yi = model_function(Xi) for all i And forecasts are estimated as:
forecasti = prediction_function(model_function, Xi)
is_forc_general(
model_function = function(data) {glm(y ~ x1 + x2, data = data, family = binomial)},
prediction_function = function(model_function, data) {as.vector(predict(model_function, data, type = "response"))},
data = dataLogit,
realized = dataLogit$y,
time_vec = dataLogit$date
)
#> h_ahead = 0
#>
#> origin future forecast realized
#> 1 2010-03-31 2010-03-31 0.99229531 1
#> 2 2010-06-30 2010-06-30 0.22353367 0
#> 3 2010-09-30 2010-09-30 0.26932349 0
#> 4 2010-12-31 2010-12-31 0.13692703 0
#> 5 2011-03-31 2011-03-31 0.96280428 1
#> 6 2011-06-30 2011-06-30 0.16711236 1
#> 7 2011-09-30 2011-09-30 0.05650550 0
#> 8 2011-12-31 2011-12-31 0.03098593 0
#> 9 2012-03-31 2012-03-31 0.24950963 0
#> 10 2012-06-30 2012-06-30 0.90240710 1
#> 11 2012-09-30 2012-09-30 0.24889484 0
#> 12 2012-12-31 2012-12-31 0.22353367 0
#> 13 2013-03-31 2013-03-31 0.26932349 0
#> 14 2013-06-30 2013-06-30 0.13692703 1
#> 15 2013-09-30 2013-09-30 0.96280428 1
#> 16 2013-12-31 2013-12-31 0.16711236 0
Note that because we are creating an in-sample forecast,
h_ahead
is set to 0 and the origin
time equals
the future
time. This test evaluates how well the model
fits the historical data.
The oos_realized_forc()
function produces an h
period ahead out-of-sample forecast that is conditioned on realized
values. The function takes a linear model call, an integer number of
periods ahead to forecast, a period to end the initial coefficient
estimation and begin forecasting, an optional vector of time data
associated with the linear model, and an optional integer number of past
periods to estimate the linear model over. The linear model is
originally estimated with data up to estimation_end
minus
the number of periods specified in the estimation_window
argument. For instance, if the linear model is being estimated on
quarterly data and the estimation_window
is set to
20L
, coefficients will be estimated using five years of
data up to estimation_end
. If
estimation_window
is set to NULL
then the
linear model is estimated with all available data up to
estimation_end
. Coefficients are multiplied by realized
values of the covariates h_ahead
periods ahead to create an
h_ahead
period ahead forecast. This process is iteratively
repeated for each period after estimation_end
with
coefficients updating in each period as more information would have
become available to the forecaster. In each period, coefficients are
updated based on all available information if
estimation_window
is set to NULL
, or a rolling
window of past periods if estimation_window
is set to an
integer value.
In the sample i, for each period p greater than or
equal to estimation_end
, coefficients are updated as:
Yi = β0 + β1X1i + β2X2i + ϵi for
all p − w ≤ i ≤ p Where w is
the estimation_window
. h_ahead
h
forecasts are estimated as:
forecastp + h = β0 + β1X1p + h + β2X2p + h
oos_realized_forc(
lm_call = lm(y ~ x1 + x2, data),
h_ahead = 2L,
estimation_end = as.Date("2011-03-31"),
time_vec = data$date,
estimation_window = NULL,
return_betas = FALSE
)
#> h_ahead = 2
#>
#> origin future forecast realized
#> 1 2011-03-31 2011-09-30 1.623750 2.89
#> 2 2011-06-30 2011-12-31 2.341664 2.11
#> 3 2011-09-30 2012-03-31 3.415198 2.97
#> 4 2011-12-31 2012-06-30 2.708308 0.99
Note that the oos_realized_forc
function returns an
out-of-sample forecast that conditions on realized values that
would not have been available to the forecaster at the
forecast origin. This test evaluates the performance of a linear
forecast model had it been conditioned on perfect information.
The oos_realized_forc_general()
function produces an
h period ahead out-of-sample forecast that is conditioned on
realized values. The function takes a model function, a prediction
function, input data for estimating the model, realized values of the
dependent variable, an integer number of periods ahead to forecast, a
period to end the initial coefficient estimation and begin forecasting,
a vector of time data associated with the model, and an optional integer
number of past periods to estimate the model over. The model is
originally estimated using the input data and model function with data
up to estimation_end
minus the the number of periods
specified in estimation_window
. If
estimation_window
is left NULL
then the model
is estimated with all available data up to estimation_end
.
Model parameters are then combined with realized values of the input
data h_ahead
periods ahead to generate an
h_ahead
period ahead forecast. This process is iteratively
repeated for each period after estimation_end
with model
parameters updating in each period.
In the sample i, for each period p greater than or
equal to estimation_end
, coefficients are updated as:
Yi = model_function(Xi) for
all p − w ≤ i ≤ p Where X is
the input data and w is the estimation_window
.
h_ahead
h forecasts are estimated as:
forecastp + h = prediction_function(model_function, Xp + h)
forc <- oos_realized_forc_general(
model_function = function(data) {glm(y ~ x1 + x2, data = data, family = binomial)},
prediction_function = function(model_function, data) {
as.vector(predict(model_function, data, type = "response"))
},
data = dataLogit,
realized = dataLogit$y,
h_ahead = 2L,
estimation_end = as.Date("2012-06-30"),
time_vec = dataLogit$date,
estimation_window = NULL
)
Note that the oos_realized_forc_general
function returns
an out-of-sample forecast that conditions on realized values that
would not have been available to the forecaster at the
forecast origin. This test evaluates the performance of a forecasting
model had it been conditioned on perfect information.
The oos_vintage_forc()
function produces an
out-of-sample forecast conditioned on h period ahead forecasts
of the linear model covariates. The function takes a linear model call,
a vector of time data associated with the linear model, a vintage
forecast for each covariate in the linear model, and an optional integer
number of past periods to estimate the linear model over. For each
period in the vintage forecasts, coefficients are updated based on
information that would have been available to the forecaster at the
forecast origin. Coefficients are estimated over information from the
last estimation_window
number of periods. If
estimation_window
is left NULL
then
coefficients are estimated over all of the information that would have
been available to the forecaster. Coefficients are then multiplied by
vintage forecast values to produce a replication of real time
forecasts.
In the sample i, for each period p in the vintage forecasts VF1 and VF2, coefficients are updated as:
Yi = β0 + β1X1i + β2X2i + ϵi for all p − w ≤ i ≤ p
And h_ahead
h forecasts are estimated as:
forecastp + h = β0 + β1VF1p + β2VF2p
We introduce stylized vintage forecasts of X1 and X2 to demonstrate
the oos_vintage_forc()
function. Using four quarter ahead
forecasts of the covariates X1 and X2, we create an out-of-sample
forecast based on the coefficients and covariate forecasts that the
forecaster would have used in each period.
x1_forecast_vintage <- Forecast(
origin = as.Date(c("2010-09-30", "2010-12-31", "2011-03-31", "2011-06-30")),
future = as.Date(c("2011-09-30", "2011-12-31", "2012-03-31", "2012-06-30")),
forecast = c(6.30, 4.17, 5.30, 4.84),
realized = c(4.92, 5.80, 6.30, 4.17),
h_ahead = 4L
)
x2_forecast_vintage <- Forecast(
origin = as.Date(c("2010-09-30", "2010-12-31", "2011-03-31", "2011-06-30")),
future = as.Date(c("2011-09-30", "2011-12-31", "2012-03-31", "2012-06-30")),
forecast = c(7.32, 6.88, 6.82, 6.95),
realized = c(8.68, 9.91, 7.87, 6.63),
h_ahead = 4L
)
oos_vintage_forc(
lm_call = lm(y ~ x1 + x2, data),
time_vec = data$date,
x1_forecast_vintage, x2_forecast_vintage,
estimation_window = NULL,
return_betas = FALSE
)
#> h_ahead = 4
#>
#> origin future forecast realized
#> 1 2010-09-30 2011-09-30 -2.497310 2.89
#> 2 2010-12-31 2011-12-31 1.194088 2.11
#> 3 2011-03-31 2012-03-31 1.620716 2.97
#> 4 2011-06-30 2012-06-30 1.470027 0.99
This test replicates the forecasts that a linear model conditional on
forecasts of covariates would have produced in real-time. Here we see
the strength of the Forecast
class. Because the vintage
forecasts of X1 and X2 share the same data structure, we can calculate a
forecast that is conditional on these objects without fearing
inconsistency across inputs.
The oos_vintage_forc_general()
function produces an
out-of-sample forecast conditioned on h period ahead forecasts
of the model parameters. The function takes a model function, prediction
function, input data for estimating the model, realized values of the
dependent variable, a vector of time data associated with the model, a
forecast for each parameter in the model, and an optional integer number
of past periods to estimate the model over. For each period in the
vintage forecasts, model parameters are estimated with data up to the
current period minus the number of periods specified in
estimation_window
. If estimation_window
is
left NULL
then the model is estimated with all available
data up to the current period. Model parameters are then combined with
vintage forecast values to generate a forecast. Note that
data
input to the prediction_function
takes
the form of a data.frame with a number of columns equal to the number of
input vintage forecasts and the prediction_function
needs
to be able to take this input format and generate a forecast based on
it. Returns an out-of-sample forecast conditional on vintage forecasts
that would have been available at the forecast origin.
Replicates the forecasts that a conditional forecasting model would have
produced in real time.
In the sample i, for each period p in the vintage forecasts VF1 and VF2, coefficients are updated as:
Yi = model_function(Xi) for all p − w ≤ i ≤ p
Where X is the input data and w is the
estimation_window
. h_ahead
h
forecasts are estimated as:
forecastp + h = β0 + β1VF1p + β2VF2p forecastp + h = prediction_function(model_function, [VF1p, VF2p])
We introduce stylized vintage forecasts of X1 and X2 to demonstrate
the oos_vintage_forc()
function. Using four quarter ahead
forecasts of the covariates X1 and X2, we create an out-of-sample
forecast based on the coefficients and covariate forecasts that the
forecaster would have used in each period.
x1_forecast_vintageLogit <- Forecast(
origin = as.Date(c("2012-09-30", "2012-12-31", "2013-03-31", "2013-06-30")),
future = as.Date(c("2013-09-30", "2013-12-31", "2014-03-31", "2014-06-30")),
forecast = c(6.34, 4.17, 2.98, 1.84),
realized = c(5.88, 3.34, 2.92, 1.80),
h_ahead = 4L
)
x2_forecast_vintageLogit <- Forecast(
origin = as.Date(c("2012-09-30", "2012-12-31", "2013-03-31", "2013-06-30")),
future = as.Date(c("2013-09-30", "2013-12-31", "2014-03-31", "2014-06-30")),
forecast = c(7.32, 3.22, 2.21, 2.65),
realized = c(6.09, 2.91, 1.68, 2.91),
h_ahead = 4L
)
oos_vintage_forc_general(
model_function = function(data) {glm(y ~ x1 + x2, data = data, family = binomial)},
prediction_function = function(model_function, data) {
names(data) <- c("x1", "x2")
as.vector(predict(model_function, data, type = "response"))
},
data = dataLogit,
realized = dataLogit$y,
time_vec = dataLogit$date,
x1_forecast_vintageLogit, x2_forecast_vintageLogit,
estimation_window = NULL
)
#>
#> Note* the data argument passed to prediction_function has the following form:
#> X1 X2
#> 1 6.34 7.32
#>
#> h_ahead = 4
#>
#> origin future forecast realized
#> 1 2012-09-30 2013-09-30 0.99460268 1
#> 2 2012-12-31 2013-12-31 0.33526949 0
#> 3 2013-03-31 2014-03-31 0.02961128 NA
#> 4 2013-06-30 2014-06-30 0.03721942 NA
Note that in the example above the
prediction_function
is adapted to take a specific
data.frame as input and generate a forecast based on it. The form of
this specific data.frame is printed to the console for reference. This
test replicates the forecasts that a model conditional on forecasts of
the model parameters would have produced in real-time. Here we see the
strength of the Forecast
class. Because the vintage
forecasts of X1 and X2 share the same data structure, we can calculate a
forecast that is conditional on these objects without fearing
inconsistency across inputs.
The conditional_forc()
function produces a forecast
conditioned on forecasts of the linear model covariates. The function
takes a linear model call, a vector of time data associated with the
linear model, and a forecast for each covariate in the linear model. The
linear model is estimated once over the entire sample period and the
coefficients are multiplied by the forecasts of each covariate.
For all observations i in the sample, coefficients are estimated as:
Yi = β0 + β1X1i + β2X2i + ϵi for all i
And for all periods p in the covariate forecasts F1 and F2, forecasts are estimated as:
forecastp + h = β0 + β1F1p + β2F2p
The difference between conditional_forc()
and
oos_vintage_forc()
is that in the
conditional_forc()
function coefficients are only estimated
once over all observations. Coefficients do not update based on what
information would have been available to the forecaster at any given
point in time. We introduce stylized forecasts of X1 and X2 to
demonstrate the conditional_forc()
function. Because in
this example we are making a conditional forecast for the future instead
of testing past forecasts, we can condition on a horizon of forecasts.
This is in contrast to the oos_vintage_forc()
example where
we test the performance of four quarter ahead vintage forecasts.
x1_forecast <- Forecast(
origin = as.Date(c("2012-06-30", "2012-06-30", "2012-06-30", "2012-06-30")),
future = as.Date(c("2012-09-30", "2012-12-31", "2013-03-31", "2013-06-30")),
forecast = c(4.14, 4.04, 4.97, 5.12),
realized = NULL,
h_ahead = NULL
)
x2_forecast <- Forecast(
origin = as.Date(c("2012-06-30", "2012-06-30", "2012-06-30", "2012-06-30")),
future = as.Date(c("2012-09-30", "2012-12-31", "2013-03-31", "2013-06-30")),
forecast = c(6.01, 6.05, 6.55, 7.45),
realized = NULL,
h_ahead = NULL
)
conditional_forc(
lm_call = lm(y ~ x1 + x2, data),
time_vec = data$date,
x1_forecast, x2_forecast
)
#> h_ahead =
#>
#> origin future forecast realized
#> 1 2012-06-30 2012-09-30 1.368054 NA
#> 2 2012-06-30 2012-12-31 1.297686 NA
#> 3 2012-06-30 2013-03-31 1.945655 NA
#> 4 2012-06-30 2013-06-30 2.044105 NA
This function is used to create a forecast for the present period or
replicate a forecast made at a specific period in the past. Note that
because we are forecasting into the future, realized
is
NULL
. Also, because we are forecasting a horizon of dates,
h_ahead
is NULL
.
The conditional_forc_general()
function produces a
forecast conditioned on forecasts of the model parameters. The function
takes a model function, prediction function, input data for estimating
the model, a vector of time data associated with the model, and
forecasts for each parameter in the model. The model is estimated once
over the entire sample period and the model parameters are combined with
the forecasts of each parameter to generate a forecast.
For all observations i in the sample, coefficients are estimated as:
Yi = model_function(Xi) for all i
And for all periods p in the covariate forecasts F1 and F2, forecasts are estimated as:
forecastp + h = prediction_function(model_function, [VF1p, VF2p])
The difference between conditional_forc_general()
and
oos_vintage_forc_general()
is that in the
conditional_forc_general()
function coefficients are only
estimated once over all observations. Coefficients do not update based
on what information would have been available to the forecaster at any
given point in time. We introduce stylized forecasts of X1 and X2 to
demonstrate the conditional_forc_general()
function.
Because in this example we are making a conditional forecast for the
future instead of testing past forecasts, we can condition on a horizon
of forecasts. This is in contrast to the
oos_vintage_forc_general()
example where we test the
performance of four quarter ahead vintage forecasts.
# Parameter Forecasts.
x1_forecastLogit <- Forecast(
origin = as.Date(c("2013-12-31", "2013-12-31", "2013-12-31", "2013-12-31")),
future = as.Date(c("2014-03-31", "2014-06-30", "2014-09-30", "2014-12-31")),
forecast = c(2.11, 6.11, 6.75, 4.30),
realized = NULL,
h_ahead = NULL
)
x2_forecastLogit <- Forecast(
origin = as.Date(c("2013-12-31", "2013-12-31", "2013-12-31", "2013-12-31")),
future = as.Date(c("2014-03-31", "2014-06-30", "2014-09-30", "2014-12-31")),
forecast = c(1.98, 7.44, 7.86, 5.98),
realized = NULL,
h_ahead = NULL
)
conditional_forc_general(
model_function = function(data) {glm(y ~ x1 + x2, data = data, family = binomial)},
prediction_function = function(model_function, data) {
names(data) <- c("x1", "x2")
as.vector(predict(model_function, data, type = "response"))
},
data = dataLogit,
time_vec = dataLogit$date,
x1_forecastLogit, x2_forecastLogit
)
#>
#> Note* the data argument passed to prediction_function has the following form:
#> X1 X2
#> 1 2.11 1.98
#> 2 6.11 7.44
#> 3 6.75 7.86
#> 4 4.30 5.98
#>
#> h_ahead =
#>
#> origin future forecast realized
#> 1 2013-12-31 2014-03-31 0.02637805 NA
#> 2 2013-12-31 2014-06-30 0.98668135 NA
#> 3 2013-12-31 2014-09-30 0.99508344 NA
#> 4 2013-12-31 2014-12-31 0.78686161 NA
This function is used to create a forecast for the present period or
replicate a forecast made at a specific period in the past. Note that
because we are forecasting into the future, realized
is
NULL
. Also, because we are forecasting a horizon of dates,
h_ahead
is NULL
.
The oos_lag_forc()
function produces an h
period ahead out-of-sample forecast conditioned on present period
values. The function takes a linear model call, an integer number of
periods ahead to forecast, a period to end the initial coefficient
estimation and begin forecasting, an optional vector of time data
associated with the linear model, and an optional integer number of past
periods to estimate the linear model over. The linear model data is
lagged by h_ahead
periods and the linear model is
re-estimated with data up to estimation_end
minus the
number of periods specified in the estimation_window
argument to create a lagged linear model. If
estimation_window
is left NULL
then the linear
model is estimated with all available lagged data up to
estimation_end
. Coefficients are multiplied by present
period realized values of the covariates to create a forecast for
h_ahead
periods ahead. This process is iteratively repeated
for each period after estimation_end
with coefficients
updating in each period as more information would have become available
to the forecaster. In each period, coefficients are updated based on all
available information if estimation_window
is set to
NULL
, or a rolling window of past periods if
estimation_window
is set to an integer value.
In the sample i, for each period p greater than or
equal to estimation_end
, coefficients are updated as:
Yi = β0 + β1X1i − h + β2X2i − h + ϵi for
all p − w ≤ i ≤ p Where w is
the estimation_window
and h is
h_ahead
. h_ahead
forecasts are estimated
as:
forecastp + h = β0 + β1X1p + β2X2p
oos_lag_forc(
lm_call = lm(y ~ x1 + x2, data),
h_ahead = 2L,
estimation_end = as.Date("2011-03-31"),
time_vec = data$date,
estimation_window = NULL,
return_betas = FALSE
)
#> h_ahead = 2
#>
#> origin future forecast realized
#> 1 2011-03-31 2011-09-30 -2.100528 2.89
#> 2 2011-06-30 2011-12-31 2.174392 2.11
#> 3 2011-09-30 2012-03-31 2.813745 2.97
#> 4 2011-12-31 2012-06-30 1.807014 0.99
This test evaluates the performance of a lagged linear model had it been conditioned on present values that would have been available to the forecaster at the forecast origin. This is in contrast to conditioning on realized values or a forecast of the covariates.
The historical_average_forc()
function produces an
h period ahead forecast based on the historical average of the
series that is being forecasted. The function takes an average function,
a vector of realized values, an integer number of periods ahead to
forecast, a period to end the initial average estimation and begin
forecasting, an optional vector of time data associated with the
realized values, and an optional integer number of past periods to
estimate the average over. The historical average is originally
calculated with realized values up to estimation_end
minus
the number of periods specified in estimation_window
. If
estimation_window
is left NULL
then the
historical average is calculated with all available realized values up
to estimation_end
. In each period the historical average is
set as the h_ahead
period ahead forecast. This process is
iteratively repeated for each period after estimation_end
with the historical average updating in each period as more information
would have become available to the forecaster.
If avg_function
is set to mean
, in the
sample i, for each period p greater than or equal to
estimation_end
, h_ahead
h forecasts
are calculated as:
$$ forecast_{p+h} = \frac{1}{p-w} \sum_{i=p-w}^{p}{Y_i} \qquad $$
Where Y is the series being forecasted and w is the
estimation_window
.
historical_average_forc(
avg_function = "mean",
realized_vec = data$y,
h_ahead = 2L,
estimation_end = as.Date("2011-03-31"),
time_vec = data$date,
estimation_window = 4L
)
#> h_ahead = 2
#>
#> origin future forecast realized
#> 1 2011-03-31 2011-09-30 1.626 2.89
#> 2 2011-06-30 2011-12-31 1.678 2.11
#> 3 2011-09-30 2012-03-31 1.914 2.97
#> 4 2011-12-31 2012-06-30 2.118 0.99
historical_average_forc()
returns a historical average
forecast where the h_ahead
period ahead forecast is simply
the historical average or rolling window average of the series being
forecasted. This replicates the historical average forecast that would
have been produced in real-time and can serve as a benchmark for other
forecasting models.
The random_walk_forc()
function produces an h
period ahead forecast based on the last realized value in the series
that is being forecasted. The function takes a vector of realized
values, an integer number of periods ahead to forecast, and an optional
vector of time data associated with the realized values. In each period,
the current period value of the realized_vec
series is set
as the h_ahead
period ahead forecast.
In the sample i, for each period p,
h_ahead
h forecasts are calculated as:
forecastp + h = Yp
Where Y is the series being forecasted.
random_walk_forc(
realized_vec = data$y,
h_ahead = 6L,
time_vec = data$date
)
#> h_ahead = 6
#>
#> origin future forecast realized
#> 1 2010-03-31 2011-09-30 1.09 2.89
#> 2 2010-06-30 2011-12-31 1.71 2.11
#> 3 2010-09-30 2012-03-31 1.09 2.97
#> 4 2010-12-31 2012-06-30 2.46 0.99
random_walk_forc()
returns a random walk forecast where
the h_ahead
period ahead forecast is simply the present
value of the series being forecasted. This replicates the random walk
forecast that a forecaster would have produced in real-time and can
serve as a benchmark for other forecasting models.
The autoreg_forc()
function produces an h
period ahead forecast based on an autoregressive (AR) model. The
function takes a vector of realized values, an integer number of periods
ahead to forecast, an integer number of lags to include in the
autoregressive model, a period to end the initial model estimation and
begin forecasting, an optional vector of time data associated with the
realized values, and an optional integer number of past periods to
estimate the autoregressive model over. An AR(ar_lags
)
autoregressive model is originally estimated with realized values up to
estimation_end
minus the number of periods specified in
estimation_window
. If estimation_window
is
left NULL
then the autoregressive model is estimated with
all realized values up to estimation_end
. The
AR(ar_lags
) model is estimated by regressing the vector of
realized values on vectors of the same realized values that have been
lagged by one to ar_lags
steps. The AR coefficients of this
model are multiplied by lagged values and the present period realized
value to create a one period ahead forecast. If h_ahead
is
greater than one, the one period ahead forecasting process is
iteratively repeated so that the two period ahead forecast conditions on
the one period ahead forecasted value. This process of rolling one
period ahead forecasts forward continues until an h_ahead
forecast is obtained. The h_ahead
forecasting process is
repeated for each period after estimation_end
with AR model
coefficients updating as more information would have become available to
the forecaster. In each period, coefficients are updated based on all
available realized values if estimation_window
is set to
NULL
, or a rolling window of past periods if
estimation_window
is set to an integer value.
In the sample i with ar_lags
set to two and
h_ahead
set to two. For each period p greater than
or equal to estimation_end
, coefficients are calculated
as:
Yi = β0 + β1Yi − 1 + β2Yi − 2 for
all p − w ≤ i ≤ p Where Y is
the series being forecasted and w is the
estimation_window
. h_ahead
two step ahead
forecasts are estimated as:
$$ Y_{p+1} = \beta_0 + \beta_1 Y_p + \beta_2 Y_{p-1} \\ forecast_{p+h} = Y_{p+2} = \beta_0 + \beta_1 Y_{p+1} + \beta_2 Y_{p} $$
autoreg_forc(
realized_vec = data$y,
h_ahead = 2L,
ar_lags = 2L,
estimation_end = as.Date("2011-06-30"),
time_vec = data$date,
estimation_window = NULL,
return_betas = FALSE
)
#> h_ahead = 2
#>
#> origin future forecast realized
#> 1 2011-06-30 2011-12-31 1.649380 2.11
#> 2 2011-09-30 2012-03-31 2.376138 2.97
#> 3 2011-12-31 2012-06-30 1.944882 0.99
autoreg_forc()
returns an autoregressive forecast based
on information that would have been available at the forecast origin.
This function replicates the forecasts than an AR model would have
produced in real-time and can serve as a benchmark for other forecasting
models.
The performance_weighted_forc()
function produces a
weighted average of multiple forecasts based on the recent performance
of each forecast. The function takes two or more forecasts of the
Forecast
class, an evaluation window, and an error
function. For each forecast period, the error function is used to
calculate forecast accuracy over the past eval_window
number of periods. The forecast accuracy of each forecast is then used
to weight forecasts based on a weighting function. In each period,
weights are calculated and used to create a weighted average
forecast.
We use a stylized example in which we create a weighted forecast of two forecasts: Y1 and Y2.
For all periods p in the k number of forecasts
Y, weights W are calculated over the
eval_window
e as:
$$ W_k = \frac{1/MSE(Y_{ki})}{1/\sum_{k=1}^{k}MSE(Y_{ki})} \qquad \text{where} \quad i = p-e \leq i \leq p $$
Forecasts are estimated as:
forecastp = Y1pW1 + Y2pW2
y1_forecast <- Forecast(
origin = as.Date(c("2009-03-31", "2009-06-30", "2009-09-30", "2009-12-31",
"2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
"2011-03-31", "2011-06-30")),
future = as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
"2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
"2012-03-31", "2012-06-30")),
forecast = c(1.33, 1.36, 1.38, 1.68, 1.60, 1.55, 1.32, 1.22, 1.08, 0.88),
realized = c(1.09, 1.71, 1.09, 2.46, 1.78, 1.35, 2.89, 2.11, 2.97, 0.99),
h_ahead = 4L
)
y2_forecast <- Forecast(
origin = as.Date(c("2009-03-31", "2009-06-30", "2009-09-30", "2009-12-31",
"2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
"2011-03-31", "2011-06-30")),
future = as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
"2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
"2012-03-31", "2012-06-30")),
forecast = c(0.70, 0.88, 1.03, 1.05, 1.01, 0.82, 0.95, 1.09, 1.07, 1.06),
realized = c(1.09, 1.71, 1.09, 2.46, 1.78, 1.35, 2.89, 2.11, 2.97, 0.99),
h_ahead = 4L
)
performance_weighted_forc(
y1_forecast, y2_forecast,
eval_window = 2L,
errors = "mse",
return_weights = FALSE
)
#> h_ahead = 4
#>
#> origin future forecast realized
#> 1 2009-03-31 2010-03-31 NA 1.09
#> 2 2009-06-30 2010-06-30 NA 1.71
#> 3 2009-09-30 2010-09-30 NA 1.09
#> 4 2009-12-31 2010-12-31 NA 2.46
#> 5 2010-03-31 2011-03-31 NA 1.78
#> 6 2010-06-30 2011-06-30 1.421244 1.35
#> 7 2010-09-30 2011-09-30 1.234979 2.89
#> 8 2010-12-31 2011-12-31 1.186461 2.11
#> 9 2011-03-31 2012-03-31 1.078011 2.97
#> 10 2011-06-30 2012-06-30 0.893773 0.99
The performance_weighted_forc()
function returns a
weighted forecast of the Y1 and Y2 forecasts based on performance in
recent periods. The weights used in each period can be returned to the
Global Environment by setting return_weights
to
TRUE
. Note that although we were only weighting performance
over the past two periods, we have five NA
forecasts. This
reflects the lmForc philosophy of replicating what it would be
like to forecast in real-time. If a forecaster was making a forecast at
2010-06-30
, they would only have access to realized values
up to 2010-06-30
, in this case the first two rows. This is
why a weighted forecast with an eval_window
of two can only
be computed once the forecast origin becomes 2010-06-30
and
the forecaster has access to two realized values. This function can be
used to compute a weighted forecast for the present period or to test
how a weighted forecast would have performed historically. The weighted
forecasts are based on information that would have been
available to the forecaster at the forecast origin.
The states_weighted_forc()
function produces a weighted
average of multiple forecasts based on how each forecast performed
during the past state of the world that is most similar to the current
state of the world. The function takes two or more forecasts, a data
frame, matrix, or array of matching variables, an optional vector of
time data associated with the matching variables, a matching window
size, a matching function, and an error function. The first step of the
weighted forecast process is to match the current state of the world to
a similar past state of the world. For each forecast period, the
matching_vars
are standardized and the past
matching_window
periods of the matching variables are
considered as the current state of the world. This current state of the
world is compared to all past matching_window
size periods
of the matching variables. The current state is matched to the past
state that minimizes the user selected matching function. For example,
if matching
is set to euclidean
then the
matched past state is the past state which has the minimum euclidean
distance to the current state of the world. The objective is to select a
past period that is similar to the current state of the world as given
by the matching variables. Once a past state has been matched, the
accuracy of each forecast is calculated over the periods of the past
state according to the user selected error function. Forecast weights
are then computed based on forecast accuracy during the past state. The
objective is to give more weight to the forecasts that perform better in
conditions that reflect the current state of the world. The forecast
weights are then used to create a weighted forecast for the current
period.
We use a stylized example with one matching variable x as well as two forecasts Y1 and Y2.
The matching variable x is first standardized using the function:
$$ x_i = \frac{x_i - mean(x)}{sd(x)} $$
For all periods p, the current state of the world c and past states of the world p are calculated as:
$$ c_i = x_i \qquad \text{where} \quad p-w \leq i \leq p \\ \qquad \qquad \qquad \qquad p_{id} = x_i \qquad \text{where} \quad d-w \leq i \leq d \quad \text{for all} \quad d \lt p-w $$
Where w is the matching_window
size and
d are all periods that occur before the beginning of the
current state.
All possible past states are passed through the matching function and
the matched past state is selected as the past state that minimizes the
matching function. If matching
is set to
euclidean
, the matched past state p is the past
state that satisfies the following:
$$ p = \min \sqrt{\sum_{i=1}^{n}(c_i - p_{id})^2} \qquad \text{for all past states } d $$
Forecast accuracy and forecast weights are computed over observations
from the matched past state p. If errors
is set to
mse
then the forecast weights W for each forecast
k are calculated as.
$$ W_k = \frac{1/MSE(Y_{kp})}{1/\sum_{k=1}^{k}MSE(Y_{kp})} $$
The current period forecast is then calculated as:
forecastp = Y1pW1 + Y2pW2
date <- as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
"2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
"2012-03-31", "2012-06-30"))
future <- as.Date(c("2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
"2012-03-31", "2012-06-30", "2012-09-30", "2012-12-31",
"2013-03-31", "2013-06-30"))
y <- c(1.09, 1.71, 1.09, 2.46, 1.78, 1.35, 2.89, 2.11, 2.97, 0.99)
x1 <- c(4.22, 3.86, 4.27, 5.60, 5.11, 4.31, 4.92, 5.80, 6.30, 4.17)
x2 <- c(10.03, 10.49, 10.85, 10.47, 9.09, 10.91, 8.68, 9.91, 7.87, 6.63)
data <- data.frame(date, y, x1, x2)
matching_vars <- data[, c("x1", "x2")]
y1_forecast <- Forecast(
origin = date,
future = future,
forecast = c(1.33, 1.36, 1.38, 1.68, 1.60, 1.55, 1.32, 1.22, 1.08, 0.88),
realized = c(1.78, 1.35, 2.89, 2.11, 2.97, 0.99, 1.31, 1.41, 1.02, 1.05),
h_ahead = 4L
)
y2_forecast <- Forecast(
origin = date,
future = future,
forecast = c(0.70, 0.88, 1.03, 1.05, 1.01, 0.82, 0.95, 1.09, 1.07, 1.06),
realized = c(1.78, 1.35, 2.89, 2.11, 2.97, 0.99, 1.31, 1.41, 1.02, 1.05),
h_ahead = 4L
)
states_weighted_forc(
y1_forecast, y2_forecast,
matching_vars = matching_vars,
time_vec = data$date,
matching_window = 2L,
matching = "euclidean",
errors = "mse",
return_weights = FALSE
)
#> h_ahead = 4
#>
#> origin future forecast realized
#> 1 2010-03-31 2011-03-31 NA 1.78
#> 2 2010-06-30 2011-06-30 NA 1.35
#> 3 2010-09-30 2011-09-30 NA 2.89
#> 4 2010-12-31 2011-12-31 NA 2.11
#> 5 2011-03-31 2012-03-31 NA 2.97
#> 6 2011-06-30 2012-06-30 1.456977 0.99
#> 7 2011-09-30 2012-09-30 1.211438 1.31
#> 8 2011-12-31 2012-12-31 1.174534 1.41
#> 9 2012-03-31 2013-03-31 1.077066 1.02
#> 10 2012-06-30 2013-06-30 0.932814 1.05
The states_weighted_forc()
function returns a weighted
forecast of the Y1 and Y2 forecasts based on how these forecasts
performed in past states of the world that resemble the current state of
the world. The weights used in each period and a list of the matched
states can be returned to the Global Environment by setting
return_weights
to TRUE
. This function can be
used to compute a states weighted forecast for the present period or to
test how a states weighted forecast would have performed historically.
The states weighted forecasts are based on information that
would have been available to the forecaster at the
forecast origin.
The Forecast class comes with a built in method for subsetting a single Forecast object. This subsetting method takes numeric or logical values and follows subsetting conventions that are present throughout the R language.
forc[2:4]
forc[origin(forc1_1h) >= as.Date("2010-12-31")]
However, one often ends up working with multiple Forecast objects. Examples include working with different model forecasts for the same forecast horizon, one model forecast for varying forecast horizons, or both. The lmForc convention for working with multiple Forecast objects is to put them into a list. The following functions provide a way to subset lists of Forecast objects by various conditions.
Examples of lmForc subsetting functions utilize the following
stylized dataset. This example dataset contains one-quarter ahead
forecasts produced by two different models, forc1
and
forc2
. Note that both forecasts have identical
future
, realized
, and h_ahead
values, but that the origin
dates of the last two forecasts
differ. This becomes relevant when both forecast models are subset to
identical origin
values.
forc1_1h <- Forecast(
origin = as.Date(c("2010-02-17", "2010-05-14", "2010-07-22", "2010-12-05", "2011-03-10")),
future = as.Date(c("2010-06-30", "2010-09-30", "2010-12-31", "2011-03-31", "2011-06-30")),
forecast = c(4.27, 3.36, 4.78, 5.45, 5.12),
realized = c(4.96, 4.17, 4.26, 4.99, 5.38),
h_ahead = 1
)
forc2_1h <- Forecast(
origin = as.Date(c("2010-02-17", "2010-05-14", "2010-07-22", "2010-12-22", "2011-03-27")),
future = as.Date(c("2010-06-30", "2010-09-30", "2010-12-31", "2011-03-31", "2011-06-30")),
forecast = c(4.01, 3.89, 3.31, 4.33, 4.61),
realized = c(4.96, 4.17, 4.26, 4.99, 5.38),
h_ahead = 1
)
The simplest way to subset multiple Forecast objects is via the
subset_forcs()
function. This function takes a list of
Forecast objects and a numeric or logical index. All forecasts in the
list are subset by the numerical or logical values that are passed to
the index
argument.
For example, a list of Forecast objects can be subset by a
condition:
subset_forcs(forcs, origin(forc1_1h) >= as.Date("2010-12-31"))
forcs <- list(forc1_1h, forc2_1h)
subset_forcs(forcs, 2:3)
#> [[1]]
#> h_ahead = 1
#>
#> origin future forecast realized
#> 1 2010-05-14 2010-09-30 3.36 4.17
#> 2 2010-07-22 2010-12-31 4.78 4.26
#>
#> [[2]]
#> h_ahead = 1
#>
#> origin future forecast realized
#> 1 2010-05-14 2010-09-30 3.89 4.17
#> 2 2010-07-22 2010-12-31 3.31 4.26
One may want to compare forecasts over a specific time horizon. The
subset_bytime()
function allows the user to subset multiple
forecasts based on origin
or future
values.
After using the slot
argument to choose whether to subset
by origin
or future
values, the user passes a
single time object or vector of time objects to the values
argument. All forecasts in the list of Forecast objects are subset by
values
.
For example, to see all of the forecasts that were made on a specific
date:
subset_bytime(forcs, values = as.Date("2010-05-14"), slot = "origin")
forcs <- list(forc1_1h, forc2_1h)
subset_bytime(
forcs,
values = as.Date(c("2010-09-30", "2010-12-31", "2011-03-31")),
slot = "future"
)
#> [[1]]
#> h_ahead = 1
#>
#> origin future forecast realized
#> 1 2010-05-14 2010-09-30 3.36 4.17
#> 2 2010-07-22 2010-12-31 4.78 4.26
#> 3 2010-12-05 2011-03-31 5.45 4.99
#>
#> [[2]]
#> h_ahead = 1
#>
#> origin future forecast realized
#> 1 2010-05-14 2010-09-30 3.89 4.17
#> 2 2010-07-22 2010-12-31 3.31 4.26
#> 3 2010-12-22 2011-03-31 4.33 4.99
When comparing multiple forecasts, it is imperative that forecast
accuracy is compared across an identical time period. This can become
complicated if the origin
and future
values of
multiple forecasts do not always align. The
subset_identical()
function finds all overlapping
origin
or future
values in a list of Forecast
objects and subsets each of the forecasts to these overlapping values.
This leaves the user with a list of Forecasts that have either identical
origin
values or identical future
values
depending on what the user passes to the slot
argument.
forcs <- list(forc1_1h, forc2_1h)
subset_identical(forcs, slot = "origin")
#> [[1]]
#> h_ahead = 1
#>
#> origin future forecast realized
#> 1 2010-02-17 2010-06-30 4.27 4.96
#> 2 2010-05-14 2010-09-30 3.36 4.17
#> 3 2010-07-22 2010-12-31 4.78 4.26
#>
#> [[2]]
#> h_ahead = 1
#>
#> origin future forecast realized
#> 1 2010-02-17 2010-06-30 4.01 4.96
#> 2 2010-05-14 2010-09-30 3.89 4.17
#> 3 2010-07-22 2010-12-31 3.31 4.26
Forecasts are often produced for multiple h_ahead
horizons into the future. For example, a model may produce a 1-quarter
ahead, 2-quarter ahead, 3-quarter ahead, and 4-quarter ahead forecast
during each quarter of the year. In this example, multiple Forecast
objects are needed to capture the forecast made during each quarter. As
per lmForc convention, one would work with these forecasts by putting
them into a list. When working with list of forecasts for multiple
h_ahead
horizons into the future, there are two general
formats in which the forecasts can be organized. These two formats are:
Time Format and h_ahead Format.
Time Format consists of a list of Forecast objects
where each forecast has homogenous origin
or
future
values. Each Forecast object in the list was made at
the same time or contains forecasts for the same future time. However,
the h_ahead
forecast horizon differs within each Forecast
object. Time Format is used to represent forecasts made
at a single origin
time for multiple h_ahead
horizons.
The following is an example of forecasts in Time
Format. Each Forecast object represents a set of 1-quarter,
2-quarter, and 3-quarter ahead forecasts made at a single
origin
time during each quarter of 2010. Note that because
each Forecast object contains forecasts for multiple
h_ahead
horizons, h_ahead
is set to
NA
. We place all of these forecasts into a list of Forecast
objects that is in Time Format and assign it to
forcs_time_format
.
forc1_t1 <- Forecast(
origin = as.Date(c("2010-02-17", "2010-02-17", "2010-02-17")),
future = as.Date(c("2010-06-30", "2010-09-30", "2010-12-31")),
forecast = c(4.27, 3.77, 3.52),
realized = c(4.96, 4.17, 4.26),
h_ahead = NA
)
forc1_t2 <- Forecast(
origin = as.Date(c("2010-05-14", "2010-05-14", "2010-05-14")),
future = as.Date(c("2010-09-30", "2010-12-31", "2011-03-31")),
forecast = c(3.36, 3.82, 4.22),
realized = c(4.17, 4.26, 4.99),
h_ahead = NA
)
forc1_t3 <- Forecast(
origin = as.Date(c("2010-07-22", "2010-07-22", "2010-07-22")),
future = as.Date(c("2010-12-31", "2011-03-31", "2011-06-30")),
forecast = c(4.78, 4.53, 5.03),
realized = c(4.26, 4.99, 5.33),
h_ahead = NA
)
forc1_t4 <- Forecast(
origin = as.Date(c("2010-12-22", "2010-12-22", "2010-12-22")),
future = as.Date(c("2011-03-31", "2011-06-30", "2011-09-30")),
forecast = c(5.45, 4.89, 5.78),
realized = c(4.99, 5.33, 5.21),
h_ahead = NA
)
forcs_time_format <- list(forc1_t1, forc1_t2, forc1_t3, forc1_t4)
h_ahead Format consists of a list of Forecast
objects where each forecast has homogenous h_ahead
values
but the origin
and future
values vary. The
h_ahead Format is useful for analyzing forecasts at a
specific h_ahead
horizon. For example, one may want to
calculate the forecast accuracy of 4-quarter ahead forecasts only. In
this case it is useful to have multiple forecasts arranged by homogenous
h_ahead
values.
The following is an example of forecasts in h_ahead
Format. Each Forecast object represents all of the 1-quarter,
2-quarter, and 3-quarter ahead forecasts made during different quarters
of 2010. Note that because each Forecast object has a homogenous
h_ahead
horizon we can now set h_ahead
to the
appropriate value. These forecasts are collected into a list of Forecast
objects that is in h_ahead Format and assigned to
forcs_h_ahead_format
.
forc1_1h <- Forecast(
origin = as.Date(c("2010-02-17", "2010-05-14", "2010-07-22", "2010-12-22")),
future = as.Date(c("2010-06-30", "2010-09-30", "2010-12-31", "2011-03-31")),
forecast = c(4.27, 3.36, 4.78, 5.45),
realized = c(4.96, 4.17, 4.26, 4.99),
h_ahead = 1
)
forc1_2h <- Forecast(
origin = as.Date(c("2010-02-17", "2010-05-14", "2010-07-22", "2010-12-22")),
future = as.Date(c("2010-09-30", "2010-12-31", "2011-03-31", "2011-06-30")),
forecast = c(3.77, 3.82, 4.53, 4.89),
realized = c(4.17, 4.26, 4.99, 5.33),
h_ahead = 2
)
forc1_3h <- Forecast(
origin = as.Date(c("2010-02-17", "2010-05-14", "2010-07-22", "2010-12-22")),
future = as.Date(c("2010-12-31", "2011-03-31", "2011-06-30", "2011-09-30")),
forecast = c(3.52, 4.22, 5.03, 5.78),
realized = c(4.26, 4.99, 5.33, 5.21),
h_ahead = 3
)
forcs_h_ahead_format <- list(forc1_1h, forc1_2h, forc1_3h)
Given a list of forecasts in h_ahead Format, one may
want to convert one or multiple of these forecasts into Time
Format. The function convert_bytime()
takes a list
of Forecast objects in h_ahead Format and converts the
forecasts made on the time specified in the value
and
slot
arguments into Forecast objects that are in
Time Format. Note that because we are converting to
Time Format, the h_ahead
value in each
Forecast object is changed to NA
.
convert_bytime(
forcs_h_ahead_format,
value = as.Date(c("2010-07-22", "2010-12-22")),
slot = "origin"
)
#> [[1]]
#> h_ahead = NA
#>
#> origin future forecast realized
#> 1 2010-07-22 2010-12-31 4.78 4.26
#> 2 2010-07-22 2011-03-31 4.53 4.99
#> 3 2010-07-22 2011-06-30 5.03 5.33
#>
#> [[2]]
#> h_ahead = NA
#>
#> origin future forecast realized
#> 1 2010-12-22 2011-03-31 5.45 4.99
#> 2 2010-12-22 2011-06-30 4.89 5.33
#> 3 2010-12-22 2011-09-30 5.78 5.21
Given a list of forecasts in h_ahead Format one can
convert all of the forecasts to Time Format using the
transform_bytime()
function. This function transforms all
Forecast objects in forcs
to a list of Time
Format Forecast objects that have homogenous
origin
or future
values depending on what the
user specifies in the slot
argument. The difference between
transform_bytime()
and convert_bytime()
is
that transforming automatically converts all forecasts in the list while
converting only converts the forecasts specified by the user.
transform_bytime(forcs_h_ahead_format, slot = "origin")
#> [[1]]
#> h_ahead = NA
#>
#> origin future forecast realized
#> 1 2010-02-17 2010-06-30 4.27 4.96
#> 2 2010-02-17 2010-09-30 3.77 4.17
#> 3 2010-02-17 2010-12-31 3.52 4.26
#>
#> [[2]]
#> h_ahead = NA
#>
#> origin future forecast realized
#> 1 2010-05-14 2010-09-30 3.36 4.17
#> 2 2010-05-14 2010-12-31 3.82 4.26
#> 3 2010-05-14 2011-03-31 4.22 4.99
#>
#> [[3]]
#> h_ahead = NA
#>
#> origin future forecast realized
#> 1 2010-07-22 2010-12-31 4.78 4.26
#> 2 2010-07-22 2011-03-31 4.53 4.99
#> 3 2010-07-22 2011-06-30 5.03 5.33
#>
#> [[4]]
#> h_ahead = NA
#>
#> origin future forecast realized
#> 1 2010-12-22 2011-03-31 5.45 4.99
#> 2 2010-12-22 2011-06-30 4.89 5.33
#> 3 2010-12-22 2011-09-30 5.78 5.21
Note that the output of transform_bytime()
above is
identical to the list of Forecast objects in
forcs_time_format
. One can continually transform between
Time Format and h_ahead Format without
losing information. This is evidenced by the fact that:
transform_bytime(forcs_h_ahead_format, slot = "origin") == forcs_time_format
and
transform_byh(forcs_time_format, h_aheads = c(1, 2, 3)) == forcs_h_ahead_format
The inverse of convert_bytime()
is
convert_byh()
. Given a list of forecasts in Time
Format convert_byh()
converts one or multiple of
these forecasts into h_ahead Format. The functions
takes a list of Forecast objects in Time Format and
converts the forecasts specified by the index
argument into
Forecast objects in h_ahead Format. Because forecasts
that are in Time Format do not have
h_ahead
values, the function allows the user to assign
h_ahead
values to the converted Forecast objects via the
h_aheads
argument.
convert_byh(forcs_time_format, index = 1:2, h_aheads = c(1, 2))
#> [[1]]
#> h_ahead = 1
#>
#> origin future forecast realized
#> 1 2010-02-17 2010-06-30 4.27 4.96
#> 2 2010-05-14 2010-09-30 3.36 4.17
#> 3 2010-07-22 2010-12-31 4.78 4.26
#> 4 2010-12-22 2011-03-31 5.45 4.99
#>
#> [[2]]
#> h_ahead = 2
#>
#> origin future forecast realized
#> 1 2010-02-17 2010-09-30 3.77 4.17
#> 2 2010-05-14 2010-12-31 3.82 4.26
#> 3 2010-07-22 2011-03-31 4.53 4.99
#> 4 2010-12-22 2011-06-30 4.89 5.33
Given a list of forecasts in Time Format one can
convert all of the forecasts to h_ahead Format using
the transform_byh()
function. This function transforms all
Forecast objects in forcs
to a list of h_ahead
Format Forecast objects that have homogenous
h_ahead
values. h_ahead
values are assigned to
each converted Forecast object based on the values passed to the
h_aheads
argument. The difference between
transform_byh()
and convert_byh()
is that
transforming automatically converts all forecasts in the list while
converting only converts the forecasts specified by the user.
transform_byh(forcs_time_format, h_aheads = c(1, 2, 3))
#> [[1]]
#> h_ahead = 1
#>
#> origin future forecast realized
#> 1 2010-02-17 2010-06-30 4.27 4.96
#> 2 2010-05-14 2010-09-30 3.36 4.17
#> 3 2010-07-22 2010-12-31 4.78 4.26
#> 4 2010-12-22 2011-03-31 5.45 4.99
#>
#> [[2]]
#> h_ahead = 2
#>
#> origin future forecast realized
#> 1 2010-02-17 2010-09-30 3.77 4.17
#> 2 2010-05-14 2010-12-31 3.82 4.26
#> 3 2010-07-22 2011-03-31 4.53 4.99
#> 4 2010-12-22 2011-06-30 4.89 5.33
#>
#> [[3]]
#> h_ahead = 3
#>
#> origin future forecast realized
#> 1 2010-02-17 2010-12-31 3.52 4.26
#> 2 2010-05-14 2011-03-31 4.22 4.99
#> 3 2010-07-22 2011-06-30 5.03 5.33
#> 4 2010-12-22 2011-09-30 5.78 5.21
Note that the output of transform_byh()
above is
identical to the list of Forecast objects in
forcs_h_ahead_format
. One can continually transform between
Time Format and h_ahead Format without
losing information. This is evidenced by the fact that:
transform_byh(forcs_time_format, h_aheads = c(1, 2, 3)) == forcs_h_ahead_format
and
transform_bytime(forcs_h_ahead_format, slot = "origin") == forcs_time_format