Title: | Bayesian Structural Time Series |
---|---|
Description: | Time series regression using dynamic linear models fit using MCMC. See Scott and Varian (2014) <DOI:10.1504/IJMMNO.2014.059942>, among many other sources. |
Authors: | Steven L. Scott <[email protected]> |
Maintainer: | Steven L. Scott <[email protected]> |
License: | LGPL-2.1 | MIT + file LICENSE |
Version: | 0.9.10 |
Built: | 2024-11-02 05:12:54 UTC |
Source: | https://github.com/cran/bsts |
Time series regression using dynamic linear models fit using MCMC. See Scott and Varian (2014) <DOI:10.1504/IJMMNO.2014.059942>, among many other sources.
If you are installing bsts using install.packages on a Linux machine (and thus compiling yourself) you will almost certainly want to set the Ncpus argument to a large number. Windows and Mac users can ignore this advice.
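For example, a parallel compile on Linux might look like the following sketch; the core count here is only an illustration, not a recommendation for any particular machine.

## Compile bsts and its dependencies in parallel (Linux only; the
## specific Ncpus value is an arbitrary example).
install.packages("bsts", dependencies = TRUE, Ncpus = 8)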
Author: Steven L. Scott <[email protected]> Maintainer: Steven L. Scott <[email protected]>
Please see the references in the help page for the bsts function. See the examples in the bsts function.
Add an AR(p) state component to the state specification.
AddAr(state.specification, y, lags = 1, sigma.prior, initial.state.prior = NULL, sdy)
state.specification |
A list of state components. If omitted, an empty list is assumed. |
y |
A numeric vector. The time series to be modeled. |
lags |
The number of lags ("p") in the AR(p) process. |
sigma.prior |
An object created by SdPrior. The prior for the standard deviation of the process increments. |
initial.state.prior |
An object of class MvnPrior describing the values of the state at time 0. This argument can be NULL (the default). |
sdy |
The sample standard deviation of the time series to be modeled. Used to scale the prior distribution. This can be omitted if y is supplied. |
The model is

  alpha[t] = phi[1] * alpha[t-1] + ... + phi[p] * alpha[t-p] + epsilon[t],  epsilon[t] ~ N(0, sigma^2).

The state consists of the last p lags of alpha. The state transition matrix has phi in its first row, ones along its first subdiagonal, and zeros elsewhere. The state variance matrix has sigma^2 in its upper left corner and is zero elsewhere. The observation matrix has 1 as its first element and is zero otherwise.
Returns state.specification with an AR(p) state component added to the end.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
n <- 100
residual.sd <- .001

# Actual values of the AR coefficients
true.phi <- c(-.7, .3, .15)
ar <- arima.sim(model = list(ar = true.phi), n = n, sd = 3)

## Layer some noise on top of the AR process.
y <- ar + rnorm(n, 0, residual.sd)
ss <- AddAr(list(), lags = 3, sigma.prior = SdPrior(3.0, 1.0))

# Fit the model with residual.sd essentially fixed at the true value.
model <- bsts(y, state.specification = ss, niter = 500,
              prior = SdPrior(residual.sd, 100000))

# Now compare the empirical ACF to the true ACF.
acf(y, lag.max = 30)
points(0:30, ARMAacf(ar = true.phi, lag.max = 30), pch = "+")
points(0:30, ARMAacf(ar = colMeans(model$AR3.coefficients), lag.max = 30))
legend("topright", leg = c("empirical", "truth", "MCMC"),
       pch = c(NA, "+", "o"))
Add a dynamic regression component to the state specification of a bsts model. A dynamic regression is a regression model where the coefficients change over time according to a random walk.
AddDynamicRegression(
    state.specification,
    formula,
    data,
    model.options = NULL,
    sigma.mean.prior.DEPRECATED = NULL,
    shrinkage.parameter.prior.DEPRECATED = GammaPrior(a = 10, b = 1),
    sigma.max.DEPRECATED = NULL,
    contrasts = NULL,
    na.action = na.pass)

DynamicRegressionRandomWalkOptions(
    sigma.prior = NULL,
    sdy = NULL,
    sdx = NULL)

DynamicRegressionHierarchicalRandomWalkOptions(
    sdy = NULL,
    sigma.mean.prior = NULL,
    shrinkage.parameter.prior = GammaPrior(a = 10, b = 1),
    sigma.max = NULL)

DynamicRegressionArOptions(lags = 1, sigma.prior = SdPrior(1, 1))
state.specification |
A list of state components that you wish to add to. If omitted, an empty list will be assumed. |
formula |
A formula describing the regression portion of the relationship between y and X. If no regressors are desired then the formula can be replaced by a numeric vector giving the time series to be modeled. |
data |
An optional data frame, list or environment (or object
coercible by |
model.options |
An object inheriting from
|
sigma.mean.prior |
An object created by
|
sigma.mean.prior.DEPRECATED |
This option should be set using model.options. It will be removed in a future release. |
shrinkage.parameter.prior |
An object of class
|
shrinkage.parameter.prior.DEPRECATED |
This option should be set using model.options. It will be removed in a future release. |
sigma.max |
The largest supported value of each
|
sigma.max.DEPRECATED |
This option should be set using model.options. It will be removed in a future release. |
contrasts |
An optional list. See the |
na.action |
What to do about missing values. The default is to allow missing responses, but no missing predictors. Set this to na.omit or na.exclude if you want to omit missing responses altogether. |
sdy |
The standard deviation of the response variable. This is
used to scale default priors and |
sdx |
The vector of standard deviations of each predictor variable in the dynamic regression. Used only to scale the default prior. This argument is not used if a prior is specified directly. |
lags |
The number of lags in the autoregressive process for the coefficients. |
sigma.prior |
Either an object of class |
For the standard "random walk" coefficient model, the model is

  beta[i, t+1] = beta[i, t] + epsilon[i, t],  epsilon[i, t] ~ N(0, sigma[i]^2).

That is, each coefficient evolves independently, with its own variance term, which is scaled by the variance of the i'th column of X. The parameters of the hyperprior are interpretable as follows: sqrt(b/a) is the typical amount that a coefficient might change in a single time period, and 'a' is the 'sample size' or 'shrinkage parameter' measuring the degree of similarity in sigma[i] across coefficients.
In most cases we hope b/a is small, so that sigma[i]'s will be small and the series will be forecastable. We also hope that 'a' is large because it means that the sigma[i]'s will be similar to one another.
The default prior distribution is a pair of independent Gamma priors for sqrt(b/a) and a. The mean of sigma[i] is set to .01 * sd(y) with shape parameter equal to 1. The mean of the shrinkage parameter is set to 10, but with shape parameter equal to 1.
If the coefficients have AR dynamics, then the model is that each coefficient independently follows an AR(p) process, where p is given by the lags argument. Independent priors are assumed for each coefficient's model, with a uniform prior on AR coefficients (with support restricted to the finite region where the process is stationary), while the sigma.prior argument gives the prior for each coefficient's innovation variance.
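As a rough sketch (not taken from the package documentation), the coefficient dynamics can be selected through the model.options argument; the simulated data and the prior values below are arbitrary illustrations.

## Simulate a small data set for illustration.
x <- rnorm(100)
y <- cumsum(rnorm(100, 0, .1)) * x + rnorm(100)

## Dynamic regression with the default random walk coefficient dynamics.
ss <- AddDynamicRegression(list(), y ~ x)

## The same component with AR(2) coefficient dynamics instead; the
## SdPrior values here are placeholders, not recommendations.
ss.ar <- AddDynamicRegression(
    list(), y ~ x,
    model.options = DynamicRegressionArOptions(
        lags = 2, sigma.prior = SdPrior(.1, 1)))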
Returns a list with the elements necessary to specify a dynamic regression model.
Steven L. Scott
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
## Setting the seed to avoid small sample effects resulting from small
## number of iterations.
set.seed(8675309)
n <- 1000
x <- matrix(rnorm(n))

# beta follows a random walk with sd = .1 starting at -12.
beta <- cumsum(rnorm(n, 0, .1)) - 12

# level is a local level model with sd = 1 starting at 18.
level <- cumsum(rnorm(n)) + 18

# sigma.obs is .1
error <- rnorm(n, 0, .1)

y <- level + x * beta + error
par(mfrow = c(1, 3))
plot(y, main = "Raw Data")
plot(x, y - level, main = "True Regression Effect")
plot(y - x * beta, main = "Local Level Effect")

ss <- list()
ss <- AddLocalLevel(ss, y)
ss <- AddDynamicRegression(ss, y ~ x)
## In a real application you'd probably want more than 100
## iterations.  See comment above about the random seed.
model <- bsts(y, state.specification = ss, niter = 100, seed = 8675309)
plot(model, "dynamic", burn = 10)

xx <- rnorm(10)
pred <- predict(model, newdata = xx)
plot(pred)
Add a local level model to a state specification. The local level model assumes the trend is a random walk:

  alpha[t+1] = alpha[t] + epsilon[t],  epsilon[t] ~ N(0, sigma^2).

The prior is on the sigma parameter.
AddLocalLevel( state.specification, y, sigma.prior, initial.state.prior, sdy, initial.y)
state.specification |
A list of state components that you wish to add to. If omitted, an empty list will be assumed. |
y |
The time series to be modeled, as a numeric vector. |
sigma.prior |
An object created by |
initial.state.prior |
An object created using
|
sdy |
The standard deviation of the series to be modeled. This
will be ignored if |
initial.y |
The initial value of the series being modeled. This will be
ignored if |
Returns a list with the elements necessary to specify a local level state model.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
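No example accompanies this entry in the manual; the minimal sketch below mirrors the other examples in this document (AirPassengers data, default priors) and is only an illustration of typical usage.

data(AirPassengers)
y <- log(AirPassengers)
ss <- AddLocalLevel(list(), y)
model <- bsts(y, state.specification = ss, niter = 500)
plot(model, "components")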
Add a local linear trend model to a state specification. The local linear trend model assumes that both the mean and the slope of the trend follow random walks. The equation for the mean is

  mu[t+1] = mu[t] + delta[t] + epsilon[t],  epsilon[t] ~ N(0, sigma.level^2).

The equation for the slope is

  delta[t+1] = delta[t] + eta[t],  eta[t] ~ N(0, sigma.slope^2).

The prior distribution is on the level standard deviation sigma.level and the slope standard deviation sigma.slope.
AddLocalLinearTrend( state.specification = NULL, y, level.sigma.prior = NULL, slope.sigma.prior = NULL, initial.level.prior = NULL, initial.slope.prior = NULL, sdy, initial.y)
state.specification |
A list of state components that you wish to add to. If omitted, an empty list will be assumed. |
y |
The time series to be modeled, as a numeric vector. |
level.sigma.prior |
An object created by
|
slope.sigma.prior |
An object created by
|
initial.level.prior |
An object created by
|
initial.slope.prior |
An object created by
|
sdy |
The standard deviation of the series to be modeled. This
will be ignored if |
initial.y |
The initial value of the series being modeled. This will be
ignored if |
Returns a list with the elements necessary to specify a local linear trend state model.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
data(AirPassengers)
y <- log(AirPassengers)
ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 12)
model <- bsts(y, state.specification = ss, niter = 500)
pred <- predict(model, horizon = 12, burn = 100)
plot(pred)
A seasonal state component for daily data, representing the contribution of each month to the annual seasonal cycle. I.e. this is the "January, February, March, ..." effect, with 12 seasons. There is a step change at the start of each month, and then the contribution of that month is constant over the course of the month.
Note that if you have anything other than daily data, then you're probably looking for AddSeasonal instead.
The state of this model is an 11-vector gamma[t], where the first element is the contribution to the mean for the current month, and the remaining elements are the values for the 10 most recent months. When t is the first day in the month then

  gamma[t+1, 1] = -(gamma[t, 1] + ... + gamma[t, 11]) + epsilon[t],  epsilon[t] ~ N(0, sigma^2),

and the remaining elements are shifted down one. When t is any other day then gamma[t+1] = gamma[t].
AddMonthlyAnnualCycle(state.specification, y, date.of.first.observation = NULL, sigma.prior = NULL, initial.state.prior = NULL, sdy)
state.specification |
A list of state components, to which the monthly annual cycle will be added. If omitted, an empty list will be assumed. |
y |
The time series to be modeled, as a numeric vector. |
date.of.first.observation |
The time stamp of the first
observation in |
sigma.prior |
An object created by |
initial.state.prior |
An object created using
|
sdy |
The standard deviation of the series to be modeled. This
will be ignored if |
## Let's simulate some fake daily data with a monthly cycle.
## Not run:
residuals <- rnorm(365 * 5)
## End(Not run)
n <- length(residuals)
dates <- seq.Date(from = as.Date("2014-01-01"), len = n, by = 1)
monthly.cycle <- rnorm(12)
monthly.cycle <- monthly.cycle - mean(monthly.cycle)
timestamps <- as.POSIXlt(dates)
month <- timestamps$mon + 1
new.month <- c(TRUE, diff(timestamps$mon) != 0)
month.effect <- cumsum(new.month)
month.effect[month.effect == 0] <- 12
response <- monthly.cycle[month] + residuals
response <- zoo(response, timestamps)

## Now let's fit a bsts model to the daily data with a monthly annual
## cycle.
ss <- AddLocalLevel(list(), response)
ss <- AddMonthlyAnnualCycle(ss, response)

## In real life you'll probably want more iterations.
model <- bsts(response, state.specification = ss, niter = 200)
plot(model)
plot(model, "monthly")
Adds a random walk holiday state model to the state specification. This model says

  y[t] = alpha[d(t), t] + (contributions from other state components) + observation error,

where there is one element in alpha[t] for each day in the holiday influence window. The transition equation is

  alpha[d(t+1), t+1] = alpha[d(t+1), t] + epsilon[t+1],  epsilon[t+1] ~ N(0, sigma^2)

if t+1 occurs on day d(t+1) of the influence window, and

  alpha[d, t+1] = alpha[d, t]

otherwise.
AddRandomWalkHoliday(state.specification = NULL, y, holiday, time0 = NULL, sigma.prior = NULL, initial.state.prior = NULL, sdy = sd(as.numeric(y), na.rm = TRUE))
state.specification |
A list of state components that you wish augment. If omitted, an empty list will be assumed. |
y |
The time series to be modeled, as a numeric vector
convertible to |
holiday |
An object of class |
time0 |
An object convertible to |
sigma.prior |
An object created by |
initial.state.prior |
An object created using
|
sdy |
The standard deviation of the series to be modeled. This
will be ignored if |
A list describing the specification of the random walk holiday state model, formatted as expected by the underlying C++ code.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
bsts, RegressionHolidayStateModel, HierarchicalRegressionHolidayStateModel.
trend <- cumsum(rnorm(730, 0, .1))
dates <- seq.Date(from = as.Date("2014-01-01"), length = length(trend),
                  by = "day")
y <- zoo(trend + rnorm(length(trend), 0, .2), dates)

AddHolidayEffect <- function(y, dates, effect) {
  ## Adds a holiday effect to simulated data.
  ## Args:
  ##   y: A zoo time series, with Dates for indices.
  ##   dates: The dates of the holidays.
  ##   effect: A vector of holiday effects of odd length.  The central
  ##     effect is the main holiday, with a symmetric influence window
  ##     on either side.
  ## Returns:
  ##   y, with the holiday effects added.
  time <- dates - (length(effect) - 1) / 2
  for (i in 1:length(effect)) {
    y[time] <- y[time] + effect[i]
    time <- time + 1
  }
  return(y)
}

## Define some holidays.
memorial.day <- NamedHoliday("MemorialDay")
memorial.day.effect <- c(.3, 3, .5)
memorial.day.dates <- as.Date(c("2014-05-26", "2015-05-25"))
y <- AddHolidayEffect(y, memorial.day.dates, memorial.day.effect)

presidents.day <- NamedHoliday("PresidentsDay")
presidents.day.effect <- c(.5, 2, .25)
presidents.day.dates <- as.Date(c("2014-02-17", "2015-02-16"))
y <- AddHolidayEffect(y, presidents.day.dates, presidents.day.effect)

labor.day <- NamedHoliday("LaborDay")
labor.day.effect <- c(1, 2, 1)
labor.day.dates <- as.Date(c("2014-09-01", "2015-09-07"))
y <- AddHolidayEffect(y, labor.day.dates, labor.day.effect)

## The holidays can be in any order.
holiday.list <- list(memorial.day, labor.day, presidents.day)
number.of.holidays <- length(holiday.list)

## In a real example you'd want more than 100 MCMC iterations.
niter <- 100
ss <- AddLocalLevel(list(), y)
ss <- AddRandomWalkHoliday(ss, y, memorial.day)
ss <- AddRandomWalkHoliday(ss, y, labor.day)
ss <- AddRandomWalkHoliday(ss, y, presidents.day)
model <- bsts(y, state.specification = ss, niter = niter, seed = 8675309)

## Plot model components.
plot(model, "comp")

## Plot the effect of the specific state component.
plot(ss[[2]], model)
Add a seasonal model to a state specification.

The seasonal model can be thought of as a regression on nseasons dummy variables with coefficients constrained to sum to 0 (in expectation). If there are S seasons then the state vector gamma[t] is of dimension S-1. The first element of the state vector obeys

  gamma[t+1, 1] = -(gamma[t, 1] + ... + gamma[t, S-1]) + epsilon[t],  epsilon[t] ~ N(0, sigma^2).
AddSeasonal( state.specification, y, nseasons, season.duration = 1, sigma.prior, initial.state.prior, sdy)
state.specification |
A list of state components that you wish to add to. If omitted, an empty list will be assumed. |
y |
The time series to be modeled, as a numeric vector. |
nseasons |
The number of seasons to be modeled. |
season.duration |
The number of time periods in each season. |
sigma.prior |
An object created by |
initial.state.prior |
An object created using
|
sdy |
The standard deviation of the series to be modeled. This
will be ignored if |
Returns a list with the elements necessary to specify a seasonal state model.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
data(AirPassengers)
y <- log(AirPassengers)
ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 12)
model <- bsts(y, state.specification = ss, niter = 500)
pred <- predict(model, horizon = 12, burn = 100)
plot(pred)
The semi-local linear trend model is similar to the local linear trend, but more useful for long-term forecasting. It assumes the level component moves according to a random walk, but the slope component moves according to an AR1 process centered on a potentially nonzero value D. The equation for the level is

  mu[t+1] = mu[t] + delta[t] + epsilon[t],  epsilon[t] ~ N(0, sigma.level^2).

The equation for the slope is

  delta[t+1] = D + phi * (delta[t] - D) + eta[t],  eta[t] ~ N(0, sigma.slope^2).

This model differs from the local linear trend model in that the latter assumes the slope follows a random walk. A stationary AR(1) process is less variable than a random walk when making projections far into the future, so this model often gives more reasonable uncertainty estimates when making long term forecasts.

The prior distribution for the semi-local linear trend has four independent components. These are:

an inverse gamma prior on the level standard deviation sigma.level,

an inverse gamma prior on the slope standard deviation sigma.slope,

a Gaussian prior on the long run slope parameter D,

and a potentially truncated Gaussian prior on the AR1 coefficient phi. If the prior on phi is truncated to (-1, 1), then the slope will exhibit short term stationary variation around the long run slope D.
AddSemilocalLinearTrend( state.specification = list(), y = NULL, level.sigma.prior = NULL, slope.mean.prior = NULL, slope.ar1.prior = NULL, slope.sigma.prior = NULL, initial.level.prior = NULL, initial.slope.prior = NULL, sdy = NULL, initial.y = NULL)
state.specification |
A list of state components that you wish to add to. If omitted, an empty list will be assumed. |
y |
The time series to be modeled, as a numeric vector. This can
be omitted if |
level.sigma.prior |
An object created by
|
slope.mean.prior |
An object created by
|
slope.ar1.prior |
An object created by
|
slope.sigma.prior |
An object created by
|
initial.level.prior |
An object created by
|
initial.slope.prior |
An object created by
|
sdy |
The standard deviation of the series to be modeled. This
will be ignored if |
initial.y |
The initial value of the series being modeled. This will be
ignored if |
Returns a list with the elements necessary to specify a generalized local linear trend state model.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
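This help page has no example of its own; the minimal sketch below mirrors Example 2 of the bsts examples later in this manual, using the goog daily stock price series shipped with the package.

data(goog)
ss <- AddSemilocalLinearTrend(list(), goog)
model <- bsts(goog, state.specification = ss, niter = 500)
pred <- predict(model, horizon = 30, burn = 100)
plot(pred)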
Adds a static intercept term to a state space model. If the model includes a traditional trend component (e.g. local level, local linear trend, etc) then a separate intercept is not needed (and will probably cause trouble, as it will be confounded with the initial state of the trend model). However, if there is no trend, or the trend is an AR process centered around zero, then adding a static intercept will shift the center to a data-determined value.
AddStaticIntercept( state.specification, y, initial.state.prior = NormalPrior(y[1], sd(y, na.rm = TRUE)))
state.specification |
A list of state components that you wish to add to. If omitted, an empty list will be assumed. |
y |
The time series to be modeled, as a numeric vector. |
initial.state.prior |
An object created using
|
Returns a list with the information required to specify the state component. If initial.state.prior is specified then y is unused.
Steven L. Scott
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
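No example accompanies this entry; the sketch below is loosely based on the AddAr example earlier in this manual and pairs a zero-centered AR trend with a static intercept, as suggested in the description above. The simulated data and prior defaults are illustrative only.

set.seed(8675309)
## AR(1) variation around a nonzero mean of 10.
y <- 10 + arima.sim(model = list(ar = .6), n = 200, sd = 1)
ss <- AddStaticIntercept(list(), y)
ss <- AddAr(ss, y, lags = 1)
model <- bsts(y, state.specification = ss, niter = 500)
plot(model, "components")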
Add a local linear trend model with Student-t errors to a state specification. The local linear trend model assumes that both the mean and the slope of the trend follow random walks. The equation for the mean is

  mu[t+1] = mu[t] + delta[t] + epsilon[t].

The equation for the slope is

  delta[t+1] = delta[t] + eta[t],

where epsilon[t] and eta[t] follow Student t distributions with tail thickness (degrees of freedom) nu.level and nu.slope, and scale parameters sigma.level and sigma.slope, respectively. Independent prior distributions are assumed on the level standard deviation sigma.level, the slope standard deviation sigma.slope, the level tail thickness nu.level, and the slope tail thickness nu.slope.
AddStudentLocalLinearTrend( state.specification = NULL, y, save.weights = FALSE, level.sigma.prior = NULL, level.nu.prior = NULL, slope.sigma.prior = NULL, slope.nu.prior = NULL, initial.level.prior = NULL, initial.slope.prior = NULL, sdy, initial.y)
state.specification |
A list of state components that you wish to add to. If omitted, an empty list will be assumed. |
y |
The time series to be modeled, as a numeric vector. |
save.weights |
A logical value indicating whether to save the draws of the weights from the normal mixture representation. |
level.sigma.prior |
An object created by
|
level.nu.prior |
An object inheritng from the class
|
slope.sigma.prior |
An object created by
|
slope.nu.prior |
An object inheritng from the class
|
initial.level.prior |
An object created by
|
initial.slope.prior |
An object created by
|
sdy |
The standard deviation of the series to be modeled. This
will be ignored if |
initial.y |
The initial value of the series being modeled. This will be
ignored if |
Returns a list with the elements necessary to specify a local linear trend state model.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
data(rsxfs)
ss <- AddStudentLocalLinearTrend(list(), rsxfs)
model <- bsts(rsxfs, state.specification = ss, niter = 500)
pred <- predict(model, horizon = 12, burn = 100)
plot(pred)
Add a trigonometric seasonal model to a state specification.
AddTrig( state.specification = NULL, y, period, frequencies, sigma.prior = NULL, initial.state.prior = NULL, sdy = sd(y, na.rm = TRUE), method = c("harmonic", "direct"))
state.specification |
A list of state components that you wish to add to. If omitted, an empty list will be assumed. |
y |
The time series to be modeled, as a numeric vector. |
period |
A positive scalar giving the number of time steps required for the longest cycle to repeat. |
frequencies |
A vector of positive real numbers giving the number of times each cyclic component repeats in a period. One sine and one cosine term will be added for each frequency. |
sigma.prior |
An object created by |
initial.state.prior |
An object created using
|
sdy |
The standard deviation of the series to be modeled. This
will be ignored if |
method |
The method of including the sinusoids. The "harmonic" method is strongly preferred, with "direct" offered mainly for teaching purposes. |
Each frequency lambda[j] = 2 * pi * j / S, where S is the period (number of time points in a full cycle), is associated with two time-varying random components, gamma[j, t] and gamma*[j, t]. They evolve through time as

  gamma[j, t+1]  =  gamma[j, t] * cos(lambda[j]) + gamma*[j, t] * sin(lambda[j]) + delta[j, t]
  gamma*[j, t+1] = -gamma[j, t] * sin(lambda[j]) + gamma*[j, t] * cos(lambda[j]) + delta*[j, t]

where delta[j, t] and delta*[j, t] are independent with the same variance. This is the real-valued version of a harmonic function: the transition matrix rotates the pair (gamma[j, t], gamma*[j, t]) through the angle lambda[j] at each step, so that after 't' steps the harmonic takes the value it would at time t, while the delta terms allow its amplitude and phase to drift over time.

The model dynamics allow gamma to drift over time in a random walk. The state of the model is (gamma[j, t], gamma*[j, t]), for j = 1, ..., number of frequencies. The state transition matrix is a block diagonal matrix, where block 'j' is the 2 x 2 rotation matrix

  [  cos(lambda[j])   sin(lambda[j]) ]
  [ -sin(lambda[j])   cos(lambda[j]) ]
The error variance matrix is sigma^2 * I. There is a common sigma^2 parameter shared by all frequencies.
The model is full rank, so the state error expander matrix R_t is the identity.
The observation matrix is (1, 0, 1, 0, ...), where the 1's pick out the 'real' part of the state contributions.
Under the 'direct' method the trig component adds a collection of sine and cosine terms with randomly varying coefficients to the state model. The coefficients are the states, while the sine and cosine values are part of the "observation matrix".
This state component adds the sum of its terms to the observation equation.
The evolution equation is that each of the sinusoid coefficients follows a random walk with standard deviation sigma[j].
The direct method is generally inferior to the harmonic method. It may be removed in the future.
Returns a list with the elements necessary to specify a seasonal state model.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
data(AirPassengers)
y <- log(AirPassengers)
ss <- AddLocalLinearTrend(list(), y)
ss <- AddTrig(ss, y, period = 12, frequencies = 1:3)
model <- bsts(y, state.specification = ss, niter = 200)
plot(model)

## The "harmonic" method is much more stable than the "direct" method.
ss <- AddLocalLinearTrend(list(), y)
ss <- AddTrig(ss, y, period = 12, frequencies = 1:3, method = "direct")
model2 <- bsts(y, state.specification = ss, niter = 200)
plot(model2)
Aggregate measurements from a fine scaled time series into a coarse time series. This is similar to functions from the xts package, but it can handle aggregation from weeks to months.
AggregateTimeSeries(fine.series, contains.end, membership.fraction, trim.left = any(membership.fraction < 1), trim.right = NULL, byrow = TRUE)
fine.series |
A numeric vector or matrix giving the fine scale time series to be aggregated. |
contains.end |
A logical vector corresponding to
|
membership.fraction |
A numeric vector corresponding to
|
trim.left |
Logical indicating whether the first observation in the coarse aggregate should be removed. |
trim.right |
Logical indicating whether the final observation in the coarse aggregate should be removed. |
byrow |
Logical. If |
A matrix (if fine.series is a matrix) or vector (otherwise) containing the aggregated values of fine.series.
Steven L. Scott [email protected]
week.ending <- as.Date(c("2011-11-05", "2011-11-12", "2011-11-19",
                         "2011-11-26", "2011-12-03", "2011-12-10",
                         "2011-12-17", "2011-12-24", "2011-12-31"))
membership.fraction <- GetFractionOfDaysInInitialMonth(week.ending)
which.month <- MatchWeekToMonth(week.ending, as.Date("2011-11-01"))
contains.end <- WeekEndsMonth(week.ending)
weekly.values <- rnorm(length(week.ending))
monthly.values <- AggregateTimeSeries(weekly.values, contains.end,
                                      membership.fraction)
Aggregate measurements from a weekly time series into a monthly time series.
AggregateWeeksToMonths(weekly.series, membership.fraction = NULL, trim.left = TRUE, trim.right = NULL)
weekly.series |
A numeric vector or matrix of class
|
membership.fraction |
An optional numeric vector corresponding to |
trim.left |
Logical indicating whether the first observation in the monthly aggregate should be removed. |
trim.right |
Logical indicating whether the final observation in the monthly aggregate should be removed. |
A zoo-matrix (if weekly.series is a matrix) or vector (otherwise) containing the aggregated values of weekly.series.
Steven L. Scott [email protected]
week.ending <- as.Date(c("2011-11-05", "2011-11-12", "2011-11-19",
                         "2011-11-26", "2011-12-03", "2011-12-10",
                         "2011-12-17", "2011-12-24", "2011-12-31"))
weekly.values <- zoo(rnorm(length(week.ending)), week.ending)
monthly.values <- AggregateWeeksToMonths(weekly.values)
Add a sparse AR(p) process to the state distribution. A sparse AR(p) is an AR(p) process with a spike and slab prior on the autoregression coefficients.
AddAutoAr(state.specification, y, lags = 1, prior = NULL, sdy = NULL, ...)
state.specification |
A list of state components. If omitted, an empty list is assumed. |
y |
A numeric vector. The time series to be modeled. This can
be omitted if |
lags |
The maximum number of lags ("p") to be considered in the AR(p) process. |
prior |
An object inheriting from |
sdy |
The sample standard deviation of the time series to be
modeled. Used to scale the prior distribution. This can be omitted
if |
... |
Extra arguments passed to |
The model contributes alpha[t] to the expected value of y[t], where the transition equation is

  alpha[t] = phi[1] * alpha[t-1] + ... + phi[p] * alpha[t-p] + epsilon[t],  epsilon[t] ~ N(0, sigma^2).

The state consists of the last p lags of alpha. The state transition matrix has phi in its first row, ones along its first subdiagonal, and zeros elsewhere. The state variance matrix has sigma^2 in its upper left corner and is zero elsewhere. The observation matrix has 1 as its first element and is zero otherwise.
This model differs from the one in AddAr only in that some of its coefficients may be set to zero.
Returns state.specification with an AR(p) state component added to the end.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
n <- 100
residual.sd <- .001

# Actual values of the AR coefficients
true.phi <- c(-.7, .3, .15)
ar <- arima.sim(model = list(ar = true.phi), n = n, sd = 3)

## Layer some noise on top of the AR process.
y <- ar + rnorm(n, 0, residual.sd)
ss <- AddAutoAr(list(), y, lags = 6)

# Fit the model with residual.sd essentially fixed at the true value.
model <- bsts(y, state.specification = ss, niter = 500,
              prior = SdPrior(residual.sd, 100000))

# Now compare the empirical ACF to the true ACF.
acf(y, lag.max = 30)
points(0:30, ARMAacf(ar = true.phi, lag.max = 30), pch = "+")
points(0:30, ARMAacf(ar = colMeans(model$AR6.coefficients), lag.max = 30))
legend("topright", leg = c("empirical", "truth", "MCMC"),
       pch = c(NA, "+", "o"))
Uses MCMC to sample from the posterior distribution of a Bayesian structural time series model. This function can be used either with or without contemporaneous predictor variables (in a time series regression).
If predictor variables are present, the regression coefficients are fixed (as opposed to time varying, though time varying coefficients might be added as a state component). The predictors and response in the formula are contemporaneous, so if you want lags and differences you need to put them in the predictor matrix yourself.
If no predictor variables are used, then the model is an ordinary state space time series model.
The model allows for several useful extensions beyond standard Bayesian dynamic linear models.
A spike-and-slab prior is used for the (static) regression component of models that include predictor variables. This is especially useful with large numbers of regressor series.
Both the spike-and-slab component (for static regressors) and the Kalman filter (for components of time series state) require observations and state variables to be Gaussian. The bsts package allows for non-Gaussian error families in the observation equation (as well as some state components) by using data augmentation to express these families as conditionally Gaussian.
As of version 0.7.0, bsts supports having multiple observations at the same time point. In this case the basic model is taken to be

  y[i, t] = beta * x[i, t] + alpha[t] + epsilon[i, t].

This is a regression model where all observations with the same time point share a common time series effect alpha[t].
bsts(formula, state.specification, family = c("gaussian", "logit", "poisson", "student"), data, prior, contrasts = NULL, na.action = na.pass, niter, ping = niter / 10, model.options = BstsOptions(), timestamps = NULL, seed = NULL, ...)
formula |
A formula describing the regression portion of the relationship between y and X. If no regressors are desired then the formula can be replaced by a numeric vector giving the time series to be modeled. Missing values are not allowed in predictors, but they are allowed in the response variable. If the response variable is of class |
state.specification |
A list with elements created by
|
family |
The model family for the observation equation. Non-Gaussian model families use data augmentation to recover a conditionally Gaussian model. |
data |
An optional data frame, list or environment (or object
coercible by |
prior |
If regressors are supplied in the model formula, then this is a prior distribution for the regression component of the model, as created by SpikeSlabPrior. If the model contains no regressors, then this is simply the prior on the residual standard deviation, expressed as an object created by SdPrior. |
contrasts |
An optional list containing the names of contrast functions to use when converting factors to numeric variables in a regression formula. This argument works exactly as it does in lm. |
na.action |
What to do about missing values. The default is to allow missing responses, but no missing predictors. Set this to na.omit or na.exclude if you want to omit missing responses altogether. |
niter |
A positive integer giving the desired number of MCMC draws. |
ping |
A scalar giving the desired frequency of status messages.
If ping > 0 then the program will print a status message to the
screen every |
model.options |
An object (list) returned by
|
timestamps |
The timestamp associated with each value of the
response. This argument is primarily useful in cases where the
response has missing gaps, or where there are multiple observations
per time point. If the response is a "regular" time series with a
single observation per time point then you can leave this argument
as |
seed |
An integer to use as the random seed for the underlying
C++ code. If |
... |
Extra arguments to be passed to
|
If the model family is logit, then there are two ways one can format the response variable. If the response is 0/1, TRUE/FALSE, or 1/-1, then the response variable can be passed as with any other model family. If the response is a set of counts out of a specified number of trials then it can be passed as a two-column matrix, where the first column contains the counts of successes and the second contains the count of failures.
Likewise, if the model family is Poisson, the response can be passed as a single vector of counts, under the assumption that each observation has unit exposure. If the exposures differ across observations, then the response can be a two-column matrix, with the first column containing the event counts and the second containing exposure times.
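As an illustrative sketch of the two-column logit response format described above (the data here are simulated and the empirical-logit trick for scaling the state model mirrors the shark example later on; none of this is taken from the package documentation):

## Binomial-style logit data: successes out of varying numbers of trials.
trials <- rpois(100, lambda = 50)
prob <- plogis(cumsum(rnorm(100, 0, .1)))
successes <- rbinom(100, size = trials, prob = prob)

## Scale the state model using empirical logits of the observed rates.
ss <- AddLocalLevel(list(), qlogis((successes + .5) / (trials + 1)))

## First column: successes.  Second column: failures.
model <- bsts(cbind(successes, trials - successes),
              state.specification = ss, niter = 500, family = "logit")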
An object of class bsts, which is a list with the following components:
coefficients |
A |
sigma.obs |
A vector of length |
The returned object will also contain named elements holding the MCMC draws of model parameters belonging to the state models. The names of each component are supplied by the entries in state.specification. If a model parameter is a scalar, then the list element is a vector with niter elements. If the parameter is a vector then the list element is a matrix with niter rows. If the parameter is a matrix then the list element is a 3-way array with first dimension niter.
Finally, if a model formula was supplied, then the returned object will contain the information necessary for the predict method to build the design matrix when a new prediction is made.
Steven L. Scott [email protected]
Scott and Varian (2014) "Predicting the Present with Bayesian Structural Time Series", International Journal of Mathematical Modelling and Numerical Optimization. 4–23.
Scott and Varian (2015) "Bayesian Variable Selection for Nowcasting Economic Time Series", Economic Analysis of the Digital Economy, pp 119-135.
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
George and McCulloch (1997) "Approaches for Bayesian variable selection", Statistica Sinica pp 339–374.
bsts, AddLocalLevel, AddLocalLinearTrend, AddSemilocalLinearTrend, AddSeasonal, AddDynamicRegression, SpikeSlabPrior, SdPrior.
## Example 1: Time series (ts) data
data(AirPassengers)
y <- log(AirPassengers)
ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 12)
model <- bsts(y, state.specification = ss, niter = 500)
pred <- predict(model, horizon = 12, burn = 100)
par(mfrow = c(1, 2))
plot(model)
plot(pred)

## Not run:
MakePlots <- function(model, ask = TRUE) {
  ## Make all the plots callable by plot.bsts.
  opar <- par(ask = ask)
  on.exit(par(opar))
  plot.types <- c("state", "components", "residuals",
                  "prediction.errors", "forecast.distribution")
  for (plot.type in plot.types) {
    plot(model, plot.type)
  }
  if (model$has.regression) {
    regression.plot.types <- c("coefficients", "predictors", "size")
    for (plot.type in regression.plot.types) {
      plot(model, plot.type)
    }
  }
}

## Example 2: GOOG is the Google stock price, an xts series of daily
## data.
data(goog)
ss <- AddSemilocalLinearTrend(list(), goog)
model <- bsts(goog, state.specification = ss, niter = 500)
MakePlots(model)

## Example 3: Change GOOG to be zoo, and not xts.
goog <- zoo(goog, index(goog))
ss <- AddSemilocalLinearTrend(list(), goog)
model <- bsts(goog, state.specification = ss, niter = 500)
MakePlots(model)

## Example 4: Naked numeric data works too
y <- rnorm(100)
ss <- AddLocalLinearTrend(list(), y)
model <- bsts(y, state.specification = ss, niter = 500)
MakePlots(model)

## Example 5: zoo data with intra-day measurements
y <- zoo(rnorm(100),
         seq(from = as.POSIXct("2012-01-01 7:00 EST"), len = 100, by = 100))
ss <- AddLocalLinearTrend(list(), y)
model <- bsts(y, state.specification = ss, niter = 500)
MakePlots(model)

## Example 6: Including regressors
data(iclaims)
ss <- AddLocalLinearTrend(list(), initial.claims$iclaimsNSA)
ss <- AddSeasonal(ss, initial.claims$iclaimsNSA, nseasons = 52)
model <- bsts(iclaimsNSA ~ ., state.specification = ss,
              data = initial.claims, niter = 1000)
plot(model)
plot(model, "components")
plot(model, "coefficients")
plot(model, "predictors")
## End(Not run)

## Not run:
## Example 7: Regressors with multiple time stamps.
number.of.time.points <- 50
sample.size.per.time.point <- 10
total.sample.size <- number.of.time.points * sample.size.per.time.point
sigma.level <- .1
sigma.obs <- 1

## Simulate some fake data with a local level state component.
trend <- cumsum(rnorm(number.of.time.points, 0, sigma.level))
predictors <- matrix(rnorm(total.sample.size * 2), ncol = 2)
colnames(predictors) <- c("X1", "X2")
coefficients <- c(-10, 10)
regression <- as.numeric(predictors %*% coefficients)
y.hat <- rep(trend, sample.size.per.time.point) + regression
y <- rnorm(length(y.hat), y.hat, sigma.obs)

## Create some time stamps, with multiple observations per time stamp.
first <- as.POSIXct("2013-03-24")
dates <- seq(from = first, length = number.of.time.points, by = "month")
timestamps <- rep(dates, sample.size.per.time.point)

## Run the model with a local level trend, and an unnecessary seasonal
## component.
ss <- AddLocalLevel(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 7)
model <- bsts(y ~ predictors, ss, niter = 250, timestamps = timestamps,
              seed = 8675309)
plot(model)
plot(model, "components")
## End(Not run)

## Example 8: Non-Gaussian data
## Poisson counts of shark attacks in Florida.
data(shark)
logshark <- log1p(shark$Attacks)
ss.level <- AddLocalLevel(list(), y = logshark)
model <- bsts(shark$Attacks, ss.level, niter = 500, ping = 250,
              family = "poisson", seed = 8675309)

## Poisson data can have an 'exposure' as the second column of a
## two-column matrix.
model <- bsts(cbind(shark$Attacks, shark$Population / 1000),
              state.specification = ss.level, niter = 500,
              family = "poisson", ping = 250, seed = 8675309)
Rarely used modeling options for bsts models.
BstsOptions(save.state.contributions = TRUE, save.prediction.errors = TRUE, bma.method = c("SSVS", "ODA"), oda.options = list( fallback.probability = 0.0, eigenvalue.fudge.factor = 0.01), timeout.seconds = Inf, save.full.state = FALSE)
save.state.contributions |
Logical. If |
save.prediction.errors |
Logical. If |
bma.method |
If the model contains a regression component, this argument specifies the method to use for Bayesian model averaging. "SSVS" is stochastic search variable selection, which is the classic approach from George and McCulloch (1997). "ODA" is orthogonal data augmentation, from Ghosh and Clyde (2011). It adds a set of latent observations that make the complete-data cross-product matrix diagonal, which simplifies complete data MCMC for model selection. |
oda.options |
If bma.method == "ODA" then these are some options for fine tuning the ODA algorithm.
|
timeout.seconds |
The number of seconds that sampler will be allowed to run. If the timeout is exceeded the returned object will be truncated to the final draw that took place before the timeout occurred, as if that had been the requested number of iterations. |
save.full.state |
Logical. If |
The arguments are checked to make sure they have legal types and values, then a list is returned containing the arguments.
Steven L. Scott [email protected]
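A brief sketch of how these options are passed to bsts through its model.options argument; the specific option values below are arbitrary illustrations, not recommendations.

data(AirPassengers)
y <- log(AirPassengers)

## Skip storage of per-component state contributions and cap the run
## time at five minutes (both values are arbitrary examples).
options <- BstsOptions(save.state.contributions = FALSE,
                       timeout.seconds = 300)
ss <- AddLocalLinearTrend(list(), y)
model <- bsts(y, state.specification = ss, niter = 1000,
              model.options = options)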
Produce a set of line plots showing the cumulative absolute one step ahead prediction errors for different models. This plot not only shows which model is doing the best job predicting the data, it highlights regions of the data where the predictions are particularly good or bad.
CompareBstsModels(model.list, burn = SuggestBurn(.1, model.list[[1]]), filename = "", colors = NULL, lwd = 2, xlab = "Time", main = "", grid = TRUE, cutpoint = NULL)
model.list |
A list of |
burn |
The number of initial MCMC iterations to remove from each model as burn-in. |
filename |
A string. If non-empty string then a pdf of the plot will be saved in the specified file. |
colors |
A vector of colors to use for the different lines in the
plot. If |
lwd |
The width of the lines to be drawn. |
xlab |
Label for the horizontal axis. |
main |
Main title for the plot. |
grid |
Logical. Should gridlines be drawn in the background? |
cutpoint |
Either |
Invisibly returns the matrix of cumulative one-step ahead prediction errors (the lines in the top panel of the plot). Each row in the matrix corresponds to a model in model.list.
Steven L. Scott [email protected]
data(AirPassengers)
y <- log(AirPassengers)
ss <- AddLocalLinearTrend(list(), y)
trend.only <- bsts(y, ss, niter = 250)
ss <- AddSeasonal(ss, y, nseasons = 12)
trend.and.seasonal <- bsts(y, ss, niter = 250)

CompareBstsModels(list(trend = trend.only,
                       "trend and seasonal" = trend.and.seasonal))

CompareBstsModels(list(trend = trend.only,
                       "trend and seasonal" = trend.and.seasonal),
                  cutpoint = 100)
Returns the first and last dates of the influence window for the given holiday, among the given timestamps.
DateRange(holiday, timestamps)
holiday |
An object of class |
timestamps |
A vector of timestamps of class |
Returns a two-column data frame giving the first and last dates
of the influence window for the holiday in the period covered by
timestamps
.
Steven L. Scott [email protected]
holiday <- NamedHoliday("MemorialDay", days.before = 2, days.after = 2) timestamps <- seq.Date(from = as.Date("2001-01-01"), by = "day", length.out = 365 * 10) influence <- DateRange(holiday, timestamps)
Plots for describing time series data.
DayPlot(y, colors = NULL, ylab = NULL, ...) MonthPlot(y, seasonal.identifier = months, colors = NULL, ylab = NULL, ...) YearPlot(y, colors = NULL, ylab = NULL, ylim = NULL, legend = TRUE, ...)
y |
A time series to plot. Must be of class |
seasonal.identifier |
A function that takes a vector of class |
colors |
A vector of colors to use for the lines. |
legend |
Logical. If |
ylab |
Label for the vertical axis. |
ylim |
Limits for the vertical axis. (a 2-vector) |
... |
DayPlot
and MonthPlot
plot the time series one season at
a time, on the same set of axes. The intent is to use DayPlot for
daily data and MonthPlot for monthly or quarterly data.
YearPlot
plots each year of the time series as a separate line
on the same set of axes.
Both sets of plots help visualize seasonal patterns.
Returns invisible(NULL).
monthplot
is a base R function for plotting time
series of type ts
.
## Plot a 'ts' time series. data(AirPassengers) par(mfrow = c(1,2)) MonthPlot(AirPassengers) YearPlot(AirPassengers) ## Plot a 'zoo' time series. data(turkish) par(mfrow = c(1,2)) YearPlot(turkish) DayPlot(turkish)
Diagnostic plots for distributions of residuals.
qqdist(draws, ...) AcfDist(draws, lag.max = NULL, xlab = "Lag", ylab = "Autocorrelation", ...)
draws |
A matrix of Monte Carlo draws of residual errors. Each row is a Monte Carlo draw, and each column is an observation. In the case of AcfDist successive observations are assumed to be sequential in time. |
lag.max |
The number of lags to plot in the autocorrelation
function. See |
xlab |
Label for the horizontal axis. |
ylab |
Label for the vertical axis. |
... |
Extra arguments passed to either |
qqdist
sorts the columns of draws
by their mean, and
plots the resulting set of curves against the quantiles of the
standard normal distribution. A reference line is added, and the mean
of each column of draws is represented by a blue dot. The dots and
the line are the transpose of what you get with qqnorm
and qqline
.
AcfDist
plots the posterior distribution of the autocorrelation
function using a set of side-by-side boxplots.
data(AirPassengers) y <- log(AirPassengers) ss <- AddLocalLinearTrend(list(), y) ss <- AddSeasonal(ss, y, nseasons = 12) model <- bsts(y, ss, niter = 500) r <- residuals(model) par(mfrow = c(1,2)) qqdist(r) ## A bit of departure in the upper tail AcfDist(r)
A dynamic intercept regression is a regression model where the
intercept term is a state space model. This model differs from
bsts
in that there can be multiple observations per time
point.
dirm(formula, state.specification, data, prior = NULL, contrasts = NULL, na.action = na.pass, niter, ping = niter / 10, model.options = DirmModelOptions(), timestamps = NULL, seed = NULL, ...)
formula |
A formula, as you would supply to |
state.specification |
A list with elements created by
The state specification describes the dynamic intercept term in the regression model. |
data |
An optional data frame, list or environment (or object
coercible by |
prior |
A prior distribution for the regression component of the
model, as created by |
contrasts |
An optional list containing the names of contrast
functions to use when converting factors to numeric variables in a
regression formula. This argument works exactly as it does in
|
na.action |
What to do about missing values. The default is to allow missing responses, but no missing predictors. Set this to na.omit or na.exclude if you want to omit missing responses altogether. |
niter |
A positive integer giving the desired number of MCMC draws. |
ping |
A scalar giving the desired frequency of status messages.
If ping > 0 then the program will print a status message to the
screen every |
model.options |
An object created by
|
timestamps |
The timestamp associated with each value of the
response. This is most likely a |
seed |
An integer to use as the random seed for the underlying
C++ code. If |
... |
Extra arguments to be passed to
|
The fitted model is a regression model with an intercept term given by
a structural time series model. This is similar to the model fit by
bsts
, but it allows for multiple observations per time
period.
Currently dirm
only supports Gaussian observation errors, but
look for that to change in future releases.
An object of class bsts
which is a list with the
following components
coefficients |
A |
sigma.obs |
A vector of length |
The returned object will also contain named elements holding the MCMC
draws of model parameters belonging to the state models. The names of
each component are supplied by the entries in
state.specification
. If a model parameter is a scalar, then
the list element is a vector with niter
elements. If the
parameter is a vector then the list element is a matrix with
niter
rows. If the parameter is a matrix then the list element
is a 3-way array with first dimension niter
.
Finally, if a model formula was supplied, then the returned object will contain the information necessary for the predict method to build the design matrix when a new prediction is made.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
George and McCulloch (1997) "Approaches for Bayesian variable selection", Statistica Sinica pp 339–374.
bsts
,
AddLocalLevel
,
AddLocalLinearTrend
,
AddSemilocalLinearTrend
,
AddSeasonal
AddDynamicRegression
SpikeSlabPrior
,
SdPrior
.
SimulateDirmData <- function(observation.sd = 1, trend.sd = .1, time.dimension = 100, nobs.per.period = 3, xdim = 4) { trend <- cumsum(rnorm(time.dimension, 0, trend.sd)) total.sample.size <- nobs.per.period * time.dimension predictors <- matrix(rnorm(total.sample.size * xdim), nrow = total.sample.size) coefficients <- rnorm(xdim) expanded.trend <- rep(trend, each = nobs.per.period) response <- expanded.trend + predictors %*% coefficients + rnorm( total.sample.size, 0, observation.sd) timestamps <- seq.Date(from = as.Date("2008-01-01"), len = time.dimension, by = "day") extended.timestamps <- rep(timestamps, each = nobs.per.period) return(list(response = response, predictors = predictors, timestamps = extended.timestamps, trend = trend, coefficients = coefficients)) } data <- SimulateDirmData(time.dimension = 20) ss <- AddLocalLevel(list(), data$response) # In real life you'd want more than 50 MCMC iterations. model <- dirm(data$response ~ data$predictors, ss, niter = 50, timestamps = data$timestamps)
Specify modeling options for a dynamic intercept regression model.
DirmModelOptions(timeout.seconds = Inf, high.dimensional.threshold.factor = 1.0)
timeout.seconds |
The overall time budget for model fitting. If the MCMC algorithm takes longer than this number, the current iteration will complete, and then the fitting algorithm will return with however many MCMC iterations were managed during the allotted time. |
high.dimensional.threshold.factor |
When doing Kalman filter updates for the model, Sherman-Morrison-Woodbury style updates are applied for high dimensional data, while direct linear algebra is used for low dimensional data. The definition of "high dimensional" is relative to the dimension of the state. An observation is considered high dimensional if its dimension exceeds the state dimension times this factor. |
An object of class DirmModelOptions
, which is simply a list
containing values of the function arguments.
The value of using this function instead of making a list "by hand" is that argument types are properly checked, and list names are sure to be correct.
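A brief sketch of constructing an options object and handing it to dirm through its model.options argument (the specific numbers are illustrative assumptions):

## Allow up to five minutes of MCMC, and only use Sherman-Morrison-Woodbury
## style updates when an observation's dimension exceeds twice the state dimension.
dirm.options <- DirmModelOptions(timeout.seconds = 300,
                                 high.dimensional.threshold.factor = 2.0)
## e.g. dirm(y ~ x, state.specification = ss, niter = 500,
##           model.options = dirm.options)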
Estimate the time scale used in time series data.
EstimateTimeScale(dates)
dates |
A sorted vector of class |
A character string. Either "daily", "weekly", "yearly",
"monthly", "quarterly", or "other". The value is determined based on
counting the number of days between successive observations in dates
.
Steven L. Scott [email protected]
weekly.data <- as.Date(c("2011-10-01", "2011-10-08", "2011-10-15", "2011-10-22", "2011-10-29", "2011-11-05")) EstimateTimeScale(weekly.data) # "weekly" almost.weekly.data <- as.Date(c("2011-10-01", "2011-10-08", "2011-10-15", "2011-10-22", "2011-10-29", "2011-11-06")) # last day is one later EstimateTimeScale(almost.weekly.data) # "other"
Pads a vector of dates to a specified length.
ExtendTime(dates, number.of.periods, dt = NULL)
dates |
An ordered vector of class |
number.of.periods |
The desired length of the output. |
dt |
A character string describing the frequency of the dates in
|
If number.of.periods
is longer than length(dates)
, then
dates
will be padded to the desired length. Extra dates are
added at time intervals matching the average interval in
dates
. Thus they may not align exactly with calendar boundaries (such as month ends) the way the original dates do.
Steven L. Scott [email protected]
origin.month <- as.Date("2011-09-01") week.ending <- as.Date(c("2011-10-01", ## 1 "2011-10-08", ## 2 "2011-12-03", ## 3 "2011-12-31")) ## 4 MatchWeekToMonth(week.ending, origin.month) == 1:4
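A small hypothetical sketch of ExtendTime itself, letting the function infer the time scale from the supplied dates (dt = NULL):

dates <- seq.Date(from = as.Date("2011-01-01"), by = "month", length.out = 3)
## Pad the three monthly dates out to six periods.
extended <- ExtendTime(dates, number.of.periods = 6)
length(extended)  # expected to be 6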
Tools for checking if a series of timestamps is 'regular' meaning that
it has no duplicates, and no gaps. Checking for regularity can be
tricky. For example, if you have monthly observations with
Date
or POSIXt
timestamps then gaps
between timestamps can be 28, 29, 30, or 31 days, but the series is
still "regular".
NoDuplicates(timestamps) NoGaps(timestamps) IsRegular(timestamps) HasDuplicateTimestamps(bsts.object)
timestamps |
A set of (possibly irregular or non-unique)
timestamps. This could be a set of integers (like 1, 2, 3, ...), a
set of numeric values (like 1945, 1945.083, 1945.167, ...) indicating years
and fractions of years, a |
bsts.object |
A bsts model object. |
All four functions return scalar logical values. NoDuplicates
returns TRUE
if all elements of timestamps
are unique.
NoGaps
examines the smallest nonzero gap between time points.
As long as no gaps between time points are more than twice as wide as
the smallest gap, it returns TRUE
, indicating that there are no
missing timestamps. Otherwise it returns FALSE
.
IsRegular
returns TRUE
if NoDuplicates
and
NoGaps
both return TRUE
.
HasDuplicateTimestamps
returns FALSE
if the data used to
fit bsts.object either has NULL timestamps or the timestamps
contain no duplicate values.
Steven L. Scott [email protected]
first <- as.POSIXct("2015-04-19 08:00:04") monthly <- seq(from = first, length.out = 24, by = "month") IsRegular(monthly) ## TRUE skip.one <- monthly[-8] IsRegular(skip.one) ## FALSE has.duplicates <- monthly has.duplicates[1] <- has.duplicates[2] IsRegular(has.duplicates) ## FALSE
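HasDuplicateTimestamps takes a fitted model rather than raw timestamps. A short sketch (plain numeric data carries no timestamps, so the result should be FALSE):

y <- rnorm(24)
ss <- AddLocalLevel(list(), y)
model <- bsts(y, state.specification = ss, niter = 100)
HasDuplicateTimestamps(model)  # FALSE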
Annual gross domestic product for 57 countries, as produced by the OECD.
Fields:
LOCATION: Three letter country code.
MEASURE: MLN_USD signifies a total GDP number in millions of US dollars. USD_CAP is per capita GDP in US dollars.
TIME: The year of the measurement.
Value: The measured value.
Flag.Codes: P for provisional data, B for a break in the series, and E for an estimated value.
data(gdp)
data frame
OECD website: See https://data.oecd.org/gdp/gross-domestic-product-gdp.htm
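A hypothetical sketch of working with the fields described above; the three-letter code "USA" is an assumption based on the OECD convention:

data(gdp)
usa.per.capita <- subset(gdp, LOCATION == "USA" & MEASURE == "USD_CAP")
head(usa.per.capita[, c("TIME", "Value")])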
Create a geometric sequence.
GeometricSequence(length, initial.value = 1, discount.factor = .5)
length |
A positive integer giving the length of the desired sequence. |
initial.value |
The first term in the sequence. Cannot be zero. |
discount.factor |
The ratio between a sequence term and the preceding term. Cannot be zero. |
A numeric vector containing the desired sequence.
Steven L. Scott [email protected]
GeometricSequence(4, .8, .6) # [1] 0.8000 0.4800 0.2880 0.1728 GeometricSequence(5, 2, 3) # [1] 2 6 18 54 162 ## Not run: GeometricSequence(0, -1, -2) # Error: length > 0 is not TRUE ## End(Not run)
Returns the fraction of days in a week that occur in the coarse time interval (month or quarter) containing the start of the week.
GetFractionOfDaysInInitialMonth(week.ending) GetFractionOfDaysInInitialQuarter(week.ending)
week.ending |
A vector of class |
Returns a numeric vector of the same length as week.ending
.
Each entry gives the fraction of days in the week that occur in the
coarse time interval (month or quarter) containing the start of the
week (i.e. the date 6 days before).
Steven L. Scott [email protected]
dates <- as.Date(c("2003-03-31", "2003-04-01", "2003-04-02", "2003-04-03", "2003-04-04", "2003-04-05", "2003-04-06", "2003-04-07")) fraction <- GetFractionOfDaysInInitialMonth(dates) fraction == c(1, 6/7, 5/7, 4/7, 3/7, 2/7, 1/7, 1)
Daily closing price of Google stock.
data(goog)
xts time series
The Internets
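A short sketch plotting the series (goog is an xts object, so the xts plot method is used):

data(goog)
plot(goog, main = "GOOG daily closing price")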
Given a state space model on a fine scale, the Harvey cumulator aggregates the model to a coarser scale (e.g. from days to weeks, or weeks to months).
HarveyCumulator(fine.series, contains.end, membership.fraction)
fine.series |
The fine-scale time series to be aggregated. |
contains.end |
A logical vector, with length matching
|
membership.fraction |
The fraction of each fine-scale time
observation belonging to the coarse scale time observation at the
beginning of the time interval. For example, if week i started in
March and ended in April, |
Returns a vector containing the coarse-scale partial aggregates
of fine.series
.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
data(goog) days <- factor(weekdays(index(goog)), levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"), ordered = TRUE) ## Because of holidays, etc the days do not always go in sequence. ## (Sorry, Rebecca Black! https://www.youtube.com/watch?v=kfVsfOSbJY0) ## diff.days[i] is the number of days between days[i-1] and days[i]. ## We know that days[i] is the end of a week if diff.days[i] < 0. diff.days <- tail(as.numeric(days), -1) - head(as.numeric(days), -1) contains.end <- c(FALSE, diff.days < 0) goog.weekly <- HarveyCumulator(goog, contains.end, 1)
Specify holidays for use with holiday state models.
FixedDateHoliday(holiday.name, month = base::month.name, day, days.before = 1, days.after = 1) NthWeekdayInMonthHoliday(holiday.name, month = base::month.name, day.of.week = weekday.names, week.number = 1, days.before = 1, days.after = 1) LastWeekdayInMonthHoliday(holiday.name, month = base::month.name, day.of.week = weekday.names, days.before = 1, days.after = 1) NamedHoliday(holiday.name = named.holidays, days.before = 1, days.after = 1) DateRangeHoliday(holiday.name, start.date, end.date)
holiday.name |
A string that can be used to label the holiday in output. |
month |
A string naming the month in which the holiday occurs. Unambiguous partial matches are acceptable. Capitalize the first letter. |
day |
An integer specifying the day of the month on which the
|
day.of.week |
A string giving the day of the week on which the holiday occurs. |
week.number |
An integer specifying the week of the month on
which the |
days.before |
An integer giving the number of days of influence that the holiday exerts prior to the actual holiday. |
days.after |
An integer giving the number of days of influence that holiday exerts after the actual holiday. |
named.holidays |
A character vector containing one or more recognized holiday names. |
start.date |
A vector of starting dates for the holiday. Each instance of the holiday in the training data or the forecast period must be represented by an element in this vector. Thus if this is an annual holiday, there are 10 years of training data, and a 1-year forecast is needed, then this will be a vector of length 11. |
end.date |
A vector of ending dates for the holiday. Each date
must occur on or after the corresponding element of
|
Each function returns a list containing the information from the
function arguments, formatted as expected by the underlying C++ code.
State models that focus on holidays, such as
AddRandomWalkHoliday
,
AddRegressionHoliday
, and
AddHierarchicalRegressionHoliday
, will expect one or
more holiday objects as arguments.
FixedDateHoliday
describes a holiday that occurs on the
same date each year, like US Independence Day (July 4).
NthWeekdayInMonthHoliday
describes a holiday that
occurs on a particular weekday of a particular week of a particular
month. For example, US Labor Day is the first Monday in
September.
LastWeekdayInMonthHoliday
describes a holiday that
occurs on the last instance of a particular weekday in a
particular month. For example, US Memorial Day is the last Monday
in May.
DateRangeHoliday
describes an irregular holiday that
might not follow a particular pattern. You can handle this type
of holiday by manually specifying a range of dates for each
instance of the holiday in your data set. NOTE: If you plan on
using the model to forecast, be sure to include date ranges in the
forecast period as well as the period covered by the training
data.
NamedHoliday
is a convenience class for describing
several important holidays in the US.
Steven L. Scott [email protected]
AddRandomWalkHoliday
,
AddRegressionHoliday
,
AddHierarchicalRegressionHoliday
july4 <- FixedDateHoliday("July4", "July", 4) memorial.day <- LastWeekdayInMonthHoliday("MemorialDay", "May", "Monday") labor.day <- NthWeekdayInMonthHoliday("LaborDay", "September", "Monday", 1) another.way.to.get.memorial.day <- NamedHoliday("MemorialDay") easter <- NamedHoliday("Easter") winter.olympics <- DateRangeHoliday("WinterOlympicsSince2000", start = as.Date(c("2002-02-08", "2006-02-10", "2010-02-12", "2014-02-07", "2018-02-07")), end = as.Date(c("2002-02-24", "2006-02-26", "2010-02-28", "2014-02-23", "2018-02-25")))
A weekly time series of US initial claims for unemployment. The first column contains the initial claims numbers from FRED. The others contain a measure of the relative popularity of various search queries identified by Google Correlate.
data(iclaims)
zoo time series
FRED. http://research.stlouisfed.org/fred2/series/ICNSA,
Google correlate. http://www.google.com/trends/correlate
data(iclaims) plot(initial.claims)
Finds the last day in the month containing a specified date.
LastDayInMonth(dates)
dates |
A vector of class |
A vector of class Date
where each entry is the last day
in the month containing the corresponding entry in dates
.
Steven L. Scott [email protected]
inputs <- as.Date(c("2007-01-01", "2007-01-31", "2008-02-01", "2008-02-29", "2008-03-14", "2008-12-01", "2008-12-31")) expected.outputs <- as.Date(c("2007-01-31", "2007-01-31", "2008-02-29", "2008-02-29", "2008-03-31", "2008-12-31", "2008-12-31")) LastDayInMonth(inputs) == expected.outputs
S3 generic method for MATCH function supplied in the zoo package.
## S3 method for class 'NumericTimestamps' MATCH(x, table, nomatch = NA, ...)
x |
A numeric set of timestamps. |
table |
A set of regular numeric timestamps to match against. |
nomatch |
The value to be returned in the case when no match is found. Note that it is coerced to integer. |
... |
Additional arguments passed to |
Numeric timestamps match if they agree to 8 significant digits.
Returns the index of the entry in table
matched by each
argument in x
. If an entry has no match then nomatch
is
returned at that position.
Returns the index of a month, in a sequence of months, that contains a given week.
MatchWeekToMonth(week.ending, origin.month)
week.ending |
A vector of class |
origin.month |
A |
The index of the month matching the month containing the first
day in week.ending
. The origin is month 1. It is the caller's
responsibility to ensure that these indices correspond to legal values
in a particular vector of months.
Steven L. Scott [email protected]
origin.month <- as.Date("2011-09-01") week.ending <- as.Date(c("2011-10-01", ## 1 "2011-10-08", ## 2 "2011-12-03", ## 3 "2011-12-31")) ## 4 MatchWeekToMonth(week.ending, origin.month) == 1:4
The maximum width of a holiday's influence window
## Default S3 method: MaxWindowWidth(holiday, ...) ## S3 method for class 'DateRangeHoliday' MaxWindowWidth(holiday, ...)
holiday |
An object of class |
... |
Other arguments (not used). |
Returns the number of days in a holiday's influence window.
Steven L. Scott [email protected]
Holiday
.
AddRegressionHoliday
.
AddRandomWalkHoliday
.
AddHierarchicalRegressionHoliday
.
easter <- NamedHoliday("Easter", days.before = 2, days.after = 1) if (MaxWindowWidth(easter) == 4) { print("That's the right answer!\n") } ## This holiday lasts two days longer in 2005 than in 2004. may18 <- DateRangeHoliday("May18", start = as.Date(c("2004-05-17", "2005-05-16")), end = as.Date(c("2004-05-19", "2005-05-20"))) if (MaxWindowWidth(may18) == 5) { print("Right again!\n") }
Fit a multivariate Bayesian structural time series model, also known as a "dynamic factor model."
** NOTE ** This code is experimental. Please feel free to experiment with it and report any bugs to the maintainer. Expect it to improve substantially in the next release.
mbsts(formula, shared.state.specification, series.state.specification = NULL, data = NULL, timestamps = NULL, series.id = NULL, prior = NULL, # TODO opts = NULL, contrasts = NULL, na.action = na.pass, niter, ping = niter / 10, data.format = c("long", "wide"), seed = NULL, ...)
formula |
A formula describing the regression portion of the relationship between y and X. If no regressors are desired then the formula can be replaced by a numeric matrix giving the multivariate time series to be modeled. |
shared.state.specification |
A list with elements created by
This list defines the components of state which are shared across all time series. These are the "factors" in the dynamic factor model. |
series.state.specification |
This argument specifies state
components needed by a particular series. Not all series need have
the same state components (e.g. some series may require a seasonal
component, while others do not). It can be a list of elements created by the same state specification functions used to build shared.state.specification. In its most general form, this argument can be a list of lists, some of which can be NULL, but with non-NULL lists specifying state components for individual series, as above. |
data |
An optional data frame, list or environment (or object
coercible by |
timestamps |
A vector of timestamps indicating the time of each
observation. If TODO: TEST THIS under wide and long formats in regression and non-regression settings. |
series.id |
A factor (or object coercible to factor) indicating the series to which each observation in "long" format belongs. This argument is ignored for data in "wide" format. |
prior |
A list of |
opts |
A list containing model options. This is currently only
used for debugging, so leave this as |
contrasts |
An optional list containing the names of contrast
functions to use when converting factors numeric variables in a
regression formula. This argument works exactly as it does in
|
na.action |
What to do about missing values. The default is to allow missing responses, but no missing predictors. Set this to na.omit or na.exclude if you want to omit missing responses altogether. |
niter |
A positive integer giving the desired number of MCMC draws. |
ping |
A scalar giving the desired frequency of status messages.
If ping > 0 then the program will print a status message to the
screen every |
data.format |
Whether the data are stored in wide (each row is a
time point, and columns are values from different series) or long
(each row is the value of a particular series at a particular point
in time) format. For |
seed |
An integer to use as the random seed for the underlying
C++ code. If |
... |
Extra arguments to be passed to
|
An object of class mbsts
which is a list with the
following components
coefficients |
A |
sigma.obs |
A vector of length |
The returned object will also contain named elements holding the MCMC
draws of model parameters belonging to the state models. The names of
each component are supplied by the entries in
state.specification
. If a model parameter is a scalar, then
the list element is a vector with niter
elements. If the
parameter is a vector then the list element is a matrix with
niter
rows. If the parameter is a matrix then the list element
is a 3-way array with first dimension niter
.
Finally, if a model formula was supplied, then the returned object will contain the information necessary for the predict method to build the design matrix when a new prediction is made.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
George and McCulloch (1997) "Approaches for Bayesian variable selection", Statistica Sinica pp 339–374.
bsts
,
AddLocalLevel
,
AddLocalLinearTrend
,
AddSemilocalLinearTrend
,
AddSeasonal
AddDynamicRegression
SpikeSlabPrior
,
SdPrior
.
## Not run: # This example takes 12s on Windows, which is longer than CRAN's 10s # limit. Marking code as 'dontrun' to prevent CRAN auto checks from # timing out. seed <- 8675309 set.seed(seed) ntimes <- 250 nseries <- 20 nfactors <- 6 residual.sd <- 1.2 state.innovation.sd <- .75 ##--------------------------------------------------------------------------- ## simulate latent state for fake data. ##--------------------------------------------------------------------------- state <- matrix(rnorm(ntimes * nfactors, 0, state.innovation.sd), nrow = ntimes) for (i in 1:ncol(state)) { state[, i] <- cumsum(state[, i]) } ##--------------------------------------------------------------------------- ## Simulate "observed" data from state. ##--------------------------------------------------------------------------- observation.coefficients <- matrix(rnorm(nseries * nfactors), nrow = nseries) diag(observation.coefficients) <- 1.0 observation.coefficients[upper.tri(observation.coefficients)] <- 0 errors <- matrix(rnorm(nseries * ntimes, 0, residual.sd), ncol = ntimes) y <- t(observation.coefficients %*% t(state) + errors) ##--------------------------------------------------------------------------- ## Plot the data. ##--------------------------------------------------------------------------- par(mfrow=c(1,2)) plot.ts(y, plot.type="single", col = rainbow(nseries), main = "observed data") plot.ts(state, plot.type = "single", col = 1:nfactors, main = "latent state") ##--------------------------------------------------------------------------- ## Fit the model ##--------------------------------------------------------------------------- ss <- AddSharedLocalLevel(list(), y, nfactors = nfactors) opts <- list("fixed.state" = t(state), fixed.residual.sd = rep(residual.sd, nseries), fixed.regression.coefficients = matrix(rep(0, nseries), ncol = 1)) model <- mbsts(y, shared.state.specification = ss, niter = 100, data.format = "wide", seed = seed) ##--------------------------------------------------------------------------- ## Plot the state ##--------------------------------------------------------------------------- par(mfrow=c(1, nfactors)) ylim <- range(model$shared.state, state) for (j in 1:nfactors) { PlotDynamicDistribution(model$shared.state[, j, ], ylim=ylim) lines(state[, j], col = "blue") } ##--------------------------------------------------------------------------- ## Plot the factor loadings. ##--------------------------------------------------------------------------- opar <- par(mfrow=c(nfactors,1), mar=c(0, 4, 0, 4), omi=rep(.25, 4)) burn <- 10 for(j in 1:nfactors) { BoxplotTrue(model$shared.local.level.coefficients[-(1:burn), j, ], t(observation.coefficients)[, j], axes=F, truth.color="blue") abline(h=0, lty=3) box() axis(2) } axis(1) par(opar) ##--------------------------------------------------------------------------- ## Plot the predicted values of the series. ##--------------------------------------------------------------------------- index <- 1:12 nr <- floor(sqrt(length(index))) nc <- ceiling(length(index) / nr) opar <- par(mfrow = c(nr, nc), mar = c(2, 4, 1, 2)) for (i in index) { PlotDynamicDistribution( model$shared.state.contributions[, 1, i, ] + model$regression.coefficients[, i, 1] , ylim=range(y)) points(y[, i], col="blue", pch = ".", cex = .2) } par(opar) # next line closes 'dontrun' ## End(Not run) # next line closes 'examples'
Fit a structural time series model to mixed frequency data.
bsts.mixed(target.series, predictors, which.coarse.interval, membership.fraction, contains.end, state.specification, regression.prior, niter, ping = niter / 10, seed = NULL, truth = NULL, ...)
target.series |
A vector object of class |
predictors |
A matrix of class |
which.coarse.interval |
A numeric vector of length
|
membership.fraction |
A numeric vector of length
|
contains.end |
A logical vector of length |
state.specification |
A state specification like that required
by |
regression.prior |
A prior distribution created by
|
niter |
The desired number of MCMC iterations. |
ping |
An integer indicating the frequency with which progress
reports get printed. E.g. setting |
seed |
An integer to use as the random seed for the underlying
C++ code. If |
truth |
For debugging purposes only. A list containing one or more of the following elements. If any are present then corresponding values will be held fixed in the MCMC algorithm.
|
... |
Extra arguments passed to SpikeSlabPrior |
An object of class bsts.mixed
, which is a list with the
following elements. Many of these are arrays, in which case the first
index of the array corresponds to the MCMC iteration number.
coefficients |
A matrix containing the MCMC draws of the regression coefficients. Rows correspond to MCMC draws, and columns correspond to variables. |
sigma.obs |
The standard deviation of the weekly latent observations. |
state.contributions |
A three-dimensional array containing the MCMC draws of each state model's contributions to the state of the weekly model. The three dimensions are MCMC iteration, state model, and week number. |
weekly |
A matrix of MCMC draws of the weekly latent observations. Rows are MCMC iterations, and columns are weekly time points. |
cumulator |
A matrix of MCMC draws of the cumulator variable. |
The returned object also contains MCMC draws for the parameters of the
state models supplied as part of state.specification
, relevant
information passed to the function call, and other supplemental
information.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
bsts
,
AddLocalLevel
,
AddLocalLinearTrend
,
AddSemilocalLinearTrend
,
SpikeSlabPrior
,
SdPrior
.
## Not run: data <- SimulateFakeMixedFrequencyData(nweeks = 104, xdim = 20) ## Setting an upper limit on the standard deviations can help keep the ## MCMC from flying off to infinity. sd.limit <- sd(data$coarse.target) state.specification <- AddLocalLinearTrend(list(), data$coarse.target, level.sigma.prior = SdPrior(1.0, 5, upper.limit = sd.limit), slope.sigma.prior = SdPrior(.5, 5, upper.limit = sd.limit)) weeks <- index(data$predictor) months <- index(data$coarse.target) which.month <- MatchWeekToMonth(weeks, months[1]) membership.fraction <- GetFractionOfDaysInInitialMonth(weeks) contains.end <- WeekEndsMonth(weeks) model <- bsts.mixed(target.series = data$coarse.target, predictors = data$predictors, membership.fraction = membership.fraction, contains.end = contains.end, which.coarse = which.month, state.specification = state.specification, niter = 500, expected.r2 = .999, prior.df = 1) plot(model, "state") plot(model, "components") ## End(Not run)
The (integer) number of months between dates.
MonthDistance(dates, origin)
dates |
A vector of class |
origin |
A scalar of class |
Returns a numeric vector giving the integer number of months
that have elapsed between origin
and each element in
dates
. The daily component of each date is ignored, so two
dates that are in the same month will have the same measured
distance. Distances are signed, so months that occur before
origin
will have negative values.
Steven L. Scott [email protected]
dates <- as.Date(c("2008-04-17", "2008-05-01", "2008-05-31", "2008-06-01")) origin <- as.Date("2008-05-15") MonthDistance(dates, origin) == c(-1, 0, 0, 1)
A character vector listing the names of pre-specified holidays.
named.holidays
"NewYearsDay" "SuperBowlSunday" "MartinLutherKingDay" "PresidentsDay" "ValentinesDay" "SaintPatricksDay" "USDaylightSavingsTimeBegins" "USDaylightSavingsTimeEnds" "EasterSunday" "USMothersDay" "IndependenceDay" "LaborDay" "ColumbusDay" "Halloween" "Thanksgiving" "MemorialDay" "VeteransDay" "Christmas"
The first column, HSN1FNSA, is a time series of new home sales in the US, obtained from the FRED online database. The series has been manually deseasonalized. The remaining columns contain search terms from Google trends (obtained from http://trends.google.com/correlate). These show the relative popularity of each search term among all search terms typed into Google. All series in this data set have been standardized by subtracting off their mean and dividing by their standard deviation.
data(new.home.sales)
zoo time series
FRED and trends.google.com
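A hypothetical sketch, assuming the loaded zoo object is named new.home.sales and retains the column name HSN1FNSA described above:

data(new.home.sales)
plot(new.home.sales[, "HSN1FNSA"], ylab = "New home sales (standardized)")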
Computes the one-step-ahead prediction errors for a bsts
model.
bsts.prediction.errors(bsts.object, cutpoints = NULL, burn = SuggestBurn(.1, bsts.object), standardize = FALSE)
bsts.object |
An object of class |
cutpoints |
An increasing sequence of integers between 1 and the
number of time points in the training data for |
burn |
An integer giving the number of MCMC iterations to discard
as burn-in. If |
standardize |
Logical. If |
Returns the posterior distribution of the one-step-ahead prediction errors from the bsts.object. The errors are computed using the Kalman filter, and are of two types.
Purely in-sample errors are computed as a by-product of the Kalman
filter as a result of fitting the model. These are stored in the
bsts.object assuming the save.prediction.errors
option is TRUE,
which is the default (See BstsOptions
). The in-sample
errors are 'in-sample' in the sense that the parameter values used to
run the Kalman filter are drawn from their posterior distribution given
complete data. Conditional on the parameters in that MCMC iteration,
each 'error' is the difference between the observed y[t] and its
expectation given data to t-1.
Purely out-of-sample errors can be computed by specifying the 'cutpoints' argument. If cutpoints are supplied then a separate MCMC is run using just data up to the cutpoint. The Kalman filter is then run on the remaining data, again finding the difference between y[t] and its expectation given data to t-1, but conditional on parameters estimated using data up to the cutpoint.
A matrix of draws of the one-step-ahead prediction errors. Rows of the matrix correspond to MCMC draws. Columns correspond to time.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
bsts
,
AddLocalLevel
,
AddLocalLinearTrend
,
AddSemilocalLinearTrend
,
SpikeSlabPrior
,
SdPrior
.
data(AirPassengers) y <- log(AirPassengers) ss <- AddLocalLinearTrend(list(), y) ss <- AddSeasonal(ss, y, nseasons = 12) ## Not run: model <- bsts(y, state.specification = ss, niter = 500) ## End(Not run) errors <- bsts.prediction.errors(model, burn = 100) PlotDynamicDistribution(errors$in.sample) ## Compute out of sample prediction errors beyond times 80 and 120. errors <- bsts.prediction.errors(model, cutpoints = c(80, 120)) standardized.errors <- bsts.prediction.errors( model, cutpoints = c(80, 120), standardize = TRUE) plot(model, "prediction.errors", cutpoints = c(80, 120)) str(errors) ## three matrices, with 400 ( = 500 - 100) rows ## and length(y) columns
Functions to plot the results of a model fit using
bsts
.
## S3 method for class 'bsts' plot(x, y = c("state", "components", "residuals", "coefficients", "prediction.errors", "forecast.distribution", "predictors", "size", "dynamic", "seasonal", "monthly", "help"), ...) PlotBstsCoefficients(bsts.object, burn = SuggestBurn(.1, bsts.object), inclusion.threshold = 0, number.of.variables = NULL, ...) PlotBstsComponents(bsts.object, burn = SuggestBurn(.1, bsts.object), time, same.scale = TRUE, layout = c("square", "horizontal", "vertical"), style = c("dynamic", "boxplot"), ylim = NULL, components = 1:length(bsts.object$state.specification), ...) PlotDynamicRegression(bsts.object, burn = SuggestBurn(.1, bsts.object), time = NULL, same.scale = FALSE, style = c("dynamic", "boxplot"), layout = c("square", "horizontal", "vertical"), ylim = NULL, zero.width = 2, zero.color = "green", ...) PlotBstsState(bsts.object, burn = SuggestBurn(.1, bsts.object), time, show.actuals = TRUE, style = c("dynamic", "boxplot"), scale = c("linear", "mean"), ylim = NULL, ...) PlotBstsResiduals(bsts.object, burn = SuggestBurn(.1, bsts.object), time, style = c("dynamic", "boxplot"), means = TRUE, ...) PlotBstsPredictionErrors(bsts.object, cutpoints = NULL, burn = SuggestBurn(.1, bsts.object), style = c("dynamic", "boxplot"), xlab = "Time", ylab = "", main = "", ...) PlotBstsForecastDistribution(bsts.object, cutpoints = NULL, burn = SuggestBurn(.1, bsts.object), style = c("dynamic", "boxplot"), xlab = "Time", ylab = "", main = "", show.actuals = TRUE, col.actuals = "blue", ...) PlotBstsSize(bsts.object, burn = SuggestBurn(.1, bsts.object), style = c("histogram", "ts"), ...) PlotSeasonalEffect(bsts.object, nseasons = 7, season.duration = 1, same.scale = TRUE, ylim = NULL, get.season.name = NULL, burn = SuggestBurn(.1, bsts.object), ...) PlotMonthlyAnnualCycle(bsts.object, ylim = NULL, same.scale = TRUE, burn = SuggestBurn(.1, bsts.object), ...)
x |
An object of class |
bsts.object |
An object of class |
y |
A character string indicating the aspect of the model that should be plotted. |
burn |
The number of MCMC iterations to discard as burn-in. |
col.actuals |
The color to use for the actual data when comparing actuals vs forecasts. |
components |
A numeric vector indicating which components to plot. Component indices correspond to elements of the state specification that was used to build the bsts model being plotted. |
cutpoints |
A numeric vector of integers, or |
get.season.name |
A function that can be used to infer the title
of each seasonal plot. It should take a single |
inclusion.threshold |
An inclusion probability that individual
coefficients must exceed in order to be displayed when |
layout |
For controlling the layout of functions that generate multiple plots. |
main |
Main title for the plot. |
means |
Logical. If TRUE then the mean of each residual is plotted as a blue dot. If FALSE only the distribution of the residuals is plotted. |
nseasons |
If there is only one seasonal component in the model,
this argument is ignored. If there are multiple seasonal
components then |
number.of.variables |
If non- |
same.scale |
Logical. If |
scale |
The scale on which to plot the state. If the error family is "logit" or "poisson" then the state can either be plotted on the scale of the linear predictor (e.g. trend + seasonal + regression) or the linear predictor can be passed through the link function so as to plot the distribution of the conditional mean. |
season.duration |
If there is only one seasonal component in the
model, this argument is ignored. If there are multiple seasonal
components then |
show.actuals |
Logical. If |
style |
The desired plot style. Partial matching is allowed, so "dyn" would match "dynamic", for example. |
time |
An optional vector of values to plot against. If missing, the default is to diagnose the time scale of the original time series. |
xlab |
Label for the horizontal axis. |
ylab |
Label for the vertical axis. |
ylim |
Limits for the vertical axis. If |
zero.width |
A numerical value for the width of the reference
line at zero. If |
zero.color |
A color for the width of the reference line at zero.
If |
... |
Additional arguments to be passed to
|
PlotBstsState
, PlotBstsComponents
, and
PlotBstsResiduals
all produce dynamic distribution
plots. PlotBstsState
plots the aggregate state
contribution (including regression effects) to the mean, while
PlotBstsComponents
plots the contribution of each state
component. PlotBstsResiduals
plots the posterior
distribution of the residuals given complete data (i.e. looking
forward and backward in time). PlotBstsPredictionErrors
plots filtering errors (i.e. the one-step-ahead prediction errors
given data up to the previous time point).
PlotBstsForecastDistribution
plots the one-step-ahead
forecasts instead of the prediction errors.
PlotBstsCoefficients
creates a significance plot for
the predictors used in the state space regression model. It is
obviously not useful for models with no regressors.
PlotBstsSize
plots the distribution of the number of
predictors included in the model.
PlotSeasonalEffect
generates an array of plots showing
how the distribution of the seasonal effect changes, for each season,
for models that include a seasonal state component.
PlotMonthlyAnnualCycle
produces an array of plots much
like PlotSeasonalEffect
, for models that include a
MonthlyAnnualCycle
state component.
These functions are called for their side effect, which is to produce a plot on the current graphics device.
PlotBstsState
invisibly returns the state object being plotted.
bsts
PlotDynamicDistribution
plot.lm.spike
data(AirPassengers) y <- log(AirPassengers) ss <- AddLocalLinearTrend(list(), y) ss <- AddSeasonal(ss, y, nseasons = 12) model <- bsts(y, state.specification = ss, niter = 500) plot(model, burn = 100) plot(model, "residuals", burn = 100) plot(model, "components", burn = 100) plot(model, "forecast.distribution", burn = 100)
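The example above exercises only a few of the plot types named in this section. The following is a minimal sketch, not package-supplied code; it reuses the model object fitted in the example (which includes a seasonal component) and assumes the plot-type names below dispatch to the corresponding PlotBsts* functions.

## Minimal sketch; 'model' is the object fitted in the example above.
plot(model, "state", burn = 100)              # aggregate state contribution (PlotBstsState)
plot(model, "seasonal", burn = 100)           # per-season effect distributions (PlotSeasonalEffect)
plot(model, "prediction.errors", burn = 100)  # one-step-ahead prediction errors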
Functions for plotting the output of a mixed frequency time series regression.
## S3 method for class 'bsts.mixed' plot(x, y = c("state", "components", "coefficients", "predictors", "size"), ...) PlotBstsMixedState(bsts.mixed.object, burn = SuggestBurn(.1, bsts.mixed.object), time = NULL, fine.scale = FALSE, style = c("dynamic", "boxplot"), trim.left = NULL, trim.right = NULL, ...) PlotBstsMixedComponents(bsts.mixed.object, burn = SuggestBurn(.1, bsts.mixed.object), time = NULL, same.scale = TRUE, fine.scale = FALSE, style = c("dynamic", "boxplot"), layout = c("square", "horizontal", "vertical"), ylim = NULL, trim.left = NULL, trim.right = NULL, ...)
x |
An object of class |
bsts.mixed.object |
An object of class |
y |
A character string indicating the aspect of the model that should be plotted. |
burn |
The number of MCMC iterations to discard as burn-in. |
time |
An optional vector of values to plot against. If missing, the default is to obtain the time scale from the original time series. |
fine.scale |
Logical. If |
same.scale |
Logical. If |
style |
Character. If "dynamic" then a dynamic distribution plot will be shown. If "boxplot" then boxplots will be shown. |
layout |
A character string indicating whether the plots showing components of state should be laid out in a square, horizontally, or vertically. |
trim.left |
A logical indicating whether the first (presumably partial) observation in the aggregated state time series should be removed. |
trim.right |
A logical indicating whether the last (presumably partial) observation in the aggregated state time series should be removed. |
ylim |
Limits for the vertical axis. Optional. |
... |
Additional arguments to be passed to
|
PlotBstsMixedState
plots the aggregate state
contribution (including regression effects) to the mean, while
PlotBstsMixedComponents
plots the contribution of each state
component separately. PlotBstsCoefficients
creates a
significance plot for the predictors used in the state space
regression model.
These functions are called for their side effect, which is to produce a plot on the current graphics device.
bsts.mixed
PlotDynamicDistribution
plot.lm.spike
PlotBstsSize
## Not run: ## This example is flaky and needs to be fixed data <- SimulateFakeMixedFrequencyData(nweeks = 104, xdim = 20) state.specification <- AddLocalLinearTrend(list(), data$coarse.target) weeks <- index(data$predictor) months <- index(data$coarse.target) which.month <- MatchWeekToMonth(weeks, months[1]) membership.fraction <- GetFractionOfDaysInInitialMonth(weeks) contains.end <- WeekEndsMonth(weeks) model <- bsts.mixed(target.series = data$coarse.target, predictors = data$predictors, membership.fraction = membership.fraction, contains.end = contains.end, which.coarse = which.month, state.specification = state.specification, niter = 500) plot(model, "state") plot(model, "components") ## End(Not run)
Plot the posterior predictive distribution from a
bsts
prediction object.
## S3 method for class 'bsts.prediction' plot(x, y = NULL, burn = 0, plot.original = TRUE, median.color = "blue", median.type = 1, median.width = 3, interval.quantiles = c(.025, .975), interval.color = "green", interval.type = 2, interval.width = 2, style = c("dynamic", "boxplot"), ylim = NULL, ...)
x |
An object of class |
y |
A dummy argument necessary to match the signature of the
|
plot.original |
Logical or numeric. If |
burn |
The number of observations you wish to discard as burn-in
from the posterior predictive distribution. This is in addition
to the burn-in discarded using |
median.color |
The color to use for the posterior median of the prediction. |
median.type |
The type of line (lty) to use for the posterior median of the prediction. |
median.width |
The width of line (lwd) to use for the posterior median of the prediction. |
interval.quantiles |
The lower and upper limits of the credible interval to be plotted. |
interval.color |
The color to use for the upper and lower limits of the 95% credible interval for the prediction. |
interval.type |
The type of line (lty) to use for the upper and lower limits of the 95% credible interval for the prediction. |
interval.width |
The width of line (lwd) to use for the upper and lower limits of the 95% credible interval for the prediction. |
style |
Either "dynamic", for dynamic distribution plots, or "boxplot", for box plots. Partial matching is allowed, so "dyn" or "box" would work, for example. |
ylim |
Limits on the vertical axis. |
... |
Extra arguments to be passed to
|
Plots the posterior predictive distribution described by
x
using a dynamic distribution plot generated by
PlotDynamicDistribution
. Overlays the
posterior median and 95% prediction limits for the predictive
distribution.
Returns NULL.
bsts
PlotDynamicDistribution
plot.lm.spike
data(AirPassengers) y <- log(AirPassengers) ss <- AddLocalLinearTrend(list(), y) ss <- AddSeasonal(ss, y, nseasons = 12) model <- bsts(y, state.specification = ss, niter = 500) pred <- predict(model, horizon = 12, burn = 100) plot(pred)
Creates a time series plot showing the most likely
predictors of a time series used to fit a bsts
object.
PlotBstsPredictors(bsts.object, burn = SuggestBurn(.1, bsts.object), inclusion.threshold = .1, ylim = NULL, flip.signs = TRUE, show.legend = TRUE, grayscale = TRUE, short.names = TRUE, ...)
bsts.object |
An object of class |
burn |
The number of observations you wish to discard as burn-in. |
inclusion.threshold |
Plot predictors with marginal inclusion probabilities above this threshold. |
ylim |
Scale for the vertical axis. |
flip.signs |
If TRUE then a predictor with a negative sign will be flipped before being plotted, to better align visually with the target series. |
show.legend |
Should a legend be shown indicating which predictors are plotted? |
grayscale |
Logical. If |
short.names |
Logical. If |
... |
Extra arguments to be passed to |
bsts
PlotDynamicDistribution
plot.lm.spike
data(AirPassengers) y <- log(AirPassengers) lag.y <- c(NA, head(y, -1)) ss <- AddLocalLinearTrend(list(), y) ss <- AddSeasonal(ss, y, nseasons = 12) ## Call bsts with na.action = na.omit to omit the leading NA in lag.y model <- bsts(y ~ lag.y, state.specification = ss, niter = 500, na.action = na.omit) plot(model, "predictors")
Plot the estimated effect of the given holiday.
PlotHoliday(holiday, model, show.raw.data = TRUE, ylim = NULL, ...)
holiday |
An object of class |
model |
A model fit by |
show.raw.data |
Logical indicating if the raw data corresponding
to |
ylim |
Limits on the vertical axis of the plots. |
... |
Extra arguments passed to |
Returns invisible(NULL).
trend <- cumsum(rnorm(730, 0, .1)) dates <- seq.Date(from = as.Date("2014-01-01"), length = length(trend), by = "day") y <- zoo(trend + rnorm(length(trend), 0, .2), dates) AddHolidayEffect <- function(y, dates, effect) { ## Adds a holiday effect to simulated data. ## Args: ## y: A zoo time series, with Dates for indices. ## dates: The dates of the holidays. ## effect: A vector of holiday effects of odd length. The central effect is ## the main holiday, with a symmetric influence window on either side. ## Returns: ## y, with the holiday effects added. time <- dates - (length(effect) - 1) / 2 for (i in 1:length(effect)) { y[time] <- y[time] + effect[i] time <- time + 1 } return(y) } ## Define some holidays. memorial.day <- NamedHoliday("MemorialDay") memorial.day.effect <- c(.3, 3, .5) memorial.day.dates <- as.Date(c("2014-05-26", "2015-05-25")) y <- AddHolidayEffect(y, memorial.day.dates, memorial.day.effect) presidents.day <- NamedHoliday("PresidentsDay") presidents.day.effect <- c(.5, 2, .25) presidents.day.dates <- as.Date(c("2014-02-17", "2015-02-16")) y <- AddHolidayEffect(y, presidents.day.dates, presidents.day.effect) labor.day <- NamedHoliday("LaborDay") labor.day.effect <- c(1, 2, 1) labor.day.dates <- as.Date(c("2014-09-01", "2015-09-07")) y <- AddHolidayEffect(y, labor.day.dates, labor.day.effect) ## The holidays can be in any order. holiday.list <- list(memorial.day, labor.day, presidents.day) number.of.holidays <- length(holiday.list) ## In a real example you'd want more than 100 MCMC iterations. niter <- 100 ss <- AddLocalLevel(list(), y) ss <- AddRegressionHoliday(ss, y, holiday.list = holiday.list) model <- bsts(y, state.specification = ss, niter = niter) PlotHoliday(memorial.day, model)
Functions to plot the results of a model fit using
mbsts
.
## S3 method for class 'mbsts' plot(x, y = c("means", "help"), ...) PlotMbstsSeriesMeans(mbsts.object, series.id = NULL, same.scale = TRUE, burn = SuggestBurn(.1, mbsts.object), time, show.actuals = TRUE, ylim = NULL, gap = 0, cex.actuals = 0.2, ...)
x |
An object of class |
y |
A character string indicating the aspect of the model that should be plotted. |
mbsts.object |
An object of class |
series.id |
Indicates which series should be plotted. An integer, logical, or character vector. |
same.scale |
Logical. If |
burn |
The number of MCMC iterations to discard as burn-in. |
time |
An optional vector of values to plot against. If missing, the default is to diagnose the time scale of the original time series. |
show.actuals |
Logical. If |
ylim |
Limits for the vertical axis. If |
gap |
The number of lines to leave between plots. This need not be an integer. |
cex.actuals |
Scale factor to use for plotting the raw data. |
... |
Additional arguments passed to
|
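No example accompanies this entry, so the following is a minimal sketch rather than package-supplied code. It assumes my.mbsts.model is an object previously fitted by mbsts (a hypothetical name introduced here for illustration), and that the "means" plot type dispatches to PlotMbstsSeriesMeans as the usage above suggests.

## Minimal sketch; 'my.mbsts.model' is assumed to be a previously fitted mbsts object.
plot(my.mbsts.model, "means")          # posterior means for every series
PlotMbstsSeriesMeans(my.mbsts.model,
                     series.id = 1:2,  # show only the first two series
                     same.scale = FALSE,
                     show.actuals = TRUE)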
Plot the posterior predictive distribution from an
mbsts
prediction object.
## S3 method for class 'mbsts.prediction' plot(x, y = NULL, burn = 0, plot.original = TRUE, median.color = "blue", median.type = 1, median.width = 3, interval.quantiles = c(.025, .975), interval.color = "green", interval.type = 2, interval.width = 2, style = c("dynamic", "boxplot"), ylim = NULL, series.id = NULL, same.scale = TRUE, gap = 0, ...)
x |
An object of class |
y |
A dummy argument necessary to match the signature of the
|
plot.original |
Logical or numeric. If |
burn |
The number of observations you wish to discard as burn-in
from the posterior predictive distribution. This is in addition
to the burn-in discarded using |
median.color |
The color to use for the posterior median of the prediction. |
median.type |
The type of line (lty) to use for the posterior median of the prediction. |
median.width |
The width of line (lwd) to use for the posterior median of the prediction. |
interval.quantiles |
The lower and upper limits of the credible interval to be plotted. |
interval.color |
The color to use for the upper and lower limits of the 95% credible interval for the prediction. |
interval.type |
The type of line (lty) to use for the upper and lower limits of the 95% credible interval for the prediction. |
interval.width |
The width of line (lwd) to use for the upper and lower limits of the 95% credible interval for the prediction. |
style |
Either "dynamic", for dynamic distribution plots, or "boxplot", for box plots. Partial matching is allowed, so "dyn" or "box" would work, for example. |
ylim |
Limits on the vertical axis. |
series.id |
A factor, string, or integer used to indicate which of the multivariate series to plot. If NULL then predictions for all series will be plotted. If there are many series this can make the plot unreadable. |
same.scale |
Logical. If TRUE then all predictions are plotted with the same scale, and limits are drawn on the Y axis. If FALSE then each prediction is drawn to fill its plot region, and no tick marks are drawn on the y axis. If ylim is specified then it is used for all plots, and same.scale is ignored. |
gap |
The amount of space to leave between plots, measured in lines of text. |
... |
Extra arguments to be passed to
|
Plots the posterior predictive distribution described by
x
using a dynamic distribution plot generated by
PlotDynamicDistribution
. Overlays the
posterior median and 95% prediction limits for the predictive
distribution.
Returns NULL.
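No example is given here either; the following hedged sketch again assumes my.mbsts.model is a hypothetical, previously fitted mbsts object.

## Minimal sketch; 'my.mbsts.model' is assumed to be a previously fitted mbsts object.
pred <- predict(my.mbsts.model, horizon = 4)
plot(pred)                                    # dynamic distribution plot for every series
plot(pred, series.id = 1, style = "boxplot")  # a single series, drawn with boxplots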
Generate draws from the posterior predictive distribution
of a bsts
object.
## S3 method for class 'bsts' predict(object, horizon = 1, newdata = NULL, timestamps = NULL, burn = SuggestBurn(.1, object), na.action = na.exclude, olddata = NULL, olddata.timestamps = NULL, trials.or.exposure = 1, quantiles = c(.025, .975), seed = NULL, ...)
object |
An object of class |
horizon |
An integer specifying the number of periods into the
future you wish to predict. If |
newdata |
a vector, matrix, or data frame containing the
predictor variables to use in making the prediction. This is only
required if |
timestamps |
A vector of time stamps (of the same type as the
timestamps used to fit |
burn |
An integer describing the number of MCMC
iterations in |
na.action |
A function determining what should be done with
missing values in |
olddata |
This is an optional component allowing predictions to
be made conditional on data other than the data used to fit the
model. If omitted, then it is assumed that forecasts are to be made
relative to the final observation in the training data. If
The value for
|
olddata.timestamps |
A set of timestamps corresponding to the
observations supplied in |
trials.or.exposure |
For logit or Poisson models, the number of
binomial trials (or the exposure time) to assume at each time point
in the forecast period. This can either be a scalar (if the number
of trials is to be the same for each time period), or it can be a
vector with length equal to |
quantiles |
A numeric vector of length 2 giving the lower and upper quantiles to use for the forecast interval estimate. |
seed |
An integer to use as the C++ random seed. If |
... |
This is a dummy argument included to match the signature
of the generic |
Samples from the posterior distribution of a Bayesian structural time series model. This function can be used either with or without contemporaneous predictor variables (in a time series regression).
If predictor variables are present, the regression coefficients are fixed (as opposed to time varying, though time varying coefficients might be added as a state component). The predictors and response in the formula are contemporaneous, so if you want lags and differences you need to put them in the predictor matrix yourself.
If no predictor variables are used, then the model is an ordinary state space time series model.
Returns an object of class bsts.prediction
, which is a list
with the following components.
mean |
A vector giving the posterior mean of the prediction. |
interval |
A matrix giving the upper and lower bounds of the 95 percent credible interval for the prediction. |
distribution |
A matrix of draws from the posterior predictive distribution. Each row in the matrix is one MCMC draw. Columns represent time. |
Steven L. Scott
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
bsts
.
AddLocalLevel
.
AddLocalLinearTrend
.
AddSemilocalLinearTrend
.
# The number of MCMC draws in the following examples is artificially low. ## Making predictions when there is no regression component. data(AirPassengers) y <- log(AirPassengers) ss <- AddLocalLinearTrend(list(), y) ss <- AddSeasonal(ss, y, nseasons = 12) model <- bsts(y, state.specification = ss, niter = 250) pred <- predict(model, horizon = 12, burn = 100) plot(pred) ## An example using the olddata argument. full.pred <- pred training <- window(y, end = c(1959, 12)) model <- bsts(training, state.specification = ss, niter = 250) ## Predict the next 12 months. pred <- predict(model, horizon = 12) ## Compare the predictions to the actual data. plot(pred) lines(as.numeric(y), col = "red", lty = 2, lwd = 2) ## Predict the 12 months of 1961 based on the posterior distribution ## of the model fit to data through 1959, but with state filtered ## through 1960. updated.pred <- predict(model, horizon = 12, olddata = y) par(mfrow = c(1, 2)) plot(full.pred, ylim = c(4, 7)) plot(updated.pred, ylim = c(4, 7)) ## Examples including a regression component. ## data(iclaims) training <- initial.claims[1:402, ] holdout1 <- initial.claims[403:450, ] holdout2 <- initial.claims[451:456, ] ## Not run: ## This example puts the total run time over 5 seconds, which is a CRAN ## violation. ss <- AddLocalLinearTrend(list(), training$iclaimsNSA) ss <- AddSeasonal(ss, training$iclaimsNSA, nseasons = 52) ## In real life you'd want more iterations... model <- bsts(iclaimsNSA ~ ., state.specification = ss, data = training, niter = 100) ## Predict the holdout set given the training set. ## This is really fast, because we can use saved state from the MCMC ## algorithm. pred.full <- predict(model, newdata = rbind(holdout1, holdout2)) ## Predict holdout 2, given training and holdout1. ## This is much slower because we need to re-filter the 'olddata' before ## simulating the predictions. pred.update <- predict(model, newdata = holdout2, olddata = rbind(training, holdout1)) ## End(Not run)
Generate draws from the posterior predictive distribution
of an mbsts
object.
## S3 method for class 'mbsts' predict(object, horizon = 1, newdata = NULL, timestamps = NULL, burn = SuggestBurn(.1, object), na.action = na.exclude, quantiles = c(.025, .975), seed = NULL, ...)
object |
An object of class |
horizon |
An integer specifying the number of periods into the
future you wish to predict. If |
newdata |
A vector, matrix, or data frame containing the
predictor variables to use in making the prediction. This is only
required if |
timestamps |
A vector of time stamps (of the same type as the
timestamps used to fit |
burn |
An integer describing the number of MCMC iterations in
|
na.action |
A function determining what should be done with
missing values in |
quantiles |
A numeric vector of length 2 giving the lower and upper quantiles to use for the forecast interval estimate. |
seed |
An integer to use as the C++ random seed. If
|
... |
Not used. Present to match the signature of the default predict method. |
The prediction is based on samples taken from the posterior distribution of a multivariate Bayesian structural time series model.
As an added convenience, means and interval estimates are produced from the posterior predictive distribution.
Returns an object of class mbsts.prediction, which is a list.
Steven L. Scott
mbsts
.
predict.bsts
plot.mbsts.prediction
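A hedged sketch of the regression case follows. It assumes my.mbsts.model was fit with a regression formula and that future.predictors is a hypothetical data frame holding predictor values for the forecast period; neither object is defined in this document.

## Minimal sketch; 'my.mbsts.model' and 'future.predictors' are assumed objects.
pred <- predict(my.mbsts.model,
                newdata = future.predictors,   # required because the model has regressors
                quantiles = c(.05, .95))       # 90% forecast interval instead of the default
plot(pred)   # see plot.mbsts.prediction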
Returns the quarter and year in which a date occurs.
Quarter(date)
date |
A vector convertible to |
A numeric vector identifying the quarter that each element of
date
corresponds to, expressed as a number of years since 1900.
Thus Q1-2000 is 100.00, and Q3-2007 is 107.50.
Steven L. Scott [email protected]
Quarter(c("2008-02-29", "2008-04-29")) # [1] 108.00 108.25
Quarter(c("2008-02-29", "2008-04-29")) # [1] 108.00 108.25
Add a regression-based holiday model to the state specification.
AddRegressionHoliday( state.specification = NULL, y, holiday.list, time0 = NULL, prior = NULL, sdy = sd(as.numeric(y), na.rm = TRUE)) AddHierarchicalRegressionHoliday( state.specification = NULL, y, holiday.list, coefficient.mean.prior = NULL, coefficient.variance.prior = NULL, time0 = NULL, sdy = sd(as.numeric(y), na.rm = TRUE))
state.specification |
A list of state components that you wish to add to. If omitted, an empty list will be assumed. |
holiday.list |
A list of objects of type |
y |
The time series to be modeled, as a numeric vector
convertible to |
prior |
An object of class |
coefficient.mean.prior |
An object of type
|
coefficient.variance.prior |
An object of type
|
time0 |
An object convertible to |
sdy |
The standard deviation of the series to be modeled. This
will be ignored if |
The model assumes that

y[t] = (contribution from other state components) + beta[d(t)] + epsilon[t],

where d(t) indexes the day of the holiday influence window that is active at time t, and the holiday term is zero when t falls outside every influence window.

The regression state model assumes the vector of regression coefficients beta contains one element for each day in the influence window of each holiday.

The HierarchicalRegressionHolidayModel assumes beta is composed of holiday-specific sub-vectors beta[h], where each beta[h] contains coefficients describing the days in the influence window of holiday h. The hierarchical version of the model treats the common mean b and variance V as parameters to be learned, with prior distributions

beta[h] ~ N(b, V), b ~ N(b0, Omega), and V ~ IW(df, S),

where IW represents the inverse Wishart distribution. The priors on b and V are specified through the coefficient.mean.prior and coefficient.variance.prior arguments.
Returns a list with the elements necessary to specify the holiday state model.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
bsts
.
RandomWalkHolidayStateModel
.
SdPrior
NormalPrior
trend <- cumsum(rnorm(730, 0, .1)) dates <- seq.Date(from = as.Date("2014-01-01"), length = length(trend), by = "day") y <- zoo(trend + rnorm(length(trend), 0, .2), dates) AddHolidayEffect <- function(y, dates, effect) { ## Adds a holiday effect to simulated data. ## Args: ## y: A zoo time series, with Dates for indices. ## dates: The dates of the holidays. ## effect: A vector of holiday effects of odd length. The central effect is ## the main holiday, with a symmetric influence window on either side. ## Returns: ## y, with the holiday effects added. time <- dates - (length(effect) - 1) / 2 for (i in 1:length(effect)) { y[time] <- y[time] + effect[i] time <- time + 1 } return(y) } ## Define some holidays. memorial.day <- NamedHoliday("MemorialDay") memorial.day.effect <- c(.3, 3, .5) memorial.day.dates <- as.Date(c("2014-05-26", "2015-05-25")) y <- AddHolidayEffect(y, memorial.day.dates, memorial.day.effect) presidents.day <- NamedHoliday("PresidentsDay") presidents.day.effect <- c(.5, 2, .25) presidents.day.dates <- as.Date(c("2014-02-17", "2015-02-16")) y <- AddHolidayEffect(y, presidents.day.dates, presidents.day.effect) labor.day <- NamedHoliday("LaborDay") labor.day.effect <- c(1, 2, 1) labor.day.dates <- as.Date(c("2014-09-01", "2015-09-07")) y <- AddHolidayEffect(y, labor.day.dates, labor.day.effect) ## The holidays can be in any order. holiday.list <- list(memorial.day, labor.day, presidents.day) ## In a real example you'd want more than 100 MCMC iterations. niter <- 100 ## Fit the model ss <- AddLocalLevel(list(), y) ss <- AddRegressionHoliday(ss, y, holiday.list = holiday.list) model <- bsts(y, state.specification = ss, niter = niter) ## Plot all model state components. plot(model, "comp") ## Plot the specific holiday state component. plot(ss[[2]], model) ## Try again with some shrinkage. With only 3 holidays there won't be much ## shrinkage. ss2 <- AddLocalLevel(list(), y) ## Plot the specific holiday state component. ss2 <- AddHierarchicalRegressionHoliday(ss2, y, holiday.list = holiday.list) model2 <- bsts(y, state.specification = ss2, niter = niter) plot(model2, "comp") plot(ss2[[2]], model2)
Given a set of timestamps that might contain duplicates and gaps, produce a set of timestamps that has no duplicates and no gaps.
RegularizeTimestamps(timestamps) ## Default S3 method: RegularizeTimestamps(timestamps) ## S3 method for class 'numeric' RegularizeTimestamps(timestamps) ## S3 method for class 'Date' RegularizeTimestamps(timestamps) ## S3 method for class 'POSIXt' RegularizeTimestamps(timestamps)
timestamps |
A set of (possibly irregular or non-unique)
timestamps. This could be a set of integers (like 1, 2, 3, ...), a set of numeric values (like 1945, 1945.083, 1945.167, ...) indicating years
and fractions of years, a |
If the argument is NULL
a
NULL
will be returned.
A set of regularly spaced timestamps of the same class as the argument
(which might be NULL
).
Steven L. Scott [email protected]
first <- as.POSIXct("2015-04-19 08:00:04") monthly <- seq(from = first, length.out = 24, by = "month") skip.one <- monthly[-8] has.duplicates <- monthly has.duplicates[2] <- has.duplicates[3] reg1 <- RegularizeTimestamps(skip.one) all.equal(reg1, monthly) ## TRUE reg2 <- RegularizeTimestamps(has.duplicates) all.equal(reg2, monthly) ## TRUE
Residuals (or posterior distribution of residuals) from a bsts object.
## S3 method for class 'bsts' residuals(object, burn = SuggestBurn(.1, object), mean.only = FALSE, ...)
object |
An object of class |
burn |
The number of MCMC iterations to discard as burn-in. |
mean.only |
Logical. If |
... |
Not used. This argument is here to comply with the signature of the generic residuals function. |
If mean.only
is TRUE
then this function returns a vector
of residuals with the same "time stamp" as the original series. If
mean.only
is FALSE
then the posterior distribution of
the residuals is returned instead, as a matrix of draws. Each row of
the matrix is an MCMC draw, and each column is a time point. The
colnames of the returned matrix will be the timestamps of the original
series, as text.
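A minimal sketch, reusing the AirPassengers setup from the other examples in this document:

## Minimal sketch of both return modes.
data(AirPassengers)
y <- log(AirPassengers)
ss <- AddLocalLinearTrend(list(), y)
model <- bsts(y, state.specification = ss, niter = 200)
r.draws <- residuals(model)                    # matrix: one row per MCMC draw, one column per time point
r.mean <- residuals(model, mean.only = TRUE)   # a single averaged residual per time point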
A monthly time series of retail sales in the US, excluding food services. In millions of dollars. Seasonally adjusted.
data(rsxfs)
zoo time series
FRED. See http://research.stlouisfed.org/fred2/series/RSXFS
data(rsxfs) plot(rsxfs)
An annual time series of shark attacks and fatalities in Florida.
data(shark)
zoo time series
From Jeffrey Simonoff "Analysis of Categorical Data". http://people.stern.nyu.edu/jsimonof/AnalCatData/Data/Comma_separated/floridashark.csv
data(shark) head(shark)
Removes common prefixes and suffixes from character vectors.
Shorten(words)
words |
A character vector to be shortened. |
The argument words
is returned, after common prefixes and
suffixes have been removed. If all arguments are identical then no
shortening is done.
Steven L. Scott [email protected]
Shorten(c("/usr/common/foo.tex", "/usr/common/barbarian.tex")) # returns c("foo", "barbarian") Shorten(c("hello", "hellobye")) # returns c("", "bye") Shorten(c("hello", "hello")) # returns c("hello", "hello") Shorten(c("", "x", "xx")) # returns c("", "x", "xx") Shorten("abcde") # returns "abcde"
Shorten(c("/usr/common/foo.tex", "/usr/common/barbarian.tex")) # returns c("foo", "barbarian") Shorten(c("hello", "hellobye")) # returns c("", "bye") Shorten(c("hello", "hello")) # returns c("hello", "hello") Shorten(c("", "x", "xx")) # returns c("", "x", "xx") Shorten("abcde") # returns "abcde"
Simulate a fake data set that can be used to test mixed frequency code.
SimulateFakeMixedFrequencyData(nweeks, xdim, number.nonzero = xdim, start.date = as.Date("2009-01-03"), sigma.obs = 1.0, sigma.slope = .5, sigma.level = .5, beta.sd = 10)
nweeks |
The number of weeks of data to simulate. |
xdim |
The dimension of the predictor variables to be simulated. |
number.nonzero |
The number of nonzero coefficients. Must be
less than or equal to |
start.date |
The date of the first simulated week. |
sigma.obs |
The residual standard deviation for the fine time scale model. |
sigma.slope |
The standard deviation of the slope component of the local linear trend model for the fine time scale data. |
sigma.level |
The standard deviation of the level component of the local linear trend model for the fine time scale data. |
beta.sd |
The standard deviation of the regression coefficients to be simulated. |
The simulation begins by simulating a local linear trend model for
nweeks
to get the trend component.
Next a nweeks
by xdim
matrix of predictor variables is
simulated as IID normal(0, 1) deviates, and a xdim
-vector of
regression coefficients is simulated as IID normal(0, beta.sd
).
The product of the predictor matrix and regression coefficients is
added to the output of the local linear trend model to get
fine.target
.
Finally, fine.target
is aggregated to the month level to get
coarse.target
.
Returns a list with the following components
coarse.target |
A |
fine.target |
A |
predictors |
A |
true.beta |
The vector of "true" regression coefficients used to
simulate |
true.sigma.obs |
The residual standard deviation that was used to
simulate |
true.sigma.slope |
The value of |
true.sigma.level |
The value of |
true.trend |
The combined contribution of the simulated latent
state on |
true.state |
A matrix containing the fine-scale state of the model
being simulated. Columns represent time (weeks). Rows correspond
to regression (a constant 1), the local linear trend level, the
local linear trend slope, the values of |
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
bsts.mixed
,
AddLocalLinearTrend
,
fake.data <- SimulateFakeMixedFrequencyData(nweeks = 100, xdim = 10) plot(fake.data$coarse.target)
Returns a spike and slab prior for the parameters of an AR(p) process.
SpikeSlabArPrior( lags, prior.inclusion.probabilities = GeometricSequence( lags, initial.value = .8, discount.factor = .8), prior.mean = rep(0, lags), prior.sd = GeometricSequence(lags, initial.value = .5, discount.factor = .8), sdy, prior.df = 1, expected.r2 = .5, sigma.upper.limit = Inf, truncate = TRUE)
lags |
A positive integer giving the maximum number of lags to consider. |
prior.inclusion.probabilities |
A vector of length |
prior.mean |
A vector of length |
prior.sd |
A vector of length |
sdy |
The sample standard deviation of the series being modeled. |
expected.r2 |
The expected fraction of variation in the response explained by this AR process. |
prior.df |
A positive number indicating the number of
observations (time points) worth of weight to assign to the guess at
|
sigma.upper.limit |
A positive number less than infinity truncates the support of the prior distribution to regions where the residual standard deviation is less than the specified limit. Any other value indicates support over the entire positive real line. |
truncate |
If |
A list of class SpikeSlabArPrior
containing the information
needed for the underlying C++ code to instantiate this prior.
Steven L. Scott [email protected]
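No example is provided for this entry. The sketch below is a hedged illustration rather than package-supplied code; it assumes the prior is handed to an AR state component that accepts a SpikeSlabArPrior, such as AddAutoAr.

## Minimal sketch: a spike-and-slab prior for a sparse AR(5) component.
y <- as.numeric(arima.sim(model = list(ar = c(.6, -.2)), n = 200))
prior <- SpikeSlabArPrior(lags = 5,
                          sdy = sd(y),
                          expected.r2 = .5,
                          sigma.upper.limit = 2 * sd(y))
ss <- AddAutoAr(list(), y, lags = 5, prior = prior)
model <- bsts(y, state.specification = ss, niter = 200)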
Returns a vector containing the size of each state component (i.e. the state dimension) in the state vector.
StateSizes(state.specification)
state.specification |
A list containing state specification
components, such as would be passed to |
A numeric vector giving the dimension of each state component.
Steven L. Scott [email protected]
y <- rnorm(1000) state.specification <- AddLocalLinearTrend(list(), y) state.specification <- AddSeasonal(state.specification, y, 7) StateSizes(state.specification)
Add a state component to the state.specification
argument in a
bsts
model.
Steven L. Scott [email protected]
Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.
Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.
bsts
.
SdPrior
NormalPrior
Ar1CoefficientPrior
data(AirPassengers) y <- log(AirPassengers) ss <- AddLocalLinearTrend(list(), y) ss <- AddSeasonal(ss, y, nseasons = 12) model <- bsts(y, state.specification = ss, niter = 500) pred <- predict(model, horizon = 12, burn = 100) plot(pred)
Suggest the size of an MCMC burn in sample as a proportion of the total run.
SuggestBurn(proportion, bsts.object)
proportion |
The proportion of the MCMC run to discard as burn in. |
bsts.object |
An object of class |
An integer number of iterations to discard.
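A minimal sketch, assuming model is a bsts object fit with niter = 500 as in the other examples in this document:

burn <- SuggestBurn(.1, model)           # roughly the first 10% of the MCMC draws
plot(model, "components", burn = burn)   # discard those draws when plotting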
Print a summary of a bsts
object.
## S3 method for class 'bsts' summary(object, burn = SuggestBurn(.1, object), ...)
object |
An object of class |
burn |
The number of MCMC iterations to discard as burn-in. |
... |
Additional arguments passed to
|
Returns a list with the following elements.
residual.sd |
The posterior mean of the residual standard deviation parameter. |
prediction.sd |
The standard deviation of the one-step-ahead prediction errors for the training data. |
rsquare |
Proportion by which the residual variance is less than the variance of the original observations. |
relative.gof |
Harvey's goodness of fit statistic. Let v[t] denote the one-step-ahead prediction errors from the model, and let dy[t] = y[t] - y[t-1] denote the first differences of the series. The statistic is

1 - sum(v[t]^2) / sum((dy[t] - mean(dy))^2).

This statistic is analogous to R^2, but with a random walk with drift as the baseline model, which Harvey (1989, equation 5.5.14) argues is a more relevant baseline than a simple mean. Unlike a traditional R-square statistic, this can be negative. |
size |
Distribution of the number of nonzero coefficients appearing in the model |
coefficients |
If
|
Harvey's goodness of fit statistic is from Harvey (1989) Forecasting, structural time series models, and the Kalman filter. Page 268.
bsts
, plot.bsts
, summary.lm.spike
data(AirPassengers) y <- log(AirPassengers) ss <- AddLocalLinearTrend(list(), y) ss <- AddSeasonal(ss, y, nseasons = 12) model <- bsts(y, state.specification = ss, niter = 100) summary(model, burn = 20)
Convert an object of class Date to class POSIXct without getting bogged down in timezone calculation.
DateToPOSIX(timestamps) YearMonToPOSIX(timestamps)
timestamps |
An object of class |
Calling as.POSIXct
on another date/time object
(e.g. Date) applies a timezone correction to the object. This can
shift the time marker by a few hours, which can have the effect of
shifting the day by one unit. If the day was the first or last in a
month or year, then the month or year will be off by one as well.
Coercing the object to the character representation of a Date prevents this adjustment from being applied, and leaves the POSIXt return value with the intended day, month, and year.
Steven L. Scott [email protected]
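A minimal sketch of both conversions. The yearmon values are constructed with zoo, which bsts already uses for its time series examples; the specific dates are illustrative only.

DateToPOSIX(as.Date(c("2010-01-01", "2010-12-31")))   # keeps the intended calendar days
YearMonToPOSIX(zoo::as.yearmon(2010 + (0:3) / 12))    # January through April 2010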
A daily time series of electricity usage in Turkey.
data(turkish)
zoo time series
https://robjhyndman.com/data/turkey_elec.csv
data(turkish) plot(turkish)
Returns a logical vector indicating whether the given week contains the end of a month or quarter.
WeekEndsMonth(week.ending) WeekEndsQuarter(week.ending)
week.ending |
A vector of class |
A logical vector indicating whether the given week contains the end of a month or a quarter.
Steven L. Scott [email protected]
week.ending <- as.Date(c("2011-10-01", "2011-10-08", "2011-12-03", "2011-12-31")) WeekEndsMonth(week.ending) == c(TRUE, FALSE, TRUE, TRUE) WeekEndsQuarter(week.ending) == c(TRUE, FALSE, FALSE, TRUE)
A character vector listing the names of the days of the week.
weekday.names
Convert a multivariate time series between wide and long formats. In "wide" format there is one row per time point, with series organized by columns. In "long" format there is one row per observation, with variables indicating the series and time point to which an observation belongs.
WideToLong(response, na.rm = TRUE) LongToWide(response, series.id, timestamps)
response |
For WideToLong, a matrix (or zoo matrix) of time series in wide format. For LongToWide, a vector of observations in long format. |
na.rm |
If TRUE then missing values will be omitted from the returned data frame (their absence denoting missingness). Otherwise, missing values will be included as NA's. |
series.id |
A factor (or variable coercible to factor) of the
same length as |
timestamps |
A variable of the same length as |
LongToWide
returns a zoo matrix with the time series in wide format.
WideToLong
returns a 3-column data frame with columns "time", "series", and "values".
Steven L. Scott [email protected]
data(gdp) gdp.wide <- LongToWide(gdp$GDP, gdp$Country, gdp$Time) gdp.long <- WideToLong(gdp.wide)