Title: | Regression Models and Utilities for Repeated Measures and Panel Data |
---|---|
Description: | Provides an object type and associated tools for storing and wrangling panel data. Implements several methods for creating regression models that take advantage of the unique aspects of panel data. Among other capabilities, automates the "within-between" (also known as "between-within" and "hybrid") panel regression specification that combines the desirable aspects of both fixed effects and random effects econometric models and fits them as multilevel models (Allison, 2009 <doi:10.4135/9781412993869.d33>; Bell & Jones, 2015 <doi:10.1017/psrm.2014.7>). These models can also be estimated via generalized estimating equations (GEE; McNeish, 2019 <doi:10.1080/00273171.2019.1602504>) and Bayesian estimation is (optionally) supported via 'Stan'. Supports estimation of asymmetric effects models via first differences (Allison, 2019 <doi:10.1177/2378023119826441>) as well as a generalized linear model extension thereof using GEE. |
Authors: | Jacob A. Long [aut, cre] |
Maintainer: | Jacob A. Long <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.8.0.9000 |
Built: | 2024-10-27 04:21:38 UTC |
Source: | https://github.com/jacob-long/panelr |
This function is designed for use with panel_data() objects.
are_varying(data, ..., type = "time")
data |
A data frame, typically of |
... |
Variable names. If none are given, all variables are checked. |
type |
Check for variance over time or across individuals? Default
is |
A named logical vector. If TRUE, the variable is varying.
wages <- panel_data(WageData, id = id, wave = t)
wages %>% are_varying(occ, ind, fem, blk)
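To make the check concrete, here is a minimal base R sketch of what "varying over time" means: a variable varies if at least one individual has more than one distinct value across waves. The toy data frame and the helper `varies_over_time()` are hypothetical illustrations, not part of panelr, and this is not claimed to be how are_varying() is implemented.

```r
# Toy long-format panel (hypothetical data): 3 people, 2 waves each
toy <- data.frame(
  id   = c(1, 1, 2, 2, 3, 3),
  wave = c(1, 2, 1, 2, 1, 2),
  fem  = c(0, 0, 1, 1, 1, 1),       # constant within each person
  wks  = c(40, 42, 35, 35, 50, 48)  # changes for persons 1 and 3
)

# Hypothetical helper: TRUE if any id has more than one distinct value
varies_over_time <- function(x, id) {
  any(tapply(x, id, function(v) length(unique(v)) > 1))
}

# Named logical vector, one element per checked variable
result <- vapply(toy[c("fem", "wks")], varies_over_time, logical(1), id = toy$id)
print(result)
```

The result mirrors the documented return value in shape: a named logical vector in which TRUE marks a variable that varies.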
The function fits the asymmetric effects first difference model described in Allison (2019) using GLS estimation.
asym( formula, data, id = NULL, wave = NULL, use.wave = FALSE, min.waves = 1, variance = c("toeplitz-1", "constrained", "unconstrained"), error.type = c("CR2", "CR1S"), ... )
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
variance |
One of |
error.type |
Either "CR2" or "CR1S". See the |
... |
Ignored. |
Allison, P. D. (2019). Asymmetric fixed-effects models for panel data. Socius, 5, 1-12. https://doi.org/10.1177/2378023119826441
## Not run:
data("teen_poverty")
# Convert to long format
teen <- long_panel(teen_poverty, begin = 1, end = 5)
model <- asym(hours ~ lag(pov) + spouse, data = teen)
summary(model)
## End(Not run)
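The transformation behind an asymmetric effects model can be sketched in base R. Following the general idea in Allison (2019), each within-person change in a predictor is split into a positive part and a negative part, so that increases and decreases can receive different coefficients. The toy data, the column names `dx_pos` and `dx_neg`, and the sign convention for the negative part are assumptions made for illustration; this is not claimed to be the internal code of asym().

```r
# Toy long-format panel (hypothetical data): 2 people, 3 waves each
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  wave = c(1, 2, 3, 1, 2, 3),
  x    = c(0, 1, 0, 2, 2, 5)
)
toy <- toy[order(toy$id, toy$wave), ]

# Within-person first difference; the first wave has no predecessor, so NA
toy$dx <- ave(toy$x, toy$id, FUN = function(v) c(NA, diff(v)))

# Split each change into its positive and negative components
toy$dx_pos <- pmax(toy$dx, 0)   # size of an increase, else 0
toy$dx_neg <- pmax(-toy$dx, 0)  # size of a decrease, else 0 (assumed sign convention)

print(toy)
```

Regressing the differenced outcome on `dx_pos` and `dx_neg` separately is what allows the two directions of change to have unequal effects.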
Fit "within-between" and several other regression variants for panel data via generalized estimating equations.
asym_gee( formula, data, id = NULL, wave = NULL, cor.str = c("ar1", "exchangeable", "unstructured"), use.wave = FALSE, wave.factor = FALSE, min.waves = 1, family = gaussian, weights = NULL, offset = NULL, ... )
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
cor.str |
Any correlation structure accepted by |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
wave.factor |
Should the wave variable be treated as an unordered factor instead of continuous? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
family |
Use this to specify GLM link families. Default is |
weights |
If using weights, either the name of the column in the data that contains the weights or a vector of the weights. |
offset |
this can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be |
... |
Additional arguments provided to |
See the documentation for wbm() for many details on formula syntax and other arguments.
An asym_gee object, which inherits from wbgee and geeglm.
Jacob A. Long
Allison, P. D. (2019). Asymmetric fixed-effects models for panel data. Socius, 5, 1-12. https://doi.org/10.1177/2378023119826441
McNeish, D. (2019). Effect partitioning in cross-sectionally clustered data without multilevel models. Multivariate Behavioral Research, Advance online publication. https://doi.org/10.1080/00273171.2019.1602504
McNeish, D., Stapleton, L. M., & Silverman, R. D. (2016). On the unnecessary ubiquity of hierarchical linear modeling. Psychological Methods, 22, 114-140. https://doi.org/10.1037/met0000078
if (requireNamespace("geepack")) {
  data("WageData")
  wages <- panel_data(WageData, id = id, wave = t)
  model <- asym_gee(lwage ~ lag(union) + wks, data = wages)
  summary(model)
}
This function allows you to define a minimum number of waves/periods and exclude all individuals with fewer observations than that.
complete_data(data, ..., formula = NULL, vars = NULL, min.waves = "all")
data |
A |
... |
Optionally, unquoted variable names/expressions separated by
commas to be passed to |
formula |
A formula, like the one you'll be using to specify your model. |
vars |
As an alternative to formula, a vector of variable names. |
min.waves |
What is the minimum number of observations to be kept?
Default is |
If ... (that is, unquoted variable name(s)) are included, then formula and vars are ignored. Likewise, formula takes precedence over vars.
These are just different methods for selecting variables and you can choose whichever you prefer/are comfortable with. ... corresponds with the "tidyverse" way, formula is useful for programming or working with model formulas, and vars is a "standard" evaluation method for when you are working with strings.
A panel_data frame.
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
complete_data(wages, wks, lwage, min.waves = 3)
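The filtering idea can be sketched in base R: count each individual's non-missing observations on the variables of interest and keep only those who reach the minimum. The toy data and the threshold of 2 are hypothetical, and dropping the incomplete rows themselves is an assumption of this sketch rather than a documented guarantee of complete_data().

```r
# Toy long-format panel (hypothetical data): 3 people, 3 waves each
toy <- data.frame(
  id   = rep(1:3, each = 3),
  wave = rep(1:3, times = 3),
  wks  = c(40, NA, 42, 35, 36, 37, NA, NA, 50)
)

min_waves <- 2  # assumed threshold for this illustration

# Count complete (non-missing) observations per individual
is_complete <- !is.na(toy$wks)
n_complete  <- tapply(is_complete, toy$id, sum)

# Keep individuals who meet the threshold, and only their complete rows
keep_ids <- names(n_complete)[n_complete >= min_waves]
kept <- toy[is_complete & toy$id %in% keep_ids, ]
print(kept)
```

Here person 3 has a single complete wave and is excluded, while persons 1 and 2 are retained.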
The function fits first difference models using GLS estimation.
fdm( formula, data, id = NULL, wave = NULL, use.wave = FALSE, min.waves = 1, variance = c("toeplitz-1", "constrained", "unconstrained"), error.type = c("CR2", "CR1S"), ... )
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
variance |
One of |
error.type |
Either "CR2" or "CR1S". See the |
... |
Ignored. |
Allison, P. D. (2019). Asymmetric fixed-effects models for panel data. Socius, 5, 1-12. https://doi.org/10.1177/2378023119826441
if (requireNamespace("clubSandwich")) {
  data("teen_poverty")
  # Convert to long format
  teen <- long_panel(teen_poverty, begin = 1, end = 5)
  model <- fdm(hours ~ lag(pov) + spouse, data = teen)
  summary(model)
}
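As a rough illustration of why differencing is useful, the base R sketch below builds a toy panel in which the outcome depends on a person-specific constant plus twice the predictor, then regresses the differenced outcome on the differenced predictor. Plain OLS via lm() is substituted here for the GLS estimation that fdm() uses, and all data and names are hypothetical.

```r
# Toy panel (hypothetical data): 3 people, 3 waves each
toy <- data.frame(
  id   = rep(1:3, each = 3),
  wave = rep(1:3, times = 3),
  x    = c(1, 2, 4, 0, 3, 5, 2, 2, 6)
)

# Outcome = person-specific constant + 2 * x (no noise, for clarity)
alpha <- c(10, -5, 3)[toy$id]
toy$y <- alpha + 2 * toy$x

# Within-person first differences; the first wave of each person is NA
first_diff <- function(v) c(NA, diff(v))
toy$dx <- ave(toy$x, toy$id, FUN = first_diff)
toy$dy <- ave(toy$y, toy$id, FUN = first_diff)

# Differencing removes the person-specific constants, so OLS recovers the slope
fit <- lm(dy ~ dx, data = toy)
slope <- unname(coef(fit)["dx"])
print(slope)
```

The person-specific constants cancel in the differences, which is the property first difference models rely on.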
wbm objects
This S3 method allows you to retrieve the formula used to fit wbm objects.
## S3 method for class 'wbm'
formula(x, raw = FALSE, ...)
x |
A |
raw |
Return the formula used in the call to |
... |
further arguments passed to or from other methods. |
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks, data = wages)
# Returns the original model formula rather than the one sent to lme4
formula(model)
get_id(), get_wave(), and get_periods() are extractor functions that can be used to retrieve the names of the id and wave variables or time periods of a panel_data frame.
get_wave(data)
get_id(data)
get_periods(data)
data |
A |
A panel_data frame
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
get_wave(wages)
get_id(wages)
get_periods(wages)
This function uses three waves of data to estimate stability and reliability coefficients as described in Heise (1969).
heise(data, ..., waves = NULL)
data |
A |
... |
unquoted variable names that are passed to |
waves |
Which 3 waves should be used? If NULL (the default), the first, middle, and last waves are used. |
A tibble with reliability (rel), waves 1-3 stability (stab13), waves 1-2 stability (stab12), and waves 2-3 stability (stab23) and the variable these values refer to (var).
Heise, D. R. (1969). Separating reliability and stability in test-retest correlation. American Sociological Review, 34, 93–101. https://doi.org/10.2307/2092790
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
heise(wages, wks, lwage) # will use waves 1, 4, and 7 by default
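The arithmetic behind these coefficients can be sketched directly from the three pairwise correlations between waves, assuming the standard three-wave formulas from Heise (1969). The helper `heise_coefs()` and the example correlations are hypothetical; the element names simply echo the columns described above, and this is not claimed to be the code inside heise().

```r
# Hypothetical helper: three-wave reliability and stability estimates
# r12, r23, r13 are the observed correlations between waves 1-2, 2-3, and 1-3
heise_coefs <- function(r12, r23, r13) {
  c(
    rel    = r12 * r23 / r13,        # reliability
    stab12 = r13 / r23,              # stability from wave 1 to wave 2
    stab23 = r13 / r12,              # stability from wave 2 to wave 3
    stab13 = r13^2 / (r12 * r23)     # stability from wave 1 to wave 3
  )
}

out <- heise_coefs(r12 = 0.6, r23 = 0.6, r13 = 0.45)
print(out)
```

With real data, the three inputs would come from cor() applied to the same variable measured at the three chosen waves. Note that stab13 equals the product of stab12 and stab23 under these formulas.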
This is a convenience function that checks whether an object is a panel_data object.
is_panel(x)
x |
Any object. |
data("WageData")
is_panel(WageData) # FALSE
wages <- panel_data(WageData, id = id, wave = t)
is_panel(wages) # TRUE
line_plot allows for flexible visualization of repeated measures variables from panel_data frames.
line_plot( data, var, id = NULL, wave = NULL, overlay = TRUE, show.points = TRUE, subset.ids = FALSE, n.random.subset = 9, add.mean = FALSE, mean.function = "lm", line.size = 1, alpha = if (overlay) 0.5 else 1 )
data |
Either a |
var |
The unquoted name of the variable of interest. |
id |
If |
wave |
If |
overlay |
Should the lines be plotted in the same panel or each in their own facet/panel? Default is TRUE, meaning they are plotted in the same panel. |
show.points |
Plot a point at each wave? Default is TRUE. |
subset.ids |
Plot only a subset of the entities' lines? Default is FALSE,
meaning plot all ids. If TRUE, a random subset (the number defined by
|
n.random.subset |
How many entities to randomly sample when |
add.mean |
Add a line representing the mean trend? Default is FALSE.
Cannot be combined with |
mean.function |
The mean function to supply to |
line.size |
The thickness of the plotted lines. Default: 1 |
alpha |
The transparency for the lines and points. When
|
The ggplot object.
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
line_plot(wages, lwage, add.mean = TRUE, subset.ids = TRUE, overlay = FALSE)
This function takes wide format panels as input and converts them to long format.
long_panel( data, prefix = NULL, suffix = NULL, begin = NULL, end = NULL, id = "id", wave = "wave", periods = NULL, label_location = c("end", "beginning"), as_panel_data = TRUE, match = ".*", use.regex = FALSE, check.varying = TRUE )
data |
The wide data frame. |
prefix |
What character(s) go before the period indicator? If none, set this argument to NULL. |
suffix |
What character(s) go after the period indicator? If none, set this argument to NULL. |
begin |
What is the label for the first period? Could be |
end |
What is the label for the final period? Could be |
id |
The name of the ID variable as a string. If there is no ID variable, then this will be the name of the newly-created ID variable. |
wave |
This will be the name of the newly-created wave variable. |
periods |
If your period indicator does not lie in a sequence or is
not understood by the function, then you can supply them as a vector
instead. For instance, you could give |
label_location |
Where does the period label go on the variable?
If the variables are labeled like |
as_panel_data |
Should the return object be a |
match |
The regex that will match the part of the variable names other
than the wave indicator. By default it will match any character any
amount of times. Sometimes you might know that the variable names should
start with a digit, for instance, and you might use |
use.regex |
Should the |
check.varying |
Should the function check to make sure that every variable in the wide data with a wave indicator is actually time-varying? Default is TRUE, meaning that a constant like "race_W1" only measured in wave 1 will be defined in each wave in the long data. With very large datasets, however, sometimes setting this to FALSE can save memory. |
There is no easy way to convert panel data from wide to long format because both formats are basically non-standard for other applications. This function can handle the common case in which the wide data frame has a regular labeling system for each period. The key thing is providing enough information for the function to understand the pattern.
In the end, this function calls stats::reshape() but should be easier to use and able to handle more situations, such as when the label occurs at the beginning of the variable name. Just as important, this function has built-in utilities to handle unbalanced data, that is, when variables occur more than once but not in every single period, which breaks stats::reshape().
Either a data.frame or panel_data frame.
## We need a wide data frame, so we will make one from the long-format
## data included in the package.
# Convert WageData to panel_data object
wages <- panel_data(WageData, id = id, wave = t)
# Convert wages to wide format
wide_wages <- widen_panel(wages)
# Note: wide_wages has variables in the following format:
# var1_1, var1_2, var1_3, var2_1, var2_2, var2_3, etc.
## Not run:
long_wages <- long_panel(wide_wages, prefix = "_", begin = 1, end = 7,
                         id = "id", label_location = "end")
## End(Not run)
# Note that in this case, the prefix and label_location arguments are
# the defaults but are included just for clarity.
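For comparison, the balanced and regularly labeled case can be handled with stats::reshape() directly, as in the base R sketch below. The toy wide data frame and its column names are hypothetical, and the particular arguments shown are one reasonable way to call reshape(), not a statement of how long_panel() calls it internally.

```r
# Toy wide panel (hypothetical data): one row per person, wks measured twice
wide <- data.frame(
  id    = 1:2,
  wks_1 = c(40, 35),
  wks_2 = c(42, 36),
  fem   = c(0, 1)      # time-invariant variable, carried along unchanged
)

# Stack wks_1 and wks_2 into a single wks column indexed by wave
long <- reshape(
  wide,
  direction = "long",
  varying   = c("wks_1", "wks_2"),
  v.names   = "wks",
  timevar   = "wave",
  times     = 1:2,
  idvar     = "id"
)

# Sort so each person's waves are adjacent
long <- long[order(long$id, long$wave), ]
print(long)
```

The caller has to spell out the varying columns, the stacked name, and the period values by hand, which is the bookkeeping that prefix, suffix, begin, and end are meant to take over.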
This is an interface to the internal functions that process data for fdm(), asym(), and asym_gee().
make_diff_data( formula, data, id = NULL, wave = NULL, use.wave = FALSE, min.waves = 1, weights = NULL, offset = NULL, asym = FALSE, cumulative = FALSE, escape.names = FALSE, ... )
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
weights |
If using weights, either the name of the column in the data that contains the weights or a vector of the weights. |
offset |
this can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be |
asym |
Return asymmetric effects transformed data? Default is FALSE. |
cumulative |
Return cumulative positive/negative differences, most useful for fixed effects estimation and/or generalized linear models? Default is FALSE. |
escape.names |
Return only syntactically valid variable names? Default is FALSE. |
... |
Ignored. |
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
make_diff_data(wks ~ lwage + union, data = wages)
This function allows users to make the changes to their data that occur in wbm() without having to fit the model.
make_wb_data( formula, data, id = NULL, wave = NULL, model = "w-b", detrend = FALSE, use.wave = FALSE, wave.factor = FALSE, min.waves = 2, balance.correction = FALSE, dt.random = TRUE, dt.order = 1, weights = NULL, offset = NULL, interaction.style = c("double-demean", "demean", "raw"), ... )
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
model |
One of |
detrend |
Adjust within-subject effects for trends in the predictors? Default is FALSE, but some research suggests this is a better idea (see Curran and Bauer (2011) reference). |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
wave.factor |
Should the wave variable be treated as an unordered factor instead of continuous? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must
have participated in to be included in the analysis? Default is |
balance.correction |
Correct between-subject effects for unbalanced panels following the procedure in Curran and Bauer (2011)? Default is FALSE. |
dt.random |
Should the detrending procedure be performed with a random slope for each entity? Default is TRUE but for short panels FALSE may be better, fitting a trend for all entities. |
dt.order |
If detrending using |
weights |
If using weights, either the name of the column in the data that contains the weights or a vector of the weights. |
offset |
this can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be |
interaction.style |
The best way to calculate interactions in within
models is in some dispute. The conventional way ( |
... |
Additional arguments provided to |
A panel_data object with the requested specification.
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
make_wb_data(lwage ~ wks + union | fem, data = wages)
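The core of the within-between transformation can be illustrated in base R: each time-varying predictor is split into a person mean (the between component) and the deviation from that mean (the within component). The toy data and the column names `wks_between` and `wks_within` are hypothetical and do not reflect the naming that make_wb_data() uses; options such as detrending and balance correction are not covered by this sketch.

```r
# Toy long-format panel (hypothetical data): 2 people, 2 waves each
toy <- data.frame(
  id   = c(1, 1, 2, 2),
  wave = c(1, 2, 1, 2),
  wks  = c(40, 44, 30, 34)
)

# Between component: each person's mean across waves (ave() defaults to mean)
toy$wks_between <- ave(toy$wks, toy$id)

# Within component: deviation of each observation from the person mean
toy$wks_within <- toy$wks - toy$wks_between

print(toy)
```

The within component sums to zero for each person, so it carries only over-time variation, while the between component carries only differences across people.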
This is similar to model.frame, but is designed specifically for panel_data() data frames. It's a workhorse in wbm() but may be useful in scripting use as well.
model_frame(formula, data)
formula |
A formula. Note that to get an individual-level mean with
incomplete data (e.g., panel attrition), you should use |
data |
A |
A panel_data() frame with only the columns needed to fit a model as described by the formula.
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model_frame(lwage ~ wks + exp, data = wages)
These data come from the years 1990-1994 in the National Longitudinal Survey of Youth, with information about 581 individuals. These data are in the "wide" format for demonstration purposes.
nlsy
A data frame with 581 rows and 16 variables:
Mother's age at birth
0 if boy, 1 if girl
1 if mother works, 0 if not
1 if parents are married, 0 if not
1 if child is Hispanic, 0 if not
1 if child is black, 0 if not
Child's age at first interview
A measure of antisocial behavior measured on a scale from 0 to 6, taken in 1990
A measure of antisocial behavior measured on a scale from 0 to 6, taken in 1992
A measure of antisocial behavior measured on a scale from 0 to 6, taken in 1994
A measure of self-esteem measured on a scale from 6 to 24, taken in 1990
A measure of self-esteem measured on a scale from 6 to 24, taken in 1992
A measure of self-esteem measured on a scale from 6 to 24, taken in 1994
1 if family is in poverty, 0 if not, in 1990
1 if family is in poverty, 0 if not, in 1992
1 if family is in poverty, 0 if not, in 1994
These data originate with the U.S. Department of Labor. The particular subset used here comes from Paul Allison via Statistical Horizons: https://statisticalhorizons.com/wp-content/uploads/nlsy.dta
wbm models
This S3 method allows you to retrieve either the number of observations or number of entities in the data used to fit wbm objects.
## S3 method for class 'wbm'
nobs(object, entities = TRUE, ...)
object |
A fitted model object. |
entities |
Should |
... |
Further arguments to be passed to methods. |
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks, data = wages)
nobs(model)
Format your data for use with panelr.
panel_data(data, id = id, wave = wave, ...)
as_pdata.frame(data)
as_panel_data(data, ...)
## Default S3 method:
as_panel_data(data, id = id, wave = wave, ...)
## S3 method for class 'pdata.frame'
as_panel_data(data, ...)
as_panel(data, ...)
data |
A data frame. |
id |
The name of the column (unquoted) that identifies
participants/entities. A new column will be created called |
wave |
The name of the column (unquoted) that identifies
waves or periods. A new column will be created called |
... |
Attributes for adding onto this method. See
|
A panel_data object.
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
These methods facilitate fairly straightforward predictions from wbgee models.
## S3 method for class 'wbgee'
predict(
  object,
  newdata = NULL,
  se.fit = FALSE,
  raw = FALSE,
  type = c("link", "response"),
  ...
)
object |
Object of class inheriting from |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
se.fit |
A switch indicating if standard errors are required. |
raw |
Is |
type |
Type of prediction (response or model term). Can be abbreviated. |
... |
further arguments passed to or from other methods. |
if (requireNamespace("geepack")) {
  data("WageData")
  wages <- panel_data(WageData, id = id, wave = t)
  model <- wbgee(lwage ~ lag(union) + wks, data = wages)
  # By default, assumes you're using the processed data for newdata
  predict(model)
}
These methods facilitate fairly straightforward predictions and simulations from wbm models.
## S3 method for class 'wbm'
predict(
  object,
  newdata = NULL,
  se.fit = FALSE,
  raw = FALSE,
  use.re.var = FALSE,
  re.form = NULL,
  type = c("link", "response"),
  allow.new.levels = TRUE,
  na.action = na.pass,
  ...
)
## S3 method for class 'wbm'
simulate(
  object,
  nsim = 1,
  seed = NULL,
  use.u = FALSE,
  newdata = NULL,
  raw = FALSE,
  newparams = NULL,
  re.form = NA,
  type = c("link", "response"),
  allow.new.levels = FALSE,
  na.action = na.pass,
  ...
)
object |
a fitted model object |
newdata |
data frame for which to evaluate predictions. |
se.fit |
Include standard errors with the predictions? Note that these standard errors by default include only fixed effects variance. See details for more info. Default is FALSE. |
raw |
Is |
use.re.var |
If |
re.form |
(formula, |
type |
character string - either |
allow.new.levels |
logical if new levels (or NA values) in
|
na.action |
|
... |
When |
nsim |
positive integer scalar - the number of responses to simulate. |
seed |
an optional seed to be used in |
use.u |
(logical) if |
newparams |
new parameters to use in evaluating predictions,
specified as in the |
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks, data = wages)
# By default, assumes you're using the processed data for newdata
predict(model)
summary method for panel_data objects.
## S3 method for class 'panel_data'
summary(object, ..., by.wave = TRUE, by.id = FALSE, skim_with = NULL)
object |
A |
... |
Optionally, unquoted variable names/expressions separated by
commas to be passed to |
by.wave |
(if |
by.id |
(if |
skim_with |
A closure from |
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
summary(wages, lwage, exp, wks)
These data come from the years 1979-1983 in the National Longitudinal Survey of Youth, with information about 1141 teenage women. These data are in the "wide" format for demonstration purposes.
teen_poverty
A data frame with 1141 rows and 28 variables:
Unique identifier for the respondent
Age at first interview
1 if subject is black, 0 if not
1 if subject is in poverty, 0 if not, at time 1
1 if subject is in poverty, 0 if not, at time 2
1 if subject is in poverty, 0 if not, at time 3
1 if subject is in poverty, 0 if not, at time 4
1 if subject is in poverty, 0 if not, at time 5
1 if subject has had a child, 0 if not, at time 1
1 if subject has had a child, 0 if not, at time 2
1 if subject has had a child, 0 if not, at time 3
1 if subject has had a child, 0 if not, at time 4
1 if subject has had a child, 0 if not, at time 5
1 if subject lives with a spouse, 0 if not, at time 1
1 if subject lives with a spouse, 0 if not, at time 2
1 if subject lives with a spouse, 0 if not, at time 3
1 if subject lives with a spouse, 0 if not, at time 4
1 if subject lives with a spouse, 0 if not, at time 5
1 if subject is in school, 0 if not, at time 1
1 if subject is in school, 0 if not, at time 2
1 if subject is in school, 0 if not, at time 3
1 if subject is in school, 0 if not, at time 4
1 if subject is in school, 0 if not, at time 5
Hours worked during the week of the survey, at time 1
Hours worked during the week of the survey, at time 2
Hours worked during the week of the survey, at time 3
Hours worked during the week of the survey, at time 4
Hours worked during the week of the survey, at time 5
These data originate with the U.S. Department of Labor. The particular subset used here comes from Paul Allison via Statistical Horizons: https://statisticalhorizons.com/wp-content/uploads/teenpov.dta
fdm and asym models
panelr provides methods to access fdm and asym data in a tidy format.
## S3 method for class 'asym'
tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'fdm'
tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'fdm'
glance(x, ...)
x |
An |
conf.int |
Logical indicating whether or not to include a confidence
interval in the tidied output. Defaults to |
conf.level |
The confidence level to use for the confidence interval if
|
... |
Ignored |
if (requireNamespace("clubSandwich")) {
  data("WageData")
  wages <- panel_data(WageData, id = id, wave = t)
  model <- fdm(lwage ~ wks + union, data = wages)
  if (requireNamespace("generics")) {
    generics::tidy(model)
  }
}
wbgee models
panelr provides methods to access wbgee data in a tidy format.
## S3 method for class 'asym_gee'
tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'wbgee'
tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'wbgee'
glance(x, ...)
x |
A |
conf.int |
Logical indicating whether or not to include a confidence
interval in the tidied output. Defaults to |
conf.level |
The confidence level to use for the confidence interval if
|
... |
Ignored |
if (requireNamespace("geepack")) {
  data("WageData")
  wages <- panel_data(WageData, id = id, wave = t)
  model <- wbgee(lwage ~ lag(union) + wks, data = wages)
  if (requireNamespace("generics")) {
    generics::tidy(model)
  }
}
Tidy methods for wbm models
panelr provides methods to access wbm data in a tidy format.
## S3 method for class 'wbm'
tidy(
  x,
  conf.int = FALSE,
  conf.level = 0.95,
  effects = c("fixed", "ran_pars"),
  conf.method = "Wald",
  ran_prefix = NULL,
  ...
)

## S3 method for class 'wbm'
glance(x, ...)

## S3 method for class 'summ.wbm'
glance(x, ...)

## S3 method for class 'summ.wbm'
tidy(x, ...)
x |
An object of class |
conf.int |
whether to include a confidence interval |
conf.level |
confidence level for CI |
effects |
A character vector including one or more of "fixed"
(fixed-effect parameters); "ran_pars" (variances and covariances or
standard deviations and correlations of random effect terms);
"ran_vals" (conditional modes/BLUPs/latent variable estimates); or
"ran_coefs" (predicted parameter values for each group, as returned by
|
conf.method |
method for computing confidence intervals (see |
ran_prefix |
a length-2 character vector specifying the strings to use as prefixes for self- (variance/standard deviation) and cross- (covariance/correlation) random effects terms |
... |
Additional arguments (passed to |
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks, data = wages)
if (requireNamespace("broom.mixed")) {
  broom.mixed::tidy(model)
}
This convenience function removes the special features of panel_data.
unpanel(panel)
panel |
A panel_data object. |
An ungrouped tibble.
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
wages_non_panel <- unpanel(wages)
These data come from the years 1976-1982 in the Panel Study of Income Dynamics (PSID), with information about the demographics and earnings of 595 individuals.
WageData
A data frame with 4165 rows and 14 variables:
id
Unique identifier for each survey respondent
t
A number corresponding to each wave of the survey, 1 through 7
wks
Weeks worked in the past year
lwage
Natural logarithm of earnings in the past year
union
Binary indicator for whether respondent is a member of a union (1 = union member)
ms
Binary indicator for whether respondent is married (1 = married)
occ
Binary indicator for whether respondent is a blue collar (= 0) or white collar (= 1) worker
ind
Binary indicator for whether respondent works in manufacturing (= 1)
south
Binary indicator for whether respondent lives in the South (= 1)
smsa
Binary indicator for whether respondent lives in a standard metropolitan area (SMSA; = 1)
fem
Binary indicator for whether respondent is female (= 1)
blk
Binary indicator for whether respondent is African-American (= 1)
ed
Years of education
exp
Years in the workforce
These data are distributed through many sources. This particular file was downloaded from Richard Williams at https://www3.nd.edu/~rwilliam/statafiles/wages.dta, though he doesn't claim ownership of these data.
The data were shared as a supplement to Baltagi (2005) at https://www.wiley.com/legacy/wileychi/baltagi3e/data_sets.html.
They were also shared as a supplement to Greene (2008) at https://pages.stern.nyu.edu/~wgreene/Text/Edition6/tablelist6.htm.
The data are also available in numerous other locations, including in slightly different formats as Wages in the plm package and PSID7682 in the AER package.
Fit "within-between" and several other regression variants for panel data via generalized estimating equations.
wbgee( formula, data, id = NULL, wave = NULL, model = "w-b", cor.str = c("ar1", "exchangeable", "unstructured"), detrend = FALSE, use.wave = FALSE, wave.factor = FALSE, min.waves = 2, family = gaussian, balance.correction = FALSE, dt.random = TRUE, dt.order = 1, weights = NULL, offset = NULL, interaction.style = c("double-demean", "demean", "raw"), scale = FALSE, scale.response = FALSE, n.sd = 1, calc.fit.stats = TRUE, ... )
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
model |
One of |
cor.str |
Any correlation structure accepted by |
detrend |
Adjust within-subject effects for trends in the predictors? Default is FALSE, but some research suggests this is a better idea (see Curran and Bauer (2011) reference). |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
wave.factor |
Should the wave variable be treated as an unordered factor instead of continuous? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must have participated in to be included in the analysis? Default is 2. |
family |
Use this to specify GLM link families. Default is gaussian. |
balance.correction |
Correct between-subject effects for unbalanced panels following the procedure in Curran and Bauer (2011)? Default is FALSE. |
dt.random |
Should the detrending procedure be performed with a random slope for each entity? Default is TRUE but for short panels FALSE may be better, fitting a trend for all entities. |
dt.order |
If detrending using |
weights |
If using weights, either the name of the column in the data that contains the weights or a vector of the weights. |
offset |
this can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be |
interaction.style |
The best way to calculate interactions in within
models is in some dispute. The conventional way ( |
scale |
If |
scale.response |
Should the response variable also be rescaled? Default is FALSE. |
n.sd |
How many standard deviations should you divide by for standardization? Default is 1, though some prefer 2. |
calc.fit.stats |
Calculate fit statistics? Default is TRUE, but occasionally poor-fitting models might trip up here. |
... |
Additional arguments provided to |
See the documentation for wbm() for many details on formula syntax and other arguments.
A wbgee object, which inherits from geeglm.
Jacob A. Long
Allison, P. (2009). Fixed effects regression models. Thousand Oaks, CA: SAGE Publications. https://doi.org/10.4135/9781412993869.d33
Bell, A., & Jones, K. (2015). Explaining fixed effects: Random effects modeling of time-series cross-sectional and panel data. Political Science Research and Methods, 3, 133–153. https://doi.org/10.1017/psrm.2014.7
Curran, P. J., & Bauer, D. J. (2011). The disaggregation of within-person and between-person effects in longitudinal models of change. Annual Review of Psychology, 62, 583–619. https://doi.org/10.1146/annurev.psych.093008.100356
Giesselmann, M., & Schmidt-Catran, A. W. (2020). Interactions in fixed effects regression models. Sociological Methods & Research, 1–28. https://doi.org/10.1177/0049124120914934
McNeish, D. (2019). Effect partitioning in cross-sectionally clustered data without multilevel models. Multivariate Behavioral Research, Advance online publication. https://doi.org/10.1080/00273171.2019.1602504
McNeish, D., Stapleton, L. M., & Silverman, R. D. (2016). On the unnecessary ubiquity of hierarchical linear modeling. Psychological Methods, 22, 114–140. https://doi.org/10.1037/met0000078
Schunck, R., & Perales, F. (2017). Within- and between-cluster effects in generalized linear mixed models: A discussion of approaches and the xthybrid command. The Stata Journal, 17, 89–115. https://doi.org/10.1177/1536867X1701700106
if (requireNamespace("geepack")) {
  data("WageData")
  wages <- panel_data(WageData, id = id, wave = t)
  model <- wbgee(lwage ~ lag(union) + wks | blk + fem | blk * lag(union),
                 data = wages)
  summary(model)
}
Fit "within-between" and several other regression variants for panel data in a multilevel modeling framework.
wbm( formula, data, id = NULL, wave = NULL, model = "w-b", detrend = FALSE, use.wave = FALSE, wave.factor = FALSE, min.waves = 2, family = gaussian, balance.correction = FALSE, dt.random = TRUE, dt.order = 1, pR2 = TRUE, pvals = TRUE, t.df = "Satterthwaite", weights = NULL, offset = NULL, interaction.style = c("double-demean", "demean", "raw"), scale = FALSE, scale.response = FALSE, n.sd = 1, dt_random = dt.random, dt_order = dt.order, balance_correction = balance.correction, ... )
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
model |
One of |
detrend |
Adjust within-subject effects for trends in the predictors? Default is FALSE, but some research suggests this is a better idea (see Curran and Bauer (2011) reference). |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
wave.factor |
Should the wave variable be treated as an unordered factor instead of continuous? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must have participated in to be included in the analysis? Default is 2. |
family |
Use this to specify GLM link families. Default is gaussian. |
balance.correction |
Correct between-subject effects for unbalanced panels following the procedure in Curran and Bauer (2011)? Default is FALSE. |
dt.random |
Should the detrending procedure be performed with a random slope for each entity? Default is TRUE but for short panels FALSE may be better, fitting a trend for all entities. |
dt.order |
If detrending using |
pR2 |
Calculate a pseudo R-squared? Default is TRUE, but in some cases may cause errors or add computation time. |
pvals |
Calculate p values? Default is TRUE but for some complex
linear models, this may take a long time to compute using the |
t.df |
For linear models only. User may choose the method for
calculating the degrees of freedom in t-tests. Default is
|
weights |
If using weights, either the name of the column in the data that contains the weights or a vector of the weights. |
offset |
this can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be |
interaction.style |
The best way to calculate interactions in within
models is in some dispute. The conventional way ( |
scale |
If |
scale.response |
Should the response variable also be rescaled? Default is FALSE. |
n.sd |
How many standard deviations should you divide by for standardization? Default is 1, though some prefer 2. |
dt_random |
Deprecated. Equivalent to dt.random. |
dt_order |
Deprecated. Equivalent to dt.order. |
balance_correction |
Deprecated. Equivalent to balance.correction. |
... |
Additional arguments provided to |
Formula syntax
The within-between models, and multilevel panel models more generally, distinguish between time-varying and time-invariant predictors. Time-varying predictors are measured repeatedly (in every wave), while time-invariant predictors are measured only once. You need to specify these separately in the formula to tell the model which variables you expect to change over time and which will not. The primary way of doing so is via the | operator.
As an example, we can look at the WageData included in this package. We will create a model that predicts the logarithm of the individual's wages (lwage) with their union status (union), which can change over time, and their race (blk; dichotomized as black or non-black), which does not change throughout the period of study. Our formula will look like this:
lwage ~ union | blk
Put time-varying variables before the first | and time-invariant variables afterwards. You can specify lags like lag(union) for time-varying variables; for more than 1 lag, include the number: lag(union, 2).
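To make the idea of a lagged time-varying predictor concrete, here is a minimal base R sketch of a within-individual lag on a toy data frame. The helper lag_within() and the toy data are hypothetical and are not part of panelr. The sketch assumes rows are sorted by wave within each id and that no waves are missing; panelr's own handling of lag() in formulas may differ.

```r
# Toy long-format panel: 2 individuals, 3 waves each (hypothetical data).
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  t     = c(1, 2, 3, 1, 2, 3),
  union = c(0, 1, 1, 1, 0, 0)
)

# Hypothetical helper: shift x down by n rows within each id,
# padding the first n waves of every individual with NA.
lag_within <- function(x, id, n = 1) {
  ave(x, id, FUN = function(v) c(rep(NA, n), v)[seq_along(v)])
}

toy$union_lag1 <- lag_within(toy$union, toy$id)         # analogous to lag(union)
toy$union_lag2 <- lag_within(toy$union, toy$id, n = 2)  # analogous to lag(union, 2)
toy
```

The point of lagging within id is that the first wave of one individual never borrows a value from the last wave of the previous individual.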
After the first | go the time-invariant variables. Note that if you put a time-varying variable here, what you get is the observed value rather than one adjusted to isolate within-entity effects. You may also take a time-varying variable, say weeks worked (wks), and use imean(wks) to include the individual's mean across all waves as a predictor while omitting the per-wave measures.
There is also a place for a second |. Here you can specify cross-level interactions (within-level interactions can be specified here as well). If I wanted the interaction term for union and blk (to see whether the effect of union status depended on one's race), I would specify the formula this way:
lwage ~ union | blk | union * blk
Another use for the post-second | section of the formula is for changing the random effects specification. By default, only a random intercept is specified in the call to lme4::lmer()/lme4::glmer(). If you would like to specify other random slopes, include them here using the typical lme4 syntax:
lwage ~ union | blk | (union | id)
You can also include the wave variable in a random effects term to specify a latent growth curve model:
lwage ~ union | blk + t | (t | id)
One last thing to know: If you want to use the second | but not the first, put a 1 or 0 after the first, like this:
lwage ~ union | 1 | (union | id)
Of course, with no time-invariant variables, you need no | operators at all.
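The three-part layout of these formulas can be inspected with plain R, since a formula is just an unevaluated call. The sketch below is illustrative only: split_bars() and describe_parts() are hypothetical helpers, not panelr functions, and panelr's internal formula parsing may work differently.

```r
# Hypothetical helper: walk the right-hand side of a formula and split it
# at each top-level `|`. Parenthesised terms such as (union | id) stay whole
# because their outermost call is `(`, not `|`.
split_bars <- function(expr) {
  if (is.call(expr) && identical(expr[[1]], as.name("|"))) {
    c(split_bars(expr[[2]]), list(expr[[3]]))
  } else {
    list(expr)
  }
}

# Hypothetical helper: label each part by its position in the formula.
describe_parts <- function(f) {
  parts <- vapply(
    split_bars(f[[3]]),
    function(e) paste(deparse(e), collapse = " "),
    character(1)
  )
  labels <- c("time-varying", "time-invariant", "interactions / random effects")
  setNames(parts, labels[seq_along(parts)])
}

describe_parts(lwage ~ union | blk | union * blk)
describe_parts(lwage ~ union | 1 | (union | id))
describe_parts(lwage ~ union)
```

Because `|` binds more loosely than `+` and `*`, terms such as lag(union) + wks stay together within their part without extra parentheses.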
Models
As a convenience, wbm does the heavy lifting for specifying the within-between model correctly. As a side effect it only takes a few easy tweaks to specify the model slightly differently. You can change this behavior with the model argument.
By default, the argument is "w-b" (equivalently, "within-between").
This means, for each time-varying predictor, you have two types of
variables in the model. The "between" effect is represented by the
individual-level mean for each entity (e.g., each respondent to a panel
survey). The "within" effect is represented by each wave's measure with
the individual-level mean subtracted. Some refer to this as "de-meaning."
Thinking in a Hausman test framework — with the within-between model as
described here — you should expect the within and between
coefficients to be the same if a random effects model were appropriate.
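The de-meaning described above can be sketched in base R on simulated data. Everything in this block (the simulated panel, the variable names, and the use of pooled lm() instead of a multilevel model) is an assumption made for illustration; it is not how wbm() is implemented, and it omits the random intercept that wbm() fits.

```r
set.seed(1)

# Hypothetical simulated panel: 50 individuals observed over 5 waves.
n_id <- 50
n_wave <- 5
id <- rep(seq_len(n_id), each = n_wave)
u  <- rep(rnorm(n_id), each = n_wave)   # unobserved individual effect
x  <- 0.8 * u + rnorm(n_id * n_wave)    # predictor correlated with u
y  <- 1 + 0.5 * x + u + rnorm(n_id * n_wave)

# "Between" component: each individual's mean of x across waves.
x_between <- ave(x, id)
# "Within" component: each wave's value minus the individual's mean.
x_within <- x - x_between

# Pooled OLS with both components.
wb <- lm(y ~ x_within + x_between)

# Classic fixed effects estimate via individual dummies, for comparison.
fe <- lm(y ~ x + factor(id))

coef(wb)[c("x_within", "x_between")]
coef(fe)["x"]
```

In this sketch the coefficient on x_within matches the dummy-variable fixed effects estimate, because the de-meaned predictor is orthogonal to anything that is constant within an individual.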
The contextual model is very similar (use argument "contextual"). In some situations, this will be more intuitive to interpret. Empirically, the only difference compared to the within-between specification is that the contextual model does not subtract the individual-level means from the wave-level measures. This also changes the interpretation of the between-subject coefficients: In the contextual model, they are the difference between the within and between effects. If there's no difference between within and between effects, then the coefficients will be 0.
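The relationship between the two specifications follows from a short rearrangement. The notation here is an assumption introduced for illustration: $x_{it}$ is a time-varying predictor for individual $i$ at wave $t$, $\bar{x}_i$ is that individual's mean, $u_i$ is the random intercept, and $\beta_W$ and $\beta_B$ are the within and between coefficients.

```latex
% Within-between specification
y_{it} = \beta_0 + \beta_W (x_{it} - \bar{x}_i) + \beta_B \bar{x}_i + u_i + \varepsilon_{it}

% Expanding the de-meaned term gives the contextual specification
y_{it} = \beta_0 + \beta_W x_{it} + (\beta_B - \beta_W)\,\bar{x}_i + u_i + \varepsilon_{it}
```

Under this notation the coefficient on $\bar{x}_i$ in the contextual model is $\beta_B - \beta_W$, which is 0 exactly when the within and between effects coincide.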
To fit a random effects model, use either "between" or "random". This involves no de-meaning and no individual-level means whatsoever.
To fit a fixed effects model, use either "within" or "fixed". Any between-subjects terms in the formula will be ignored. The time-varying variables will be de-meaned, but the individual-level mean is not included in the model.
A wbm object, which inherits from merMod.
Jacob A. Long
Allison, P. (2009). Fixed effects regression models. Thousand Oaks, CA: SAGE Publications. https://doi.org/10.4135/9781412993869.d33
Bell, A., & Jones, K. (2015). Explaining fixed effects: Random effects modeling of time-series cross-sectional and panel data. Political Science Research and Methods, 3, 133–153. https://doi.org/10.1017/psrm.2014.7
Curran, P. J., & Bauer, D. J. (2011). The disaggregation of within-person and between-person effects in longitudinal models of change. Annual Review of Psychology, 62, 583–619. https://doi.org/10.1146/annurev.psych.093008.100356
Giesselmann, M., & Schmidt-Catran, A. (2018). Interactions in fixed effects regression models (Discussion Papers of DIW Berlin No. 1748). DIW Berlin, German Institute for Economic Research. Retrieved from https://ideas.repec.org/p/diw/diwwpp/dp1748.html
Schunck, R., & Perales, F. (2017). Within- and between-cluster effects in generalized linear mixed models: A discussion of approaches and the xthybrid command. The Stata Journal, 17, 89–115. https://doi.org/10.1177/1536867X1701700106
wbm_stan() for a Bayesian estimation option.
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks | blk + fem | blk * lag(union),
             data = wages)
summary(model)
A near-equivalent of wbm() that instead uses Stan, via rstan and brms.
wbm_stan( formula, data, id = NULL, wave = NULL, model = "w-b", detrend = FALSE, use.wave = FALSE, wave.factor = FALSE, min.waves = 2, model.cor = FALSE, family = gaussian, fit_model = TRUE, balance.correction = FALSE, dt.random = TRUE, dt.order = 1, chains = 3, iter = 2000, scale = FALSE, save_ranef = FALSE, interaction.style = c("double-demean", "demean", "raw"), weights = NULL, offset = NULL, ... )
formula |
Model formula. See details for crucial
info on |
data |
The data, either a |
id |
If |
wave |
If |
model |
One of |
detrend |
Adjust within-subject effects for trends in the predictors? Default is FALSE, but some research suggests this is a better idea (see Curran and Bauer (2011) reference). |
use.wave |
Should the wave be included as a predictor? Default is FALSE. |
wave.factor |
Should the wave variable be treated as an unordered factor instead of continuous? Default is FALSE. |
min.waves |
What is the minimum number of waves an individual must have participated in to be included in the analysis? Default is 2. |
model.cor |
Do you want to model residual autocorrelation?
This is often appropriate for linear models ( |
family |
Use this to specify GLM link families. Default is gaussian. |
fit_model |
Fit the model? Default is TRUE. If FALSE, only the model code is returned. |
balance.correction |
Correct between-subject effects for unbalanced panels following the procedure in Curran and Bauer (2011)? Default is FALSE. |
dt.random |
Should the detrending procedure be performed with a random slope for each entity? Default is TRUE but for short panels FALSE may be better, fitting a trend for all entities. |
dt.order |
If detrending using |
chains |
How many Markov chains should be used? Default is 3, to leave you with one unused thread if you're on a typical dual-core machine. |
iter |
How many iterations, including warmup? Default is 2000, leaving 1000 per chain after warmup. For some models and data, you may need quite a few more. |
scale |
Standardize predictors? This can speed up model fit. Default is FALSE. |
save_ranef |
Save random effect estimates? This can be crucial for predicting from the model and for certain post-estimation procedures. On the other hand, it drastically increases the size of the resulting model. Default is FALSE. |
interaction.style |
The best way to calculate interactions in within
models is in some dispute. The conventional way ( |
weights |
If using weights, either the name of the column in the data that contains the weights or a vector of the weights. |
offset |
this can be used to specify an a priori known
component to be included in the linear predictor during
fitting. This should be |
... |
Additional arguments passed on to |
See wbm() for details on the formula syntax, model types, and other shared arguments.
A wbm_stan object, which is a list containing a model object with the brm model and a stan_code object with the model code.
If fit_model = FALSE, instead a list is returned containing a stan_code object and a stan_data object, leaving you with the tools you need to run the model yourself using rstan.
Jacob A. Long
## Not run:
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm_stan(lwage ~ lag(union) + wks | blk + fem | blk * lag(union),
                  data = wages, chains = 1, iter = 2000)
summary(model)
## End(Not run)
Within-Between Model (wbm) class
Models fit using wbm() return values of this class, which inherits from merMod-class.
call_info
A list of metadata about the arguments used.
call
The actual function call.
summ
The jtools::summ() object returned from calling it on the merMod object.
summ_atts
The attributes of the summ object.
orig_data
The data provided to the data argument in the function call.
This function takes panel_data() objects as input and converts them to wide format for use in SEM and other situations when such a format is needed.
widen_panel(data, separator = "_", ignore.attributes = FALSE, varying = NULL)
data |
The |
separator |
When the variables are labeled with the wave number,
what should separate the variable name and wave number? By default,
it is "_". In other words, a variable named |
ignore.attributes |
If the |
varying |
If you want to skip the checks for whether variables are
varying and specify yourself, as is done with |
This is a wrapper for stats::reshape(), which is renowned for being pretty confusing to use. This function automatically detects which of the variables vary over time and which don't, not appending wave information to constants.
A data.frame with 1 row per respondent.
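For comparison, here is roughly what the same reshaping looks like when done by hand with stats::reshape() on a toy data frame. The data and column names are hypothetical, and this is only a sketch of the underlying base R call, not the code widen_panel() runs; note that the time-varying variables must be listed manually in v.names.

```r
# Hypothetical long-format panel: 3 respondents, 2 waves each.
long <- data.frame(
  id  = rep(1:3, each = 2),
  t   = rep(1:2, times = 3),
  wks = c(40, 42, 35, 36, 50, 48),  # varies over time
  fem = rep(c(0, 1, 0), each = 2)   # constant within respondent
)

# Declare by hand which variables vary; everything else is treated as constant.
wide <- reshape(
  long,
  direction = "wide",
  idvar     = "id",
  timevar   = "t",
  v.names   = "wks",
  sep       = "_"
)

names(wide)  # id, fem, and one wks column per wave
nrow(wide)   # one row per respondent
```

The constant variable fem is carried over once, without a wave suffix, which is the behavior widen_panel() automates by detecting varying variables for you.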
wages <- panel_data(WageData, id = id, wave = t)
wide_wages <- widen_panel(wages)