More support for Bayesian analysis in the sj!-packages #rstats #rstan #brms

Another quick preview of my R-packages, especially sjPlot, which now also supports brmsfit-objects from the great brms-package. To demonstrate the new features, I load all my "core" packages at once, using the strengejacke-package, which is only available from GitHub. This package simply loads four packages (sjlabelled, sjmisc, sjstats and sjPlot).

First, I fit three sample models: one with brms and two with rstanarm:

# install the "strengejacke" package from GitHub:
# devtools::install_github("strengejacke/strengejacke")
library(strengejacke)
library(ggplot2)
library(lme4)
library(glmmTMB)
library(brms)
library(rstanarm)

data(Owls)
m1 <- brm(
  SiblingNegotiation ~ FoodTreatment + ArrivalTime + SexParent + (1 | Nest),
  data = Owls,
  family = zero_inflated_poisson(link = "log", link_zi = "logit")
)
m2 <- stan_glmer.nb(
  SiblingNegotiation ~ FoodTreatment + ArrivalTime + SexParent + (1 | Nest),
  data = Owls
)
m3 <- stan_lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

Next, some summary statistics, starting with the highest density interval (HDI), obtained from the hdi()-function. You can define the probability of the interval; here, I simply use the default (the 90% HDI).

hdi(m1)
#> # A tibble: 6 x 3
#>                      term       hdi.low      hdi.high
#> 1             b_Intercept  3.798693e+00  4.584998e+00
#> 2 b_FoodTreatmentSatiated -2.795021e-01 -1.586102e-01
#> 3           b_ArrivalTime -9.768530e-02 -6.616026e-02
#> 4         b_SexParentMale -7.766154e-02  3.794376e-02
#> 5                      zi  2.304234e-01  2.900676e-01
#> 6                    lp__ -2.020651e+03 -2.002387e+03

hdi(m2)
#> # A tibble: 5 x 3
#>                    term     hdi.low    hdi.high
#> 1           (Intercept)  3.82688888  5.93003148
#> 2 FoodTreatmentSatiated -0.86327579 -0.51376279
#> 3           ArrivalTime -0.15414266 -0.07134095
#> 4         SexParentMale -0.09371495  0.24440447
#> 5 reciprocal_dispersion  0.77345433  1.00889671
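
A different interval width can be requested explicitly. A minimal sketch, assuming the argument controlling the interval probability is named prob:

# request the 95% HDI instead of the default 90%
# (the prob-argument name is an assumption)
hdi(m2, prob = .95)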

Next, I implemented my own tidy-function, similar to broom’s tidy(), called tidy_stan(). Unlike broom’s tidy-function, tidy_stan() computes the HDI instead of credible intervals, and you may return fixed effects only, random effects only, or both. Furthermore, tidy_stan() also returns the ratio of effective samples (n_eff) and the Rhat statistic.

tidy_stan(m1)
#> # A tibble: 6 x 7
#>                      term  estimate std.error   hdi.low  hdi.high n_eff  Rhat
#> 1             b_Intercept     4.206     0.241     3.819     4.581 1.000 1.000
#> 2 b_FoodTreatmentSatiated    -0.220     0.037    -0.279    -0.162 1.000 1.000
#> 3           b_ArrivalTime    -0.082     0.009    -0.096    -0.065 1.000 1.000
#> 4         b_SexParentMale    -0.021     0.037    -0.077     0.036 1.000 1.000
#> 5                      zi     0.259     0.019     0.230     0.288 1.000 0.999
#> 6                    lp__ -2011.050     5.537 -2020.651 -2002.925 0.186 1.001

tidy_stan(m2)
#> # A tibble: 5 x 7
#>                    term estimate std.error hdi.low hdi.high n_eff  Rhat
#> 1           (Intercept)    4.873     0.652   3.847    5.880     1 1.000
#> 2 FoodTreatmentSatiated   -0.693     0.112  -0.860   -0.522     1 1.000
#> 3           ArrivalTime   -0.115     0.025  -0.154   -0.074     1 1.000
#> 4         SexParentMale    0.071     0.102  -0.088    0.241     1 0.999
#> 5 reciprocal_dispersion    0.883     0.073   0.776    1.005     1 1.000
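
As a sketch of the fixed/random option mentioned above (the type-argument and its values are assumptions on my part, not verified against the package documentation):

# return random effects only; argument name and value are assumed
tidy_stan(m3, type = "random")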

Finally, the new and generic plot_model() function in sjPlot also supports brmsfit or stanreg objects, and plots estimates, random effects and marginal effects.

The coefficients-plot was already shown in a previous post. It shows the 50% and 89% HDI and the posterior median. The style can be changed using the bpe and bpe.style arguments.

theme_set(theme_sjplot())
plot_model(m1, axis.lim = c(.6, 1.4))
plot_model(m2, bpe = "mean", bpe.style = "dot", colors = "grey30")

Coefficients, brms-model

Coefficients, rstanarm-model

Random effects are displayed in a similar way. For models with random slopes and intercepts, random effects are displayed in a grid layout (use grid = FALSE to create a separate plot for each random effect).

plot_model(m1, type = "re")
plot_model(m3, type = "re")

Random effects, brms-model

Random effects, rstanarm-model
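
To create a separate plot for each random effect instead of the grid layout, set grid = FALSE, as mentioned above:

plot_model(m3, type = "re", grid = FALSE)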

Marginal effects in sjPlot are based on the ggeffects-package. By default, predictions are computed with rstantools::posterior_linpred().

plot_model(m1, type = "pred", terms = c("ArrivalTime", "FoodTreatment"))
plot_model(m2, type = "pred", terms = c("ArrivalTime", "FoodTreatment"))

Marginal effects, brms-model

Marginal effects, rstanarm-model

The work on sjPlot for the next CRAN release is almost done… I hope to submit the sjPlot-update in the course of the next week! In the meantime, you may try out the new features using the GitHub-versions of my R-packages – at least sjstats and ggeffects are required.
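
For reference, installing the GitHub versions could look like this (assuming the repositories live under the same GitHub account as strengejacke):

# install the development versions from GitHub
devtools::install_github("strengejacke/sjstats")
devtools::install_github("strengejacke/ggeffects")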


Marginal Effects for Regression Models in R #rstats #dataviz

Regression coefficients are typically presented in tables, which are easy to understand. Sometimes, however, estimates are difficult to interpret. This is especially true for interaction terms or transformed terms (quadratic or cubic terms, polynomials, splines), in particular in more complex models. In such cases, coefficients are no longer directly interpretable, and marginal effects are far easier to understand. In particular, visualizing marginal effects makes it possible to intuitively grasp how predictors and outcome are associated, even for complex models.

The ggeffects-package (Lüdecke 2018) aims at easily calculating marginal effects for a broad range of regression models, from classical models fitted with lm() or glm() to complex mixed models fitted with lme4 and glmmTMB, and even Bayesian models from brms and rstanarm. The goal of the ggeffects-package is to provide a simple, user-friendly interface for calculating marginal effects, which is mainly achieved by one function: ggpredict(). Independent of the type of regression model, the output is always the same: a data frame with a consistent structure.

The idea behind this function is to compute (and visualize) the relationship between a model predictor (independent variable) and the model response (dependent variable). The predictor of interest needs to be specified in the terms-argument.

data(mtcars)
m <- lm(mpg ~ hp + wt + cyl + am, data = mtcars)
ggpredict(m, "cyl")
#> # A tibble: 3 x 5
#>       x predicted conf.low conf.high group
#> 1     4      21.7     19.1      24.4 1    
#> 2     6      20.2     19.3      21.1 1    
#> 3     8      18.7     16.5      21.0 1

The relationship can be differentiated depending on further predictors, which is useful, e.g., for interaction terms. Up to two further predictors that indicate the "grouping" structure can be used to calculate marginal effects. The names of these predictors need to be passed as a character vector to ggpredict().

m <- lm(mpg ~ wt * cyl + am + wt + cyl, data = mtcars)
p <- ggpredict(m, c("wt", "cyl"))
p
#> # A tibble: 27 x 5
#>        x predicted conf.low conf.high group
#>  1   1.5      31.7     28.5      34.9 4    
#>  2   1.5      26.3     23.5      29.2 6    
#>  3   1.5      21.0     16.8      25.1 8    
#>  4   2        28.7     26.6      30.8 4    
#>  5   2        24.2     22.2      26.3 6    
#>  6   2        19.8     16.4      23.1 8    
#>  7   2.5      25.7     24.3      27.2 4    
#>  8   2.5      22.1     20.7      23.5 6    
#>  9   2.5      18.5     15.9      21.2 8    
#> 10   3        22.8     20.8      24.7 4    
#> # ... with 17 more rows

There’s a plot()-method, based on ggplot2:

plot(p)
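
Since plot() builds on ggplot2 and, as far as I can tell, returns a ggplot-object, the result can be customized further with the usual ggplot2 functions:

# add a title to the marginal effects plot
plot(p) + ggplot2::ggtitle("Predicted mpg by weight and cylinders")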

The simple approach of ggpredict() can be used for all supported regression models. Thus, when calculating marginal effects with ggpredict(), it makes no difference whether the model is a simple linear model, a negative binomial multilevel model, a cumulative link model, etc. In the case of cumulative link models, ggpredict() automatically takes care of proper grouping, here for the different levels of the response variable:

library(MASS)
library(ordinal)
data(housing)
m <- clm(Sat ~ Type * Cont + Infl, weights = Freq, data = housing)
p <- ggpredict(m, c("Cont", "Type"))
plot(p)

ggeffects also makes it easy to calculate marginal effects at specific levels of other predictors. This is particularly useful for interaction effects with continuous variables. In the following example, both variables of the interaction term have a large range of values, which obscures the moderating effect:

m <- lm(mpg ~ wt * hp + am + wt, data = mtcars)
p <- ggpredict(m, c("hp", "wt"))
plot(p)

However, you can directly specify certain values at which marginal effects should be calculated, or use "shortcuts" that compute convenient values, like mean +/- 1 SD:

p <- ggpredict(m, c("hp", "wt [meansd]"))
plot(p)
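
Instead of the [meansd]-shortcut, specific values can also be written directly inside the brackets:

# compute predictions at hand-picked values of the moderator
p <- ggpredict(m, c("hp", "wt [2.5, 3.5, 4.5]"))
plot(p)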

The latest update of ggeffects on CRAN introduced some new features. There is a dedicated website that describes all the details of this package, including some vignettes with lots of examples.

Marginal Effects for (mixed effects) regression models #rstats

ggeffects (CRAN, website) is a package that computes marginal effects at the mean (MEMs) or representative values (MERs) for many different models, including mixed effects or Bayesian models. One of the advantages of the package is its easy-to-use interface: No matter if you fit a simple or complex model, with interactions or splines, the function call is always the same. This also holds true for the returned output, which is always a data frame with the same, consistent column names.

The latest package update introduced some new features I want to describe here: a revised print()-method as well as a new option to plot marginal effects at different levels of random effects in mixed models…

The new print()-method

The former print()-method simply showed the first predicted values, including confidence intervals. For numeric predictors with many values you could, for instance, only see the first 10 of more than 100 predicted values. While it makes sense to shorten the console output, there was no information about the predictions for the last, or other "representative", values of the term in question. Now, the print()-method automatically prints a selection of representative values, so you get a quick and clean impression of the range of predicted values for continuous variables:

library(ggeffects)
data(efc)
efc$c172code <- as.factor(efc$c172code)
fit <- lm(barthtot ~ c12hour * c172code + neg_c_7, data = efc)

ggpredict(fit, "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX 
#> # x = average number of hours of care per week 
#> 
#>    x predicted std.error conf.low conf.high
#>    0    72.804     2.516   67.872    77.736
#>   20    68.060     2.097   63.951    72.170
#>   45    62.131     1.824   58.555    65.706
#>   65    57.387     1.886   53.691    61.083
#>   85    52.643     2.179   48.373    56.913
#>  105    47.900     2.626   42.752    53.047
#>  125    43.156     3.164   36.955    49.357
#>  170    32.482     4.531   23.602    41.363
#> 
#> Adjusted for:
#> * c172code =     1
#> *  neg_c_7 = 11.83

If you print predicted values of a term, grouped by the levels of another term (which makes sense in the above example due to the interaction), the print()-method automatically adjusts the range of printed values to keep the console output short. In the following example, only 6 instead of 8 values per "block" are shown:

ggpredict(fit, c("c12hour", "c172code"))
#> 
#> # Predicted values of Total score BARTHEL INDEX 
#> # x = average number of hours of care per week 
#> 
#> # c172code = 1
#>    x predicted std.error conf.low conf.high
#>    0    72.804     2.516   67.872    77.736
#>   30    65.689     1.946   61.874    69.503
#>   55    59.759     1.823   56.186    63.331
#>   85    52.643     2.179   48.373    56.913
#>  115    45.528     2.887   39.870    51.186
#>  170    32.482     4.531   23.602    41.363
#> 
#> # c172code = 2
#>    x predicted std.error conf.low conf.high
#>    0    76.853     1.419   74.073    79.633
#>   30    68.921     1.115   66.737    71.106
#>   55    62.311     1.122   60.112    64.510
#>   85    54.379     1.438   51.560    57.198
#>  115    46.447     1.934   42.656    50.238
#>  170    31.905     3.007   26.011    37.800
#> 
#> # c172code = 3
#>    x predicted std.error conf.low conf.high
#>    0    73.862     2.502   68.958    78.766
#>   30    66.925     1.976   63.053    70.798
#>   55    61.145     2.155   56.920    65.369
#>   85    54.208     2.963   48.400    60.016
#>  115    47.271     4.057   39.320    55.222
#>  170    34.554     6.303   22.200    46.907
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83

Marginal effects at specific levels of random effects

Marginal effects can also be calculated for each group level in mixed models. Simply add the name of the related random-effects term to the terms-argument, and set type = "re". In the following example, we fit a linear mixed model and first plot the marginal effects without conditioning on the random effects.

library(sjlabelled)
library(lme4)
data(efc)
efc$e15relat <- as_label(efc$e15relat)
m <- lmer(neg_c_7 ~ c12hour + c160age + c161sex + (1 | e15relat), data = efc)
me <- ggpredict(m, terms = "c12hour")
plot(me)

To compute marginal effects for each grouping level, add the related random term to the terms-argument. In this case, confidence intervals are not calculated, but marginal effects are conditioned on each group level of the random effects.

me <- ggpredict(m, terms = c("c12hour", "e15relat"), type = "re")
plot(me)

Marginal effects, conditioned on random effects, can also be calculated for specific levels only. Add the related values into brackets after the variable name in the terms-argument.

me <- ggpredict(m, terms = c("c12hour", "e15relat [child,cousin]"), type = "re")
plot(me)

If the group factor has too many levels, you can also take a random sample of all possible levels and plot the marginal effects for this subsample of group levels. To do this, use terms = "groupfactor [sample=n]".

data("sleepstudy")
m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
me <- ggpredict(m, terms = c("Days", "Subject [sample=8]"), type = "re")
plot(me)

ggeffects 0.8.0 now on CRAN: marginal effects for regression models #rstats

I’m happy to announce that version 0.8.0 of my ggeffects-package is now on CRAN. The update fixes some bugs from the previous version and comes along with many new features and improvements. One major focus of the latest version are fixes and improvements for mixed models, especially zero-inflated mixed models (fitted with the glmmTMB-package).

In this post, I want to demonstrate the different options to calculate and visualize marginal effects from mixed models.

Marginal effects for mixed effects models

Basically, the type of predictions, i.e. whether to account for the uncertainty of random effects or not, can be set with the type-argument.

Marginal effects conditioned on fixed effects

The default, type = "fe", means that predictions are on the population-level and do not account for the random effect variances.

library(ggeffects)
library(lme4)
data(sleepstudy)
m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

pr <- ggpredict(m, "Days")
pr
#> 
#> # Predicted values of Reaction 
#> # x = Days 
#> 
#>  x predicted std.error conf.low conf.high
#>  0   251.405     6.825  238.029   264.781
#>  1   261.872     6.787  248.570   275.174
#>  2   272.340     7.094  258.435   286.244
#>  3   282.807     7.705  267.705   297.909
#>  5   303.742     9.581  284.963   322.520
#>  6   314.209    10.732  293.174   335.244
#>  7   324.676    11.973  301.210   348.142
#>  9   345.611    14.629  316.939   374.283
#> 
#> Adjusted for:
#> * Subject = 0 (population-level)
plot(pr)

Marginal effects conditioned on fixed effects with random effects uncertainty

When type = "re", the predicted values are still on the population-level. However, the random effect variances are taken into account, meaning that the prediction interval becomes larger. More technically speaking, type = "re" accounts for the uncertainty of the fixed effects conditional on the estimates of the random-effect variances and conditional modes (BLUPs).

The random-effect variance used here is the mean random-effect variance. Its calculation is based on the proposal from Johnson et al. 2014, which is applicable to mixed models with more complex random-effects structures.

As can be seen, compared to the previous example with type = "fe", the predicted values are identical (both are on the population-level). However, the standard errors, and thus the resulting confidence (or prediction) intervals, are much larger.

pr <- ggpredict(m, "Days", type = "re")
pr
#> 
#> # Predicted values of Reaction 
#> # x = Days 
#> 
#>  x predicted std.error conf.low conf.high
#>  0   251.405    41.769  169.539   333.271
#>  1   261.872    41.763  180.019   343.726
#>  2   272.340    41.814  190.386   354.293
#>  3   282.807    41.922  200.642   364.972
#>  5   303.742    42.307  220.822   386.661
#>  6   314.209    42.582  230.749   397.669
#>  7   324.676    42.912  240.571   408.781
#>  9   345.611    43.727  259.907   431.315
#> 
#> Adjusted for:
#> * Subject = 0 (population-level)
plot(pr)

The reason why both type = "fe" and type = "re" return predictions at the population-level is that ggpredict() returns predicted values of the response at specific levels of given model predictors, which are defined in the data frame passed to the newdata-argument (of predict()). This data frame requires data for all model terms, including random-effect terms. That, in turn, requires choosing certain levels or values for each random-effect term as well, or setting those terms to zero or NA (for the population-level). Since there is no general rule as to which level(s) of random-effect terms to choose in order to represent the random-effects structure in the data, using the population-level seems the clearest and most consistent approach.
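
For comparison, a rough base-R analogue of these population-level predictions, sketched with lme4's re.form-argument (with re.form = NA, the random effects are ignored and newdata does not need the grouping variable):

# population-level predictions for selected days, ignoring random effects
nd <- data.frame(Days = c(0, 3, 6, 9))
predict(m, newdata = nd, re.form = NA)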

Marginal effects conditioned on fixed effects and specific group levels

To get predicted values for a specific level of the random effect term, simply define this level in the condition-argument.

ggpredict(m, "Days", type = "re", condition = c(Subject = 330))
#> 
#> # Predicted values of Reaction 
#> # x = Days 
#> 
#>  x predicted std.error conf.low conf.high
#>  0   275.096    41.769  193.230   356.961
#>  1   280.749    41.763  198.895   362.602
#>  2   286.402    41.814  204.448   368.355
#>  3   292.054    41.922  209.889   374.220
#>  5   303.360    42.307  220.440   386.280
#>  6   309.013    42.582  225.554   392.473
#>  7   314.666    42.912  230.561   398.772
#>  9   325.972    43.727  240.268   411.676

Marginal effects based on simulations

Finally, it is possible to obtain predicted values by simulating from the model, where predictions are based on simulate().

pr <- ggpredict(m, "Days", type = "sim")
pr
#> 
#> # Predicted values of Reaction 
#> # x = Days 
#> 
#>  x predicted conf.low conf.high
#>  0   251.440  200.838   301.996
#>  1   261.860  212.637   311.678
#>  2   272.157  221.595   321.667
#>  3   282.800  233.416   332.738
#>  5   303.770  252.720   353.472
#>  6   314.146  264.651   363.752
#>  7   324.606  273.460   374.462
#>  9   345.319  295.069   394.735
#> 
#> Adjusted for:
#> * Subject = 0 (population-level)
plot(pr)

Marginal effects for zero-inflated mixed models

For zero-inflated mixed effects models, typically fitted with the glmmTMB-package, predicted values can be conditioned on

  • the fixed effects of the conditional model only (type = "fe")
  • the fixed effects and zero-inflation component (type = "fe.zi")
  • the fixed effects of the conditional model only (population-level), taking the random-effect variances into account (type = "re")
  • the fixed effects and zero-inflation component (population-level), taking the random-effect variances into account (type = "re.zi")
  • all model parameters (type = "sim")

library(glmmTMB)
data(Salamanders)
m <- glmmTMB(
  count ~ spp + mined + (1 | site), 
  ziformula = ~ spp + mined, 
  family = truncated_poisson, 
  data = Salamanders
)

Marginal effects conditioned on the count model

Similar to mixed models without zero-inflation component, type = "fe" and type = "re" for glmmTMB-models (with zero-inflation) both return predictions on the population-level, where the latter option accounts for the uncertainty of the random effects. In short, predict(..., type = "link") is called (however, predictions are finally back-transformed to the response scale).

pr <- ggpredict(m, "spp")
pr
#> 
#> # Predicted counts of count 
#> # x = spp 
#> 
#>  x predicted std.error conf.low conf.high
#>  1     0.935     0.206    0.624     1.400
#>  2     0.555     0.308    0.304     1.015
#>  3     1.171     0.192    0.804     1.704
#>  4     0.769     0.241    0.480     1.233
#>  5     1.786     0.182    1.250     2.550
#>  6     1.713     0.182    1.200     2.445
#>  7     0.979     0.196    0.667     1.437
#> 
#> Adjusted for:
#> * mined = yes
#> *  site = NA (population-level)
plot(pr)

For models with a log-link, it makes sense to use a log-transformed y-axis as well, to get proportional confidence intervals in the plot. You can do this with the log.y-argument:

plot(pr, log.y = TRUE)

Marginal effects conditioned on the count model with random effects uncertainty

ggpredict(m, "spp", type = "re")
#> 
#> # Predicted counts of count 
#> # x = spp 
#> 
#>  x predicted std.error conf.low conf.high
#>  1     0.935     0.309    0.510     1.714
#>  2     0.555     0.384    0.261     1.180
#>  3     1.171     0.300    0.650     2.107
#>  4     0.769     0.333    0.400     1.478
#>  5     1.786     0.294    1.004     3.175
#>  6     1.713     0.294    0.964     3.045
#>  7     0.979     0.303    0.541     1.772
#> 
#> Adjusted for:
#> * mined = yes
#> *  site = NA (population-level)

Marginal effects conditioned on the count and zero-inflation model

For type = "fe.zi", the predicted response value is the expected value mu * (1 - p), without conditioning on random effects. Since the zero-inflation and the conditional model work in "opposite directions", a higher expected value in the zero-inflation component means a lower response, while a higher value in the conditional model means a higher response. While it is possible to calculate predicted values with predict(..., type = "response"), standard errors and confidence intervals cannot be derived directly from the predict()-function. Thus, confidence intervals for type = "fe.zi" are based on quantiles of simulated draws from a multivariate normal distribution (see Brooks et al. 2017, pp. 391-392 for details).
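
The expected value can also be pieced together by hand. A sketch, assuming glmmTMB's "conditional" and "zprob" prediction types:

# mu from the conditional model, p from the zero-inflation model
mu <- predict(m, type = "conditional")
p  <- predict(m, type = "zprob")
head(mu * (1 - p))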

ggpredict(m, "spp", type = "fe.zi")
#> 
#> # Predicted counts of count 
#> # x = spp 
#> 
#>  x predicted std.error conf.low conf.high
#>  1     0.138     0.045    0.052     0.224
#>  2     0.017     0.009    0.000     0.035
#>  3     0.245     0.072    0.109     0.381
#>  4     0.042     0.018    0.007     0.076
#>  5     0.374     0.108    0.166     0.582
#>  6     0.433     0.117    0.208     0.657
#>  7     0.205     0.063    0.082     0.328
#> 
#> Adjusted for:
#> * mined = yes
#> *  site = NA (population-level)

Marginal effects conditioned on the count and zero-inflation model with random effects uncertainty

For type = "re.zi", the predicted response value is the expected value mu * (1 - p), accounting for the random-effect variances. Prediction intervals are calculated in the same way as for type = "fe.zi", except that the mean random-effect variance is additionally considered for the confidence intervals.

ggpredict(m, "spp", type = "re.zi")
#> 
#> # Predicted counts of count 
#> # x = spp 
#> 
#>  x predicted std.error conf.low conf.high
#>  1     0.138     0.235    0.032     0.354
#>  2     0.017     0.231    0.000     0.054
#>  3     0.245     0.243    0.065     0.609
#>  4     0.042     0.231    0.002     0.126
#>  5     0.374     0.257    0.098     0.932
#>  6     0.433     0.263    0.122     1.060
#>  7     0.205     0.239    0.054     0.510
#> 
#> Adjusted for:
#> * mined = yes
#> *  site = NA (population-level)

Marginal effects simulated from zero-inflated models

Finally, it is possible to obtain predicted values by simulating from the model, where predictions are based on simulate() (see Brooks et al. 2017, pp.392-393 for details). To achieve this, use type = "sim".

ggpredict(m, "spp", type = "sim")
#> 
#> # Predicted counts of count 
#> # x = spp 
#> 
#>  x predicted std.error conf.low conf.high
#>  1     1.089     1.288        0     4.131
#>  2     0.292     0.667        0     2.306
#>  3     1.520     1.550        0     5.241
#>  4     0.536     0.946        0     3.087
#>  5     2.212     2.125        0     7.153
#>  6     2.289     2.065        0     7.121
#>  7     1.314     1.367        0     4.697
#> 
#> Adjusted for:
#> * mined = yes
#> *  site = NA (population-level)

References

  • Brooks ME, Kristensen K, Benthem KJ van, Magnusson A, Berg CW, Nielsen A, et al. glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal. 2017;9: 378–400.
  • Johnson PC, O’Hara RB. 2014. Extension of Nakagawa & Schielzeth’s R2GLMM to random slopes models. Methods Ecol Evol, 5: 944-946. (doi: 10.1111/2041-210X.12225)

Likert-plots and grouped Likert-plots #rstats

I’m pleased to announce an update of my sjPlot-package, a package for Data Visualization for Statistics in Social Science. Thanks to the help of Alexander, it is now possible to create grouped Likert-plots. This is what I want to show in this post…

First, we load the required packages and sample data. We want to plot several items from an index, where all these items are 4-point Likert scales. The items deal with coping of family caregivers, i.e. how well they cope with their role of caring for an older relative.

To find the required variables in the data set, we search all variables for a specific pattern. We know that the variables from the COPE-index all have a cop in their variable name. We can easily search for variables in a data set with the find_var()-function.

library(sjPlot)
library(sjmisc)
data(efc)

# find all variables from COPE-Index, which all 
# have a "cop" in their variable name, and then
# plot the items as likert-plot
mydf <- find_var(efc, pattern = "cop", out = "df")
plot_likert(mydf)

The plot is not perfect, because for categories with just a few answers we get overlapping value labels. However, there are quite a few options to tweak the plot. For instance, we can increase the axis range (grid.range), show cumulative percentage values only at the end of the bars (values = "sum.outside") and show the percentage sign (show.prc.sign = TRUE).

plot_likert(
  mydf,
  grid.range = c(1.2, 1.4),
  expand.grid = FALSE,
  values = "sum.outside",
  show.prc.sign = TRUE
)

The interesting question is whether we can reduce the dimensions of this scale and extract principal components, in order to group single items into different sub-scales. To do that, we first run a PCA on the data. This can be done, e.g., with sjt.pca() or sjp.pca().

# create an HTML-table with the results of a PCA
sjt.pca(mydf)
Principal Component Analysis

                                                                             Component 1  Component 2
do you feel you cope well as caregiver?                                             0.29         0.60
do you find caregiving too demanding?                                              -0.60        -0.42
does caregiving cause difficulties in your relationship with your friends?         -0.69        -0.16
does caregiving have negative effect on your physical health?                      -0.73        -0.12
does caregiving cause difficulties in your relationship with your family?          -0.64        -0.01
does caregiving cause financial difficulties?                                      -0.69         0.12
do you feel trapped in your role as caregiver?                                     -0.68        -0.38
do you feel supported by friends/neighbours?                                       -0.07         0.64
do you feel caregiving worthwhile?                                                  0.07         0.75
Cronbach’s α                                                                        0.78         0.45
(varimax-rotation)

As we can see, six items are associated with the first component, while three items mainly load on the second component. The indices indicating which item is associated with which component are returned by the function in the element $factor.index. So we save these in an object that can be used to create a grouped Likert-plot.

groups <- sjt.pca(mydf)$factor.index
plot_likert(mydf, groups = groups, values = "sum.outside")

There are even more options to tweak the Likert-plots. Find the full documentation at
https://strengejacke.github.io/sjPlot/index.html.

More support for Bayesian analysis in the sj!-packages #rstats #rstan #brms

$
0
0

Another quick preview of my R-packages, especially sjPlot, which now also support brmsfit-objects from the great brms-package. To demonstrate the new features, I load all my „core“-packages at once, using the strengejacke-package, which is only available from GitHub. This package simply loads four packages (sjlabelled, sjmisc, sjstats and sjPlot).

First, I fit two sample models, one with brms and one with rstanarm:

#> install pkg "strengejacke" via GitHub:
#> devtools::install_github("strengejacke/strengejacke")
library(strengejacke)
library(ggplot2)
library(lme4)
library(glmmTMB)
library(brms)
library(rstanarm)

data(Owls)
m1 <- brm(
  SiblingNegotiation ~ FoodTreatment + ArrivalTime + SexParent + (1 | Nest), 
  data = Owls,
  family = zero_inflated_poisson(link = "log", link_zi = "logit")
)
m2 <- stan_glmer.nb(
  SiblingNegotiation ~ FoodTreatment + ArrivalTime + SexParent + (1 | Nest), 
  data = Owls
)
m3 <- stan_lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

Next, some summary statistics, at first simply the High Density Interval, obtained from the hdi()-function. You can define the probabilites of the interval, however, here I simply use the defaults (which is the 90% HDI).

hdi(m1)
#> # A tibble: 6 x 3
#>                      term       hdi.low      hdi.high
#>                                       
#> 1             b_Intercept  3.798693e+00  4.584998e+00
#> 2 b_FoodTreatmentSatiated -2.795021e-01 -1.586102e-01
#> 3           b_ArrivalTime -9.768530e-02 -6.616026e-02
#> 4         b_SexParentMale -7.766154e-02  3.794376e-02
#> 5                      zi  2.304234e-01  2.900676e-01
#> 6                    lp__ -2.020651e+03 -2.002387e+03

hdi(m2)
#> # A tibble: 5 x 3
#>                    term     hdi.low    hdi.high
#>                                 
#> 1           (Intercept)  3.82688888  5.93003148
#> 2 FoodTreatmentSatiated -0.86327579 -0.51376279
#> 3           ArrivalTime -0.15414266 -0.07134095
#> 4         SexParentMale -0.09371495  0.24440447
#> 5 reciprocal_dispersion  0.77345433  1.00889671

Next, I implemented an own tidyr-function, similar to broom’s tidy(), called tidy_stan(). Unlike broom’s tidy-function, tidy_stan() computes the HDI instead of credibility intervals, and you may return fixed or random effects only, or both. Furthermore, tidy_stan() also returns the ratio of effective numbers of samples, n_eff, and Rhat statistics.

tidy_stan(m1)
#> # A tibble: 6 x 7
#>                      term  estimate std.error   hdi.low  hdi.high n_eff  Rhat
#>                                           
#> 1             b_Intercept     4.206     0.241     3.819     4.581 1.000 1.000
#> 2 b_FoodTreatmentSatiated    -0.220     0.037    -0.279    -0.162 1.000 1.000
#> 3           b_ArrivalTime    -0.082     0.009    -0.096    -0.065 1.000 1.000
#> 4         b_SexParentMale    -0.021     0.037    -0.077     0.036 1.000 1.000
#> 5                      zi     0.259     0.019     0.230     0.288 1.000 0.999
#> 6                    lp__ -2011.050     5.537 -2020.651 -2002.925 0.186 1.001

tidy_stan(m2)
#> # A tibble: 5 x 7
#>                    term estimate std.error hdi.low hdi.high n_eff  Rhat
#>                                     
#> 1           (Intercept)    4.873     0.652   3.847    5.880     1 1.000
#> 2 FoodTreatmentSatiated   -0.693     0.112  -0.860   -0.522     1 1.000
#> 3           ArrivalTime   -0.115     0.025  -0.154   -0.074     1 1.000
#> 4         SexParentMale    0.071     0.102  -0.088    0.241     1 0.999
#> 5 reciprocal_dispersion    0.883     0.073   0.776    1.005     1 1.000

Finally, the new and generic plot_model() function in sjPlot also supports brmsfit or stanreg objects, and plots estimates, random effects and marginal effects.

The coefficients-plot was already shown in a previous post. It shows the 50% and 89% HDI and the posterior median. The style can be changed using the bpe and bpe.style arguments.

theme_set(theme_sjplot())
plot_model(m1, axis.lim = c(.6, 1.4))
plot_model(m2, bpe = "mean", bpe.style = "dot", colors = "grey30")

Coefficients, brms-model

Coefficients, rstanarm-model

Random effects are displayed in a similar way. For models with random slopes and intercepts, random effects are displayed in a grid layout (use grid = FALSE to create a separate plot for each random effect).

plot_model(m1, type = "re")
plot_model(m3, type = "re")

Random effects, brms-model

Random effects, rstanarm-model

Marginal effects in sjPlot are based on the ggeffects-package. By default, predictions are computed with rstantools::posterior_linpred().

plot_model(m1, type = "pred", terms = c("ArrivalTime", "FoodTreatment"))
plot_model(m2, type = "pred", terms = c("ArrivalTime", "FoodTreatment"))

Marginal effects, brms-model

Marginal effects, rstanarm-model

The work on sjPlot for the next CRAN release is almost done… I hope to submit the sjPlot-update in the course of the next week! In the meantime, you may try out the new features using the GitHub-versions of my R-packages – at least sjstats and ggeffects are required.

Marginal Effects for Regression Models in R #rstats #dataviz

$
0
0

Regression coefficients are typically presented as tables that are easy to understand. Sometimes, estimates are difficult to interpret. This is especially true for interaction or transformed terms (quadratic or cubic terms, polynomials, splines), in particular for more complex models. In such cases, coefficients are no longer interpretable in a direct way and marginal effects are far easier to understand. Specifically, the visualization of marginal effects makes it possible to intuitively get the idea of how predictors and outcome are associated, even for complex models.

The ggeffects-package (Lüdecke 2018) aims at easily calculating marginal effects for a broad range of different regression models, beginning with classical models fitted with lm() or glm() to complex mixed models fitted with lme4 and glmmTMB or even Bayesian models from brms and rstanarm. The goal of the ggeffects-package is to provide a simple, user-friendly interface to calculate marginal effects, which is mainly achieved by one function: ggpredict(). Independent from the type of regression model, the output is always the same, a data frame with a consistent structure.

The idea behind this function is to compute (and visualize) the relationship between a model predictor (independent variable) and the model response (dependent variable). The predictor of interest needs to be specified in the terms-argument.

data(mtcars)
m <- lm(mpg ~ hp + wt + cyl + am, data = mtcars)
ggpredict(m, "cyl")
#> # A tibble: 3 x 5
#>       x predicted conf.low conf.high group
#>                  
#> 1     4      21.7     19.1      24.4 1    
#> 2     6      20.2     19.3      21.1 1    
#> 3     8      18.7     16.5      21.0 1

The relationship can be differentiated depending on further predictors, which is useful e.g. for interaction terms. Up to two further predictors that indicate the „grouping“ structure can be used to calculate marginal effects. The names of these predictors need to be passed as character vector to ggpredict().

m <- lm(mpg ~ wt * cyl + am + wt + cyl, data = mtcars)
p <- ggpredict(m, c("wt", "cyl"))
p
#> # A tibble: 27 x 5
#>        x predicted conf.low conf.high group
#>                   
#>  1   1.5      31.7     28.5      34.9 4    
#>  2   1.5      26.3     23.5      29.2 6    
#>  3   1.5      21.0     16.8      25.1 8    
#>  4   2        28.7     26.6      30.8 4    
#>  5   2        24.2     22.2      26.3 6    
#>  6   2        19.8     16.4      23.1 8    
#>  7   2.5      25.7     24.3      27.2 4    
#>  8   2.5      22.1     20.7      23.5 6    
#>  9   2.5      18.5     15.9      21.2 8    
#> 10   3        22.8     20.8      24.7 4    
#> # ... with 17 more rows

There’s a plot()-method, based on ggplot2:

plot(p)

The simple approach of ggpredict() can be used for all supported regression models. Thus, to calculate marginal effects with ggpredict(), it makes no differences if the model is a simpel linear model or a negative biniomial multilevel model or a cumulative link model etc. In case of cumulative link models, ggpredict() automatically takes care of proper grouping, in this case for the different levels of the response variable:

library(MASS)
library(ordinal)
data(housing)
m <- clm(Sat ~ Type * Cont + Infl, weights = Freq, data = housing)
p <- ggpredict(m, c("Cont", "Type"))
plot(p)

ggeffects also allows easily calculating marginal effects at specific levels of other predictors. This is particularly useful for interaction effects with continuous variables. In the following example, both variables of the interaction term have a larger range of values, which obscure the moderating effect:

m <- lm(mpg ~ wt * hp + am + wt, data = mtcars)
p <- ggpredict(m, c("hp", "wt"))
plot(p)

However, you can directly specify certain values, at which marginal effects should be calculated, or use „shortcuts“ that compute convenient values, like mean +/- 1 SD etc.

p <- ggpredict(m, c("hp", "wt [meansd]"))
plot(p)

The latest update of ggeffects on CRAN introduced some new features. There is a dedicated website that describes all the details of this package, including some vignettes with lots of examples.

Marginal Effects for (mixed effects) regression models #rstats

$
0
0

ggeffects (CRAN, website) is a package that computes marginal effects at the mean (MEMs) or representative values (MERs) for many different models, including mixed effects or Bayesian models. One of the advantages of the package is its easy-to-use interface: No matter if you fit a simple or complex model, with interactions or splines, the function call is always the same. This also holds true for the returned output, which is always a data frame with the same, consistent column names.

The past package-update introduced some new features I wanted to describe here: a revised print()-method as well as a new opportunity to plot marginal effects at different levels of random effects in mixed models…

The new print()-method

The former print()-method simply showed the first predicted values, including confidence intervals. For numeric predictor variables with many values, you could, for instance, only see the first 10 of more than 100 predicted values. While it makes sense to shorten the (console-)output, there was no information about the predictions for the last or other „representative“ values of the term in question. Now, the print()-method automatically prints a selection of representative values, so you get a quick and clean impression of the range of predicted values for continuous variables:

library(ggeffects)
data(efc)
efc$c172code <- as.factor(efc$c172code)
fit <- lm(barthtot ~ c12hour * c172code + neg_c_7, data = efc)

ggpredict(fit, "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX 
#> # x = average number of hours of care per week 
#> 
#>    x predicted std.error conf.low conf.high
#>    0    72.804     2.516   67.872    77.736
#>   20    68.060     2.097   63.951    72.170
#>   45    62.131     1.824   58.555    65.706
#>   65    57.387     1.886   53.691    61.083
#>   85    52.643     2.179   48.373    56.913
#>  105    47.900     2.626   42.752    53.047
#>  125    43.156     3.164   36.955    49.357
#>  170    32.482     4.531   23.602    41.363
#> 
#> Adjusted for:
#> * c172code =     1
#> *  neg_c_7 = 11.83

If you print predicted values of a term, grouped by the levels of another term (which makes sense in the above example due to the present interaction), the print()-method automatically adjusts the range of printed values to keep the console-output short. In the following example, only 6 instead of 8 values per „block“ are shown:

ggpredict(fit, c("c12hour", "c172code"))
#> 
#> # Predicted values of Total score BARTHEL INDEX 
#> # x = average number of hours of care per week 
#> 
#> # c172code = 1
#>    x predicted std.error conf.low conf.high
#>    0    72.804     2.516   67.872    77.736
#>   30    65.689     1.946   61.874    69.503
#>   55    59.759     1.823   56.186    63.331
#>   85    52.643     2.179   48.373    56.913
#>  115    45.528     2.887   39.870    51.186
#>  170    32.482     4.531   23.602    41.363
#> 
#> # c172code = 2
#>    x predicted std.error conf.low conf.high
#>    0    76.853     1.419   74.073    79.633
#>   30    68.921     1.115   66.737    71.106
#>   55    62.311     1.122   60.112    64.510
#>   85    54.379     1.438   51.560    57.198
#>  115    46.447     1.934   42.656    50.238
#>  170    31.905     3.007   26.011    37.800
#> 
#> # c172code = 3
#>    x predicted std.error conf.low conf.high
#>    0    73.862     2.502   68.958    78.766
#>   30    66.925     1.976   63.053    70.798
#>   55    61.145     2.155   56.920    65.369
#>   85    54.208     2.963   48.400    60.016
#>  115    47.271     4.057   39.320    55.222
#>  170    34.554     6.303   22.200    46.907
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83

Marginal effects at specific levels of random effects

Marginal effects can also be calculated for each group level in mixed models. Simply add the name of the related random effects term to the terms-argument, and set type = "re". In the following example, we fit a linear mixed model and first simply plot the marginal effetcs, not conditioned on random effects.

library(sjlabelled)
library(lme4)
data(efc)
efc$e15relat <- as_label(efc$e15relat)
m <- lmer(neg_c_7 ~ c12hour + c160age + c161sex + (1 | e15relat), data = efc)
me <- ggpredict(m, terms = "c12hour")
plot(me)

To compute marginal effects for each grouping level, add the related random term to the terms-argument. In this case, confidence intervals are not calculated, but marginal effects are conditioned on each group level of the random effects.

me <- ggpredict(m, terms = c("c12hour", "e15relat"), type = "re")
plot(me)

Marginal effects, conditioned on random effects, can also be calculated for specific levels only. Add the related values into brackets after the variable name in the terms-argument.

me <- ggpredict(m, terms = c("c12hour", "e15relat [child,cousin]"), type = "re")
plot(me)

If the group factor has too many levels, you can also take a random sample of all possible levels and plot the marginal effects for this subsample of group levels. To do this, use term = "groupfactor [sample=n]".

data("sleepstudy")
m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
me <- ggpredict(m, terms = c("Days", "Subject [sample=8]"), type = "re")
plot(me)


ggeffects 0.8.0 now on CRAN: marginal effects for regression models #rstats

$
0
0

I’m happy to announce that version 0.8.0 of my ggeffects-package is on CRAN now. The update has fixed some bugs from the previous version and comes along with many new features or improvements. One major part that was addressed in the latest version are fixed and improvements for mixed models, especially zero-inflated mixed models (fitted with the glmmTMB-package).

In this post, I want to demonstrate the different options to calculate and visualize marginal effects from mixed models.

Marginal effects for mixed effects models

Basically, the type of predictions, i.e. whether to account for the uncertainty of random effects or not, can be set with the type-argument.

Marginal effects conditioned on fixed effects

The default, type = "fe", means that predictions are on the population-level and do not account for the random effect variances.

library(ggeffects)
library(lme4)
data(sleepstudy)
m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

pr <- ggpredict(m, "Days")
pr
#> 
#> # Predicted values of Reaction 
#> # x = Days 
#> 
#>  x predicted std.error conf.low conf.high
#>  0   251.405     6.825  238.029   264.781
#>  1   261.872     6.787  248.570   275.174
#>  2   272.340     7.094  258.435   286.244
#>  3   282.807     7.705  267.705   297.909
#>  5   303.742     9.581  284.963   322.520
#>  6   314.209    10.732  293.174   335.244
#>  7   324.676    11.973  301.210   348.142
#>  9   345.611    14.629  316.939   374.283
#> 
#> Adjusted for:
#> * Subject = 0 (population-level)
plot(pr)

Marginal effects conditioned on fixed effects with random effects uncertainty

When type = "re", the predicted values are still on the population-level. However, the random effect variances are taken into account, meaning that the prediction interval becomes larger. More technically speaking, type = "re" accounts for the uncertainty of the fixed effects conditional on the estimates of the random-effect variances and conditional modes (BLUPs).

The random-effect variance is the mean random-effect variance. Calculation is based on the proposal from Johnson et al. 2014, which is applicable for mixed models with more complex random effects structures.

As can be seen, compared to the previous example with type = "fe", predicted values are identical (both on the population-level). However, standard errors, and thus the resulting confidence (or prediction) intervals are much larger .

pr <- ggpredict(m, "Days", type = "re")
pr
#> 
#> # Predicted values of Reaction 
#> # x = Days 
#> 
#>  x predicted std.error conf.low conf.high
#>  0   251.405    41.769  169.539   333.271
#>  1   261.872    41.763  180.019   343.726
#>  2   272.340    41.814  190.386   354.293
#>  3   282.807    41.922  200.642   364.972
#>  5   303.742    42.307  220.822   386.661
#>  6   314.209    42.582  230.749   397.669
#>  7   324.676    42.912  240.571   408.781
#>  9   345.611    43.727  259.907   431.315
#> 
#> Adjusted for:
#> * Subject = 0 (population-level)
plot(pr)

The reason why both type = "fe" and type = "re" return predictions at population-level is because ggpredict() returns predicted values of the response at specific levels of given model predictors, which are defined in the data frame that is passed to the newdata-argument (of predict()). The data frame requires data from all model terms, including random effect terms. This again requires to choose certain levels or values also for each random effect term, or to set those terms to zero or NA (for population-level). Since there is no general rule, which level(s) of random effect terms to choose in order to represent the random effects structure in the data, using the population-level seems the most clear and consistent approach.

Marginal effects conditioned on fixed effects and specific group levels

To get predicted values for a specific level of the random effect term, simply define this level in the condition-argument.

ggpredict(m, "Days", type = "re", condition = c(Subject = 330))
#> 
#> # Predicted values of Reaction 
#> # x = Days 
#> 
#>  x predicted std.error conf.low conf.high
#>  0   275.096    41.769  193.230   356.961
#>  1   280.749    41.763  198.895   362.602
#>  2   286.402    41.814  204.448   368.355
#>  3   292.054    41.922  209.889   374.220
#>  5   303.360    42.307  220.440   386.280
#>  6   309.013    42.582  225.554   392.473
#>  7   314.666    42.912  230.561   398.772
#>  9   325.972    43.727  240.268   411.676

Marginal effects based on simulations

Finally, it is possible to obtain predicted values by simulating from the model, where predictions are based on simulate().

pr <- ggpredict(m, "Days", type = "sim")
pr
#> 
#> # Predicted values of Reaction 
#> # x = Days 
#> 
#>  x predicted conf.low conf.high
#>  0   251.440  200.838   301.996
#>  1   261.860  212.637   311.678
#>  2   272.157  221.595   321.667
#>  3   282.800  233.416   332.738
#>  5   303.770  252.720   353.472
#>  6   314.146  264.651   363.752
#>  7   324.606  273.460   374.462
#>  9   345.319  295.069   394.735
#> 
#> Adjusted for:
#> * Subject = 0 (population-level)
plot(pr)

Marginal effects for zero-inflated mixed models

For zero-inflated mixed effects models, typically fitted with the glmmTMB-package, predicted values can be conditioned on

  • the fixed effects of the conditional model only (type = "fe")
  • the fixed effects and zero-inflation component (type = "fe.zi")
  • the fixed effects of the conditional model only (population-level), taking the random-effect variances into account (type = "re")
  • the fixed effects and zero-inflation component (population-level), taking the random-effect variances into account (type = "re.zi")
  • all model parameters (type = "sim")
library(glmmTMB)
data(Salamanders)
m <- glmmTMB(
  count ~ spp + mined + (1 | site), 
  ziformula = ~ spp + mined, 
  family = truncated_poisson, 
  data = Salamanders
)

Marginal effects conditioned on the count model

Similar to mixed models without zero-inflation component, type = "fe" and type = "re" for glmmTMB-models (with zero-inflation) both return predictions on the population-level, where the latter option accounts for the uncertainty of the random effects. In short, predict(..., type = "link") is called (however, predictions are finally back-transformed to the response scale).

pr <- ggpredict(m, "spp")
pr
#> 
#> # Predicted counts of count 
#> # x = spp 
#> 
#>  x predicted std.error conf.low conf.high
#>  1     0.935     0.206    0.624     1.400
#>  2     0.555     0.308    0.304     1.015
#>  3     1.171     0.192    0.804     1.704
#>  4     0.769     0.241    0.480     1.233
#>  5     1.786     0.182    1.250     2.550
#>  6     1.713     0.182    1.200     2.445
#>  7     0.979     0.196    0.667     1.437
#> 
#> Adjusted for:
#> * mined = yes
#> *  site = NA (population-level)
plot(pr)

For models with log-link, it make sense to use a log-transformed y-axis as well, to get proportional confidence intervals for the plot. You can do this by using the log.y-argument:

plot(pr, log.y = TRUE)

Marginal effects conditioned on the count model with random effects uncertainty

ggpredict(m, "spp", type = "re")
#> 
#> # Predicted counts of count 
#> # x = spp 
#> 
#>  x predicted std.error conf.low conf.high
#>  1     0.935     0.309    0.510     1.714
#>  2     0.555     0.384    0.261     1.180
#>  3     1.171     0.300    0.650     2.107
#>  4     0.769     0.333    0.400     1.478
#>  5     1.786     0.294    1.004     3.175
#>  6     1.713     0.294    0.964     3.045
#>  7     0.979     0.303    0.541     1.772
#> 
#> Adjusted for:
#> * mined = yes
#> *  site = NA (population-level)

Marginal effects conditioned on the count and zero-inflation model

For type = "fe.zi", the predicted response value is the expected value mu*(1-p) without conditioning on random effects. Since the zero inflation and the conditional model are working in “opposite directions”, a higher expected value for the zero inflation means a lower response, but a higher value for the conditional model means a higher response. While it is possible to calculate predicted values with predict(..., type = "response"), standard errors and confidence intervals can not be derived directly from the predict()-function. Thus, confidence intervals for type = "fe.zi" are based on quantiles of simulated draws from a multivariate normal distribution (see also Brooks et al. 2017, pp.391-392 for details).

ggpredict(m, "spp", type = "fe.zi")
#> 
#> # Predicted counts of count 
#> # x = spp 
#> 
#>  x predicted std.error conf.low conf.high
#>  1     0.138     0.045    0.052     0.224
#>  2     0.017     0.009    0.000     0.035
#>  3     0.245     0.072    0.109     0.381
#>  4     0.042     0.018    0.007     0.076
#>  5     0.374     0.108    0.166     0.582
#>  6     0.433     0.117    0.208     0.657
#>  7     0.205     0.063    0.082     0.328
#> 
#> Adjusted for:
#> * mined = yes
#> *  site = NA (population-level)

Marginal effects conditioned on the count and zero-inflation model with random effects uncertainty

For type = "re.zi", the predicted response value is the expected value mu*(1-p), accounting for the random-effect variances. Prediction intervals are calculated in the same way as for type = "fe.zi", except that the mean random effect variance is considered for the confidence intervals.

ggpredict(m, "spp", type = "re.zi")
#> 
#> # Predicted counts of count 
#> # x = spp 
#> 
#>  x predicted std.error conf.low conf.high
#>  1     0.138     0.235    0.032     0.354
#>  2     0.017     0.231    0.000     0.054
#>  3     0.245     0.243    0.065     0.609
#>  4     0.042     0.231    0.002     0.126
#>  5     0.374     0.257    0.098     0.932
#>  6     0.433     0.263    0.122     1.060
#>  7     0.205     0.239    0.054     0.510
#> 
#> Adjusted for:
#> * mined = yes
#> *  site = NA (population-level)

Marginal effects simulated from zero-inflated models

Finally, it is possible to obtain predicted values by simulating from the model, where predictions are based on simulate() (see Brooks et al. 2017, pp.392-393 for details). To achieve this, use type = "sim".

ggpredict(m, "spp", type = "sim")
#> 
#> # Predicted counts of count 
#> # x = spp 
#> 
#>  x predicted std.error conf.low conf.high
#>  1     1.089     1.288        0     4.131
#>  2     0.292     0.667        0     2.306
#>  3     1.520     1.550        0     5.241
#>  4     0.536     0.946        0     3.087
#>  5     2.212     2.125        0     7.153
#>  6     2.289     2.065        0     7.121
#>  7     1.314     1.367        0     4.697
#> 
#> Adjusted for:
#> * mined = yes
#> *  site = NA (population-level)

References

  • Brooks ME, Kristensen K, Benthem KJ van, Magnusson A, Berg CW, Nielsen A, et al. glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal. 2017;9: 378–400.
  • Johnson PC, O’Hara RB. 2014. Extension of Nakagawa & Schielzeth’s R2GLMM to random slopes models. Methods Ecol Evol, 5: 944-946. (doi: 10.1111/2041-210X.12225)

Likert-plots and grouped Likert-plots #rstats

$
0
0

I’m pleased to anounce an update of my sjPlot-package, a package for Data Visualization for Statistics in Social Science. Thanks to the help of Alexander, it is now possible to create grouped Likert-plots. This is what I want to show in this post…

First, we load the required packages and sample data. We want to plot several items from an index, where all these items are 4-point Likert scales. The items deal with coping of family caregivers, i.e. how well they cope with their role of caring for an older relative.

To find the required variables in the data set, we search all variables for a specific pattern. We know that the variables from the COPE-index all have a cop in their variable name. We can easily search for variables in a data set with the find_var()-function.

library(sjPlot)
library(sjmisc)
data(efc)

# find all variables from COPE-Index, which all 
# have a "cop" in their variable name, and then
# plot the items as likert-plot
mydf <- find_var(efc, pattern = "cop", out = "df")
plot_likert(mydf)

The plot is not perfect, because for those values with just a few answers, we have overlapping values. However, there are quite some options to tweak the plot. For instance, we can increase the axis-range (grid.range), show cumulative percentage-values only at the ende of the bars (values = "sum.outside") and show the percentage-sign (show.prc.sign = TRUE).

plot_likert(
  mydf,
  grid.range = c(1.2, 1.4),
  expand.grid = FALSE,
  values = "sum.outside",
  show.prc.sign = TRUE
)

The interesting question is, whether we can reduce the dimensions of this scale and try to extract principle components, in order to group single items into different sub-scales. To do that, we first run a PCA on the data. This can be done, e.g., with sjt.pca() or sjp.pca().

# creates a HTML-table of the results of an PCA.
sjt.pca(mydf)
Principal Component Analysis

                                                                           Component 1  Component 2
do you feel you cope well as caregiver?                                           0.29         0.60
do you find caregiving too demanding?                                            -0.60        -0.42
does caregiving cause difficulties in your relationship with your friends?       -0.69        -0.16
does caregiving have negative effect on your physical health?                    -0.73        -0.12
does caregiving cause difficulties in your relationship with your family?        -0.64        -0.01
does caregiving cause financial difficulties?                                    -0.69         0.12
do you feel trapped in your role as caregiver?                                   -0.68        -0.38
do you feel supported by friends/neighbours?                                     -0.07         0.64
do you feel caregiving worthwhile?                                                0.07         0.75
Cronbach’s α                                                                      0.78         0.45
(varimax-rotation)

As we can see, six items are associated with the first component, while three items mainly load on the second component. The indices that indicate which item is associated with which component are returned by the function in the element $factor.index, so we save this in an object that can be used to create a grouped Likert-plot.

groups <- sjt.pca(mydf)$factor.index
plot_likert(mydf, groups = groups, values = "sum.outside")
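
If the grouping of items is already known, the PCA step can presumably be skipped, since groups is just an index vector with one group number per item. The following vector is written out by hand to mirror the PCA result from the table above:

# hand-written grouping, mirroring sjt.pca(mydf)$factor.index:
# items 2-7 load on component 1, items 1, 8 and 9 on component 2
groups <- c(2, 1, 1, 1, 1, 1, 1, 2, 2)
plot_likert(mydf, groups = groups, values = "sum.outside")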

There are even more options to tweak the Likert-plots. Find the full documentation at
https://strengejacke.github.io/sjPlot/index.html.


Marginal Effects for Regression Models in R #rstats #dataviz


Results of regression models are typically presented as tables of coefficients, which are easy to understand. Sometimes, however, estimates are difficult to interpret. This is especially true for interaction or transformed terms (quadratic or cubic terms, polynomials, splines), in particular for more complex models. In such cases, coefficients are no longer interpretable in a direct way, and marginal effects are far easier to understand. Specifically, visualizing marginal effects makes it possible to get an intuitive idea of how predictors and outcome are associated, even for complex models.

The ggeffects-package (Lüdecke 2018) aims at easily calculating marginal effects for a broad range of regression models, from classical models fitted with lm() or glm() to complex mixed models fitted with lme4 and glmmTMB, and even Bayesian models from brms and rstanarm. The goal of the ggeffects-package is to provide a simple, user-friendly interface for calculating marginal effects, which is mainly achieved by one function: ggpredict(). Independent of the type of regression model, the output is always the same: a data frame with a consistent structure.

The idea behind this function is to compute (and visualize) the relationship between a model predictor (independent variable) and the model response (dependent variable). The predictor of interest needs to be specified in the terms-argument.

data(mtcars)
m <- lm(mpg ~ hp + wt + cyl + am, data = mtcars)
ggpredict(m, "cyl")
#> # A tibble: 3 x 5
#>       x predicted conf.low conf.high group
#>                  
#> 1     4      21.7     19.1      24.4 1    
#> 2     6      20.2     19.3      21.1 1    
#> 3     8      18.7     16.5      21.0 1
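
The same simple call also covers transformed terms. As a quick illustration (the model specification here is made up for demonstration purposes), marginal effects make a quadratic relationship directly visible, where the raw poly()-coefficients would be hard to read:

# illustrative model with a quadratic term
m_poly <- lm(mpg ~ poly(hp, 2) + wt, data = mtcars)
# the plot traces the non-linear relationship between hp and mpg
plot(ggpredict(m_poly, "hp"))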

The relationship can be differentiated depending on further predictors, which is useful, e.g., for interaction terms. Up to two further predictors that indicate the „grouping“ structure can be used to calculate marginal effects. The names of these predictors need to be passed as a character vector to ggpredict().

m <- lm(mpg ~ wt * cyl + am + wt + cyl, data = mtcars)
p <- ggpredict(m, c("wt", "cyl"))
p
#> # A tibble: 27 x 5
#>        x predicted conf.low conf.high group
#>                   
#>  1   1.5      31.7     28.5      34.9 4    
#>  2   1.5      26.3     23.5      29.2 6    
#>  3   1.5      21.0     16.8      25.1 8    
#>  4   2        28.7     26.6      30.8 4    
#>  5   2        24.2     22.2      26.3 6    
#>  6   2        19.8     16.4      23.1 8    
#>  7   2.5      25.7     24.3      27.2 4    
#>  8   2.5      22.1     20.7      23.5 6    
#>  9   2.5      18.5     15.9      21.2 8    
#> 10   3        22.8     20.8      24.7 4    
#> # ... with 17 more rows

There’s a plot()-method, based on ggplot2:

plot(p)
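
Since the return value is a plain data frame with fixed column names (x, predicted, conf.low, conf.high, group), you can just as well build the plot yourself. The following is only a rough sketch of what the plot()-method produces, not its actual implementation:

library(ggplot2)
# manual plot, using the consistent columns of the ggpredict-output
ggplot(p, aes(x = x, y = predicted, colour = group, fill = group)) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.15, colour = NA) +
  geom_line()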

The simple approach of ggpredict() can be used for all supported regression models. Thus, to calculate marginal effects with ggpredict(), it makes no difference whether the model is a simple linear model, a negative binomial multilevel model or a cumulative link model. In case of cumulative link models, ggpredict() automatically takes care of proper grouping, in this case for the different levels of the response variable:

library(MASS)
library(ordinal)
data(housing)
m <- clm(Sat ~ Type * Cont + Infl, weights = Freq, data = housing)
p <- ggpredict(m, c("Cont", "Type"))
plot(p)

ggeffects also allows easily calculating marginal effects at specific levels of other predictors. This is particularly useful for interaction effects with continuous variables. In the following example, both variables of the interaction term have a large range of values, which obscures the moderating effect:

m <- lm(mpg ~ wt * hp + am + wt, data = mtcars)
p <- ggpredict(m, c("hp", "wt"))
plot(p)

However, you can directly specify the values at which marginal effects should be calculated, or use „shortcuts“ that compute convenient values, such as mean +/- 1 SD:

p <- ggpredict(m, c("hp", "wt [meansd]"))
plot(p)
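
Instead of the [meansd]-shortcut, you can also write specific values into the brackets; the values below are arbitrary picks for illustration:

# compute marginal effects at hand-picked values of hp and wt
p <- ggpredict(m, c("hp [100, 200, 300]", "wt [2, 4]"))
plot(p)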

The latest update of ggeffects on CRAN introduced some new features. There is a dedicated website that describes all the details of this package, including some vignettes with lots of examples.

Marginal Effects for (mixed effects) regression models #rstats


ggeffects (CRAN, website) is a package that computes marginal effects at the mean (MEMs) or representative values (MERs) for many different models, including mixed effects or Bayesian models. One of the advantages of the package is its easy-to-use interface: No matter if you fit a simple or complex model, with interactions or splines, the function call is always the same. This also holds true for the returned output, which is always a data frame with the same, consistent column names.

The latest package-update introduced some new features I want to describe here: a revised print()-method, as well as a new option to plot marginal effects at different levels of random effects in mixed models…

The new print()-method

The former print()-method simply showed the first predicted values, including confidence intervals. For numeric predictor variables with many values, you could, for instance, only see the first 10 of more than 100 predicted values. While it makes sense to shorten the (console-)output, there was no information about the predictions for the last or other „representative“ values of the term in question. Now, the print()-method automatically prints a selection of representative values, so you get a quick and clean impression of the range of predicted values for continuous variables:

library(ggeffects)
data(efc)
efc$c172code <- as.factor(efc$c172code)
fit <- lm(barthtot ~ c12hour * c172code + neg_c_7, data = efc)

ggpredict(fit, "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX 
#> # x = average number of hours of care per week 
#> 
#>    x predicted std.error conf.low conf.high
#>    0    72.804     2.516   67.872    77.736
#>   20    68.060     2.097   63.951    72.170
#>   45    62.131     1.824   58.555    65.706
#>   65    57.387     1.886   53.691    61.083
#>   85    52.643     2.179   48.373    56.913
#>  105    47.900     2.626   42.752    53.047
#>  125    43.156     3.164   36.955    49.357
#>  170    32.482     4.531   23.602    41.363
#> 
#> Adjusted for:
#> * c172code =     1
#> *  neg_c_7 = 11.83

If you print predicted values of a term, grouped by the levels of another term (which makes sense in the above example due to the present interaction), the print()-method automatically adjusts the range of printed values to keep the console-output short. In the following example, only 6 instead of 8 values per „block“ are shown:

ggpredict(fit, c("c12hour", "c172code"))
#> 
#> # Predicted values of Total score BARTHEL INDEX 
#> # x = average number of hours of care per week 
#> 
#> # c172code = 1
#>    x predicted std.error conf.low conf.high
#>    0    72.804     2.516   67.872    77.736
#>   30    65.689     1.946   61.874    69.503
#>   55    59.759     1.823   56.186    63.331
#>   85    52.643     2.179   48.373    56.913
#>  115    45.528     2.887   39.870    51.186
#>  170    32.482     4.531   23.602    41.363
#> 
#> # c172code = 2
#>    x predicted std.error conf.low conf.high
#>    0    76.853     1.419   74.073    79.633
#>   30    68.921     1.115   66.737    71.106
#>   55    62.311     1.122   60.112    64.510
#>   85    54.379     1.438   51.560    57.198
#>  115    46.447     1.934   42.656    50.238
#>  170    31.905     3.007   26.011    37.800
#> 
#> # c172code = 3
#>    x predicted std.error conf.low conf.high
#>    0    73.862     2.502   68.958    78.766
#>   30    66.925     1.976   63.053    70.798
#>   55    61.145     2.155   56.920    65.369
#>   85    54.208     2.963   48.400    60.016
#>  115    47.271     4.057   39.320    55.222
#>  170    34.554     6.303   22.200    46.907
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83

Marginal effects at specific levels of random effects

Marginal effects can also be calculated for each group level in mixed models. Simply add the name of the related random-effects term to the terms-argument, and set type = "re". In the following example, we fit a linear mixed model and first simply plot the marginal effects, not conditioned on random effects.

library(sjlabelled)
library(lme4)
data(efc)
efc$e15relat <- as_label(efc$e15relat)
m <- lmer(neg_c_7 ~ c12hour + c160age + c161sex + (1 | e15relat), data = efc)
me <- ggpredict(m, terms = "c12hour")
plot(me)

To compute marginal effects for each grouping level, add the related random term to the terms-argument. In this case, confidence intervals are not calculated, but marginal effects are conditioned on each group level of the random effects.

me <- ggpredict(m, terms = c("c12hour", "e15relat"), type = "re")
plot(me)

Marginal effects, conditioned on random effects, can also be calculated for specific levels only. Add the related values into brackets after the variable name in the terms-argument.

me <- ggpredict(m, terms = c("c12hour", "e15relat [child,cousin]"), type = "re")
plot(me)

If the group factor has too many levels, you can also take a random sample of all possible levels and plot the marginal effects for this subsample of group levels. To do this, use terms = "groupfactor [sample=n]".

data("sleepstudy")
m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
me <- ggpredict(m, terms = c("Days", "Subject [sample=8]"), type = "re")
plot(me)
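
One small caveat: drawing the subsample of levels presumably relies on R’s random number generator, so if you want the same subjects on every run, set a seed first:

# fix the seed so the same 8 subjects are drawn each time
set.seed(123)
me <- ggpredict(m, terms = c("Days", "Subject [sample=8]"), type = "re")
plot(me)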



