I have a question about the analysis of deviance table returned by `anova()`.
Around line 400 of the Rmd file, we used:
```{r}
fit_smoke <- glm(data=nhanes_BP, formula=current_smoking~gender + nsa(age), family=binomial)
fit_smoke_int <- glm(data=nhanes_BP, formula=current_smoking~gender * nsa(age), family=binomial)
anova(fit_smoke, fit_smoke_int, test="LRT")
```
to evaluate the effect of the age-by-gender interaction on the deviance of the model. It returns:
"""
Model 1: current_smoking ~ gender + nsa(age)
Model 2: current_smoking ~ gender * nsa(age)
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1      9372     9466.8
2      9369     9465.3  3   1.4653   0.6903
"""
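As a quick check, the p-value in this table can be recomputed by hand from the reported change in deviance. This is a minimal sketch using only numbers copied from the output above; the variable names `dev_diff`, `df_diff` and `p_value` are made up for illustration.

```{r}
# Values copied from the anova() output above
dev_diff <- 1.4653        # reported change in deviance (the "Deviance" column)
df_diff  <- 9372 - 9369   # difference in residual df: 3 extra interaction parameters

# The LRT compares the change in deviance with a chi-squared distribution
# on df_diff degrees of freedom and reports the upper-tail probability.
p_value <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)
p_value  # compare with the Pr(>Chi) column
```

Note that the "Resid. Dev" column is printed with fewer digits, so subtracting the two printed values gives 1.5 rather than 1.4653.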
Out of curiosity, I fitted two more glm models, adding bmi to each one, and compared them with `anova()`:
```{r}
fit_smoke1 <- glm(data=nhanes_BP, formula=current_smoking~bmi+gender + nsa(age), family=binomial)
fit_smoke2 <- glm(data=nhanes_BP, formula=current_smoking~bmi+gender * nsa(age), family=binomial)
anova(fit_smoke1, fit_smoke2, test="LRT")
```
The results differ from the first comparison, although not by much:
"""
Model 1: current_smoking ~ bmi + gender + nsa(age)
Model 2: current_smoking ~ bmi + gender * nsa(age)
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1      9371     9377.7
2      9368     9376.4  3   1.3415   0.7193
"""
Intuitively, I would have assumed that the effect of bmi cancels out between the two models, so that we obtain the same difference in deviance. Why is that not the case? Thanks in advance!
Hi Tianlin, yes, that is the case, at least approximately. But because bmi is correlated with gender and age, the two comparisons do not give exactly the same p-value. You could also get a very different p-value if, for example, bmi were perfectly explained by gender*nsa(age) (which we know is not the case). Then including the interaction would not change the model much, and the p-value would be even higher. (It is theoretically possible to see a smaller p-value if the effects of gender*nsa(age) on smoking and on bmi have different signs; we would call that case negative confounding.)
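To see this on data where the truth is known, one could run a small simulation along the following lines. This is only a sketch under stated assumptions: the data are simulated rather than taken from `nhanes_BP`, all effect sizes are invented, `splines::ns(age, df = 3)` is used as a stand-in for the course's `nsa(age)`, and the helper `lrt()` is hypothetical.

```{r}
library(splines)  # ns() is an assumed stand-in for nsa()

set.seed(1)
n      <- 5000
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
age    <- runif(n, 20, 80)
# bmi depends on gender and age, so it is correlated with both (invented effects)
bmi    <- 22 + 0.05 * age + 1.5 * (gender == "male") + rnorm(n, sd = 4)
# smoking depends on gender, age and bmi; there is no true interaction
eta    <- -0.5 + 0.4 * (gender == "male") - 0.02 * age - 0.03 * bmi
smoke  <- rbinom(n, 1, plogis(eta))
dat    <- data.frame(smoke, gender, age, bmi)

# Hypothetical helper: fit both models and return the LRT deviance change and p-value
lrt <- function(f_main, f_int) {
  a <- anova(glm(f_main, data = dat, family = binomial),
             glm(f_int,  data = dat, family = binomial),
             test = "LRT")
  c(deviance = a$Deviance[2], p = a[["Pr(>Chi)"]][2])
}

without_bmi <- lrt(smoke ~ gender + ns(age, df = 3),
                   smoke ~ gender * ns(age, df = 3))
with_bmi    <- lrt(smoke ~ bmi + gender + ns(age, df = 3),
                   smoke ~ bmi + gender * ns(age, df = 3))
rbind(without_bmi, with_bmi)
```

The two rows should be close but not identical, which mirrors what you saw. Changing how strongly `bmi` depends on `gender` and `age` in the simulation is one way to explore how far the two tests can drift apart.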