Overcooked models .smaller[Mixing prediction, explanation, confounders, and mediators]

class: left bottom hide-count
background-image: url(figures/moffittlogo.png)
background-size: 22%
background-position: bottom 5% right 5%

.talk-meta[
.talk-title[
# Overcooked models <br> .smaller[Mixing prediction, explanation, confounders, and mediators]
]

.talk-author[
Travis Gerke, ScD<br>
.small[@travisgerke]
]

.talk-date[

]
]

.hide-count .remark-slide-number {
  display: none;
}

</style>

---

background-image: url(figures/allcancers.png)
background-size: 70%
background-position: bottom 20% center

## Motivating example: prostate cancer

---

background-image: url(figures/autopsy.png)
background-size: 45%
background-position: bottom 10% center

## More men die with prostate cancer than from it

- 5-year survival `\(\approx\)` 98% and <10% of prostate cancer patients have fatal disease<br><br>
- 2.9 million men living with a diagnosis in US
  + 42 million latent (undiagnosed) cases!

---

background-image: url(figures/incidence.png)
background-size: 55%
background-position: bottom 10% center

## PSA screening trends determine cancer incidence

- PSA screening `\(\longrightarrow\)` high rates of overdiagnosis and overtreatment
  + Up to 2/3 of prostate cancers are overdiagnosed, most are treated<sup>.smaller[1]</sup>
<style type="text/css">
.footer-2 {
  position: absolute;
  top: 28%;
  left: 75%;
}
</style>
.footer-2[.smaller[1\. Loeb et al. _Eur Urol_ 2014]]

---

background-image: url(figures/survFigure.png)
background-size: 55%
background-position: bottom 5% center

## An urgent clinical challenge

- How can we distinguish indolent from lethal disease?
  + Patients with indolent tumors could avoid overtreatment
  + Patients with potentially lethal tumors could receive timely treatment
  
<style type="text/css">
.footer-3 {
  position: absolute;
  top: 93%;
  left: 55%;
}
</style>
.footer-3[.smaller[Pettersson<sup>∗</sup>, Gerke<sup>∗</sup>, et al. _Cancer_, 2017]]

---

background-image: url(figures/mystery.gif)
background-size: 45%
background-position: bottom 10% center

## A different(???) clinical challenge

- What are the causes of lethal disease?
  + If we know these, won't we also know the answer to the previous question, "How can we distinguish indolent from lethal disease?"

---

background-image: url(figures/complicated.gif)
background-size: 80%
background-position: bottom 20% center

## It's complicated.

---

background-image: url(figures/explainpredict.png)
background-size: 90%
background-position: bottom 20% center

## Good prediction models `\(\ne\)` good causal models

---

## Explanatory modeling = causal inference

- Test causal hypotheses for mechanistic understanding
  + Randomized experiments/trials are a gold standard
  + Increasingly, causal inference is conducted by evaluating association patterns within observational data according to specific rules
  + Success: an understandable statistical model (e.g. regression) that fits data well according to an expert-guided mechanistic theory
  
## Predictive modeling = predicting future events
- Models that use input values to accurately predict future outputs
  + Study design includes training and validation data sets
  + Success: a model built in the training data which need not be easily interpretable (e.g. neural net) works well in the validation data

---

background-image: url(figures/appendix.png)
background-size: 60%
background-position: bottom 20% center

## Proof that good prediction `\(\ne\)` good explanation

- TL;DR version: prediction error is a tradeoff between bias and variance. You can use a biased model (in a causal sense) that has low variance to reduce error

---

background-image: url(figures/icecream.png)
background-size: 60%
background-position: bottom 20% center

## Simpler for good prediction `\(\ne\)` good explanation

- You can build good prediction models with variables that have nothing to do with mechanism

---

## Correlation is not causation, except when it is

- The previous slide gave a good example of non-causative correlation
  - But I also just said we can use associational patterns to infer causation
  - So, which is it?<br><br>
- Directed acyclic graphs (DAGs) help understand when correlation == causation
  - The most basic rule is that association flows through edges
  - When two nodes are connected, we observe a statistical association
  - When association persists when all spurious edges between two variables are blocked, the observed association is causal

.pull-left[
<img src="figures/dag1.png">
]

.pull-right[
<img src="figures/dag2.png">
]

---

background-image: url(figures/appshot.png)
background-size: 70%
background-position: bottom 20% center

## DAGs: there's an app for that
- There are rules governing how association flows beyond the scope of this talk
  + [https://apps.gerkelab.com/shinyDAG/](https://apps.gerkelab.com/shinyDAG/)

---

## Example: Randomized trials as a gold stardard

- Randomized experiments provide causal effect estimates
  + Here's a DAG for an RCT under the null
  + Boilerplate explanation: "Because randomization adjusts for all confounders"<br><br>
- _Very important definition:_ A confounder is a common cause of exposure and outcome

.pull-left[
<img src="figures/dag3.png">
]

.pull-right[
<img src="figures/wiki.png">
]

---

background-image: url(figures/deathrates.png)
background-size: 90%
background-position: bottom 20% center

## Back to the motivating example: PCa disparities

---

background-image: url(figures/notrisk.png)
background-size: 90%
background-position: bottom 10% center

## An artifact? "Confounding"?

.pull-left[
<img src="figures/myth.png">
]

.pull-right[
<img src="figures/jamadess.png">
]

---

background-image: url(figures/racedag.png)
background-size: 70%
background-position: bottom 1% center

## Let the DAG help us know what to do!
- Are there confounders of race?

---

background-image: url(figures/confounders.png)
background-size: 90%
background-position: bottom 25% center

## Oops.

---

background-image: url(figures/ncdb.png)
background-size: 80%
background-position: bottom 10% center

## Let's try again
- 1,380,357 prostate cancer patients in NCDB
  + After subsetting to those with complete follow-up, created a density-matched 1:1 case-control sample of 12,256 patients
  + Crude analysis by race: HR for death of 1.30 (95% CI: 1.18-1.44) comparing black to white patients
  
---

## How to properly think about the role of access

- Another _very important definition_: A mediator is a variable caused by exposure, which in turn, causes the outcome
  + Effects can then be separated into _direct_ and _indirect_ components
  + Methods for mediation analyses are a distinct branch of statistics
  + Recommended resource: VanderWeele 2015 _Explanation in Causal Inference_
  + For implementation in R, [https://github.com/GerkeLab/mediator](https://github.com/GerkeLab/mediator)
  
.pull-left[
<img src="figures/mediationdag.png">
]

.pull-right[
<img src="figures/racedag.png">
]

---

background-image: url(figures/fullmodel.png)
background-size: 30%
background-position: bottom 10% right 15%

## Effect of race is minimally mediated by access

- One way to measure this is through insurance status
  + The proportion of race effect mediated through insurance status `\(\approx 1\)`%
  + We needed to adjust for many mediator-outcome confounders 
  + [Hand-wave over other nuances &#x1F44B;]
  + [Can discuss these at conclusion]
  
.pull-right[
<img src="figures/mediatorhex.png" height=100px>
]

---

background-image: url(figures/auc.png)
background-size: 40%
background-position: bottom 10% center

## The published effort is still useful

- Turns out, it's a decent predictive model!
  + The below is AUC on a 20% hold-out validation data set from a simple logistic regression with all included factors
  + And we be even better if we used more modern machine learning

---
background-image: url(figures/hexcollage.png)
background-size: 30%
background-position: bottom 10% right 10%

## The real heroes 
<img src="figures/ack.png" height=300px>

## Further details
- https://www.gerkelab.com/
- https://github.com/gerkelab