library(tidyverse)
library(survival)
library(brms)
library(tidybayes)
library(ggthemes)
First I’ll repurpose a random right-censored weibull random number generator I wrote in a previous post. It took me a while to simplify and appreciate how to do this. The idea is that we’re drawing from two data generating processes. If the first (the censoring process) returns a randomnly generated time prior to that of a second generator, then the observation is censored otherwise we’ve observed death (conversion).
Therefore we have two data generating processes that are mixing together in these cases. If censoring is random, the death time process will shine through. Information in the censoring process will influence observed times.
rweibull_cens <- function(n, shape, scale) {
a_random_death_time <- rweibull(n, shape = shape, scale = scale)
a_random_censor_time <- rweibull(n, shape = shape, scale = scale)
observed_time <- pmin(a_random_censor_time, a_random_death_time)
censor <- observed_time == a_random_death_time
tibble(time = observed_time, censor = censor)
}
rweibull_cens(1e4, shape = 1, scale = 10) -> d
Fit using the common Frequentist approach,
d %>%
survival::survreg(Surv(time, censor) ~ 1, data = .) %>%
summary()
##
## Call:
## survival::survreg(formula = Surv(time, censor) ~ 1, data = .)
## Value Std. Error z p
## (Intercept) 2.2916 0.0142 161.93 0.000
## Log(scale) -0.0169 0.0110 -1.54 0.124
##
## Scale= 0.983
##
## Weibull distribution
## Loglik(model)= -16561.6 Loglik(intercept only)= -16561.6
## Number of Newton-Raphson Iterations: 6
## n= 10000
Fit using brms
, a Bayesian model ready to have prior material incorporated.
d %>%
# Note cens uses the opposite 0/1 encoding of the survival package,
# I prefer the brms style personally. 0 for death, and 1 for censored.
mutate(censor = if_else(censor == 0, 1, 0)) %>%
brm(time | cens(censor) ~ 1, data = ., family = "weibull",
refresh = 2e3, cores = 4) -> bfit
## Compiling the C++ model
## Start sampling
print(bfit, digits = 3)
## Family: weibull
## Links: mu = log; shape = identity
## Formula: time | cens(censor) ~ 1
## Data: . (Number of observations: 10000)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept 2.285 0.015 2.255 2.316 1994 1.000
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## shape 1.017 0.011 0.995 1.039 2183 1.000
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
as_sample_tibble(bfit) %>%
mutate_at(vars(b_Intercept), exp) %>%
ggplot(aes(x = b_Intercept, y = shape)) +
geom_density_2d(colour = "red", size = 1) +
geom_point(alpha = 0.1) +
geom_hline(yintercept = 1, colour = "blue") +
geom_vline(xintercept = 10, colour = "blue") +
xlab("Scale") +
ylab("Shape") +
ggtitle("Right-censored weibull(shape, scale)") +
theme_fivethirtyeight()