The reliability of many processes is preyed upon by unexpected forced outages. Restoring regular operations is a time and resource allocation problem. We use a version of the predator-prey model to approximate, to first order, the timing of that restoration.
Our stock variables are hours of operation \(H\), hours of forced outage \(F\), and hours of restored operations \(R\), all indexed by time \(t\), here measured in weeks.
\[ \begin{aligned} \frac{dH}{dt} & = -\beta H F \\ \frac{dF}{dt} & = \beta H F - \nu F \\ \frac{dR}{dt} & = \nu F \end{aligned} \]
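Summing the three rates shows that the model conserves total hours: whatever leaves operation through a forced outage eventually shows up as restored hours.
\[ \frac{dH}{dt} + \frac{dF}{dt} + \frac{dR}{dt} = 0, \qquad H(t) + F(t) + R(t) = H(0) + F(0) + R(0). \]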
Here is the Vensim model of this formulation. We can export the xmile file and recreate the model using readsdr::read_xmile(). From that file we pull the function rendered in R.
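As a minimal sketch of that step (the file name hfr_model.xmile below is a placeholder for whatever the Vensim export is actually called), the round trip looks roughly like this:
# illustrative only: read a Vensim xmile export back into R
library(readsdr)
mdl <- read_xmile("hfr_model.xmile")   # hypothetical file name
str(mdl, max.level = 1)                # inspect the components readsdr returns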
options( digits=5, scipen=9999999)
# packages used throughout: deSolve for ode(), tidyverse for tibble/ggplot2, plotly for ggplotly()
library(deSolve)
library(tidyverse)
library(plotly)
#
# rates of change for the hours-operated (H), forced-outage (F), and restored (R) stocks
hfr <- function(t, x, parms){
  H <- x[1]
  F <- x[2]
  R <- x[3]
  with(as.list(parms),
  {
    dH <- -beta*H*F          # operating hours lost to forced outage
    dF <- beta*H*F - nu*F    # outage hours accumulate, then clear at rate nu
    dR <- nu*F               # cleared outage hours become restored hours
    out <- c(dH, dF, dR)
    list(out)
  })
}
#
# initial stock of operating hours and parameters for the base run
H0 <- 1e4
parms <- c(H0 = H0, beta = 0.0001, nu = 1/7)   # H0 carried along for reference; hfr() only uses beta and nu
times <- seq(0, 30, 0.1)
y0 <- c(H0, 1, 0)            # start with one forced-outage hour and nothing yet restored
out_HFR <- ode(y = y0, times, hfr, parms)
colnames(out_HFR) <- c("time", "H", "F", "R")
# long-format vectors (an alternative tidy layout; the plot below uses the wide tibble instead)
n_times <- length(times)
t <- rep(1:n_times, 3)
states <- c(rep("Hours Operated", n_times), rep("Forced Outage", n_times), rep("Restored", n_times))
hours <- c(out_HFR[, 2], out_HFR[, 3], out_HFR[, 4])
d_plot_out <- tibble(
weeks=out_HFR[,1],
H=out_HFR[,2],
F=out_HFR[,3],
R=out_HFR[,4]
)
p <- d_plot_out |>
ggplot( aes( x=weeks, y=H ) ) +
geom_line( color = "blue") +
geom_line( aes( x=weeks, y=F ), color="red" ) +
geom_line( aes( x=weeks, y=R ), color="black" ) +
labs( title="Hours of operation, forced outage, restored hours",
x="weeks",
y="hours"
)
ggplotly( p )
#
We recently measured the number of hours per week restored.
set.seed(42)
plant_0 <- c(0, 4, 10, 15, 18, 21, 31, 51, 53, 97, 125, 183, 292, 390, 448,
641, 771, 701, 696, 867, 925, 801, 580, 409, 351, 210, 113, 65,
52, 51, 39, 33)
plant <- abs(plant_0*(1 + rlogis(length(plant_0))))
cat(plant)
## 0 14.803 0.85792 38.832 28.493 22.605 62.878 43.876 87.446 181.54 103.82 355.03 1068.9 27.247 380.3 2404.9 3704.7 712.53 626.33 1077.3 2999.6 661.66 3183.6 1585.5 494.8 221.94 62.55 212.07 40.928 134.07 79.307 81.077
# the analysis below uses this fixed weekly series rather than the simulated draw above
plant <- c( 0,1,11,5,13,0,12,67,129,353,327,170,472,703,1079,2211,705,9, 1939,185,2161,1286,403,513,1636,236,124,20,100,215,42,18)
cumplant <- cumsum(plant)
weeks <- 0:31
d_plot_obs <- tibble(
weeks=weeks,
cumplant=cumplant
)
d_plot_obs |>
ggplot( aes( x=weeks, y=cumplant) ) +
geom_point( color = "darkred") +
labs( title="Cumulative Restored Hours Forced Out",
x="Weeks",
y="Cumulative Restoration (hours)")
#
Now comes the hard work of estimating the parameters \(\beta\) and \(\nu\). Our strategy will be to keep the estimation entirely in R for the time being. To that end we will use Laplace's quadratic approximation. Effectively this means we maximize the log posterior of the observed restoration data while letting \(\beta\) and \(\nu\) range fairly widely under flat priors.
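Concretely, the quadratic (Laplace) approximation treats the log posterior as locally quadratic at its mode, so the posterior is approximated by a multivariate normal centered at the maximum with covariance equal to the inverse of the negative Hessian there; this is what later lets us draw parameter samples with rmvnorm() from the coefficient vector and variance-covariance matrix that mle2() reports.
\[ p(\theta \mid y) \approx \mathcal{N}\!\left(\hat{\theta},\, \Sigma\right), \qquad \Sigma = \left[-\nabla^2 \log p(\theta \mid y)\,\Big|_{\theta=\hat{\theta}}\right]^{-1}, \qquad \theta = (\beta, \nu, H_0, F_0). \]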
The estimator will work if we transform the rate parameters onto the logit scale and the initial values fed to the ODE onto the log scale. When we pull the parameters back we take advantage of the obvious logistic structure of the ODE and of the data, so for the rates we back-transform with plogis() instead of exp(). We will make good use of the mle2() function from the bbmle package.
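A quick round trip on the starting guesses we use below shows the two transforms and their inverses (the values are illustrative, not estimates):
# logit scale for the rates, log scale for the initial stocks
qlogis(1e-5)                  # logit-transformed guess for beta
plogis(qlogis(1e-5))          # back-transform recovers 1e-5
log(1e6); exp(log(1e6))       # the same round trip for H0 on the log scale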
# Form the negative log posterior:  Pr(y | beta, nu) Pr(beta) Pr(nu), with flat priors
hfrLL <- function(lbeta, lnu, logH0, logF0) {
  parms <- c(beta = plogis(lbeta), nu = plogis(lnu))   # back-transform the rates
  y0 <- c(H = exp(logH0), F = exp(logF0), R = 0)       # back-transform the initial stocks
  out <- ode(y = y0, weeks, hfr, parms)
  # Gaussian observation model around the cumulative restored hours (column R)
  SD <- sqrt(sum((cumplant - out[, 4])^2) / length(weeks))
  -sum(dnorm(cumplant, mean = out[, 4], sd = SD, log = TRUE))
}
#
lbeta <- qlogis(1e-5); lnu <- qlogis(0.2)
logH0 <- log(1e6); logF0=log(1)
hfrLL(lbeta, lnu, logH0, logF0)
## [1] 481.65
# priors are all equally likely
#
# minimize the negative log-likelihood with Nelder-Mead
library(bbmle)
fit1 <- mle2(hfrLL,
             start = list(lbeta = qlogis(1e-5),
                          lnu = qlogis(.2),
                          logH0 = log(1e6), logF0 = log(1)),
             method = "Nelder-Mead",
             control = list(maxit = 1E5, trace = 0),
             trace = FALSE)
summary(fit1)
## Maximum likelihood estimation
##
## Call:
## mle2(minuslogl = hfrLL, start = list(lbeta = qlogis(0.00001),
## lnu = qlogis(0.2), logH0 = log(1000000), logF0 = log(1)),
## method = "Nelder-Mead", trace = FALSE, control = list(maxit = 100000,
## trace = 0))
##
## Coefficients:
## Estimate Std. Error z value Pr(z)
## lbeta -10.06574 0.00975 -1032.5 <0.0000000000000002 ***
## lnu 0.90717 0.01892 48.0 <0.0000000000000002 ***
## logH0 10.15779 0.00948 1071.9 <0.0000000000000002 ***
## logF0 2.44696 0.07760 31.5 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## -2 log L: 470.38
#
We can try to deviate from these finely tuned start values, but we would do so very hesitantly. Our Vensim experience shows a very high sensitivity of the ODE solutions to very small changes in \(\beta\) and \(\nu\). The most sensitive component of the model is the linkage between \(H\) and \(R\), namely, the non-linear integration of \(F\).
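A minimal sensitivity probe, reusing the hfr() function and times grid from the base run above (the 1% perturbation is arbitrary and purely illustrative):
# nudge beta by +/- 1% and compare the restored-hours trajectories
parms_lo <- c(beta = 0.99e-4, nu = 1/7)
parms_hi <- c(beta = 1.01e-4, nu = 1/7)
R_lo <- ode(y = c(1e4, 1, 0), times, hfr, parms_lo)[, 4]
R_hi <- ode(y = c(1e4, 1, 0), times, hfr, parms_hi)[, 4]
max(abs(R_hi - R_lo))   # largest gap between the two restored-hours paths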
We calculate the ODE solution with estimated parameters as the \(\mu\) of this regression.
par_mean_1 <- fit1@coef      # posterior mode (MLE) of the transformed parameters
par_varcov_1 <- fit1@vcov    # quadratic-approximation covariance
#
# draw parameter samples from the Laplace (multivariate normal) approximation
library(mvtnorm)
samples_fit1 <- rmvnorm(1000, par_mean_1, par_varcov_1)
summary(exp(samples_fit1))
## lbeta lnu logH0 logF0
## Min. :0.0000413 Min. :2.35 Min. :25016 Min. : 9.02
## 1st Qu.:0.0000422 1st Qu.:2.45 1st Qu.:25625 1st Qu.:10.93
## Median :0.0000425 Median :2.47 Median :25796 Median :11.52
## Mean :0.0000425 Mean :2.48 Mean :25790 Mean :11.58
## 3rd Qu.:0.0000428 3rd Qu.:2.51 3rd Qu.:25948 3rd Qu.:12.14
## Max. :0.0000437 Max. :2.64 Max. :26567 Max. :15.32
#
# mean restored-hours trajectory implied by a (transformed) parameter vector
hfrMU <- function(par_mean) {
  lbeta <- par_mean[[1]]; lnu <- par_mean[[2]]
  logH0 <- par_mean[[3]]; logF0 <- par_mean[[4]]   # the fourth element is logF0
  parms <- c(beta = plogis(lbeta), nu = plogis(lnu))
  y0 <- c(H = exp(logH0), F = exp(logF0), R = 0)
  out <- ode(y = y0, weeks, hfr, parms)
  out[, 4]                                         # the R (restored) column
}
mu_1 <- hfrMU( par_mean_1 )                  # mean trajectory at the fitted parameters
sigma_1 <- exp((par_varcov_1[3,3])^0.5)      # spread taken from the logH0 uncertainty
#
samples_fit1 <- as_tibble(samples_fit1) |>
mutate(prediction = rnorm( 1000, mu_1, sigma_1) )
#
cumplant_pred <- samples_fit1$prediction
quantile(cumplant_pred, 0.045)
## 4.5%
## 1.9264
quantile(cumplant, 0.045)
## 4.5%
## 5.345
#
quantile(cumplant_pred, 0.5)
## 50%
## 1300.4
quantile(cumplant, 0.5)
## 50%
## 5905.5
#
quantile(cumplant_pred, 0.945)
## 94.5%
## 14462
quantile(cumplant, 0.945)
## 94.5%
## 15097
The lower tail does not fit as well as we might want, but the upper tail does. The median deviates by a wide margin. That divergence is a characteristic of the interactive relationship between hours of operation and forced outage, and of the non-commutative relationship between forced outage and restoration.
The plot might therefore be misleading.
#
theta <- as.numeric(c(plogis(coef(fit1)[1:2]),
exp(coef(fit1)[3:4])) )
parms <- c(beta=theta[1], nu = theta[2])
times <- seq(0,30,0.1)
y0 <- c(theta[3],theta[4],0)
out_HFR1 <- ode(y=y0, times, hfr, parms)
colnames(out_HFR1) <- c("time","H","F","R")
d_plot <- out_HFR1 |> as_tibble()
p <- ggplot( data=d_plot, aes( x=time, y=R )) +
geom_line( color = "blue" ) +
geom_point( data = d_plot_obs, aes( x=weeks, y=cumplant ), color = "red") +
labs( title="Cumulative Restored Hours Forced Out",
x="Weeks",
y="Cumulative Restoration (hours)")
p
## Don't know how to automatically pick scale for object of type <deSolve/matrix>.
## Defaulting to continuous.
## Don't know how to automatically pick scale for object of type <deSolve/matrix>.
## Defaulting to continuous.
One more attempt at inference is to compute HPDIs (Highest Posterior Density Intervals).
library( tidybayes )
library( rethinking )
## Loading required package: rstan
## Loading required package: StanHeaders
##
## rstan version 2.32.3 (Stan version 2.26.1)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
## change `threads_per_chain` option:
## rstan_options(threads_per_chain = 1)
## Do not specify '-march=native' in 'LOCAL_CPPFLAGS' or a Makevars file
##
## Attaching package: 'rstan'
## The following object is masked from 'package:tidyr':
##
## extract
## Loading required package: cmdstanr
## This is cmdstanr version 0.7.1
## - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
## - CmdStan path: C:/Users/wgfoo/OneDrive/Documents/.cmdstan/cmdstan-2.34.1
## - CmdStan version: 2.34.1
## Loading required package: parallel
## rethinking (Version 2.31)
##
## Attaching package: 'rethinking'
## The following object is masked from 'package:rstan':
##
## stan
## The following object is masked from 'package:mvtnorm':
##
## standardize
## The following object is masked from 'package:purrr':
##
## map
## The following object is masked from 'package:stats':
##
## rstudent
library( coda )
##
## Attaching package: 'coda'
## The following object is masked from 'package:rstan':
##
## traceplot
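Before defining the helper, here is the single coda call it wraps, run on a throwaway sample just to show the shape of the output:
# HPDinterval() expects an mcmc object and returns the narrowest interval
# containing the requested posterior mass (standard-normal draws here, for illustration only)
HPDinterval(mcmc(rnorm(1000)), prob = 0.89)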
prob <- c(0.89)
#
# a helper function to make a table of lower and upper
# estimates of highest posterior density intervals
# for estimated parameters and predictions
#
hpdi_tbl <- function(samples, prob = 0.89){
  column <- colnames(samples)
  x <- sapply(column,
              function(c) HPDinterval(mcmc(samples[[c]]), prob = prob))
  rownames(x) <- c(concat("|lower ", prob),
                   concat("upper ", prob, "|"))
  return(t(x))
}
hpdi_samples <- hpdi_tbl( samples_fit1, prob=0.89)
hpdi_samples
## |lower 0.89 upper 0.89|
## lbeta -10.08213 -10.05001
## lnu 0.87876 0.94091
## logH0 10.14331 10.17276
## logF0 2.32517 2.57194
## prediction -0.17951 13640.17611
# keep only the samples with predictions above 5000 hours
hpdi_samples <- samples_fit1 |> filter ( prediction > 5000 ) |>
hpdi_tbl( prob=0.89)
That interval for the prediction, running from essentially zero to more than 13,000 hours, is breathtakingly alarming!
set.seed(4242)
samples_fit1 |> filter( prediction > 5000 ) |>
ggplot(aes(x = prediction)) +
geom_histogram(color = "blue", alpha=0.7) +
geom_vline( xintercept=hpdi_samples["prediction",1],
color="red") +
geom_vline( xintercept=hpdi_samples["prediction",2],
color="red") +
labs(title = "Posterior predictive distribution",
subtitle = "89% High Posterior Density Interval",
x = "predicted restored hours",
y = "frequency of predicted outcomes"
) +
theme(panel.grid = element_blank())
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
However we slice this data, we see that more than one set of behaviors is embedded in this highly non-linear soup.
Ever optimistic, we try to update our estimates using the routine below.
#
# Another fit piggy-backing on our first try `fit1`
#
fit2 <- mle2(hfrLL,
start=as.list(coef(fit1)),
fixed=list(logH0=coef(fit1)[3],
logF0=coef(fit1)[4]),
method="Nelder-Mead",
control=list(maxit=1E5,trace=2),
trace=TRUE)
## Nelder-Mead direct search function minimizer
## function value for initial parameters = 235.189548
## Scaled convergence tolerance is 3.5046e-06
## Stepsize computed as 1.006574
## BUILD 3 355.780270 235.189548
## LO-REDUCTION 5 336.649941 235.189548
## HI-REDUCTION 7 335.950526 235.189548
## LO-REDUCTION 9 321.535570 235.189548
## HI-REDUCTION 11 297.990018 235.189548
## HI-REDUCTION 13 286.358619 235.189548
## HI-REDUCTION 15 268.288119 235.189548
## HI-REDUCTION 17 257.913193 235.189548
## HI-REDUCTION 19 248.872615 235.189548
## HI-REDUCTION 21 241.667385 235.189548
## HI-REDUCTION 23 240.688910 235.189548
## LO-REDUCTION 25 237.486659 235.189548
## HI-REDUCTION 27 236.388869 235.189548
## LO-REDUCTION 29 235.976597 235.189548
## LO-REDUCTION 31 235.625743 235.189548
## LO-REDUCTION 33 235.338612 235.189548
## HI-REDUCTION 35 235.269713 235.189548
## LO-REDUCTION 37 235.236374 235.189548
## LO-REDUCTION 39 235.211728 235.189548
## HI-REDUCTION 41 235.202987 235.189548
## LO-REDUCTION 43 235.198048 235.189548
## LO-REDUCTION 45 235.191295 235.189548
## HI-REDUCTION 47 235.189877 235.189548
## HI-REDUCTION 49 235.189724 235.189417
## HI-REDUCTION 51 235.189548 235.189416
## HI-REDUCTION 53 235.189417 235.189364
## HI-REDUCTION 55 235.189416 235.189360
## LO-REDUCTION 57 235.189364 235.189353
## HI-REDUCTION 59 235.189360 235.189347
## HI-REDUCTION 61 235.189353 235.189345
## Exiting from Nelder Mead minimizer
## 63 function evaluations used
summary( fit2 )
## Maximum likelihood estimation
##
## Call:
## mle2(minuslogl = hfrLL, start = as.list(coef(fit1)), method = "Nelder-Mead",
## fixed = list(logH0 = coef(fit1)[3], logF0 = coef(fit1)[4]),
## trace = TRUE, control = list(maxit = 100000, trace = 2))
##
## Coefficients:
## Estimate Std. Error z value Pr(z)
## lbeta -10.06603 0.00883 -1140.2 <0.0000000000000002 ***
## lnu 0.90586 0.00936 96.8 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## -2 log L: 470.38
par_mean_2 <- fit2@coef
par_varcov_2 <- fit2@vcov
library(mvtnorm)
samples_fit2 <- as.data.frame(rmvnorm(1000, par_mean_2, par_varcov_2 ) )
summary(exp(samples_fit2))
## lbeta lnu
## Min. :0.0000415 Min. :2.40
## 1st Qu.:0.0000422 1st Qu.:2.46
## Median :0.0000425 Median :2.47
## Mean :0.0000425 Mean :2.47
## 3rd Qu.:0.0000427 3rd Qu.:2.49
## Max. :0.0000439 Max. :2.55
#
hfrMU_2 <- function(par_mean) {
lbeta <- par_mean[[1]]; lnu <- par_mean[[2]]
logH0 <- coef(fit1)[3]; logF0 <- coef(fit1)[4]
parms <- c(beta=plogis(lbeta), nu=plogis(lnu))
y0 <- c(H=exp(logH0), F=exp(logF0), R=0)
out <- ode(y=y0, weeks, hfr, parms)
out[,4]
}
mu_2 <- hfrMU_2( par_mean_2 )                # mean trajectory at the fit2 parameters
sigma_2 <- (par_varcov_1[3,3])^0.5           # logH0 spread borrowed from fit1 (logH0 is fixed in fit2)
#
samples_fit2 <- as_tibble(samples_fit2) |>
  mutate(prediction = rnorm( 1000, mu_2, sigma_2) )
#
# HPDI next
#
hpdi_samples <- samples_fit2 |> filter ( prediction > 5000 ) |>
hpdi_tbl( prob=0.89)
hpdi_samples
## |lower 0.89 upper 0.89|
## lbeta -10.07957 -10.05295
## lnu 0.89401 0.92211
## prediction 6816.60315 14747.05288
That interval for the prediction still takes our statistical breath away!
set.seed(4242)
samples_fit2 |> filter( prediction > 5000 ) |>
ggplot(aes(x = prediction)) +
geom_histogram(color = "blue", alpha=0.7) +
geom_vline( xintercept=hpdi_samples["prediction",1],
color="red") +
geom_vline( xintercept=hpdi_samples["prediction",2],
color="red") +
labs(title = "Posterior predictive distribution",
subtitle = "beta, nu fit: 89% High Posterior Density Interval",
x = "predicted restored hours",
y = "frequency of predicted outcomes"
) +
theme(panel.grid = element_blank())
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#
# and this initial plot
#
theta2 <- as.numeric(c(plogis(coef(fit2)[1:2]),
exp(coef(fit1)[3:4])) ) # fit2 has only two parms
parms <- c(beta=theta2[1], nu = theta2[2])
times <- seq(0,30,0.1)
y0 <- c(theta2[3],theta2[4],0)
out_HFR2 <- ode(y=y0, times, hfr, parms)
colnames(out_HFR2) <- c("time","H","F","R")
d_plot <- out_HFR2 |> as_tibble()
#
p <- ggplot( data = d_plot_obs, aes( x=weeks, y=cumplant)) +
geom_quantile(data = d_plot_obs, method = "rqss", lambda = 0.6, quantiles = c(0.05, 0.5, 0.95), color = "red", size=1.1) +
geom_point( data = d_plot_obs, aes( x=weeks, y=cumplant ), color = "red") +
geom_line( data = d_plot, aes( x=time, y=R), color = "gray30", size=1.5) +
labs( title="Cumulative Restored Hours Forced Out",
subtitle="5%-95% quantile predictive intervals (red)",
x="Weeks",
y="Cumulative Restoration (hours)")
p
## Smoothing formula not specified. Using: y ~ qss(x, lambda = 0.6)
#
That looks pretty tight!
We still miss the top, and skim the quantile interval at the high end for that matter. But when does the model predict that we are half-way through the restoration? It still seems that the answer lies somewhere between weeks 15 and 20 for both fits. So perhaps we have beaten our model with too large a hammer and have our answer already.
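As a quick way to read that half-way point off the fitted trajectory (a sketch that assumes out_HFR1 from the fit1 run is still in memory; the same two lines work for out_HFR2):
# first week at which cumulative restoration passes half of its final value
half_R <- 0.5 * max(out_HFR1[, "R"])
out_HFR1[min(which(out_HFR1[, "R"] >= half_R)), "time"]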