Operational Forced Outages and Restoration

The reliability of many processes is preyed upon by unexpected forced outages. Restoration of regular operations is a time and resource allocation problem. We use a version of the predator-prey model to approximate, to first order, the timing of the restoration of regular operations.

Our stock variables are hours of operation \(H\), hours of forced outage \(F\), and hours of restored operations \(R\), all across time \(t\), here in weeks.

\[ \begin{aligned} \frac{dH}{dt} & = -\beta H F \\ \frac{dF}{dt} & = \beta H F - \nu F \\ \frac{dR}{dt} & = \nu F \end{aligned} \]
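
Two consequences of this formulation are worth noting before we simulate: total hours are conserved, and the forced-outage stock can only begin to shrink once operating hours fall below the ratio \(\nu / \beta\) (for \(F > 0\)):

\[ \begin{aligned} \frac{d}{dt}\left( H + F + R \right) & = -\beta H F + \left( \beta H F - \nu F \right) + \nu F = 0 \\ \frac{dF}{dt} & = \left( \beta H - \nu \right) F \; < \; 0 \quad \text{whenever } H < \frac{\nu}{\beta} \end{aligned} \]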

Here is the Vensim model of this formulation.

Fig. 1. Hours of operation, forced outage, and restored hours. We can export the XMILE file and recreate this model with readsdr::read_xmile(); from that file we pull the function, rendered in R below.

library( deSolve )    # ode()
library( tidyverse )  # tibble(), dplyr verbs, ggplot2
library( plotly )     # ggplotly()
#
options( digits=5, scipen=9999999)
#
hfr <- function(t,x,parms){
  H <- x[1]   # hours of operation
  F <- x[2]   # hours in forced outage
  R <- x[3]   # hours restored
  with(as.list(parms),
       {
         dH <- -beta*H*F
         dF <- beta*H*F - nu*F
         dR <- nu*F
         out <- c(dH,dF,dR)
         list(out)
       })
}
#
H0 <- 1e4
parms <- c(H0=H0,beta=0.0001, nu = 1/7)
times <- seq(0,30,0.1)
y0 <- c(H0,1,0)
out_HFR <- ode(y=y0, times, hfr, parms)
colnames(out_HFR) <- c("time","H","F","R")
# gather the ODE solution into a tibble for plotting
d_plot_out <- tibble(
  weeks=out_HFR[,1],
  H=out_HFR[,2],
  F=out_HFR[,3],
  R=out_HFR[,4]
)
p <- d_plot_out |>
  ggplot( aes( x=weeks, y=H ) ) +
  geom_line( color = "blue") +
  geom_line( aes( x=weeks, y=F ), color="red" ) +
  geom_line( aes( x=weeks, y=R ), color="black" ) +
  labs( title="Hours of operation, forced outage, restored hours",
        x="weeks",
        y="hours"
  )
ggplotly( p )
#

We recently measured the number of hours per week restored.

set.seed(42)
plant_0 <- c(0, 4, 10, 15, 18, 21, 31, 51, 53, 97, 125, 183, 292, 390, 448,
            641, 771, 701, 696, 867, 925, 801, 580, 409, 351, 210, 113, 65, 
            52, 51, 39, 33)
plant <- abs(plant_0*(1 + rlogis(length(plant_0))))
cat(plant)
## 0 14.803 0.85792 38.832 28.493 22.605 62.878 43.876 87.446 181.54 103.82 355.03 1068.9 27.247 380.3 2404.9 3704.7 712.53 626.33 1077.3 2999.6 661.66 3183.6 1585.5 494.8 221.94 62.55 212.07 40.928 134.07 79.307 81.077
# use one fixed realization of the measurements for the rest of the analysis
plant <- c( 0,1,11,5,13,0,12,67,129,353,327,170,472,703,1079,2211,705,9, 1939,185,2161,1286,403,513,1636,236,124,20,100,215,42,18)
cumplant <- cumsum(plant)
weeks <- 0:31
d_plot_obs <- tibble(
  weeks=weeks,
  cumplant=cumplant
)
d_plot_obs |>
  ggplot( aes( x=weeks, y=cumplant) ) +
  geom_point( color = "darkred") +
  labs( title="Cumulative Restored Hours Forced Out", 
        x="Weeks", 
        y="Cumulative Restoration (hours)")

#

Now comes the hard work of estimating the parameters \(\beta \,\text{and}\, \nu\). Our strategy here will be to keep the estimation entirely in R for the time being. To that end we will use Laplace’s quadratic approximation routine. Effectively this means we will maximize the log posterior likelihood of observing the restoration data, with priors that let \(\beta \,\text{and}\, \nu\) range fairly widely.
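
Concretely, the quadratic (Laplace) approximation treats the log posterior over the transformed parameters \(\theta = (\text{logit}\,\beta,\, \text{logit}\,\nu,\, \log H_0,\, \log F_0)\) as locally quadratic at its mode, so the posterior is approximated by a multivariate normal:

\[ \log p(\theta \mid y) \;\approx\; \log p(\hat{\theta} \mid y) - \tfrac{1}{2} \left( \theta - \hat{\theta} \right)^{\top} \Sigma^{-1} \left( \theta - \hat{\theta} \right), \qquad \Sigma = \left[ -\left. \nabla^{2} \log p(\theta \mid y) \right|_{\theta = \hat{\theta}} \right]^{-1} \]

With flat priors the mode \(\hat{\theta}\) is just the maximum likelihood estimate, which is why the coefficient vector and covariance matrix from the mle2() fit below (fit1@coef and fit1@vcov, the inverse of the numerical Hessian) can stand in for the posterior mean and covariance when we sample with rmvnorm().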

The estimator will work if we transform the parameters, as well as the initial values we feed to the ODE, onto the log or logit scale. When we pull the parameters back we will take advantage of the obvious logistic structure of the ODE and of the data; thus we will use plogis() instead of exp() for the rate parameters. We will make good use of the mle2() function from the bbmle package.
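
As a small illustration of that transform, using only base R: qlogis() sends a rate in \((0, 1)\) to the whole real line, plogis() sends it back, and for very negative arguments plogis() is numerically close to exp(), which is why the fitted lbeta below can be read on either scale.

# round trip: a rate constrained to (0,1) maps to an unconstrained real and back
plogis( qlogis(1e-5) )       # 1e-5, recovered exactly
plogis( qlogis(0.2) )        # 0.2
# for very negative arguments the logistic and the exponential nearly coincide
c( plogis(-10), exp(-10) )   # both about 0.0000454
# exp()/log() still serve for H0 and F0, which are positive but not bounded by 1
exp( log(1e4) )              # 10000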

# Form the likelihood function Pr( y | beta, nu ) Pr( beta ) Pr( nu )
library( bbmle )     # mle2()
library( mvtnorm )   # rmvnorm(), for sampling the quadratic approximation later
hfrLL <- function(lbeta, lnu, logH0, logF0) {
  parms <- c(beta=plogis(lbeta), nu=plogis(lnu))
  y0 <- c(H=exp(logH0), F=exp(logF0), R=0)
  out <- ode(y=y0, weeks, hfr, parms)
  SD <- sqrt(sum( (cumplant-out[,4])^2)/length(weeks) )
  -sum(dnorm(cumplant, mean=out[,4], sd=SD, log=TRUE))
}
#
lbeta <- qlogis(1e-5); lnu <- qlogis(0.2) 
logH0 <- log(1e6); logF0=log(1)
hfrLL(lbeta, lnu, logH0, logF0)
## [1] 481.65
# priors are all equally likely
# 
# minimize negative-log-likelihood
fit1 <- mle2(hfrLL, 
            start=list(lbeta=qlogis(1e-5), 
                       lnu=qlogis(.2), 
                       logH0=log(1e6), logF0=log(1) ),  
            method="Nelder-Mead",
            control=list(maxit=1E5,trace=0),
            trace=FALSE)
summary(fit1)
## Maximum likelihood estimation
## 
## Call:
## mle2(minuslogl = hfrLL, start = list(lbeta = qlogis(0.00001), 
##     lnu = qlogis(0.2), logH0 = log(1000000), logF0 = log(1)), 
##     method = "Nelder-Mead", trace = FALSE, control = list(maxit = 100000, 
##         trace = 0))
## 
## Coefficients:
##        Estimate Std. Error z value               Pr(z)    
## lbeta -10.06574    0.00975 -1032.5 <0.0000000000000002 ***
## lnu     0.90717    0.01892    48.0 <0.0000000000000002 ***
## logH0  10.15779    0.00948  1071.9 <0.0000000000000002 ***
## logF0   2.44696    0.07760    31.5 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## -2 log L: 470.38
#

We can try to deviate from these finely tuned start values, but would do so very hesitantly. Our Vensim experience shows a very high sensitivity of ODE solutions to very small changes in \(\beta\) and \(\nu\). The most sensitive component of the model is the linkage between \(H\) and \(R\), namely, the non-linear integration of \(F\).
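
To see that sensitivity directly, here is a small perturbation sketch; the objects sensitivity, parms_k, and out_k are just illustrative names, and it assumes the baseline values \(\beta = 0.0001\), \(\nu = 1/7\), and \(H_0 = 10^4\) from the first simulation. It nudges \(\beta\) by \(\pm 5\%\) and reads off how far the week of peak forced outage and the final restored total move.

# perturb beta by +/- 5% and compare peak-outage timing and final restoration
sensitivity <- lapply( c(0.95, 1.00, 1.05), function(k) {
  parms_k <- c( beta = 0.0001*k, nu = 1/7 )
  out_k   <- ode( y = c(H = 1e4, F = 1, R = 0), times = seq(0, 30, 0.1),
                  func = hfr, parms = parms_k )
  tibble( beta_scale  = k,
          peak_F_week = out_k[ which.max(out_k[, "F"]), "time" ],
          final_R     = out_k[ nrow(out_k), "R" ] )
} )
bind_rows( sensitivity )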

We calculate the ODE solution with estimated parameters as the \(\mu\) of this regression.

par_mean_1 <- fit1@coef
par_varcov_1 <- fit1@vcov
#
samples_fit1 <- rmvnorm(1000, par_mean_1, par_varcov_1 )   # 1000 draws from the quadratic (Laplace) approximation
summary(exp(samples_fit1))
##      lbeta                lnu           logH0           logF0      
##  Min.   :0.0000413   Min.   :2.35   Min.   :25016   Min.   : 9.02  
##  1st Qu.:0.0000422   1st Qu.:2.45   1st Qu.:25625   1st Qu.:10.93  
##  Median :0.0000425   Median :2.47   Median :25796   Median :11.52  
##  Mean   :0.0000425   Mean   :2.48   Mean   :25790   Mean   :11.58  
##  3rd Qu.:0.0000428   3rd Qu.:2.51   3rd Qu.:25948   3rd Qu.:12.14  
##  Max.   :0.0000437   Max.   :2.64   Max.   :26567   Max.   :15.32
#
hfrMU <- function(par_mean) {
  lbeta <- par_mean[[1]]; lnu <- par_mean[[2]] 
  logH0 <- par_mean[[3]]; logF0 <- par_mean[[4]]
  parms <- c(beta=plogis(lbeta), nu=plogis(lnu))
  y0 <- c(H=exp(logH0), F=exp(logF0), R=0)
  out <- ode(y=y0, weeks, hfr, parms)
  out[,4]   # the fitted cumulative restoration R
}
mu_1 <- hfrMU( par_mean_1 )
sigma_1 <- exp((par_varcov_1[3,3])^0.5)
#
samples_fit1 <- as_tibble(samples_fit1) |> 
  mutate(prediction = rnorm( 1000, mu_1, sigma_1) )   # mu_1 (one value per week) recycles across the 1000 draws
#
cumplant_pred <- samples_fit1$prediction
quantile(cumplant_pred, 0.045)
##   4.5% 
## 1.9264
quantile(cumplant, 0.045)
##  4.5% 
## 5.345
#
quantile(cumplant_pred, 0.5)
##    50% 
## 1300.4
quantile(cumplant, 0.5)
##    50% 
## 5905.5
#
quantile(cumplant_pred, 0.945)
## 94.5% 
## 14462
quantile(cumplant, 0.945)
## 94.5% 
## 15097

The lower tail does not fit as well as we might want, but the upper tail does. The median deviates by a wide margin. That divergence is characteristic of the interactive relationship between hours of operation and forced outage, and of the non-commutative relationship between forced outage and restoration.

A plot of the fitted curve might therefore be misleading.

#
theta <- as.numeric(c(plogis(coef(fit1)[1:2]),
                      exp(coef(fit1)[3:4])) )
parms <- c(beta=theta[1], nu = theta[2])
times <- seq(0,30,0.1)
y0 <- c(theta[3],theta[4],0)
out_HFR1 <- ode(y=y0, times, hfr, parms)
colnames(out_HFR1) <- c("time","H","F","R")
d_plot <- out_HFR1 |> as.data.frame() |> as_tibble()   # drop the deSolve class so ggplot treats the columns as plain numerics

p <- ggplot( data=d_plot, aes( x=time, y=R )) +
  geom_line( color = "blue" ) +
  geom_point( data = d_plot_obs, aes( x=weeks, y=cumplant ), color = "red") +
  labs( title="Cumulative Restored Hours Forced Out", 
        x="Weeks", 
        y="Cumulative Restoration (hours)")
p

One more attempt at inference is the HPDI (highest posterior density interval).
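
As a reminder (and a check on the helper below), an HPDI is the narrowest interval containing the stated share of the posterior mass; for skewed samples it can sit noticeably to one side of an equal-tailed percentile interval. A standalone illustration with simulated right-skewed draws (not our restoration data):

library( coda )
set.seed(1)
x <- rlnorm( 1e4, meanlog = 0, sdlog = 1 )   # deliberately right-skewed draws
HPDinterval( mcmc(x), prob = 0.89 )          # narrowest 89% interval, hugs the left
quantile( x, probs = c(0.055, 0.945) )       # equal-tailed 89% interval, wider
#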

library( tidybayes )
library( rethinking )   # concat()
library( coda )         # mcmc(), HPDinterval()
prob <- c(0.89)
#
# a helper function to make a table of lower and upper
# estimates of highest posterior density intervals
# for estimated parameters and predictions
#
hpdi_tbl <- function( samples, prob=0.89){
  column <- colnames(samples)
  x <- sapply(column, 
      function(c) HPDinterval(mcmc(samples[[c]]), prob = prob))
  rownames(x) <- c( concat("|lower ", prob), 
                    concat("upper ", prob, "|") )
  return( t(x) )
}
hpdi_samples <- hpdi_tbl( samples_fit1, prob=0.89)
hpdi_samples
##            |lower 0.89 upper 0.89|
## lbeta        -10.08213   -10.05001
## lnu            0.87876     0.94091
## logH0         10.14331    10.17276
## logF0          2.32517     2.57194
## prediction    -0.17951 13640.17611
# keep only samples with predictions above 5000 hours
hpdi_samples <- samples_fit1 |> filter ( prediction > 5000 ) |>
  hpdi_tbl( prob=0.89)

That interval for prediction is breathtakingly alarming!

set.seed(4242)
samples_fit1 |> filter( prediction > 5000 ) |>
  ggplot(aes(x = prediction)) +
  geom_histogram(color = "blue", alpha=0.7) +
  geom_vline( xintercept=hpdi_samples["prediction",1], 
              color="red") +
  geom_vline( xintercept=hpdi_samples["prediction",2], 
              color="red") +
  labs(title = "Posterior predictive distribution",
       subtitle = "89% High Posterior Density Interval",
       x = "predicted restored hours",
       y = "frequency of predicted outcomes"
       ) +
  theme(panel.grid = element_blank())
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

However we slice this data, we will see that more than one set of behaviors is embedded in this highly non-linear soup.

Ever optimistic, we try to update our estimates using the following routine.

#
# Another fit piggy-backing on our first try `fit1`
# 
fit2 <- mle2(hfrLL, 
             start=as.list(coef(fit1)),
             fixed=list(logH0=coef(fit1)[3], 
                        logF0=coef(fit1)[4]), 
             method="Nelder-Mead",
             control=list(maxit=1E5,trace=2),
             trace=TRUE)
##   Nelder-Mead direct search function minimizer
## function value for initial parameters = 235.189548
##   Scaled convergence tolerance is 3.5046e-06
## Stepsize computed as 1.006574
## BUILD              3 355.780270 235.189548
## LO-REDUCTION       5 336.649941 235.189548
## HI-REDUCTION       7 335.950526 235.189548
## LO-REDUCTION       9 321.535570 235.189548
## HI-REDUCTION      11 297.990018 235.189548
## HI-REDUCTION      13 286.358619 235.189548
## HI-REDUCTION      15 268.288119 235.189548
## HI-REDUCTION      17 257.913193 235.189548
## HI-REDUCTION      19 248.872615 235.189548
## HI-REDUCTION      21 241.667385 235.189548
## HI-REDUCTION      23 240.688910 235.189548
## LO-REDUCTION      25 237.486659 235.189548
## HI-REDUCTION      27 236.388869 235.189548
## LO-REDUCTION      29 235.976597 235.189548
## LO-REDUCTION      31 235.625743 235.189548
## LO-REDUCTION      33 235.338612 235.189548
## HI-REDUCTION      35 235.269713 235.189548
## LO-REDUCTION      37 235.236374 235.189548
## LO-REDUCTION      39 235.211728 235.189548
## HI-REDUCTION      41 235.202987 235.189548
## LO-REDUCTION      43 235.198048 235.189548
## LO-REDUCTION      45 235.191295 235.189548
## HI-REDUCTION      47 235.189877 235.189548
## HI-REDUCTION      49 235.189724 235.189417
## HI-REDUCTION      51 235.189548 235.189416
## HI-REDUCTION      53 235.189417 235.189364
## HI-REDUCTION      55 235.189416 235.189360
## LO-REDUCTION      57 235.189364 235.189353
## HI-REDUCTION      59 235.189360 235.189347
## HI-REDUCTION      61 235.189353 235.189345
## Exiting from Nelder Mead minimizer
##     63 function evaluations used
summary( fit2 )
## Maximum likelihood estimation
## 
## Call:
## mle2(minuslogl = hfrLL, start = as.list(coef(fit1)), method = "Nelder-Mead", 
##     fixed = list(logH0 = coef(fit1)[3], logF0 = coef(fit1)[4]), 
##     trace = TRUE, control = list(maxit = 100000, trace = 2))
## 
## Coefficients:
##        Estimate Std. Error z value               Pr(z)    
## lbeta -10.06603    0.00883 -1140.2 <0.0000000000000002 ***
## lnu     0.90586    0.00936    96.8 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## -2 log L: 470.38
par_mean_2 <- fit2@coef
par_varcov_2 <- fit2@vcov
library(mvtnorm)
samples_fit2 <- as.data.frame(rmvnorm(1000, par_mean_2, par_varcov_2 ) )
summary(exp(samples_fit2))
##      lbeta                lnu      
##  Min.   :0.0000415   Min.   :2.40  
##  1st Qu.:0.0000422   1st Qu.:2.46  
##  Median :0.0000425   Median :2.47  
##  Mean   :0.0000425   Mean   :2.47  
##  3rd Qu.:0.0000427   3rd Qu.:2.49  
##  Max.   :0.0000439   Max.   :2.55
#
hfrMU_2 <- function(par_mean) {
  lbeta <- par_mean[[1]]; lnu <- par_mean[[2]] 
  logH0 <- coef(fit1)[3]; logF0 <- coef(fit1)[4]
  parms <- c(beta=plogis(lbeta), nu=plogis(lnu))
  y0 <- c(H=exp(logH0), F=exp(logF0), R=0)
  out <- ode(y=y0, weeks, hfr, parms)
  out[,4]
}
mu_2 <- hfrMU_2( par_mean_2 )
sigma_2 <- (par_varcov_1[3,3])^0.5
#
samples_fit2 <- as_tibble(samples_fit2) |> 
  mutate(prediction = rnorm( 1000, mu_2, sigma_2) )
#
# HPDI next
# 
hpdi_samples <- samples_fit2 |> filter ( prediction > 5000 ) |>
  hpdi_tbl( prob=0.89)
hpdi_samples
##            |lower 0.89 upper 0.89|
## lbeta        -10.07957   -10.05295
## lnu            0.89401     0.92211
## prediction  6816.60315 14747.05288

That interval for prediction still takes our statistical breath away!

set.seed(4242)
samples_fit2 |> filter( prediction > 5000 ) |>
  ggplot(aes(x = prediction)) +
  geom_histogram(color = "blue", alpha=0.7) +
  geom_vline( xintercept=hpdi_samples["prediction",1], 
              color="red") +
  geom_vline( xintercept=hpdi_samples["prediction",2], 
              color="red") +
  labs(title = "Posterior predictive distribution",
       subtitle = "beta, nu fit: 89% High Posterior Density Interval",
       x = "predicted restored hours",
       y = "frequency of predicted outcomes"
       ) +
  theme(panel.grid = element_blank())
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#

# and replot the fit2 trajectory against the observations
# 
theta2 <- as.numeric(c(plogis(coef(fit2)[1:2]),
                      exp(coef(fit1)[3:4])) ) # fit2 has only two parms 
parms <- c(beta=theta2[1], nu = theta2[2])
times <- seq(0,30,0.1)
y0 <- c(theta2[3],theta2[4],0)
out_HFR2 <- ode(y=y0, times, hfr, parms)
colnames(out_HFR2) <- c("time","H","F","R")
d_plot <- out_HFR2 |> as.data.frame() |> as_tibble()   # drop the deSolve class, as before
#
p <- ggplot( data = d_plot_obs, aes( x=weeks, y=cumplant)) +
  geom_quantile(data = d_plot_obs, method = "rqss", lambda = 0.6, quantiles = c(0.05, 0.95), color = "red", size=1.1) +  # rqss needs the quantreg package installed
  geom_point( data = d_plot_obs, aes( x=weeks, y=cumplant ), color = "red") +
  geom_line( data = d_plot, aes( x=time, y=R), color = "gray30", size=1.5) +
  labs( title="Cumulative Restored Hours Forced Out", 
        subtitle="5%-95% quantile predictive intervals (red)",
        x="Weeks", 
        y="Cumulative Restoration (hours)")
p
## Smoothing formula not specified. Using: y ~ qss(x, lambda = 0.6)

#

That looks pretty tight!

We still miss the top, and for that matter we only skim the quantile interval at the high end. But when does the model predict that we are half-way through the restoration? The answer still seems to lie somewhere between weeks 15 and 20 for both fits. So perhaps we have beaten our model with too large a hammer and already have our answer.
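
A quick way to read that half-way point off the fitted trajectories, sketched under the assumption that out_HFR1 and out_HFR2 from the two fits are still in memory (halfway_week() is just an illustrative helper): find the first week at which the fitted cumulative restoration R crosses half of its final value.

# first week at which fitted cumulative restoration passes half its final value
halfway_week <- function( out ) {
  R_final <- out[ nrow(out), "R" ]
  out[ min( which( out[, "R"] >= R_final/2 ) ), "time" ]
}
c( fit1 = halfway_week( out_HFR1 ), fit2 = halfway_week( out_HFR2 ) )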