Background: A forecast can be defined as an attempt to quantitatively estimate a future event or the probabilities assigned to a future occurrence. Forecasting stochastic processes such as epidemics is challenging because several biological, behavioral, and environmental factors influence the number of cases observed at each point during an epidemic. Accurate epidemic forecasts would nevertheless support the timely and effective implementation of public health interventions. In this study, we introduce a Dirichlet process (DP) model for classifying and forecasting influenza epidemic curves.

Methods: The DP model is a nonparametric Bayesian approach that matches current influenza activity to simulated and historical patterns, identifies epidemic curves different from those observed in the past, and predicts the expected epidemic peak time. The method was validated using simulated influenza epidemics from an individual-based model, and its accuracy was compared to that of the tree-based classification technique Random Forest (RF), which has been shown to achieve high accuracy in the early prediction of epidemic curves using a classification approach. We also applied the method to forecasting influenza outbreaks in the United States from 1997–2013 using influenza-like illness (ILI) data from the Centers for Disease Control and Prevention (CDC).

Results: We made the following observations. First, the DP model performed as well as RF in identifying several of the simulated epidemics. Second, the DP model correctly forecasted the peak time several days in advance for most of the simulated epidemics. Third, the accuracy of identifying epidemics different from those already observed improved with additional data, as expected. Fourth, both methods classified epidemics with higher reproduction numbers (R) more accurately than epidemics with lower R values. Lastly, in the classification of seasonal influenza epidemics based on ILI data from the CDC, the methods' performance was comparable.

Conclusions: Although RF requires less computational time than the DP model, it is fully supervised, which implies that epidemic curves different from those previously observed will always be misclassified. In contrast, the DP model can be unsupervised, semi-supervised, or fully supervised. Since both methods have their relative merits, an approach that uses both RF and the DP model could be beneficial.
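
To illustrate the contrast drawn in the Conclusions, the following is a minimal sketch of how a Dirichlet-process-style prior can leave room for a previously unseen class, which a fully supervised classifier cannot do. It is not the model used in this study: it performs a single assignment step under a Chinese restaurant process prior with an independent Gaussian likelihood, and the function names, the toy curves, and the parameters `alpha`, `sigma`, and `base_sigma` are all assumptions chosen for illustration.

```python
import math


def log_gauss(x, mean, sigma):
    """Sum of independent Gaussian log densities over the time points."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (xi - mi) ** 2 / (2 * sigma ** 2)
        for xi, mi in zip(x, mean)
    )


def crp_assignment_probs(curve, clusters, alpha=1.0, sigma=5.0, base_sigma=50.0):
    """Probability of assigning `curve` to each known cluster or to a new one.

    clusters: dict mapping a label to a list of equal-length curves.
    The returned dict uses the reserved key "new" for a new cluster.
    All parameter defaults are illustrative assumptions.
    """
    n = sum(len(members) for members in clusters.values())
    log_w = {}

    # Existing clusters: CRP weight n_k / (n + alpha) times the likelihood
    # of the curve around the cluster's mean curve.
    for label, members in clusters.items():
        mean = [sum(vals) / len(members) for vals in zip(*members)]
        log_w[label] = (
            math.log(len(members) / (n + alpha)) + log_gauss(curve, mean, sigma)
        )

    # New cluster: CRP weight alpha / (n + alpha) times a broad base
    # distribution centred on the grand mean of all known curves (assumption).
    all_curves = [c for members in clusters.values() for c in members]
    base_mean = [sum(vals) / len(all_curves) for vals in zip(*all_curves)]
    log_w["new"] = (
        math.log(alpha / (n + alpha)) + log_gauss(curve, base_mean, base_sigma)
    )

    # Normalise with the log-sum-exp trick for numerical stability.
    m = max(log_w.values())
    total = sum(math.exp(v - m) for v in log_w.values())
    return {label: math.exp(v - m) / total for label, v in log_w.items()}


# Toy library of weekly case counts (made-up numbers, not study data).
library = {
    "low": [[1, 2, 3, 2, 1], [1, 3, 4, 2, 1]],
    "high": [[5, 20, 40, 20, 5], [6, 22, 38, 18, 4]],
}

familiar = crp_assignment_probs([5, 21, 39, 19, 5], library)
unfamiliar = crp_assignment_probs([40, 5, 1, 0, 0], library)

print(max(familiar, key=familiar.get))
print(max(unfamiliar, key=unfamiliar.get))
```

In this toy setting, a curve that resembles a known pattern is assigned to that pattern, while a curve with an early peak unlike anything in the library receives most of its probability mass under the "new" option. A supervised classifier trained on the same library would have to choose between "low" and "high".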