Date of Award

Summer 8-2020

Document Type


Degree Name

Doctor of Philosophy (PhD)


Computational and Data Sciences

First Advisor

Cyril Rakovski

Second Advisor

Daniel Alpay

Third Advisor

Louis Ehwerhemuepha

Fourth Advisor

Gary Doran


This thesis presents the results of three research projects that underline the breadth and depth of my interests.

Firstly, I devoted some effort to the well-known Box-Pierce goodness-of-fit tests for time series models, which have been an important research topic over the last few decades. All previously proposed tests focus on changes to the test statistic. Instead, I adopted a different approach that takes the best-performing test and modifies its rejection region. Thus, I developed a semiparametric correction of the Adjusted Box-Pierce test that attains the best type I error rates for all sample sizes and lags and outperforms all previous global time series goodness-of-fit approaches.
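For orientation, the classical portmanteau statistics that this line of work builds on can be sketched as follows. This is only an illustration of the textbook Box-Pierce statistic and its well-known Ljung-Box variant, computed from model residuals. It is not the Adjusted Box-Pierce test or the semiparametric correction developed in the thesis, and the function names are hypothetical.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at a positive lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return numer / denom


def box_pierce(residuals, max_lag):
    """Textbook Box-Pierce statistic: n times the sum of squared
    residual autocorrelations at lags 1..max_lag."""
    n = len(residuals)
    return n * sum(
        autocorrelation(residuals, k) ** 2 for k in range(1, max_lag + 1)
    )


def ljung_box(residuals, max_lag):
    """Ljung-Box variant: each squared autocorrelation is weighted
    by (n + 2) / (n - k) before summing."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )
```

In the classical setting, either statistic is compared against a chi-square critical value, which defines the rejection region. The approach summarized above keeps a test statistic fixed and instead adjusts that rejection region.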

Secondly, I aimed to identify novel risk factors significantly associated with 72-hour return visits to emergency departments (EDs). I queried the Cerner® Health Facts Database for data on 185,000 ED visits by patients younger than 18 years in the United States. I built a nested mixed-effects logistic regression model to provide statistical inference on associated risk factors and selected a representative set of machine learning algorithms for the predictive modeling task. New respiratory conditions, including acute bronchiolitis, pneumonia, and asthma, were identified as risk factors for return visits to the ED.

Thirdly, I aimed to design and implement a comprehensive study to identify new clinical and demographic factors associated with prolonged length of stay (more than two weeks) among pediatric patients (aged 18 years and under) in a number of free-standing pediatric and mixed medical facilities. I implemented a mixed-effects model to assess the statistical significance and effect sizes of age, race/ethnicity, number of medications, medical family history, presence of infectious agents (fungi, bacteria, viruses), cancer diagnoses, and other conditions, as well as some clinical variables. A stochastic gradient boosting model was also implemented for prediction. From the mixed-effects model, 11 main effect predictors were found to be statistically significantly associated with an increase in the odds of prolonged length of stay. The area under the receiver operating characteristic curve (AUROC) for the mixed-effects model was 0.887 (0.885, 0.889), and the extreme gradient boosting model attained an AUROC of 0.931 (0.930, 0.933).
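As a reminder of what the AUROC figures measure, the following minimal sketch computes the AUROC as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The function name and the toy data are hypothetical and are not taken from the thesis.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes (1 = prolonged length of stay, say)
    scores: predicted risk scores, one per label
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example with four hypothetical patients.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example)  # 0.75
```

The pairwise loop is quadratic in the number of cases, so it is meant only to make the definition concrete. For data sets of the size described above, a rank-based computation or a library routine would be the practical choice.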

Available for download on Thursday, October 01, 2020