Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)


Computational and Data Sciences

First Advisor

Cyril Rakovski

Second Advisor

Mark DeSantis

Third Advisor

Mohamed Allali


This dissertation investigates the forecasting of U.S. equities with two very different time series analysis techniques: 1) autoregressive integrated moving average (ARIMA), and 2) singular spectrum analysis (SSA). Approximately 40% of the S&P 500 stocks are analyzed. Forecasts are generated for one and five days ahead using daily closing prices. Univariate and multivariate structures are applied and their results are compared. One objective is to test the hypothesis that a multivariate model outperforms a univariate configuration. Another objective is to compare the forecasting performance of ARIMA with that of SSA, a relatively recent development that has shown considerable potential.
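As a rough illustration of what a one-day and five-day ARIMA forecast from daily closing prices can look like, the sketch below fits the simplest non-trivial case, ARIMA(1,1,0) without a constant, by conditional least squares. The model order, the function names, and the toy prices are assumptions made for illustration only; the abstract does not state which orders or estimation method the dissertation uses.

```python
def fit_ar1_on_differences(prices):
    """Estimate phi for ARIMA(1,1,0) with no constant (hypothetical helper).

    The series is differenced once, then phi is the least-squares slope
    of each difference on the previous difference.
    """
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    num = sum(x * y for x, y in zip(diffs, diffs[1:]))
    den = sum(x * x for x in diffs[:-1])
    return num / den if den else 0.0


def forecast_prices(prices, phi, horizon):
    """Return price forecasts for 1..horizon days ahead."""
    last_price = prices[-1]
    last_diff = prices[-1] - prices[-2]
    forecasts = []
    for _ in range(horizon):
        # Each forecast difference decays by phi, then is added back
        # to the previous level to undo the differencing.
        last_diff = phi * last_diff
        last_price = last_price + last_diff
        forecasts.append(last_price)
    return forecasts


# Toy closing prices (made up), not data from the dissertation.
closes = [100.0, 101.0, 101.5, 101.75, 101.875]
phi = fit_ar1_on_differences(closes)
path = forecast_prices(closes, phi, horizon=5)
one_day_ahead, five_days_ahead = path[0], path[4]
```

A production study would normally select the model order per stock and include moving-average terms; this sketch only shows the differencing, fitting, and multi-step recursion.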

Stochastic characteristics of stock market data are analyzed and found to be clearly non-Gaussian and better fit by a generalized t-distribution. Probability distribution models are validated with goodness-of-fit tests. For analysis, stock data are segmented into non-overlapping time “windows” to support unconditional statistical evaluation. Univariate and multivariate ARIMA and SSA time series models are evaluated for independence. ARIMA models are found to be independent, whereas SSA models do not reach independence. Statistics for out-of-sample forecasts are computed for every stock in every window, and multivariate-univariate confidence interval shrinkages are examined. Results are compared for univariate, bivariate, and trivariate combinations of highly correlated stocks. Effects are found to be mixed.
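The windowing and normality-checking steps can be sketched as follows. This is a minimal illustration, not the dissertation's procedure: the abstract does not name its goodness-of-fit tests or window length, so the Jarque-Bera statistic is used here as a stand-in check for departure from the Gaussian, and all function names are hypothetical. It does not fit or test a generalized t-distribution.

```python
import math


def non_overlapping_windows(series, size):
    """Split a series into consecutive windows of `size`; drop a short tail."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]


def log_returns(prices):
    """Daily log returns from a list of closing prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def jarque_bera(sample):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4).

    S is sample skewness and K is sample kurtosis. Under normality the
    statistic is asymptotically chi-square with 2 degrees of freedom,
    so values above about 5.991 reject normality at the 5% level.
    """
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


# Synthetic heavy-tailed sample (made up): mostly zeros with two extremes.
heavy_tailed = [0.0] * 98 + [10.0, -10.0]
jb_heavy = jarque_bera(heavy_tailed)
windows = non_overlapping_windows(list(range(10)), 4)
```

In practice each window's returns would be tested separately, and a candidate t-distribution would be fitted and checked with a test suited to a fully specified alternative.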

Bivariate modeling and forecasting with three different covariates are investigated. Examination of results with trading volume, principal component analysis (PCA), and volatility as covariates reveals that PCA exhibits the best overall forecasting accuracy among all configurations investigated, including univariate models. Bivariate-PCA structures are applied in a back-testing environment to evaluate the economic significance and robustness of the methods. Initial back-testing yielded results similar to those from the earlier independent testing. Inconsistent performance across test intervals motivated the development of a second technique that yields improved results and positive economic significance. Robustness is validated through back-testing across multiple market trends.
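One plausible way to build a PCA covariate is to take the first principal component score of a group of stocks' returns and pair it with the target stock in a bivariate model. The abstract does not say how the PCA covariate is constructed, so that construction, the function name, and the toy data below are all assumptions. The sketch uses plain power iteration on the sample covariance matrix.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loading vector, per-observation scores) for the first PC.

    rows: list of observations, each a list of per-stock returns.
    Hypothetical helper using power iteration. The all-ones start vector
    suits positively correlated stocks but can fail if it happens to be
    orthogonal to the leading eigenvector.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Sample covariance matrix (k x k).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Project each centered observation onto the loading vector.
    scores = [sum(c[j] * v[j] for j in range(k)) for c in centered]
    return v, scores


# Toy returns for two perfectly correlated series (made up).
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
loadings, covariate = first_principal_component(toy)
```

The resulting `covariate` series would then serve as the second variable alongside the target stock's series in a bivariate model.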

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.