Date of Award
Doctor of Philosophy (PhD)
Computational and Data Sciences
Dr. Cyril Rakovski
Dr. Vincent Berardi
Dr. Adrian Vajiac
Baseball has quickly become one of the most analyzed sports, with significant growth in the last 20 years. An enormous amount of data is collected every game, and professional teams need a state-of-the-art analytics team in order to compete in today's game. Statcast, introduced in 2015, "allows for the collection and analysis of a massive amount of baseball data, in ways that were never possible in the past". Using this new Statcast data, which is updated every pitch, a novel metric, Pitcher Effectiveness, was developed that is updated dynamically throughout a game. In combination with its rate of change, the metric was shown to be predictive of runs, and it was effective in evaluating a starting pitcher both at the game level and overall. Baseball can be broken down into a Markov chain with 24 different states based on the combination of outs and baserunners; throughout the game, teams transition from one base/out state to another when events such as hits, outs, and walks occur. Using this idea, pitch sequencing was explored at the micro level of each state individually. Looking at the last three pitches in a sequence, certain sequences in particular states were shown to have some power in predicting outs, hits, and strikeouts. In addition, proportion tests showed significant differences in the proportions of outs and strikeouts for sequences depending on the base/out state. From fantasy baseball to Major League Baseball (MLB) front offices, projections of players’ future performance are important and are explored quite often. Several machine learning methods were explored for projecting future weighted on-base average (wOBA). These methods were evaluated, and the best were compared to 2020 projections from the reputable Steamer projection system.
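The 24 base/out states mentioned in the abstract can be enumerated directly: three possible out counts (0, 1, 2) times eight baserunner configurations. The following is a minimal sketch of that state space only, not the dissertation's implementation; the names `STATES`, `state_index`, and `describe` are hypothetical and chosen here for illustration.

```python
from itertools import product

# Each base is either empty (0) or occupied (1): (first, second, third).
BASE_CONFIGS = list(product((0, 1), repeat=3))  # 8 baserunner configurations
OUTS = (0, 1, 2)                                # outs recorded so far in the half-inning

# A state is (outs, (first, second, third)); 3 * 8 = 24 states in total.
STATES = [(outs, bases) for outs in OUTS for bases in BASE_CONFIGS]


def state_index(outs, bases):
    """Map a base/out state to its position in STATES (hypothetical helper).

    The index could serve as a row or column of a 24 x 24 transition matrix.
    """
    first, second, third = bases
    return outs * 8 + first * 4 + second * 2 + third


def describe(outs, bases):
    """Return a short human-readable label for a state (hypothetical helper)."""
    names = ("1B", "2B", "3B")
    occupied = [name for name, on in zip(names, bases) if on]
    runners = ", ".join(occupied) if occupied else "bases empty"
    return f"{outs} out, {runners}"


if __name__ == "__main__":
    print(len(STATES))  # number of base/out states
    # A leadoff single moves the half-inning from "0 out, bases empty"
    # to "0 out, runner on first".
    before = (0, (0, 0, 0))
    after = (0, (1, 0, 0))
    print(describe(*before), "->", describe(*after))
    print(state_index(*before), "->", state_index(*after))
```

An event such as a hit, out, or walk then corresponds to a move from one index to another, and counting those moves over many plate appearances would give empirical transition frequencies between states.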
C. Watkins, "Novel statistical and machine learning methods for the forecasting and analysis of Major League Baseball player performance," Ph.D. dissertation, Chapman University, Orange, CA, 2020. https://doi.org/10.36837/chapman.000139