Date of Award
Fall 1-2021
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computational and Data Sciences
First Advisor
Erik Linstead
Second Advisor
Elizabeth Stevens
Third Advisor
Ruben Ramirez-Padron
Abstract
The use of machine learning has risen in recent years, though many areas remain unexplored due to a lack of data or computational tools. This dissertation explores machine learning approaches in case studies involving image classification and natural language processing. In addition, it presents a software library in the form of a two-way bridge connecting deep learning models in Keras with those available in the Fortran programming language.
In Chapter 2, we explore the applicability of transfer learning, using models pre-trained on non-software-engineering data, to the problem of classifying software unified modeling language (UML) diagrams, where data is scarce. Our experimental results show that training benefits from transfer learning across sample sizes, even though the pre-trained model was not exposed to training instances from the software domain. We contrast the transferred network with other networks to show its advantage on training sets of different sizes.
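A minimal sketch of the transfer-learning setup described above, assuming TensorFlow/Keras, an ImageNet-pre-trained VGG16 backbone, and a hypothetical two-class UML diagram task; the dissertation's actual architecture, data, and training regime are not reproduced here.

```python
# Illustrative transfer-learning sketch (assumed backbone and task; not the
# dissertation's exact configuration).
import tensorflow as tf
from tensorflow.keras import layers, models

# Load a backbone pre-trained on ImageNet (non-software-engineering data)
# and freeze its convolutional weights.
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False

# Attach a small classification head to be trained on the scarce UML data.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),  # e.g., class diagram vs. sequence diagram
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(uml_train_images, uml_train_labels, epochs=10)  # hypothetical data
```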
Implementing artificial neural networks is commonly achieved via high-level programming languages like Python and easy-to-use deep learning libraries like Keras. These libraries come pre-loaded with a variety of network architectures, provide autodifferentiation, and support GPUs for fast and efficient computation. However, many large-scale scientific computing projects are written in Fortran, making it difficult for them to integrate modern deep learning methods. To alleviate this problem, we introduce a software library, the Fortran-Keras Bridge (FKB), that connects environments where deep learning resources are plentiful with those where they are scarce. Chapter 3 describes several unique features offered by FKB, such as customizable layers, loss functions, and network ensembles.
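The Python side of such a bridge can be sketched as follows. The HDF5 export is standard Keras; FKB's converter and its Fortran-side API are not shown, and the model shape is an illustrative assumption.

```python
# Sketch of the Keras half of a Keras-to-Fortran workflow: build and train a
# dense model in Python, then save it to HDF5 as the artifact a bridge such
# as FKB consumes. (The conversion step and Fortran inference are omitted.)
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(32, activation="relu", input_shape=(8,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# ... train on data available in the Python environment ...
model.save("dense_model.h5")  # hand this file to the bridge's conversion tooling
```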
In Chapter 4, Latent Dirichlet Allocation (LDA) is leveraged to analyze R and MATLAB source code from 10,051 R packages and 27,000 open-source MATLAB modules in order to provide empirical insight into the topic space of scientific computing. This method identifies several generic programming concepts and, more importantly, concepts that are highly specific to scientific and high-performance computing applications. We also directly compare these topics using document entropy and topic uniformity scoring.
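A minimal sketch of LDA topic extraction over source-code tokens, assuming scikit-learn/SciPy and a toy two-document corpus; the dissertation's corpus, preprocessing, and scoring details are not reproduced here.

```python
# Illustrative LDA over tokenized source files (toy corpus; hypothetical
# preprocessing in which each document is one R package or MATLAB module).
from scipy.stats import entropy
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "matrix solve linear system lapack eigenvalue",
    "plot axis legend figure color label",
]

# Bag-of-words counts over identifier-like tokens.
vectorizer = CountVectorizer(token_pattern=r"[A-Za-z_]\w+")
counts = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic distributions

# Top terms per topic give human-readable labels for programming concepts.
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")

# Document entropy: higher values mean a document spreads over many topics.
print(entropy(doc_topics.T))
```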
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Recommended Citation
N. Best, “Applications of machine learning to facilitate software engineering and scientific computing”, Ph.D. dissertation, Chapman University, Orange, CA, 2021. https://doi.org/10.36837/chapman.000223
Included in
Data Science Commons, Numerical Analysis and Scientific Computing Commons, Software Engineering Commons