Date of Award

Fall 1-2021

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computational and Data Sciences

First Advisor

Erik Linstead

Second Advisor

Elizabeth Stevens

Third Advisor

Ruben Ramirez-Padron

Abstract

The use of machine learning has risen in recent years, though many areas remain unexplored due to a lack of data or computational tools. This dissertation explores machine learning approaches in case studies involving image classification and natural language processing. It also presents a software library in the form of a two-way bridge connecting deep learning models in Keras with ones available in the Fortran programming language.

In Chapter 2, we explore the applicability of transfer learning, using models pre-trained on data from outside software engineering, to the problem of classifying Unified Modeling Language (UML) diagrams, where data is scarce. Our experimental results show that training responds positively to transfer learning relative to sample size, even though the pre-trained model was never exposed to training instances from the software domain. We contrast the transferred network with other networks to show its advantage on training sets of different sizes.

Artificial neural networks are commonly implemented in high-level programming languages like Python with easy-to-use deep learning libraries like Keras. These libraries come pre-loaded with a variety of network architectures, provide automatic differentiation, and support GPUs for fast and efficient computation. Many large-scale scientific computation projects, however, are written in Fortran, which makes them difficult to integrate with modern deep learning methods. To alleviate this problem, we introduce a software library, the Fortran-Keras Bridge (FKB), that connects environments where deep learning resources are plentiful with those where they are scarce. Chapter 3 describes several unique features offered by FKB, such as customizable layers, loss functions, and network ensembles.

In Chapter 4, Latent Dirichlet Allocation (LDA) is used to analyze R and MATLAB source code from 10,051 R packages and 27,000 open-source MATLAB modules in order to provide empirical insight into the topic space of scientific computing. This method identifies several generic programming concepts and, more importantly, concepts that are highly specific to scientific and high-performance computing applications. We are also able to compare these topics directly using document entropy and topic uniformity scoring.
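
As a rough illustration of the document entropy idea mentioned above, the sketch below computes the Shannon entropy of a single document's topic distribution. It assumes that document entropy here means the entropy of the per-document topic weights produced by LDA; the abstract does not give the dissertation's exact definition, and the function name `document_entropy` and the sample distributions are hypothetical.

```python
import math


def document_entropy(topic_weights):
    """Shannon entropy, in bits, of one document's topic distribution.

    Assumption: topic_weights are the non-negative per-topic weights that
    an LDA model assigns to a single document. They are normalized here,
    so they do not need to sum to exactly 1.
    """
    total = sum(topic_weights)
    if total <= 0:
        raise ValueError("topic weights must sum to a positive value")
    entropy = 0.0
    for weight in topic_weights:
        p = weight / total
        if p > 0:  # a zero-probability topic contributes nothing
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical distributions over four topics.
focused = [0.97, 0.01, 0.01, 0.01]  # dominated by a single topic
mixed = [0.25, 0.25, 0.25, 0.25]    # spread evenly over all topics

print(document_entropy(focused))
print(document_entropy(mixed))
```

Under this assumption, a source file dominated by one topic scores near zero, while a file spread evenly across k topics scores log2(k) bits, which is what makes the measure usable for comparing how focused documents are across corpora.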

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Recommended Citation

N. Best, “Applications of machine learning to facilitate software engineering and scientific computing”, Ph.D. dissertation, Chapman University, Orange, CA, 2021. https://doi.org/10.36837/chapman.000223

Included in

Data Science Commons, Numerical Analysis and Scientific Computing Commons, Software Engineering Commons

DOI

https://doi.org/10.36837/chapman.000223

Chapman University Digital Commons

Computational and Data Sciences (PhD) Dissertations

Applications of Machine Learning to Facilitate Software Engineering and Scientific Computing
