Student Scholar Symposium Abstracts and Posters
Document Type
Poster
Publication Date
Fall 12-3-2025
Faculty Advisor(s)
James Wimberley
Abstract
Translation of Islamic religious texts poses unique challenges that require both linguistic and theological expertise. This study applies neural machine translation (NMT) models to Arabic-English hadith translation and analyzes semantic similarity patterns across different human translations. Using the complete Sahih Bukhari corpus (7,550 hadiths) as the primary dataset, we take a two-part approach, combining transfer learning for translation with a comparison of neural network architectures for similarity analysis, to demonstrate the critical impact of corpus size on model performance.
First, we fine-tune a pre-trained MarianMT Arabic-English translation model on the Sahih Bukhari corpus, comparing a model trained on 40 hadiths with one trained on all 7,550. Performance is evaluated with BLEU scores, and corpus scale significantly affects translation accuracy: the 40-hadith model achieves a BLEU score of 9.90, while the 7,550-hadith model shows substantial improvement, illustrating that adequate training data is essential for specialized domain adaptation.
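For readers unfamiliar with the metric, the sketch below shows how a corpus-level BLEU score of the kind reported above is computed: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty (the standard Papineni et al. formulation). It assumes whitespace tokenization, one reference translation per hadith, uniform weights and no smoothing; the abstract does not state which BLEU implementation or tokenization was used, so scores from this sketch are not directly comparable to the 9.90 figure.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence, uniform n-gram
    weights and no smoothing. Returns a score on a 0-100 scale."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # candidate n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clip each candidate n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp_tokens) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero precision makes the unsmoothed geometric mean 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"]))
```

Because the geometric mean collapses to zero when any n-gram order has no matches, short or loosely worded hypotheses score very low without smoothing, which is one reason small-data models tend to post low BLEU values.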
Second, we implement and compare ten Siamese neural network architectures to analyze semantic similarity between multiple English translations of the same hadith. These architectures range from simple LSTMs to models incorporating attention mechanisms, bidirectional processing, and transformer encoders. Limited data caused severe overfitting: expanding the training set from 40 to 7,550 hadiths raises validation accuracy from 42% to over 70% and narrows the training-validation gap from 56 to under 15 percentage points. An ensemble combining the top three architectures achieves the best performance.
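The Siamese idea described above can be sketched as follows: both translations pass through one shared encoder (the same weights), and the two resulting vectors are compared by cosine similarity. This is a minimal illustration under loud assumptions: `SharedEncoder` is a hypothetical toy encoder with untrained random token vectors and mean pooling, standing in for the LSTM, attention and transformer encoders named in the abstract, so its scores reflect word overlap only. The abstract also does not say how the top three models were combined; the soft-voting rule in `ensemble_score` (average of cosines mapped to [0, 1]) is an assumption.

```python
import math
import random


class SharedEncoder:
    """Hypothetical toy sentence encoder: one random vector per token,
    mean-pooled. Untrained, so similarity reflects word overlap only."""

    def __init__(self, dim=16, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}  # token -> vector; these are the "shared weights"

    def _embed(self, token):
        if token not in self.table:
            self.table[token] = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        return self.table[token]

    def encode(self, sentence):
        vectors = [self._embed(tok) for tok in sentence.lower().split()]
        if not vectors:
            return [0.0] * self.dim
        # Mean pooling: average each embedding dimension over all tokens.
        return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def siamese_similarity(encoder, text_a, text_b):
    # Both branches call the SAME encoder object: that weight sharing is
    # what makes the network "Siamese".
    return cosine(encoder.encode(text_a), encoder.encode(text_b))


def ensemble_score(encoders, text_a, text_b):
    # Hypothetical soft-voting rule: map each cosine from [-1, 1] to [0, 1]
    # and average across the models.
    scores = [(siamese_similarity(e, text_a, text_b) + 1) / 2 for e in encoders]
    return sum(scores) / len(scores)


encoders = [SharedEncoder(seed=s) for s in (1, 2, 3)]  # stand-ins for the top three
a = "Actions are judged by intentions"
b = "Deeds are judged only by their intentions"
print(round(ensemble_score(encoders, a, b), 3))
```

In a trained version, the shared encoder's weights would be learned from labeled pairs of translations, which is where the overfitting reported with 40 hadiths arises; weight sharing at least guarantees that the similarity is symmetric in its two inputs.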
Our analysis pairs computational metrics with an assessment of theological accuracy, drawing on expertise in Islamic studies to evaluate model performance. Findings indicate that while NMT models achieve reasonable quality on straightforward passages, they struggle to preserve religious nuance and precise Arabic terminology. The results provide quantitative evidence that adequate corpus size is the critical factor for meaningful model generalization. This research contributes to computational religious studies and underscores the irreplaceable role of human expertise in translating sacred texts, with implications for Islamic education and digital humanities scholarship.
Recommended Citation
Speight, Asiyah R., "Bridging Machine Learning and Islamic Scholarship: A Study in Hadith Translation and Similarity Analysis" (2025). Student Scholar Symposium Abstracts and Posters. 779.
https://digitalcommons.chapman.edu/cusrd_abstracts/779
Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Included in
Arabic Language and Literature Commons, Arabic Studies Commons, Artificial Intelligence and Robotics Commons, Categorical Data Analysis Commons, Data Science Commons, Islamic Studies Commons, Islamic World and Near East History Commons, Language Interpretation and Translation Commons, Longitudinal Data Analysis and Time Series Commons, Reading and Language Commons
Comments
Presented at the Fall 2025 Student Scholar Symposium at Chapman University.