Student Scholar Symposium Abstracts and Posters

Document Type

Poster

Publication Date

Fall 12-3-2025

Faculty Advisor(s)

James Wimberley

Abstract

Translation of Islamic religious texts poses unique challenges requiring both linguistic and theological expertise. This study explores the application of neural machine translation (NMT) models to Arabic-English hadith translation while analyzing semantic similarity patterns across different human translations. Using the complete Sahih Bukhari corpus (7,550 hadiths) as the primary dataset, we adopt a dual approach combining transfer learning and comprehensive neural network analysis to demonstrate the critical impact of corpus size on model performance.

First, we fine-tune a pre-trained MarianMT Arabic-English translation model on Sahih Bukhari, comparing a model trained on 40 hadiths with one trained on the full 7,550-hadith corpus. Performance is evaluated using BLEU scores, which show that corpus scale significantly affects translation accuracy: the 40-hadith model achieves a BLEU score of 9.90, while the 7,550-hadith model shows substantial improvement, illustrating that adequate training data is essential for specialized domain adaptation.
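For readers unfamiliar with the metric, the following is a minimal sketch of how a corpus-level BLEU score on the 0-100 scale can be computed. It assumes whitespace tokenization, one reference per hypothesis, and no smoothing; the abstract does not state which BLEU implementation or settings the study used, and the function names and example sentences here are hypothetical placeholders, not data from the study.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100), one reference per hypothesis.

    Assumptions: whitespace tokenization and no smoothing, so the score is
    0.0 whenever any n-gram order has no matches at all.
    """
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram is credited at most as
            # often as it occurs in the reference.
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            total[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    # Placeholder sentences, not taken from the study's corpus.
    refs = ["the quick brown fox jumps over the lazy dog"]
    hyps = ["the quick brown fox jumps over the dog"]
    print(round(corpus_bleu(hyps, refs), 2))
```

Scores from a simplified routine like this are not directly comparable to published BLEU figures, which depend on tokenization and smoothing choices.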

Second, we implement and compare ten distinct Siamese neural network architectures to analyze semantic similarity between multiple English translations of the same hadith. These architectures range from simple LSTMs to advanced models incorporating attention mechanisms, bidirectional processing, and transformer encoders. Evaluation focuses on the severe overfitting observed with limited data: expanding the training set from 40 to 7,550 hadiths improves validation accuracy from 42% to over 70% and narrows the training-validation gap from 56 to under 15 percentage points. An ensemble model combining the top three architectures achieves the best performance.
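The structural idea behind a Siamese comparison can be sketched without any learned model: the same encoder is applied to both translations, and a similarity is computed on the two resulting representations. The sketch below substitutes trivial n-gram count encoders for the LSTM, attention, and transformer encoders named above, and it combines three encoders by simple score averaging. Both choices are assumptions made for illustration; the abstract does not describe the encoders' details or how the ensemble is combined, and all names and sentences are hypothetical.

```python
import math
from collections import Counter


def char_trigram_encoder(text):
    """Toy stand-in encoder: character trigram counts (not a learned model)."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def word_ngram_encoder(text, n=1):
    """Toy stand-in encoder: word n-gram counts (not a learned model)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def siamese_similarity(text_a, text_b, encoder=char_trigram_encoder):
    # Both branches call the SAME encoder; that sharing is what makes the
    # comparison "Siamese".
    return cosine(encoder(text_a), encoder(text_b))


def ensemble_similarity(text_a, text_b, encoders):
    """Average the similarity scores of several encoders (assumed scheme)."""
    scores = [siamese_similarity(text_a, text_b, enc) for enc in encoders]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Placeholder sentences, not taken from the study's corpus.
    a = "the traveller reached the city before sunset"
    b = "the traveler arrived in the city before the sun set"
    c = "prices rose sharply in the third quarter"
    encoders = [
        char_trigram_encoder,
        word_ngram_encoder,
        lambda t: word_ngram_encoder(t, 2),
    ]
    print(ensemble_similarity(a, b, encoders))
    print(ensemble_similarity(a, c, encoders))
```

In a trained Siamese network the encoder's weights are learned so that translations of the same passage score high and unrelated pairs score low; the count-based encoders here only illustrate the shared-branch structure.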

Our analysis integrates computational metrics with theological accuracy assessment, leveraging expertise in Islamic studies to evaluate model performance. Findings indicate that while NMT models achieve reasonable quality for straightforward passages, they struggle to preserve religious nuance and precise Arabic terminology. Results provide quantitative evidence that adequate corpus size is the critical factor for meaningful model generalization. This research contributes to computational religious studies and underscores the irreplaceable role of human expertise in translating sacred texts, with implications for Islamic education and digital humanities scholarship.

Comments

Presented at the Fall 2025 Student Scholar Symposium at Chapman University.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
