Document Type
Article
Publication Date
5-6-2026
Abstract
Computational prediction of allosteric binding sites in protein structures remains a persistent challenge, as these regulatory pockets evade detection by both sequence-based and structure-based algorithms. The computational and physical origins of this predictive asymmetry remain insufficiently understood. In this study, we systematically examine the determinants of binding site predictability using a dual framework that integrates a fine-tuned protein language model and the structure-based method P2Rank as complementary tools probing a diverse data set of 453 human kinases, together with a physics-based interpretability layer derived from energy landscape frustration analysis. Both predictors exhibit a sharp and reproducible dichotomy on protein kinases, in which orthosteric ATP-binding sites can be identified with high precision, whereas allosteric sites are detected with substantially lower confidence across kinase structures and distinct conformational states. To decode this divergence, we deploy an energy landscape-based explainable AI approach that integrates local frustration analysis as an independent physical interpretability layer, mapping predictive behavior to the underlying energetic organization of protein structures. This analysis reveals that predictive success is governed by the local energetic embedding of binding sites within the protein energy landscape. Orthosteric pockets are located in minimally frustrated basins that generate strong evolutionary and structural signatures, whereas allosteric pockets occupy predominantly neutrally frustrated zones associated with conformational plasticity and reduced evolutionary constraint. By integrating the prediction results with energy landscape analysis, our framework converts predictive performance into physically interpretable descriptors of binding site organization in protein kinases.
These results establish energy landscape frustration as a potentially important determinant of algorithmic visibility and as an interpretability layer that provides a feasible strategy for diagnosing the limits of current prediction methods.
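As a rough illustration of the comparison the abstract describes, the sketch below tallies what fraction of a pocket's residues fall into each local frustration class. It is not the authors' pipeline: the function names are hypothetical, the cutoffs (above 0.78 for minimally frustrated, below -1.0 for highly frustrated) are the conventional frustration-index thresholds and are assumed here rather than taken from the abstract, and the input values are made-up toy numbers, not results.

```python
from collections import Counter

# Conventional local frustration index cutoffs (an assumption; the abstract
# does not state which thresholds were used).
MINIMAL_CUTOFF = 0.78
HIGH_CUTOFF = -1.0

LABELS = ("minimal", "neutral", "high")


def classify(index):
    """Map one frustration index value to a frustration class label."""
    if index > MINIMAL_CUTOFF:
        return "minimal"
    if index < HIGH_CUTOFF:
        return "high"
    return "neutral"


def pocket_profile(indices):
    """Return the fraction of pocket residues in each frustration class."""
    if not indices:
        raise ValueError("pocket has no residues")
    counts = Counter(classify(value) for value in indices)
    return {label: counts[label] / len(indices) for label in LABELS}


# Toy per-residue frustration indices, invented purely for illustration.
orthosteric_pocket = [1.2, 0.9, 1.5, 0.3]
allosteric_pocket = [0.1, -0.4, 0.5, -1.3]

orthosteric_profile = pocket_profile(orthosteric_pocket)
allosteric_profile = pocket_profile(allosteric_pocket)
```

Under these assumptions, a pocket dominated by the "minimal" class would correspond to the minimally frustrated basins the abstract associates with orthosteric sites, while a pocket dominated by "neutral" would correspond to the neutrally frustrated zones it associates with allosteric sites.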
Recommended Citation
Riedlová, K.; Škrhák, V.; Gatlin, W. G.; Ludwick, M.; Turano, L.; Novotný, M.; Hoksza, D.; Verkhivker, G. M. Predicting and Decoding Allosteric Binding Sites Using Protein Language Models and Structure-Based Machine Learning: An Energy Landscape-Guided Explainable AI Framework. J. Chem. Theory Comput. 2026. https://doi.org/10.1021/acs.jctc.6c00427
Supporting Information
Peer Reviewed
1
Copyright
The authors
Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Included in
Amino Acids, Peptides, and Proteins Commons, Artificial Intelligence and Robotics Commons, Medicinal-Pharmaceutical Chemistry Commons, Molecular Biology Commons, Other Computer Sciences Commons
Comments
This article was originally published in Journal, volume number, issue number, in year. https://doi.org/