Mathematics, Physics, and Computer Science Faculty Articles and Research

From Latent Manifolds to Targeted Molecular Probes: An Interpretable, Kinome-Scale Generative Machine Learning Framework for Family-Based Kinase Ligand Design

Gennady M. Verkhivker, Chapman University
Ryan Kassab, Chapman University
Keerthi Krishnan, Chapman University

Document Type

Article

Publication Date

1-28-2026

Abstract

Scaffold-aware artificial intelligence (AI) models enable systematic exploration of chemical space conditioned on protein-interacting ligands, yet the representational principles governing their behavior remain poorly understood. The computational representation of structurally complex kinase small molecules remains a formidable challenge due to the high conservation of ATP active site architecture across the kinome and the topological complexity of structural scaffolds in current generative AI frameworks. In this study, we present a diagnostic, modular, and chemistry-first generative framework for the design of targeted SRC kinase ligands by integrating ChemVAE-based latent space modeling, a chemically interpretable structural similarity metric (Kinase Likelihood Score), Bayesian optimization, and cluster-guided local neighborhood sampling. Using a comprehensive dataset of protein kinase ligands, we examine scaffold topology, latent-space geometry, and model-driven generative trajectories. We show that chemically distinct scaffolds can converge toward overlapping latent representations, revealing intrinsic degeneracy in scaffold encoding, while specific topological motifs function as organizing anchors that constrain generative diversification. The results demonstrate that kinase scaffolds spanning 37 protein kinase families spontaneously organize into a coherent, low-dimensional manifold in latent space, with SRC-like scaffolds acting as a structural “hub” that enables rational scaffold transformation. Our local sampling approach successfully converts scaffolds from other kinase families (notably LCK) into novel SRC-like chemotypes, with LCK-derived molecules accounting for ~40% of high-similarity outputs.
However, both generative strategies reveal a critical limitation: SMILES-based representations systematically fail to recover multi-ring aromatic systems, a topological hallmark of kinase chemotypes, despite ring count being a top feature in our structural similarity metric. This “representation gap” demonstrates that no amount of scoring refinement can compensate for a generative engine that cannot access topologically constrained regions. By diagnosing these constraints within a transparent pipeline and reframing scaffold-aware ligand design as a problem of molecular representation, our work provides a conceptual framework for interpreting generative model behavior and for guiding the incorporation of structural priors into future molecular AI architectures.
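
As a rough illustration of what local neighborhood sampling in a latent space can look like, the toy sketch below perturbs one seed latent vector with Gaussian noise and keeps only the candidates that land closer to the centroid of a target cluster. It is not the authors' implementation: the function names, the Gaussian noise model, the `sigma` value, and the two-dimensional vectors are all assumptions made for illustration, and the step that decodes latent vectors back to SMILES with a trained model is omitted.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length latent vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def distance(a, b):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_toward_cluster(seed, target_cluster, n_samples=200, sigma=0.5, rng=None):
    """Hypothetical helper: sample around `seed` and keep candidates that
    move closer to the centroid of `target_cluster` than the seed itself.

    Returns the kept candidates sorted by distance to the target centroid.
    """
    rng = rng or random.Random(0)
    target = centroid(target_cluster)
    seed_dist = distance(seed, target)
    kept = []
    for _ in range(n_samples):
        # Isotropic Gaussian perturbation of the seed (an assumed noise model).
        candidate = [x + rng.gauss(0.0, sigma) for x in seed]
        if distance(candidate, target) < seed_dist:
            kept.append(candidate)
    kept.sort(key=lambda z: distance(z, target))
    return kept


# Toy 2-D stand-ins for latent encodings (made-up numbers, not real data).
src_like_cluster = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]]
lck_seed = [0.0, 0.0]
candidates = sample_toward_cluster(lck_seed, src_like_cluster, n_samples=100)
print(len(candidates), "candidates moved toward the target cluster")
```

In a real pipeline each kept vector would still have to be decoded and checked for chemical validity. The abstract's point is that this decoding step is where SMILES-based models lose multi-ring aromatic systems, so filtering in latent space alone cannot close that gap.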

Comments

This article was originally published in Biomolecules, volume 16, issue 2, in 2026. https://doi.org/10.3390/biom16020209

Recommended Citation

Verkhivker, G.; Kassab, R.; Krishnan, K. From Latent Manifolds to Targeted Molecular Probes: An Interpretable, Kinome-Scale Generative Machine Learning Framework for Family-Based Kinase Ligand Design. Biomolecules 2026, 16, 209. https://doi.org/10.3390/biom16020209

biomolecules-16-00209-s001.zip (46644 kB)
Supplementary Materials

Peer Reviewed

Copyright

The authors

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Included in

Amino Acids, Peptides, and Proteins Commons, Artificial Intelligence and Robotics Commons, Medicinal-Pharmaceutical Chemistry Commons

Chapman University Digital Commons