Quantitative Methods
- [1] arXiv:2405.20523
Title: Systems-level health of patients living with end-stage kidney disease using standard lab values
Comments: 50 pages (15 for main, 35 supplemental), 10 figures in main
Subjects: Quantitative Methods (q-bio.QM)
We present a systems-level study of end-stage kidney disease (ESKD) based on a dynamical network analysis of 14 commonly measured blood-based biomarkers in patients undergoing regular haemodialysis. Using a validated pipeline for declining homeostatic systems, our approach learns a dynamical model together with an invertible transformation that maps the observed biomarkers onto natural variables with simpler behaviour. Within the natural variables, we identified two distinct dynamical behaviours: (i) stochastic accumulation, the random accumulation of abnormal values, and (ii) mallostasis, a deterministic drift towards worse health. These behaviours are identified, respectively, by persistent fluctuations indicating weak stability and by a gradual shift in the homeostatic set point. Both lead to worsening natural variable values, which makes the natural variables salient survival predictors, each with a preferred direction of increasing risk. When this worsening is transformed back into the observed biomarkers, it produces a coherent spectrum of worsening medical signs characteristic of a medical syndrome. Specifically, small modules of natural variables corresponded to two existing syndromes that commonly afflict ESKD patients: protein-energy wasting and sepsis. We also identified new prospective syndromes. Our findings suggest that natural variables are robust, systems-level biomarkers that capture the complex, holistic changes in health associated with ESKD.
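To make the two behaviours concrete, here is a minimal sketch, assuming a single hypothetical natural variable z that follows a linear stochastic differential equation (an Ornstein-Uhlenbeck process with a moving set point), simulated with the Euler-Maruyama scheme. It is not the learned model from the paper; the names `simulate_natural_variable`, `stiffness`, `noise` and `setpoint_drift` are illustrative. Weak stability (small `stiffness`) lets random fluctuations accumulate, while a nonzero `setpoint_drift` reproduces a deterministic, mallostasis-like drift.

```python
import math
import random


def simulate_natural_variable(
    stiffness: float,
    noise: float,
    setpoint_drift: float,
    steps: int = 1000,
    dt: float = 0.01,
    seed: int = 0,
) -> list[float]:
    """Euler-Maruyama simulation of a hypothetical 1-D natural variable z.

    dz = -stiffness * (z - s(t)) dt + noise * dW, with set point s(t) = setpoint_drift * t.
    Small `stiffness` (weak stability) lets noise accumulate (stochastic accumulation);
    nonzero `setpoint_drift` moves the set point itself (mallostasis-like drift).
    """
    rng = random.Random(seed)
    z, t = 0.0, 0.0
    path = [z]
    for _ in range(steps):
        setpoint = setpoint_drift * t
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        z += -stiffness * (z - setpoint) * dt + noise * dw
        t += dt
        path.append(z)
    return path
```

With the same seed, `stiffness=0.05` produces far larger excursions from the set point than `stiffness=5.0`, while `noise=0.0` with a positive `setpoint_drift` gives a smooth trajectory that lags behind the moving set point.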
- [2] arXiv:2405.20747
Title: Generalized Inverse Optimal Control and its Application in Biology
Subjects: Quantitative Methods (q-bio.QM); Optimization and Control (math.OC)
Living organisms exhibit remarkable adaptations across all scales, from molecules to ecosystems. We believe that many of these adaptations correspond to optimal solutions shaped by evolution, training, and underlying physical and chemical laws and constraints. While some argue against such optimality principles because of their potential ambiguity, we propose generalized inverse optimal control to infer them directly from data. This novel approach covers several important special cases: multi-criteria optimality, nested objective functions on different scales, active constraints, switches between optimality principles during the observed time horizon, maximization of robustness, and minimization of time. It also accounts for the uncertainties involved in the mathematical modeling of biological systems. Because it is data-driven, the approach ensures that optimality principles are not merely theoretical constructs but are firmly rooted in experimental observations. Furthermore, the inferred principles can be used in forward optimal control to predict and manipulate biological systems, with possible applications in biomedicine, biotechnology, and agriculture. As we discuss and illustrate, formulating the problem in a well-posed way and performing the inference are both challenging and require a substantial interdisciplinary effort in developing theory and robust numerical methods.
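As a toy illustration of the inverse idea, assume a hypothetical one-dimensional multi-criteria objective $J_w(x) = w(x-a)^2 + (1-w)(x-b)^2$, whose forward optimum is $x^* = wa + (1-w)b$. Given observed optima under several contexts $(a, b)$, the weight $w$ can be recovered by least squares. This sketch omits everything that makes the generalized problem hard (constraints, nested scales, switching principles, model uncertainty), and the function names are hypothetical.

```python
def forward_optimum(w: float, a: float, b: float) -> float:
    """Minimizer of J_w(x) = w*(x - a)**2 + (1 - w)*(x - b)**2, from dJ/dx = 0."""
    return w * a + (1 - w) * b


def infer_weight(observations: list[tuple[float, float, float]]) -> float:
    """Least-squares estimate of w from observed optima.

    Each observation is (a, b, x_observed). Since x - b = w * (a - b) at the optimum,
    w minimizes sum((x - b - w * (a - b))**2), which has a closed-form solution.
    """
    num = sum((x - b) * (a - b) for a, b, x in observations)
    den = sum((a - b) ** 2 for a, b, _ in observations)
    if den == 0:
        # The two criteria never disagree, so every w explains the data equally well.
        raise ValueError("criteria never disagree; w is not identifiable")
    return min(1.0, max(0.0, num / den))  # keep w in the admissible range [0, 1]
```

The `ValueError` branch shows the ambiguity concern in miniature: without contexts in which the criteria pull in different directions, the data cannot single out one optimality principle.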
New submissions for Monday, 3 June 2024 (showing 2 of 2 entries)
- [3] arXiv:2405.20358 (cross-list from cs.LG)
Title: Medication Recommendation via Dual Molecular Modalities and Multi-Substructure Distillation
Comments: 14 pages, 9 figures
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Medication recommendation combines a patient's medical history with biomedical knowledge to help doctors choose medication combinations more accurately and safely. Existing approaches based on molecular knowledge overlook the atomic geometric structure of molecules and therefore fail to capture the high-dimensional characteristics and intrinsic physical properties of medications; this leads to structural confusion and an inability to extract useful substructures from individual patient visits. To address these limitations, we propose BiMoRec, which compensates for the essential molecular information missing from 2D molecular structures by incorporating 3D molecular structures and atomic properties. To retain the fast response required of recommendation systems, BiMoRec integrates 2D and 3D molecular graphs by maximizing the mutual information between the two molecular modalities through bimodal graph contrastive learning, and then distills substructures through interaction with single patient visits. Specifically, we use deep learning networks to build a pre-training method that yields representations of 2D and 3D molecular structures and substructures, and we use contrastive learning to derive mutual information. We then generate fused molecular representations with a trained GNN module and reassess the relevance of substructure representations in light of the patient's clinical history. Finally, we generate the medication combination from the extracted substructure sequences. Experiments on the MIMIC-III and MIMIC-IV datasets show that our method achieves state-of-the-art performance. Compared with the next-best baseline, our model improves accuracy by 1.8% while keeping the drug-drug interaction (DDI) rate at the same level.
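BiMoRec's exact objective is not given here; a common way to maximize mutual information between two views is the InfoNCE contrastive loss, in which the 2D and 3D embeddings of the same molecule form a positive pair and the other molecules in the batch act as negatives. The sketch below assumes precomputed embeddings as plain lists of floats; the function names and the temperature value are illustrative. For a batch of $N$ pairs, InfoNCE satisfies $I(\text{2D}; \text{3D}) \ge \log N - \mathcal{L}$, so lowering the loss raises a lower bound on the mutual information.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(z2d: list[list[float]], z3d: list[list[float]], temperature: float = 0.1) -> float:
    """InfoNCE loss: z2d[i] and z3d[i] embed the same molecule (positive pair),
    and every z3d[j] with j != i serves as a negative for z2d[i]."""
    n = len(z2d)
    total = 0.0
    for i in range(n):
        logits = [cosine(z2d[i], z3d[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]  # equals -log(softmax(logits)[i])
    return total / n
```

Correctly paired views give a loss near zero, while mismatched pairs are heavily penalized, which is what pushes the two encoders toward a shared representation.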
- [4] arXiv:2405.20573 (cross-list from cs.LG)
Title: Enhancing Generative Molecular Design via Uncertainty-guided Fine-tuning of Variational Autoencoders
Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
In recent years, deep generative models have been successfully adopted for various molecular design tasks, particularly in the life and material sciences. A critical challenge for pre-trained generative molecular design (GMD) models is fine-tuning them for downstream design tasks that aim to optimize specific molecular properties. Redesigning and retraining an existing effective generative model from scratch for each new design task is impractical. Furthermore, the black-box nature of typical downstream tasks, such as property prediction, makes it nontrivial to optimize the generative model in a task-specific manner. In this work, we propose a novel approach for model-uncertainty-guided fine-tuning of a pre-trained variational autoencoder (VAE)-based GMD model through performance feedback in an active learning setting. The main idea is to quantify model uncertainty in the generative model, which is made efficient by working within a low-dimensional active subspace of the high-dimensional VAE parameters that explains most of the variability in the model's output. Including model uncertainty expands the space of viable molecules through decoder diversity. We then explore the resulting model uncertainty class via black-box optimization, made tractable by the low dimensionality of the active subspace. This enables us to identify and leverage a diverse set of high-performing models to generate enhanced molecules. Empirical results across six target molecular properties, using multiple VAE-based generative models, demonstrate that our uncertainty-guided fine-tuning approach consistently outperforms the original pre-trained models.
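An active subspace is spanned by the leading eigenvectors of $C = \frac{1}{M}\sum_{m=1}^{M} g_m g_m^\top$, where the $g_m$ are gradients of a model output with respect to the parameters. The sketch below, assuming such gradients are already available as lists of floats, finds the leading direction by power iteration without ever forming $C$. The function name is hypothetical, and at the scale of real VAE parameters one would use a library eigensolver or a randomized method instead.

```python
import math
import random


def leading_active_direction(
    gradients: list[list[float]], iters: int = 100, seed: int = 0
) -> list[float]:
    """Leading eigenvector of C = (1/M) * sum_m g_m g_m^T, found by power iteration."""
    dim, m = len(gradients[0]), len(gradients)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random start vector
    for _ in range(iters):
        # C v = (1/M) * sum_m g_m * (g_m . v), so C is never built explicitly
        cv = [0.0] * dim
        for g in gradients:
            proj = sum(gi * vi for gi, vi in zip(g, v))
            for k in range(dim):
                cv[k] += g[k] * proj / m
        norm = math.sqrt(sum(c * c for c in cv))
        if norm == 0.0:
            raise ValueError("gradients carry no variability along the current direction")
        v = [c / norm for c in cv]  # renormalize to unit length
    return v
```

Projecting parameter perturbations onto the returned direction (or onto the top few directions, found by deflation) yields the low-dimensional coordinates in which a black-box optimizer could then search.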
- [5] arXiv:2405.20668 (cross-list from q-bio.BM)
Title: Improving Paratope and Epitope Prediction by Multi-Modal Contrastive Learning and Interaction Informativeness Estimation
Comments: This paper is accepted by IJCAI 2024
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Accurately predicting antibody-antigen binding residues, i.e., paratopes and epitopes, is crucial in antibody design. However, existing methods focus solely on uni-modal data (either sequence or structure), disregarding the complementary information present in multi-modal data, and most methods predict paratopes and epitopes separately, overlooking their specific spatial interactions. In this paper, we propose MIPE, a novel method for Paratope and Epitope prediction based on Multi-modal contrastive learning and Interaction informativeness estimation, which uses both sequence and structure data of antibodies and antigens. MIPE implements a multi-modal contrastive learning strategy that maximizes the distinction between representations of binding and non-binding residues within each modality while aligning uni-modal representations towards effective modal representations. To exploit spatial interaction information, MIPE also incorporates an interaction informativeness estimation that computes estimated interaction matrices between antibodies and antigens and drives them towards the actual ones. Extensive experiments show that our method outperforms the baselines. Ablation studies and visualizations further indicate that this advantage stems from the better representations acquired through multi-modal contrastive learning and the interaction patterns captured by the interaction informativeness estimation.
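One plausible way to realize an estimated interaction matrix, assumed here purely for illustration, is to score every antibody-antigen residue pair with a sigmoid of the dot product of their embeddings and to penalize disagreement with the known contact map using binary cross-entropy. The embeddings, function names and loss are hypothetical; MIPE's actual formulation may differ.

```python
import math


def interaction_matrix(ab_emb: list[list[float]], ag_emb: list[list[float]]) -> list[list[float]]:
    """Estimated probability that antibody residue i interacts with antigen residue j."""
    return [
        [1.0 / (1.0 + math.exp(-sum(x * y for x, y in zip(h, g)))) for g in ag_emb]
        for h in ab_emb
    ]


def interaction_bce(pred: list[list[float]], actual: list[list[int]], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between estimated and actual (0/1) contact matrices."""
    total, count = 0.0, 0
    for pred_row, actual_row in zip(pred, actual):
        for p, y in zip(pred_row, actual_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count
```

Minimizing this loss pulls the embeddings of contacting residue pairs together in dot-product space, so the estimated matrix approaches the true contact map.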
Cross submissions for Monday, 3 June 2024 (showing 3 of 3 entries)
- [6] arXiv:2402.17810 (replaced)
Title: BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning
Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, Rui Yan
Comments: Accepted by ACL 2024 (Findings)
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Biomolecules (q-bio.BM)
Recent research in computational biology has increasingly focused on integrating text and bio-entity modeling, especially for molecules and proteins. However, previous efforts such as BioT5 struggled to generalize across diverse tasks and lacked a nuanced understanding of molecular structures, particularly in their textual representations (e.g., IUPAC names). This paper introduces BioT5+, an extension of the BioT5 framework tailored to enhance biological research and drug discovery. BioT5+ incorporates several novel features: integration of IUPAC names for molecular understanding, inclusion of extensive bio-text and molecule data from sources such as bioRxiv and PubChem, multi-task instruction tuning for generality across tasks, and a numerical tokenization technique for improved processing of numerical data. These enhancements allow BioT5+ to bridge the gap between molecular representations and their textual descriptions, providing a more holistic understanding of biological entities and substantially improving grounded reasoning over bio-text and bio-sequences. The model is pre-trained and fine-tuned, then evaluated in a large number of experiments covering 3 types of problems (classification, regression, generation), 15 kinds of tasks, and 21 benchmark datasets in total, achieving remarkable performance and state-of-the-art results in most cases. BioT5+ stands out for its ability to capture intricate relationships in biological data, thereby contributing significantly to bioinformatics and computational biology. Our code is available at this https URL.
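The abstract does not spell out the numerical tokenization technique. One common scheme, assumed here, is to split every number into single-digit tokens so that the vocabulary stays small and values such as 3.52 and 3.25 share the same tokens. The regular expression below is a hypothetical minimal tokenizer, not BioT5+'s implementation.

```python
import re

# Alternatives, tried left to right at each position:
#   \d          -> a single digit (numbers are split digit by digit)
#   [^\W\d_]+   -> a run of letters
#   [^\w\s]     -> a single punctuation or symbol character, e.g. "." or "="
# Whitespace and underscores match none of them and are therefore skipped.
_TOKEN = re.compile(r"\d|[^\W\d_]+|[^\w\s]")


def tokenize_with_digits(text: str) -> list[str]:
    """Hypothetical tokenizer that splits numbers into single-digit tokens."""
    return _TOKEN.findall(text)
```

For example, `tokenize_with_digits("logP is 3.52")` yields `["logP", "is", "3", ".", "5", "2"]`. A trade-off shows up with identifiers such as `pIC50`, whose digits are split off as well; a production tokenizer would need extra rules for such names.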
- [7] arXiv:2402.14991 (replaced)
Title: Quantum Theory and Application of Contextual Optimal Transport
Authors: Nicola Mariella, Albert Akhriev, Francesco Tacchino, Christa Zoufal, Juan Carlos Gonzalez-Espitia, Benedek Harsanyi, Eugene Koskin, Ivano Tavernelli, Stefan Woerner, Marianna Rapsomaniki, Sergiy Zhuk, Jannis Born
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET); Quantum Algebra (math.QA); Quantitative Methods (q-bio.QM); Quantum Physics (quant-ph)
Optimal Transport (OT) has fueled machine learning (ML) across many domains. When paired data measurements $(\boldsymbol{\mu}, \boldsymbol{\nu})$ are coupled to covariates, a challenging conditional distribution learning setting arises. Existing approaches for learning a global transport map parameterized through a potentially unseen context use Neural OT and largely rely on Brenier's theorem. Here, we propose a first-of-its-kind quantum computing formulation for amortized optimization of contextualized transportation plans. We exploit a direct link between doubly stochastic matrices and unitary operators, thereby revealing a natural connection between OT and quantum computation. We verify our method (QontOT) on synthetic and real data by predicting variations in cell type distributions conditioned on drug dosage. Importantly, we conduct a 24-qubit hardware experiment on a task that is challenging for classical computers and report a performance that our classical neural OT approach cannot match. In sum, this is a first step toward learning to predict contextualized transportation plans through quantum computing.
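The link between unitaries and doubly stochastic matrices can be checked in a few lines: if $U$ is unitary, the matrix $B_{ij} = |U_{ij}|^2$ has every row and every column summing to 1, because the rows and columns of $U$ are orthonormal. The sketch below (function names hypothetical) builds such a unistochastic matrix from the discrete Fourier transform matrix. It illustrates the mathematical link only and is not QontOT's circuit or training procedure; note also that for $n \ge 3$ not every doubly stochastic matrix arises this way.

```python
import cmath
import math


def is_unitary(u: list[list[complex]], tol: float = 1e-9) -> bool:
    """Check U^dagger U = I entry by entry."""
    n = len(u)
    for i in range(n):
        for j in range(n):
            # (U^dagger U)_{ij} = sum_k conj(U_ki) * U_kj
            entry = sum(u[k][i].conjugate() * u[k][j] for k in range(n))
            if abs(entry - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


def unistochastic(u: list[list[complex]]) -> list[list[float]]:
    """Map a unitary U to the doubly stochastic matrix B_ij = |U_ij|^2."""
    if not is_unitary(u):
        raise ValueError("input is not unitary")
    return [[abs(x) ** 2 for x in row] for row in u]


def dft_matrix(n: int) -> list[list[complex]]:
    """Normalized discrete Fourier transform matrix, a standard example of a unitary."""
    return [[cmath.exp(2j * math.pi * j * k / n) / math.sqrt(n) for k in range(n)] for j in range(n)]
```

For the DFT matrix every entry of $B$ equals $1/n$, the uniform transport plan; other unitaries yield non-uniform plans with the same row and column sums.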