
192,600 research outputs found

Machine Learning and Its Applications to Biology

Author: Adi L Tarca
Fran Lewitter
Roberto Romero
Sorin Drăghici
Vincent J Carey
Xue-wen Chen
Publication venue: Public Library of Science
Publication date: 01/06/2007

Crossref

Directory of Open Access Journals

KU ScholarWorks

PubMed Central

Anderson Acceleration For Bioinformatics-Based Machine Learning

Author: Ali Sarwan
Chourasia Prakash
Patterson Murray
Publication date: 24/08/2023

Anderson acceleration (AA) is a well-known method for accelerating the convergence of iterative algorithms, with applications in various fields including deep learning and optimization. Despite its popularity in these areas, the effectiveness of AA in classical machine learning classifiers has not been thoroughly studied. Tabular data, in particular, presents a unique challenge for deep learning models, and classical machine learning models are known to perform better in these scenarios. However, the convergence analysis of these models has received limited attention. To address this gap in research, we implement a support vector machine (SVM) classifier variant that incorporates AA to speed up convergence. We evaluate the performance of our SVM with and without Anderson acceleration on several datasets from the biology domain and demonstrate that the use of AA significantly improves convergence and reduces the training loss as the number of iterations increases. Our findings provide a promising perspective on the potential of Anderson acceleration in the training of simple machine learning classifiers and underscore the importance of further research in this area. By showing the effectiveness of AA in this setting, we aim to inspire more studies that explore the applications of AA in classical machine learning.

Comment: Accepted in KDH-2023: Knowledge Discovery in Healthcare Data (IJCAI Workshop)

arXiv.org e-Print Archive
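
To make the idea of Anderson acceleration in the abstract above concrete, here is a minimal sketch of the generic technique applied to a fixed-point iteration x = g(x). It is not the authors' SVM variant: the function name `anderson_fixed_point`, the window size `m`, the tolerance, and the small linear test map are all assumptions chosen for illustration.

```python
import numpy as np

def anderson_fixed_point(g, x0, m=5, max_iter=1000, tol=1e-8):
    """Anderson-accelerated iteration for x = g(x).

    Returns (x, number of evaluations of g). With m=0 this reduces
    to the plain iteration x <- g(x).
    """
    x = np.asarray(x0, dtype=float)
    G_hist, F_hist = [], []              # recent g(x) values and residuals
    for k in range(1, max_iter + 1):
        gx = g(x)
        f = gx - x                       # fixed-point residual at x
        if np.linalg.norm(f) < tol:
            return x, k
        G_hist.append(gx)
        F_hist.append(f)
        # Keep at most m + 1 entries, i.e. at most m difference columns.
        G_hist, F_hist = G_hist[-(m + 1):], F_hist[-(m + 1):]
        if len(F_hist) == 1:
            x = gx                       # no history yet: plain step
            continue
        dF = np.diff(np.stack(F_hist, axis=1), axis=1)
        dG = np.diff(np.stack(G_hist, axis=1), axis=1)
        # Least-squares mixing coefficients: minimize ||f - dF @ gamma||.
        gamma = np.linalg.lstsq(dF, f, rcond=None)[0]
        x = gx - dG @ gamma
    return x, max_iter

# Hypothetical test problem: a slowly contracting linear map g(x) = a * x + b.
a = np.array([0.95, 0.5, 0.2, -0.3])
b = np.ones(4)
g = lambda x: a * x + b

x_plain, n_plain = anderson_fixed_point(g, np.zeros(4), m=0)
x_aa, n_aa = anderson_fixed_point(g, np.zeros(4), m=5)
print(f"plain iteration: {n_plain} evaluations, Anderson: {n_aa} evaluations")
```

On this toy map the accelerated run should need far fewer evaluations of `g` than the plain one, because the slowest component contracts by only 0.95 per plain step. How much of that benefit carries over to a real classifier's training loop depends on the problem.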

Distributed Machine Learning via Sufficient Factor Broadcasting

Author: Ho Qirong
Kim Jin Kyu
Kumar Abhimanu
Xie Pengtao
Xing Eric
Yu Yaoliang
Zhou Yi
Publication date: 07/09/2015

Matrix-parameterized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML problems starting at millions of samples and tens of thousands of classes, their parameter matrix can grow at an unexpected rate, resulting in high parameter synchronization costs that greatly slow down distributed learning. To address this issue, we propose a Sufficient Factor Broadcasting (SFB) computation model for efficient distributed learning of a large family of matrix-parameterized models, which share the following property: the parameter update computed on each data sample is a rank-1 matrix, i.e., the outer product of two "sufficient factors" (SFs). By broadcasting the SFs among worker machines and reconstructing the update matrices locally at each worker, SFB improves communication efficiency (communication costs are linear in the parameter matrix's dimensions, rather than quadratic) without affecting computational correctness. We present a theoretical convergence analysis of SFB, and empirically corroborate its efficiency on four different matrix-parameterized ML models.

arXiv.org e-Print Archive

CiteSeerX
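
The rank-1 property described in the abstract above can be illustrated with multiclass logistic regression, one of the models it names. The sketch below is not the SFB system itself: the helper names, the dimensions `K` and `D`, and the random data are assumptions. It only shows that the per-sample gradient of a K x D weight matrix equals the outer product of two vectors, so a worker could send K + D numbers instead of K * D.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def sufficient_factors(W, x, label):
    """Return (u, v) with per-sample gradient of the softmax loss = outer(u, v)."""
    u = softmax(W @ x)
    u[label] -= 1.0          # predicted probabilities minus one-hot target
    return u, x

def full_gradient(W, x, label):
    """Reference: the dense K x D gradient, computed directly."""
    y = np.zeros(W.shape[0])
    y[label] = 1.0
    return np.outer(softmax(W @ x) - y, x)

rng = np.random.default_rng(0)
K, D = 5, 8                  # hypothetical number of classes and features
W = rng.normal(size=(K, D))
x = rng.normal(size=D)

u, v = sufficient_factors(W, x, label=2)
reconstructed = np.outer(u, v)       # what a receiving worker would rebuild

floats_sent_sfb = u.size + v.size    # K + D
floats_sent_full = W.size            # K * D
print(floats_sent_sfb, floats_sent_full)
```

With these toy sizes the two factors take 13 numbers against 40 for the dense matrix. The gap widens as K and D grow, which is the linear versus quadratic communication cost the abstract refers to.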

DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models

Author: Barzilay Regina
Corso Gabriele
Jaakkola Tommi S.
Ketata Mohamed Amine
Laue Cedrik
Mammadov Ruslan
Marquet Céline
Stärk Hannes
Wu Menghua
Publication date: 07/04/2023

Understanding how proteins structurally interact is crucial to modern biology, with applications in drug discovery and protein design. Recent machine learning methods have formulated protein-small molecule docking as a generative problem with significant performance boosts over both traditional and deep learning baselines. In this work, we propose a similar approach for rigid protein-protein docking: DiffDock-PP is a diffusion generative model that learns to translate and rotate unbound protein structures into their bound conformations. We achieve state-of-the-art performance on DIPS with a median C-RMSD of 4.85, outperforming all considered baselines. Additionally, DiffDock-PP is faster than all search-based methods and generates reliable confidence estimates for its predictions. Our code is publicly available at https://github.com/ketatam/DiffDock-PP.

Comment: ICLR Machine Learning for Drug Discovery (MLDD) Workshop 202

arXiv.org e-Print Archive

LearnFCA: A Fuzzy FCA and Probability Based Approach for Learning and Classification

Author: Samal Suraj Ketan
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 22/08/2019

Formal Concept Analysis (FCA) is a mathematical theory based on lattice and order theory used for data analysis and knowledge representation. Over the past several years, many of its extensions have been proposed and applied in several domains including data mining, machine learning, knowledge management, semantic web, software development, chemistry, biology, medicine, data analytics and ontology engineering. This thesis reviews the state of the art in the theory of FCA and its various extensions that have been developed and well studied in the past several years. We discuss their historical roots and reproduce the original definitions and derivations with illustrative examples. Further, we provide a literature review of its applications and the various approaches adopted by researchers in the areas of data analysis and knowledge management, with emphasis on data-learning and classification problems. We propose LearnFCA, a novel approach based on FuzzyFCA and probability theory for learning and classification problems. LearnFCA uses an enhanced version of FuzzyLattice which has been developed to store class labels and probability vectors and can be used for classifying instances with encoded and unlabelled features. We evaluate LearnFCA on encodings from three datasets (MNIST, Omniglot and cancer images) with interesting results and varying degrees of success.

Adviser: Dr Jitender Deogu

DigitalCommons@University of Nebraska

LEARNFCA: A FUZZY FCA AND PROBABILITY BASED APPROACH FOR LEARNING AND CLASSIFICATION

Author: Samal Suraj Ketan
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/12/2022

DigitalCommons@University of Nebraska

Understanding Huntington's disease using Machine Learning Approaches

Author: Lokhande Sonali
Publication venue: Scholarship @ Claremont
Publication date: 15/12/2017

Huntington’s disease (HD) is a debilitating neurodegenerative disorder with a complex pathophysiology. Despite extensive study of the disease, the sequence of events through which mutant Huntingtin (mHtt) protein executes its action remains elusive. The phenotype of HD is an outcome of numerous processes initiated by the mHtt protein along with other proteins that act as either suppressors or enhancers of the effects of mHtt protein and PolyQ aggregates. Utilizing an integrative systems biology approach, I construct and analyze a Huntington’s disease integrome using human orthologs of protein interactors of wild type and mHtt protein. Analysis of this integrome using unsupervised machine learning methods reveals a novel connection linking mHtt protein with chromosome condensation and DNA repair. I generate a list of candidate genes that, upon validation in yeast and Drosophila models of HD, are shown to affect the mHtt phenotype and provide in vivo evidence for this hypothesis. A separate supervised machine learning approach is applied to build a classifier model that predicts protein interactors of wild type and mHtt protein. Both machine learning models that I employ have important applications for Huntington’s disease in predicting both protein and genetic interactions of huntingtin protein and can be easily extended to other PolyQ and neurodegenerative disorders such as Alzheimer’s and Parkinson’s disease.

Scholarship@Claremont