Anderson Acceleration For Bioinformatics-Based Machine Learning
Anderson acceleration (AA) is a well-known method for accelerating the
convergence of iterative algorithms, with applications in various fields
including deep learning and optimization. Despite its popularity in these
areas, the effectiveness of AA in classical machine learning classifiers has
not been thoroughly studied. Tabular data, in particular, presents a unique
challenge for deep learning models, and classical machine learning models are
known to perform better in these scenarios. However, the convergence analysis
of these models has received limited attention. To address this gap in
research, we implement a support vector machine (SVM) classifier variant that
incorporates AA to speed up convergence. We evaluate the performance of our SVM
with and without Anderson acceleration on several datasets from the biology
domain and demonstrate that the use of AA significantly improves convergence
and reduces the training loss as the number of iterations increases. Our
findings provide a promising perspective on the potential of Anderson
acceleration in the training of simple machine learning classifiers and
underscore the importance of further research in this area. By showing the
effectiveness of AA in this setting, we aim to inspire more studies that
explore the applications of AA in classical machine learning.
Comment: Accepted in KDH-2023: Knowledge Discovery in Healthcare Data (IJCAI Workshop)
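The abstract does not spell out the update rule. As a rough illustration only, here is a minimal sketch of Anderson acceleration (type-II, history depth m) applied to a generic fixed-point map x = g(x), using NumPy. The function names `anderson` and `fixed_point`, the default parameters, and the toy linear map are assumptions for illustration; this is not the authors' SVM implementation.

```python
import numpy as np


def fixed_point(g, x0, iters=50):
    """Plain fixed-point iteration x_{k+1} = g(x_k), used as a baseline."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = g(x)
    return x


def anderson(g, x0, m=5, iters=50, tol=1e-10):
    """Anderson acceleration (type-II, depth m) for x = g(x).

    Returns the final iterate and the number of iterations performed.
    """
    x = np.asarray(x0, dtype=float)
    g_hist, f_hist = [], []  # recent g(x_k) values and residuals f_k = g(x_k) - x_k
    for k in range(iters):
        gx = g(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            return x, k
        g_hist.append(gx)
        f_hist.append(f)
        if len(f_hist) > m + 1:  # keep at most m difference columns
            g_hist.pop(0)
            f_hist.pop(0)
        if len(f_hist) == 1:
            x = gx  # no history yet: take a plain fixed-point step
            continue
        # Columns are successive differences of residuals and of g-values.
        dF = np.column_stack([f_hist[i + 1] - f_hist[i] for i in range(len(f_hist) - 1)])
        dG = np.column_stack([g_hist[i + 1] - g_hist[i] for i in range(len(g_hist) - 1)])
        # gamma minimizes ||f - dF @ gamma||_2
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        x = gx - dG @ gamma
    return x, iters


# Hypothetical toy contraction g(x) = A x + b, fixed point (I - A)^{-1} b = [100, 2].
A = np.diag([0.99, 0.5])
b = np.array([1.0, 1.0])


def g(x):
    return A @ x + b
```

On this toy map with spectral radius 0.99, plain iteration is still far from the fixed point after 50 steps, while the accelerated version converges in a few. For nonlinear problems such as SVM training the speed-up is not guaranteed, and practical variants often add safeguards such as damping or restarts.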
Distributed Machine Learning via Sufficient Factor Broadcasting
Matrix-parametrized models, including multiclass logistic regression and
sparse coding, are used in machine learning (ML) applications ranging from
computer vision to computational biology. When these models are applied to
large-scale ML problems starting at millions of samples and tens of thousands
of classes, their parameter matrix can grow at an unexpected rate, resulting in
high parameter synchronization costs that greatly slow down distributed
learning. To address this issue, we propose a Sufficient Factor Broadcasting
(SFB) computation model for efficient distributed learning of a large family of
matrix-parameterized models, which share the following property: the parameter
update computed on each data sample is a rank-1 matrix, i.e., the outer product
of two "sufficient factors" (SFs). By broadcasting the SFs among worker
machines and reconstructing the update matrices locally at each worker, SFB
improves communication efficiency without affecting computational correctness:
communication costs are linear in the parameter matrix's dimensions rather than
quadratic. We present a theoretical convergence analysis of SFB and empirically
corroborate its efficiency on four different matrix-parametrized ML models.
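To illustrate the rank-1 property the abstract relies on, here is a minimal single-process sketch for multiclass logistic regression. For this model, the per-sample gradient of the K×D weight matrix W is the outer product of u = softmax(Wx) − onehot(y) and v = x. The helper names (`sufficient_factors`, `apply_sfs`, `sfb_round`) and the in-memory "broadcast" that stands in for network communication are hypothetical; the real SFB system exchanges factors between machines.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def sufficient_factors(W, x, label):
    """Per-sample gradient of W is u v^T, with u = softmax(Wx) - onehot(label), v = x."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
    p = softmax(logits)
    u = [p_k - (1.0 if k == label else 0.0) for k, p_k in enumerate(p)]
    return u, list(x)


def apply_sfs(W, sfs, lr):
    """Reconstruct each rank-1 update u v^T locally and take a gradient step in place."""
    for u, v in sfs:
        for k, u_k in enumerate(u):
            row = W[k]
            for j, v_j in enumerate(v):
                row[j] -= lr * u_k * v_j


def sfb_round(worker_W, worker_data, lr):
    """One synchronous round: every worker computes SFs on its local samples,
    all SFs are 'broadcast', and every replica applies the same updates."""
    broadcast = []
    for W, data in zip(worker_W, worker_data):
        broadcast.extend(sufficient_factors(W, x, y) for x, y in data)
    for W in worker_W:
        apply_sfs(W, broadcast, lr)
    return broadcast
```

Each broadcast pair carries K + D numbers instead of the K·D entries of a full update matrix, which is the linear-versus-quadratic saving described in the abstract; because every replica applies identical updates in the same order, the replicas stay in sync.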
DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models
Understanding how proteins structurally interact is crucial to modern
biology, with applications in drug discovery and protein design. Recent machine
learning methods have formulated protein-small molecule docking as a generative
problem with significant performance boosts over both traditional and deep
learning baselines. In this work, we propose a similar approach for rigid
protein-protein docking: DiffDock-PP is a diffusion generative model that
learns to translate and rotate unbound protein structures into their bound
conformations. We achieve state-of-the-art performance on DIPS with a median
C-RMSD of 4.85 Å, outperforming all considered baselines. Additionally,
DiffDock-PP is faster than all search-based methods and generates reliable
confidence estimates for its predictions. Our code is publicly available at
Comment: ICLR Machine Learning for Drug Discovery (MLDD) Workshop 202
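The abstract frames docking as learning a rotation and translation of one protein, evaluated by RMSD. A minimal pure-Python sketch of those two geometric pieces follows: applying a rigid-body transform about the moving protein's centroid, and computing RMSD between matched coordinates. Rotating about the centroid and reporting RMSD without superposition are assumptions for illustration. The paper's exact C-RMSD protocol (which atoms, which alignment) is not stated here, and no diffusion model is shown.

```python
import math


def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis` (normalized here)."""
    n = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / n for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]


def rigid_transform(coords, R, t):
    """Rotate coords about their centroid by R, then translate by t."""
    n = len(coords)
    center = [sum(p[i] for p in coords) / n for i in range(3)]
    out = []
    for p in coords:
        d = [p[i] - center[i] for i in range(3)]
        out.append([sum(R[i][j] * d[j] for j in range(3)) + center[i] + t[i] for i in range(3)])
    return out


def rmsd(a, b):
    """Root-mean-square deviation between matched coordinate lists (no superposition)."""
    total = sum(sum((p[i] - q[i]) ** 2 for i in range(3)) for p, q in zip(a, b))
    return math.sqrt(total / len(a))
```

Because the rotation is applied about the centroid, rotation and translation stay decoupled: the rotation leaves the centroid in place and the translation alone moves it. Internal distances are preserved either way, as a rigid model requires.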
LearnFCA: A Fuzzy FCA and Probability Based Approach for Learning and Classification
Formal concept analysis (FCA) is a mathematical theory based on lattice and order theory, used for data analysis and knowledge representation. Over the past several years, many extensions of it have been proposed and applied in several domains, including data mining, machine learning, knowledge management, the semantic web, software development, chemistry, biology, medicine, data analytics, and ontology engineering.
This thesis reviews the state of the art in the theory of Formal Concept Analysis (FCA) and its various extensions that have been developed and well studied in the past several years. We discuss their historical roots and reproduce the original definitions and derivations with illustrative examples. Further, we provide a literature review of its applications and of the various approaches adopted by researchers in the areas of data analysis and knowledge management, with emphasis on data learning and classification problems.
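To make the core definitions concrete, here is a minimal sketch of classical (crisp) FCA in Python. It implements the two derivation operators of a formal context (G, M, I) and a naive enumeration of all formal concepts (A, B) with A′ = B and B′ = A, obtained by closing every attribute subset. The enumeration is exponential in |M| and is not LearnFCA's fuzzy lattice. The function names and the toy context are illustrative assumptions.

```python
from itertools import combinations


def extent(context, attrs):
    """B': the objects that have every attribute in attrs."""
    return frozenset(g for g, ms in context.items() if attrs <= ms)


def intent(context, objs, all_attrs):
    """A': the attributes shared by every object in objs (all attributes if objs is empty)."""
    common = set(all_attrs)
    for g in objs:
        common &= context[g]
    return frozenset(common)


def formal_concepts(context):
    """Naively enumerate concepts (A, B): for each attribute set B, (B', B'') is a concept."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for B in combinations(sorted(all_attrs), r):
            A = extent(context, frozenset(B))
            concepts.add((A, intent(context, A, all_attrs)))
    return concepts
```

Consider a toy context where dog and cat have {mammal, pet}, goldfish has {pet, swims}, and whale has {mammal, swims}. Here the enumeration yields eight concepts, including ({dog, cat}, {mammal, pet}). Ordering these concepts by extent inclusion gives the concept lattice.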
We propose LearnFCA, a novel approach based on Fuzzy FCA and probability theory for learning and classification problems. LearnFCA uses an enhanced version of FuzzyLattice, developed to store class labels and probability vectors, which can classify instances with encoded and unlabelled features. We evaluate LearnFCA on encodings from three datasets (MNIST, Omniglot, and cancer images), with varying degrees of success.
Adviser: Dr Jitender Deogu
Understanding Huntington's disease using Machine Learning Approaches
Huntington's disease (HD) is a debilitating neurodegenerative disorder with a complex pathophysiology. Despite extensive study of the disease, the sequence of events through which the mutant Huntingtin (mHtt) protein exerts its effects remains elusive. The phenotype of HD results from numerous processes initiated by the mHtt protein together with other proteins that act as either suppressors or enhancers of the effects of the mHtt protein and PolyQ aggregates. Using an integrative systems biology approach, I construct and analyze a Huntington's disease integrome from human orthologs of protein interactors of wild-type and mHtt protein. Analysis of this integrome with unsupervised machine learning methods reveals a novel connection linking the mHtt protein with chromosome condensation and DNA repair. I generate a list of candidate genes that, upon validation in yeast and Drosophila models of HD, are shown to affect the mHtt phenotype, providing in vivo evidence for our hypothesis. A separate supervised machine learning approach is applied to build a classifier that predicts protein interactors of wild-type and mHtt protein. Both machine learning models have important applications for Huntington's disease in predicting protein and genetic interactions of the huntingtin protein, and they can be easily extended to other PolyQ and neurodegenerative disorders such as Alzheimer's and Parkinson's disease.