DNALinux Virtual Desktop Edition
The new version of DNALinux (VDE) is presented. DNALinux VDE is a departure from traditional distributions since it uses a virtual machine to bundle together the operating system and bioinformatics applications. The main advantage of this approach is that a virtualized environment doesn't affect an installed system. With a virtual machine, a Linux system can be run under a Windows system, provided that the virtual machine player is installed. The included programs are listed and specifications for adding more programs are explained. We believe that DNALinux could be used as a standardized virtual machine for learning, using, developing and testing bioinformatics applications.
Deep learning for supervised classification
One of the most recent areas of Machine Learning research is Deep Learning. Deep Learning algorithms have been applied successfully to computer vision, automatic speech recognition, natural language processing, audio recognition and bioinformatics. The key idea of Deep Learning is to combine the best techniques from Machine Learning to build powerful general-purpose learning algorithms. It is a mistake to identify Deep Neural Networks with Deep Learning Algorithms. Other approaches are possible, and in this paper we illustrate a generalization of Stacking which has very competitive performance. In particular, we show an application of this approach to a real classification problem, where a three-stage Stacking has proved to be very effective.
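For readers unfamiliar with Stacking, the ordinary two-level scheme can be sketched as follows. This is only an illustration of the general idea, not the generalization or the three-stage setup the paper describes, which the abstract does not detail. The base learners (one decision stump per feature), the accuracy-weighted vote used as the level-1 learner, and the toy data are all assumptions chosen to keep the example dependency-free.

```python
def fit_stump(X, y, feature):
    """Fit a one-feature threshold classifier (decision stump) for 0/1 labels."""
    best = None  # (correct_count, threshold, sign)
    for t in sorted({row[feature] for row in X}):
        for sign in (1, -1):
            correct = sum(
                (1 if sign * (row[feature] - t) >= 0 else 0) == label
                for row, label in zip(X, y)
            )
            if best is None or correct > best[0]:
                best = (correct, t, sign)
    _, t, sign = best
    return lambda row: 1 if sign * (row[feature] - t) >= 0 else 0


def fit_base_learners(X, y):
    """Level 0: one stump per feature."""
    return [fit_stump(X, y, f) for f in range(len(X[0]))]


def out_of_fold_predictions(X, y, k):
    """Predict every row with base learners that were trained without that row."""
    n = len(X)
    meta_X = [None] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        models = fit_base_learners([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k):
            meta_X[i] = [m(X[i]) for m in models]
    return meta_X


def fit_stacking(X, y, k=5):
    """Level 1: weight each base learner by its out-of-fold accuracy, then vote."""
    meta_X = out_of_fold_predictions(X, y, k)
    n_models = len(meta_X[0])
    weights = [
        sum(row[j] == label for row, label in zip(meta_X, y)) / len(y)
        for j in range(n_models)
    ]
    base = fit_base_learners(X, y)  # refit level 0 on all data for final use

    def predict(row):
        score = sum(w * m(row) for w, m in zip(weights, base))
        return 1 if score >= sum(weights) / 2 else 0

    return predict


# Toy data (hypothetical): feature 0 determines the label, feature 1 is uninformative.
X = [[i / 10, (i * 7 % 10) / 10] for i in range(20)]
y = [1 if row[0] >= 1.0 else 0 for row in X]
model = fit_stacking(X, y)
predictions = [model(row) for row in X]
```

The point of the out-of-fold step is that the level-1 learner only ever sees predictions made on rows the base learners were not trained on, so it learns how far to trust each base learner rather than how well each one memorized the training set.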
An empirical comparison of supervised machine learning techniques in bioinformatics
Research in bioinformatics is driven by the experimental data.
Current biological databases are populated by vast amounts of
experimental data. Machine learning has been widely applied to
bioinformatics and has gained a lot of success in this research
area. At present, with various learning algorithms available in the
literature, researchers are facing difficulties in choosing the best
method that can apply to their data. We performed an empirical
study on 7 individual learning systems and 9 different combined
methods on 4 different biological data sets, and provide some
suggested issues to be considered when answering the following
questions: (i) How does one choose which algorithm is best
suited to their data set? (ii) Are combined methods better than
a single approach? (iii) How does one compare the effectiveness
of a particular algorithm to the others?
Multi-class protein fold classification using a new ensemble machine learning approach.
Protein structure classification represents an important process in understanding the associations
between sequence and structure as well as possible functional and evolutionary relationships.
Recent structural genomics initiatives and other high-throughput experiments have populated the
biological databases at a rapid pace. The amount of structural data has made traditional methods
such as manual inspection of the protein structure impossible. Machine learning has been
widely applied to bioinformatics and has gained a lot of success in this research area. This work
proposes a novel ensemble machine learning method that improves the coverage of the classifiers
under the multi-class imbalanced sample sets by integrating knowledge induced from different base
classifiers, and we illustrate this idea in classifying multi-class SCOP protein fold data. We have
compared our approach with PART and show that our method improves the sensitivity of the
classifier in protein fold classification. Furthermore, we have extended this method to learning over
multiple data types, preserving the independence of their corresponding data sources, and show
that our new approach performs at least as well as the traditional technique over a single joined
data source. These experimental results are encouraging, and can be applied to other bioinformatics
problems similarly characterised by multi-class imbalanced data sets held in multiple data
sources.
Deep Learning for Metagenomic Data: using 2D Embeddings and Convolutional Neural Networks
Deep learning (DL) techniques have had unprecedented success when applied to
images, waveforms, and texts, to cite a few. In general, when the sample size
(N) is much greater than the number of features (d), DL outperforms previous
machine learning (ML) techniques, often through the use of convolutional neural
networks (CNNs). However, in many bioinformatics ML tasks, we encounter the
opposite situation where d is greater than N. In these situations, applying DL
techniques (such as feed-forward networks) would lead to severe overfitting.
Thus, sparse ML techniques (such as LASSO) usually yield the best results
on these tasks. In this paper, we show how to apply CNNs to data which do not
originally have an image structure (in particular, metagenomic data). Our
first contribution is to show how to map metagenomic data in a meaningful way
to 1D or 2D images. Based on this representation, we then apply a CNN, with the
aim of predicting various diseases. The proposed approach is applied to six
different datasets including in total over 1000 samples from various diseases.
This approach could be a promising one for prediction tasks in the
bioinformatics field.
Comment: Accepted at NIPS 2017 Workshop on Machine Learning for Health
(https://ml4health.github.io/2017/); In Proceedings of the NIPS ML4H 2017
Workshop in Long Beach, CA, USA
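To make the idea of turning a non-image feature vector into a 2D image concrete, one very simple layout is sketched below. The abstract does not say how the authors order, bin, or color features, so the row-by-row fill, the zero padding, and the function name `vector_to_image` are all assumptions for illustration only.

```python
import math


def vector_to_image(abundances, fill=0.0):
    """Lay out a 1D feature vector as a square 2D grid, row by row.

    The grid side is the smallest integer whose square holds every feature;
    unused trailing cells are padded with `fill`.
    """
    d = len(abundances)
    if d == 0:
        raise ValueError("need at least one feature")
    side = math.isqrt(d - 1) + 1  # ceil(sqrt(d)) without floating point
    padded = list(abundances) + [fill] * (side * side - d)
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# Five hypothetical relative abundances become a 3x3 image.
image = vector_to_image([0.4, 0.3, 0.2, 0.05, 0.05])
```

With a layout like this, the order of the features decides which values end up as neighbouring pixels, so any ordering that places related features next to each other matters for a convolutional model.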
Bioinformatics: a knowledge engineering approach
The paper introduces the knowledge engineering (KE) approach for the modeling and the discovery of new knowledge in bioinformatics. This approach extends the machine learning approach with various rule extraction and other knowledge representation procedures. Examples are given of applying the KE approach, and especially one recently developed technique, evolving connectionist systems (ECOS), to challenging problems in bioinformatics, including DNA sequence analysis, microarray gene expression profiling, protein structure prediction, finding gene regulatory networks, medical prognostic systems, and computational neurogenetic modeling.
Learning what to read: Focused machine reading
Recent efforts in bioinformatics have achieved tremendous progress in the
machine reading of biomedical literature, and the assembly of the extracted
biochemical interactions into large-scale models such as protein signaling
pathways. However, batch machine reading of literature at today's scale (PubMed
alone indexes over 1 million papers per year) is unfeasible due to both cost
and processing overhead. In this work, we introduce a focused reading approach
to guide the machine reading of biomedical literature towards what literature
should be read to answer a biomedical query as efficiently as possible. We
introduce a family of algorithms for focused reading, including an intuitive,
strong baseline, and a second approach which uses a reinforcement learning (RL)
framework that learns when to explore (widen the search) or exploit (narrow
it). We demonstrate that the RL approach is capable of answering more queries
than the baseline, while being more efficient, i.e., reading fewer documents.
Comment: 6 pages, 1 figure, 1 algorithm, 2 tables, accepted to EMNLP 201
Analysis of Stock Market using Machine Learning
Machine Learning is a prominent area of research that emphasizes finding patterns in existing data. The field of Machine Learning can be concisely described as enabling computers to make productive predictions using previous experience. Because large amounts of information are now available everywhere, it is important to analyze this data in order to extract useful information and to develop algorithms based on that analysis. This can be done through data mining and Machine Learning. In addition to many other fields, Machine Learning models have broad applications in the field of Bioinformatics. The complexity involved in biological analysis has led to the development of sophisticated Machine Learning methods. This research paper discusses the importance of a data-driven approach compared with the formalization of traditional Artificial Intelligence, and focuses primarily on a key approach to forecasting a company's workflow using Machine Learning.