97,117 research outputs found
Automated Protein Structure Classification: A Survey
Classification of proteins based on their structure provides a valuable
resource for studying protein structure, function and evolutionary
relationships. With the rapidly increasing number of known protein structures,
manual and semi-automatic classification is becoming ever more difficult and
prohibitively slow. Therefore, there is a growing need for automated, accurate
and efficient classification methods to generate classification databases or
increase the speed and accuracy of semi-automatic techniques. Recognizing this
need, several automated classification methods have been developed. In this
survey, we overview recent developments in this area. We classify different
methods based on their characteristics and compare their methodology, accuracy
and efficiency. We then present a few open problems and explain future
directions.Comment: 14 pages, Technical Report CSRG-589, University of Toront
Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar
Automatic machine learning is an important problem in the forefront of
machine learning. The strongest AutoML systems are based on neural networks,
evolutionary algorithms, and Bayesian optimization. Recently AlphaD3M reached
state-of-the-art results with an order of magnitude speedup using reinforcement
learning with self-play. In this work we extend AlphaD3M by using a pipeline
grammar and a pre-trained model which generalizes from many different datasets
and similar tasks. Our results demonstrate improved performance compared with
our earlier work and existing methods on AutoML benchmark datasets for
classification and regression tasks. In the spirit of reproducible research we
make our data, models, and code publicly available.Comment: ICML Workshop on Automated Machine Learnin
Feature Learning for Multispectral Satellite Imagery Classification Using Neural Architecture Search
Automated classification of remote sensing data is an integral tool for earth scientists, and deep learning has proven very successful at solving such problems. However, building deep learning models to process the data requires expert knowledge of machine learning. We introduce DELTA, a software toolkit to bridge this technical gap and make deep learning easily accessible to earth scientists. Visual feature engineering is a critical part of the machine learning lifecycle, and hence is a key area that will be automated by DELTA. Hand-engineered features can perform well, but require a cross functional team with expertise in both machine learning and the specific problem domain, which is costly in both researcher time and labor. The problem is more acute with multispectral satellite imagery, which requires considerable computational resources to process. In order to automate the feature learning process, a neural architecture search samples the space of asymmetric and symmetric autoencoders using evolutionary algorithms. Since denoising autoencoders have been shown to perform well for feature learning, the autoencoders are trained on various levels of noise and the features generated by the best performing autoencoders evaluated according to their performance on image classification tasks. The resulting features are demonstrated to be effective for Landsat-8 flood mapping, as well as benchmark datasets CIFAR10 and SVHN
Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science
As the field of data science continues to grow, there will be an
ever-increasing demand for tools that make machine learning accessible to
non-experts. In this paper, we introduce the concept of tree-based pipeline
optimization for automating one of the most tedious parts of machine
learning---pipeline design. We implement an open source Tree-based Pipeline
Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a
series of simulated and real-world benchmark data sets. In particular, we show
that TPOT can design machine learning pipelines that provide a significant
improvement over a basic machine learning analysis while requiring little to no
input nor prior knowledge from the user. We also address the tendency for TPOT
to design overly complex pipelines by integrating Pareto optimization, which
produces compact pipelines without sacrificing classification accuracy. As
such, this work represents an important step toward fully automating machine
learning pipeline design.Comment: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet
made from reviewer comment
SCOPe: Structural Classification of Proteins--extended, integrating SCOP and ASTRAL data and classification of new structures.
Structural Classification of Proteins-extended (SCOPe, http://scop.berkeley.edu) is a database of protein structural relationships that extends the SCOP database. SCOP is a manually curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. Development of the SCOP 1.x series concluded with SCOP 1.75. The ASTRAL compendium provides several databases and tools to aid in the analysis of the protein structures classified in SCOP, particularly through the use of their sequences. SCOPe extends version 1.75 of the SCOP database, using automated curation methods to classify many structures released since SCOP 1.75. We have rigorously benchmarked our automated methods to ensure that they are as accurate as manual curation, though there are many proteins to which our methods cannot be applied. SCOPe is also partially manually curated to correct some errors in SCOP. SCOPe aims to be backward compatible with SCOP, providing the same parseable files and a history of changes between all stable SCOP and SCOPe releases. SCOPe also incorporates and updates the ASTRAL database. The latest release of SCOPe, 2.03, contains 59 514 Protein Data Bank (PDB) entries, increasing the number of structures classified in SCOP by 55% and including more than 65% of the protein structures in the PDB
Components of Soft Computing for Epileptic Seizure Prediction and Detection
Components of soft computing include machine learning, fuzzy logic, evolutionary computation, and probabilistic theory. These components have the cognitive ability to learn effectively. They deal with imprecision and good tolerance of uncertainty. Components of soft computing are needed for developing automated expert systems. These systems reduce human interventions so as to complete a task essentially. Automated expert systems are developed in order to perform difficult jobs. The systems have been trained and tested using soft computing techniques. These systems are required in all kinds of fields and are especially very useful in medical diagnosis. This chapter describes the components of soft computing and review of some analyses regarding EEG signal classification. From those analyses, this chapter concludes that a number of features extracted are very important and relevant features for classifier can give better accuracy of classification. The classifier with a suitable learning method can perform well for automated epileptic seizure detection systems. Further, the decomposition of EEG signal at level 4 is sufficient for seizure detection
Evolutionary Subject Tagging in the Humanities; Supporting Discovery and Examination in Digital Cultural Landscapes
In this paper, the authors attempt to identify problematic issues for subject tagging in the humanities, particularly those associated with information objects in digital formats. In the third major section, the authors identify a number of assumptions that lie behind the current practice of subject classification that we think should be challenged. We move then to propose features of classification systems that could increase their effectiveness. These emerged as recurrent themes in many of the conversations with scholars, consultants, and colleagues. Finally, we suggest next steps that we believe will help scholars and librarians develop better subject classification systems to support research in the humanities.NEH Office of Digital Humanities: Digital Humanities Start-Up Grant (HD-51166-10
- …