97,117 research outputs found

    Automated Protein Structure Classification: A Survey

    Full text link
    Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we overview recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions.Comment: 14 pages, Technical Report CSRG-589, University of Toront

    Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar

    Get PDF
    Automatic machine learning is an important problem in the forefront of machine learning. The strongest AutoML systems are based on neural networks, evolutionary algorithms, and Bayesian optimization. Recently AlphaD3M reached state-of-the-art results with an order of magnitude speedup using reinforcement learning with self-play. In this work we extend AlphaD3M by using a pipeline grammar and a pre-trained model which generalizes from many different datasets and similar tasks. Our results demonstrate improved performance compared with our earlier work and existing methods on AutoML benchmark datasets for classification and regression tasks. In the spirit of reproducible research we make our data, models, and code publicly available.Comment: ICML Workshop on Automated Machine Learnin

    Feature Learning for Multispectral Satellite Imagery Classification Using Neural Architecture Search

    Get PDF
    Automated classification of remote sensing data is an integral tool for earth scientists, and deep learning has proven very successful at solving such problems. However, building deep learning models to process the data requires expert knowledge of machine learning. We introduce DELTA, a software toolkit to bridge this technical gap and make deep learning easily accessible to earth scientists. Visual feature engineering is a critical part of the machine learning lifecycle, and hence is a key area that will be automated by DELTA. Hand-engineered features can perform well, but require a cross functional team with expertise in both machine learning and the specific problem domain, which is costly in both researcher time and labor. The problem is more acute with multispectral satellite imagery, which requires considerable computational resources to process. In order to automate the feature learning process, a neural architecture search samples the space of asymmetric and symmetric autoencoders using evolutionary algorithms. Since denoising autoencoders have been shown to perform well for feature learning, the autoencoders are trained on various levels of noise and the features generated by the best performing autoencoders evaluated according to their performance on image classification tasks. The resulting features are demonstrated to be effective for Landsat-8 flood mapping, as well as benchmark datasets CIFAR10 and SVHN

    Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

    Full text link
    As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input nor prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design.Comment: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet made from reviewer comment

    SCOPe: Structural Classification of Proteins--extended, integrating SCOP and ASTRAL data and classification of new structures.

    Get PDF
    Structural Classification of Proteins-extended (SCOPe, http://scop.berkeley.edu) is a database of protein structural relationships that extends the SCOP database. SCOP is a manually curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. Development of the SCOP 1.x series concluded with SCOP 1.75. The ASTRAL compendium provides several databases and tools to aid in the analysis of the protein structures classified in SCOP, particularly through the use of their sequences. SCOPe extends version 1.75 of the SCOP database, using automated curation methods to classify many structures released since SCOP 1.75. We have rigorously benchmarked our automated methods to ensure that they are as accurate as manual curation, though there are many proteins to which our methods cannot be applied. SCOPe is also partially manually curated to correct some errors in SCOP. SCOPe aims to be backward compatible with SCOP, providing the same parseable files and a history of changes between all stable SCOP and SCOPe releases. SCOPe also incorporates and updates the ASTRAL database. The latest release of SCOPe, 2.03, contains 59 514 Protein Data Bank (PDB) entries, increasing the number of structures classified in SCOP by 55% and including more than 65% of the protein structures in the PDB

    Components of Soft Computing for Epileptic Seizure Prediction and Detection

    Get PDF
    Components of soft computing include machine learning, fuzzy logic, evolutionary computation, and probabilistic theory. These components have the cognitive ability to learn effectively. They deal with imprecision and good tolerance of uncertainty. Components of soft computing are needed for developing automated expert systems. These systems reduce human interventions so as to complete a task essentially. Automated expert systems are developed in order to perform difficult jobs. The systems have been trained and tested using soft computing techniques. These systems are required in all kinds of fields and are especially very useful in medical diagnosis. This chapter describes the components of soft computing and review of some analyses regarding EEG signal classification. From those analyses, this chapter concludes that a number of features extracted are very important and relevant features for classifier can give better accuracy of classification. The classifier with a suitable learning method can perform well for automated epileptic seizure detection systems. Further, the decomposition of EEG signal at level 4 is sufficient for seizure detection

    Evolutionary Subject Tagging in the Humanities; Supporting Discovery and Examination in Digital Cultural Landscapes

    Get PDF
    In this paper, the authors attempt to identify problematic issues for subject tagging in the humanities, particularly those associated with information objects in digital formats. In the third major section, the authors identify a number of assumptions that lie behind the current practice of subject classification that we think should be challenged. We move then to propose features of classification systems that could increase their effectiveness. These emerged as recurrent themes in many of the conversations with scholars, consultants, and colleagues. Finally, we suggest next steps that we believe will help scholars and librarians develop better subject classification systems to support research in the humanities.NEH Office of Digital Humanities: Digital Humanities Start-Up Grant (HD-51166-10
    • …
    corecore