Search CORE

34,970 research outputs found

Recommended from our members

Multi-class protein fold classification using a new ensemble machine learning approach.

Author: Deville Y
Gilbert D
Tan A
Publication venue: GIW
Publication date: 01/01/2003
Field of study

Protein structure classification represents an important process in understanding the associations between sequence and structure as well as possible functional and evolutionary relationships. Recent structural genomics initiatives and other high-throughput experiments have populated the biological databases at a rapid pace. The amount of structural data has made traditional methods such as manual inspection of the protein structure become impossible. Machine learning has been widely applied to bioinformatics and has gained a lot of success in this research area. This work proposes a novel ensemble machine learning method that improves the coverage of the classifiers under the multi-class imbalanced sample sets by integrating knowledge induced from different base classifiers, and we illustrate this idea in classifying multi-class SCOP protein fold data. We have compared our approach with PART and show that our method improves the sensitivity of the classifier in protein fold classification. Furthermore, we have extended this method to learning over multiple data types, preserving the independence of their corresponding data sources, and show that our new approach performs at least as well as the traditional technique over a single joined data source. These experimental results are encouraging, and can be applied to other bioinformatics problems similarly characterised by multi-class imbalanced data sets held in multiple data sources

Brunel University Research Archive

Recommended from our members

An Overview of the Use of Neural Networks for Data Mining Tasks

Author: Alberts B
Alpaydin E
Ando T
Blake CL
Bramer MA
Castanheira LG
Han J
Lu H
Mitchell M
Ni X
Quinlan RJ
Rumelhart DE
Shafer JC
Shendure J
Simić D
Stahl F
Steinwart I
Surjandari I
Wei JS
Widrow B
Witten IH
Zaslavsky B
Zhang D
Publication venue: 'Wiley'
Publication date: 01/01/2012
Field of study

In the recent years the area of data mining has experienced a considerable demand for technologies that extract knowledge from large and complex data sources. There is a substantial commercial interest as well as research investigations in the area that aim to develop new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies, whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks

Central Archive at the University of Reading

Crossref

Portsmouth University Research Portal (Pure)

Bournemouth University Research Online

Infectious Disease Ontology

Technological developments have resulted in tremendous increases in the volume and diversity of the data and information that must be processed in the course of biomedical and clinical research and practice. Researchers are at the same time under ever greater pressure to share data and to take steps to ensure that data resources are interoperable. The use of ontologies to annotate data has proven successful in supporting these goals and in providing new possibilities for the automated processing of data and information. In this chapter, we describe different types of vocabulary resources and emphasize those features of formal ontologies that make them most useful for computational applications. We describe current uses of ontologies and discuss future goals for ontology-based computing, focusing on its use in the field of infectious diseases. We review the largest and most widely used vocabulary resources relevant to the study of infectious diseases and conclude with a description of the Infectious Disease Ontology (IDO) suite of interoperable ontology modules that together cover the entire infectious disease domain

PhilPapers

CiteSeerX

Crossref

Visual and computational analysis of structure-activity relationships in high-throughput screening data

Author: Agrafiotis
Agrafiotis
Ahlberg
Ajay
Ajay
Bayada
Bemis
Bernard
Bonabeau
Brown
Calvert
Card
Chen
Chen
Cho
Christianini
Clark
Clark
Cox
Duda
Edwards
Engels
Frimurer
Gao
Garrido
Ghose
Gillet
Gillet
Hand
Hann
Haupts
Hayward
Hertzberg
Izrailev
Jiang
Jones-Hertzog
Kirew
Kobayashi
Kohonen
Ladd
Lee
Lepre
Martin
Mason
Mello
Meyer
Miller
Mitchell
Oprea
Peter Gedeck
Peter Willett
Poroikov
Rhodes
Roberts
Roberts
Ros
Rusinko
Sadowski
Sadowski
Scherf
Sheridan
Shi
Stanton
Su
Teague
Thompson
Tropsha
Tufte
Tufte
Wagener
Walters
Wang
Wedin
Xie
Xu
Zupan
Publication venue: 'Elsevier BV'
Publication date: 01/08/2001
Field of study

Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets

Crossref

White Rose Research Online

PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide applications

Author: Adams
Aloy
Apweiler
Bairoch
Brenner
Fariselli
Fariselli
Finkelstein
Fischer
Hamodrakas
Haykin
Hobohm
Jones
Koehl
Levitt
Minsky
Murzin
Murzin
Murzin
Möeller
Nielsen
Pasquier
Pasquier
Pasquier
Pedersen
Petersen
Rice
Rost
Rost
Rumelhart
Sanchez
Schulz
Stevens
Vlahou
Wallin
Publication venue: 'Wiley'
Publication date: 01/01/2001
Field of study

A cascading system of hierarchical, artificial neural networks (named PRED-CLASS) is presented for the generalized classification of proteins into four distinct classes-transmembrane, fibrous, globular, and mixed-from information solely encoded in their amino acid sequences. The architecture of the individual component networks is kept very simple, reducing the number of free parameters (network synaptic weights) for faster training, improved generalization, and the avoidance of data overfitting. Capturing information from as few as 50 protein sequences spread among the four target classes (6 transmembrane, 10 fibrous, 13 globular, and 17 mixed), PRED-CLASS was able to obtain 371 correct predictions out of a set of 387 proteins (success rate approximately 96%) unambiguously assigned into one of the target classes. The application of PRED-CLASS to several test sets and complete proteomes of several organisms demonstrates that such a method could serve as a valuable tool in the annotation of genomic open reading frames with no functional assignment or as a preliminary step in fold recognition and ab initio structure prediction methods. Detailed results obtained for various data sets and completed genomes, along with a web sever running the PRED-CLASS algorithm, can be accessed over the World Wide Web at http://o2.biol.uoa.gr/PRED-CLAS

arXiv.org e-Print Archive

Crossref

Towards a proteomics meta-classification

Author: Kumar Anand
Smith Barry
Publication venue
Publication date: 01/01/2004
Field of study

that can serve as a foundation for more refined ontologies in the field of proteomics. Standard data sources classify proteins in terms of just one or two specific aspects. Thus SCOP (Structural Classification of Proteins) is described as classifying proteins on the basis of structural features; SWISSPROT annotates proteins on the basis of their structure and of parameters like post-translational modifications. Such data sources are connected to each other by pairwise term-to-term mappings. However, there are obstacles which stand in the way of combining them together to form a robust meta-classification of the needed sort. We discuss some formal ontological principles which should be taken into account within the existing datasources in order to make such a metaclassification possible, taking into account also the Gene Ontology (GO) and its application to the annotation of proteins

PhilPapers

CiteSeerX