Building multiclass classifiers for remote homology detection and fold recognition
BACKGROUND: Protein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines (SVMs) are currently among the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems, and they have not been extensively used to solve the more general multiclass remote homology prediction and fold recognition problems. RESULTS: We present a comprehensive evaluation of a number of methods for building SVM-based multiclass classification schemes in the context of the SCOP protein classification. These methods include schemes that directly build an SVM-based multiclass model, schemes that employ a second-level learning approach to combine the predictions generated by a set of binary SVM-based classifiers, and schemes that build and combine binary classifiers for various levels of the SCOP hierarchy beyond those defining the target classes. CONCLUSION: Analyzing the performance achieved by the different approaches on four different datasets, we show that most of the proposed multiclass SVM-based classification approaches are quite effective in solving the remote homology prediction and fold recognition problems. Schemes that use predictions from binary models constructed for ancestral categories within the SCOP hierarchy tend not only to lead to lower error rates but also to reduce the number of errors in which a superfamily is assigned to an entirely different fold or a fold is predicted as being from a different SCOP class. Our results also show that the limited size of the training data makes it hard to learn complex second-level models, and that models of moderate complexity lead to consistently better results.
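As a rough illustration of the simplest way to turn a set of binary classifiers into a multiclass prediction, the sketch below picks the class whose one-vs-rest classifier reports the highest decision score. This is only the baseline idea that second-level combination schemes refine, not the method evaluated in the paper; the function name, class labels, and scores are all hypothetical.

```python
def predict_one_vs_rest(scores):
    """Return the class whose binary classifier gave the highest score.

    `scores` maps a class label to the decision value of that class's
    one-vs-rest binary classifier (hypothetical values, not real data).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return max(scores, key=scores.get)


# Made-up decision values for three hypothetical fold classes.
example_scores = {"fold_a": -0.4, "fold_b": 1.2, "fold_c": 0.3}
predicted = predict_one_vs_rest(example_scores)
```

A second-level model would replace the plain argmax with a learned function of the full score vector.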
Junior staffing changes and the temporal ecology of adverse incidents in acute psychiatric wards
Aim.  This paper reports an examination of the relationship between adverse incident rates, the arrival of new junior staff on wards, and days of the week on acute psychiatric wards.
Background.  Incidents of violence, absconding and self-harm in acute inpatient services pose risks to patients and staff. Previous research suggests that the arrival of inexperienced new staff may trigger more adverse incidents. Findings on the relationship between incidents and the weekly routine are inconsistent.
Method.  A retrospective analysis of formally reported incident rates, records of nursing student allocations and junior doctor rotation patterns was conducted using Poisson regression. Variance between days of the week was explored using contingency table analysis. The data covered 30 months on 17 psychiatric wards, and were collected in 2002–2004.
Findings.  The arrival of new and inexperienced staff on the wards was not associated with increases in adverse incident rates. Most types of incidents were less frequent at weekends and midweek. Incident rates were unchanged on ward-round days, but increased rates were found on the days before and after ward rounds.
Conclusion.  Increased patient tension is associated with raised incident rates. It may be possible to reduce incident rates by moderating stimulation in the environment and by mobilizing support for patients during critical periods.
Multi-Class Clustering of Cancer Subtypes through SVM Based Ensemble of Pareto-Optimal Solutions for Gene Marker Identification
With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized in a samples-versus-genes fashion, are being used to classify tissue samples as benign or malignant, or into their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which helps in the successful diagnosis of particular cancer types. In this article, we present an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. A real-coded encoding of the cluster centers is used, and cluster compactness and separation are optimized simultaneously. The resulting near-Pareto-optimal set contains a number of non-dominated solutions. A novel approach is proposed that combines the clustering information possessed by the non-dominated solutions through a Support Vector Machine (SVM) classifier. The final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method is compared with that of several other microarray clustering algorithms on three publicly available benchmark cancer datasets. Moreover, statistical significance tests have been conducted to establish the statistical superiority of the proposed clustering method. Furthermore, relevant gene markers have been identified using the clustering result produced by the proposed method and demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results are promising and may have an important impact in the area of unsupervised cancer classification as well as gene marker identification for multiple cancer subtypes.
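To make the notion of non-dominated solutions concrete, the following minimal sketch filters a list of candidate clusterings scored on two objectives, assuming compactness is to be minimized and separation maximized. The function names and the numeric scores are hypothetical and are not taken from the article; the genetic search and the SVM-based combination step are not shown.

```python
def dominates(a, b):
    """Return True if solution a dominates solution b.

    Each solution is a (compactness, separation) tuple; lower compactness
    and higher separation are assumed to be better.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def non_dominated(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


# Made-up (compactness, separation) scores for four candidate clusterings.
candidates = [(1.0, 5.0), (2.0, 6.0), (2.5, 5.5), (0.8, 3.0)]
front = non_dominated(candidates)
```

Here the candidate (2.5, 5.5) is dropped because (2.0, 6.0) is better on both objectives; the remaining three each represent a different trade-off.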