18,487 research outputs found
Information Preserving Component Analysis: Data Projections for Flow Cytometry Analysis
Flow cytometry is often used to characterize the malignant cells in leukemia
and lymphoma patients, traced to the level of the individual cell. Typically,
flow cytometric data analysis is performed through a series of 2-dimensional
projections onto the axes of the data set. Through the years, clinicians have
determined combinations of different fluorescent markers which generate
relatively known expression patterns for specific subtypes of leukemia and
lymphoma -- cancers of the hematopoietic system. By only viewing a series of
2-dimensional projections, the high-dimensional nature of the data is rarely
exploited. In this paper we present a means of determining a low-dimensional
projection which maintains the high-dimensional relationships (i.e.
information) between differing oncological data sets. By using machine learning
techniques, we allow clinicians to visualize data in a low dimension defined by
a linear combination of all of the available markers, rather than just 2 at a
time. This provides an aid in diagnosing similar forms of cancer, as well as a
means for variable selection in exploratory flow cytometric research. We refer
to our method as Information Preserving Component Analysis (IPCA).Comment: 26 page
Sparse Proteomics Analysis - A compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data
Background: High-throughput proteomics techniques, such as mass spectrometry
(MS)-based approaches, produce very high-dimensional data-sets. In a clinical
setting one is often interested in how mass spectra differ between patients of
different classes, for example spectra from healthy patients vs. spectra from
patients having a particular disease. Machine learning algorithms are needed to
(a) identify these discriminating features and (b) classify unknown spectra
based on this feature set. Since the acquired data is usually noisy, the
algorithms should be robust against noise and outliers, while the identified
feature set should be as small as possible.
Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based
on the theory of compressed sensing that allows us to identify a minimal
discriminating set of features from mass spectrometry data-sets. We show (1)
how our method performs on artificial and real-world data-sets, (2) that its
performance is competitive with standard (and widely used) algorithms for
analyzing proteomics data, and (3) that it is robust against random and
systematic noise. We further demonstrate the applicability of our algorithm to
two previously published clinical data-sets
An intelligent fault diagnosis method using variable weight artificial immune recognizers (V-AIR)
The Artificial Immune Recognition System (AIRS), which has been proved to be a successful classification method in the field of Artificial Immune Systems, has been used in many classification problems and gained good classification effect. However, the network inhibition mechanisms used in these methods are based on the threshold inhibition and the cells with low affinity will be deleted directly from the network, which will misrepresent the key features of the data set for not considering the density information within the data. In this paper, we utilize the concept of data potential field and propose a new weight optimizing network inhibition algorithm called variable weight artificial immune recognizer (V-AIR) where we replace the network inhibiting mechanism based on affinity with the inhibiting mechanism based on weight optimizing. The concept of data potential field was also used to describe the data distribution around training samples and the pattern of a training data belongs to the class with the largest potential field. At last, we used this algorithm to rolling bearing analog fault diagnosis and reciprocating compressor valves fault diagnosis, which get a good classification effect
Exploring the potential of 3D Zernike descriptors and SVM for protein\u2013protein interface prediction
Abstract Background The correct determination of protein–protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. Results In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). Conclusions The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class
An automated pattern recognition system for the quantification of inflammatory cells in hepatitis-C-infected liver biopsies
This paper presents an automated system for the quantification of inflammatory cells in hepatitis-C-infected liver biopsies. Initially, features are extracted from colour-corrected biopsy images at positions of interest identified by adaptive thresholding and clump decomposition. A sequential floating search method and principal component analysis are used to reduce dimensionality. Manually annotated training images allow supervised training. The performance of Gaussian parametric and mixture models is compared when used to classify regions as either inflammatory or healthy. The system is optimized using a response surface method that maximises the area under the receiver operating characteristic curve. This system is then tested on images previously ranked by a number of observers with varying levels of expertise. These results are compared to the automated system using Spearman rank correlation. Results show that this system can rank 15 test images, with varying degrees of inflammation, in strong agreement with five expert pathologists
- …