19 research outputs found

    Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction

    Background: The widely used k top scoring pair (k-TSP) algorithm is a simple yet powerful parameter-free classifier. It owes its success in many cancer microarray datasets to an effective feature selection algorithm based on the relative expression ordering of gene pairs. However, its general robustness does not extend to some difficult datasets, such as those involving cancer outcome prediction, which may be due to the relatively simple voting scheme used by the classifier. We believe that performance can be enhanced by separating out its effective feature selection component and combining it with a powerful classifier such as the support vector machine (SVM). More generally, the top scoring pairs generated by the k-TSP ranking algorithm can be used as a dimensionally reduced subspace for other machine learning classifiers.
    Results: We developed an approach that integrates the k-TSP ranking algorithm (TSP) with other machine learning methods, combining the computationally efficient, multivariate feature ranking of k-TSP with multivariate classifiers such as SVM. We evaluated this hybrid scheme (k-TSP+SVM) on a range of simulated datasets with known data structures. Compared with two other feature selection methods, a univariate method similar to Fisher's discriminant criterion (Fisher) and recursive feature elimination embedded in SVM (RFE), TSP becomes increasingly more effective as the informative genes become progressively more correlated, both in classification performance and in the ability to recover the true informative genes. We also applied the hybrid scheme to four cancer prognosis datasets, in which k-TSP+SVM outperforms the k-TSP classifier in all datasets and achieves performance comparable or superior to that of SVM alone. In concurrence with what is observed in simulation, TSP appears to be a better feature selector than Fisher and RFE in some of the cancer datasets.
    Conclusions: The k-TSP ranking algorithm can be used as a computationally efficient, multivariate filter method for feature selection in machine learning. SVM in combination with the k-TSP ranking algorithm outperforms k-TSP and SVM alone in simulated datasets and in some cancer prognosis datasets. Simulation studies suggest that, as a feature selector, it is better tuned to certain data characteristics, namely correlations among informative genes, which makes it potentially interesting as an alternative feature ranking method in pathway analysis.
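The pair-ranking idea in this abstract can be illustrated with a minimal sketch. It assumes the usual top-scoring-pair score, the absolute difference between the two classes in the frequency with which gene i is expressed below gene j. The function names, the choice to keep selected pairs disjoint, and the 0/1 encoding handed to a downstream classifier are assumptions of this sketch, not details given in the abstract.

```python
from itertools import combinations

def tsp_scores(samples, labels):
    """Score each gene pair (i, j) by |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|."""
    by_class = {0: [], 1: []}
    for row, label in zip(samples, labels):
        by_class[label].append(row)
    scores = {}
    for i, j in combinations(range(len(samples[0])), 2):
        freq = [sum(r[i] < r[j] for r in by_class[c]) / len(by_class[c]) for c in (0, 1)]
        scores[(i, j)] = abs(freq[0] - freq[1])
    return scores

def top_disjoint_pairs(samples, labels, k):
    """Return up to k highest-scoring pairs that share no gene (assumed selection rule)."""
    scores = tsp_scores(samples, labels)
    chosen, used = [], set()
    for i, j in sorted(scores, key=scores.get, reverse=True):
        if i not in used and j not in used:
            chosen.append((i, j))
            used.update((i, j))
        if len(chosen) == k:
            break
    return chosen

def pair_features(samples, pairs):
    """Encode each sample as 0/1 order indicators, one per selected pair (hypothetical encoding)."""
    return [[1 if row[i] < row[j] else 0 for i, j in pairs] for row in samples]
```

The reduced matrix returned by `pair_features` is what a second-stage classifier such as an SVM would be trained on; passing the raw expression values of the selected genes instead would be an equally plausible reading of the abstract.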

    Covert Waking Brain Activity Reveals Instantaneous Sleep Depth

    The neural correlates of the wake-sleep continuum remain incompletely understood, limiting the development of adaptive drug delivery systems for promoting sleep maintenance. The most useful measure for resolving early positions along this continuum is the alpha oscillation, an 8–13 Hz electroencephalographic rhythm prominent over posterior scalp locations. As the brain activation signature of wakefulness, alpha expression discloses immediate levels of alertness and dissipates in concert with fading awareness as sleep begins. This brain activity pattern, however, is largely ignored once sleep begins. Here we show that the intensity of spectral power in the alpha band continues to disclose instantaneous responsiveness to noise (a measure of sleep depth) throughout a night of sleep. By systematically challenging sleep with realistic and varied acoustic disruption, we found that sleepers exhibited markedly greater sensitivity to sounds during moments of elevated alpha expression. This result demonstrates that alpha power is not a binary marker of the transition between sleep and wakefulness, but carries rich information about immediate sleep stability. Further, it shows that an empirical and ecologically relevant form of sleep depth is revealed in real time by EEG spectral content in the alpha band, a measure that affords prediction on the order of minutes. This signal, which transcends the boundaries of classical sleep stages, could potentially be used for real-time feedback to novel, adaptive drug delivery systems for inducing sleep.
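As a rough illustration of what "spectral power in the alpha band" means computationally, the sketch below estimates the share of a signal's power that falls between 8 and 13 Hz using a plain discrete Fourier transform. The function name, the use of a relative (rather than absolute) power measure, and the single unwindowed epoch are assumptions of this sketch; the abstract does not describe the authors' estimator.

```python
import cmath
import math

def alpha_power_fraction(signal, fs, lo=8.0, hi=13.0):
    """Fraction of non-DC spectral power lying in [lo, hi] Hz for one epoch sampled at fs Hz."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]      # remove the DC offset
    total = 0.0
    band = 0.0
    for k in range(1, n // 2 + 1):             # positive-frequency bins only
        freq = k * fs / n
        coef = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coef) ** 2
        total += power
        if lo <= freq <= hi:
            band += power
    return band / total if total > 0 else 0.0
```

Applied to successive short epochs, a measure of this kind would give a time series of alpha expression. For real recordings a windowed estimator (for example Welch's method) would normally replace the naive transform used here.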

    Fly Photoreceptors Encode Phase Congruency

    More than five decades ago it was postulated that sensory neurons detect and selectively enhance behaviourally relevant features of natural signals. Although we now know that sensory neurons are tuned to efficiently encode natural stimuli, until now it was not clear which statistical features of the stimuli they encode, or how. Here we reverse-engineer the neural code of Drosophila photoreceptors and show for the first time that photoreceptors exploit nonlinear dynamics to selectively enhance and encode phase-related features of temporal stimuli, such as local phase congruency, which are invariant to changes in illumination and contrast. We demonstrate that, to mitigate the inherent noise sensitivity of the local phase congruency measure, the nonlinear coding mechanisms of the fly photoreceptors are tuned to suppress random-phase signals, which explains why photoreceptor responses to naturalistic stimuli differ significantly from their responses to white noise stimuli.
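To make the notion of phase congruency concrete, here is a minimal sketch of one common textbook formulation: at each time point, the magnitude of the summed Fourier components divided by the sum of their amplitudes. This global, Fourier-based version is an assumption chosen for brevity; the abstract does not define the measure the authors used, and local variants are usually built from band-pass filter banks instead.

```python
import cmath
import math

def phase_congruency(signal):
    """Per-sample ratio |sum of Fourier components| / sum of component amplitudes.

    A value of 1 means every frequency component is in phase at that sample.
    DC and the Nyquist bin are excluded, so a constant offset has no effect.
    """
    n = len(signal)
    coefs = {
        k: sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(1, (n - 1) // 2 + 1)
    }
    amp_sum = sum(abs(c) for c in coefs.values())
    if amp_sum < 1e-12:                      # flat signal: no components to align
        return [0.0] * n
    return [
        abs(sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in coefs.items())) / amp_sum
        for t in range(n)
    ]
```

Because numerator and denominator scale together, multiplying the signal by a constant or adding an offset leaves the result unchanged, which is the invariance to contrast and illumination the abstract refers to.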

    Unsupervised assessment of microarray data quality using a Gaussian mixture model

    Background: Quality assessment of microarray data is an important and often challenging aspect of gene expression analysis. This task frequently involves the examination of a variety of summary statistics and diagnostic plots. The interpretation of these diagnostics is often subjective, and generally requires careful expert scrutiny.
    Results: We show how an unsupervised classification technique based on the Expectation-Maximization (EM) algorithm and the naïve Bayes model can be used to automate microarray quality assessment. The method is flexible and can be easily adapted to accommodate alternate quality statistics and platforms. We evaluate our approach using Affymetrix 3' gene expression and exon arrays and compare the performance of this method to a similar supervised approach.
    Conclusion: This research illustrates the efficacy of an unsupervised classification approach for the purpose of automated microarray data quality assessment. Since our approach requires only unannotated training data, it is easy to customize and to keep up-to-date as technology evolves. In contrast to other "black box" classification systems, this method also allows for intuitive explanations.
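The combination of EM with a naïve Bayes model can be sketched as a two-component Gaussian mixture in which the quality statistics are treated as independent within each component. Everything specific below is an assumption of the sketch: the function names, the fixed number of components, the deterministic median-split start, and the variance floor. The abstract does not describe the authors' implementation at this level.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def fit_naive_bayes_mixture(rows, n_iter=50, min_var=1e-6):
    """EM for a 2-component mixture with independent Gaussian features per component.

    Returns (params, resp): params[c] = (weight, means, variances) and
    resp[i][c] = posterior probability that row i belongs to component c.
    """
    n, d = len(rows), len(rows[0])
    # Deterministic start (assumed): split rows at the median of the first feature.
    order = sorted(range(n), key=lambda i: rows[i][0])
    resp = [[0.0, 0.0] for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][0 if rank < n // 2 else 1] = 1.0
    params = []
    for _ in range(n_iter):
        # M-step: weighted mean and variance of every feature in every component.
        params = []
        for c in (0, 1):
            w = max(sum(r[c] for r in resp), 1e-12)
            means = [sum(resp[i][c] * rows[i][j] for i in range(n)) / w for j in range(d)]
            variances = [
                max(sum(resp[i][c] * (rows[i][j] - means[j]) ** 2 for i in range(n)) / w, min_var)
                for j in range(d)
            ]
            params.append((w / n, means, variances))
        # E-step: posteriors from the product of per-feature densities (naive Bayes).
        for i in range(n):
            logp = [
                math.log(weight) + sum(gauss_logpdf(rows[i][j], m[j], v[j]) for j in range(d))
                for weight, m, v in params
            ]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]
    return params, resp
```

Each row would hold the quality statistics of one array. Because the fitted model is just a set of per-feature means and variances, one can inspect which statistics separate the two groups, which is in the spirit of the "intuitive explanations" mentioned above.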

    Reliability analysis of microarray data using fuzzy c-means and normal mixture modeling based classification methods

    WOS: 000227241200011. PubMed ID: 15374860.
    Motivation: A serious limitation in microarray analysis is the unreliability of the data generated from low signal intensities. Such data may produce erroneous gene expression ratios and cause unnecessary validation or post-analysis follow-up tasks. Therefore, the elimination of unreliable signal intensities will enhance reproducibility and reliability of gene expression ratios produced from microarray data. In this study, we applied fuzzy c-means (FCM) and normal mixture modeling (NMM) based classification methods to separate microarray data into reliable and unreliable signal intensity populations.
    Results: We compared the results of FCM classification with those of classification based on NMM. Both approaches were validated against reference sets of biological data consisting of only true positives and true negatives. We observed that both methods performed equally well in terms of sensitivity and specificity. Although a comparison of the computation times indicated that the fuzzy approach is computationally more efficient, other considerations support the use of NMM for the reliability analysis of microarray data.
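The fuzzy c-means side of this comparison can be sketched for one-dimensional intensities with two clusters. The sketch uses the standard FCM update rules; the function name, the fuzzifier `m=2.0`, and the start from the minimum and maximum values are assumptions, not settings reported in the abstract.

```python
def fuzzy_c_means_1d(values, m=2.0, n_iter=100, tol=1e-9):
    """Two-cluster fuzzy c-means on scalar data.

    Returns (centers, memberships), where memberships[i][c] is the degree
    to which values[i] belongs to cluster c (each row sums to 1).
    """
    centers = [min(values), max(values)]       # deterministic start (assumed)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in values:
            dist = [abs(x - c) for c in centers]
            if 0.0 in dist:
                # The point sits exactly on a center: give it full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                u = [h / sum(hits) for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                u = [v / sum(inv) for v in inv]
            memberships.append(u)
        # Center update: mean of the data weighted by membership ** m.
        new_centers = [
            sum((u[c] ** m) * x for u, x in zip(memberships, values))
            / sum(u[c] ** m for u in memberships)
            for c in range(2)
        ]
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

Thresholding the membership in the low-intensity cluster would then flag unreliable spots. A normal mixture model plays the same role with posterior probabilities in place of memberships.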

    Nonlinear analysis of heart rate variability

    23rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, October 25-28, 2001, Istanbul, Turkey. WOS: 000178871900439.
    This article reports nonlinear analysis of ECG R-R interval time-series obtained from healthy individuals and some cardiac patients. The R-R interval time-series data from 6 healthy individuals and 3 cardiac patients were transformed into multidimensional phase-space vectors by time-delay embedding. The largest Lyapunov exponent and correlation dimension (CD) were calculated. Nonlinearity was tested by comparing the CDs obtained from the original data with those obtained from surrogate data sets. Results are discussed with reference to results obtained in previous studies.
    Natl Sci Fdn, TUBITAK, Sci & Tech Res Ctr Turkey, ISIK Univ, COMNET, EREL Techno Grp, GUZEL SANATLAR Printinghouse, JOHNSON&JOHNSON Med, PFIZER, SIEMENS Med, TURKCELL Iletism Hizmetler A S, ALSTOM Elect Ltd Co, GANTEK Technol & SUN Microsyst, TURCOM Co Gr
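The time-delay embedding step, and the correlation sum on which correlation dimension estimates are based, can be sketched as follows. The function names and parameters are hypothetical; the abstract does not state the embedding dimension, the delay, or the distance norm that were used.

```python
import math

def delay_embed(series, dim, delay):
    """Turn a scalar series into phase-space vectors (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    n_vectors = len(series) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for this dimension and delay")
    return [tuple(series[i + j * delay] for j in range(dim)) for i in range(n_vectors)]

def correlation_sum(vectors, radius):
    """Fraction of distinct vector pairs closer than radius (Euclidean norm, assumed)."""
    n = len(vectors)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(vectors[i], vectors[j]) < radius
    )
    return 2.0 * close / (n * (n - 1))
```

The correlation dimension is read from the slope of log C(r) against log r over a scaling range of radii. Repeating the calculation on surrogate series gives the comparison used for the nonlinearity test described above.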