1,567 research outputs found

    Separability-Oriented Subclass Discriminant Analysis


    Sparse Discriminant Analysis

    Classification in high-dimensional feature spaces, where interpretation and dimension reduction are of great importance, is common in biological and medical applications. For these applications, standard methods such as microarrays, 1D NMR, and spectroscopy have become everyday tools for measuring thousands of features in samples of interest. Furthermore, the samples are often costly, and therefore many such problems have few observations in relation to the number of features. Traditionally, such data are analyzed by first performing a feature selection before classification. We propose a method which performs linear discriminant analysis with a sparseness criterion imposed such that classification, feature selection, and dimension reduction are merged into one analysis. The sparse discriminant analysis is faster than traditional feature selection methods based on computationally heavy criteria such as Wilks' lambda, and the results are better with regard to classification rates and sparseness. The method is extended to mixtures of Gaussians, which is useful when, e.g., biological clusters are present within each class. Finally, the methods proposed provide low-dimensional views of the discriminative directions.
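    The idea of merging classification, feature selection, and dimension reduction into one analysis can be sketched in a few lines. This is a deliberately simplified illustration (a diagonal discriminant direction with a soft-threshold step), not the paper's penalized formulation of LDA; the function name and the `shrink`/`ridge` parameters are invented for the example.

```python
import numpy as np

def sparse_discriminant_direction(X, y, shrink=0.2, ridge=1e-2):
    """Illustrative sketch only: a diagonal two-class discriminant
    direction with a soft-threshold step that zeroes out weak features,
    so classification, feature selection and dimension reduction happen
    in one pass. The paper's own method imposes sparseness through a
    penalized formulation of LDA; this just shows the idea in miniature."""
    X0, X1 = X[y == 0], X[y == 1]
    # Diagonal pooled within-class variance: robust when n << p.
    s2 = 0.5 * (X0.var(axis=0) + X1.var(axis=0)) + ridge
    w = (X1.mean(axis=0) - X0.mean(axis=0)) / s2
    # Soft-threshold relative to the largest loading: small loadings -> 0.
    t = shrink * np.abs(w).max()
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

rng = np.random.default_rng(0)
n, p = 40, 100                       # few observations, many features
X = rng.normal(size=(n, p))
y = np.repeat([0, 1], n // 2)
X[y == 1, :3] += 2.0                 # only the first 3 features carry signal
w = sparse_discriminant_direction(X, y)
print("nonzero loadings:", np.count_nonzero(w), "of", p)
```

    Most of the 97 noise features receive a zero loading, while the informative ones survive; the nonzero pattern is the feature selection and the single direction is the dimension reduction.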

    Algebraic Comparison of Partial Lists in Bioinformatics

    The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Due to resampling protocols, or within a meta-analysis comparison, instead of one list it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset.
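    The symmetric-group machinery is beyond a short snippet, but the notion of list stability for lists of unequal length can be illustrated with a common simplification: complete each partial list to a full rank vector (all unlisted features share the average of the remaining ranks) and take the mean pairwise Canberra distance between the completed vectors. The completion rule and all names below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from itertools import combinations

def complete_ranks(partial, n_features):
    """Complete a partial ranked list (feature ids, best first) to a
    full rank vector: every unlisted feature gets the average of the
    remaining ranks. This completion rule is an illustrative choice."""
    k = len(partial)
    ranks = np.full(n_features, (k + 1 + n_features) / 2.0)
    for r, f in enumerate(partial, start=1):
        ranks[f] = r
    return ranks

def canberra(a, b):
    return float(np.sum(np.abs(a - b) / (np.abs(a) + np.abs(b))))

def list_stability(lists, n_features):
    """Mean pairwise Canberra distance between completed rank vectors;
    smaller values mean the lists agree more (more stable selection)."""
    return float(np.mean([canberra(complete_ranks(p, n_features),
                                   complete_ranks(q, n_features))
                          for p, q in combinations(lists, 2)]))

# Three resampled "top feature" lists over 10 features, of unequal length.
s_sim = list_stability([[0, 1, 2, 3], [0, 1, 3], [1, 0, 2]], 10)
s_uns = list_stability([[7, 4, 9], [2, 5], [8, 3, 6, 1]], 10)
print(f"agreeing lists: {s_sim:.2f}, disagreeing lists: {s_uns:.2f}")
```

    Lists that keep selecting the same features near the top score a much smaller distance than lists that barely overlap, which is the behaviour a stability indicator should exhibit.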

    Cluster-Based Supervised Classification


    Simultaneous prediction of wrist/hand motion via wearable ultrasound sensing


    Nonlinear Supervised Dimensionality Reduction via Smooth Regular Embeddings

    The recovery of the intrinsic geometric structures of data collections is an important problem in data analysis. Supervised extensions of several manifold learning approaches have been proposed in recent years. However, existing methods primarily focus on the embedding of the training data, and the generalization of the embedding to initially unseen test data is largely ignored. In this work, we build on recent theoretical results on the generalization performance of supervised manifold learning algorithms. Motivated by these performance bounds, we propose a supervised manifold learning method that computes a nonlinear embedding while constructing a smooth and regular interpolation function that extends the embedding to the whole data space in order to achieve satisfactory generalization. The embedding and the interpolator are learnt jointly such that Lipschitz regularity is imposed on the interpolator while the separation between different classes is ensured. Experimental results on several image data sets show that, in most settings, the proposed method outperforms both traditional classifiers and the supervised dimensionality reduction algorithms it is compared against in terms of classification accuracy.
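    A minimal sketch of the out-of-sample idea, under strong simplifying assumptions: given an already-computed training embedding, a Gaussian RBF interpolator extends it to unseen points. The kernel width and regularization stand in, very loosely, for the smoothness the paper enforces; `sigma` and `reg` are invented parameters, and this is not the paper's jointly learnt interpolator.

```python
import numpy as np

def rbf_extend(X_train, Z_train, X_test, sigma=1.0, reg=1e-3):
    """Extend a training-set embedding Z_train to unseen points with a
    Gaussian RBF interpolator. Larger sigma / reg give a smoother (more
    regular) extension; both are illustrative stand-ins for the paper's
    Lipschitz regularity constraint, not its actual formulation."""
    d2 = ((X_train[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2)) + reg * np.eye(len(X_train))
    coef = np.linalg.solve(K, Z_train)              # interpolation weights
    d2t = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2t / (2 * sigma ** 2)) @ coef

rng = np.random.default_rng(1)
# Two well-separated classes; embed training points at -1 and +1.
X_train = np.vstack([rng.normal(-2, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
Z_train = np.concatenate([-np.ones(20), np.ones(20)])[:, None]
X_test = np.array([[-2.0, 0.0], [2.0, 0.0]])        # unseen points near each class
Z_test = rbf_extend(X_train, Z_train, X_test)
print(Z_test.ravel())
```

    Test points near each cluster are mapped close to that cluster's embedding coordinate, which is exactly the generalization behaviour the abstract argues a supervised embedding should have.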

    Diagnostic prediction of complex diseases using phase-only correlation based on virtual sample template

    Motivation: Complex diseases induce perturbations to interaction and regulation networks in living systems, resulting in dynamic equilibrium states that differ between diseases and from the normal state. Thus, identifying gene expression patterns corresponding to different equilibrium states is of great benefit to the diagnosis and treatment of complex diseases. However, it remains a major challenge to deal with the high dimensionality and small size of the complex disease gene expression datasets currently available for discovering gene expression patterns. Results: Here we present a phase-only correlation (POC) based classification method for recognizing the type of a complex disease. First, a virtual sample template is constructed for each subclass by averaging all samples of that subclass in a training dataset. Then the label of a test sample is determined by measuring the similarity between the test sample and each template. This novel method can detect the similarity of the overall patterns emerging from the differentially expressed genes or proteins while ignoring small mismatches. Conclusions: The experimental results obtained on seven publicly available complex disease datasets, including microarray and protein array data, demonstrate that the proposed POC-based disease classification method is effective and robust for diagnosing complex diseases with regard to the number of initially selected features, and its recognition accuracy is better than or comparable to that of other state-of-the-art machine learning methods. In addition, the proposed method does not require parameter tuning or data scaling, which can effectively reduce the occurrence of over-fitting and bias.
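    The two steps described above (average each subclass into a virtual sample template, then label a test sample by its phase-only correlation with each template) map naturally onto a short sketch. The synthetic data and parameter choices below are invented for illustration; only the POC formula itself, a cross-spectrum normalized to unit magnitude, follows the standard definition.

```python
import numpy as np

def poc_peak(x, template):
    """Peak of the phase-only correlation between a sample and a class
    template: the cross-spectrum is normalized to unit magnitude, so
    only the phase (the overall pattern) matters and small magnitude
    mismatches are ignored."""
    cross = np.fft.fft(x) * np.conj(np.fft.fft(template))
    cross /= np.abs(cross) + 1e-12          # keep phase, drop magnitude
    return np.real(np.fft.ifft(cross)).max()

def classify(x, templates):
    """Label = the class whose virtual sample template gives the highest peak."""
    return max(templates, key=lambda c: poc_peak(x, templates[c]))

rng = np.random.default_rng(2)
p = 64                                      # length of an expression profile
base = {c: rng.normal(size=p) for c in ("disease", "normal")}
train = {c: base[c] + 0.3 * rng.normal(size=(10, p)) for c in base}
# Virtual sample template: the average of each class's training samples.
templates = {c: train[c].mean(axis=0) for c in train}
test_sample = base["disease"] + 0.3 * rng.normal(size=p)
label = classify(test_sample, templates)
print(label)
```

    Because the magnitude spectrum is discarded, no data scaling is needed and a matched sample produces a sharp correlation peak even under moderate noise, which mirrors the robustness claim in the abstract.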

    Performance of Feature Selection Methods

    High-throughput biological technologies offer the promise of finding feature sets to serve as biomarkers for medical applications; however, the sheer number of potential features (genes, proteins, etc.) means that there needs to be massive feature selection, far greater than that envisioned in the classical literature. This paper considers performance analysis for feature-selection algorithms from two fundamental perspectives: how does the classification accuracy achieved with a selected feature set compare to the accuracy when the best feature set is used, and what is the optimal number of features that should be used? These criteria manifest themselves in several issues that need to be considered when examining the efficacy of a feature-selection algorithm: (1) the correlation between the classifier errors for the selected feature set and the theoretically best feature set; (2) the regressions of the aforementioned errors upon one another; (3) the peaking phenomenon, that is, the effect of sample size on feature selection; and (4) the analysis of feature selection in the framework of high-dimensional models corresponding to high-throughput data.
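    The peaking phenomenon in point (3) can be reproduced in a small simulation, assuming a t-statistic filter and a nearest-centroid classifier (both choices are ours for illustration, not the paper's): with a fixed small training sample, test error typically falls as informative features are added and then rises again as noise features dilute the classifier.

```python
import numpy as np

def nearest_centroid_error(Xtr, ytr, Xte, yte, feats):
    """Train a nearest-centroid classifier on the chosen features
    and return its error on the test set."""
    c0 = Xtr[ytr == 0][:, feats].mean(axis=0)
    c1 = Xtr[ytr == 1][:, feats].mean(axis=0)
    d0 = ((Xte[:, feats] - c0) ** 2).sum(axis=1)
    d1 = ((Xte[:, feats] - c1) ** 2).sum(axis=1)
    return float(np.mean((d1 < d0) != yte))

rng = np.random.default_rng(3)
p, n_info = 200, 10                        # 10 informative features out of 200

def draw(n):
    X = rng.normal(size=(n, p))
    y = rng.integers(0, 2, n)
    X[y == 1, :n_info] += 1.0              # class 1 shifted on informative features
    return X, y

Xtr, ytr = draw(30)                        # small training sample
Xte, yte = draw(2000)                      # large test sample for a stable error

# Rank features by the absolute two-sample t statistic on the training data.
m0, m1 = Xtr[ytr == 0], Xtr[ytr == 1]
t = (m1.mean(0) - m0.mean(0)) / np.sqrt(
    m0.var(0) / len(m0) + m1.var(0) / len(m1) + 1e-12)
order = np.argsort(-np.abs(t))

errs = {}
for k in (2, 10, 50, 200):
    errs[k] = nearest_centroid_error(Xtr, ytr, Xte, yte, order[:k])
    print(f"top {k:3d} features: test error {errs[k]:.3f}")
```

    The error at a moderate feature count is lower than with all 200 features, illustrating why the optimal number of selected features matters as much as the selection criterion itself.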