
    Unsupervised Feature Selection with Adaptive Structure Learning

    The problem of feature selection has attracted considerable interest in the past decade. Traditional unsupervised methods select the features that most faithfully preserve the intrinsic structures of the data, where those structures are estimated using all the input features. However, the estimated intrinsic structures are unreliable when redundant and noisy features have not been removed. We therefore face a dilemma: one needs the true structures of the data to identify the informative features, and one needs the informative features to accurately estimate the true structures of the data. To address this, we propose a unified learning framework which performs structure learning and feature selection simultaneously. The structures are adaptively learned from the results of feature selection, and the informative features are reselected to preserve the refined structures of the data. By leveraging the interactions between these two essential tasks, we are able to capture accurate structures and select more informative features. Experimental results on many benchmark data sets demonstrate that the proposed method outperforms many state-of-the-art unsupervised feature selection methods.

    Fast, automated measurement of nematode swimming (thrashing) without morphometry

    Background: The "thrashing assay", in which nematodes are placed in liquid and the frequency of lateral swimming ("thrashing") movements is estimated, is a well-established method for measuring motility in the genetic model organism Caenorhabditis elegans as well as in parasitic nematodes. It is used as an index of the effects of drugs, chemicals or mutations on motility and has proved useful in identifying mutants affecting behaviour. However, the method is laborious and subject to experimenter error, and therefore does not permit high-throughput applications. Existing automation methods usually involve analysis of worm shape, but this is computationally demanding and error-prone. Here we present a novel, robust and rapid method of automatically counting the thrashing frequency of worms that avoids morphometry but nonetheless gives a direct measure of thrashing frequency. Our method uses principal components analysis to remove the background, followed by computation of a covariance matrix of the remaining image frames, from which the interval between statistically similar frames is estimated. Results: We tested the performance of our covariance method in measuring thrashing rates of worms using mutations that affect motility and found that it accurately substituted for laborious, manual measurements over a wide range of thrashing rates. The algorithm also enabled us to determine a dose-dependent inhibition of thrashing frequency by the anthelmintic drug levamisole, illustrating the suitability of the system for assaying the effects of drugs and chemicals on motility. Furthermore, the algorithm successfully measured the actions of levamisole on a parasitic nematode, Haemonchus contortus, which adopts complex contorted shapes whilst swimming, without alterations to the code or to any parameters, indicating that it is applicable to different nematode species, including parasitic nematodes. Our method is capable of analyzing a 30 s movie in less than 30 s and can therefore be deployed in rapid screens. Conclusion: We demonstrate that a covariance-based method yields a fast, reliable, automated measurement of C. elegans motility which can replace the far more time-consuming manual method. The absence of a morphometry step means that the method can be applied to any nematode that swims in liquid and, together with its speed, this simplicity lends itself to deployment in large-scale chemical and genetic screens.
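The covariance idea in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `estimate_period` is hypothetical, and subtracting the per-pixel mean is a simplifying stand-in (assuming a static background) for the PCA-based background removal the abstract describes.

```python
import math

def estimate_period(frames):
    """Estimate the interval (in frames) between similar-looking frames.

    frames: list of equal-length lists of pixel intensities.
    Assumption: the background is static, so subtracting the per-pixel
    mean replaces the PCA background removal named in the abstract.
    """
    n = len(frames)
    n_pix = len(frames[0])
    # The per-pixel mean over time approximates the static background.
    mean = [sum(f[p] for f in frames) / n for p in range(n_pix)]
    centred = [[f[p] - mean[p] for p in range(n_pix)] for f in frames]

    # Frame-by-frame covariance matrix: entry (i, j) is large when
    # frames i and j look alike.
    cov = [[sum(a * b for a, b in zip(centred[i], centred[j])) / n_pix
            for j in range(n)] for i in range(n)]

    # Average along each off-diagonal to get similarity as a function of lag.
    max_lag = n // 2
    sim = [sum(cov[i][i + lag] for i in range(n - lag)) / (n - lag)
           for lag in range(max_lag + 1)]

    # The first local maximum after lag 0 marks the repeat interval.
    for lag in range(1, max_lag):
        if sim[lag] > sim[lag - 1] and sim[lag] >= sim[lag + 1]:
            return lag
    return None  # no periodicity detected
```

Dividing the movie's frame rate by the returned interval gives a frequency estimate. For example, an interval of 10 frames at a hypothetical 30 frames per second corresponds to 3 cycles per second.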

    The projection score - an evaluation criterion for variable subset selection in PCA visualization

    Background: In many scientific domains, it is becoming increasingly common to collect high-dimensional data sets, often with an exploratory aim, to generate new and relevant hypotheses. The exploratory perspective often makes statistically guided visualization methods, such as Principal Component Analysis (PCA), the methods of choice. However, the clarity of the obtained visualizations, and thereby the potential to use them to formulate relevant hypotheses, may be confounded by the presence of the many non-informative variables. For microarray data, more easily interpretable visualizations are often obtained by filtering the variable set, for example by removing the variables with the smallest variances or by only including the variables most highly related to a specific response. The resulting visualization may depend heavily on the inclusion criterion, that is, effectively the number of retained variables. To our knowledge, there exists no objective method for determining the optimal inclusion criterion in the context of visualization. Results: We present the projection score, which is a straightforward, intuitively appealing measure of the informativeness of a variable subset with respect to PCA visualization. This measure can be universally applied to find suitable inclusion criteria for any type of variable filtering. We apply the presented measure to find optimal variable subsets for different filtering methods in both microarray data sets and synthetic data sets. We note also that the projection score can be applied in general contexts, to compare the informativeness of any variable subsets with respect to visualization by PCA. Conclusions: We conclude that the projection score provides an easily interpretable and universally applicable measure of the informativeness of a variable subset with respect to visualization by PCA, that can be used to systematically find the most interpretable PCA visualization in practical exploratory analysis.
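The variance-based filtering mentioned in this abstract can be sketched as follows. This illustrates only the filtering step whose inclusion criterion (the number of retained variables) the projection score is meant to help choose; it is not the projection score itself, and the function name `top_variance_variables` is hypothetical.

```python
from statistics import pvariance

def top_variance_variables(data, k):
    """Return the indices of the k variables (columns) with the largest variance.

    data: list of samples, each a list of variable values.
    k: the inclusion criterion, i.e. how many variables to retain.
    """
    n_vars = len(data[0])
    # Population variance of each column across all samples.
    variances = [pvariance([row[v] for row in data]) for v in range(n_vars)]
    # Rank columns from most to least variable and keep the first k.
    ranked = sorted(range(n_vars), key=lambda v: variances[v], reverse=True)
    return sorted(ranked[:k])
```

The retained columns would then be passed to PCA; different values of `k` generally give different visualizations, which is the dependence the abstract describes.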

    Simultaneous model-based clustering and visualization in the Fisher discriminative subspace

    Clustering in high-dimensional spaces is a recurrent problem in many scientific domains but remains a difficult task in terms of both clustering accuracy and interpretability of the results. This paper presents a discriminative latent mixture (DLM) model which fits the data in a latent orthonormal discriminative subspace with an intrinsic dimension lower than the dimension of the original space. By constraining model parameters within and between groups, a family of 12 parsimonious DLM models is obtained, which can be fitted to a variety of situations. An estimation algorithm, called the Fisher-EM algorithm, is also proposed for estimating both the mixture parameters and the discriminative subspace. Experiments on simulated and real datasets show that the proposed approach performs better than existing clustering methods while providing a useful representation of the clustered data. The method is also applied to the clustering of mass spectrometry data.

    Spatio-Temporal Dynamics of Human Intention Understanding in Temporo-Parietal Cortex: A Combined EEG/fMRI Repetition Suppression Paradigm

    Inferring the intentions of other people from their actions recruits an inferior fronto-parietal action observation network as well as a putative social network that includes the posterior superior temporal sulcus (STS). However, the functional dynamics within and among these networks remain unclear. Here we used functional magnetic resonance imaging (fMRI) and high-density electroencephalogram (EEG), with a repetition suppression (RS) design, to assess the spatio-temporal dynamics of decoding intentions. Suppression of fMRI activity to the repetition of the same intention was observed in inferior frontal lobe, anterior intraparietal sulcus (aIPS), and right STS. EEG global field power was reduced with repeated intentions at an early (starting at 60 ms) and a later (∼330 ms) period after the onset of a hand-on-object encounter. Source localization during these two intervals involved right STS and aIPS regions highly consistent with RS effects observed with fMRI. These results reveal the dynamic involvement of temporal and parietal networks at multiple stages during intention decoding, without a strict segregation of intention decoding between these networks.

    Determination of genetic structure of germplasm collections: are traditional hierarchical clustering methods appropriate for molecular marker data?

    Despite the availability of newer approaches, traditional hierarchical clustering remains very popular in genetic diversity studies in plants. However, little is known about its suitability for molecular marker data. We studied the performance of traditional hierarchical clustering techniques using real and simulated molecular marker data. Our study also compared the performance of traditional hierarchical clustering with model-based clustering (STRUCTURE). We showed that the cophenetic correlation coefficient is directly related to subgroup differentiation and can thus be used as an indicator of the presence of genetically distinct subgroups in germplasm collections. Whereas UPGMA performed well in preserving distances between accessions, Ward excelled in recovering groups. Our results also showed a close similarity between clusters obtained by Ward and by STRUCTURE. Traditional cluster analysis can provide an easy and effective way of determining structure in germplasm collections using molecular marker data, and the output can be used for sampling core collections or for association studies.
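The cophenetic correlation coefficient named in this abstract is the Pearson correlation between the original pairwise distances and the heights at which each pair is first joined in the dendrogram. The following is a minimal sketch of that standard definition with UPGMA (average linkage), not the study's own code; the function names are hypothetical and the input is assumed to be a symmetric distance matrix with at least two distinct distance values.

```python
import math
from itertools import combinations

def upgma_cophenetic(dist):
    """Return the cophenetic distance matrix from UPGMA clustering.

    dist: symmetric matrix (list of lists) of pairwise distances.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    coph = [[0.0] * n for _ in range(n)]

    def avg_dist(a, b):
        # UPGMA: mean of all pairwise distances between two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    next_id = n
    while len(clusters) > 1:
        # Find the closest pair of clusters and the height of their merge.
        (ca, cb), height = min(
            (((a, b), avg_dist(clusters[a], clusters[b]))
             for a, b in combinations(sorted(clusters), 2)),
            key=lambda item: item[1])
        # Pairs joined at this step get the merge height as their
        # cophenetic distance.
        for i in clusters[ca]:
            for j in clusters[cb]:
                coph[i][j] = coph[j][i] = height
        clusters[next_id] = clusters.pop(ca) + clusters.pop(cb)
        next_id += 1
    return coph

def cophenetic_correlation(dist):
    """Pearson correlation between original and cophenetic distances."""
    coph = upgma_cophenetic(dist)
    pairs = list(combinations(range(len(dist)), 2))
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A value near 1 means the dendrogram preserves the original distances well, which is the sense in which the abstract reports that UPGMA preserved distances between accessions.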

    Geographic genetic structure of Iberian columbines (gen. Aquilegia)

    Southern European columbines (genus Aquilegia) are involved in active processes of diversification, and the Iberian Peninsula offers a privileged observatory to witness the process. Studies on Iberian columbines have provided significant advances on species diversification, but we still lack a complete perspective of the genetic diversification in the Iberian scenario. This work explores how genetic diversity of the genus Aquilegia is geographically structured across the Iberian Peninsula. We used Bayesian clustering methods, principal coordinates analyses, and NJ phenograms to assess the genetic relationships among 285 individuals from 62 locations and detect the main lineages. Genetic diversity of Iberian columbines consists of five geographically structured lineages, corresponding to different Iberian taxa. Differentiation among lineages shows particularly complex admixture patterns in the northeast and is highly homogeneous toward the northwest and southeast. This geographic genetic structure suggests the existence of incomplete lineage sorting and interspecific hybridization, as could be expected in recent processes of diversification under the influence of Quaternary postglacial migrations. This scenario is consistent with what is proposed by the most recent studies on European and Iberian columbines, which point to geographic isolation and divergent selection by habitat specialization as the main diversification drivers of the Iberian Aquilegia complex.
