79 research outputs found

    A survey on feature weighting based K-Means algorithms

    This is a pre-copyedited, author-produced PDF of an article accepted for publication in Journal of Classification [de Amorim, R. C., 'A survey on feature weighting based K-Means algorithms', Journal of Classification, Vol. 33(2): 210-242, August 25, 2016]. Subject to embargo. Embargo end date: 25 August 2017. The final publication is available at Springer via http://dx.doi.org/10.1007/s00357-016-9208-4 © Classification Society of North America 2016

    In a real-world data set there is always the possibility, rather high in our opinion, that different features may have different degrees of relevance. Most machine learning algorithms deal with this fact by either selecting or deselecting features in the data preprocessing phase. However, we maintain that even among relevant features there may be different degrees of relevance, and this should be taken into account during the clustering process. With over 50 years of history, K-Means is arguably the most popular partitional clustering algorithm there is. The first K-Means based clustering algorithm to compute feature weights was designed just over 30 years ago. Various such algorithms have been designed since but there has not been, to our knowledge, a survey integrating empirical evidence of cluster recovery ability, common flaws, and possible directions for future research. This paper elaborates on the concept of feature weighting and addresses these issues by critically analysing some of the most popular, or innovative, feature weighting mechanisms based on K-Means. Peer reviewed. Final Accepted Version.
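As a rough illustration of what feature weighting means in a K-Means setting, the sketch below assigns points to centroids using a squared Euclidean distance with one weight per feature. It is a generic example and not any specific algorithm from the survey: the function names are hypothetical, and the weights are fixed by hand here, whereas the surveyed algorithms compute them during clustering.

```python
def weighted_sq_distance(x, centroid, weights):
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, centroid, weights))


def assign_clusters(points, centroids, weights):
    """Return, for each point, the index of its nearest centroid under the weighted distance."""
    labels = []
    for p in points:
        distances = [weighted_sq_distance(p, c, weights) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical data: one point and two candidate centroids.
points = [(1.0, 0.0)]
centroids = [(0.0, 0.0), (1.0, 3.0)]

# With equal weights the second feature dominates the distance.
equal_labels = assign_clusters(points, centroids, weights=(1.0, 1.0))

# Giving the second feature zero weight (treating it as irrelevant) changes the assignment.
weighted_labels = assign_clusters(points, centroids, weights=(1.0, 0.0))
```

Under equal weights the point goes to the first centroid; once the second feature is weighted out, it goes to the second. This shows how a feature's relevance can alter the partition.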

    Armadillo 1.1: An Original Workflow Platform for Designing and Conducting Phylogenetic Analysis and Simulations

    In this paper we introduce Armadillo v1.1, a novel workflow platform dedicated to designing and conducting phylogenetic studies, including comprehensive simulations. A number of important phylogenetic and general bioinformatics tools have been included in the first software release. As Armadillo is an open-source project, it allows scientists to develop their own modules as well as to integrate existing computer applications. Using our workflow platform, different complex phylogenetic tasks can be modeled and presented in a single workflow without any prior knowledge of programming techniques. The first version of Armadillo was successfully used by professors of bioinformatics at Université du Quebec à Montreal during graduate computational biology courses taught in 2010–11. The program and its source code are freely available at: <http://www.bioinfo.uqam.ca/armadillo>

    Subsampling effects in neuronal avalanche distributions recorded in vivo

    Background: Many systems in nature are characterized by complex behaviour where large cascades of events, or avalanches, unpredictably alternate with periods of little activity. Snow avalanches are an example. Often the size distribution f(s) of a system's avalanches follows a power law, and the branching parameter sigma, the average number of events triggered by a single preceding event, is unity. A power law for f(s), and sigma=1, are hallmark features of self-organized critical (SOC) systems, and both have been found for neuronal activity in vitro. Therefore, and since SOC systems and neuronal activity both show large variability, long-term stability and memory capabilities, SOC has been proposed to govern neuronal dynamics in vivo. Testing this hypothesis is difficult because neuronal activity is spatially or temporally subsampled, while theories of SOC systems assume full sampling. To close this gap, we investigated how subsampling affects f(s) and sigma by imposing subsampling on three different SOC models. We then compared f(s) and sigma of the subsampled models with those of multielectrode local field potential (LFP) activity recorded in three macaque monkeys performing a short-term memory task. Results: Neither the LFP nor the subsampled SOC models showed a power law for f(s). Both f(s) and sigma depended sensitively on the subsampling geometry and the dynamics of the model. Only one of the SOC models, the Abelian Sandpile Model, exhibited f(s) and sigma similar to those calculated from LFP activity. Conclusions: Since subsampling can prevent the observation of the characteristic power law and sigma in SOC systems, misclassifications of critical systems as sub- or supercritical are possible. Nevertheless, the system-specific scaling of f(s) and sigma under subsampling conditions may prove useful to select physiologically motivated models of brain function. Models that better reproduce f(s) and sigma calculated from the physiological recordings may be selected over alternatives.
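To make the two quantities in this abstract concrete, the sketch below extracts avalanche sizes from a sequence of per-bin event counts and estimates a branching ratio as the average ratio of events in one bin to events in the preceding non-empty bin. The abstract does not say how the authors computed f(s) or sigma, so the binning convention, the estimator, the function names, and the toy data are all assumptions made for illustration.

```python
def avalanche_sizes(counts):
    """Split per-bin event counts into avalanches (runs of non-empty bins) and return each run's total."""
    sizes, current = [], 0
    for n in counts:
        if n > 0:
            current += n
        elif current > 0:
            # An empty bin ends the running avalanche.
            sizes.append(current)
            current = 0
    if current > 0:
        sizes.append(current)
    return sizes


def branching_ratio(counts):
    """Average ratio of events in a bin to events in the preceding non-empty bin."""
    ratios = [b / a for a, b in zip(counts, counts[1:]) if a > 0]
    return sum(ratios) / len(ratios) if ratios else float("nan")


# Hypothetical event counts per time bin (not data from the study).
counts = [0, 1, 2, 1, 0, 0, 2, 2, 0]
sizes = avalanche_sizes(counts)
sigma = branching_ratio(counts)
```

A histogram of `sizes` over a long recording would give an empirical f(s). Dropping events before binning, for example by keeping only a subset of recording sites, is one way to explore how subsampling changes both quantities.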

    CMS physics technical design report : Addendum on high density QCD with heavy ions

    Peer reviewed

    Genetic Networking of the Bemisia tabaci Cryptic Species Complex Reveals Pattern of Biological Invasions

    BACKGROUND: A challenge within the context of cryptic species is the delimitation of individual species within the complex. Statistical parsimony network analytics offers the opportunity to explore limits in situations where there are insufficient species-specific morphological characters to separate taxa. The results also enable us to explore the spread in taxa that have invaded globally. METHODOLOGY/PRINCIPAL FINDINGS: Using a 657 bp portion of mitochondrial cytochrome oxidase 1 from 352 unique haplotypes belonging to the Bemisia tabaci cryptic species complex, the analysis revealed 28 networks plus 7 unconnected individual haplotypes. Of the networks, 24 corresponded to the putative species identified using the rule set devised by Dinsdale et al. (2010). Only two species proposed in Dinsdale et al. (2010) departed substantially from the structure suggested by the analysis. The analysis of the two invasive members of the complex, Mediterranean (MED) and Middle East - Asia Minor 1 (MEAM1), showed that in both cases only a small number of haplotypes represent the majority that have spread beyond the home range; one MEAM1 and three MED haplotypes account for >80% of the GenBank records. Israel is a possible source of the globally invasive MEAM1 whereas MED has two possible sources. The first is the eastern Mediterranean, whose haplotypes have invaded only the USA, primarily Florida and to a lesser extent California. The second is the western Mediterranean, whose haplotypes have spread to the USA, Asia and South America. The structure for MED supports two home range distributions, a Sub-Saharan range and a Mediterranean range. The MEAM1 network supports the Middle East - Asia Minor region. CONCLUSION/SIGNIFICANCE: The network analyses show a high level of congruence with the species identified in a previous phylogenetic analysis. The analysis of the two globally invasive members of the complex supports the view that global invasions often involve very small portions of the available genetic diversity.

    Data-analysis strategies for image-based cell profiling

    Image-based cell profiling is a high-throughput strategy for the quantification of phenotypic differences among a variety of cell populations. It paves the way to studying biological systems on a large scale by using chemical and genetic perturbations. The general workflow for this technology involves image acquisition with high-throughput microscopy systems and subsequent image processing and analysis. Here, we introduce the steps required to create high-quality image-based (i.e., morphological) profiles from a collection of microscopy images. We recommend techniques that have proven useful in each stage of the data analysis process, on the basis of the experience of 20 laboratories worldwide that are refining their image-based cell-profiling methodologies in pursuit of biological discovery. The recommended techniques cover alternatives that may suit various biological goals, experimental designs, and laboratories' preferences. Peer reviewed

    Biogeography of Amazonian fishes: deconstructing river basins as biogeographic units


    Cylindrical shells weakened by holes
