123 research outputs found

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196

    Fast, automated measurement of nematode swimming (thrashing) without morphometry

    Background: The "thrashing assay", in which nematodes are placed in liquid and the frequency of lateral swimming ("thrashing") movements estimated, is a well-established method for measuring motility in the genetic model organism Caenorhabditis elegans as well as in parasitic nematodes. It is used as an index of the effects of drugs, chemicals or mutations on motility and has proved useful in identifying mutants affecting behaviour. However, the method is laborious, subject to experimenter error, and therefore does not permit high-throughput applications. Existing automation methods usually involve analysis of worm shape, but this is computationally demanding and error-prone. Here we present a novel, robust and rapid method of automatically counting the thrashing frequency of worms that avoids morphometry but nonetheless gives a direct measure of thrashing frequency. Our method uses principal components analysis to remove the background, followed by computation of a covariance matrix of the remaining image frames from which the interval between statistically-similar frames is estimated. Results: We tested the performance of our covariance method in measuring thrashing rates of worms using mutations that affect motility and found that it accurately substituted for laborious, manual measurements over a wide range of thrashing rates. The algorithm used also enabled us to determine a dose-dependent inhibition of thrashing frequency by the anthelmintic drug, levamisole, illustrating the suitability of the system for assaying the effects of drugs and chemicals on motility. Furthermore, the algorithm successfully measured the actions of levamisole on a parasitic nematode, Haemonchus contortus, which undergoes complex contorted shapes whilst swimming, without alterations in the code or of any parameters, indicating that it is applicable to different nematode species, including parasitic nematodes. 
Our method is capable of analyzing a 30 s movie in less than 30 s and can therefore be deployed in rapid screens. Conclusion: We demonstrate that a covariance-based method yields a fast, reliable, automated measurement of C. elegans motility which can replace the far more time-consuming, manual method. The absence of a morphometry step means that the method can be applied to any nematode that swims in liquid and, together with its speed, this simplicity lends itself to deployment in large-scale chemical and genetic screens.
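
    The covariance idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the use of a single leading SVD component as the "background", the first-local-maximum rule for picking the interval, and the synthetic test movie are all assumptions made for illustration only.

```python
import numpy as np

def remove_background(frames, n_components=1):
    """Subtract the leading principal component(s) of the frame stack.

    frames: array of shape (n_frames, n_pixels). Assumption: the static
    background dominates the uncentered stack, so it is captured by the
    first singular component.
    """
    U, S, Vt = np.linalg.svd(frames, full_matrices=False)
    S_bg = S.copy()
    S_bg[n_components:] = 0.0          # keep only the background component(s)
    return frames - (U * S_bg) @ Vt

def lag_similarity(residual):
    """Average frame-by-frame covariance as a function of frame lag."""
    cov = np.cov(residual)             # (n_frames, n_frames); each frame is a variable
    n = cov.shape[0]
    return np.array([np.diagonal(cov, offset=k).mean() for k in range(n // 2)])

def estimate_period(profile):
    """Return the first lag (in frames) at which similarity peaks, or None."""
    for k in range(1, len(profile) - 1):
        if profile[k] > profile[k - 1] and profile[k] >= profile[k + 1]:
            return k
    return None

# Hypothetical synthetic movie: bright static background plus a small
# patch of pixels oscillating with a period of 12 frames.
fps = 30.0
t = np.arange(120)
background = 100.0 + np.arange(50)
pattern = np.zeros(50)
pattern[20:25] = 5.0
frames = background + np.outer(np.sin(2 * np.pi * t / 12), pattern)

period = estimate_period(lag_similarity(remove_background(frames)))
thrash_hz = fps / period               # oscillation frequency in cycles per second
print(period, thrash_hz)
```

    In this sketch the interval between similar frames is read off the averaged diagonals of the covariance matrix, so no worm outline or shape model is needed. Real movies would need noise handling and a more careful peak search than this simple rule.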

    The criminal careers of those imprisoned for hate crime in the UK

    Hate crime research has increased, but there are very few studies examining hate crime offenders. It is, therefore, difficult to determine to what extent those who perpetrate this offence might be different from those who have not committed hate crime. This study is the first to provide an account of the demographics and criminal histories of those serving time in prison for committing a hate crime. It is based on a large complete population of offenders in the UK. Hate crime offenders released from prison were found to have prolific criminal careers, having committed a wide range and large number of different types of offences. When compared with those who committed a general (non-hate) violent offence, violent hate crime offenders were significantly older and were considerably more prolific in their previous offending. Violent hate crime appeared quantitatively, as opposed to qualitatively, different from violent non-hate crime, but this was less clearly true when those who had committed public order hate crime were compared with other public order offenders. Interventions to reduce the later offending of violent hate crime offenders should be based on the effective interventions that exist for violent offenders, but should take into account knowledge about the surprisingly prolific criminal careers of hate crime offenders

    Superhelical Duplex Destabilization and the Recombination Position Effect

    The susceptibility to recombination of a plasmid inserted into a chromosome varies with its genomic position. This recombination position effect is known to correlate with the average G+C content of the flanking sequences. Here we propose that this effect could be mediated by changes in the susceptibility to superhelical duplex destabilization that would occur. We use standard nonparametric statistical tests, regression analysis and principal component analysis to identify statistically significant differences in the destabilization profiles calculated for the plasmid in different contexts, and correlate the results with their measured recombination rates. We show that the flanking sequences significantly affect the free energy of denaturation at specific sites interior to the plasmid. These changes correlate well with experimentally measured variations of the recombination rates within the plasmid. This correlation of recombination rate with superhelical destabilization properties of the inserted plasmid DNA is stronger than that with average G+C content of the flanking sequences. This model suggests a possible mechanism by which flanking sequence base composition, which is not itself a context-dependent attribute, can affect recombination rates at positions within the plasmid

    Adolescents with metabolic syndrome have a history of low aerobic fitness and physical activity levels

    Purpose: Metabolic syndrome (MS) is a clustering of cardiovascular disease risk factors that identifies individuals with the highest risk for heart disease. Two factors that may influence the MS are physical activity and aerobic fitness. This study determined whether adolescents with the MS had low levels of aerobic fitness and physical activity as children. Methods: This longitudinal, exploratory study had 389 participants: 51% girls, 84% Caucasian, 12% African American, 1% Hispanic, and 3% other races, from the State of North Carolina. Habitual physical activity (PA survey), aerobic fitness (VO2max), body mass index (BMI), blood pressure, and lipids obtained at 7–10 y of age were compared to the results obtained 7 y later at ages 14–17 y. Results: Eighteen adolescents (4.6%) developed 3 or more characteristics of the MS. Logistic regression, adjusting for BMI percentile, blood pressure, and cholesterol levels, found that adolescents with the MS were 6.08 (95% CI = 1.18–60.08) times more likely to have had low aerobic fitness as children and 5.16 (95% CI = 1.06–49.66) times more likely to have had low PA levels. Conclusion: Low levels of childhood physical activity and aerobic fitness are associated with the presence of the metabolic syndrome in adolescents. Thus, efforts need to begin early in childhood to increase exercise

    DISCO-SCA and Properly Applied GSVD as Swinging Methods to Find Common and Distinctive Processes

    BACKGROUND: In systems biology it is common to obtain for the same set of biological entities information from multiple sources. Examples include expression data for the same set of orthologous genes screened in different organisms and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and to disentangle therein processes common to all data sources and processes distinctive for a specific source. Recently, two promising simultaneous data integration methods have been proposed to attain this goal, namely generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA). RESULTS: Both theoretical analyses and applications to biologically relevant data show that: (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) provided proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA is directly generalizable to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a setting of comparative genomics, it is shown that DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones. The biological annotation was obtained by applying Gene Set Enrichment Analysis in an appropriate way. Second, in an application of DISCO-SCA to metabolomics data for Escherichia coli obtained with two different chemical analysis platforms, it is illustrated that the metabolites involved in some of the biological processes underlying the data are detected by one of the two platforms only; therefore, platforms for microbial metabolomics should be tailored to the biological question. 
CONCLUSIONS: Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided

    Between Metabolite Relationships: an essential aspect of metabolic change

    Not only the levels of individual metabolites, but also the relations between the levels of different metabolites may indicate (experimentally induced) changes in a biological system. Component analysis methods in current ‘standard’ use for metabolomics, such as Principal Component Analysis (PCA), do not focus on changes in these relations. We therefore propose the concept of ‘Between Metabolite Relationships’ (BMRs): common changes in the covariance (or correlation) between all metabolites in an organism. Such structural changes may indicate metabolic change brought about by experimental manipulation but which are lost with standard data analysis methods. These BMRs can be analysed by the INdividual Differences SCALing (INDSCAL) method. First the BMR quantification is described and subsequently the INDSCAL method. Finally, two studies illustrate the power and the applicability of BMRs in metabolomics. The first study is about the induced plant response of cabbage to herbivory, of which BMRs are a considerable part. In the second study—a human nutritional intervention study of green tea extract—standard data analysis tools did not reveal any metabolic change, although the BMRs were considerably affected. The presented results show that BMRs can be easily implemented in a wide variety of metabolomic studies. They provide a new source of information to describe biological systems in a way that fits flawlessly into the next generation of systems biology questions, dealing with personalized responses

    Identification of Thioredoxin Glutathione Reductase Inhibitors That Kill Cestode and Trematode Parasites

    Parasitic flatworms are responsible for serious infectious diseases that affect humans as well as livestock animals in vast regions of the world. Yet, the drug armamentarium available for treatment of these infections is limited: praziquantel is the single drug currently available for 200 million people infected with Schistosoma spp. and there is justified concern about emergence of drug resistance. Thioredoxin glutathione reductase (TGR) is an essential core enzyme for redox homeostasis in flatworm parasites. In this work, we searched for flatworm TGR inhibitors testing compounds belonging to various families known to inhibit thioredoxin reductase or TGR and also additional electrophilic compounds. Several furoxans and one thiadiazole potently inhibited TGRs from both classes of parasitic flatworms: cestoda (tapeworms) and trematoda (flukes), while several benzofuroxans and a quinoxaline moderately inhibited TGRs. Remarkably, five active compounds from diverse families possessed a phenylsulfonyl group, strongly suggesting that this moiety is a new pharmacophore. The most active inhibitors were further characterized and displayed slow and nearly irreversible binding to TGR. These compounds efficiently killed Echinococcus granulosus larval worms and Fasciola hepatica newly excysted juveniles in vitro at a 20 µM concentration. Our results support the concept that the redox metabolism of flatworm parasites is precarious and particularly susceptible to destabilization, show that furoxans can be used to target both flukes and tapeworms, and identified phenylsulfonyl as a new drug-hit moiety for both classes of flatworm parasites