Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of
choice in many application domains primarily due to its simplicity, time/space
efficiency, and invariance to the ordering of the data points. Unfortunately,
the algorithm's sensitivity to the initial selection of the cluster centers
remains its most serious drawback. Numerous initialization methods have
been proposed to address this drawback. Many of these methods, however, have
time complexity superlinear in the number of data points, which makes them
impractical for large data sets. On the other hand, linear methods are often
random and/or sensitive to the ordering of the data points. These methods are
generally unreliable in that the quality of their results is unpredictable.
Therefore, it is common practice to perform multiple runs of such methods and
take the output of the run that produces the best results. Such a practice,
however, greatly increases the computational requirements of the otherwise
highly efficient k-means algorithm. In this chapter, we investigate the
empirical performance of six linear, deterministic (non-random), and
order-invariant k-means initialization methods on a large and diverse
collection of data sets from the UCI Machine Learning Repository. The results
demonstrate that two relatively unknown hierarchical initialization methods due
to Su and Dy outperform the remaining four methods with respect to two
objective effectiveness criteria. In addition, a recent method due to Erisoglu
et al. performs surprisingly poorly.
Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms
(Springer, 2014). arXiv admin note: substantial text overlap with
arXiv:1304.7465, arXiv:1209.196
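To make the three properties named in the abstract concrete, the following is a minimal sketch of one way a linear, deterministic, order-invariant initialization can work: repeatedly split the cluster with the largest sum of squared errors at its mean, along its highest-variance axis. It is loosely modeled on variance-based hierarchical splitting and is not claimed to be the exact procedure evaluated in the chapter. The name `var_part_init` and the toy points are assumptions made for illustration.

```python
def var_part_init(points, k):
    """Return k initial centers by repeatedly splitting the worst cluster.

    Illustrative sketch only: no random choices are made, and each split
    is a single pass over the points of one cluster.
    """
    def mean(cluster):
        dims = len(cluster[0])
        return [sum(p[j] for p in cluster) / len(cluster) for j in range(dims)]

    def sse_per_axis(cluster):
        m = mean(cluster)
        return [sum((p[j] - m[j]) ** 2 for p in cluster) for j in range(len(m))]

    clusters = [list(points)]
    while len(clusters) < k:
        # Pick the cluster with the largest total squared error.
        worst = max(range(len(clusters)),
                    key=lambda i: sum(sse_per_axis(clusters[i])))
        target = clusters.pop(worst)
        # Split it at its mean along the axis with the largest spread.
        axis_sse = sse_per_axis(target)
        axis = max(range(len(axis_sse)), key=lambda j: axis_sse[j])
        cut = mean(target)[axis]
        left = [p for p in target if p[axis] <= cut]
        right = [p for p in target if p[axis] > cut]
        if not right:
            raise ValueError("cannot split: cluster has no spread")
        clusters.extend([left, right])
    # Sort the centers so the output does not depend on split order.
    return sorted(mean(c) for c in clusters)


# Hypothetical toy data: two well-separated groups along the first axis.
points = [(0, 0), (1, 0), (2, 1), (10, 0), (11, 1), (12, 0)]
centers = var_part_init(points, 2)
```

Because every decision depends only on means and squared errors, reordering the input leaves the centers unchanged in this example. Exact ties between clusters or axes, and floating-point summation order on real data, are edge cases this sketch does not handle.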
Fast, automated measurement of nematode swimming (thrashing) without morphometry
Background:
The "thrashing assay", in which nematodes are placed in liquid and the frequency of lateral swimming ("thrashing") movements estimated, is a well-established method for measuring motility in the genetic model organism Caenorhabditis elegans as well as in parasitic nematodes. It is used as an index of the effects of drugs, chemicals or mutations on motility and has proved useful in identifying mutants affecting behaviour. However, the method is laborious, subject to experimenter error, and therefore does not permit high-throughput applications. Existing automation methods usually involve analysis of worm shape, but this is computationally demanding and error-prone. Here we present a novel, robust and rapid method of automatically counting the thrashing frequency of worms that avoids morphometry but nonetheless gives a direct measure of thrashing frequency. Our method uses principal components analysis to remove the background, followed by computation of a covariance matrix of the remaining image frames from which the interval between statistically-similar frames is estimated.
Results:
We tested the performance of our covariance method in measuring thrashing rates of worms using mutations that affect motility and found that it accurately substituted for laborious, manual measurements over a wide range of thrashing rates. The algorithm used also enabled us to determine a dose-dependent inhibition of thrashing frequency by the anthelmintic drug, levamisole, illustrating the suitability of the system for assaying the effects of drugs and chemicals on motility. Furthermore, the algorithm successfully measured the actions of levamisole on a parasitic nematode, Haemonchus contortus, which adopts complex contorted shapes whilst swimming, without alterations to the code or any parameters, indicating that it is applicable to different nematode species, including parasitic nematodes. Our method is capable of analyzing a 30 s movie in less than 30 s and can therefore be deployed in rapid screens.
Conclusion:
We demonstrate that a covariance-based method yields a fast, reliable, automated measurement of C. elegans motility which can replace the far more time-consuming, manual method. The absence of a morphometry step means that the method can be applied to any nematode that swims in liquid and, together with its speed, this simplicity lends itself to deployment in large-scale chemical and genetic screens.
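The idea of estimating the interval between statistically similar frames can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the background is removed by subtracting each pixel's mean over time (the abstract uses principal components analysis for this step), and the repeat interval is taken as the first local maximum of the mean frame-to-frame similarity as a function of lag. The function name `thrash_period`, the frame rate, and the synthetic movie are all hypothetical.

```python
import math


def thrash_period(frames):
    """Estimate the repeat interval, in frames, of a periodic movie.

    frames: list of equally sized lists of pixel intensities.
    Returns None if no local maximum is found.
    """
    n, npix = len(frames), len(frames[0])
    # Background removal by temporal mean subtraction
    # (a stand-in for the PCA step described in the abstract).
    background = [sum(f[p] for f in frames) / n for p in range(npix)]
    resid = [[f[p] - background[p] for p in range(npix)] for f in frames]
    # Covariance-style matrix between every pair of residual frames.
    cov = [[sum(a * b for a, b in zip(resid[i], resid[j])) / npix
            for j in range(n)] for i in range(n)]
    # Average similarity of frames separated by a given lag.
    sim = [sum(cov[i][i + lag] for i in range(n - lag)) / (n - lag)
           for lag in range(n // 2 + 1)]
    # The first local maximum after lag 0 is taken as the repeat interval.
    for lag in range(1, len(sim) - 1):
        if sim[lag] > sim[lag - 1] and sim[lag] >= sim[lag + 1]:
            return lag
    return None


# Synthetic movie (assumption): 4 pixels oscillating with an 8-frame period
# on top of a constant background of 5.
frames = [[5 + math.sin(2 * math.pi * t / 8 + p) for p in range(4)]
          for t in range(40)]
period = thrash_period(frames)
fps = 25  # hypothetical frame rate
frequency_hz = fps / period if period else None
```

Nothing in this sketch looks at the shape of the animal, which is the property the abstract emphasizes. A real movie would need noise handling and a more careful peak search than this toy version provides.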
The criminal careers of those imprisoned for hate crime in the UK
Hate crime research has increased, but there are very few studies examining hate crime offenders. It is, therefore, difficult to determine to what extent those who perpetrate this offence might be different from those who have not committed hate crime. This study is the first to provide an account of the demographics and criminal histories of those serving time in prison for committing a hate crime. It is based on a large complete population of offenders in the UK. Hate crime offenders released from prison were found to have prolific criminal careers, having committed a wide range and large number of different types of offences. When compared with those who committed a general (non-hate) violent offence, violent hate crime offenders were significantly older and were considerably more prolific in their previous offending. Violent hate crime appeared quantitatively, as opposed to qualitatively, different from violent non-hate crime, but this was less clearly true when those who had committed public order hate crime were compared with other public order offenders. Interventions to reduce the later offending of violent hate crime offenders should be based on the effective interventions that exist for violent offenders, but should take into account knowledge about the surprisingly prolific criminal careers of hate crime offenders
Superhelical Duplex Destabilization and the Recombination Position Effect
The susceptibility to recombination of a plasmid inserted into a chromosome
varies with its genomic position. This recombination position effect is known to
correlate with the average G+C content of the flanking sequences. Here we
propose that this effect could be mediated by changes in the susceptibility to
superhelical duplex destabilization that would occur. We use standard
nonparametric statistical tests, regression analysis and principal component
analysis to identify statistically significant differences in the
destabilization profiles calculated for the plasmid in different contexts, and
correlate the results with their measured recombination rates. We show that the
flanking sequences significantly affect the free energy of denaturation at
specific sites interior to the plasmid. These changes correlate well with
experimentally measured variations of the recombination rates within the
plasmid. This correlation of recombination rate with superhelical
destabilization properties of the inserted plasmid DNA is stronger than that
with average G+C content of the flanking sequences. This model suggests a
possible mechanism by which flanking sequence base composition, which is not
itself a context-dependent attribute, can affect recombination rates at
positions within the plasmid
Adolescents with metabolic syndrome have a history of low aerobic fitness and physical activity levels
Abstract: Purpose: Metabolic syndrome (MS) is a clustering of cardiovascular disease risk factors that identifies individuals with the highest risk for heart disease. Two factors that may influence the MS are physical activity and aerobic fitness. This study determined if adolescents with the MS had low levels of aerobic fitness and physical activity as children. Methods: This longitudinal, exploratory study had 389 participants: 51% girls, 84% Caucasian, 12% African American, 1% Hispanic, and 3% other races, from the State of North Carolina. Habitual physical activity (PA survey), aerobic fitness (VO2max), body mass index (BMI), blood pressure, and lipids obtained at 7–10 y of age were compared to their results obtained 7 y later at ages 14–17 y. Results: Eighteen adolescents (4.6%) developed 3 or more characteristics of the MS. Logistic regression, adjusting for BMI percentile, blood pressure, and cholesterol levels, found that adolescents with the MS were 6.08 (95%CI = 1.18–60.08) times more likely to have low aerobic fitness as children and 5.16 (95%CI = 1.06–49.66) times more likely to have low PA levels. Conclusion: Low levels of childhood physical activity and aerobic fitness are associated with the presence of the metabolic syndrome in adolescents. Thus, efforts need to begin early in childhood to increase exercise
DISCO-SCA and Properly Applied GSVD as Swinging Methods to Find Common and Distinctive Processes
BACKGROUND: In systems biology it is common to obtain for the same set of biological entities information from multiple sources. Examples include expression data for the same set of orthologous genes screened in different organisms and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and to disentangle therein processes common to all data sources and processes distinctive for a specific source. Recently, two promising simultaneous data integration methods have been proposed to attain this goal, namely generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA). RESULTS: Both theoretical analyses and applications to biologically relevant data show that: (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) provided proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA is directly generalizable to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a setting of comparative genomics, it is shown that DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones. The biological annotation was obtained by applying Gene Set Enrichment Analysis in an appropriate way. Second, in an application of DISCO-SCA to metabolomics data for Escherichia coli obtained with two different chemical analysis platforms, it is illustrated that the metabolites involved in some of the biological processes underlying the data are detected by one of the two platforms only; therefore, platforms for microbial metabolomics should be tailored to the biological question. 
CONCLUSIONS: Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided
Between Metabolite Relationships: an essential aspect of metabolic change
Not only the levels of individual metabolites, but also the relations between the levels of different metabolites may indicate (experimentally induced) changes in a biological system. Component analysis methods in current ‘standard’ use for metabolomics, such as Principal Component Analysis (PCA), do not focus on changes in these relations. We therefore propose the concept of ‘Between Metabolite Relationships’ (BMRs): common changes in the covariance (or correlation) between all metabolites in an organism. Such structural changes may indicate metabolic change brought about by experimental manipulation but which are lost with standard data analysis methods. These BMRs can be analysed by the INdividual Differences SCALing (INDSCAL) method. First the BMR quantification is described and subsequently the INDSCAL method. Finally, two studies illustrate the power and the applicability of BMRs in metabolomics. The first study is about the induced plant response of cabbage to herbivory, of which BMRs are a considerable part. In the second study—a human nutritional intervention study of green tea extract—standard data analysis tools did not reveal any metabolic change, although the BMRs were considerably affected. The presented results show that BMRs can be easily implemented in a wide variety of metabolomic studies. They provide a new source of information to describe biological systems in a way that fits flawlessly into the next generation of systems biology questions, dealing with personalized responses
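The point that a relationship between metabolites can change while their individual levels do not can be illustrated with a small sketch. The numbers below are invented purely for illustration and come from no study. The code only compares pairwise correlations between two conditions and does not implement INDSCAL.

```python
def correlation(x, y):
    """Pearson correlation between two equally long lists of levels."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean(values):
    return sum(values) / len(values)


# Hypothetical levels of two metabolites, A and B, in five samples per condition.
control = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10]}
treated = {"A": [1, 2, 3, 4, 5], "B": [10, 8, 6, 4, 2]}

# The mean level of each metabolite is identical in both conditions...
same_means = all(mean(control[m]) == mean(treated[m]) for m in ("A", "B"))

# ...but the relationship between A and B is reversed.
r_control = correlation(control["A"], control["B"])
r_treated = correlation(treated["A"], treated["B"])
```

An analysis of mean levels alone would report no difference between these two toy conditions, while the correlation between A and B changes sign. That is the kind of structural change the abstract describes.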
Identification of Thioredoxin Glutathione Reductase Inhibitors That Kill Cestode and Trematode Parasites
Parasitic flatworms are responsible for serious infectious diseases that affect humans as well as livestock animals in vast regions of the world. Yet, the drug armamentarium available for treatment of these infections is limited: praziquantel is the single drug currently available for 200 million people infected with Schistosoma spp. and there is justified concern about emergence of drug resistance. Thioredoxin glutathione reductase (TGR) is an essential core enzyme for redox homeostasis in flatworm parasites. In this work, we searched for flatworm TGR inhibitors testing compounds belonging to various families known to inhibit thioredoxin reductase or TGR and also additional electrophilic compounds. Several furoxans and one thiadiazole potently inhibited TGRs from both classes of parasitic flatworms: cestoda (tapeworms) and trematoda (flukes), while several benzofuroxans and a quinoxaline moderately inhibited TGRs. Remarkably, five active compounds from diverse families possessed a phenylsulfonyl group, strongly suggesting that this moiety is a new pharmacophore. The most active inhibitors were further characterized and displayed slow and nearly irreversible binding to TGR. These compounds efficiently killed Echinococcus granulosus larval worms and Fasciola hepatica newly excysted juveniles in vitro at a 20 µM concentration. Our results support the concept that the redox metabolism of flatworm parasites is precarious and particularly susceptible to destabilization, show that furoxans can be used to target both flukes and tapeworms, and identified phenylsulfonyl as a new drug-hit moiety for both classes of flatworm parasites