
    Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes

    BACKGROUND: Cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. In most cases, a post hoc analysis is done to see if the genes in the same clusters can be functionally correlated. While past successes of such analyses have been reported in a number of microarray studies (most of which used the standard hierarchical clustering, UPGMA, with one minus the Pearson correlation coefficient as a measure of dissimilarity), such groupings can often be misleading. More importantly, a systematic evaluation of the entire set of clusters produced by such unsupervised procedures is necessary, since they also contain genes that are seemingly unrelated or may have more than one common function. Here we quantify the performance of a given unsupervised clustering algorithm applied to a given microarray study in terms of its ability to produce biologically meaningful clusters using a reference set of functional classes. Such a reference set may come from prior biological knowledge specific to a microarray study or may be formed using the growing databases of gene ontologies (GO) for the annotated genes of the relevant species. RESULTS: In this paper, we introduce two performance measures for evaluating the results of a clustering algorithm in its ability to produce biologically meaningful clusters. The first measure is a biological homogeneity index (BHI). As the name suggests, it is a measure of how biologically homogeneous the clusters are. This can be used to quantify the performance of a given clustering algorithm such as UPGMA in grouping genes for a particular data set and also for comparing the performance of a number of competing clustering algorithms applied to the same data set. The second performance measure is called a biological stability index (BSI). For a given clustering algorithm and an expression data set, it measures the consistency of the clustering algorithm's ability to produce biologically meaningful clusters when applied repeatedly to similar data sets. A good clustering algorithm should have high BHI and moderate to high BSI. We evaluated the performance of ten well-known clustering algorithms on two gene expression data sets and identified the optimal algorithm in each case. The first data set deals with SAGE profiles of differentially expressed tags between normal and ductal carcinoma in situ samples of breast cancer patients. The second data set contains the expression profiles over time of positively expressed genes (ORFs) during sporulation of budding yeast. Two separate choices of the functional classes were used for this data set and the results were compared for consistency. CONCLUSION: Functional information of annotated genes available from various GO databases mined using ontology tools can be used to systematically judge the results of an unsupervised clustering algorithm as applied to a gene expression data set in clustering genes. This information could be used to select the right algorithm from a class of clustering algorithms for the given data set.
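The abstract describes the BHI only in words. As a rough illustration, the sketch below assumes one common formulation: for each cluster, take the fraction of annotated gene pairs that share at least one functional class, then average over clusters. The function name, the input layout, and the choice to skip clusters with fewer than two annotated genes are assumptions made for this sketch, not details taken from the abstract.

```python
from itertools import combinations


def biological_homogeneity_index(clusters, annotations):
    """Average, over clusters, of the fraction of gene pairs sharing a class.

    clusters:    dict mapping gene -> cluster label
    annotations: dict mapping gene -> set of functional classes
    Both layouts are hypothetical and chosen for illustration only.
    """
    # Group annotated genes by cluster; unannotated genes are ignored.
    by_cluster = {}
    for gene, label in clusters.items():
        if annotations.get(gene):
            by_cluster.setdefault(label, []).append(gene)

    scores = []
    for genes in by_cluster.values():
        n = len(genes)
        if n < 2:
            # Assumption: a cluster with fewer than two annotated genes
            # has no pairs to compare, so it is skipped.
            continue
        shared = sum(
            1
            for a, b in combinations(genes, 2)
            if annotations[a] & annotations[b]
        )
        scores.append(shared / (n * (n - 1) / 2))

    return sum(scores) / len(scores) if scores else 0.0


# Toy input (invented): two clusters, five annotated genes.
toy_clusters = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1}
toy_annotations = {
    "g1": {"A"},
    "g2": {"A", "B"},
    "g3": {"C"},
    "g4": {"D"},
    "g5": {"D"},
}
bhi = biological_homogeneity_index(toy_clusters, toy_annotations)
```

On this toy input, cluster 0 has one matching pair out of three and cluster 1 has one out of one, so the index is the mean of 1/3 and 1. Values closer to 1 indicate clusters whose genes tend to share functional classes.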

    Social Determinants of Smoking in Low- and Middle-Income Countries: Results from the World Health Survey

    INTRODUCTION: Tobacco smoking is a leading cause of premature death and disability, and over 80% of the world's smokers live in low- or middle-income countries. The objective of this study is to assess demographic and socioeconomic determinants of current smoking in low- and middle-income countries. METHODS: We used data from the World Health Survey in 48 low-income and middle-income countries to explore the impact of demographic and socioeconomic factors on the current smoking status of respondents. The data from these surveys provided information on 213,807 respondents aged 18 years or above, who were divided into four pooled datasets according to their sex and country income group. The overall proportion of current smokers, as well as the proportion by each relevant demographic and socioeconomic determinant, was calculated within each of the pooled datasets, and multivariable logistic regression was used to assess the association between current smoking and these determinants. RESULTS: The odds of smoking were not equal in all demographic or socioeconomic groups. Some factors were fairly stable across the four datasets studied: for example, individuals were more likely to smoke if they had little or no education, regardless of whether they were male or female, or lived in a low- or a middle-income country. Nevertheless, other factors, notably age and wealth, showed a differential effect on smoking by sex or country income level. While women in the low-income country group were twice as likely to smoke if they were in the lowest wealth quintile compared with the highest, the association was absent in the middle-income country group. CONCLUSION: Information on how smoking is distributed among low- or middle-income countries will allow policy makers to tailor future policies and target the most vulnerable populations.
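To make the "twice as likely" comparison concrete, the sketch below computes a crude (unadjusted) odds ratio from a 2x2 table. This is a simpler stand-in for the multivariable logistic regression the study used, which also adjusts for other determinants. The counts are invented for illustration and are not taken from the World Health Survey, and the function name is hypothetical.

```python
def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases):
    """Crude odds ratio from a 2x2 table (no adjustment for covariates)."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed


# Hypothetical counts, invented for this example only:
# lowest wealth quintile:  200 smokers, 800 non-smokers
# highest wealth quintile: 100 smokers, 800 non-smokers
crude_or = odds_ratio(200, 800, 100, 800)
```

With these made-up counts the odds of smoking are 0.25 in the lowest quintile and 0.125 in the highest, giving an odds ratio of 2. An adjusted odds ratio from a regression model can differ from the crude one when other factors such as age or education are correlated with wealth.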

    Search for new particles in events with energetic jets and large missing transverse momentum in proton-proton collisions at $\sqrt{s} = 13$ TeV

    A search is presented for new particles produced at the LHC in proton-proton collisions at $\sqrt{s} = 13$ TeV, using events with energetic jets and large missing transverse momentum. The analysis is based on a data sample corresponding to an integrated luminosity of 101 fb$^{-1}$, collected in 2017-2018 with the CMS detector. Machine learning techniques are used to define separate categories for events with narrow jets from initial-state radiation and events with large-radius jets consistent with a hadronic decay of a W or Z boson. A statistical combination is made with an earlier search based on a data sample of 36 fb$^{-1}$, collected in 2016. No significant excess of events is observed with respect to the standard model background expectation determined from control samples in data. The results are interpreted in terms of limits on the branching fraction of an invisible decay of the Higgs boson, as well as constraints on simplified models of dark matter, on first-generation scalar leptoquarks decaying to quarks and neutrinos, and on models with large extra dimensions. Several of the new limits, specifically for spin-1 dark matter mediators, pseudoscalar mediators, colored mediators, and leptoquarks, are the most restrictive to date.

    Susceptibility to acaricides and detoxifying enzyme activities in the red spider mite, Oligonychus coffeae Nietner (Acari: Tetranychidae)

    Roy, Somnath, Prasad, Anjali Km., Handique, Gautam, Deka, Bipanchi (2018): Susceptibility to acaricides and detoxifying enzyme activities in the red spider mite, Oligonychus coffeae Nietner (Acari: Tetranychidae). Acarologia 58 (3): 647-654, DOI: 10.24349/acarologia/20184261, URL: http://dx.doi.org/10.24349/acarologia/2018426