
    WordCluster: detecting clusters of DNA words and genomic elements

    Background: Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results: We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method in a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions: WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method in a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php, including additional features such as the detection of co-localization with gene regions and the annotation enrichment tool for functional analysis of overlapped genes.
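    The abstract describes grouping copies by the distance between consecutive occurrences. A minimal sketch of that grouping step alone might look like the following. It is not the WordCluster implementation: the function name `distance_clusters` and the parameters `max_gap` and `min_copies` are hypothetical. The abstract does not say how the distance criterion is derived or how significance is assigned, so both are left out here.

```python
def distance_clusters(positions, max_gap, min_copies=2):
    """Group positions into runs whose consecutive gaps are <= max_gap.

    Illustrative only: `max_gap` and `min_copies` are assumed parameters,
    and no statistical significance is computed.
    Returns a list of (start, end, copy_count) tuples.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current run.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_copies:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the final run.
    if len(current) >= min_copies:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


if __name__ == "__main__":
    # Hypothetical start coordinates of one word on a chromosome.
    print(distance_clusters([1, 3, 4, 50, 52, 200], max_gap=5))
```

    A full method would replace the fixed `max_gap` with a criterion derived from the data and attach a significance value to each run, as the abstract indicates.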

    Information and feedback to improve occupational physicians’ reporting of occupational diseases: a randomised controlled trial

    The aim was to assess the effectiveness of supplying occupational physicians (OPs) with targeted and stage-matched information or with feedback on reporting occupational diseases to the national registry in the Netherlands. In a randomized controlled design, 1076 OPs were divided into three groups based on previous reporting behaviour: precontemplators not considering reporting, contemplators considering reporting and actioners reporting occupational diseases. Precontemplators and contemplators were randomly assigned to receive stage-matched, stage-mismatched or general information. Actioners were randomly assigned to receive personalized or standardized feedback upon notification. Outcome measures were the number of OPs reporting and the number of reported occupational diseases in a 180-day period before and after the intervention. Precontemplators were significantly more often male and self-employed than contemplators and actioners. There was no significant effect of stage-matched information versus stage-mismatched or general information on the percentage of reporting OPs or on the mean number of notifications in each group. Receiving any information affected reporting more in contemplators than in precontemplators. The mean number of notifications in actioners increased more after personalized feedback than after standardized feedback, but the difference was not significant. This study supports the concept that contemplators are more susceptible to receiving information but could not confirm an effect of stage-matching this information on reporting occupational diseases to the national registry.

    Identifying hazardousness of sewer pipeline gas mixture using classification methods: a comparative study

    In this work, we formulated a real-world problem related to sewer pipeline gas detection using classification-based approaches. The primary goal was to identify the hazardousness of a sewer pipeline in order to offer sewer pipeline workers safe access, so that fatalities caused by toxic exposure to sewer gas components can be avoided. The dataset, acquired through laboratory tests, experiments, and various literature sources, was organized to design a predictive model able to classify hazardous and non-hazardous situations in a sewer pipeline. To design such a prediction model, several classification algorithms were used, and their performances were evaluated and compared, both empirically and statistically, over the collected dataset. In addition, the performances of several ensemble methods were analyzed to understand the extent of improvement offered by these methods. The results of this comprehensive study showed that the instance-based learning algorithm performed better than many other algorithms, such as multilayer perceptron, radial basis function network, support vector machine, and reduced pruning tree. Similarly, it was observed that the multi-scheme ensemble approach enhanced the performance of base predictors.

    Rational Design of Temperature-Sensitive Alleles Using Computational Structure Prediction

    Temperature-sensitive (ts) mutations are mutations that exhibit a mutant phenotype at high or low temperatures and a wild-type phenotype at normal temperature. Temperature-sensitive mutants are valuable tools for geneticists, particularly in the study of essential genes. However, finding ts mutations typically relies on generating and screening many thousands of mutations, which is an expensive and labor-intensive process. Here we describe an in silico method that uses Rosetta and machine learning techniques to predict a highly accurate “top 5” list of ts mutations given the structure of a protein of interest. Rosetta is a protein structure prediction and design code, used here to model and score how proteins accommodate point mutations with side-chain and backbone movements. We show that integrating Rosetta relax-derived features with sequence-based features results in accurate temperature-sensitive mutation predictions.

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping, along with shared synteny from closely related fish species, to derive a chromosome-level assembly with a contig N50 size over 1 Mb and a scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades, with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.
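    The contig and scaffold N50 values quoted above follow the usual definition: the length L such that sequences of length L or longer together cover at least half of the total assembly length. A minimal sketch of that calculation is shown below. The function name `n50` and the example lengths are illustrative and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy lengths: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```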

    Intermediate DNA methylation is a conserved signature of genome regulation

    The role of intermediate methylation states in DNA is unclear. Here, to comprehensively identify regions of intermediate methylation and their quantitative relationship with gene activity, we apply integrative and comparative epigenomics to 25 human primary cell and tissue samples. We report 18,452 intermediate methylation regions located near 36% of genes and enriched at enhancers, exons and DNase I hypersensitivity sites. Intermediate methylation regions average 57% methylation, are predominantly allele-independent and are conserved across individuals and between mouse and human, suggesting a conserved function. These regions have an intermediate level of active chromatin marks and their associated genes have intermediate transcriptional activity. Exonic intermediate methylation correlates with exon inclusion at a level between that of fully methylated and unmethylated exons, highlighting gene context-dependent functions. We conclude that intermediate DNA methylation is a conserved signature of gene regulation and exon usage.

    Global gene disruption in human cells to assign genes to phenotypes

    Insertional mutagenesis in a haploid background can disrupt gene function [1]. We extend our earlier work by using a retroviral gene-trap vector to generate insertions in >98% of the genes expressed in a human cancer cell line that is haploid for all but one of its chromosomes. We apply phenotypic interrogation via tag sequencing (PhITSeq) to examine millions of mutant alleles through selection and parallel sequencing. Analysis of pools of cells, rather than individual clones [1], enables rapid assessment of the spectrum of genes involved in the phenotypes under study. This facilitates comparative screens, as illustrated here for the family of cytolethal distending toxins (CDTs). CDTs are virulence factors secreted by a variety of pathogenic Gram-negative bacteria responsible for tissue damage at distinct anatomical sites [2]. We identify 743 mutations distributed over 12 human genes important for intoxication by four different CDTs. Although related CDTs may share host factors, they also exploit unique host factors to yield a profile characteristic for each CDT.

    Chromatin States Accurately Classify Cell Differentiation Stages

    Gene expression is controlled by the concerted interactions between transcription factors and chromatin regulators. While recent studies have identified global chromatin state changes across cell types, it remains unclear to what extent these changes are co-regulated during cell differentiation. Here we present a comprehensive computational analysis by assembling a large dataset containing genome-wide occupancy information of 5 histone modifications in 27 human cell lines (including 24 normal and 3 cancer cell lines) obtained from the public domain, followed by independent analysis at three different representations. We classified the differentiation stage of a cell type based on its genome-wide pattern of chromatin states, and found that our method was able to identify normal cell lines with nearly 100% accuracy. We then applied our model to classify the cancer cell lines and found that each can be unequivocally classified as differentiated cells. The differences can be in part explained by the differential activities of three regulatory modules associated with embryonic stem cells. We also found that the “hotspot” genes, whose chromatin states change dynamically in accordance with the differentiation stage, are not randomly distributed across the genome but tend to be embedded in multi-gene chromatin domains, and that specialized gene clusters tend to be embedded in stably occupied domains.