12 research outputs found

    A Topic Coverage Approach to Evaluation of Topic Models

    Topic models are widely used unsupervised models of text capable of learning topics (weighted lists of words and documents) from large collections of text documents. When topic models are used to discover topics in text collections, a natural question is how well the model-induced topics correspond to the topics of interest to the analyst. In this paper we revisit and extend a so far neglected approach to topic model evaluation based on measuring topic coverage: computationally matching model topics with a set of reference topics that models are expected to uncover. The approach is well suited to analyzing models' performance in topic discovery and to large-scale analysis of both topic models and measures of model quality. We propose new measures of coverage and evaluate, in a series of experiments, different types of topic models on two distinct text domains in which there is interest in topic discovery. The experiments include evaluation of model quality, analysis of the coverage of distinct topic categories, and analysis of the relationship between coverage and other methods of topic model evaluation. The contributions of the paper include new measures of coverage, insights into both topic models and other methods of model evaluation, and datasets and code to facilitate future research on both topic coverage and other approaches to topic model evaluation.
    Comment: Results and contributions unchanged; Added new references; Improved the contextualization and the description of the work (abstr, intro, 7.1 concl, rw, concl); Moved technical details of data and model building to appendices; Improved layout
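
    The idea of coverage described above, matching model topics against reference topics, can be illustrated with a minimal sketch. It is not the paper's own measure: the cosine distance, the 0.4 threshold, the function names, and the toy topics are all assumptions chosen for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two topics given as sparse word-weight dicts."""
    dot = sum(w * b.get(word, 0.0) for word, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty topic as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def coverage(reference_topics, model_topics, threshold=0.4):
    """Fraction of reference topics matched by at least one model topic.

    A reference topic counts as covered when some model topic lies within
    `threshold` cosine distance of it (threshold is an assumed value).
    """
    if not reference_topics:
        return 0.0
    covered = sum(
        1
        for ref in reference_topics
        if any(cosine_distance(ref, m) <= threshold for m in model_topics)
    )
    return covered / len(reference_topics)


# Toy data: two reference topics, one model topic that resembles the first.
reference = [
    {"dna": 1.0, "repair": 1.0},
    {"topic": 1.0, "model": 1.0},
]
model = [
    {"dna": 0.9, "repair": 0.8, "cell": 0.1},
]

print(coverage(reference, model))
```

    In this toy case only the first reference topic has a close model topic, so half of the reference set is covered.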

    Accuracy of Genome Reassembly in γ-Irradiated Escherichia coli (Preciznost sklapanja genoma bakterije Escherichia coli nakon γ-zračenja)

    γ-Radiation, a powerful DNA-damaging agent, can often lead to the formation of genome rearrangements. In this study, we have assessed the capacity of Escherichia coli to accurately reassemble its genome after multiple double-strand DNA breaks caused by γ-radiation. It has recently been shown that very high doses of γ-radiation or RecA protein deficiency cause erroneous chromosomal assemblies in Deinococcus radiodurans, a highly radiation-resistant bacterium. Accordingly, we have examined the accuracy of genome reassembly in both wild-type and recA strains of E. coli after exposure to doses of γ-radiation that reduce survival by 10^6- to 10^7-fold. Thirty-eight percent of wild-type survivors showed gross genome changes, most of which were found to be the consequence of the excision of e14, a 15-kb defective prophage. Only one additional type of gross genome rearrangement was detected, presumably representing the duplication of a DNA fragment. These results demonstrate an unexpectedly accurate genome reassembly in wild-type E. coli. We have detected no genome rearrangements in recA recBCD and recA recBCD sbcB mutants, suggesting that RecA-independent DNA repair in E. coli may also be accurate.

    Chromosome Segregation and Cell Division Defects in Escherichia coli Recombination Mutants Exposed to Different DNA-Damaging Treatments

    Homologous recombination repairs potentially lethal DNA lesions such as double-strand DNA breaks (DSBs) and single-strand DNA gaps (SSGs). In Escherichia coli, DSB repair is initiated by the RecBCD enzyme, which resects double-strand DNA ends and loads RecA recombinase onto the emerging single-strand (ss) DNA tails. SSG repair is mediated by the RecFOR protein complex, which loads RecA onto the ssDNA segment of the gapped duplex. In both repair pathways, RecA catalyses reactions of homologous DNA pairing and strand exchange, while the RuvABC complex and RecG helicase process recombination intermediates. In this work, we have characterised cytological changes in various recombination mutants of E. coli after three different DNA-damaging treatments: (i) expression of I-SceI endonuclease, (ii) gamma-irradiation, and (iii) UV-irradiation. All three treatments caused severe chromosome segregation defects and DNA-less cell formation in the ruvABC, recG, and ruvABC recG mutants. After I-SceI expression and gamma-irradiation, this phenotype was efficiently suppressed by the recB mutation, indicating that cytological defects result mostly from incomplete DSB repair. In UV-irradiated cells, the recB mutation abolished the cytological defects of recG mutants and also partially suppressed the cytological defects of ruvABC recG mutants. However, neither the recB nor the recO mutation alone could suppress the cytological defects of UV-irradiated ruvABC mutants. The suppression was achieved only by simultaneous inactivation of the recB and recO genes. Cell survival and microscopic analysis suggest that chromosome segregation defects in UV-irradiated ruvABC mutants largely result from defective processing of stalled replication forks. The results of this study show that chromosome morphology is a valuable marker in genetic analyses of recombinational repair in E. coli.

    Genetic analysis of transductional recombination in Escherichia coli reveals differences in the postsynaptic stages of RecBCD and RecFOR pathways

    Background and purpose: Homologous recombination in Escherichia coli proceeds via two pathways, RecBCD and RecFOR, which use different enzymes for DNA end resection and loading of RecA recombinase. The postsynaptic reactions following RecA-mediated homologous pairing have mostly been studied within the RecBCD pathway. They involve the RuvABC helicase/resolvase complex and the RecG and RadA helicases, which process recombination intermediates to produce recombinant DNA molecules. Also, RecG functionally interacts with the PriA protein in the initiation of recombination-dependent replication. Here, we studied the individual and combined effects of ruvABC, recG and radA null mutations on transductional recombination in both pathways. The effect of the priA300 mutation, which acts as a suppressor of the recG mutation, was also tested. The goal was to characterize the postsynaptic stage of transductional recombination in more detail, especially in the RecFOR pathway, which is less well studied. Materials and methods: Phage P1vir-mediated transduction was used to measure recombination efficiency in a series of recombination mutants. The proA+ marker was used for selection in transductional crosses with various ΔproA recipients. Results: The ruvABC mutation moderately decreased recombination in both recombination pathways, while radA had no effect. The recG mutation reduced recombination in the RecBCD pathway but not in the RecFOR pathway. The strong recombination defect of recG radA double mutants in both pathways was completely suppressed by the priA300 mutation, and this suppression depended on a functional RuvABC complex. Conclusions: RecG and RadA proteins have a redundant role in transductional recombination via the RecFOR pathway. In both recombination pathways, RecG and RadA functionally interact with PriA, probably during the initiation of recombination-dependent replication.

    Translational Selection Is Ubiquitous in Prokaryotes

    Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns represented as sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed) ribosomal protein genes by their codon frequencies. Unlike previous reports, we show evidence that translational selection in prokaryotes is practically universal: in 460 of 461 examined microbial genomes, we find that a subset of genes shows a higher codon usage similarity to the ribosomal proteins than would be expected from the local sequence composition. These genes constitute a substantial part of the genome (between 5% and 33%, depending on genome size), while also exhibiting higher experimentally measured mRNA abundances and tending toward codons that match tRNA anticodons by canonical base pairing. Certain gene functional categories are generally enriched with, or depleted of, codon-optimized genes, the trends of enrichment/depletion being conserved between Archaea and Bacteria. Prominent exceptions to these trends might indicate genes with alternative physiological roles; we speculate on specific examples related to detoxication of oxygen radicals and ammonia and to possible misannotations of asparaginyl-tRNA synthetases. Since the presence of codon optimizations on genes is a valid proxy for expression levels in fully sequenced genomes, we provide an example of an “adaptome” by highlighting gene functions with expression levels elevated specifically in thermophilic Bacteria and Archaea.
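
    The codon frequencies mentioned above are the kind of per-gene feature a classifier could be trained on. The sketch below only computes such a frequency vector for one coding sequence. It is not the authors' pipeline: the function name, the handling of ambiguous bases, and the toy sequence are assumptions made for illustration.

```python
from collections import Counter


def codon_frequencies(cds):
    """Relative frequency of each codon in a coding sequence.

    The sequence is read in frame from position 0; a trailing partial codon
    and codons containing anything other than A, C, G, T are skipped
    (an assumed convention, not taken from the abstract).
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if set(c) <= set("ACGT"))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Toy sequence (hypothetical): ATG GCT GCT AAA TAA
freqs = codon_frequencies("ATGGCTGCTAAATAA")
for codon, freq in sorted(freqs.items()):
    print(codon, freq)
```

    A dict keyed by codon keeps the sketch readable; a fixed-length vector over all 64 codons would be the more usual input for a classifier.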

    Accuracy of Genome Reassembly in γ-Irradiated Escherichia coli

    γ-Radiation, a powerful DNA-damaging agent, can often lead to the formation of genome rearrangements. In this study, we have assessed the capacity of Escherichia coli to accurately reassemble its genome after multiple double-strand DNA breaks caused by γ-radiation. It has recently been shown that very high doses of γ-radiation or RecA protein deficiency cause erroneous chromosomal assemblies in Deinococcus radiodurans, a highly radiation-resistant bacterium. Accordingly, we have examined the accuracy of genome reassembly in both wild-type and recA strains of E. coli after exposure to doses of γ-radiation that reduce survival by 10^6- to 10^7-fold. Thirty-eight percent of wild-type survivors showed gross genome changes, most of which were found to be the consequence of the excision of e14, a 15-kb defective prophage. Only one additional type of gross genome rearrangement was detected, presumably representing the duplication of a DNA fragment. These results demonstrate an unexpectedly accurate genome reassembly in wild-type E. coli. We have detected no genome rearrangements in recA recBCD and recA recBCD sbcB mutants, suggesting that RecA-independent DNA repair in E. coli may also be accurate.