51,928 research outputs found

    Coding limits on the number of transcription factors

    Transcription factor proteins bind specific DNA sequences to control the expression of genes. They contain DNA binding domains which belong to several super-families, each with a specific mechanism of DNA binding. The total number of transcription factors encoded in a genome increases with the number of genes in the genome. Here, we examined the number of transcription factors from each super-family in diverse organisms. We find that the number of transcription factors from most super-families appears to be bounded. For example, the number of winged helix factors does not generally exceed 300, even in very large genomes. The magnitude of the maximal number of transcription factors from each super-family seems to correlate with the number of DNA bases effectively recognized by the binding mechanism of that super-family. Coding theory predicts that such upper bounds on the number of transcription factors should exist, in order to minimize cross-binding errors between transcription factors. This theory further predicts that factors with similar binding sequences should tend to have similar biological effect, so that errors based on mis-recognition are minimal. We present evidence that transcription factors with similar binding sequences tend to regulate genes with similar biological functions, supporting this prediction. The present study suggests limits on the transcription factor repertoire of cells, and suggests coding constraints that might apply more generally to the mapping between binding sites and biological function. Comment: http://www.weizmann.ac.il/complex/tlusty/papers/BMCGenomics2006.pdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1590034/ http://www.biomedcentral.com/1471-2164/7/23

    Rates of DNA Sequence Profiles for Practical Values of Read Lengths

    A recent study by one of the authors has demonstrated the importance of profile vectors in DNA-based data storage. We provide exact values and lower bounds on the number of profile vectors for finite values of alphabet size $q$, read length $\ell$, and word length $n$. Consequently, we demonstrate that for $q \ge 2$ and $n \le q^{\ell/2-1}$, the number of profile vectors is at least $q^{\kappa n}$ with $\kappa$ very close to one. In addition to enumeration results, we provide a set of efficient encoding and decoding algorithms for each of two particular families of profile vectors.
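
    As a rough illustration of the object being counted above, the following sketch computes a profile vector under the commonly used definition: one entry per length-ell substring, holding the number of times it occurs in the word. The function name `profile_vector`, the DNA alphabet, and the lexicographic ordering of entries are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from itertools import product


def profile_vector(word, alphabet="ACGT", ell=2):
    """Count occurrences of every length-ell substring of `word`.

    Assumed definition (hypothetical helper, not the paper's code):
    entry i is the number of positions at which the i-th ell-gram,
    in lexicographic order over `alphabet`, occurs in `word`.
    """
    # Slide a window of length ell across the word and tally each substring.
    counts = Counter(word[i:i + ell] for i in range(len(word) - ell + 1))
    # Enumerate all q**ell possible ell-grams in a fixed order.
    grams = ["".join(g) for g in product(alphabet, repeat=ell)]
    return [counts[g] for g in grams]


vec = profile_vector("ACGTAC", ell=2)
print(len(vec), sum(vec))  # q**ell entries, n - ell + 1 windows in total
```

    Two different words can share the same profile vector, which is why the number of distinct profile vectors, rather than the number of words, is the quantity of interest.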

    Evidence of a Genomic Biomarker in Normal Human Epithelial Mammary Cell Line, MCF-10A, That Is Absent in the Human Breast Cancer Cell Line, MCF-7

    This study investigated the use of DNA amplification fingerprinting (DAF) to identify biomarkers useful in elucidating the genetic factors that lead to carcinogenesis. The DAF technique was used to generate fingerprint profiles of a normal human mammary epithelial cell line (MCF-10A) and a human breast cancer cell line (MCF-7). When the two profiles were compared, a polymorphic biomarker gene (262 base pairs (bps)) was identified in MCF-10A but was not present in MCF-7. This gene was cloned from the genomic DNA of the MCF-10A cell line and subjected to GenBank database analysis. The analysis of the nucleotide sequence of the polymorphic marker (GenBank accession: AC079630) shows that this biomarker has 100% homology with the nucleotide sequence of human chromosome 12 BAC RP11-476D10 (bps 19612-19353). The nucleotide sequence was translated to identify a possible protein product, and the result indicated that the gene codes for hypothetical protein XF2620. To evaluate the effects that the 262 bps biomarker would have on the morphology of MCF-7 cells, it was transfected into MCF-7 cells. There were observable changes in the morphology of the transfected cells, including an increase in cell elongation and a decrease in cell aggregation.

    MINTmap: fast and exhaustive profiling of nuclear and mitochondrial tRNA fragments from short RNA-seq data.

    Transfer RNA fragments (tRFs) are an established class of constitutive regulatory molecules that arise from precursor and mature tRNAs. RNA deep sequencing (RNA-seq) has greatly facilitated the study of tRFs. However, the repeat nature of the tRNA templates and the idiosyncrasies of tRNA sequences necessitate the development and use of methodologies that differ markedly from those used to analyze RNA-seq data when studying microRNAs (miRNAs) or messenger RNAs (mRNAs). Here we present MINTmap (for MItochondrial and Nuclear TRF mapping), a method and a software package that was developed specifically for the quick, deterministic and exhaustive identification of tRFs in short RNA-seq datasets. In addition to identifying them, MINTmap is able to unambiguously calculate and report both raw and normalized abundances for the discovered tRFs. Furthermore, to ensure specificity, MINTmap identifies the subset of discovered tRFs that could be originating outside of tRNA space and flags them as candidate false positives. Our comparative analysis shows that MINTmap exhibits superior sensitivity and specificity to other available methods while also being exceptionally fast. The MINTmap codes are available through https://github.com/TJU-CMC-Org/MINTmap/ under an open source GNU GPL v3.0 license.

    Distinct core promoter codes drive transcription initiation at key developmental transitions in a marine chordate

    BACKGROUND: Development is largely driven by transitions between transcriptional programs. The initiation of transcription at appropriate sites in the genome is a key component of this and yet few rules governing selection are known. Here, we used cap analysis of gene expression (CAGE) to generate bp-resolution maps of transcription start sites (TSSs) across the genome of Oikopleura dioica, a member of the closest living relatives to vertebrates. RESULTS: Our TSS maps revealed promoter features in common with vertebrates, as well as striking differences, and uncovered key roles for core promoter elements in the regulation of development. During spermatogenesis there is a genome-wide shift in mode of transcription initiation characterized by a novel core promoter element. This element was associated with > 70% of male-specific transcription, including the use of cryptic internal promoters within operons. In many cases this led to the exclusion of trans-splice sites, revealing a novel mechanism for regulating which mRNAs receive the spliced leader. Binding of the cell cycle regulator, E2F1, is enriched at the TSS of maternal genes in endocycling nurse nuclei. In addition, maternal promoters lack the TATA-like element found in zebrafish and have broad, rather than sharp, architectures with ordered nucleosomes. Promoters of ribosomal protein genes lack the highly conserved TCT initiator. We also report an association between DNA methylation on transcribed gene bodies and the TATA-box. CONCLUSIONS: Our results reveal that distinct functional promoter classes and overlapping promoter codes are present in protochordates, as in vertebrates, but show extraordinary lineage-specific innovations. Furthermore, we uncover a genome-wide, developmental stage-specific shift in the mode of TSS selection. Our results provide a rich resource for the study of promoter structure and evolution in Metazoa.

    Coding over Sets for DNA Storage

    In this paper, we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where data is represented by an unordered set of $M$ sequences, each of length $L$. Errors within that model are losses of whole sequences and point errors inside the sequences, such as substitutions, insertions and deletions. We propose code constructions which can correct these errors with efficient encoders and decoders. By deriving upper bounds on the cardinalities of these codes using sphere packing arguments, we show that many of our codes are close to optimal. Comment: 5 pages
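
    To make the storage model above concrete, here is a toy simulation of the channel it describes: the stored data is an unordered set of equal-length sequences, whole sequences may be lost, and surviving sequences may suffer point errors. This is only a sketch under stated assumptions. The function name `store_and_read`, the probabilities, and the restriction to substitution errors (insertions and deletions are not modeled) are illustrative choices, not the paper's construction.

```python
import random


def store_and_read(sequences, loss_prob=0.1, sub_prob=0.01,
                   alphabet="ACGT", seed=0):
    """Simulate reading back an unordered set of DNA sequences.

    Hypothetical toy channel: each sequence is lost with probability
    `loss_prob`; each surviving symbol is substituted by a different
    symbol with probability `sub_prob`. Ordering is discarded because
    the result is a set.
    """
    rng = random.Random(seed)
    received = set()
    for seq in sequences:
        if rng.random() < loss_prob:
            continue  # the whole sequence is lost
        noisy = "".join(
            rng.choice([a for a in alphabet if a != c])
            if rng.random() < sub_prob else c
            for c in seq
        )
        received.add(noisy)
    return received


stored = {"ACGTAC", "TTGACA", "GGCATA"}  # M = 3 sequences of length L = 6
print(store_and_read(stored, loss_prob=0.3, sub_prob=0.05))
```

    Because the receiver sees a set, any code for this model has to recover the data without relying on the order in which sequences were written.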

    Genetic alterations and cancer formation in a European flatfish at sites of different contamination burdens

    Fish diseases are an indicator for marine ecosystem health since they provide a biological end-point of historical exposure to stressors. Liver cancer has been used to monitor the effects of exposure to anthropogenic pollution in flatfish for many years. The prevalence of liver cancer can exceed 20%. Despite the high prevalence and the opportunity of using flatfish to study environmentally induced cancer, the genetic and environmental factors driving tumor prevalence across sites are poorly understood. This study aims to define the link between genetic deterioration, liver disease progression, and anthropogenic contaminant exposures in the flatfish dab (Limanda limanda). We assessed genetic changes in a conserved cancer gene, Retinoblastoma (Rb), in association with histological diagnosis of normal, pretumor, and tumor pathologies in the livers of 165 fish from six sites in the North Sea and English Channel. The highest concentrations of metals (especially cadmium) and organic chemicals correlated with the presence of tumor pathology and with defined genetic profiles of the Rb gene from these sites. Different Rb genetic profiles were found in liver tissue near each tumor phenotype, giving insight into the mechanistic molecular-level cause of the liver pathologies. Different Rb profiles were also found at sampling sites of differing contaminant burdens. Additionally, profiles indicated that histologically “normal” fish from Dogger sampling locations possessed Rb profiles associated with pretumor disease. This study highlights an association between Rb and specific contaminants (especially cadmium) in the molecular etiology of dab liver tumorigenesis.

    Molecular studies on intraspecific diversity and phylogenetic position of Coniothyrium minitans

    Simple sequence repeat (SSR)-PCR amplification using a microsatellite primer (GACA)4 and ribosomal RNA gene sequencing were used to examine the intraspecific diversity in the mycoparasite Coniothyrium minitans based on 48 strains, representing eight colony types, from 17 countries worldwide. Coniothyrium cerealis, C. fuckelii and C. sporulosum were used for interspecific comparison. The SSR-PCR technique revealed a relatively low level of polymorphism within C. minitans but did allow some differentiation between strains. While there was no relationship between SSR-PCR profiles and colony type, there was some limited correlation between these profiles and country of origin. Sequences of the ITS 1 and ITS 2 regions and the 5.8S gene of rRNA genes were identical in all twenty-four strains of C. minitans examined irrespective of colony type and origin. These results indicate that C. minitans is genetically not very variable despite phenotypic differences. ITS and 5.8S rRNA gene sequence analyses showed that C. minitans had similarities of 94% with C. fuckelii and C. sporulosum (which were identical to each other) and only 64% with C. cerealis. Database searches failed to show any similarity with the ITS 1 sequence for C. minitans although the 5.8S rRNA gene and ITS 2 sequences revealed an 87% similarity with Aporospora terricola. The ITS sequence including the 5.8S rRNA gene sequence of Coniothyrium cerealis showed 91% similarity to Phaeosphaeria microscopica. Phylogenetic analyses using database information suggest that C. minitans, C. sporulosum, C. fuckelii and A. terricola cluster in one clade, grouping with Helminthosporium species and 'Leptosphaeria' bicolor. Coniothyrium cerealis grouped with Ampelomyces quisqualis and formed a major cluster with members of the Phaeosphaeriaceae and Phaeosphaeria microscopica.