118 research outputs found

    PINT: Pathways INtegration Tool

    New pathway databases generally display pathways by retrieving information from a database dynamically, and some also provide their pathways in SBML or other exchange formats. Integrating these models is challenging because they were not built in the same way. The Pathways INtegration Tool (PINT) integrates standard SBML files. Since these files may come from different sources, any inconsistency in component names can be corrected with an annotation editor when a pathway model is uploaded. This integration function greatly simplifies building a complex model from small models. To get new users started, PINT includes about 190 curated public models of human pathways. Relevant models can be selected and sent to the workbench through a query interface, which also accepts a gene list derived from high-throughput experiments. The models on the workbench, from either a public or a private source, can be integrated and painted. The painting function highlights important genes, or even their expression levels, on a merged pathway diagram so that their biological significance can be revealed. This tool is freely available at http://csb2.ym.edu.tw/pint/

    Epstein-Barr virus transcription factor Zta acts through distal regulatory elements to directly control cellular gene expression

    Lytic replication of the human gamma herpes virus Epstein-Barr virus (EBV) is an essential prerequisite for the spread of the virus. Differential regulation of a limited number of cellular genes has been reported in B-cells during the viral lytic replication cycle. We asked whether a viral bZIP transcription factor, Zta (BZLF1, ZEBRA, EB1), drives some of these changes. Using genome-wide chromatin immunoprecipitation coupled to next-generation DNA sequencing (ChIP-seq) we established a map of Zta interactions across the human genome. Using sensitive transcriptome analyses we identified 2263 cellular genes whose expression is significantly changed during the EBV lytic replication cycle. Zta binds 278 of the regulated genes and the distribution of binding sites shows that Zta binds mostly to sites that are distal to transcription start sites. This differs from the prevailing view that Zta activates viral genes by binding exclusively at promoter elements. We show that a synthetic Zta binding element confers Zta regulation at a distance and that distal Zta binding sites from cellular genes can confer Zta-mediated regulation on a heterologous promoter. This leads us to propose that Zta directly reprograms the expression of cellular genes through distal elements

    Association of Accelerometry-Measured Physical Activity and Cardiovascular Events in Mobility-Limited Older Adults: The LIFE (Lifestyle Interventions and Independence for Elders) Study.

    BACKGROUND: Data are sparse regarding the value of physical activity (PA) surveillance among older adults, particularly among those with mobility limitations. The objective of this study was to examine longitudinal associations between objectively measured daily PA and the incidence of cardiovascular events among older adults in the LIFE (Lifestyle Interventions and Independence for Elders) study. METHODS AND RESULTS: Cardiovascular events were adjudicated based on medical records review, and cardiovascular risk factors were controlled for in the analysis. Home-based activity data were collected by hip-worn accelerometers at baseline and at 6, 12, and 24 months postrandomization to either a physical activity or health education intervention. LIFE study participants (n=1590; age 78.9±5.2 [SD] years; 67.2% women) at baseline had an 11% lower incidence of a subsequent cardiovascular event per 500 steps taken per day based on activity data (hazard ratio, 0.89; 95% confidence interval, 0.84-0.96; P=0.001). At baseline, every 30 minutes spent performing activities ≥500 counts per minute (hazard ratio, 0.75; confidence interval, 0.65-0.89; P=0.001) was also associated with a lower incidence of cardiovascular events. Throughout follow-up (6, 12, and 24 months), both the number of steps per day (per 500 steps; hazard ratio, 0.90; confidence interval, 0.85-0.96; P=0.001) and the duration of activity ≥500 counts per minute (per 30 minutes; hazard ratio, 0.76; confidence interval, 0.63-0.90; P=0.002) were significantly associated with lower cardiovascular event rates. CONCLUSIONS: Objective measurements of physical activity via accelerometry were associated with cardiovascular events among older adults with limited mobility (summary score >10 on the Short Physical Performance Battery) using both baseline and longitudinal data. CLINICAL TRIAL REGISTRATION: URL: http://www.clinicaltrials.gov. Unique identifier: NCT01072500
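
The link between the reported hazard ratios and the quoted percentage reductions can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the rescaling to a different step increment assumes a log-linear relationship between steps and hazard, which the abstract does not state.

```python
def percent_lower(hazard_ratio: float) -> float:
    """Percentage reduction in hazard implied by a hazard ratio below 1."""
    return (1.0 - hazard_ratio) * 100.0


def rescale_hazard_ratio(hazard_ratio: float, factor: float) -> float:
    """Hazard ratio for `factor` times the original exposure increment.

    Assumption (not from the abstract): the log-hazard is linear in the
    exposure, so the ratio compounds multiplicatively.
    """
    return hazard_ratio ** factor


# Baseline result from the abstract: hazard ratio 0.89 per 500 steps/day.
per_500_steps = 0.89
print(round(percent_lower(per_500_steps), 1))

# Hypothetical extrapolation to 1000 steps/day under the log-linear assumption.
per_1000_steps = rescale_hazard_ratio(per_500_steps, 2)
print(round(per_1000_steps, 4))
```

A hazard ratio of 0.89 corresponds to the 11% lower incidence quoted in the abstract; the 1000-step figure is an extrapolation, not a reported result.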

    The Biomolecular Interaction Network Database in PSI-MI 2.5

    The Biomolecular Interaction Network Database (BIND) is a major source of curated biomolecular interactions, which has been unmaintained for the last few years, a trend which will eventually result in the loss of a significant amount of unique biomolecular interaction information, mostly as database identifiers become out of date. To help reverse this trend, we converted BIND to a standard format, Proteomics Standard Initiative-Molecular Interaction 2.5, starting from the last curated data release (from 2005) available in a custom XML format and made the core components (interactions and complexes) plus additional valuable curated information available for download (http://download.baderlab.org/BINDTranslation/). Major work during the conversion process was required to update out of date molecule identifiers resulting in a more comprehensive conversion of BIND, by measures including number of species and interactor types covered, than what is currently accessible elsewhere. This work also highlights issues of data modeling, controlled vocabulary adoption and data cleaning that can serve as a general case study on the future compatibility of interaction databases

    Background frequencies for residue variability estimates: BLOSUM revisited

    Background: Shannon entropy applied to columns of multiple sequence alignments as a score of residue conservation has proven one of the most fruitful ideas in bioinformatics. This straightforward and intuitively appealing measure clearly shows the regions of a protein under increased evolutionary pressure, highlighting their functional importance. The inability of the column entropy to differentiate between residue types, however, limits its resolution power. Results: In this work we suggest generalizing Shannon's expression to a function with similar mathematical properties that, at the same time, includes observed propensities of residue types to mutate to each other. To do that, we revisit the original construction of BLOSUM matrices and re-interpret them as mutation probability matrices. These probabilities are then used as background frequencies in the revised residue conservation measure. Conclusion: We show that joint entropy with BLOSUM-proportional probabilities as a reference distribution enables detection of protein functional sites comparable in quality to a time-costly maximum-likelihood evolution simulation method (rate4site), and offers greater resolution than the Shannon entropy alone, in particular in the cases when the available sequences are of narrow evolutionary scope.
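
The limitation described in the background can be seen in a minimal sketch of plain column entropy. This is not the authors' revised measure, only the standard Shannon score it generalizes; the function name and the toy columns are assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residue frequencies in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # Sum the per-residue terms -p * log2(p).
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Toy alignment columns (hypothetical data), one character per sequence.
fully_conserved = "LLLLLLLL"
conservative_mix = "LLLLIIII"  # leucine / isoleucine: chemically similar
radical_mix = "LLLLDDDD"       # leucine / aspartate: chemically dissimilar

for name, col in [("conserved", fully_conserved),
                  ("L/I mix", conservative_mix),
                  ("L/D mix", radical_mix)]:
    print(name, column_entropy(col))
```

The two mixed columns receive identical scores because the entropy depends only on the frequencies, not on which residues are present. That is the gap the abstract proposes to close by bringing in BLOSUM-derived background probabilities.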

    Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations

    BACKGROUND: Microarrays measure the binding of nucleotide sequences to a set of sequence specific probes. This information is combined with annotation specifying the relationship between probes and targets and used to make inferences about transcript- and, ultimately, gene expression. In some situations, a probe is capable of hybridizing to more than one transcript, in others, multiple probes can target a single sequence. These 'multiply targeted' probes can result in non-independence between measured expression levels. RESULTS: An analysis of these relationships for Affymetrix arrays considered both the extent and influence of exact matches between probe and transcript sequences. For the popular HGU133A array, approximately half of the probesets were found to interact in this way. Both real and simulated expression datasets were used to examine how these effects influenced the expression signal. It was found not only to lead to increased signal strength for the affected probesets, but the major effect is to significantly increase their correlation, even in situations when only a single probe from a probeset was involved. By building a network of probe-probeset-transcript relationships, it is possible to identify families of interacting probesets. More than 10% of the families contain members annotated to different genes or even different Unigene clusters. Within a family, a mixture of genuine biological and artefactual correlations can occur. CONCLUSION: Multiple targeting is not only prevalent, but also significant. The ability of probesets to hybridize to more than one gene product can lead to false positives when analysing gene expression. Comprehensive annotation describing multiple targeting is required when interpreting array data
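
The idea of grouping probesets into interacting families can be sketched as a connected-components problem. The example below is a simplification under stated assumptions: it links probesets directly to transcripts (the paper describes a probe-probeset-transcript network), and the identifiers and function name are hypothetical.

```python
from collections import defaultdict


def probeset_families(matches):
    """Group probesets that are linked through shared transcripts.

    `matches` is an iterable of (probeset_id, transcript_id) pairs, each
    meaning that some probe in the probeset exactly matches the transcript.
    Returns a sorted list of sorted probeset-id lists, one per family.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for probeset, transcript in matches:
        # Tag nodes so a probeset and a transcript can never share a key.
        parent[find(("probeset", probeset))] = find(("transcript", transcript))

    families = defaultdict(set)
    for kind, name in list(parent):
        if kind == "probeset":
            families[find((kind, name))].add(name)
    return sorted(sorted(members) for members in families.values())


# Hypothetical matches: ps2 hits two transcripts and so bridges ps1 and ps3.
example = [("ps1", "tA"), ("ps2", "tA"), ("ps2", "tB"),
           ("ps3", "tB"), ("ps4", "tC")]
print(probeset_families(example))
```

A family with more than one member marks probesets whose measured expression levels may be correlated for technical rather than biological reasons.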

    Identification of β-catenin binding regions in colon cancer cells using ChIP-Seq

    Deregulation of the Wnt/β-catenin signaling pathway is a hallmark of colon cancer. Mutations in the adenomatous polyposis coli (APC) gene occur in the vast majority of colorectal cancers and are an initiating event in cellular transformation. Cells harboring mutant APC contain elevated levels of the β-catenin transcription coactivator in the nucleus which leads to abnormal expression of genes controlled by β-catenin/T-cell factor 4 (TCF4) complexes. Here, we use chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-Seq) to identify β-catenin binding regions in HCT116 human colon cancer cells. We localized 2168 β-catenin enriched regions using a concordance approach for integrating the output from multiple peak alignment algorithms. Motif discovery algorithms found a core TCF4 motif (T/A–T/A–C–A–A–A–G), an extended TCF4 motif (A/T/G–C/G–T/A–T/A–C–A–A–A–G) and an AP-1 motif (T–G–A–C/T–T–C–A) to be significantly represented in β-catenin enriched regions. Furthermore, 417 regions contained both TCF4 and AP-1 motifs. Genes associated with TCF4 and AP-1 motifs bound β-catenin, TCF4 and c-Jun in vivo and were activated by Wnt signaling and serum growth factors. Our work provides evidence that Wnt/β-catenin and mitogen signaling pathways intersect directly to regulate a defined set of target genes
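
The degenerate motifs quoted above map directly onto regular-expression character classes. The following sketch scans a sequence for the core TCF4 and AP-1 motifs; it is an illustration only, assuming a forward-strand scan of a made-up sequence, and is not the motif discovery procedure used in the study.

```python
import re

# Core TCF4 motif from the abstract: T/A, T/A, C, A, A, A, G
CORE_TCF4 = re.compile(r"[TA][TA]CAAAG")
# AP-1 motif from the abstract: T, G, A, C/T, T, C, A
AP1 = re.compile(r"TGA[CT]TCA")


def motif_positions(pattern, sequence):
    """Return 0-based start positions of motif matches on the given strand."""
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Hypothetical region containing one copy of each motif.
region = "GGATCAAAGCCTGACTCAGG"
tcf4_hits = motif_positions(CORE_TCF4, region)
ap1_hits = motif_positions(AP1, region)
print("TCF4:", tcf4_hits, "AP-1:", ap1_hits)
print("contains both motifs:", bool(tcf4_hits) and bool(ap1_hits))
```

A fuller analysis would also scan the reverse complement and assess enrichment against a background, neither of which is attempted here.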

    ‘Genome design’ model and multicellular complexity: golden middle

    Human tissue-specific genes were reported to be longer than housekeeping genes (in both coding and intronic parts). Competing neutralist and adaptationist models were proposed to explain this observation. Here I show that in the human genome the longest genes are those with an intermediate expression pattern. From the standpoint of information theory, the regulation of such genes should be the most complex. In the genome-wide context, they are found here to have a higher informational load at all available levels: from participation in protein interaction networks, pathways and modules reflected in Gene Ontology categories, through transcription factor regulatory sets and protein functional domains, to amino acid tuples (words) in encoded proteins and nucleotide tuples in introns and promoter regions. Thus, the intermediately expressed genes have a higher functional and regulatory complexity that is reflected in their greater length (which is consistent with the ‘genome design’ model). The dichotomy of housekeeping versus tissue-specific entities is more pronounced at the modular level than at the molecular level. There are far fewer intermediate-specific modules (modules overrepresented in the intermediately expressed genes) than housekeeping or tissue-specific modules (normalized to gene number). The dichotomy of housekeeping versus tissue-specific genes and modules in multicellular organisms is probably caused by the burden of regulatory complexity acting on the intermediately expressed genes.

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics
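
The contig and scaffold N50 values quoted above follow a standard definition: the length L such that sequences of length L or longer together cover at least half of the total assembly. A minimal sketch, using made-up lengths rather than the seabass data:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half the total.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")


# Hypothetical contig lengths: total 290, so the halfway mark is 145.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))  # 80 + 70 = 150 >= 145, so this prints 70
```

Because N50 is weighted by length, a few very long scaffolds can raise it sharply, which is why it is usually reported together with the fraction of the genome covered.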

    Comprehensive splice-site analysis using comparative genomics

    We have collected over half a million splice sites from five species—Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana—and classified them into four subtypes: U2-type GT–AG and GC–AG and U12-type GT–AG and AT–AC. We have also found new examples of rare splice-site categories, such as U12-type introns without canonical borders, and U2-dependent AT–AC introns. The splice-site sequences and several tools to explore them are available on a public website (SpliceRack). For the U12-type introns, we find several features conserved across species, as well as a clustering of these introns on genes. Using the information content of the splice-site motifs, and the phylogenetic distance between them, we identify: (i) a higher degree of conservation in the exonic portion of the U2-type splice sites in more complex organisms; (ii) conservation of exonic nucleotides for U12-type splice sites; (iii) divergent evolution of C.elegans 3′ splice sites (3′ss) and (iv) distinct evolutionary histories of 5′ and 3′ss. Our study proves that the identification of broad patterns in naturally-occurring splice sites, through the analysis of genomic datasets, provides mechanistic and evolutionary insights into pre-mRNA splicing