    PhyloMap: an algorithm for visualizing relationships of large sequence data sets and its application to the influenza A virus genome

    Background: Results of phylogenetic analysis are often visualized as phylogenetic trees. Such a tree can typically only include up to a few hundred sequences. When more than a few thousand sequences are to be included, analyzing the phylogenetic relationships among them becomes a challenging task. The recent frequent outbreaks of influenza A viruses have resulted in the rapid accumulation of corresponding genome sequences. Currently, there are more than 7500 influenza A virus genomes in the database. There are no efficient ways of representing this huge data set as a whole, thus preventing a further understanding of the diversity of the influenza A virus genome. Results: Here we present a new algorithm, "PhyloMap", which combines ordination, vector quantization, and phylogenetic tree construction to give an elegant representation of a large sequence data set. The use of PhyloMap on influenza A virus genome sequences reveals the phylogenetic relationships of the internal genes that cannot be seen when only a subset of sequences are analyzed. Conclusions: The application of PhyloMap to influenza A virus genome data shows that it is a robust algorithm for analyzing large sequence data sets. It utilizes the entire data set, minimizes bias, and provides intuitive visualization. PhyloMap is implemented in JAVA, and the source code is freely available at http://www.biochem.uni-luebeck.de/public/software/phylomap.html

    A novel scoring schema for peptide identification by searching protein sequence databases using tandem mass spectrometry data

    BACKGROUND: Tandem mass spectrometry (MS/MS) is a powerful tool for protein identification. Although great efforts have been made in scoring the correlation between tandem mass spectra and an amino acid sequence database, improvements could be made in three aspects: characterization of peaks in spectra, adoption of effective scoring functions, and assessment of the reliability of matching between peptides and spectra. RESULTS: A novel scoring function is presented, along with criteria to estimate the performance confidence of the function. Through learning the types of product ions and the probability of generating them, a hypothetic spectrum was generated for each candidate peptide. Then relative entropy was introduced to measure the similarity between the hypothetic and the observed spectra. Based on the extreme value distribution (EVD) theory, a threshold was chosen to distinguish a true peptide assignment from a random one. Tests on a public MS/MS dataset demonstrated that this method performs better than the well-known SEQUEST. CONCLUSION: A reliable identification of proteins from the spectra promises a more efficient application of tandem mass spectrometry to proteomes with high complexity.
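
The relative-entropy comparison mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's scoring function: the binning of peaks into equal-length intensity lists, the pseudocount used to avoid division by zero, the direction of the divergence (observed relative to hypothetical), and the names `normalize` and `relative_entropy` are all assumptions made for illustration.

```python
import math


def normalize(intensities, pseudocount=1e-6):
    """Turn binned peak intensities into a probability distribution.

    The pseudocount is an assumption of this sketch; it keeps empty bins
    from producing log(0) or division by zero.
    """
    smoothed = [x + pseudocount for x in intensities]
    total = sum(smoothed)
    return [x / total for x in smoothed]


def relative_entropy(observed, hypothetical):
    """Kullback-Leibler divergence D(observed || hypothetical) in nats.

    Both inputs are lists of non-negative intensities over the same m/z bins.
    A value near 0 means the two spectra have nearly the same shape.
    """
    if len(observed) != len(hypothetical):
        raise ValueError("spectra must be binned onto the same m/z grid")
    p = normalize(observed)
    q = normalize(hypothetical)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    observed = [10, 0, 5]        # made-up binned intensities
    close_match = [9, 1, 5]      # hypothetical spectrum of a good candidate
    poor_match = [0, 10, 5]      # hypothetical spectrum of a poor candidate
    print(relative_entropy(observed, close_match))
    print(relative_entropy(observed, poor_match))
```

Under these assumptions a lower value indicates a closer match, so candidate peptides would be ranked by ascending divergence. The abstract's EVD-based threshold for separating true from random assignments is not reproduced here.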

    Investigating citrullinated proteins in tumour cell lines

    MyBASE: a database for genome polymorphism and gene function studies of Mycobacterium

    Background: Mycobacterial pathogens are a major threat to humans. With the increasing availability of functional genomic data, research on mycobacterial pathogenesis and subsequent control strategies will be greatly accelerated. It has been suggested that genome polymorphisms, namely large sequence polymorphisms, can influence the pathogenicity of different mycobacterial strains. However, there is currently no database dedicated to mycobacterial genome polymorphisms with functional interpretations. Description: We have developed a mycobacterial database (MyBASE) housing genome polymorphism data and gene functions to provide the mycobacterial research community with a useful information resource and analysis platform. Whole genome comparison data produced by our lab and the novel genome polymorphisms identified were deposited into MyBASE. Extensive literature review of genome polymorphism data, mainly large sequence polymorphisms (LSPs), operon predictions and curated annotations of virulence and essentiality of mycobacterial genes are unique features of MyBASE. Large-scale genomic data integration from public resources makes MyBASE a comprehensive data warehouse useful for current research. All data is cross-linked and can be graphically viewed via a toolbox in MyBASE. Conclusion: As an integrated platform focused on the collection of experimental data from our own lab and published literature, MyBASE will facilitate analysis of genome structure and polymorphisms, which will provide insight into genome evolution. Importantly, the database will also facilitate the comparison of virulence factors among various mycobacterial strains. MyBASE is freely accessible via http://mybase.psych.ac.cn.

    Exploring the metabolic network of the epidemic pathogen Burkholderia cenocepacia J2315 via genome-scale reconstruction

    Background: Burkholderia cenocepacia is a threatening nosocomial epidemic pathogen in patients with cystic fibrosis (CF) or a compromised immune system. Its high level of antibiotic resistance is an increasing concern in treatments against its infection. Strain B. cenocepacia J2315 is the most infectious isolate from CF patients. There is a strong demand to reconstruct a genome-scale metabolic network of B. cenocepacia J2315 to systematically analyze its metabolic capabilities and its virulence traits, and to search for potential clinical therapy targets. Results: We reconstructed the genome-scale metabolic network of B. cenocepacia J2315. An iterative reconstruction process led to the establishment of a robust model, iKF1028, which accounts for 1,028 genes, 859 internal reactions, and 834 metabolites. The model iKF1028 captures important metabolic capabilities of B. cenocepacia J2315 with a particular focus on the biosyntheses of key metabolic virulence factors to assist in understanding the mechanism of disease infection and identifying potential drug targets. The model was tested through BIOLOG assays. Based on the model, the genome annotation of B. cenocepacia J2315 was refined and 24 genes were properly re-annotated. Gene and enzyme essentiality were analyzed to provide further insights into the genome function and architecture. A total of 45 essential enzymes were identified as potential therapeutic targets. Conclusions: As the first genome-scale metabolic network of B. cenocepacia J2315, iKF1028 allows a systematic study of the metabolic properties of B. cenocepacia and its key metabolic virulence factors affecting the CF community. The model can be used as a discovery tool to design novel drugs against diseases caused by this notorious pathogen.

    Influenza Virus Database (IVDB): an integrated information resource and analysis platform for influenza virus research

    Frequent outbreaks of highly pathogenic avian influenza and the increasing data available for comparative analysis require a central database specialized in influenza viruses (IVs). We have established the Influenza Virus Database (IVDB) to integrate information and create an analysis platform for genetic, genomic, and phylogenetic studies of the virus. IVDB hosts complete genome sequences of influenza A virus generated by Beijing Institute of Genomics (BIG) and curates all other published IV sequences after expert annotation. Our Q-Filter system classifies and ranks all nucleotide sequences into seven categories according to sequence content and integrity. IVDB provides a series of tools and viewers for comparative analysis of the viral genomes, genes, genetic polymorphisms and phylogenetic relationships. A search system has been developed for users to retrieve a combination of different data types by setting search options. To facilitate analysis of global viral transmission and evolution, the IV Sequence Distribution Tool (IVDT) has been developed to display the worldwide geographic distribution of chosen viral genotypes and to couple genomic data with epidemiological data. The BLAST, multiple sequence alignment and phylogenetic analysis tools were integrated for online data analysis. Furthermore, IVDB offers instant access to pre-computed alignments and polymorphisms of IV genes and proteins, and presents the results as SNP distribution plots and minor allele distributions. IVDB is publicly available a

    Causal associations of sleep traits with cancer incidence and mortality

    This study explores the correlation and causality between multidimensional sleep traits and pan-cancer incidence and mortality among patients with cancer. Multivariable Cox regression, linear and nonlinear Mendelian randomization (MR), and survival curve analyses were conducted to assess the impacts of chronotype, sleep duration, and insomnia symptoms on pan-cancer risk (N = 326,417 from United Kingdom Biobank) and mortality (N = 23,956 from United Kingdom Biobank). In the Cox regression, we observed a linear and a J-shaped association of sleep duration with pan-cancer incidence and with mortality among cancer patients, respectively. In addition, there was a positive association of insomnia with pan-cancer incidence (HR, 1.03, 95% CI: 1.00–1.06, p = 0.035), all-cause mortality (HR, 1.17, 95% CI: 1.06–1.30, p = 0.002) and cancer mortality among cancer patients (HR, 1.25, 95% CI: 1.11–1.41, p < 0.001). In the linear MR, there was supporting evidence of positive associations between long sleep duration and pan-cancer incidence (OR, 1.41, 95% CI: 1.08–1.84, p = 0.012), and there was a positive association between long sleep duration and all-cause mortality in cancer patients (OR, 5.56, 95% CI: 3.15–9.82, p = 3.42E-09). Meanwhile, a strong association between insomnia and all-cause mortality in cancer patients (OR, 1.41, 95% CI: 1.27–1.56, p = 4.96E-11) was observed in the linear MR. These results suggest that long sleep duration and insomnia play important roles in pan-cancer risk and mortality among cancer patients. In addition to short sleep duration and insomnia, our findings highlight the effect of long sleep duration in cancer prevention and prognosis.