370 research outputs found

    Inconsistencies over time in 5% of NetAffx probe-to-gene annotations

    BACKGROUND: DNA microarray probes are designed to match particular mRNA transcripts, often based on expressed sequences such as ESTs or cDNAs, which are frequently incomplete. As a result, the relations between probes and genes can change as the sequence data are updated. However, the reported results of microarray analyses are often given only as lists of genes, without any reference to the underlying probes. RESULTS: We show for a particular commercial microarray design that the number of probes associated with some genes changes over time. These changes affect approximately 5% of the probe sets across the history of annotation releases over a two-year span. CONCLUSION: We recommend reporting probe set identifiers when publishing microarray results, and submitting those analyses to public microarray databases to ensure that the interpretation of the data is updated with the latest set of annotations.
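The kind of comparison described in this abstract can be illustrated with a minimal sketch. It assumes each annotation release is available as a mapping from probe set identifier to a set of gene symbols; the identifiers, gene names, and the function `changed_probe_sets` are hypothetical and are not taken from the paper or from NetAffx.

```python
def changed_probe_sets(old, new):
    """Return the probe set IDs whose gene assignment differs between two releases.

    A probe set that is present in only one release also counts as changed.
    """
    all_ids = set(old) | set(new)
    return sorted(p for p in all_ids if old.get(p) != new.get(p))


# Hypothetical toy annotation tables: probe set ID -> set of gene symbols.
release_a = {"ps_1": {"GENE1"}, "ps_2": {"GENE2"}, "ps_3": {"GENE3"}, "ps_4": set()}
release_b = {"ps_1": {"GENE1"}, "ps_2": {"GENE2B"}, "ps_3": {"GENE3"}, "ps_4": set()}

changed = changed_probe_sets(release_a, release_b)
fraction = len(changed) / len(set(release_a) | set(release_b))
print(changed, fraction)
```

Reporting the probe set identifiers alongside gene names, as the abstract recommends, is what makes a later re-run of such a comparison possible.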

    K2D2: Estimation of protein secondary structure from circular dichroism spectra

    BACKGROUND: Circular dichroism spectroscopy is a widely used technique to analyze the secondary structure of proteins in solution. Predictive methods use the circular dichroism spectra from proteins of known tertiary structure to assess the secondary structure contents of a protein with unknown structure given its circular dichroism spectrum. RESULTS: We developed K2D2, a method with an associated web server to estimate protein secondary structure from circular dichroism spectra. The method uses a self-organized map of spectra from proteins with known structure to deduce a map of protein secondary structure that is used to do the predictions. CONCLUSION: The K2D2 server is publicly accessible at http://www.ogic.ca/projects/k2d2/. It accepts as input a circular dichroism spectrum and outputs the estimated secondary structure content (alpha-helix and beta-strand) of the corresponding protein, as well as an estimated measure of error.

    Update of the G2D tool for prioritization of gene candidates to inherited diseases

    G2D (genes to diseases) is a web resource for prioritizing genes as candidates for inherited diseases. It uses three algorithms based on different prioritization strategies. The input to the server is the genomic region where the user is looking for the disease-causing mutation, plus an additional piece of information depending on the algorithm used. This information can be the disease phenotype (described as an Online Mendelian Inheritance in Man (OMIM) identifier), one or several genes known or suspected to be associated with the disease (defined by their Entrez Gene identifiers), or a second genomic region that has also been linked to the disease. In the latter case, the tool uses known or predicted interactions between genes in the two regions extracted from the STRING database. The output in every case is an ordered list of candidate genes in the region of interest. For the first two of the three methods, the candidate genes are first retrieved through sequence homology search, then scored according to the corresponding method. This means that some of them will correspond to well-characterized genes, and others will overlap with predicted genes, thus providing a wider analysis. G2D is publicly available at http://www.ogic.ca/projects/g2d_2

    Outer membrane pore protein prediction in mycobacteria using genomic comparison

    Proteins responsible for outer membrane transport across the unique membrane structure of Mycobacterium spp. are attractive drug targets in the treatment of human diseases caused by the mycobacterial pathogens M. tuberculosis, M. bovis, M. leprae and M. ulcerans. In contrast to E. coli, relatively few outer membrane proteins (OMPs) have been identified in Mycobacterium spp., largely due to the difficulties in isolating mycobacterial membrane proteins and our incomplete understanding of secretion mechanisms and cell wall structure in these organisms. To further expand our knowledge of these elusive proteins in Mycobacterium, we have improved upon our previous method of OMP prediction in mycobacteria by taking advantage of genomic data from seven mycobacterial species. Our improved algorithm suggests 4333 sequences as putative OMPs in these seven species with varying degrees of confidence. The most virulent pathogenic mycobacterial species are slightly enriched in these selected sequences. We present examples of predicted OMPs involved in horizontal transfer and paralogy expansion. Analysis of local secondary structure content allowed us to identify small domains predicted to perform as OMPs; some examples show their involvement in events of tandem duplication and domain rearrangements. We discuss the taxonomic distribution of these discovered families and architectures, often specific to mycobacteria or the wider taxonomic class of Actinobacteria. Our results suggest that OMP functionality in mycobacteria is richer than expected and provide a resource to guide future research on these understudied proteins.

    Linking genes to diseases: it's all in the data

    Genome-wide association analyses on large patient cohorts are generating large sets of candidate disease genes. This is coupled with the availability of ever-increasing genomic databases and a rapidly expanding repository of biomedical literature. Computational approaches to disease-gene association attempt to harness these data sources to identify the most likely disease gene candidates for further empirical analysis by translational researchers, resulting in efficient identification of genes of diagnostic, prognostic and therapeutic value. Existing computational methods analyze gene structure and sequence, functional annotation of candidate genes, characteristics of known disease genes, gene regulatory networks, protein-protein interactions, data from animal models and disease phenotype. To date, a few studies have successfully applied computational analysis of clinical phenotype data for specific diseases and shown genetic associations. In the near future, computational strategies will be facilitated by improved integration of clinical and computational research, and by increased availability of clinical phenotype data in a format accessible to computational approaches

    Information extraction from full text scientific articles: Where are the keywords?

    BACKGROUND: To date, many of the methods for extracting biological information from scientific articles are restricted to the abstract of the article. However, full text articles in electronic form, which offer larger sources of data, are currently available. Several questions arise as to whether the effort of scanning full text articles is worthwhile, and whether the information that can be extracted from the different sections of an article is relevant. RESULTS: In this work we addressed those questions by showing that the keyword content of the different sections of a standard scientific article (abstract, introduction, methods, results, and discussion) is very heterogeneous. CONCLUSIONS: Although the abstract contains the best ratio of keywords per total of words, other sections of the article may be a better source of biologically relevant data.
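The "ratio of keywords per total of words" mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the tokenizer, the keyword set, the example sentences, and the function name `keyword_ratio` are all hypothetical, and the paper's actual keyword definition is not given here.

```python
import re


def keyword_ratio(text, keywords):
    """Fraction of the words in `text` that belong to the keyword set."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in keywords)
    return hits / len(words)


# Hypothetical article sections and keyword list, for illustration only.
sections = {
    "abstract": "Oct4 regulates pluripotency genes",
    "methods": "Cells were grown and RNA was extracted for the Oct4 assay",
}
keywords = {"oct4", "pluripotency", "rna"}

ratios = {name: keyword_ratio(text, keywords) for name, text in sections.items()}
print(ratios)
```

Note that a section with a lower ratio can still contain more keywords in absolute terms simply because it is longer, which is consistent with the abstract's conclusion.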

    Amplification of the Gene Ontology annotation of Affymetrix probe sets

    BACKGROUND: The annotations of Affymetrix DNA microarray probe sets with Gene Ontology terms are carefully selected for correctness. This results in very accurate but incomplete annotations, which is not always desirable for microarray experiment evaluation. RESULTS: Here we present a protocol to amplify the set of Gene Ontology annotations associated with Affymetrix DNA microarray probe sets using information from related databases. CONCLUSION: Predicted novel annotations and the evidence producing them can be accessed at Probe2GO: . Scripts are available on demand

    Oct4 Targets Regulatory Nodes to Modulate Stem Cell Function

    Stem cells are characterized by two defining features: the ability to self-renew and the ability to differentiate into highly specialized cell types. The POU homeodomain transcription factor Oct4 (Pou5f1) is an essential mediator of the embryonic stem cell state and has been implicated in lineage-specific differentiation, adult stem cell identity, and cancer. Recent descriptions of the regulatory networks which maintain ‘ES’ have highlighted a dual role for Oct4 in the transcriptional activation of genes required to maintain self-renewal and pluripotency while concomitantly repressing genes which facilitate lineage-specific differentiation. However, the molecular mechanism by which Oct4 mediates differential activation or repression at these loci to either maintain stem cell identity or facilitate the emergence of alternate transcriptional programs required for the realization of lineage remains to be elucidated. To further investigate Oct4 function, we employed gene expression profiling together with a robust statistical analysis to identify genes highly correlated to Oct4. Gene Ontology analysis to categorize overrepresented genes has led to the identification of themes which may prove essential to stem cell identity, including chromatin structure, nuclear architecture, cell cycle control, DNA repair, and apoptosis. Our experiments have identified previously unappreciated roles for Oct4 in, first, regulating chromatin structure in a state consistent with self-renewal and pluripotency and, second, facilitating the expression of genes that keep the cell poised to respond to cues that lead to differentiation. Together, these data define the mechanism by which Oct4 orchestrates cellular regulatory pathways to enforce the stem cell state and provide important insight into stem cell function and cancer.

    A method for cell type marker discovery by high-throughput gene expression analysis of mixed cell populations

    BACKGROUND: Gene transcripts specifically expressed in a particular cell type (cell-type specific gene markers) are useful for its detection and isolation from a tissue or other cell mixtures. However, finding informative marker genes can be problematic when working with a poorly characterized cell type, as markers can only be unequivocally determined once the cell type has been isolated. We propose a method that could identify marker genes of an uncharacterized cell type within a mixed cell population, provided that the proportion of the cell type of interest in the mixture can be estimated by some indirect method, such as a functional assay. RESULTS: We show that cell-type specific gene markers can be identified from the global gene expression of several cell mixtures that contain the cell type of interest in a known proportion by their high correlation to the concentration of the corresponding cell type across the mixtures. CONCLUSIONS: Genes detected using this high-throughput strategy would be candidate markers that may be useful in detecting or purifying a cell type from a particular biological context. We present an experimental proof-of-concept of this method using cell mixtures of various well-characterized hematopoietic cell types, and we evaluate the performance of the method in a benchmark that explores the requirements and range of validity of the approach
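The core idea of this abstract, ranking genes by how well their expression tracks the known proportion of the cell type across mixtures, can be sketched as below. This is a minimal illustration that assumes Pearson correlation as the measure; the abstract only says "high correlation", and the gene names, expression values, and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no information about the proportion
    return cov / (sd_x * sd_y)


def rank_candidate_markers(expression, proportions):
    """Sort genes by correlation between expression and cell-type proportion."""
    scores = {gene: pearson(values, proportions) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: proportion of the cell type of interest in five mixtures,
# and the expression of three genes measured in the same mixtures.
proportions = [0.0, 0.25, 0.5, 0.75, 1.0]
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with the proportion
    "gene_b": [5.0, 5.0, 5.0, 5.0, 5.0],  # constant across mixtures
    "gene_c": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls with the proportion
}

ranking = rank_candidate_markers(expression, proportions)
print(ranking)
```

Genes at the top of such a ranking would only be candidate markers, as the abstract stresses; they would still need experimental validation.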

    Identifying gene-disease associations using centrality on a literature mined gene-interaction network

    Motivation: Understanding the role of genetics in diseases is one of the most important aims of the biological sciences. The completion of the Human Genome Project has led to a rapid increase in the number of publications in this area. However, the coverage of curated databases that provide information manually extracted from the literature is limited. Another challenge is that determining disease-related genes requires laborious experiments. Therefore, predicting good candidate genes before experimental analysis will save time and effort. We introduce an automatic approach based on text mining and network analysis to predict gene-disease associations. We collected an initial set of known disease-related genes and built an interaction network by automatic literature mining based on dependency parsing and support vector machines. Our hypothesis is that the central genes in this disease-specific network are likely to be related to the disease. We used the degree, eigenvector, betweenness and closeness centrality metrics to rank the genes in the network
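The ranking step described in this abstract can be sketched for two of the four metrics it names, degree and closeness centrality. This is a simplified illustration, not the authors' pipeline: the toy network and gene labels are hypothetical, the text-mining step is omitted, and eigenvector and betweenness centrality are left out for brevity.

```python
from collections import deque


def degree_centrality(graph):
    """Number of neighbors of each node, normalized by the maximum possible (n - 1)."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """Reachable node count divided by the sum of shortest-path distances (BFS)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical undirected gene-interaction network as an adjacency mapping.
graph = {
    "G1": {"G2", "G3", "G4"},
    "G2": {"G1"},
    "G3": {"G1"},
    "G4": {"G1", "G5"},
    "G5": {"G4"},
}

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
ranked_by_degree = sorted(graph, key=lambda g: degree[g], reverse=True)
print(ranked_by_degree[0], closeness["G1"])
```

Under the abstract's hypothesis, genes near the top of such rankings would be the ones proposed as likely disease-related candidates.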