43 research outputs found

    The Structure-Function Linkage Database

    The Structure–Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure–function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies ‘look alike’, making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.

    The International Gene Trap Consortium Website: a portal to all publicly available gene trap cell lines in mouse

    Gene trapping is a method of generating murine embryonic stem (ES) cell lines containing insertional mutations in known and novel genes. A number of international groups have used this approach to create sizeable public cell line repositories available to the scientific community for the generation of mutant mouse strains. The major gene trapping groups worldwide have recently joined together to centralize access to all publicly available gene trap lines by developing a user-oriented Website for the International Gene Trap Consortium (IGTC). This collaboration provides an impressive public informatics resource comprising ∼45 000 well-characterized ES cell lines which currently represent ∼40% of known mouse genes, all freely available for the creation of knockout mice on a non-collaborative basis. To standardize annotation and provide high confidence data for gene trap lines, a rigorous identification and annotation pipeline has been developed combining genomic localization and transcript alignment of gene trap sequence tags to identify trapped loci. This information is stored in a new bioinformatics database accessible through the IGTC Website interface. The IGTC Website allows users to browse and search the database for trapped genes, BLAST sequences against gene trap sequence tags, and view trapped genes within biological pathways. In addition, IGTC data have been integrated into major genome browsers and bioinformatics sites to provide users with outside portals for viewing these data. The development of the IGTC Website marks a major advance by providing the research community with the data and tools necessary to effectively use public gene trap resources for the large-scale characterization of mammalian gene function.

    Genetic Variation in the Proximal Promoter of ABC and SLC Superfamilies: Liver and Kidney Specific Expression and Promoter Activity Predict Variation

    Membrane transporters play crucial roles in the cellular uptake and efflux of an array of small molecules including nutrients, environmental toxins, and many clinically used drugs. We hypothesized that common genetic variation in the proximal promoter regions of transporter genes contributes to observed variation in drug response. A total of 579 polymorphisms were identified in the proximal promoters (−250 to +50 bp) and flanking 5′ sequence of 107 transporters in the ATP Binding Cassette (ABC) and Solute Carrier (SLC) superfamilies in 272 DNA samples from ethnically diverse populations. Many transporter promoters contained multiple common polymorphisms. Using a sliding window analysis, we observed that, on average, nucleotide diversity (π) was lowest at approximately 300 bp upstream of the transcription start site, suggesting that this region may harbor important functional elements. The proximal promoters of transporters that were highly expressed in the liver had greater nucleotide diversity than those that were highly expressed in the kidney, consistent with greater negative selective pressure on the promoters of kidney transporters. Twenty-one promoters were evaluated for activity using reporter assays. Greater nucleotide diversity was observed in promoters with strong activity compared to promoters with weak activity, suggesting that weak promoters are under more negative selective pressure than promoters with high activity. Collectively, these results suggest that the proximal promoter region of membrane transporters is rich in variation and that variants in these regions may play a role in interindividual variation in drug disposition and response.
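The sliding window analysis of nucleotide diversity mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the estimator, window size, or step, so everything below is an assumption: it uses the standard unbiased per-site estimate π = n/(n−1) · (1 − Σ pᵢ²) averaged over each window, and the function names and toy alignment are hypothetical.

```python
from collections import Counter


def site_diversity(column):
    """Unbiased per-site nucleotide diversity for one alignment column.

    Equals the fraction of sequence pairs that differ at this site.
    """
    n = len(column)
    if n < 2:
        return 0.0
    counts = Counter(column)
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return n * (1.0 - homozygosity) / (n - 1)


def sliding_pi(sequences, window, step):
    """Return (window_start, mean per-site pi) along equal-length aligned sequences."""
    length = len(sequences[0])
    per_site = [site_diversity([s[i] for s in sequences]) for i in range(length)]
    return [
        (start, sum(per_site[start:start + window]) / window)
        for start in range(0, length - window + 1, step)
    ]


# Hypothetical toy alignment: four samples, eight aligned positions.
toy = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGA"]
for start, pi in sliding_pi(toy, window=4, step=2):
    print(start, round(pi, 3))
```

Windows with the lowest mean π are the candidates for regions under stronger constraint, which is the kind of signal the abstract reports upstream of the transcription start site.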

    Comparison of methods for genomic localization of gene trap sequences

    Background: Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. Results: In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion: The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.
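One way to score how well an aligner recovers exon boundaries against known coordinates is sketched below. This is not the evaluation procedure from the paper, which the abstract does not describe; the function name, the tolerance parameter, and the coordinates are hypothetical and only illustrate the idea of comparing aligned block ends with reference exon ends.

```python
def boundary_recall(known_exons, aligned_blocks, tolerance=0):
    """Fraction of known exon boundaries matched by some aligned block boundary.

    Both arguments are lists of (start, end) genome coordinates. A known
    boundary counts as found if any block boundary lies within `tolerance` bases.
    """
    known = {pos for start, end in known_exons for pos in (start, end)}
    found = {pos for start, end in aligned_blocks for pos in (start, end)}
    if not known:
        return 0.0
    hits = sum(1 for k in known if any(abs(k - f) <= tolerance for f in found))
    return hits / len(known)


# Hypothetical coordinates: the second block starts 5 bases late.
known = [(100, 200), (300, 400)]
blocks = [(100, 200), (305, 400)]
print(boundary_recall(known, blocks))               # exact matches only
print(boundary_recall(known, blocks, tolerance=5))  # allow a 5-base slip
```

Running the same function over the alignment blocks reported by each program would give a per-program boundary score that can be compared directly.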

    Enhancing Data Sharing in Collaborative Research Projects with DASH. T.E. Ferrin, C.C. Huang, D.M. Greenblatt, D. Stryke, K.M. Giacomini, and J.H. Morris. Pacific Symposium on Biocomputing 10:260-271 (2005)

    We describe a software framework, called DASH, that enables the facile access, maintenance, curation and sharing of computational biology data among collaborating research scientists. The DASH event-based framework enables members of team-based research projects to describe the multistep computational processing pipelines frequently required to generate data for sharing, monitors multiple distributed data stores for changes, and then automatically invokes the appropriate processing pipeline(s). These pipelines can be used to communicate the results of data analyses to collaborators using mechanisms such as Web Services. We describe the overall design of the DASH system and the application of a simple DASH prototype to a collaborative pharmacogenomics research project involving several dozen researchers located at several different sites: the UCSF Pharmacogenetics of Membrane Transporters project.
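The monitor-and-invoke pattern described in this abstract can be illustrated with a minimal sketch. It is not the DASH implementation or its API, neither of which the abstract specifies. The class, method names, and in-memory "data store" are hypothetical, and change detection by content hashing and polling is an assumption made to keep the example self-contained.

```python
import hashlib
from typing import Callable, Dict, List


class ChangeMonitor:
    """Polls named data stores and runs registered pipelines when content changes."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[], bytes]] = {}
        self._pipelines: Dict[str, List[Callable[[bytes], None]]] = {}
        self._fingerprints: Dict[str, str] = {}

    def watch(self, name: str, reader: Callable[[], bytes],
              pipeline: Callable[[bytes], None]) -> None:
        """Register a store (via a reader callable) and a pipeline to run on change."""
        self._readers[name] = reader
        self._pipelines.setdefault(name, []).append(pipeline)

    def poll(self) -> List[str]:
        """Check every store once; run pipelines for stores whose content changed."""
        changed = []
        for name, reader in self._readers.items():
            data = reader()
            digest = hashlib.sha256(data).hexdigest()
            if self._fingerprints.get(name) != digest:
                self._fingerprints[name] = digest
                for pipeline in self._pipelines[name]:
                    pipeline(data)
                changed.append(name)
        return changed


# Hypothetical in-memory store standing in for a remote data source.
store = {"genotypes": b"sample1\tA/G"}
shared_results: List[str] = []

monitor = ChangeMonitor()
monitor.watch("genotypes", lambda: store["genotypes"],
              lambda data: shared_results.append(data.decode().upper()))

first = monitor.poll()    # first sighting counts as a change
second = monitor.poll()   # nothing changed, so no pipeline runs
store["genotypes"] = b"sample1\tA/A"
third = monitor.poll()    # content changed, pipeline runs again
print(first, second, third, shared_results)
```

In a real deployment the reader would query a remote database or file server and the pipeline would publish its results to collaborators, for example through a web service as the abstract mentions.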