
    An Ontology-Based Framework for Clinical Research Databases

    The Ontology-Based eXtensible data model (OBX) was developed to serve as a framework for the development of a clinical research database in the Immunology Database and Analysis Portal (ImmPort) system. OBX was designed around the logical structure provided by the Basic Formal Ontology (BFO) and the Ontology for Biomedical Investigations (OBI). By using the logical structure provided by these two well-formulated ontologies, we found that a relatively simple, extensible data model could be developed to represent the complex domain of clinical research. In addition, the common framework provided by the BFO should make it straightforward to utilize OBX database data dictionaries based on reference and application ontologies from the OBO Foundry.

    A distribution-free convolution model for background correction of oligonucleotide microarray data

    Introduction: Affymetrix GeneChip® high-density oligonucleotide arrays are widely used in biological and medical research because their production reproducibility facilitates the comparison of results between experiment runs. In order to obtain high-level classification and cluster analysis that can be trusted, it is important to perform various pre-processing steps on the probe-level data to control for variability in sample processing and array hybridization. Many proposed preprocessing methods are parametric, in that they assume that the background noise generated by microarray data is a random sample from a statistical distribution, typically a normal distribution. The quality of the final results depends on the validity of such assumptions. Results: We propose a Distribution-Free Convolution Model (DFCM) to circumvent observed deficiencies in meeting and validating the distributional assumptions of parametric methods. Knowledge of array structure and the biological function of the probes indicates that the intensities of mismatch (MM) probes that correspond to the smallest perfect match (PM) intensities can be used to estimate the background noise. Specifically, we obtain the smallest q2 percent of the MM intensities that are associated with the lowest q1 percent of PM intensities, and use these intensities to estimate background. Conclusion: Using the Affymetrix Latin Square spike-in experiments, we show that the background noise generated by microarray experiments typically is not well modeled by a single overall normal distribution. We further show that the signal is not exponentially distributed, as is also commonly assumed. As a result, DFCM has better sensitivity and specificity, as measured by ROC curves and area under the curve (AUC), than MAS 5.0, RMA, RMA with no background correction (RMA-noBG), GCRMA, PLIER, and dChip (MBEI) for preprocessing of Affymetrix microarray data. These comparisons, performed on two spike-in data sets and one real data set, show that our nonparametric method is a superior alternative for background correction of Affymetrix data.
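The background-estimation step described above (keep the smallest q2 percent of MM intensities among probes whose paired PM intensities fall in the lowest q1 percent) can be sketched in a few lines. The quantile defaults and the use of the mean as the final estimate are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def dfcm_background(pm, mm, q1=0.25, q2=0.25):
    """Nonparametric background estimate in the spirit of DFCM.

    pm, mm : 1-D arrays of paired perfect-match / mismatch intensities.
    q1, q2 : illustrative quantile fractions (not the paper's defaults).
    Returns the background estimate (mean of the retained MM values)
    and the retained MM values themselves.
    """
    pm = np.asarray(pm, float)
    mm = np.asarray(mm, float)
    # MM probes whose paired PM intensity is in the lowest q1 fraction
    low_pm_mm = mm[pm <= np.quantile(pm, q1)]
    # keep the smallest q2 fraction of those MM intensities
    kept = np.sort(low_pm_mm)[: max(1, int(np.ceil(q2 * low_pm_mm.size)))]
    return kept.mean(), kept

# toy usage on simulated intensities
rng = np.random.default_rng(0)
pm = rng.lognormal(6, 1, 10_000)          # simulated PM intensities
mm = 50 + rng.exponential(20, 10_000)     # simulated MM intensities
bg, kept = dfcm_background(pm, mm)
```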

    Activation of the Syk tyrosine kinase is insufficient for downstream signal transduction in B lymphocytes

    BACKGROUND: Immature B lymphocytes and certain B cell lymphomas undergo apoptotic cell death following activation of the B cell antigen receptor (BCR) signal transduction pathway. Several biochemical changes occur in response to BCR engagement, including activation of the Syk tyrosine kinase. Although Syk activation appears to be necessary for some downstream biochemical and cellular responses, the signaling events that precede Syk activation remain ill-defined. In addition, the requirements for complete activation of the Syk-dependent signaling step remain to be elucidated. RESULTS: A mutant form of Syk carrying a combination of a K395A substitution in the kinase domain and substitutions of three phenylalanines (3F) for the three C-terminal tyrosines was expressed in a murine B cell lymphoma cell line, BCL(1).3B3, to interfere with normal Syk regulation as a means to examine the Syk activation step in BCR signaling. Introduction of this kinase-inactive mutant led to the constitutive activation of the endogenous wild-type Syk enzyme in the absence of receptor engagement through a 'dominant-positive' effect. Under these conditions, Syk kinase activation occurred in the absence of phosphorylation on Syk tyrosine residues. Although Syk appears to be required for BCR-induced apoptosis in several systems, no increase in spontaneous cell death was observed in these cells. Surprisingly, although the endogenous Syk kinase was enzymatically active, no enhancement in the phosphorylation of cytoplasmic proteins, including phospholipase Cγ2 (PLCγ2), a direct Syk target, was observed. CONCLUSION: These data indicate that activation of Syk kinase enzymatic activity is insufficient for Syk-dependent signal transduction, suggesting that other events are required for efficient signaling. We speculate that localization of the active enzyme to a receptor complex specifically assembled for signal transduction may be the missing event.

    A gene selection method for GeneChip array data with small sample sizes

    Background: In microarray experiments with small sample sizes, it is a challenge to estimate p-values accurately and to choose appropriate cutoff p-values for gene selection. Although permutation-based methods have proved to have greater sensitivity and specificity than the regular t-test, their p-values are highly discrete due to the limited number of permutations available at very small sample sizes. Furthermore, estimated permutation-based p-values for true nulls are highly correlated and not uniformly distributed between zero and one, making it difficult to use current false discovery rate (FDR)-controlling methods. Results: We propose a model-based information sharing method (MBIS) that, after an appropriate data transformation, utilizes information shared among genes. We use a normal distribution to model the mean differences of true nulls across two experimental conditions. The parameters of the model are then estimated using all data in hand. Based on this model, p-values, which are uniformly distributed for true nulls, are calculated. Then, since FDR-controlling methods are generally not well suited to microarray data with very small sample sizes, we select genes for a given cutoff p-value and then estimate the false discovery rate. Conclusion: Simulation studies and analysis of real microarray data show that the proposed method, MBIS, is more powerful and reliable than current methods, and it has wide application to a variety of situations.
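A rough sketch of the information-sharing idea can make the mechanics concrete. This is not the authors' exact MBIS procedure: the data transformation step is omitted, and robust estimators (median and MAD) stand in for the paper's model fitting; the point is only that one shared null distribution, estimated from all genes, yields well-behaved p-values even with tiny per-gene sample sizes:

```python
import numpy as np
from scipy import stats

def shared_null_pvalues(x, y):
    """Score each gene against a Normal null for mean differences,
    with the null's parameters estimated robustly from ALL genes.

    x, y : (genes, replicates) arrays for the two conditions.
    Returns two-sided p-values, one per gene.
    """
    d = x.mean(axis=1) - y.mean(axis=1)          # gene-wise mean differences
    mu = np.median(d)                            # robust centre of the null
    sigma = stats.median_abs_deviation(d, scale="normal")  # robust spread
    z = (d - mu) / sigma
    return 2 * stats.norm.sf(np.abs(z))          # two-sided p-values

# toy data: 1000 genes, 3 replicates per condition, 50 truly shifted genes
rng = np.random.default_rng(1)
x = rng.normal(0, 1, (1000, 3))
y = rng.normal(0, 1, (1000, 3))
y[:50] += 3                                      # "differentially expressed" genes
p = shared_null_pvalues(x, y)
```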

    Application of Random Matrix Theory to Biological Networks

    We show that the spectral fluctuations of the interaction matrices of a yeast core protein interaction network and a metabolic network follow the description of the Gaussian orthogonal ensemble (GOE) of random matrix theory (RMT). Furthermore, we demonstrate that while the global biological networks evaluated belong to the GOE, removal of interactions between constituents transitions the networks to systems of isolated modules described by the Poisson statistics of RMT. Our results indicate that although biological networks are very different from other complex systems at the molecular level, they display the same statistical properties at large scale. The transition point provides a new objective approach for the identification of functional modules.
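The statistic underlying this analysis is the nearest-neighbour spacing distribution of the matrix spectrum: GOE spectra show level repulsion (Wigner surmise, P(s) ∝ s·exp(-πs²/4)), while uncoupled modules give Poisson statistics, P(s) = exp(-s). The mean-spacing normalization below is a crude stand-in for proper spectral unfolding:

```python
import numpy as np

def spacing_distribution(adj):
    """Nearest-neighbour eigenvalue spacings of a symmetric interaction
    matrix, normalised so the mean spacing is 1 (crude unfolding)."""
    ev = np.sort(np.linalg.eigvalsh(adj))
    s = np.diff(ev)
    return s / s.mean()

# toy check: a dense symmetric Gaussian random matrix is GOE-like,
# so very small spacings are strongly suppressed relative to Poisson
rng = np.random.default_rng(2)
a = rng.normal(size=(400, 400))
s = spacing_distribution((a + a.T) / 2)
frac_small = (s < 0.1).mean()   # Poisson would give ~1 - exp(-0.1) ≈ 0.095
```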

    Automated Analysis of Flow Cytometry Data to Reduce Inter-Lab Variation in the Detection of Major Histocompatibility Complex Multimer-Binding T Cells

    Manual analysis of flow cytometry data and subjective gate-border decisions taken by individuals continue to be a source of variation in the assessment of antigen-specific T cells when comparing data across laboratories, and also over time in individual labs. Therefore, strategies to provide automated analysis of major histocompatibility complex (MHC) multimer-binding T cells represent an attractive solution to decrease subjectivity and technical variation. The challenge of using an automated analysis approach is that MHC multimer-binding T cell populations are often rare and therefore difficult to detect. We used a highly heterogeneous dataset from a recent MHC multimer proficiency panel to assess whether MHC multimer-binding CD8+ T cells could be analyzed with currently available computational solutions, and whether such analyses would reduce the technical variation across different laboratories. We used three different methods, FLOw Clustering without K (FLOCK), Scalable Weighted Iterative Flow-clustering Technique (SWIFT), and ReFlow, to analyze flow cytometry data files from 28 laboratories. Each laboratory screened for antigen-responsive T cell populations with frequencies ranging from 0.01 to 1.5% of lymphocytes within samples from two donors. Experience from this analysis shows that all three programs can be used for the identification of high- to intermediate-frequency MHC multimer-binding T cell populations, with results very similar to those of manual gating. For the less frequent populations (<0.1% of live, single lymphocytes), SWIFT outperformed the other tools. As used in this study, none of the algorithms offered a completely automated pipeline for identification of MHC multimer populations, as varying degrees of human intervention were needed to complete the analysis. In this study, we demonstrate the feasibility of using automated analysis pipelines for assessing and identifying even rare populations of antigen-responsive T cells, and discuss the main properties, differences, and advantages of the different methods tested.
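As a toy illustration of the automated-gating problem, the sketch below uses a data-driven robust threshold gate on simulated two-channel events with a rare (0.5%) multimer-positive population. This is a deliberately simple stand-in for FLOCK, SWIFT, and ReFlow, which use far more sophisticated clustering; the simulated channel values and gate rule are invented for illustration:

```python
import numpy as np

# simulate two-channel events: a bulk negative population plus a rare
# multimer-positive population at higher intensity on both channels
rng = np.random.default_rng(3)
neg = rng.normal([1.0, 1.0], 0.3, (9950, 2))   # bulk CD8+ events
pos = rng.normal([4.0, 4.0], 0.2, (50, 2))     # rare multimer-binding events (0.5%)
events = np.vstack([neg, pos])

# place the gate several robust SDs above the bulk on each channel;
# median/MAD are barely affected by the rare population
med = np.median(events, axis=0)
mad = np.median(np.abs(events - med), axis=0) * 1.4826  # MAD -> SD scale
gate = med + 5 * mad
in_gate = (events > gate).all(axis=1)
freq = in_gate.mean() * 100                     # % of events in the positive gate
```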

    Influenza research database: an integrated bioinformatics resource for influenza research and surveillance.

    Background: The recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics. Design: The Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user-friendly interfaces for data retrieval, visualization, and comparative genomics analysis, together with login-protected personal 'workbench' spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature. Results: To demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross-protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks. Conclusions: The IRD provides a wealth of integrated data and information about influenza virus to support research on the genetic determinants dictating virus pathogenicity, host range restriction, and transmission, and to facilitate the development of vaccines, diagnostics, and therapeutics.
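The conservation scan in the first use case boils down to a per-column tally over aligned sequences: the fraction of sequences sharing the most common residue at each alignment position. The toy alignment below is invented for illustration and is not actual hemagglutinin data:

```python
from collections import Counter

def conservation(alignment):
    """Per-position conservation for equal-length aligned sequences:
    the fraction sharing the most common residue at each column."""
    columns = zip(*alignment)
    return [Counter(col).most_common(1)[0][1] / len(alignment)
            for col in columns]

# toy alignment of four 10-residue sequences (invented for illustration)
seqs = ["MKTIIALSYI",
        "MKTIIALSYI",
        "MKAIIALSHI",
        "MKTIIVLSYI"]
scores = conservation(seqs)
conserved = [i for i, s in enumerate(scores) if s == 1.0]  # fully conserved columns
```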

    An improved ontological representation of dendritic cells as a paradigm for all cell types

    Background: Recent increases in the volume and diversity of life science data and information, and an increasing emphasis on data sharing and interoperability, have resulted in the creation of a large number of biological ontologies, including the Cell Ontology (CL), designed to provide a standardized representation of cell types for data annotation. Ontologies have been shown to have significant benefits for computational analyses of large data sets and for automated reasoning applications, leading to organized attempts to improve the structure and formal rigor of ontologies to better support computation. Currently, the CL employs multiple is_a relations, defining cell types in terms of histological, functional, and lineage properties, and the majority of definitions are written with sufficient generality to hold across multiple species. This approach limits the CL's utility for computation and for cross-species data integration. Results: To enhance the CL's utility for computational analyses, we developed a method for the ontological representation of cells and applied this method to develop a dendritic cell ontology (DC-CL). DC-CL subtypes are delineated on the basis of surface protein expression, systematically including both species-general and species-specific types and optimizing DC-CL for the analysis of flow cytometry data. We avoid multiple uses of is_a by linking DC-CL terms to terms in other ontologies via additional, formally defined relations such as has_function. Conclusion: This approach brings benefits in the form of increased accuracy, support for reasoning, and interoperability with other ontology resources. Accordingly, we propose our method as a general strategy for the ontological representation of cells. DC-CL is available from http://www.obofoundry.org.
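The modelling pattern, a single is_a parent plus formally defined cross-ontology relations such as has_function, can be sketched with a handful of triples. The term names and relations below are simplified illustrations, not actual CL/DC-CL identifiers:

```python
# each cell type keeps exactly one is_a parent; everything else hangs
# off named relations (illustrative terms, not real DC-CL identifiers)
triples = [
    ("dendritic cell",            "is_a",                     "leukocyte"),
    ("dendritic cell",            "has_function",             "antigen presentation"),
    ("CD8-alpha+ dendritic cell", "is_a",                     "dendritic cell"),
    ("CD8-alpha+ dendritic cell", "has_plasma_membrane_part", "CD8-alpha"),
]

def related(term, rel="is_a"):
    """All terms reachable from `term` along one relation, in order."""
    out, frontier = [], [term]
    while frontier:
        current = frontier.pop()
        for subj, r, obj in triples:
            if subj == current and r == rel:
                out.append(obj)
                frontier.append(obj)
    return out
```

Because every term has a single is_a chain, a reasoner can walk the hierarchy unambiguously while still querying the other relations independently.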

    VO: Vaccine Ontology

    Vaccine research, as well as the development, testing, clinical trials, and commercial use of vaccines, involves complex processes and varied biological data, including gene and protein expression, analysis of molecular and cellular interactions, study of tissue and whole-body responses, and extensive epidemiological modeling. Although many data resources are available to meet different vaccine-related needs, it remains a challenge to standardize vaccine annotation, integrate data about varied vaccine types and resources, and support advanced vaccine data analysis and inference. To address these problems, the community-based Vaccine Ontology (VO) has been developed through collaboration with vaccine researchers and many national and international centers and programs, including the National Center for Biomedical Ontology (NCBO), the Infectious Disease Ontology (IDO) Initiative, and the Ontology for Biomedical Investigations (OBI). VO utilizes the Basic Formal Ontology (BFO) as its top-level ontology and the Relation Ontology (RO) for the definition of term relationships. VO is represented in the Web Ontology Language (OWL) and edited using Protégé-OWL. Currently, VO contains more than 2000 terms and relationships. VO emphasizes the classification of vaccines and vaccine components, vaccine quality and phenotypes, and host immune responses to vaccines. These reflect different aspects of vaccine composition and biology and can thus be used to model individual vaccines. More than 200 licensed vaccines and many vaccine candidates in research or clinical trials have been modeled in VO. VO is being used for vaccine literature mining through collaboration with the National Center for Integrative Biomedical Informatics (NCIBI). Multiple VO applications will be presented.

    FuGEFlow: data model and markup language for flow cytometry

    Background: Flow cytometry technology is widely used in both health care and research. The rapid expansion of flow cytometry applications has outpaced the development of data storage and analysis tools. Collaborative efforts to close this gap include building common vocabularies and ontologies, designing generic data models, and defining data exchange formats. The Minimum Information about a Flow Cytometry Experiment (MIFlowCyt) standard was recently adopted by the International Society for Advancement of Cytometry. This standard guides researchers on the information that should be included in peer-reviewed publications, but it is insufficient for data exchange and integration between computational systems. The Functional Genomics Experiment (FuGE) formalizes common aspects of comprehensive and high-throughput experiments across different biological technologies. We have extended the FuGE object model to accommodate flow cytometry data and metadata. Methods: We used the MagicDraw modelling tool to design a UML model (Flow-OM) according to the FuGE extension guidelines, and the AndroMDA toolkit to transform the model to a markup language (Flow-ML). We mapped each MIFlowCyt term to either an existing FuGE class or a new FuGEFlow class. The development environment was validated by comparing the official FuGE XSD to the schema we generated from the FuGE object model using our configuration. After the Flow-OM model was completed, the final version of Flow-ML was generated and validated against an example MIFlowCyt-compliant experiment description. Results: The extension of FuGE for flow cytometry has resulted in a generic FuGE-compliant data model (FuGEFlow), which accommodates and links together all information required by MIFlowCyt. The FuGEFlow model can be used to build software and databases using FuGE software toolkits, facilitating automated exchange and manipulation of potentially large flow cytometry experimental data sets. Additional project documentation, including reusable design patterns and a guide for setting up a development environment, was contributed back to the FuGE project. Conclusion: We have shown that an extension of FuGE can be used to transform minimum information requirements in natural language into a markup language in XML. Extending FuGE required significant effort, but in our experience the benefits outweighed the costs. FuGEFlow is expected to play a central role in describing flow cytometry experiments and ultimately in facilitating data exchange, including with public flow cytometry repositories currently under development.
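As a minimal illustration of serialising an experiment description to XML programmatically, the sketch below uses Python's standard library. The element and attribute names are invented for illustration; they are not the actual Flow-ML or MIFlowCyt schema:

```python
import xml.etree.ElementTree as ET

# build a tiny, hypothetical experiment-description document
# (element names are illustrative, not the real Flow-ML schema)
exp = ET.Element("FlowExperiment", id="FE-1")
ET.SubElement(exp, "Purpose").text = "T cell phenotyping"
sample = ET.SubElement(exp, "Sample", id="S-1")
ET.SubElement(sample, "Organism").text = "Homo sapiens"

# serialise to a string; a real pipeline would validate this
# against the schema (XSD) generated from the object model
xml_text = ET.tostring(exp, encoding="unicode")
```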