Automated Recognition of Brain Region Mentions in Neuroscience Literature
The ability to computationally extract mentions of neuroanatomical regions from the literature would assist in linking them to other entities within and outside of an article. Examples include extracting reports of connectivity or region-specific gene expression. To facilitate text mining of neuroscience literature, we have created a corpus of manually annotated brain region mentions. The corpus contains 1,377 abstracts with 18,242 brain region annotations. Interannotator agreement was evaluated for a subset of the documents and was 90.7% and 96.7% for strict and lenient matching, respectively. We observed a large vocabulary of over 6,000 unique brain region terms and 17,000 words. For automatic extraction of brain region mentions, we evaluated simple dictionary methods and complex natural language processing techniques. The dictionary methods based on neuroanatomical lexicons recalled 36% of the mentions with 57% precision. The best performance was achieved using a conditional random field (CRF) with a rich feature set. Features were based on morphological, lexical, syntactic and contextual information. The CRF recalled 76% of mentions at 81% precision; when partial matches are counted, recall and precision increase to 86% and 92%, respectively. We suspect that a large amount of error is due to coordinating conjunctions, previously unseen words and brain regions of less commonly studied organisms. We found context windows, lemmatization and abbreviation expansion to be the most informative techniques. The corpus is freely available at http://www.chibi.ubc.ca/WhiteText/
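The difference between strict and lenient (partial-match) scoring can be made concrete with a small scorer. This is a minimal sketch that assumes mentions are stored as half-open character spans and that "lenient" means any character overlap; the paper's exact matching rules may differ, and the function names are hypothetical.

```python
# Assumption: a mention is a half-open (start, end) character span.
Span = tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def precision_recall(gold: list[Span], predicted: list[Span],
                     lenient: bool = False) -> tuple[float, float]:
    """Precision and recall of predicted spans against gold spans.

    Strict mode requires identical boundaries; lenient mode accepts any overlap.
    """
    match = overlaps if lenient else (lambda a, b: a == b)
    # A prediction counts as correct if it matches at least one gold span.
    correct_pred = sum(1 for p in predicted if any(match(p, g) for g in gold))
    # A gold span counts as recalled if at least one prediction matches it.
    recalled_gold = sum(1 for g in gold if any(match(p, g) for p in predicted))
    precision = correct_pred / len(predicted) if predicted else 0.0
    recall = recalled_gold / len(gold) if gold else 0.0
    return precision, recall


gold = [(10, 20), (30, 40)]
pred = [(10, 20), (32, 40), (50, 55)]
print(precision_recall(gold, pred))                # strict: (0.333..., 0.5)
print(precision_recall(gold, pred, lenient=True))  # lenient: (0.666..., 1.0)
```

The prediction (32, 40) clips the start of a gold mention, so it is an error under strict scoring but a hit under lenient scoring, which is why counting partial matches raises both precision and recall.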
Meta-Analysis of Kindling-Induced Gene Expression Changes in the Rat Hippocampus
Numerous studies have been performed to examine gene expression patterns in the rodent hippocampus in the kindling model of epilepsy. However, recent reviews of this literature have revealed limited agreement among studies. Because this conclusion was based on retrospective comparison of reported “hit lists” from individual studies, we hypothesized that re-analysis of the original expression data would help address this concern. In this paper, we reanalyzed four genome-wide expression studies of excitotoxin-induced kindling in rat and performed a statistical meta-analysis. The meta-analysis revealed over 800 genes that show significant changes in expression 24 h after initial seizure induction, and 59 genes altered after 10 days. To evaluate our results in light of previous work, we assembled a reference list of genes formed from a consensus of the published literature. Our profiles include most of the genes in this reference list, and most of the additional genes are from pathways or biological processes previously recognized to be altered in kindling. In addition, our results emphasized expression changes in lipid metabolism and protein degradation pathways. We conclude that a cautious re-analysis of published expression data can help illuminate genes and pathways underlying kindling. Supplementary Material is available at http://www.chibi.ubc.ca/faculty/pavlidis/meta-analysis-of-brain-kindling
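The abstract does not say which statistic was used to combine evidence across the four studies. As one common illustration of pooling per-gene p-values from independent studies, the sketch below implements Fisher's method; treat the choice of method and the function name as assumptions, not a description of the authors' pipeline.

```python
import math


def fisher_combined_pvalue(pvalues: list[float]) -> float:
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom under the null.
    For even degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no special functions are needed.
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must be in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0   # (X/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total


# One gene's p-values from four hypothetical studies.
print(fisher_combined_pvalue([0.04, 0.10, 0.03, 0.20]))
```

Fisher's method rewards consistent moderate evidence across studies, which suits the goal of finding genes that individual "hit lists" may have missed; it does assume the studies are independent.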
“Guilt by Association” Is the Exception Rather Than the Rule in Gene Networks
Gene networks are commonly interpreted as encoding functional information in their connections. An extensively validated principle called guilt by association states that genes which are associated or interacting are more likely to share function. Guilt by association provides the central top-down principle for analyzing gene networks in functional terms or assessing their quality in encoding functional information. In this work, we show that functional information within gene networks is typically concentrated in only a very few interactions whose properties cannot be reliably related to the rest of the network. In effect, the apparent encoding of function within networks has been largely driven by outliers whose behaviour cannot even be generalized to individual genes, let alone to the network at large. While experimentalist-driven analysis of interactions may use prior expert knowledge to focus on the small fraction of critically important data, large-scale computational analyses have typically assumed that high-performance cross-validation in a network is due to a generalizable encoding of function. Because we find that gene function is not systemically encoded in networks but depends on specific and critical interactions, we conclude that it is necessary to focus on the details of how networks encode function and what information computational analyses use to extract functional meaning. We explore a number of consequences of this and find that network structure itself provides clues as to which connections are critical and that systemic properties, such as scale-free-like behaviour, do not map onto the functional connectivity within networks.
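Neighbour voting with cross-validation is one simple way to measure guilt by association. The sketch below assumes an unweighted, undirected network, a single gene function, and AUROC as the performance measure; it illustrates the kind of analysis discussed above and is not the authors' code. The function name and the toy network are hypothetical.

```python
def neighbour_voting_auroc(genes: list[str], edges: list[tuple[str, str]],
                           annotated: set[str]) -> float:
    """Leave-one-out neighbour-voting AUROC for a single gene function.

    Each gene is scored by the fraction of its neighbours carrying the annotation.
    Self-loops are ignored, so a gene's own label never contributes to its own
    score, which makes the scoring leave-one-out by construction.
    """
    neighbours: dict[str, set[str]] = {g: set() for g in genes}
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    scores = {
        g: (sum(1 for n in nbrs if n in annotated) / len(nbrs) if nbrs else 0.0)
        for g, nbrs in neighbours.items()
    }
    pos = [scores[g] for g in genes if g in annotated]
    neg = [scores[g] for g in genes if g not in annotated]
    if not pos or not neg:
        raise ValueError("need both annotated and unannotated genes")
    # AUROC as the probability that a positive outscores a negative (ties count 0.5).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


genes = ["A", "B", "C", "D", "E", "F"]
rest = [("A", "C"), ("B", "D"), ("C", "E"), ("D", "F"), ("E", "F")]
print(neighbour_voting_auroc(genes, rest + [("A", "B")], {"A", "B"}))  # 0.75
print(neighbour_voting_auroc(genes, rest, {"A", "B"}))                 # 0.25
```

In this toy network a single edge between the two annotated genes is responsible for the above-chance AUROC; deleting it drops performance below chance, a miniature version of performance being driven by a few critical interactions.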
Analysis of strain and regional variation in gene expression in mouse brain
BACKGROUND: We performed a statistical analysis of a previously published set of gene expression microarray data from six different brain regions in two mouse strains. In the previous analysis, 24 genes showing expression differences between the strains and about 240 genes with regional differences in expression were identified. Like many gene expression studies, that analysis relied primarily on ad hoc 'fold change' and 'absent/present' criteria to select genes. To determine whether statistically motivated methods would give a more sensitive and selective analysis of gene expression patterns in the brain, we decided to use analysis of variance (ANOVA) and feature selection methods designed to select genes showing strain- or region-dependent patterns of expression. RESULTS: Our analysis revealed many additional genes that might be involved in behavioral differences between the two mouse strains and functional differences between the six brain regions. Using conservative statistical criteria, we identified at least 63 genes showing strain variation and approximately 600 genes showing regional variation. Unlike ad hoc methods, ours have the additional benefit of ranking the genes by statistical score, permitting further analysis to focus on the most significant. Comparison of our results to the previous studies and to published reports on individual genes shows that we achieved high sensitivity while preserving selectivity. CONCLUSIONS: Our results indicate that molecular differences between the strains and regions studied are larger than indicated previously. We conclude that for large complex datasets, ANOVA and feature selection, alone or in combination, are more powerful than methods based on fold-change thresholds and other ad hoc selection criteria.
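Ranking genes by an ANOVA statistic, rather than by a fold-change cutoff, can be sketched as follows. This is a minimal one-way version for a single factor such as brain region; a design with both strain and region factors would call for a two-way model, and the function names and example values are hypothetical.

```python
import math


def one_way_anova_f(groups: list[list[float]]) -> float:
    """F statistic for a one-way ANOVA: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need >= 2 non-empty groups and more samples than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        # Degenerate case: replicates are identical within every group.
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(expression: dict[str, list[list[float]]]) -> list[tuple[str, float]]:
    """Rank genes by F statistic, largest (strongest group effect) first."""
    scores = {gene: one_way_anova_f(groups) for gene, groups in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical log-expression replicates for two genes in two regions.
print(rank_genes({"g1": [[1, 2, 3], [4, 5, 6]], "g2": [[1, 2, 3], [2, 3, 4]]}))
# [('g1', 13.5), ('g2', 1.5)]
```

Unlike a fold-change threshold, the F statistic weighs the group difference against replicate variability, and the resulting scores give the ranking that the abstract cites as an advantage.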
ErmineJ: Tool for functional analysis of gene expression data sets
BACKGROUND: It is common for the results of a microarray study to be analyzed in the context of biologically-motivated groups of genes such as pathways or Gene Ontology categories. The most common method for such analysis uses the hypergeometric distribution (or a related technique) to look for "over-representation" of groups among genes selected as being differentially expressed or otherwise of interest based on a gene-by-gene analysis. However, this method suffers from some limitations, and biologist-friendly tools that implement alternatives have not been reported. RESULTS: We introduce ErmineJ, a multiplatform, user-friendly, stand-alone software tool for the analysis of functionally relevant sets of genes in the context of microarray gene expression data. ErmineJ implements multiple algorithms for gene set analysis, including over-representation and resampling-based methods that focus on gene scores or correlation of gene expression profiles. In addition to a graphical user interface, ErmineJ has a command line interface and an application programming interface that can be used to automate analyses. The graphical user interface includes tools for creating and modifying gene sets, visualizing the Gene Ontology as a table or tree, and visualizing gene expression data. ErmineJ comes with a complete user manual, and is open-source software licensed under the GNU General Public License. CONCLUSION: The availability of multiple analysis algorithms, together with a rich feature set and simple graphical interface, should make ErmineJ a useful addition to the biologist's informatics toolbox. ErmineJ is available from
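The standard over-representation test mentioned above can be sketched with the hypergeometric upper tail. This is a generic illustration in Python, not ErmineJ's implementation or API; the function and parameter names are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(total_genes: int, set_size: int,
                              selected: int, selected_in_set: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= selected_in_set).

    total_genes:     N, genes measured on the array
    set_size:        K, genes belonging to the gene set (e.g. one GO category)
    selected:        n, genes selected as differentially expressed
    selected_in_set: k, overlap between the selection and the gene set
    """
    upper = min(set_size, selected)
    # Exact integer arithmetic; math.comb returns 0 when the lower index exceeds the upper.
    tail = sum(comb(set_size, i) * comb(total_genes - set_size, selected - i)
               for i in range(selected_in_set, upper + 1))
    return tail / comb(total_genes, selected)


# 10,000 genes, a 50-gene category, 200 selected genes, 8 of them in the category.
print(overrepresentation_pvalue(10_000, 50, 200, 8))
```

The test depends on an arbitrary threshold that splits genes into "selected" and "not selected", which is one of the limitations that motivates the score- and correlation-based resampling methods the abstract describes.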
Integration of Neuroimaging and Microarray Datasets through Mapping and Model-Theoretic Semantic Decomposition of Unstructured Phenotypes
An approach towards heterogeneous neuroscience dataset integration is proposed that uses Natural Language Processing (NLP) and a knowledge-based phenotype organizer system (PhenOS) to link ontology-anchored terms to underlying data from each database, and then maps these terms based on a computable model of disease (SNOMED CT®). The approach was implemented using sample datasets from fMRIDC, GEO, The Whole Brain Atlas and Neuronames, and allowed for complex queries such as “List all disorders with a finding site of brain region X, and then find the semantically related references in all participating databases based on the ontological model of the disease or its anatomical and morphological attributes”. Precision of the NLP-derived coding of the unstructured phenotypes in each dataset was 88% (n = 50), and precision of the semantic mapping between these terms across datasets was 98% (n = 100). To our knowledge, this is the first example of the use of both semantic decomposition of disease relationships and hierarchical information found in ontologies to integrate heterogeneous phenotypes across clinical and molecular datasets.
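The example query can be illustrated with a toy model. This sketch assumes that disorders are linked to anatomical sites through a single finding-site relation and that each database's records have already been coded to concepts by an NLP step; every identifier is a placeholder, and nothing here reflects PhenOS, real SNOMED CT content, or any SNOMED CT API.

```python
# Hypothetical placeholders only: not real SNOMED CT concepts or database records.
FINDING_SITE: dict[str, str] = {  # disorder concept -> anatomical finding site
    "disorder_A": "region_X",
    "disorder_B": "region_X",
    "disorder_C": "region_Y",
}

# Records in each participating database, already coded to concepts.
CODED_RECORDS: dict[str, dict[str, set[str]]] = {
    "database_1": {"record_1": {"disorder_A"}, "record_2": {"disorder_C"}},
    "database_2": {"record_3": {"disorder_B", "region_X"}},
}


def disorders_at_site(site: str) -> list[str]:
    """Step 1: list all disorders whose finding site is the given region."""
    return sorted(d for d, s in FINDING_SITE.items() if s == site)


def related_records(site: str) -> dict[str, list[str]]:
    """Step 2: find records in every database coded to those disorders or the site."""
    concepts = set(disorders_at_site(site)) | {site}
    hits: dict[str, list[str]] = {}
    for db, records in CODED_RECORDS.items():
        matched = sorted(r for r, codes in records.items() if codes & concepts)
        if matched:
            hits[db] = matched
    return hits


print(disorders_at_site("region_X"))  # ['disorder_A', 'disorder_B']
print(related_records("region_X"))    # {'database_1': ['record_1'], 'database_2': ['record_3']}
```

The point of the two steps is that record_1 and record_3 share no terms with each other, yet both are retrieved because the disease model links their disorders to the same anatomical site.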
Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms
The growth in popularity of RNA expression microarrays has been accompanied by concerns about the reliability of the data, especially when comparing across different platforms. Here, we present an evaluation of the reproducibility of microarray results using two platforms, Affymetrix GeneChips and Illumina BeadArrays. The study design is based on a dilution series of two human tissues (blood and placenta), tested in duplicate on each platform. The results of a comparison between the platforms indicate very high agreement, particularly for genes which are predicted to be differentially expressed between the two tissues. Agreement was strongly correlated with the level of expression of a gene. Concordance was also improved when probes on the two platforms could be identified as being likely to target the same set of transcripts of a given gene. These results shed light on the causes of agreement and disagreement across microarray platforms. The set of probes we found to be most highly reproducible can be used by others to help increase confidence in analyses of other data sets using these platforms.
Numerical Modelling of Melt Behaviour in the Lower Vessel Head of a Nuclear Reactor
Acknowledgements: The authors would like to thank the EPSRC MEMPHIS multi-phase programme grant, the EPSRC Computational modelling for advanced nuclear power plants project and the EU FP7 projects THINS and GoFastR for helping to fund this work.
Numerical Modelling of Debris Bed Water Quenching
Acknowledgements: The authors would like to thank the EPSRC MEMPHIS multi-phase programme grant, the EPSRC Computational modelling for advanced nuclear power plants project, the EU FP7 projects THINS and GoFastR and ExxonMobil for helping to fund this work.