1,164 research outputs found

    Measuring prediction capacity of individual verbs for the identification of protein interactions

    Motivation: The identification of events such as protein–protein interactions (PPIs) from the scientific literature is a complex task. One of the reasons is that there is no formal syntax to denote such relations in the scientific literature. Nonetheless, it is important to understand such relational event representations to improve information extraction solutions (e.g., for gene regulatory events). In this study, we analyze publicly available protein interaction corpora (AIMed, BioInfer, BioCreAtIve II) to determine the scope of verbs used to denote protein interactions and to measure their predictive capacity for the identification of PPI events. Our analysis is based on syntactical language patterns. This restriction has the advantage that the verb mention is used as the independent variable in the experiments, enabling comparability of results in the usage of the verbs. The initial selection of verbs has been generated from a systematic analysis of the scientific literature and existing corpora for PPIs. We distinguished modifying interactions (MIs) such as posttranslational modifications (PTMs) from non-modifying interactions (NMIs) and assumed that MIs have a higher predictive capacity due to stronger scientific evidence proving the interaction. We found that MIs are less frequent in the corpus but can be extracted at the same precision levels as PPIs. A significant portion of correct PPI reports in the BioCreAtIve II corpus use the verb “associate”, which semantically does not prove a relation. The performance of every monitored verb is listed and allows the selection of specific verbs to improve the performance of PPI extraction solutions. Programmatic access to the text processing modules is available online (www.ebi.ac.uk/webservices/whatizit/info.jsf) and the full analysis of Medline abstracts will be made available through the Web pages of the Rebholz group.

    An Activation Force-based Affinity Measure for Analyzing Complex Networks

    The affinity measure is a key factor that determines the quality of the analysis of a complex network. Here, we introduce a type of statistic, activation forces, to weight the links of a complex network and thereby develop a desired affinity measure. We show that the approach is superior in facilitating the analysis through experiments on a large-scale word network and a protein-protein interaction (PPI) network consisting of ∼5,000 human proteins. The experiment on the word network verifies that the measured word affinities are highly consistent with human knowledge. Further, the experiment on the PPI network verifies the measure and presents a general method for the identification of functionally similar proteins based on PPIs. Most strikingly, we find an affinity network that compactly connects the cancer-associated proteins to each other, which may reveal novel information for cancer study; this includes likely protein interactions and key proteins in cancer-related signal transduction pathways.

    Networks in cognitive science

    Networks of interconnected nodes have long played a key role in Cognitive Science, from artificial neural networks to spreading activation models of semantic memory. Recently, however, a new Network Science has been developed, providing insights into the emergence of global, system-scale properties in contexts as diverse as the Internet, metabolic reactions, and collaborations among scientists. Today, the inclusion of network theory into Cognitive Sciences, and the expansion of complex-systems science, promises to significantly change the way in which the organization and dynamics of cognitive and behavioral processes are understood. In this paper, we review recent contributions of network theory at different levels and domains within the Cognitive Sciences.

    Text Mining and Gene Expression Analysis Towards Combined Interpretation of High Throughput Data

    Microarrays can capture gene expression activity for thousands of genes simultaneously and thus make it possible to analyze cell physiology and disease processes at the molecular level. The interpretation of microarray gene expression experiments profits from knowledge of the analyzed genes and proteins and the biochemical networks in which they play a role. The trend is towards the development of data analysis methods that integrate diverse data types. Currently, the most comprehensive biomedical knowledge source is a large repository of free text articles. Text mining makes it possible to automatically extract and use information from texts. This thesis addresses two key aspects, biomedical text mining and gene expression data analysis, with the focus on providing high-quality methods and data that contribute to the development of integrated analysis approaches. The work is structured in three parts. Each part begins by providing the relevant background, and each chapter describes the developed methods as well as applications and results.
    Part I deals with biomedical text mining: Chapter 2 summarizes the relevant background of text mining; it describes text mining fundamentals, important text mining tasks, applications and particularities of text mining in the biomedical domain, and evaluation issues. In Chapter 3, a method for generating high-quality gene and protein name dictionaries is described. The analysis of the generated dictionaries revealed important properties of individual nomenclatures and the databases used (Fundel and Zimmer, 2006). The dictionaries are publicly available via a Wiki, a web service, and several client applications (Szugat et al., 2005). In Chapter 4, methods for the dictionary-based recognition of gene and protein names in texts and their mapping onto unique database identifiers are described. These methods make it possible to extract information from texts and to integrate text-derived information with data from other sources. Three named entity identification systems have been set up, two of them building upon the previously existing tool ProMiner (Hanisch et al., 2003). All of them have shown very good performance in the BioCreAtIvE challenges (Fundel et al., 2005a; Hanisch et al., 2005; Fundel and Zimmer, 2007). In Chapter 5, a new method for relation extraction (Fundel et al., 2007) is presented. It was applied to the largest collection of biomedical literature abstracts, and thus a comprehensive network of human gene and protein relations has been generated. A classification approach (Küffner et al., 2006) can be used to specify relation types further, e.g., as activating, direct physical, or gene regulatory relations.
    Part II deals with gene expression data analysis: Gene expression data needs to be processed so that differentially expressed genes can be identified. Gene expression data processing consists of several sequential steps. Two important steps are normalization, which aims at removing systematic variances between measurements, and quantification of differential expression by p-value and fold change determination. Numerous methods exist for these tasks. Chapter 6 describes the relevant background of gene expression data analysis; it presents the biological and technical principles of microarrays and gives an overview of the most relevant data processing steps. Finally, it provides a short introduction to osteoarthritis, which is the focus of the analyzed gene expression data sets. In Chapter 7, quality criteria for the selection of normalization methods are described, and a method for the identification of differentially expressed genes is proposed, which is appropriate for data with large intensity variances between spots representing the same gene (Fundel et al., 2005b). Furthermore, a system is described that selects an appropriate combination of feature selection method and classifier, and thus identifies genes which lead to good classification results and show consistent behavior in different sample subgroups (Davis et al., 2006). The analysis of several gene expression data sets dealing with osteoarthritis is described in Chapter 8. This chapter contains the biomedical analysis of relevant disease processes and distinct disease stages (Aigner et al., 2006a), and a comparison of various microarray platforms and osteoarthritis models.
    Part III deals with integrated approaches and thus provides the connection between Parts I and II: Chapter 9 gives an overview of different types of integrated data analysis approaches, with a focus on approaches that integrate gene expression data with manually compiled data, large-scale networks, or text mining. In Chapter 10, a method for the identification of genes which are consistently regulated and have a coherent literature background (Küffner et al., 2005) is described. This method indicates how gene and protein name identification and gene expression data can be integrated to return clusters which contain genes that are relevant for the respective experiment together with literature information that supports interpretation. Finally, in Chapter 11, ideas on how the described methods can contribute to current research and possible future directions are presented.

    THE ROLE OF READING COMPREHENSION IN LARGE-SCALE SUBJECT-MATTER ASSESSMENTS

    This study was designed with the overall goal of understanding how difficulties in reading comprehension are associated with early adolescents' performance in large-scale assessments in subject domains including science and civic-related social studies. The current study extended previous research by taking a cognition-centered approach based on the Evidence-Centered Design (ECD) framework and by using U.S. data from four large-scale subject-matter assessments: the IEA TIMSS Science Study of 1999, the IEA CIVED Civic Education Study of 1999, and the 1970s IEA Six Subject surveys in Science and in Civic Education. Using multiple-choice items from the TIMSS science and CIVED tests, the study identified a list of linguistic features that contribute to item difficulty of subject-matter assessments through the Coh-Metrix software, human rating, and multiple regression analysis. These linguistic features include word length, word frequency, word abstractness, intentional verbs, negative expressions, and logical connectives. They pertain to different levels of Kintsch's reading comprehension model: surface level, textbase level, and situation model. Integrating this item-level information into multiple regression analysis and Multidimensional IRT modeling, the study provided feasible methods (1) to estimate the reading demand of test items in each subject-matter assessment, and (2) to partial out variance that is related to the high reading demand of some test items and independent of the domain proficiencies that the subject-matter assessment was intended to measure. Overall, results suggested that the reading demands of all test items in the TIMSS Science and CIVED tests were within the reading capabilities of almost all of the students, and these two tests were not saturated with high reading demand.
    In addition, multiple regression results from the earlier Six Subject Surveys showed that an independent measure of students' general vocabulary was highly correlated with their achievement in the domains of science and civic-related social studies. On average, boys outperformed girls in both subject domains, and students from homes with ample literacy resources outperformed students from homes with few literacy resources. In the science assessment, interactions were found between gender and word knowledge and between home literacy resources and word knowledge, meaning that the correlation between vocabulary and science performance differed by gender and home background.

    Women in Science 2012

    The summer of 2012 saw the number of students seeking summer research experiences with a faculty mentor reaching record levels. In total, 179 students participated in the Summer Undergraduate Research Fellows (SURF) program, involving 59 faculty mentor-advisors, representing all of the Clark Science Center’s fourteen departments and programs.