337 research outputs found

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in the scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys across chemical disciplines. Retrieval of chemical information in most cases starts with finding the documents relevant to a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in text, which commonly involves extracting the entire list of chemicals mentioned in a document together with any associated information. In this Review, we provide a comprehensive and in-depth description of the fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges that assess system performance, in particular the CHEMDNER and CHEMDNER-patents tasks of BioCreative IV and V, respectively. Considering the growing interest in automatically annotated chemical knowledge bases that integrate chemical information and biological data, we also present cheminformatics approaches for mapping extracted chemical names to chemical structures and annotating them, together with text mining applications that link chemistry to biological information. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

    A.V. and M.K. acknowledge funding from the European Community's Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia) and FEDER (European Union), and by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
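    As a concrete illustration of the entity-recognition step described above, the following is a minimal sketch of dictionary- plus suffix-rule chemical tagging in Python. The gazetteer, suffix list, and function name are illustrative assumptions, not the CHEMDNER systems themselves, which rely on far richer features and training data.

```python
# A minimal sketch of dictionary- plus rule-based chemical entity tagging.
# The gazetteer and suffix list below are illustrative assumptions, not
# the CHEMDNER systems described in the Review.
import re

GAZETTEER = {"aspirin", "ibuprofen", "caffeine", "ethanol"}
# Common chemical-name suffixes used as a weak back-off heuristic.
SUFFIXES = ("ol", "ine", "ide", "ate", "ane")

def tag_chemicals(text: str) -> list[tuple[str, int, int]]:
    """Return (mention, start, end) spans that look like chemical names."""
    spans = []
    for m in re.finditer(r"[A-Za-z][A-Za-z0-9\-]+", text):
        token = m.group(0).lower()
        if token in GAZETTEER or token.endswith(SUFFIXES):
            spans.append((m.group(0), m.start(), m.end()))
    return spans

if __name__ == "__main__":
    print(tag_chemicals("Aspirin and ethanol were dissolved in methanol."))
```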

    24th International Conference on Information Modelling and Knowledge Bases

    In the last three decades, information modelling and knowledge bases have become essential subjects, not only in academic communities related to information systems and computer science but also in the business areas where information technology is applied. The series of European-Japanese Conferences on Information Modelling and Knowledge Bases (EJC) originally started as a co-operation initiative between Japan and Finland in 1982. The practical operations were then organised by Professor Ohsuga in Japan and Professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). The geographical scope has since expanded to cover Europe and other countries. The conference retains a workshop character: discussion, ample time for presentations, and a limited number of participants (50) and papers (30). Suggested topics include, but are not limited to:
    1. Conceptual modelling: Modelling and specification languages; Domain-specific conceptual modelling; Concepts, concept theories and ontologies; Conceptual modelling of large and heterogeneous systems; Conceptual modelling of spatial, temporal and biological data; Methods for developing, validating and communicating conceptual models.
    2. Knowledge and information modelling and discovery: Knowledge discovery, knowledge representation and knowledge management; Advanced data mining and analysis methods; Conceptions of knowledge and information; Modelling information requirements; Intelligent information systems; Information recognition and information modelling.
    3. Linguistic modelling: Models of HCI; Information delivery to users; Intelligent informal querying; Linguistic foundations of information and knowledge; Fuzzy linguistic models; Philosophical and linguistic foundations of conceptual models.
    4. Cross-cultural communication and social computing: Cross-cultural support systems; Integration, evolution and migration of systems; Collaborative societies; Multicultural web-based software systems; Intercultural collaboration and support systems; Social computing, behavioral modeling and prediction.
    5. Environmental modelling and engineering: Environmental information systems (architecture); Spatial, temporal and observational information systems; Large-scale environmental systems; Collaborative knowledge base systems; Agent concepts and conceptualisation; Hazard prediction, prevention and steering systems.
    6. Multimedia data modelling and systems: Modelling multimedia information and knowledge; Content-based multimedia data management; Content-based multimedia retrieval; Privacy and context-enhancing technologies; Semantics and pragmatics of multimedia data; Metadata for multimedia information systems.
    Overall we received 56 submissions. After careful evaluation, 16 papers were selected as long papers, 17 as short papers, 5 as position papers, and 3 for presentation of perspective challenges. We thank all colleagues for their support of this issue of the EJC conference, especially the program committee, the organising committee, and the programme coordination team. The long and short papers presented at the conference are revised after the conference and published in the series "Frontiers in Artificial Intelligence and Applications" by IOS Press (Amsterdam). The books "Information Modelling and Knowledge Bases" are edited by the Editing Committee of the conference.
    We believe that the conference will be productive and fruitful in advancing research and application of information modelling and knowledge bases. Bernhard Thalheim, Hannu Jaakkola, Yasushi Kiyoki

    Conceptualization of Computational Modeling Approaches and Interpretation of the Role of Neuroimaging Indices in Pathomechanisms for Pre-Clinical Detection of Alzheimer Disease

    With swift advancements in next-generation sequencing technologies and the voluminous growth of biological data, a variety of data resources such as databases and web services have been created to facilitate data management, accessibility, and analysis. However, the burden of interoperability between dynamically growing data resources is an increasingly rate-limiting step in biomedicine, particularly in neurodegeneration research. Over the years, massive investments and technological advancements in dementia research have resulted in large proportions of unmined data. Accordingly, there is an essential need for intelligent and integrative approaches to mine the available data and substantiate novel research outcomes. Semantic frameworks provide a unique possibility to integrate multiple heterogeneous, high-resolution data resources with semantic integrity, using standardized ontologies and vocabularies for context-specific domains. In the current work, (i) the functionality of a semantically structured terminology for mining pathway-relevant knowledge from the literature, called the Pathway Terminology System, is demonstrated, and (ii) a context-specific, high-granularity semantic framework for neurodegenerative diseases, known as NeuroRDF, is presented. Neurodegenerative disorders are especially complex, as they are characterized by widespread manifestations and the potential for dramatic alterations in disease progression over time. Early detection and prediction strategies based on clinical pointers can provide promising solutions for effective treatment of Alzheimer disease (AD). We also present the importance of bridging the gap between clinical and molecular biomarkers to contribute effectively to dementia research, and address the need for a formalized framework, called NIFT, to automatically mine relevant clinical knowledge from the literature for substantiating high-resolution cause-and-effect models.
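    To illustrate how a semantic framework of this kind encodes statements as triples, here is a minimal sketch using the rdflib library. The namespace, predicate names, and the PMID placeholder are hypothetical, not the actual NeuroRDF vocabulary.

```python
# A minimal sketch of triple-based biomarker assertions in the spirit of
# NeuroRDF; the namespace and predicates are hypothetical placeholders,
# not the framework's real vocabulary.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/neuro#")  # placeholder namespace

g = Graph()
g.bind("ex", EX)

# Assert that a biomarker is altered in Alzheimer disease, with provenance.
g.add((EX.Tau, RDF.type, EX.MolecularBiomarker))
g.add((EX.Tau, EX.alteredIn, EX.AlzheimerDisease))
g.add((EX.Tau, EX.evidenceSource, Literal("PMID:0000000")))  # placeholder ID

print(g.serialize(format="turtle"))
```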

    A pilot study in an application of text mining to learning system evaluation

    Text mining concerns discovering and extracting knowledge from unstructured data. It transforms textual data into a usable, intelligible format that facilitates classifying documents, finding explicit relationships or associations between documents, and clustering documents into categories. Given a collection of survey comments evaluating a civil engineering learning system, text mining is applied to discover and extract knowledge from the comments. This research focuses on a systematic way to apply a software tool, SAS Enterprise Miner, to the survey data. The purpose is to categorize the comments into different groups in an attempt to identify the major concerns of the users or students. Each group is associated with a set of key terms, which allows the evaluators of the learning system to glean the main ideas from the summarized terms without reading through a potentially huge amount of data.
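    The study itself uses SAS Enterprise Miner; as an analogous open-source sketch of the same clustering idea (TF-IDF vectors, k-means grouping, then key terms per cluster), the following uses scikit-learn with a handful of illustrative sample comments and an assumed cluster count of two.

```python
# An open-source analogue of the comment-clustering step (the study used
# SAS Enterprise Miner); the sample comments and k=2 are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

comments = [
    "The interface is confusing and hard to navigate",
    "Navigation menus are unclear",
    "Great practice problems for statics",
    "The example problems helped me learn",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(comments)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Show the top key terms associated with each cluster of comments.
terms = vectorizer.get_feature_names_out()
for c in range(km.n_clusters):
    top = km.cluster_centers_[c].argsort()[::-1][:3]
    print(f"cluster {c}: {[terms[i] for i in top]}")
```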

    Translational drug interaction study using text mining technology

    Drug-drug interaction (DDI) is one of the major causes of adverse drug reactions (ADR) and has been shown to threaten public health: it causes an estimated 195,000 hospitalizations and 74,000 emergency room visits each year in the USA alone. Current DDI research investigates different scopes of drug interactions: molecular-level pharmacogenetic interactions (PG), pharmacokinetic interactions (PK), and clinical pharmacodynamic consequences (PD). All three types of experiments are important, but they play different roles in DDI research. Because diverse disciplines and varied studies are involved, interaction evidence is often not available across all three types, which creates knowledge gaps that hinder both DDI and pharmacogenetics research. In this dissertation, we propose to distinguish the three types of DDI evidence (in vitro PK, in vivo PK, and clinical PD studies) and to identify all knowledge gaps in the experimental evidence for them. This is a collective intelligence effort, whereby a text mining tool is developed for large-scale mining and analysis of drug-interaction information, such that it can be applied to retrieve, categorize, and extract DDI information from the published literature available on PubMed. To this end, three tasks are carried out in this work. First, the lexica, ontology, and corpora needed for distinguishing the three types of studies were prepared. Beyond the lexica prepared in this work, a comprehensive dictionary of drug metabolites and reactions, which is critical for in vitro PK studies, is still lacking in public databases. Thus, second, a named entity recognition tool is proposed to identify drug metabolites and reactions in free text. Third, text mining tools for retrieving DDI articles and extracting DDI evidence are developed. With these tools, the knowledge gaps across all three types of DDI evidence can be identified, and the gaps between knowledge of the molecular mechanisms underlying DDIs and their clinical consequences can be closed by predicting DDIs from the retrieved drug-gene interaction information, exemplifying how these tools and methods can advance DDI pharmacogenetics research.
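    As a sketch of the retrieval step, the snippet below pulls candidate DDI abstracts from PubMed with Biopython's Entrez client. The query terms, the retmax value, and the placeholder email are illustrative assumptions, not the dissertation's actual pipeline.

```python
# A hedged sketch of PubMed retrieval using Biopython's Entrez client;
# the query and retmax are illustrative, not the dissertation's pipeline.
from Bio import Entrez

Entrez.email = "you@example.org"  # NCBI requires a contact address

# Search PubMed for candidate in vitro pharmacokinetic DDI reports.
handle = Entrez.esearch(db="pubmed",
                        term="drug interaction AND CYP3A4 AND inhibition",
                        retmax=5)
ids = Entrez.read(handle)["IdList"]
handle.close()

# Fetch the matching abstracts as plain text for downstream text mining.
handle = Entrez.efetch(db="pubmed", id=",".join(ids),
                       rettype="abstract", retmode="text")
print(handle.read())
handle.close()
```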

    Similarity measures and diversity rankings for query-focused sentence extraction

    Query-focused sentence extraction generally refers to an extractive approach for selecting a set of sentences that responds to a specific information need. It is one of the major approaches employed in multi-document summarization, focused summarization, and complex question answering. The major advantage of most extractive methods over natural language processing (NLP) intensive methods is that they are relatively simple, theoretically sound (drawing upon several supervised and unsupervised learning techniques), and often produce equally strong empirical performance. Many research areas, including information retrieval and text mining, have recently moved toward extractive query-focused sentence generation, as its outputs have great potential to support everyday information-seeking activities. In particular, as more information is created and stored online, extractive summarization systems can quickly draw on ubiquitous resources, such as Google search results and social media, to extract summaries that answer users' queries.

    This thesis explores how the performance of sentence extraction tasks can be improved to create higher-quality outputs. Specifically, two major areas are investigated. First, we examine the issue of natural language variation, which affects the similarity judgment of sentences. As sentences are much shorter than documents, they contain fewer words, and their notions of similarity differ from those of documents, since sentences tend to be very specific in meaning. Thus many document-level similarity measures are unlikely to perform well at this level. We address these issues in two application domains. First, we present a hybrid method, utilizing both unsupervised and supervised techniques, to compute the similarity of interrogative sentences for factoid question reuse. Next, we propose a novel structural similarity measure based on sentence semantics for paraphrase identification and textual entailment recognition. The empirical evaluations suggest the effectiveness of the proposed methods in improving the accuracy of sentence similarity judgments.

    Furthermore, we examine the effects of the proposed similarity measure in two specific sentence extraction tasks: focused summarization and complex question answering. In conjunction with the proposed similarity measure, we also explore the issues of novelty, redundancy, and diversity in sentence extraction. To that end, we present a novel approach to promote diversity in extracted sets of sentences based on the negative endorsement principle: negative-signed edges represent redundancy relations between sentence nodes in a graph, and sentences are reranked according to their long-term negative endorsements from a random walk. Additionally, we propose a unified centrality and diversity ranking based on the same principle. The results of a comprehensive evaluation confirm that the proposed methods perform competitively compared with many state-of-the-art methods. Ph.D., Information Science -- Drexel University, 201
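    The negative-endorsement random walk itself is not reproduced here; as a simplified sketch of diversity-aware reranking in the same spirit, the following implements a greedy MMR-style selection over TF-IDF cosine similarities, with the lambda weight and sample sentences as assumptions.

```python
# A simplified, MMR-style sketch of diversity-aware sentence reranking;
# the thesis's negative-endorsement random walk is more involved, so the
# greedy selection and lam=0.7 here are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_rerank(query, sentences, k=2, lam=0.7):
    vec = TfidfVectorizer().fit([query] + sentences)
    q = vec.transform([query])
    S = vec.transform(sentences)
    rel = cosine_similarity(S, q).ravel()  # relevance to the query
    red = cosine_similarity(S)             # pairwise redundancy
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Penalize sentences similar to anything already selected.
            penalty = max(red[i][j] for j in selected) if selected else 0.0
            score = lam * rel[i] - (1 - lam) * penalty
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [sentences[i] for i in selected]

print(mmr_rerank("text mining applications",
                 ["Text mining extracts knowledge from text.",
                  "Mining text yields knowledge from documents.",
                  "Graphs model sentence redundancy."]))
```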

    Is Neuro-Symbolic AI Meeting its Promise in Natural Language Processing? A Structured Review

    Advocates of Neuro-Symbolic Artificial Intelligence (NeSy) assert that combining deep learning with symbolic reasoning will lead to stronger AI than either paradigm on its own. As successful as deep learning has been, it is generally accepted that even our best deep learning systems are not very good at abstract reasoning. And since reasoning is inextricably linked to language, it makes intuitive sense that Natural Language Processing (NLP) would be a particularly well-suited candidate for NeSy. We conduct a structured review of studies implementing NeSy for NLP, with the aim of answering the question of whether NeSy is indeed meeting its promises: reasoning, out-of-distribution generalization, interpretability, learning and reasoning from small data, and transferability to new domains. We examine the impact of knowledge representation, such as rules and semantic networks, of language structure and relational structure, and of whether implicit or explicit reasoning contributes to higher promise scores. We find that systems in which logic is compiled into the neural network satisfy the most NeSy goals, while other factors, such as knowledge representation or type of neural architecture, do not exhibit a clear correlation with goals being met. We find many discrepancies in how reasoning is defined, specifically in relation to human-level reasoning, which impact decisions about model architectures and drive conclusions that are not always consistent across studies. Hence we advocate for a more methodical approach to the application of theories of human reasoning, as well as the development of appropriate benchmarks, which we hope can lead to a better understanding of progress in the field. We make our data and code available on GitHub for further analysis.

    Methods to assess food-evoked emotion across cultures


    Semantic and pragmatic characterization of learning objects

    Doctoral thesis in Informatics Engineering. Universidade do Porto, Faculdade de Engenharia. 201

    Enforcing Customization in e-Learning Systems: an ontology and product line-based approach

    In the era of e-Learning, educational materials are considered a crucial point for all stakeholders. On the one hand, instructors aim to create learning materials that meet the needs and expectations of learners easily and effectively; on the other hand, learners want to acquire knowledge in a way that suits their characteristics and preferences. Consequently, the provision and customization of educational materials to meet the needs of learners is a constant challenge and currently goes hand in hand with technological development. Promoting the personalization of learning materials, especially during their development, helps to produce learning materials customized to specific learners' needs.

    The main objective of this thesis is to reinforce and strengthen Reuse, Customization, and Ease of Production in e-Learning materials during the development process. The thesis deals with the design of a framework based on ontologies and product lines to develop customized Learning Objects (LOs). With this framework, the development of learning materials has the following advantages: (i) large-scale production, (ii) faster development time, and (iii) greater (re)use of resources.

    The proposed framework is the main contribution of this thesis and is characterized by the combination of three models: the Content Model, which addresses important points related to the structure of learning materials, their granularity, and levels of aggregation; the Customization Model, which considers specific learner characteristics and preferences to customize the learning materials; and the LO Product Line (LOPL) model, which handles variability and creates materials in an easy and flexible way. With these models, instructors can not only develop learning materials but also reuse and customize them during development.

    An additional contribution is the Customization Model, which is based on the Learning Style Model (LSM) concept. Based on the study of seven such models, a Global Learning Style Model Ontology (GLSMO) has been constructed to help instructors with information on the learner's characteristics and to recommend appropriate LOs for customization.

    The results of our work are reflected in the design of an authoring tool for learning materials called LOAT. We describe its requirements, the elements of its architecture, and some details of its user interface. As an example of its use, a case study shows how it supports the development of some learning components.

    Ezzat Labib Awad, A. (2017). Enforcing Customization in e-Learning Systems: an ontology and product line-based approach [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90515
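    As a small illustration of product-line-style variant selection driven by a learner's style, consider the sketch below. The style categories, catalog entries, and function name are assumptions for demonstration, far simpler than the LOPL and GLSMO models described above.

```python
# An illustrative sketch of selecting an LO variant by learning style;
# the real LOPL/GLSMO models are richer, and this catalog is hypothetical.
from dataclasses import dataclass

@dataclass
class LearningObjectVariant:
    topic: str
    style: str        # e.g. "visual" or "verbal" (assumed categories)
    resource: str

CATALOG = [
    LearningObjectVariant("recursion", "visual", "recursion_diagram.svg"),
    LearningObjectVariant("recursion", "verbal", "recursion_notes.txt"),
]

def select_variant(topic: str, learner_style: str) -> LearningObjectVariant:
    """Pick the catalog variant matching the learner's preferred style."""
    for v in CATALOG:
        if v.topic == topic and v.style == learner_style:
            return v
    raise LookupError(f"no variant of {topic!r} for style {learner_style!r}")

print(select_variant("recursion", "visual"))
```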