607 research outputs found

    Legal documentation with the computer-aided indexing system CTX

    The article deals with linguistic methods in information and documentation, in particular for processing large text collections for the purpose of information retrieval (automatic indexing).

    Theory and Applications for Advanced Text Mining

    Due to the growth of computer and web technologies, large amounts of text data can now be collected and stored easily, and these data are likely to contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract that knowledge. Although many important techniques have been developed, the field continues to expand in response to the needs of various application areas. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to methods for under-resourced languages. I believe that this book will offer new knowledge in the text mining field and help many readers open up new research directions.

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
    A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

    Patent data driven innovation logic

    Innovation research is conventionally conducted with creativity techniques such as TRIZ, Mind Mapping, Brainstorming, etc. (Dewulf, Baillie 1998). Patent research is typically used to establish novelty or prior art and in legal studies. This thesis is at the intersection of creativity techniques and patent data analysis. It describes how to utilise patent data for distilling Innovation Logic and conducting innovation research. Using the patent research tool PatentInspiration (© AULIVE Software NV), the 4 different stages of the Innovation Logic approach have been subjected to text analysis in patent literature. The specific text patterns were identified and documented in several case studies, with one case study running across the whole thesis: the toothbrush. The opportunities and limitations of Patent Data Driven Innovation Research have been documented and discussed. This methodology has been demonstrated within a proposed structural approach to problem solving, technology marketing and innovation research. Furthermore, the potential of artificial idea generation and artificial creativity was examined and debated for the purpose of computer aided creativity. This thesis examines and confirms three claims. Claim 1: properties and functions can be adjectives and verbs in patent literature. Claim 2: patent data analysis augments the full Innovation Logic process. Claim 3: artificial innovation methods can be fueled by patent data. Patent data can be text mined, acting as a global brain consisting of over 100 million invention documents. It is possible to use this existing data to reverse engineer thinking methodologies, allowing scientists and engineers to solve new problems, invent new products or processes, or find new markets for existing technologies. 
Patent Data Driven Innovation Logic will demonstrate a systematic innovation approach that combines the force of contemporary data mining methods on patent literature with a structured innovation research methodology.

    AI-assisted patent prior art searching - feasibility study

    This study seeks to understand the feasibility, technical complexities and effectiveness of using artificial intelligence (AI) solutions to improve operational processes of registering IP rights. The Intellectual Property Office commissioned Cardiff University to undertake this research. The research was funded through the BEIS Regulators’ Pioneer Fund (RPF). The RPF was set up to help address barriers to innovation in the UK economy.

    Meaning refinement to improve cross-lingual information retrieval

    Dissertation, University of Magdeburg, Faculty of Computer Science, 2012, by Farag Ahme

    Identifying chemical entities on literature: a machine learning approach using dictionaries as domain knowledge

    Doctoral thesis, Informatics (Bioinformatics), Universidade de Lisboa, Faculdade de Ciências, 2013.
    The volume of life science publications, and therefore of the underlying biomedical knowledge, is growing at a fast pace. However, manual literature analysis is a slow and painful task. Hence, text mining systems have been developed to automatically locate the relevant information contained in the literature. An essential step in text mining is named entity recognition, but the inherent complexity of biomedical entities, such as chemical compounds, makes it difficult to obtain good performance in this task. This thesis proposes methods capable of improving the current performance of chemical entity recognition from text. A case-based method for recognizing chemical entities is proposed, and the evaluation results obtained outperform the most widely used methods, which are based on dictionaries. A lexical similarity based chemical entity resolution method was also developed, allowing an efficient mapping of the recognized entities to the ChEBI database. To improve the chemical entity identification results, we developed a validation method that exploits the semantic relationships in ChEBI to measure the similarity between the entities found in the text, in order to discriminate between correctly identified entities that can be validated and identification errors that should be discarded. A machine learning method for detecting entity recognition errors is also proposed, which can effectively find recognition errors in rule-based systems. The methods were integrated in a system capable of recognizing chemical entities in texts, mapping them to the ChEBI database, and providing evidence of validation or recognition error for the recognized entities.
    Funded by Fundação para a Ciência e a Tecnologia (FCT, SFRH/BD/36015/2007).