1,675 research outputs found

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
    A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

    Getting More out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics.

    This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK’s largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors’ own group) who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.

    A Survey on Semantic Processing Techniques

    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyzed five semantic processing tasks, namely word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions. Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missing in the published version due to the publication policies. Please contact Prof. Erik Cambria for details.

    Facilitating Technology Transfer by Patent Knowledge Graph

    Technologies are one of the most important driving forces of our societal development, and realizing the value of technologies heavily depends on the transfer of technologies. Given the importance of technologies and technology transfer, an increasingly large amount of money has been invested to encourage technological innovation and technology transfer worldwide. However, while numerous innovative technologies are invented, most of them remain latent and un-transferred. The comprehension of technical documents and the identification of appropriate technologies for given needs are challenging problems in technology transfer due to information asymmetry and information overload. There is a lack of a common knowledge base that can reveal the technical details of technical documents and assist with the identification of suitable technologies. To bridge this gap, this research proposes to construct a knowledge graph for facilitating technology transfer. A case study is conducted to show the construction of a patent knowledge graph and to illustrate its benefit to finding relevant patents, the most common and important form of technologies.

    A corpus-based study of Chinese and English translation of international economic law: an interdisciplinary study

    International Economic Law (IEL), a sub-discipline of International Law, is concerned with the regulation of international economic relations and the behaviours of States, international organisations, and firms operating in the international arena. Due to the increase in commercial intercourse, translation of International Economic Law has become an important factor in promoting cross-cultural communication. The translation of IEL is not purely a technical exercise that simply involves linguistic translation from one language to another, but rather a social and cultural act. This research sets out to examine the translation of terminology used in IEL, drawing on data from a bespoke self-built Parallel Corpus of International Economic Law (PCIEL) and using a corpus-based, systematic micro-level framework, to analyse the subject matter and to discuss the feasibility of translating these legal terms at the word level and at the sentence and discourse level, with a particular focus on the impact of cultural influences. The study presents the findings from the Chinese translator’s perspective regarding International Economic Law from English/Chinese into Chinese/English with a focus on the areas of law, economics, and culture. The contribution made by a corpus-based approach applied to the interdisciplinary subject of IEL is explored. In particular, this establishes a link between linguistic and non-linguistic study in translating legal texts, especially IEL. The corpus data are organized in different semantic fields and the translation analysis covers lexical, sentential and cultural perspectives. This research demonstrates that both linguistic and cultural factors make clear contributions to the translation of terminology in PCIEL.