9,008 research outputs found
Developing a Hybrid Dictionary-based Bio-entity Recognition Technique
Background: Bio-entity extraction is a pivotal component for information extraction from biomedical literature. The dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. Methods: This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique adopts text mining techniques in the merging stage of similar entities such as Part of Speech (POS) expansion, stemming, and the exploitation of the contextual cues to further improve the performance. Results: The experimental results show that the proposed technique achieves the best or at least equivalent performance among compared techniques, GENIA, MESH, UMLS, and combinations of these three resources in F-measure. Conclusions: The results imply that the performance of dictionary-based extraction techniques is largely influenced by information resources used to build the dictionary. In addition, the edit distance algorithm shows steady performance with three different dictionaries in precision whereas the context-only technique achieves a high-end performance with three difference dictionaries in recall.X1133Ysciescopu
Chemical named entities recognition: a review on approaches and applications
The rapid increase in the flow rate of published digital information in all disciplines has resulted in a pressing need for techniques that can simplify the use of this information. The chemistry literature is very rich with information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature to "text mine" these extracted data and determine contextual relationships helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is the recognition of chemical entities mentioned in the texts. In this review, the authors briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We sketch out dictionary-based, rule-based and machine learning, as well as hybrid chemical named entity recognition approaches with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracte
Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing
Biological named entity recognition, the identification of biological terms in text, is essential for biomedical information extraction. Machine learning-based approaches have been widely applied in this area. However, the recognition performance of current approaches could still be improved. Our novel approach is to combine support vector machines (SVMs) and conditional random fields (CRFs), which can complement and facilitate each other. During the hybrid process, we use SVM to separate biological terms from non-biological terms, before we use CRFs to determine the types of biological terms, which makes full use of the power of SVM as a binary-class classifier and the data-labeling capacity of CRFs. We then merge the results of SVM and CRFs. To remove any inconsistencies that might result from the merging, we develop a useful algorithm and apply two rules. To ensure biological terms with a maximum length are identified, we propose a maximal bidirectional squeezing approach that finds the longest term. We also add a positive gain to rare events to reinforce their probability and avoid bias. Our approach will also gradually extend the context so more contextual information can be included. We examined the performance of four approaches with GENIA corpus and JNLPBA04 data. The combination of SVM and CRFs improved performance. The macro-precision, macro-recall, and macro-F1 of the SVM-CRFs hybrid approach surpassed conventional SVM and CRFs. After applying the new algorithms, the macro-F1 reached 91.67% with the GENIA corpus and 84.04% with the JNLPBA04 data
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.A.V. and M.K. acknowledge funding from the European
Community’s Horizon 2020 Program (project reference:
654021 - OpenMinted). M.K. additionally acknowledges the
Encomienda MINETAD-CNIO as part of the Plan for the
Advancement of Language Technology. O.R. and J.O. thank
the Foundation for Applied Medical Research (FIMA),
University of Navarra (Pamplona, Spain). This work was
partially funded by Consellería
de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic
funding of UID/BIO/04469/2013 unit and COMPETE 2020
(POCI-01-0145-FEDER-006684). We thank Iñigo Garciá -Yoldi
for useful feedback and discussions during the preparation of
the manuscript.info:eu-repo/semantics/publishedVersio
Recommended from our members
Retrieving information from heterogeneous freight data sources to answer natural language queries
textThe ability to retrieve accurate information from databases without an extensive knowledge of the contents and organization of each database is extremely beneficial to the dissemination and utilization of freight data. The challenges, however, are: 1) correctly identifying only the relevant information and keywords from questions when dealing with multiple sentence structures, and 2) automatically retrieving, preprocessing, and understanding multiple data sources to determine the best answer to user’s query. Current named entity recognition systems have the ability to identify entities but require an annotated corpus for training which in the field of transportation planning does not currently exist. A hybrid approach which combines multiple models to classify specific named entities was therefore proposed as an alternative. The retrieval and classification of freight related keywords facilitated the process of finding which databases are capable of answering a question. Values in data dictionaries can be queried by mapping keywords to data element fields in various freight databases using ontologies. A number of challenges still arise as a result of different entities sharing the same names, the same entity having multiple names, and differences in classification systems. Dealing with ambiguities is required to accurately determine which database provides the best answer from the list of applicable sources. This dissertation 1) develops an approach to identify and classifying keywords from freight related natural language queries, 2) develops a standardized knowledge representation of freight data sources using an ontology that both computer systems and domain experts can utilize to identify relevant freight data sources, and 3) provides recommendations for addressing ambiguities in freight related named entities. Finally, the use of knowledge base expert systems to intelligently sift through data sources to determine which ones provide the best answer to a user’s question is proposed.Civil, Architectural, and Environmental Engineerin
- …