259 research outputs found

    Biomedical name recognition: A machine learning approach

    Master's (Master of Science)

    Entity recognition for multi-modal socio-technical systems

    Entity Recognition (ER) can be used as a method for extracting information about socio-technical systems from unstructured, natural language text data. This process is limited by the set of entity classes considered in many current ER solutions. In this thesis, we report on the development of an ER classifier that supports a wide range of entity classes that are relevant for analyzing multi-modal, socio-technical systems. Another limitation of current entity extractors is that they mainly support the detection of named entities, typically in the form of proper nouns. The presented solution also detects entities not referred to by a name, such as general references to places (e.g. forest) or natural resources (e.g. timber). We use supervised machine learning for this project. To overcome the data sparseness issues that result from considering a large number of entity classes, we built two separate classifiers, one predicting entity boundary labels and one predicting entity class labels. We herein investigate rules for merging both labels while minimizing the loss of accuracy due to this step. Our classifier achieves an accuracy of 75.9% for the largest model, which has 94 classes. We compare the performance of our solution to other standard systems on several datasets, finding that with the same number of classes, the accuracy of our classifier is comparable to other state-of-the-art ER packages.

    Recognition of protein/gene names from text using an ensemble of classifiers

    This paper proposes an ensemble of classifiers for biomedical name recognition in which three classifiers, one Support Vector Machine and two discriminative Hidden Markov Models, are combined using a simple majority voting strategy. In addition, we incorporate three post-processing modules into the system to further improve performance: an abbreviation resolution module, a protein/gene name refinement module, and a simple dictionary matching module. Evaluation shows that our system achieves the best performance among 10 systems, with a balanced F-measure of 82.58 on the closed evaluation of the BioCreative protein/gene name recognition task (Task 1A).
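The majority voting step described in this abstract can be illustrated with a minimal sketch. It assumes each classifier emits one label per token using a B/I/O scheme; the function name `majority_vote`, the label scheme, and the tie-breaking rule (prefer the earliest classifier in the list) are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token labels from several taggers by simple majority.

    predictions: list of label sequences, one per classifier, all the same length.
    Ties go to the earliest classifier in the list (an assumed rule).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all label sequences must have the same length")

    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        # Pick the first label, in classifier order, that reaches the top count.
        combined.append(next(label for label in labels if counts[label] == top))
    return combined


# Hypothetical outputs of one SVM tagger and two HMM taggers for four tokens.
svm_labels = ["B", "I", "O", "O"]
hmm1_labels = ["B", "O", "O", "B"]
hmm2_labels = ["B", "I", "O", "B"]
print(majority_vote([svm_labels, hmm1_labels, hmm2_labels]))
```

With three voters and a small label set, ties are rare, so the tie-breaking rule matters mostly when the number of classifiers is even.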

    A System for Identifying Named Entities in Biomedical Text: how Results From two Evaluations Reflect on Both the System and the Evaluations

    We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and report its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth the problems with data annotation in the evaluations, which caused the final performance to be lower than optimal.

    Developing a Hybrid Dictionary-based Bio-entity Recognition Technique

    Background: Bio-entity extraction is a pivotal component of information extraction from biomedical literature. Dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. Methods: This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique adopts text mining techniques in the merging stage of similar entities, such as Part of Speech (POS) expansion, stemming, and the exploitation of contextual cues, to further improve performance. Results: The experimental results show that the proposed technique achieves the best, or at least equivalent, F-measure among the compared techniques (GENIA, MESH, UMLS, and combinations of these three resources). Conclusions: The results imply that the performance of dictionary-based extraction techniques is largely influenced by the information resources used to build the dictionary. In addition, the edit distance algorithm shows steady precision across the three different dictionaries, whereas the context-only technique achieves high-end recall across the three different dictionaries.
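To illustrate how edit distance can raise the recall of dictionary lookup, here is a minimal sketch. It uses plain Levenshtein distance, which is a stand-in and not necessarily the shortest path variant the paper uses; the function names, the `max_distance` threshold, and the sample dictionary entries are all assumptions for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_lookup(mention, dictionary, max_distance=1):
    """Return (entry, distance) for the closest dictionary entry, or None.

    Matching is case-insensitive; max_distance is an assumed tolerance.
    """
    mention = mention.lower()
    best = None
    for entry in dictionary:
        distance = levenshtein(mention, entry.lower())
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (entry, distance)
    return best


# Hypothetical dictionary entries; a real system would load them from its resources.
bio_dictionary = ["interleukin-2", "p53"]
print(fuzzy_lookup("interleukin2", bio_dictionary))
```

An exact-match lookup would miss the spelling variant "interleukin2", while a small distance tolerance recovers it. A looser tolerance trades precision for recall, so the threshold would need tuning on real data.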

    Spanish named entity recognition in the biomedical domain

    Named Entity Recognition in the clinical domain, and in languages other than English, is hampered by the absence of complete dictionaries, the informality of texts, the polysemy of terms, disagreement over entity boundaries, and the scarcity of corpora and other resources. We present a Named Entity Recognition method for poorly resourced languages. The method was tested on Spanish radiology reports and compared with a conditional random fields system. Peer reviewed. Postprint (author's final draft).