16 research outputs found

    The gene normalization task in BioCreative III

    BACKGROUND: We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable of differentiating team performance. Moreover, the same algorithm was subsequently used for inferring ground truth based solely on team submissions. We report team performance on both gold standard and inferred ground truth using a newly proposed metric called Threshold Average Precision (TAP-k). RESULTS: We received a total of 37 runs from 14 different teams for the task. When evaluated using the gold-standard annotations of the 50 articles, the highest TAP-k scores were 0.3297 (k=5), 0.3538 (k=10), and 0.3535 (k=20), respectively. Higher TAP-k scores of 0.4916 (k=5, 10, 20) were observed when evaluated using the inferred ground truth over the full test set. When combining team results using machine learning, the best composite system achieved TAP-k scores of 0.3707 (k=5), 0.4311 (k=10), and 0.4477 (k=20) on the gold standard, representing improvements of 12.4%, 21.8%, and 26.6% over the best team results, respectively. CONCLUSIONS: By using full text and being species non-specific, the GN task in BioCreative III has moved closer to a real literature curation task than similar tasks in the past and presents additional challenges for the text mining community, as revealed in the overall team results. By evaluating teams using the gold standard, we show that the EM algorithm allows team submissions to be differentiated while keeping the manual annotation effort feasible. Using the inferred ground truth we show measures of comparative performance between teams. Finally, by comparing team rankings on gold standard vs. inferred ground truth, we further demonstrate that the inferred ground truth is as effective as the gold standard for detecting good team performance.
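    The improvement percentages quoted in the abstract above can be checked from the reported TAP-k scores. The following is a minimal sketch, assuming the improvements are relative gains over the best single-team score at each k; the function and variable names are illustrative and do not come from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# TAP-k scores on the gold standard, as reported in the abstract.
best_team = {5: 0.3297, 10: 0.3538, 20: 0.3535}
composite = {5: 0.3707, 10: 0.4311, 20: 0.4477}

# Assumption: the reported percentages are relative gains rounded to one decimal.
improvements = {
    k: round(relative_improvement(best_team[k], composite[k]), 1)
    for k in best_team
}

for k, pct in improvements.items():
    print(f"k={k}: {pct}% improvement over the best team result")
```

    Under that assumption, the computed values match the 12.4%, 21.8%, and 26.6% stated in the abstract.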

    Extração de informação aplicada a comentários da área do turismo [Information extraction applied to reviews in the tourism sector]

    Motivation: The primary motivation of this dissertation was to show that it is possible to build an NLP solution for the Portuguese language that can support the hotel industry. Objective(s): The main objective of this dissertation was to extract useful information from hotel reviews using NLP. Method: An NLP pipeline was created to extract useful information, and sentiment analysis was then used to characterise that information. Results: After processing all the reviews of a hotel, it was possible to extract what people like or dislike about it. Conclusions: The two main conclusions were that it is possible to create a Portuguese NLP pipeline for the hotel industry, and that it is possible to extract useful information from thousands of reviews. Master's in Electronics and Telecommunications Engineering.

    Theory and Applications for Advanced Text Mining

    Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data. It is reasonable to believe that these data contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract that knowledge. Although many important techniques have been developed, the text mining research field continues to expand to meet needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to methods for under-resourced or less-resourced languages. I believe that this book will contribute new knowledge to the text mining field and help many readers open up new research fields.

    Acta Cybernetica : Volume 18. Number 2.

    Anthropology of Color

    The field of color categorization has been intrinsically multi- and inter-disciplinary since its beginnings in the nineteenth century. The main contribution of this book is to foster a new level of integration among different approaches to the anthropological study of color. The editors have put great effort into bringing together research from anthropology, linguistics, psychology, semiotics, and a variety of other fields, by promoting the exploration of the different but interacting and complementary ways in which these various perspectives model the domain of color experience. By so doing, they significantly promote the emergence of a coherent field of the anthropology of color.

    Graduate School: Course Descriptions, 1972-73

    Official publication of Cornell University V.64 1972/7