2,284 research outputs found

    An Analysis of the Performances of the CasEN Named Entities Recognition System in the Ester2 Evaluation Campaign

    Get PDF
    8 pagesIn this paper, we present a detailed and critical analysis of the behaviour of the CasEN named entity recognition system during the French Ester2 evaluation campaign. In this project, CasEN has been confronted with the task of detecting and categorizing named entities in manual and automatic transcriptions of radio broadcastings. At first, we give a general presentation of the Ester2 campaign. Then, we describe our system, based on transducers. Next, we depict how systems were evaluated during this campaign and we report the main official results. Afterwards, we investigate in details the influence of some annotation biases which have significantly affected the estimation of the performances of systems. At last, we conduct an in-depth analysis of the effective errors of the CasEN system, providing us with some useful indications about phenomena that gave rise to errors (e.g. metonymy, encapsulation, detection of right boundaries) and are as many challenges for named entity recognition systems

    Information retrieval and text mining technologies for chemistry

    Get PDF
    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo Garciá -Yoldi for useful feedback and discussions during the preparation of the manuscript.info:eu-repo/semantics/publishedVersio

    Futures of inland aquatic agricultural systems and implications for fish agri-food systems in southern Africa

    Get PDF
    The CGIAR Research Program on Aquatic Agricultural Systems (AAS) is collaborating with partners to develop and implement a foresight-based engagement with diverse stakeholders linked to aquatic agricultural systems. The program’s aim is to understand the implications of current drivers of change for fish agri-food systems, and consequently food and nutrition security, in Africa, Asia and the Pacific. Partners include the Global Forum on Agricultural Research (GFAR), the Forum for Agricultural Research in Africa (FARA) and the African Union’s New Partnership for Africa’s Development (AU-NEPAD). A key part of the program was a participatory scenario-building workshop held in July 2015 under the theme of "futures of aquatic agricultural systems and implications for fish agri-food systems in southern Africa." The objectives for the workshop were (i) to engage local stakeholders in exploring plausible futures of aquatic agricultural systems, and (ii) to broker and catalyze collaborative plans of action based on the foresight analysis. This report presents technical findings from the workshop. The CGIAR Research Program on Aquatic Agricultural Systems (AAS) is collaborating with partners to develop and implement a foresight-based engagement with diverse stakeholders linked to aquatic agricultural systems. The program’s aim is to understand the implications of current drivers of change for fish agri-food systems, and consequently food and nutrition security, in Africa, Asia and the Pacific. Partners include the Global Forum on Agricultural Research (GFAR), the Forum for Agricultural Research in Africa (FARA) and the African Union’s New Partnership for Africa’s Development (AU-NEPAD). A key part of the program was a participatory scenario-building workshop held in July 2015 under the theme of "futures of aquatic agricultural systems and implications for fish agri-food systems in southern Africa." The objectives for the workshop were (i) to engage local stakeholders in exploring plausible futures of aquatic agricultural systems, and (ii) to broker and catalyze collaborative plans of action based on the foresight analysis. This report presents technical findings from the workshop

    Theory and Applications for Advanced Text Mining

    Get PDF
    Due to the growth of computer technologies and web technologies, we can easily collect and store large amounts of text data. We can believe that the data include useful knowledge. Text mining techniques have been studied aggressively in order to extract the knowledge from the data since late 1990s. Even if many important techniques have been developed, the text mining research field continues to expand for the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques. They are various techniques from relation extraction to under or less resourced language. I believe that this book will give new knowledge in the text mining field and help many readers open their new research fields

    A survey on sentiment analysis in Urdu: A resource-poor language

    Get PDF
    © 2020 Background/introduction: The dawn of the internet opened the doors to the easy and widespread sharing of information on subject matters such as products, services, events and political opinions. While the volume of studies conducted on sentiment analysis is rapidly expanding, these studies mostly address English language concerns. The primary goal of this study is to present state-of-art survey for identifying the progress and shortcomings saddling Urdu sentiment analysis and propose rectifications. Methods: We described the advancements made thus far in this area by categorising the studies along three dimensions, namely: text pre-processing lexical resources and sentiment classification. These pre-processing operations include word segmentation, text cleaning, spell checking and part-of-speech tagging. An evaluation of sophisticated lexical resources including corpuses and lexicons was carried out, and investigations were conducted on sentiment analysis constructs such as opinion words, modifiers, negations. Results and conclusions: Performance is reported for each of the reviewed study. Based on experimental results and proposals forwarded through this paper provides the groundwork for further studies on Urdu sentiment analysis

    Data-Driven Understanding of Smart Service Systems Through Text Mining

    Get PDF
    Smart service systems are everywhere, in homes and in the transportation, energy, and healthcare sectors. However, such systems have yet to be fully understood in the literature. Given the widespread applications of and research on smart service systems, we used text mining to develop a unified understanding of such systems in a data-driven way. Specifically, we used a combination of metrics and machine learning algorithms to preprocess and analyze text data related to smart service systems, including text from the scientific literature and news articles. By analyzing 5,378 scientific articles and 1,234 news articles, we identify important keywords, 16 research topics, 4 technology factors, and 13 application areas. We define ???smart service system??? based on the analytics results. Furthermore, we discuss the theoretical and methodological implications of our work, such as the 5Cs (connection, collection, computation, and communications for co-creation) of smart service systems and the text mining approach to understand service research topics. We believe this work, which aims to establish common ground for understanding these systems across multiple disciplinary perspectives, will encourage further research and development of modern service systems

    Developing Curriculum and Tasks for Teaching Reading

    Full text link
    This paper is aimed at proposing a curriculum and task design for teaching reading to students of an English teacher training. The curriculum discusses the needs and context analysis, goals, objectives, beliefs, content organization, task design, learning assessment and evaluation of the course. While being skill-oriented on one hand, the curriculum works towards a more process or discourse-oriented design. This is done to cater for the needs to build students' reading skills, understanding of text's construction and positive attitudes towards reading as not merely a means of acquiring a full command in the target language but also a tool to access knowledge and advancement

    A Hybrid Approach for Large-Scale Product Categorization Based on Weighted KNN and LSTM-BPV

    Get PDF
    In modern e-commerce systems, large volumes of new items are being added to the product list everyday, which calls for automatic product categorization. In this thesis we propose a weighted K-Nearest Neighbour (KNN) based classification system for solving large-scale e-commerce product taxonomy classification problem. We use information retrieval (IR) model as similarity function in our weighted KNN algorithm. Among all IR models used in this study, we achieved highest classification performance through using information-based (IB) model as similarity function in the KNN algorithm. Moreover, our proposed method can improve the overall performance when combining prediction results with those from advanced neural network based method, namely Long Short-Term Memory with Balanced Pooling Views (LSTM-BPV). The hybrid system could achieve results comparable to the state of the art (SotA). We also get good results by fine-tuning pre-trained Bidirectional Encoder Representations from Transformers (BERT) model

    The TXM Portal Software giving access to Old French Manuscripts Online

    Get PDF
    Texte intégral en ligne : http://www.lrec-conf.org/proceedings/lrec2012/workshops/13.ProceedingsCultHeritage.pdfInternational audiencehttp://www.lrec-conf.org/proceedings/lrec2012/workshops/13.ProceedingsCultHeritage.pdf This paper presents the new TXM software platform giving online access to Old French Text Manuscripts images and tagged transcriptions for concordancing and text mining. This platform is able to import medieval sources encoded in XML according to the TEI Guidelines for linking manuscript images to transcriptions, encode several diplomatic levels of transcription including abbreviations and word level corrections. It includes a sophisticated tokenizer able to deal with TEI tags at different levels of linguistic hierarchy. Words are tagged on the fly during the import process using IMS TreeTagger tool with a specific language model. Synoptic editions displaying side by side manuscript images and text transcriptions are automatically produced during the import process. Texts are organized in a corpus with their own metadata (title, author, date, genre, etc.) and several word properties indexes are produced for the CQP search engine to allow efficient word patterns search to build different type of frequency lists or concordances. For syntactically annotated texts, special indexes are produced for the Tiger Search engine to allow efficient syntactic concordances building. The platform has also been tested on classical Latin, ancient Greek, Old Slavonic and Old Hieroglyphic Egyptian corpora (including various types of encoding and annotations)
    corecore