35 research outputs found

    A Proposal for New Evaluation Metrics and Result Visualization Technique for Sentiment Analysis Tasks

    Proceedings of: 4th International Conference of the CLEF Initiative (CLEF 2013), Valencia, Spain, September 23-26, 2013. In this paper we propose the use of a number of entropy-based metrics and a visualization tool for the intrinsic evaluation of Sentiment and Reputation Analysis tasks. We provide a theoretical justification for their use and discuss how they complement other accuracy-based metrics. We apply the proposed techniques to the analysis of TASS-SEPLN and RepLab 2012 results and show how they are effective for system comparison, system development and postmortem evaluation. FJVA and JCdA are supported by EU FP7 project LiMoSINe (contract 288024). CPM has been partially supported by the Spanish Government-Comisión Interministerial de Ciencia y Tecnología project TEC2011-26807 for this paper.
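One way to see how an entropy-based measure can complement accuracy is the minimal sketch below. It computes the mutual information between true and predicted labels from a confusion matrix; this is a generic information-theoretic quantity chosen for illustration, not necessarily the exact metrics of the paper. The two confusion matrices and all function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (zero terms are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def accuracy(confusion):
    """Fraction of items on the diagonal; confusion[i][j] = true class i, predicted class j."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total


def mutual_information(confusion):
    """Mutual information in bits between true and predicted labels."""
    total = sum(sum(row) for row in confusion)
    joint = [[count / total for count in row] for row in confusion]
    p_true = [sum(row) for row in joint]          # marginal over true classes
    p_pred = [sum(col) for col in zip(*joint)]    # marginal over predicted classes
    h_joint = entropy(p for row in joint for p in row)
    return entropy(p_true) + entropy(p_pred) - h_joint


# Hypothetical data: two systems with the same accuracy on a skewed test set.
majority = [[90, 0], [10, 0]]   # always predicts class 0
informed = [[80, 10], [0, 10]]  # actually separates the classes

for name, matrix in (("majority", majority), ("informed", informed)):
    print(name, round(accuracy(matrix), 3), round(mutual_information(matrix), 3))
```

Both systems reach the same accuracy, but only the second transmits any information about the true class, which is the kind of difference an accuracy figure alone hides.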

    Large scale biomedical texts classification: a kNN and an ESA-based approaches

    With the large and increasing volume of textual data, automated methods for identifying significant topics to classify textual documents have received growing interest. While many efforts have been made in this direction, it remains a real challenge. Moreover, the issue is even more complex because full texts are not always freely available. Using only partial information to annotate these documents is therefore promising but remains very ambitious.
    Methods: We propose two classification methods: a k-nearest neighbours (kNN)-based approach and an explicit semantic analysis (ESA)-based approach. Although the kNN-based approach is widely used in text classification, it needs to be improved to perform well in this specific classification problem, which deals with partial information. Compared to existing kNN-based methods, our method uses classical Machine Learning (ML) algorithms for ranking the labels. Additional features are also investigated in order to improve the classifiers' performance. In addition, several learning algorithms are combined with various techniques for fixing the number of relevant topics. On the other hand, ESA seems promising for this classification task as it yielded interesting results in related issues, such as semantic relatedness computation between texts and text classification. Unlike existing works, which use ESA for enriching the bag-of-words approach with additional knowledge-based features, our ESA-based method builds a standalone classifier. Furthermore, we investigate whether the results of this method could be useful as a complementary feature of our kNN-based approach.
    Results: Experimental evaluations performed on large standard annotated datasets, provided by the BioASQ organizers, show that the kNN-based method with the Random Forest learning algorithm achieves good performance compared with the current state-of-the-art methods, reaching a competitive f-measure of 0.55%, while the ESA-based approach surprisingly yielded reserved results.
    Conclusions: We have proposed simple classification methods suitable for annotating textual documents using only partial information. They are therefore adequate for large multi-label classification, particularly in the biomedical domain. Thus, our work contributes to the extraction of relevant information from unstructured documents in order to facilitate their automated processing. Consequently, it could be used for various purposes, including document indexing, information retrieval, etc.
    Comment: Journal of Biomedical Semantics, BioMed Central, 201
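The kNN idea described above can be sketched as follows: the labels of the most similar annotated documents become candidate labels for a new document and are then ranked. This is a simplified illustration only. The abstract says the authors rank labels with ML algorithms such as Random Forest, whereas this sketch substitutes plain similarity-weighted voting. The toy training set, the labels and the function names are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_rank_labels(query, training, k=2):
    """Rank candidate labels for `query` using its k most similar training texts.

    `training` is a list of (text, labels) pairs. Each neighbour votes for its
    labels with a weight equal to its similarity to the query (an assumption
    made for this sketch, not the paper's learned ranking).
    """
    q = Counter(query.lower().split())
    neighbours = sorted(
        ((cosine(q, Counter(text.lower().split())), labels) for text, labels in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for similarity, labels in neighbours:
        for label in labels:
            votes[label] += similarity
    return [label for label, _ in votes.most_common()]


# Hypothetical annotated abstracts.
training = [
    ("insulin therapy in diabetes patients", ["Diabetes Mellitus", "Insulin"]),
    ("diabetes risk and diet", ["Diabetes Mellitus", "Diet"]),
    ("gene expression in tumor cells", ["Neoplasms", "Gene Expression"]),
]

ranked = knn_rank_labels("insulin resistance in diabetes", training, k=2)
print(ranked)
```

A real multi-label system would still need a rule for how many of the ranked labels to keep, which the abstract refers to as fixing the number of relevant topics.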

    Demo : Swip, a semantic web interface using patterns

    Our purpose is to provide end-users with a means to query ontology-based knowledge bases using natural language queries, and thus hide the complexity of formulating a query expressed in a graph query language such as SPARQL. The main originality of our approach lies in the use of query patterns. Our contribution is materialized in a system named SWIP, standing for Semantic Web Interface Using Patterns. The demo will present use cases of this system.

    REINA at RepLab2013 Topic Detection Task: Community Detection

    Social networks have become a large repository of comments from which a wide range of information can be extracted. Twitter is one of the largest and most widespread social networks and is therefore an important source for detecting states of opinion, events and happenings, even before the mainstream media. Topic detection is important for discovering areas of interest that arise in the tweets. We used classical methods to build a similarity matrix and then applied community detection techniques. The results were good and allow us to study new possibilities.
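The two-stage pipeline mentioned in this abstract (a similarity matrix, then grouping over the resulting graph) might look roughly like the sketch below. The abstract does not say which similarity measure or which community detection algorithm was used, so both are assumptions here: Jaccard similarity over word sets, and connected components of a thresholded graph as a deliberately simple stand-in for a real community detection method. The tweets, the threshold and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_tweets(tweets, threshold=0.25):
    """Group tweet indices into topics.

    Builds a pairwise similarity matrix, links tweets whose similarity reaches
    `threshold`, and returns the connected components of that graph.
    """
    tokens = [set(t.lower().split()) for t in tweets]
    n = len(tweets)
    sim = [[jaccard(tokens[i], tokens[j]) for j in range(n)] for i in range(n)]

    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if j not in seen and sim[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


# Hypothetical tweets about two different topics.
tweets = [
    "new phone battery drains fast",
    "phone battery drains too fast",
    "bank fees are too high",
    "bank fees keep rising",
]

topics = group_tweets(tweets)
print(topics)
```

Swapping the component search for a modularity-based or label-propagation method would bring the sketch closer to community detection proper while keeping the same similarity matrix as input.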


    Combining Overall and Target Oriented Sentiment Analysis over Portuguese Text from Social Media

    This document describes an approach to sentiment analysis of Portuguese social media content. In a single system, we perform polarity classification for both overall sentiment and target-oriented sentiment. In both modes we train a Maximum Entropy classifier. The overall model is based on BoW-type features, as well as features derived from POS tagging and from sentiment lexicons. Target-oriented analysis begins with named entity recognition, followed by the classification of sentiment polarity on these entities. This classifier model uses features dedicated to the textual zone of the entity mention, including negation detection, and the syntactic function of the target occurrence segment. Our experiments achieved an accuracy of 75% for target-oriented polarity classification, and 97% for overall polarity.

    Evaluation of the effectiveness of information retrieval systems. Results of the Polish Task experiment carried out within the Conference and Labs of the Evaluation Forum (CLEF) 2012

    This article presents the CLEF (Conference and Labs of the Evaluation Forum) evaluation labs, with special attention paid to CHiC (Cultural Heritage in CLEF). We describe the design and results of the Polish Task in the CHiC lab and discuss the conclusions drawn from running the task. We discuss the results achieved by participants using different indexing and retrieval strategies. The effectiveness of tf-idf, OKAPI, DFR and data fusion was compared and analysed.
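As an illustration of what data fusion over several retrieval runs can mean, the sketch below merges two ranked lists with CombSUM after min-max normalization. The abstract does not state which fusion method the participants used, so CombSUM is an assumption chosen because it is a common baseline. The document identifiers, the scores and the function names are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale one run's scores to [0, 1] so runs on different scales are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(*runs):
    """CombSUM fusion: sum each document's normalized scores across all runs.

    Each run maps document id to retrieval score. A document missing from a
    run contributes nothing for that run. Ties are broken by document id.
    """
    fused = {}
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical scores from two weighting schemes for the same query.
tfidf_run = {"d1": 2.0, "d2": 1.0, "d3": 0.0}
okapi_run = {"d2": 12.0, "d3": 9.0, "d4": 6.0}

fused_ranking = comb_sum(tfidf_run, okapi_run)
print(fused_ranking)
```

Document d2 rises to the top because both runs score it well, even though neither the tf-idf run alone nor a raw score sum on unnormalized scales would rank the documents this way.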

    Natural language query interpretation into SPARQL using patterns

    Our purpose is to provide end-users with a means to query ontology-based knowledge bases using natural language queries, and thus hide the complexity of formulating a query expressed in a graph query language such as SPARQL. The main originality of our approach lies in the use of query patterns. In this article we justify the postulate underlying our work, which claims that queries issued by real-life end-users are variations of a few typical query families. We also explain how our approach is designed to be adaptable to different user languages. Evaluations on the QALD-3 data set have shown the relevance of the approach.