6 research outputs found

    A collaborative system for sentiment analysis

    Računalnolingvistička analiza korisničkih komentara na internetskim portalima (Computational-linguistic analysis of user comments on internet portals)

    In September 2013 we collected user comments from three Croatian news websites (jutarnji.hr, net.hr and bitno.net), posted on articles about the collection of signatures for a marriage referendum in Croatia. The comments were made between 12 May 2013 and 26 May 2013. We then conducted several computational-linguistic analyses of those comments: sentiment analysis, polarity analysis and language analysis. The results showed that the comments are mostly negative, both in overall sentiment and in their polar words, phrases and sentences (the most positive comments appear on the Catholic website bitno.net). On jutarnji.hr and net.hr, we found statistically significant differences between the general sentiment of comments and the sentiment towards the initiative (the more positive the sentiment towards the initiative, the more positive the general sentiment). A statistically significant difference was also found between general sentiment and the language used in comments (the more positive the general sentiment, the more standard the language). As for the language itself, around 50% of all comments were written in a non-standard language variety, with many vulgarisms.

    Aspect Miner: Λεπτομερής εξόρυξη γνώμης σε επίπεδο γνωρισμάτων από συλλογές βαθμολογημένων κριτικών (Aspect Miner: Fine-grained aspect-level opinion mining from collections of rated reviews)

    The web offers vast quantities of user-generated content, including reviews. These reviews, be they about products, services, books, music or movies, constitute a primary target for the application of opinion analysis techniques. We present Aspect Miner, an integrated opinion mining system tailored to user reviews published on the web. By leveraging the user ratings that typically accompany these reviews, Aspect Miner can be trained to distinguish not only positive from negative sentiment, but also between multiple levels of sentiment intensity. Moreover, Aspect Miner can classify opinions at the sentence level as well as at the level of the individual ratable aspects present in a sentence, and it is adaptable to texts of any domain. The system is built around three core subtasks: (i) classification of subjective terms, (ii) aspect identification and (iii) sentence sentiment analysis. For the first subtask, we propose a classification scheme that employs the user ratings in a training corpus. For the second, we examine the LDA topic model as a means of identifying and extracting the features of the reviewed items in the corpus, and we attempt to address its inherent limitations with an additional post-processing step that aggregates multiple disparate feature models into a single concise one. Finally, to perform analysis at the sentence level, we combine the results of the preceding subtasks with a syntax-tree-based linguistic method powered by a set of predefined typed dependency rules. Our experiments show that the accuracy of our approach on these tasks is at least comparable to, and under certain circumstances surpasses, that of a number of other popular sentiment analysis techniques.
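The first subtask, classifying subjective terms from the user ratings in a training corpus, might be sketched roughly as follows. This is only an illustration under stated assumptions, not Aspect Miner's actual scheme: the function name `term_polarity`, the 1-5 rating scale, the mean-rating scoring, and the thresholds are all hypothetical.

```python
from collections import defaultdict


def term_polarity(reviews, neutral=3.0, margin=0.5, min_count=2):
    """Label each term by the mean rating of the reviews it occurs in.

    reviews: iterable of (text, rating) pairs; a 1-5 scale is assumed.
    The thresholds (neutral, margin, min_count) are illustrative choices.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, rating in reviews:
        # Count each term at most once per review.
        for term in set(text.lower().split()):
            totals[term] += rating
            counts[term] += 1

    labels = {}
    for term, n in counts.items():
        if n < min_count:
            continue  # too rare to label
        mean = totals[term] / n
        if mean >= neutral + margin:
            labels[term] = "positive"
        elif mean <= neutral - margin:
            labels[term] = "negative"
        else:
            labels[term] = "neutral"
    return labels


# Toy corpus, invented purely for illustration.
reviews = [
    ("great battery and great screen", 5),
    ("great value", 5),
    ("terrible battery", 1),
    ("terrible screen and poor value", 1),
]
labels = term_polarity(reviews)
```

Terms that occur mostly in highly rated reviews come out positive, terms spread evenly across ratings come out neutral, and rare terms are left unlabeled.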

    Semantic and pragmatic characterization of learning objects

    Doctoral thesis. Informatics Engineering. Universidade do Porto. Faculdade de Engenharia. 201

    Genre and Domain Dependencies in Sentiment Analysis

    Genre and domain influence an author's style of writing and therefore a text's characteristics. Natural language processing is sensitive to such variations in textual characteristics: it is said to be genre and domain dependent. This thesis investigates genre and domain dependencies in sentiment analysis. Its goal is to support the development of robust sentiment analysis approaches that work well and in a predictable manner under different conditions, i.e. for different genres and domains. Initially, we show that a prototypical approach to sentiment analysis -- viz. a supervised machine learning model based on word n-gram features -- performs differently on gold standards that originate from differing genres and domains, but performs similarly on gold standards that originate from similar genres and domains. We show that these gold standards differ in certain textual characteristics, viz. their domain complexity. We find a strong linear relation between our approach's accuracy on a particular gold standard and its domain complexity, which we then use to estimate our approach's accuracy. Subsequently, we use certain textual characteristics -- viz. domain complexity, domain similarity, and readability -- in a variety of applications. Domain complexity and domain similarity measures are used to determine parameter settings in two tasks. Domain complexity guides us in model selection for in-domain polarity classification, viz. in decisions regarding word n-gram model order and word n-gram feature selection. Domain complexity and domain similarity guide us in domain adaptation. We propose a novel domain adaptation scheme and apply it to cross-domain polarity classification in semi-supervised and unsupervised domain adaptation scenarios. Readability is used for feature engineering. We propose to adopt readability gradings, readability indicators, and word and syntax distributions as features for subjectivity classification. Moreover, we generalize a framework for modeling and representing negation in machine learning-based sentiment analysis. This framework is applied to in-domain and cross-domain polarity classification. We investigate the relation between implicit and explicit negation modeling, the influence of negation scope detection methods, and the efficiency of the framework in different domains. Finally, we carry out a case study in which we transfer the core methods of our thesis -- viz. domain complexity-based accuracy estimation, domain complexity-based model selection, and negation modeling -- to a gold standard that originates from a genre and domain hitherto not used in this thesis.
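Explicit negation modeling for word n-gram features can be illustrated with the common practice of tagging tokens that fall inside a negation scope. The sketch below is not the thesis's framework: the cue list, the `NOT_` prefix, and the rule that a scope runs until the next punctuation mark are assumptions made here, and contractions such as "don't" are not handled.

```python
import re

NEGATION_CUES = {"not", "no", "never", "cannot"}  # assumed cue list
SCOPE_END = {".", ",", ";", ":", "!", "?"}        # assumed scope delimiters


def mark_negation(text):
    """Prefix tokens inside a negation scope with NOT_.

    A scope opens at a negation cue and closes at the next punctuation
    mark in SCOPE_END (a simple heuristic, chosen for illustration).
    """
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    marked = []
    in_scope = False
    for tok in tokens:
        if tok in SCOPE_END:
            in_scope = False
            marked.append(tok)
        elif tok in NEGATION_CUES:
            in_scope = True
            marked.append(tok)
        else:
            marked.append("NOT_" + tok if in_scope else tok)
    return marked


tokens = mark_negation("This is not a good film, but I liked it.")
```

Word n-grams built over the marked tokens then treat `NOT_good` and `good` as distinct features, which is one way a classifier can be made aware of negation explicitly.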

    Robust compositional polarity classification
