
    Chinese Subjective Sentence Extraction Based on Dictionary and Combination Classifiers

    Abstract: To extract Chinese subjective sentences, this paper proposes a new dictionary-based extraction method and a novel classifier combination strategy. For the first method, we use the training data to score the subjective dictionary, which is composed of indicative verbs, indicative adverbs, sentiment words, interjections and punctuation. We then use the dictionary to score the test data and filter the sentences by setting a reasonable threshold. The new classifier combination strategy is based on the maximum error correction capability. To enhance accuracy, the method improves on traditional single error correction and achieves dual error correction in both the positive and negative classes. Experimental results show that the two methods are effective, and the final results show that combining them achieves satisfactory subjective sentence extraction performance.
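    The dictionary-based step described in this abstract can be sketched roughly as follows. The abstract does not give the scoring formula, so the weighting used here (the share of training sentences containing a term that are labelled subjective) is an assumption, as are the function names, the pre-tokenized input format, and the toy data.

```python
from collections import Counter

def score_dictionary(train, dictionary):
    """Weight each dictionary term from labelled training data.

    train: list of (tokens, is_subjective) pairs, already word-segmented.
    Assumed weighting (not from the paper): fraction of the sentences
    containing the term that are labelled subjective.
    """
    subjective_hits = Counter()
    total_hits = Counter()
    for tokens, is_subjective in train:
        for term in set(tokens) & dictionary:
            total_hits[term] += 1
            if is_subjective:
                subjective_hits[term] += 1
    return {t: subjective_hits[t] / total_hits[t] for t in total_hits}

def sentence_score(tokens, weights):
    """Sum the weights of the dictionary terms found in a sentence."""
    return sum(weights.get(tok, 0.0) for tok in tokens)

def extract_subjective(sentences, weights, threshold):
    """Keep the sentences whose score reaches the chosen threshold."""
    return [s for s in sentences if sentence_score(s, weights) >= threshold]

# Toy, hypothetical data: tokens stand in for segmented Chinese words.
train = [
    (["我", "觉得", "好"], True),
    (["价格", "是", "十", "元"], False),
    (["觉得", "！"], True),
    (["好", "像"], False),
]
dictionary = {"觉得", "好", "！"}
weights = score_dictionary(train, dictionary)
kept = extract_subjective([["觉得", "好"], ["价格"]], weights, threshold=1.0)
```

    In practice the threshold would be tuned on held-out data; the value above is only illustrative.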

    Basic tasks of sentiment analysis

    Subjectivity detection is the task of distinguishing objective sentences from subjective ones. Objective sentences are those that do not exhibit any sentiment, so a sentiment analysis engine should find and separate them out before further analysis such as polarity detection. In subjective sentences, opinions can often be expressed on one or multiple topics. Aspect extraction is a subtask of sentiment analysis that consists of identifying opinion targets in opinionated text, i.e., detecting the specific aspects of a product or service that the opinion holder is either praising or complaining about.

    Harvesting and summarizing user-generated content for advanced speech-based human-computer interaction

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 155-164). There have been many assistant applications on mobile devices that help people obtain rich Web content such as user-generated data (e.g., reviews, posts, blogs, and tweets). However, online communities and social networks are expanding rapidly, and it is impossible for people to browse and digest all the information via a simple search interface. To help users obtain information more efficiently, both the interface for data access and the information representation need to be improved. An intuitive and personalized interface, such as a dialogue system, could be an ideal assistant, which engages a user in a continuous dialogue to garner the user's interest and capture the user's intent, and assists the user via speech-navigated interactions. In addition, there is a great need for a type of application that can harvest data from the Web, summarize the information in a concise manner, and present it in an aggregated yet natural way such as direct human dialogue. This thesis, therefore, aims to conduct research on a universal framework for developing speech-based interfaces that can aggregate user-generated Web content and present the summarized information via speech-based human-computer interaction. To accomplish this goal, several challenges must be met. Firstly, how can users' intentions be interpreted correctly from their spoken input? Secondly, how can the semantics and sentiment of user-generated data be interpreted and aggregated into structured yet concise summaries? Lastly, how can a dialogue modeling mechanism be developed to handle discourse and present the highlighted information via natural language? This thesis explores plausible approaches to tackle these challenges.
    We will explore a lexicon modeling approach for semantic tagging to improve spoken language understanding and query interpretation. We will investigate a parse-and-paraphrase paradigm and a sentiment scoring mechanism for information extraction from unstructured user-generated data. We will also explore sentiment-involved dialogue modeling and corpus-based language generation approaches for dialogue and discourse. Multilingual prototype systems in multiple domains have been implemented for demonstration. By Jingjing Liu. Ph.D.

    Automatic Population of Structured Reports from Narrative Pathology Reports

    Structured pathology reports have a number of advantages: they help ensure the accuracy and completeness of pathology reporting, and they make it easier for referring doctors to glean pertinent information. The goal of this thesis is to extract pertinent information from free-text pathology reports, automatically populate structured reports for cancer diseases, and identify the commonalities and differences in processing principles needed to obtain maximum accuracy. Three pathology corpora were annotated with entities and relationships between the entities in this study, namely the melanoma corpus, the colorectal cancer corpus and the lymphoma corpus. A supervised machine-learning-based approach, utilising conditional random fields learners, was developed to recognise medical entities from the corpora. By feature engineering, the best feature configurations were attained, which boosted the F-scores significantly from 4.2% to 6.8% on the training sets. Without proper negation and uncertainty detection, the quality of the structured reports will be diminished. The negation and uncertainty detection modules were built to handle this problem. The modules obtained overall F-scores ranging from 76.6% to 91.0% on the test sets. A relation extraction system was presented to extract four relations from the lymphoma corpus. The system achieved very good performance on the training set, with a 100% F-score obtained by the rule-based module and a 97.2% F-score attained by the support vector machines classifier. Rule-based approaches were used to generate the structured outputs and populate them into predefined templates. The rule-based system attained F-scores of over 97% on the training sets. A pipeline system was implemented as an assembly of all the components described above. It achieved promising results in the end-to-end evaluations, with F-scores of 86.5%, 84.2% and 78.9% on the melanoma, colorectal cancer and lymphoma test sets respectively.

    On the integration of conceptual hierarchies with deep learning for explainable open-domain question answering

    Question Answering, with its potential to make human-computer interactions more intuitive, has had a revival in recent years with the influx of deep learning methods into natural language processing and the simultaneous adoption of personal assistants such as Siri, Google Now, and Alexa. Unfortunately, Question Classification, an essential element of question answering that classifies questions based on the class of the expected answer, has been overlooked. Although the task of question classification was explicitly developed for use in question answering systems, the more advanced form of the task, which classifies questions into between fifty and a hundred question classes, has developed into an independent task with no application in question answering. The work presented in this thesis bridges this gap by making use of fine-grained question classification for answer selection, arguably the most challenging subtask of question answering and hence the de facto standard measure of performance on question answering. The use of question classification in a downstream task required significant improvement to question classification, which was achieved in this work by integrating linguistic information and deep learning through what we call Types, a novel method of representing Concepts. Our work on a purely rule-based system for fine-grained Question Classification using Types achieved an accuracy of 97.2%, close to a 6 point improvement over the previous state of the art, and has remained state of the art in question classification for over two years. The integration of these question classes and a deep learning model for Answer Selection resulted in MRR and MAP scores which outperform the current state of the art by between 3 and 5 points on both versions of a standard test set.
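    For readers unfamiliar with the two answer-selection metrics named in this abstract, the sketch below shows how MRR (mean reciprocal rank) and MAP (mean average precision) are commonly computed from ranked candidate answers. It illustrates the standard definitions only, not the thesis's evaluation code; the function names and the toy relevance labels are assumptions.

```python
def reciprocal_rank(labels):
    """1 / rank of the first correct answer, or 0.0 if none is correct.

    labels: relevance flags (1 = correct, 0 = incorrect) for one question,
    ordered from the top-ranked candidate answer downwards.
    """
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0

def average_precision(labels):
    """Mean of precision@k taken at each rank k holding a correct answer."""
    hits = 0
    precisions = []
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_over_questions(metric, ranked_lists):
    """Average a per-question metric over all questions."""
    return sum(metric(labels) for labels in ranked_lists) / len(ranked_lists)

# Toy, hypothetical rankings for two questions.
rankings = [[0, 1, 0], [1, 0, 1]]
mrr = mean_over_questions(reciprocal_rank, rankings)
map_score = mean_over_questions(average_precision, rankings)
```

    Here the first question contributes a reciprocal rank of 1/2 and the second a reciprocal rank of 1, so the MRR is 0.75; evaluation scripts differ in how they treat questions with no correct candidate, which this sketch scores as zero.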