    Reducing Over-generation Errors for Automatic Keyphrase Extraction using Integer Linear Programming

    Keywords at Work: Investigating Keyword Extraction in Social Media Applications
    This dissertation examines a long-standing problem in Natural Language Processing (NLP), keyword extraction, from a new angle. We investigate how keyword extraction can be formulated on social media data, such as emails, product reviews, student discussions, and student statements of purpose. We design novel graph-based features for supervised and unsupervised keyword extraction from emails, and successfully apply the resulting system to uncover patterns in a new dataset of student statements of purpose. Furthermore, we apply the system, with new features, to the problem of usage expression extraction from product reviews, where we obtain interesting insights. Applied to student discussions, the system uncovers new and exciting patterns. While each of these problems is conceptually distinct, they share two key elements: keywords and social data. Social data can be messy, hard to interpret, and not easily amenable to existing NLP resources. We show that our system is robust enough in the face of such challenges to discover useful and important patterns. We also show that the problem definition of keyword extraction itself can be expanded to accommodate new and challenging research questions and datasets.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145929/1/lahiri_1.pd
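    The abstract does not specify which graph-based features the dissertation uses. As a minimal sketch of one common unsupervised graph feature, the code below builds a word co-occurrence graph and ranks words by weighted degree. The stopword list, the window size, and the names `tokenize`, `cooccurrence_graph`, and `top_keywords` are hypothetical illustrations, not the author's system.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "from", "is", "are", "we", "our", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cooccurrence_graph(tokens: list[str], window: int = 2) -> dict[str, dict[str, int]]:
    """Undirected weighted graph: two distinct words are linked when they appear
    together inside a sliding window of `window` tokens (after stopword removal);
    the edge weight counts how often that happens."""
    graph: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            v = tokens[j]
            if v != w:
                graph[w][v] += 1
                graph[v][w] += 1
    return graph


def top_keywords(text: str, k: int = 3, window: int = 2) -> list[str]:
    """Rank words by weighted degree (sum of incident edge weights)."""
    graph = cooccurrence_graph(tokenize(text), window)
    scores = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]


if __name__ == "__main__":
    text = "Keyword extraction from emails and keyword extraction from reviews"
    print(top_keywords(text, k=2))
```

    Weighted degree is only one option; centrality scores such as PageRank over the same graph are a common alternative, and supervised systems typically feed several such scores into a classifier.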

    Opinion Expression Mining by Exploiting Keyphrase Extraction

    Sentence Compressor
    Nowadays, the internet has become the main source of information. Most people rely on the internet to find information for research and assignments, searching for articles, journals, or web pages related to their task. To choose the right materials, they have to go through every article, journal, and web page to find the important points, which is very time-consuming when the articles are long. This information explosion has led to a constant state of information overload. As a solution, a desktop application named Sentence Compressor was developed to compress long articles. This project aims to develop a desktop application that shortens long sentences without changing their original meaning. Integer Linear Programming (ILP) techniques are used to solve the sentence compression problem, and Bilingual Evaluation Understudy (BLEU) is used to measure the quality of the output. Five articles were randomly selected for the experiment, and the BLEU scores of the articles compressed by Sentence Compressor were compared with those of articles compressed by humans. A system performance evaluation was also conducted to measure the usefulness of the application: more than 65% of the respondents agreed that Sentence Compressor is useful for information searching.
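    The abstract does not state which BLEU variant was used. A minimal sketch of the standard sentence-level definition follows: clipped n-gram precisions up to 4-grams, combined by a geometric mean and scaled by a brevity penalty, against a single reference and without smoothing. The function names are hypothetical; established toolkits such as sacreBLEU should be preferred for reported scores.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: list[str], reference: list[str], max_n: int = 4) -> float:
    """Sentence-level BLEU against one reference, uniform weights, no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clipping: each candidate n-gram is credited at most as often as it
        # occurs in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precisions.append(math.log(matches / total))
    c, r = len(candidate), len(reference)
    # Brevity penalty: punish compressions that are shorter than the reference.
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    human = "the cat sat on the mat".split()
    system = "the cat sat on the".split()
    print(round(bleu(system, human), 4))
```

    Here every n-gram of the system output also occurs in the human compression, so the score is determined entirely by the brevity penalty, exp(1 - 6/5).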

    Automatic Keyphrase Extraction: A Survey of the State of the Art

    A tree based keyphrase extraction technique for academic literature
    Automatic keyphrase extraction techniques aim to extract high-quality keyphrases that summarize a document at a higher level. Among existing techniques, some are domain-specific and require application domain knowledge, some are based on higher-order statistical methods and are computationally expensive, and some require large amounts of training data, which are rarely available for many applications. To overcome these issues, this thesis proposes a new unsupervised automatic keyphrase extraction technique, TeKET (Tree-based Keyphrase Extraction Technique), which is domain-independent, employs limited statistical knowledge, and requires no training data. The proposed technique also introduces a new variant of the binary tree, called the KeyPhrase Extraction (KePhEx) tree, to extract final keyphrases from candidate keyphrases. Depending on the candidate keyphrases, the KePhEx tree structure is expanded, shrunk, or maintained. In addition, a measure called the Cohesiveness Index (CI) is derived, which denotes the degree of cohesiveness of a given node with respect to the root. CI is used to extract final keyphrases from a resultant tree in a flexible manner and, alongside Term Frequency, to rank keyphrases. The effectiveness of the proposed technique is evaluated experimentally on the SemEval-2010 benchmark corpus, which contains 244 training and test articles in total, and compared with representative unsupervised techniques, both statistical (such as Term Frequency-Inverse Document Frequency and YAKE) and graph-based (PositionRank, CollabRank (SingleRank), TopicRank, and MultipartiteRank). Precision, recall, and F1 score are used as evaluation metrics. The results demonstrate the improved performance of the proposed technique over similar techniques in terms of precision, recall, and F1 score.
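    The evaluation above reports precision, recall, and F1. As a minimal sketch of how these are usually computed for keyphrases, assuming exact matching after lowercasing and whitespace normalization, consider the following. Benchmark scorers may additionally stem phrases and cut the ranked list at a top-k threshold; this sketch does neither, and the names `normalize` and `prf1` are hypothetical.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse internal whitespace so trivial variants match."""
    return " ".join(phrase.lower().split())


def prf1(predicted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two keyphrase lists."""
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    correct = len(pred & ref)  # keyphrases found in both sets
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    predicted = ["keyphrase extraction", "binary tree", "Term Frequency"]
    gold = ["keyphrase extraction", "term frequency", "semeval", "cohesiveness index"]
    print(prf1(predicted, gold))
```

    With two of three predictions correct and two of four gold phrases recovered, precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7.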

    Automatic Summarization for Student Reflective Responses
    Educational research has demonstrated that asking students to respond to reflection prompts can improve both teaching and learning. However, summarizing student responses to these prompts is an onerous task for humans and poses challenges for existing summarization methods. From the input perspective, there are three challenges. First, there is a lexical variety problem, because different students tend to use different expressions. Second, there is a length variety problem: student inputs range from single words to multiple sentences. Third, there is a redundancy issue, since some content in student responses is not useful. From the output perspective, there are two additional challenges. First, the human summaries consist of a list of important phrases rather than sentences. Second, from an instructor's perspective, the number of students who have a particular problem or are interested in a particular topic is valuable information. The goal of this research is to enhance student response summarization at multiple levels of granularity. At the sentence level, we propose a novel summarization algorithm that extends the traditional ILP-based framework with a low-rank matrix approximation to address the challenge of lexical variety. At the phrase level, we propose a phrase summarization framework that combines phrase extraction, phrase clustering, and phrase ranking; experimental results show its effectiveness on multiple student response data sets. Also at the phrase level, we propose a quantitative phrase summarization algorithm to estimate the number of students who semantically mention the phrases in a summary. We first introduce a new phrase-based highlighting scheme for automatic summarization, which highlights the phrases in the human summaries as well as the corresponding semantically equivalent phrases in student responses.
    Enabled by the highlighting scheme, we improve the previous phrase-based summarization framework by developing supervised candidate phrase extraction, learning to estimate phrase similarities, and experimenting with different clustering algorithms to group phrases into clusters. Experimental results show that our proposed methods not only yield better summarization performance, as evaluated by ROUGE, but also produce summaries that capture pressing student needs.
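    The abstract reports ROUGE without naming a variant. As a minimal sketch, assuming ROUGE-1 with whitespace tokenization and a single reference, the unigram-overlap scores can be computed as shown. The official ROUGE toolkit also offers stemming and stopword options that this sketch omits, and the function name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict[str, float]:
    """ROUGE-1 precision, recall, and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared unigram.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    system_phrase = "the lecture was too fast"
    human_phrase = "the lecture pace was too fast for me"
    print(rouge_1(system_phrase, human_phrase))
```

    Recall-oriented scoring suits this setting because human summaries are short phrase lists: a system phrase earns credit for every reference word it recovers.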

    Automatic Generation of Image Descriptions Based on Keyphrase Estimation Using Large-Scale Images with Descriptions
    Degree type: Doctorate by coursework, University of Tokyo

    Recent Advances in Social Data and Artificial Intelligence 2019
    The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts of, recent advances in social data and artificial intelligence, and potentially their links to cyberspace.