436 research outputs found

    A framework for the Comparative analysis of text summarization techniques

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. With the boom of information technology and the IoT (Internet of Things), the volume of data is increasing at an alarming rate. If channeled in the right direction, this data can yield meaningful information. However, the data is not always numerical: in many problems it is entirely textual, and meaning has to be derived from it. Going through these texts manually could take hours or even days to obtain concise and meaningful information. This creates the need for an automatic summarizer that eases manual intervention and reduces time and cost while retaining the key information held by the texts. In recent years, new methods and approaches have been developed to do so. These approaches are implemented in many domains: for example, search engines provide snippets as document previews, while news websites produce shortened descriptions of news subjects, usually as headlines, to make browsing easier. Broadly speaking, there are two main approaches to text summarization: extractive and abstractive. Extractive summarization selects important sections of the original text to form a condensed version of it. Abstractive summarization interprets and examines the text as a whole and, after discerning its meaning, has the model itself generate sentences that describe the important points concisely.
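
    The extractive approach described above can be illustrated with a minimal sketch. It assumes a simple word-frequency scoring scheme, which is one common baseline and not necessarily a technique evaluated in the dissertation. The function name `extractive_summary` and its parameters are hypothetical.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences of `text`, in original order.

    Assumption: a sentence's importance is approximated by the average
    document-wide frequency of its words. This is an illustrative baseline,
    not the method of any particular paper.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word frequencies over the whole document (lowercased, letters only).
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        if not tokens:
            return 0.0
        return sum(freq[t] for t in tokens) / len(tokens)

    # Rank sentence indices by score, keep the top ones, restore text order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    doc = "Cats chase mice. Cats sleep. Dogs bark loudly at night."
    print(extractive_summary(doc, num_sentences=2))
```

    Every sentence in the output is copied verbatim from the input, which is the defining property of extractive summarization. An abstractive system would instead generate new sentences.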

    Using Multi-Label Multi-Class Support Vector Machines with Semantic and Lexical Features for Aspect Category Detection

    In contrast to aspects, aspect categories are often coarser and do not always appear as terms in sentences. The primary aim of this study is to investigate the efficacy of lexicon, linguistic, semantic, and vector-based features for aspect category detection (ACD). Vector-based features capture semantic and sentiment information. The study also examines whether vector-based features are superior to text-based features. In the experiments, vector-based features outperformed text-based features. Deep learning methods were used to generate the vector-based features from word-level representations. The proposed method was evaluated using precision, recall, and F1 scores. Compared with state-of-the-art ABSA techniques, the proposed approach achieved superior results.

    Resolving pronominal anaphora using commonsense knowledge

    Coreference resolution is the task of resolving all expressions in a text that refer to the same entity. Such expressions are often used in writing and speech as shortcuts to avoid repetition. The most frequent form of coreference is the anaphor. Resolving anaphora requires not only grammatical and syntactic strategies but also semantic approaches. This dissertation presents a framework for automatically resolving pronominal anaphora by integrating recent findings from the field of linguistics with new semantic features. Commonsense knowledge is the routine knowledge people have of the everyday world. Because such knowledge is widely shared, it is frequently omitted from social communications such as texts. Without this knowledge, computers have difficulty making sense of textual information. In this dissertation, a new set of computational and linguistic features is used in a supervised learning approach to resolve pronominal anaphora in documents. Commonsense knowledge sources such as ConceptNet and WordNet are used, and similarity measures are extracted to uncover the elaborative information embedded in words that can help in anaphora resolution. The system is tested on 350 Wall Street Journal articles from the BBN corpus. Compared with other available systems such as BART (Versley et al. 2008) and Charniak and Elsner (2009), our system performed better and also resolved a much wider range of anaphora. We achieved a 92% F-measure on the BBN corpus and an average 85% F-measure on other genres of documents, such as children's stories and short stories selected from the web.

    A Domain Oriented LDA Model for Mining Product Defects from Online Customer Reviews

    Online reviews provide important demand-side knowledge for product manufacturers to improve product quality. However, discovering and quantifying potential product defects from large amounts of online reviews is a nontrivial task. In this paper, we propose a Latent Product Defect Mining model that identifies critical product defects. We define domain-oriented key attributes, such as components and keywords used to describe a defect, and build a novel LDA model to identify and acquire integral information about product defects. We conduct comprehensive quantitative and qualitative evaluations to ensure the quality of the discovered information. Experimental results show that the proposed model outperforms the standard LDA model and finds more valuable information. Our research contributes to the extant product quality analytics literature and has significant managerial implications for researchers, policy makers, customers, and practitioners.