
    A Winnow-Based Approach to Context-Sensitive Spelling Correction

    A large class of machine-learning problems in natural language require the characterization of linguistic context. Two characteristic properties of such problems are that their feature space is of very high dimensionality, and their target concepts refer to only a small subset of the features in the space. Under such conditions, multiplicative weight-update algorithms such as Winnow have been shown to have exceptionally good theoretical properties. We present an algorithm combining variants of Winnow and weighted-majority voting, and apply it to a problem in the aforementioned class: context-sensitive spelling correction. This is the task of fixing spelling errors that happen to result in valid words, such as substituting "to" for "too", "casual" for "causal", etc. We evaluate our algorithm, WinSpell, by comparing it against BaySpell, a statistics-based method representing the state of the art for this task. We find: (1) When run with a full (unpruned) set of features, WinSpell achieves accuracies significantly higher than BaySpell was able to achieve in either the pruned or unpruned condition; (2) When compared with other systems in the literature, WinSpell exhibits the highest performance; (3) The primary reason that WinSpell outperforms BaySpell is that WinSpell learns a better linear separator; (4) When run on a test set drawn from a different corpus than the training set was drawn from, WinSpell is better able than BaySpell to adapt, using a strategy we will present that combines supervised learning on the training set with unsupervised learning on the (noisy) test set. Comment: To appear in Machine Learning, Special Issue on Natural Language Learning, 1999. 25 page

    Adaptive Representations for Tracking Breaking News on Twitter

    Twitter is often the most up-to-date source for finding and tracking breaking news stories. Therefore, there is considerable interest in developing filters for tweet streams in order to track and summarize stories. This is a non-trivial text analytics task as tweets are short, and standard retrieval methods often fail as stories evolve over time. In this paper we examine the effectiveness of adaptive mechanisms for tracking and summarizing breaking news stories. We evaluate the effectiveness of these mechanisms on a number of recent news events for which manually curated timelines are available. Assessments based on ROUGE metrics indicate that adaptive approaches are best suited for tracking evolving stories on Twitter. Comment: 8 Pag

    Text Summarization Techniques: A Brief Survey

    In recent years, there has been an explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods. Comment: Some of references format have update

    Conceptualizing and Measuring Well-Being Using Statistical Semantics and Numerical Rating Scales

    How to define and measure individuals' well-being is important, as this has an impact on both research and society at large. This thesis concerns how to define and measure the self-reported well-being of individuals, which involves both theorizing as well as developing and applying empirical and statistical methods in order to gain a better understanding of well-being. The first paper critically reviews the literature on well-being. It identifies an individualistic bias in current approaches and accompanying measures related to well-being and happiness; for example, through an over-emphasis on the importance of self-centered aspects of well-being (e.g., the unprecedented focus on satisfaction with life) whilst disregarding the importance of harmony in life, interconnectedness and psychological balance in relation to well-being. It is also discussed how closed-ended well-being measures impose the researchers' values and limit the ability of respondents to express themselves in regard to their perceived well-being. The second paper addresses concerns regarding this individualistic bias by developing the harmony in life scale, which focuses on interconnectedness and psychological balance. In addition, an open-ended approach is developed in the paper, allowing individuals to freely describe their pursuit of well-being by means of open-ended responses analyzed using statistical semantics (including techniques from artificial intelligence such as natural language processing and machine learning). The results show that the harmony in life scale and the traditional satisfaction with life scale form a two-factor model of well-being, where the harmony in life scale explains more unique variance in measures of psychological well-being, stress, depression and anxiety, but not happiness. It is further demonstrated that participants describe their pursuit of harmony in life using words related to interconnectedness (including words such as: peace, balance, cooperation), whereas they describe their pursuit of satisfaction with life using words related to independence (including words such as: money, achievement, fulfillment). It is concluded that the harmony in life scale complements the satisfaction with life scale for a more comprehensive understanding of subjective well-being. The third paper focuses on developing and evaluating a method for measuring and describing psychological constructs using open-ended questions analyzed by means of statistical semantics rather than closed-ended numerical rating scales. This semantic measures approach is tested and compared with traditional rating scales in nine studies, including two different paradigms involving reports regarding objective stimuli (i.e., the evaluation of facial expressions) and reports regarding subjective states (i.e., the self-reporting of harmony in life, satisfaction with life, depression and worry). The results indicate that semantic measures encompass higher, or competitive, levels of reliability and validity compared to traditional numerical rating scales. In addition, semantic measures appear to be better suited for differentiating between psychological constructs, such as harmony in life versus satisfaction with life as well as depression versus worry. In this thesis, the findings from these three papers are elaborated and integrated into two independent perspectives. The first perspective focuses on the theoretical and empirical differences between harmony in life and satisfaction with life within a context of societal and national progress. It is concluded that harmony in life complements satisfaction with life. The second perspective focuses on the open-ended, statistical semantics approach. It is proposed that statistical semantics may beneficially be used more widely as a research tool within psychological research