18 research outputs found

    A Morphological Analyzer for Japanese Nouns, Verbs and Adjectives

    We present an open-source morphological analyzer for Japanese nouns, verbs and adjectives. The system builds upon the morphological analysis capabilities of MeCab to incorporate finer details of classification such as politeness, tense, mood and voice attributes. We implemented our analyzer as a finite-state transducer using the open-source finite-state compiler toolkit FOMA. The source code and tool are available at https://bitbucket.org/skylander/yc-nlplab/

    The Utility of Text: The Case of Amicus Briefs and the Supreme Court

    We explore the idea that authoring a piece of text is an act of maximizing one's expected utility. To make this idea concrete, we consider the societally important decisions of the Supreme Court of the United States. Extensive past work in quantitative political science provides a framework for empirically modeling the decisions of justices and how they relate to text. We incorporate into such a model texts authored by amici curiae ("friends of the court" separate from the litigants) who seek to weigh in on the decision, then explicitly model their goals in a random utility model. We demonstrate the benefits of this approach in improved vote prediction and the ability to perform counterfactual analysis.

    Learning Topics and Positions from Debatepedia

    We explore Debatepedia, a community-authored encyclopedia of sociopolitical debates, as evidence for inferring a low-dimensional, human-interpretable representation in the domain of issues and positions. We introduce a generative model positing latent topics and cross-cutting positions that gives special treatment to person mentions and opinion words. We evaluate the resulting representation’s usefulness in attaching opinionated documents to arguments and its consistency with human judgments about positions.

    Text as Strategic Choice

    People write every day (articles, blogs, emails) with a purpose and an audience in mind. Politicians adapt their speeches to convince their audience, news media slant stories for their market, and teenagers on social media seek social status among their peers through their posts. Hence, language is purposeful and strategic. In this thesis, we introduce a framework for text analysis that makes explicit the purposefulness of the author and develop methods that consider the interaction between the author, her text, and her audience’s responses. We frame the authoring process as a decision-theoretic problem: the observed text is the result of an author maximizing her utility. We will explore this perspective by developing a set of novel statistical models that characterize authors’ strategic behaviors through their utility functions. We apply our models to three particular domains (political campaigns, the scientific community, and the judiciary) and develop the necessary tools to evaluate our assumptions and hypotheses. In each of these domains, our models yield better response prediction accuracy and provide an interpretable means of investigating the underlying processes. Together, they exemplify our approach to text modeling and data exploration. Throughout this thesis, we will illustrate how our models can be used as tools for in-depth exploration of text data and hypothesis generation.

    A Utility Model of Authors in the Scientific Community

    Authoring a scientific paper is a complex process involving many decisions. We introduce a probabilistic model of some of the important aspects of that process: that authors have individual preferences, that writing a paper requires trading off among the preferences of authors as well as extrinsic rewards in the form of community response to their papers, and that preferences (of individuals and the community) and tradeoffs vary over time. Variants of our model lead to improved predictive accuracy of citations given texts and texts given authors. Further, our model’s posterior suggests an interesting relationship between seniority and author choices.

    A Probabilistic Model for Canonicalizing Named Entity Mentions

    We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, first-order dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.

    Discovering Factions in the Computational Linguistics Community

    We present a joint probabilistic model of who cites whom in computational linguistics, and also of the words they use to do the citing. The model reveals latent factions, or groups of individuals whom we expect to collaborate more closely within their faction, cite within the faction using language distinct from citation outside the faction, and be largely understandable through the language used when cited from without. We conduct an exploratory data analysis on the ACL Anthology. We extend the model to reveal changes in some authors’ faction memberships over time.