18 research outputs found

A Morphological Analyzer for Japanese Nouns, Verbs and Adjectives

Author: Sim Yanchuan
Publication date: 15/12/2014

We present an open source morphological analyzer for Japanese nouns, verbs and adjectives. The system builds upon the morphological analysis capabilities of MeCab to incorporate finer details of classification such as politeness, tense, mood and voice attributes. We implemented our analyzer in the form of a finite state transducer using the open source finite state compiler FOMA toolkit. The source code and tool are available at https://bitbucket.org/skylander/yc-nlplab/

arXiv.org e-Print Archive

CiteSeerX

The Utility of Text: The Case of Amicus Briefs and the Supreme Court

Author: Routledge Bryan
Sim Yanchuan
Smith Noah A.
Publication date: 25/11/2014

We explore the idea that authoring a piece of text is an act of maximizing one's expected utility. To make this idea concrete, we consider the societally important decisions of the Supreme Court of the United States. Extensive past work in quantitative political science provides a framework for empirically modeling the decisions of justices and how they relate to text. We incorporate into such a model texts authored by amici curiae ("friends of the court" separate from the litigants) who seek to weigh in on the decision, then explicitly model their goals in a random utility model. We demonstrate the benefits of this approach in improved vote prediction and the ability to perform counterfactual analysis. Comment: Working draft

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Learning Topics and Positions from Debatepedia

Author: GOTTOPATI Swapna
JIANG Jing
QIU Minghui
SIM Yanchuan
SMITH Noah
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/10/2013

We explore Debatepedia, a community-authored encyclopedia of sociopolitical debates, as evidence for inferring a low-dimensional, human-interpretable representation in the domain of issues and positions. We introduce a generative model positing latent topics and cross-cutting positions that gives special treatment to person mentions and opinion words. We evaluate the resulting representation’s usefulness in attaching opinionated documents to arguments and its consistency with human judgments about positions.

CiteSeerX

Institutional Knowledge at Singapore Management University

Text as Strategic Choice

Author: Yanchuan Sim (5361935)
Publication date: 02/12/2022

People write every day (articles, blogs, emails) with a purpose and an audience in mind. Politicians adapt their speeches to convince their audience, news media slant stories for their market, and teenagers on social media seek social status among their peers through their posts. Hence, language is purposeful and strategic. In this thesis, we introduce a framework for text analysis that makes explicit the purposefulness of the author and develop methods that consider the interaction between the author, her text, and her audience’s responses. We frame the authoring process as a decision-theoretic problem: the observed text is the result of an author maximizing her utility. We will explore this perspective by developing a set of novel statistical models that characterize authors’ strategic behaviors through their utility functions. We consider three particular domains (political campaigns, the scientific community, and the judiciary) using our models and develop the necessary tools to evaluate our assumptions and hypotheses. In each of these domains, our models yield better response prediction accuracy and provide an interpretable means of investigating the underlying processes. Together, they exemplify our approach to text modeling and data exploration. Throughout this thesis, we will illustrate how our models can be used as tools for in-depth exploration of text data and hypothesis generation.

FigShare

A Utility Model of Authors in the Scientific Community

Author: Bryan R. Routledge
Noah A. Smith
Yanchuan Sim
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2015

Authoring a scientific paper is a complex process involving many decisions. We introduce a probabilistic model of some of the important aspects of that process: that authors have individual preferences, that writing a paper requires trading off among the preferences of authors as well as extrinsic rewards in the form of community response to their papers, that preferences (of individuals and the community) and tradeoffs vary over time. Variants of our model lead to improved predictive accuracy of citations given texts and texts given authors. Further, our model’s posterior suggests an interesting relationship between seniority and author choices.

CiteSeerX

Crossref

Modeling User Arguments, Interactions and Attributes for Stance Prediction in Online Debate Forums

Author: Jing JIANG
QIU Minghui
SIM Yanchuan
SMITH Noah A.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 02/05/2015

Crossref

Institutional Knowledge at Singapore Management University

A Probabilistic Model for Canonicalizing Named Entity Mentions

Author: Dani Yogatama (5361938)
Noah A. Smith (663492)
Yanchuan Sim (5361935)
Publication date: 29/06/2018

We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, first-order dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.

FigShare

Discovering Factions in the Computational Linguistics Community

Author: David A. Smith (2577271)
Noah A. Smith (663492)
Yanchuan Sim (5361935)
Publication date: 29/06/2018

We present a joint probabilistic model of who cites whom in computational linguistics, and also of the words they use to do the citing. The model reveals latent factions, or groups of individuals whom we expect to collaborate more closely within their faction, cite within the faction using language distinct from citation outside the faction, and be largely understandable through the language used when cited from without. We conduct an exploratory data analysis on the ACL Anthology. We extend the model to reveal changes in some authors’ faction memberships over time.

FigShare