1,338 research outputs found

    HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web

    When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, or online music play lists. Understanding the factors that drive the production of these trails can be useful for, e.g., improving underlying network structures, predicting user clicks or enhancing recommendations. In this work, we present a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our approach utilizes Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to leverage the sensitivity of Bayes factors on the prior for comparing hypotheses with each other. For eliciting Dirichlet priors from hypotheses, we present an adaptation of the so-called (trial) roulette method. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains including website navigation, business reviews and online music played. Our work expands the repertoire of methods available for studying human trails on the Web. Comment: Published in the proceedings of WWW'1
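    The core comparison described in this abstract can be illustrated with a minimal sketch: a first-order Markov chain whose rows carry Dirichlet priors, with hypotheses ranked by marginal likelihood (evidence). This is not the authors' implementation. The function names, the toy trails, and in particular `prior_from_hypothesis` are assumptions made for illustration; the simple scaling used there is a stand-in and is not the (trial) roulette adaptation, which the abstract does not specify.

```python
import math

def count_transitions(trails, n_states):
    """Count first-order transitions s -> t over all trails."""
    counts = [[0] * n_states for _ in range(n_states)]
    for trail in trails:
        for s, t in zip(trail, trail[1:]):
            counts[s][t] += 1
    return counts

def prior_from_hypothesis(belief, k):
    """Hypothetical elicitation: uniform pseudo-count 1 plus k times the belief.
    A stand-in for illustration only, NOT the trial roulette method."""
    return [[1.0 + k * b for b in row] for row in belief]

def log_evidence(counts, alpha):
    """Log marginal likelihood of the counts under row-wise Dirichlet priors
    (standard Dirichlet-categorical closed form, summed over rows)."""
    total = 0.0
    for n_row, a_row in zip(counts, alpha):
        total += math.lgamma(sum(a_row)) - math.lgamma(sum(a_row) + sum(n_row))
        total += sum(math.lgamma(a + n) - math.lgamma(a)
                     for a, n in zip(a_row, n_row))
    return total

# Toy trails over three states that mostly cycle 0 -> 1 -> 2 -> 0 (made-up data).
trails = [[0, 1, 2, 0, 1, 2, 0], [1, 2, 0, 1, 2]]
counts = count_transitions(trails, 3)

# Two hypothetical hypotheses expressed as row-stochastic belief matrices.
cyclic = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
uniform = [[1 / 3] * 3 for _ in range(3)]

k = 5  # assumed strength of belief in each hypothesis
log_bf = (log_evidence(counts, prior_from_hypothesis(cyclic, k))
          - log_evidence(counts, prior_from_hypothesis(uniform, k)))
print("log Bayes factor (cyclic vs. uniform):", log_bf)
```

    A positive log Bayes factor favours the first hypothesis. Both priors here use the same total pseudo-count per row, so the comparison reflects how well each belief matches the observed transitions rather than differences in prior strength.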

    An automated Chinese text processing system (ACCESS): user-friendly interface and feature enhancement.

    Suen Tow Sunny. Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. Includes bibliographical references (leaves 65-67).
    Contents:
    Introduction
    Chapter 1. ACCESS with an Extendible User-friendly X/Chinese Interface
      1.1. System requirement
        1.1.1. User interface issue
        1.1.2. Development issue
      1.2. Development decision
        1.2.1. X window system
        1.2.2. X/Chinese toolkit
        1.2.3. C language
        1.2.4. Source code control system
      1.3. System architecture
      1.4. User interface
      1.5. Sample screen
      1.6. System extension
      1.7. System portability
    Chapter 2. Study on Algorithms for Automatically Correcting Characters in Chinese Cangjie-typed Text
      2.1. Chinese character input
        2.1.1. Chinese keyboards
        2.1.2. Keyboard redefinition scheme
      2.2. Cangjie input method
      2.3. Review on existing techniques for automatically correcting words in English text
        2.3.1. Nonword error detection
        2.3.2. Isolated-word error correction
          2.3.2.1. Spelling error patterns
          2.3.2.2. Correction techniques
        2.3.3. Context-dependent word correction research
          2.3.3.1. Natural language processing approach
          2.3.3.2. Statistical language model
      2.4. Research on error rates and patterns in Cangjie input method
      2.5. Similarities and differences between Chinese and English typed text
        2.5.1. Similarities
        2.5.2. Differences
      2.6. Proposed algorithm for automatic Chinese text correction
        2.6.1. Sentence level
        2.6.2. Part-of-speech level
        2.6.3. Character level
    Conclusion
    Appendix A Cangjie Radix Table
    Appendix B Sample Text
      Article 1
      Article 2
      Article 3
      Article 4
    Appendix C Error Statistics
    References

    Learning narrative structure from annotated folktales

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student submitted PDF version of thesis. Includes bibliographical references (p. 97-100). Narrative structure is a ubiquitous and intriguing phenomenon. By virtue of structure we recognize the presence of Villainy or Revenge in a story, even if that word is not actually present in the text. Narrative structure is an anvil for forging new artificial intelligence and machine learning techniques, and is a window into abstraction and conceptual learning as well as into culture and its influence on cognition. I advance our understanding of narrative structure by describing Analogical Story Merging (ASM), a new machine learning algorithm that can extract culturally-relevant plot patterns from sets of folktales. I demonstrate that ASM can learn a substantive portion of Vladimir Propp's influential theory of the structure of folktale plots. The challenge was to take descriptions at one semantic level, namely, an event timeline as described in folktales, and abstract to the next higher level: structures such as Villainy, Struggle-Victory, and Reward. ASM is based on Bayesian Model Merging, a technique for learning regular grammars. I demonstrate that, despite ASM's large search space, a carefully-tuned prior allows the algorithm to converge, and furthermore it reproduces Propp's categories with a chance-adjusted Rand index of 0.511 to 0.714. Three important categories are identified with F-measures above 0.8. The data are 15 Russian folktales, comprising 18,862 words, a subset of Propp's original tales. This subset was annotated for 18 aspects of meaning by 12 annotators using the Story Workbench, a general text-annotation tool I developed for this work. Each aspect was doubly-annotated and adjudicated at inter-annotator F-measures that cluster around 0.7 to 0.8. It is the largest, most deeply-annotated narrative corpus assembled to date. The work has significance far beyond folktales. First, it points the way toward important applications in many domains, including information retrieval, persuasion and negotiation, natural language understanding and generation, and computational creativity. Second, abstraction from natural language semantics is a skill that underlies many cognitive tasks, and so this work provides insight into those processes. Finally, the work opens the door to a computational understanding of cultural influences on cognition and understanding cultural differences as captured in stories. by Mark Alan Finlayson. Ph.D

    Using Raster Sketches for Digital Image Retrieval

    This research addresses the problem of content-based image retrieval using queries on image-object shape, completely in the raster domain. It focuses on the particularities of image databases encountered in typical topographic applications and presents the development of an environment for visual information management that enables such queries. The query consists of a user-provided raster sketch of the shape of an imaged object. The objective of the search is to retrieve images that contain an object sufficiently similar to the one specified in the query. The new contribution of this work combines the design of a comprehensive on-line query access strategy for digital image databases, through the development of a feature library, image library and metadata library, with the necessary matching tools. The matching algorithm is inspired by least-squares matching (LSM), and represents an extension of LSM to function with a variety of raster representations. The image retrieval strategy makes use of a hierarchical organization of linked feature (image-object) shapes within the feature library. The query results are ranked according to statistical scores, and the user can subsequently narrow or broaden his/her search according to the previously obtained results and the purpose of the search.

    Research in the Language, Information and Computation Laboratory of the University of Pennsylvania

    This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as Combinatorial Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it’s easier than ever to do so: this document is accessible on the “information superhighway”. Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html In addition, you can find many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors’ abstracts in the web version of this report. The abstracts describe the researchers’ many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn.

    Natural language processing meets business: algorithms for mining meaning from corporate texts
