Talk it up! — Integrating and prioritizing conversational data in documentation

Authors: Fox, Barbara; Stenzel, Kristine; Williams, Nicholas
Publication date: 01/06/2016

Syllabus for workshop, CoLang 2016. This course will introduce participants to some of the basic methodological and theoretical issues related to recording and analyzing everyday conversations. We will discuss specific contributions of naturalistic interactions to understanding aspects of linguistic structure, social interaction, and culture and explore how interactional data can be better integrated into language documentation projects. 2015 NSF/BCS 1500841: CoLang 2016: Institute on Collaborative Language Research – ALASKA Alaska Native Language Cente

ScholarWorks@UA

Recent Developments in Document Clustering

Authors: Andrews, Nicholas O.; Fox, Edward A.
Publication date: 01/10/2007

This report aims to give a brief overview of the current state of document clustering research and present recent developments in a well-organized manner. Clustering algorithms are considered with two hypothetical scenarios in mind: online query clustering with tight efficiency constraints, and offline clustering with an emphasis on accuracy. A comparative analysis of the algorithms is performed along with a table summarizing important properties, and open problems as well as directions for future research are discussed.

Computer Science Technical Reports @Virginia Tech

Clustering for Data Reduction: A Divide and Conquer Approach

Authors: Andrews, Nicholas O.; Fox, Edward A.
Publication date: 01/01/2007

We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where we perform an order of magnitude faster than simply looking for prototypes on the entire dataset. Furthermore, our "divide and conquer" approach actually performs more accurately on datasets which are well bisected, as the greedy decisions of affinity propagation are confined to classes of already similar items.

Computer Science Technical Reports @Virginia Tech
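
The divide-then-select idea in the abstract above can be illustrated with a rough sketch. This is not the authors' implementation, and it makes several simplifying assumptions: the split step is plain bisecting 2-means with no spectral cuts and no balancing, and a per-cluster medoid stands in for affinity propagation as the prototype selector. All function names and the `max_size` parameter are hypothetical, and the toy 2-D points stand in for document vectors.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def two_means(points, rng, iters=20):
    """Split points in two with plain 2-means (Lloyd's iterations)."""
    c0, c1 = rng.sample(points, 2)
    left, right = [], []
    for _ in range(iters):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:
            break
        c0, c1 = mean(left), mean(right)
    if not left or not right:
        # Degenerate split (e.g. duplicate points): fall back to halving.
        mid = len(points) // 2
        return points[:mid], points[mid:]
    return left, right

def bisecting_kmeans(points, max_size, rng):
    """Bisect the largest cluster until every cluster has <= max_size points."""
    clusters = [list(points)]
    while True:
        clusters.sort(key=len)
        if len(clusters[-1]) <= max_size:
            return clusters
        largest = clusters.pop()
        clusters.extend(two_means(largest, rng))

def medoid(cluster):
    """Assumed stand-in for affinity propagation: the member of the
    cluster with the smallest total squared distance to the others."""
    return min(cluster, key=lambda p: sum(sq_dist(p, q) for q in cluster))

def reduce_to_prototypes(points, max_size=5, seed=0):
    """Divide with bisecting 2-means, then pick one prototype per cluster."""
    rng = random.Random(seed)
    return [medoid(c) for c in bisecting_kmeans(points, max_size, rng)]

# Two well-separated groups of toy points.
points = [(i, i) for i in range(5)] + [(100 + i, 100 + i) for i in range(5)]
prototypes = sorted(reduce_to_prototypes(points, max_size=5))
print(prototypes)  # -> [(2, 2), (102, 102)]
```

Because each prototype search only sees one small cluster, its cost depends on the cluster size rather than on the size of the whole dataset, which is the source of the speed-up the abstract describes.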

CiteSeerX