Talk it up! — Integrating and prioritizing conversational data in documentation
Syllabus for workshop, CoLang 2016. This course will introduce participants to some of the basic methodological and theoretical issues related to recording and analyzing everyday conversations. We will discuss specific contributions of naturalistic interactions to understanding aspects of linguistic structure, social interaction, and culture and explore how interactional data can be better integrated into language documentation projects. 2015 NSF/BCS 1500841: CoLang 2016: Institute on Collaborative Language Research – ALASKA
Alaska Native Language Center
Recent Developments in Document Clustering
This report aims to give a brief overview of the current state of document clustering research and present recent developments in a well-organized manner. Clustering algorithms are considered with two hypothetical scenarios in mind: online query clustering with tight efficiency constraints, and offline clustering with an emphasis on accuracy. A comparative analysis of the algorithms is performed along with a table summarizing important properties, and open problems as well as directions for future research are discussed.
Clustering for Data Reduction: A Divide and Conquer Approach
We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where we perform an order of magnitude faster than simply looking for prototypes on the entire dataset. Furthermore, our "divide and conquer" approach actually performs more accurately on datasets which are well bisected, as the greedy decisions of affinity propagation are confined to classes of already similar items.
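The divide-then-select idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the spectral-cut step is omitted, and a single medoid per cluster stands in for affinity propagation (which may return several prototypes per cluster). The function names `two_means`, `bisecting_kmeans`, `medoid`, and `prototypes`, and the `max_size` parameter, are hypothetical names chosen for illustration.

```python
import math

def two_means(points, iters=20):
    """Split points into two groups with 2-means (deterministic init)."""
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))  # farthest from c0
    centers, groups = [c0, c1], None
    for _ in range(iters):
        new = ([], [])
        for p in points:
            nearer0 = math.dist(p, centers[0]) <= math.dist(p, centers[1])
            new[0 if nearer0 else 1].append(p)
        if not new[0] or not new[1] or new == groups:
            break  # degenerate split or converged
        groups = new
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) for g in groups]
    return groups or (list(points), [])

def bisecting_kmeans(points, max_size):
    """Divide step: bisect clusters until each has at most max_size points."""
    done, todo = [], [list(points)]
    while todo:
        cluster = todo.pop()
        if len(cluster) <= max_size:
            done.append(cluster)
            continue
        left, right = two_means(cluster)
        if not left or not right:  # all points coincide, cannot split
            done.append(cluster)
        else:
            todo += [left, right]
    return done

def medoid(cluster):
    """Assumed stand-in for affinity propagation: the point with the
    smallest total distance to the other points in its cluster."""
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))

def prototypes(points, max_size=3):
    """Conquer step: one prototype per cluster from the divide step."""
    return [medoid(c) for c in bisecting_kmeans(points, max_size)]

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(prototypes(pts, max_size=3)))
```

Because prototype selection only ever compares points within one small cluster, the quadratic cost of pairwise distances is paid per cluster rather than over the whole dataset, which is the source of the speed-up the abstract describes.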
