A grammar of Awtuw
LoC Class: PL6603, LoC Subject Headings: Awtuw language--Grammar, Papuan languages--Grammar, Sandaun Province (Papua New Guinea)--Language
Streaming Coreset Constructions for M-Estimators
We introduce a new method of maintaining a (k,epsilon)-coreset for clustering M-estimators over insertion-only streams. Let (P,w) be a weighted set (where w : P -> [0,infty) is the weight function) of points in a rho-metric space (meaning a set X equipped with a positive-semidefinite symmetric function D such that D(x,z) <= rho(D(x,y) + D(y,z)) for all x,y,z in X). For any set of points C, we define COST(P,w,C) = sum_{p in P} w(p) min_{c in C} D(p,c). A (k,epsilon)-coreset for (P,w) is a weighted set (Q,v) such that for every set C of k points, (1-epsilon)COST(P,w,C) <= COST(Q,v,C) <= (1+epsilon)COST(P,w,C). Essentially, the coreset (Q,v) can be used in place of (P,w) for all operations concerning the COST function. Coresets, as a method of data reduction, are used to solve fundamental problems in machine learning on streaming and distributed data.
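To make the COST function and the coreset inequality concrete, here is a minimal Python sketch. The function names `cost` and `coreset_bound_holds` and the toy data are hypothetical illustrations, not the paper's construction. Note that the definition quantifies over every set C of k points, while this check only tests one candidate C, so it can refute a proposed coreset but never certify one.

```python
from typing import Callable, Mapping, Sequence, TypeVar

T = TypeVar("T")


def cost(weighted: Mapping[T, float], centers: Sequence[T],
         dist: Callable[[T, T], float]) -> float:
    """COST(P, w, C): each point pays its weight times D to its nearest center."""
    return sum(w * min(dist(p, c) for c in centers) for p, w in weighted.items())


def coreset_bound_holds(full: Mapping[T, float], core: Mapping[T, float],
                        centers: Sequence[T], dist: Callable[[T, T], float],
                        eps: float) -> bool:
    """Check (1-eps)COST(P,w,C) <= COST(Q,v,C) <= (1+eps)COST(P,w,C) for ONE set C.

    The coreset definition requires this for every C of k points; this function
    only tests the single candidate C it is given.
    """
    exact = cost(full, centers, dist)
    approx = cost(core, centers, dist)
    return (1 - eps) * exact <= approx <= (1 + eps) * exact


def sq(x: float, y: float) -> float:
    """Squared distance on the real line: the k-means D, which is a 2-metric."""
    return (x - y) ** 2


P = {0.0: 1.0, 1.0: 1.0, 10.0: 1.0, 11.0: 1.0}  # toy weighted set (P, w)
Q = {0.5: 2.0, 10.5: 2.0}                        # candidate summary (Q, v)
C = [0.0, 10.0]                                  # one set of k = 2 centers
print(cost(P, C, sq), cost(Q, C, sq))            # 2.0 1.0
print(coreset_bound_holds(P, Q, C, sq, eps=0.1))  # False
```

Collapsing each cluster onto its mean looks like a natural summary, yet for this particular C it underestimates the cost by half, which is why the guarantee has to hold for every C simultaneously.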
M-estimators are functions D(x,y) that can be written as psi(d(x,y)) where (X, d) is a true metric (i.e. 1-metric) space. Special cases of M-estimators include the well-known k-median (psi(x) = x) and k-means (psi(x) = x^2) functions. Our technique takes an existing offline construction for an M-estimator coreset and converts it into the streaming setting, where n data points arrive sequentially. To our knowledge, this is the first streaming construction for any M-estimator that does not rely on the merge-and-reduce tree. For example, our coreset for streaming metric k-means uses O(epsilon^{-2} k log k log n) points of storage. The previous state of the art required storing at least O(epsilon^{-2} k log k log^{4} n) points.
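The link between an M-estimator D = psi(d) and the rho-metric condition can be checked numerically. The sketch below is illustrative only: it assumes d is the Euclidean metric, and `m_estimator` and `worst_rho` are hypothetical helper names. For k-median (psi(x) = x) the ratio never exceeds 1, since d is a true metric. For k-means (psi(x) = x^2) it can reach 2, because (a + b)^2 <= 2(a^2 + b^2), with equality when y is the midpoint of x and z.

```python
import itertools
import math
from typing import Callable, Sequence

Vec = Sequence[float]
Dist = Callable[[Vec, Vec], float]


def m_estimator(psi: Callable[[float], float]) -> Dist:
    """Build D(x, y) = psi(d(x, y)), taking d to be the Euclidean metric."""
    return lambda x, y: psi(math.dist(x, y))


k_median = m_estimator(lambda t: t)      # psi(x) = x
k_means = m_estimator(lambda t: t * t)   # psi(x) = x^2


def worst_rho(D: Dist, points: Sequence[Vec]) -> float:
    """Largest D(x, z) / (D(x, y) + D(y, z)) over ordered triples of distinct points.

    Any rho for which D is a rho-metric must be at least this value.
    """
    worst = 0.0
    for x, y, z in itertools.permutations(points, 3):
        denom = D(x, y) + D(y, z)
        if denom > 0:
            worst = max(worst, D(x, z) / denom)
    return worst


line = [(0.0,), (1.0,), (2.0,)]
print(worst_rho(k_median, line))  # 1.0: a true metric
print(worst_rho(k_means, line))   # 2.0: midpoint case, (2a)^2 = 2 * (a^2 + a^2)
```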
New Frameworks for Offline and Streaming Coreset Constructions
A coreset for a set of points is a small subset of weighted points that
approximately preserves important properties of the original set. Specifically,
if $P$ is a set of points, $Q$ is a set of queries, and $f : P \times Q \to \mathbb{R}$
is a cost function, then a set $S \subseteq P$ with weights $w : S \to [0, \infty)$
is an $\epsilon$-coreset for some parameter $\epsilon > 0$ if
$\sum_{s \in S} w(s) f(s, q)$ is a $(1 + \epsilon)$ multiplicative approximation to
$\sum_{p \in P} f(p, q)$ for all $q \in Q$. Coresets are used to solve fundamental
problems in machine learning under various big data models of computation. Many
of the coresets suggested in the past decade used, or could have used, a
general framework for constructing coresets whose size depends quadratically on
what is known as the total sensitivity $t$.
In this paper we improve this bound from $O(t^2)$ to $O(t \log t)$. Thus our
results imply more space-efficient solutions to a number of problems, including
projective clustering, $k$-line clustering, and subspace approximation.
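The general framework referred to here is sensitivity (importance) sampling. Below is a generic sketch of that classic scheme, assuming the caller already has positive upper bounds s(p) on each point's sensitivity and writing t for their sum (the total sensitivity). It does not reproduce the paper's improved sample-size analysis or its sup-sampling generalization, and the names `sensitivity_sample` and `weighted_cost` are hypothetical. Each of the m draws picks p with probability s(p)/t and receives weight t/(m s(p)), which makes the weighted sum an unbiased estimate of the full cost for any fixed query.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sensitivity_sample(points: Sequence[T], sens: Sequence[float], m: int,
                       rng: random.Random) -> List[Tuple[T, float]]:
    """Draw m points i.i.d. with probability s(p) / t and weight each draw t / (m * s(p)).

    `sens` holds caller-supplied upper bounds s(p) > 0 on the sensitivities and
    t = sum(sens) is the total sensitivity. With these weights,
    sum_{s in S} w(s) f(s, q) is an unbiased estimate of sum_{p in P} f(p, q).
    """
    if any(s <= 0 for s in sens):
        raise ValueError("sensitivity bounds must be positive")
    t = sum(sens)
    idx = rng.choices(range(len(points)), weights=sens, k=m)
    return [(points[i], t / (m * sens[i])) for i in idx]


def weighted_cost(sample: Sequence[Tuple[T, float]],
                  f: Callable[[T], float]) -> float:
    """Evaluate sum_{s in S} w(s) f(s, q) for a fixed query q baked into f."""
    return sum(w * f(p) for p, w in sample)


pts = [1.0, 2.0, 3.0, 4.0]
S = sensitivity_sample(pts, sens=pts, m=5, rng=random.Random(0))
# Here s(p) is exactly proportional to f(p) = p, so every draw contributes t / m
# and the estimate equals sum(pts) = 10 up to floating-point rounding.
print(weighted_cost(S, lambda p: p))
```

The variance of this estimator grows with t, which is why bounds on the required sample size are stated in terms of the total sensitivity.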
Moreover, we generalize the notion of sensitivity sampling to sup-sampling,
which supports non-multiplicative approximations, negative cost functions, and
more. The main technical result is a generic reduction to the sample complexity
of learning a class of functions with bounded VC dimension. We show that
obtaining a $(\nu, \alpha)$-sample for this class of functions with appropriate
parameters $\nu$ and $\alpha$ suffices to achieve space-efficient
$\epsilon$-coresets.
Our result implies more efficient coreset constructions for a number of
interesting problems in machine learning; we show applications to
$k$-median/$k$-means, $k$-line clustering, $j$-subspace approximation, and the
integer $(j,k)$-projective clustering problem.
The Temperature and Density Structure of the Solar Corona. I. Observations of the Quiet Sun with the EUV Imaging Spectrometer (EIS) on Hinode
Measurements of the temperature and density structure of the solar corona
provide critical constraints on theories of coronal heating. Unfortunately, the
complexity of the solar atmosphere, observational uncertainties, and the
limitations of current atomic calculations, particularly those for Fe, all
conspire to make this task very difficult. A critical assessment of plasma
diagnostics in the corona is essential to making progress on the coronal
heating problem. In this paper we present an analysis of temperature and
density measurements above the limb in the quiet corona using new observations
from the EUV Imaging Spectrometer (EIS) on Hinode. By comparing the Si
and Fe emission observed with EIS we are able to identify emission lines that
yield consistent emission measure distributions. With these data we find that
the distribution of temperatures in the quiet corona above the limb is strongly
peaked near 1 MK, consistent with previous studies. We also find, however, that
there is a tail in the emission measure distribution that extends to higher
temperatures. EIS density measurements from several density sensitive line
ratios are found to be generally consistent with each other and with previous
measurements in the quiet corona. Our analysis, however, also indicates that a
significant fraction of the weaker emission lines observed in the EIS
wavelength ranges cannot be understood with current atomic data.
Comment: Submitted to Ap
A grammar of Awtuw
The aim of this thesis is to describe the structure of the Awtuw
language, spoken by about 400 people in the southern foothills of the
Torricelli Mountains of northwestern Papua New Guinea.
A brief preface presents my theoretical assumptions and methodological
orientation. Language is viewed as a cultural phenomenon which, while
by no means discrete from other facets of culture, has a distinct
central focus that may be described independently without severe
distortion. Grammatical classes and categories are isolated on the
basis of language-internal morphosyntactic criteria and correlated
with semantic functions.
The introductory chapter places Awtuw in its geographical, cultural,
and linguistic context, identifies the three dialects of Awtuw, and
discusses the ubiquitous phenomenon of multilingualism in the
Awtuw-speaking and surrounding area.
Chapter 2 presents a brief description of Awtuw's phonemes and
formalizes the major morphophonological processes. Awtuw has eleven
phonemic consonants and seven vowels isolated on the basis of minimal
pairs. Morphophonemic rules simplify geminates and certain other
consonant clusters, elide vowels, assimilate nasals to following
stops, and insert epenthetic vowels. There are also a number of vowel
harmony rules that assimilate affix vowels to stem vowels.
Chapters 3 through 6 present an analysis of various morphosyntactic
phenomena. Chapter 3 devises a number of formal identifying criteria
which are used as binary features to analyze Awtuw's parts-of-speech
classes. Chapter 4 describes the structure of the verb complex and
the categories represented by verbal affixes, and presents a feature-based analysis of the Tense, Mood, and Aspect system. Chapter
5 begins with a discussion of grammatical relations, classifies verb
roots on the basis of the case frames that they occur in, and
correlates these classes with inherent aspect and other semantic
categories. Chapter 6 describes the case-marking suffixes and their
functions.
Chapters 7 through 10 focus on aspects of Awtuw syntactic structure.
Chapter 7 describes the structure of the Noun Phrase. Chapter 8
presents a classification of verbless predication types. Chapter 9
discusses a variety of operations on the clause, including
question-formation, negation, reflexivization, and focusing of
constituents. And Chapter 10 analyzes interpredicate and interclausal
relations. It includes discussion of various types of verb
serialization, complementation, relative clauses, adverbial clauses,
conditionals, and coordinate constructions.
Chapter 11 begins with an analysis of Awtuw kinship terminology and
goes on to discuss color terminology, numeration and measurement, body
part terminology, and the terms for major biological classes.
Finally, Chapter 12 presents a brief description of a variety of
paralinguistic phenomena.
- …