
    A grammar of Awtuw

    LoC Class: PL6603, LoC Subject Headings: Awtuw language--Grammar, Papuan languages--Grammar, Sandaun Province (Papua New Guinea)--Language

    Streaming Coreset Constructions for M-Estimators

    We introduce a new method of maintaining a (k,epsilon)-coreset for clustering M-estimators over insertion-only streams. Let (P,w) be a weighted set (where w : P -> [0,infty) is the weight function) of points in a rho-metric space (meaning a set X equipped with a positive-semidefinite symmetric function D such that D(x,z) <= rho(D(x,y) + D(y,z)) for all x,y,z in X). For any set of points C, we define COST(P,w,C) = sum_{p in P} w(p) min_{c in C} D(p,c). A (k,epsilon)-coreset for (P,w) is a weighted set (Q,v) such that for every set C of k points, (1-epsilon)COST(P,w,C) <= COST(Q,v,C) <= (1+epsilon)COST(P,w,C). Essentially, the coreset (Q,v) can be used in place of (P,w) for all operations concerning the COST function. Coresets, as a method of data reduction, are used to solve fundamental problems in machine learning on streaming and distributed data. M-estimators are functions D(x,y) that can be written as psi(d(x,y)), where (X, d) is a true metric (i.e., 1-metric) space. Special cases of M-estimators include the well-known k-median (psi(x) = x) and k-means (psi(x) = x^2) functions. Our technique takes an existing offline construction for an M-estimator coreset and converts it into the streaming setting, where n data points arrive sequentially. To our knowledge, this is the first streaming construction for any M-estimator that does not rely on the merge-and-reduce tree. For example, our coreset for streaming metric k-means uses O(epsilon^{-2} k log k log n) points of storage. The previous state-of-the-art required storing at least O(epsilon^{-2} k log k log^{4} n) points.
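    A minimal Python sketch of the definitions above, assuming Euclidean points, plain lists for the weights, and hypothetical helper names (m_estimator, cost, coreset_bound_holds) that do not come from the paper. It evaluates COST(P,w,C) for the k-median and k-means M-estimators and checks the (1 +/- epsilon) sandwich for a single query set C; it does not implement the streaming construction itself.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Dist = Callable[[Point, Point], float]


def m_estimator(psi: Callable[[float], float]) -> Dist:
    """Build D(x, y) = psi(d(x, y)), with d the Euclidean metric (a true 1-metric)."""
    return lambda x, y: psi(math.dist(x, y))


def cost(P: Sequence[Point], w: Sequence[float], C: Sequence[Point], D: Dist) -> float:
    """COST(P, w, C) = sum over p in P of w(p) * min over c in C of D(p, c)."""
    return sum(wp * min(D(p, c) for c in C) for p, wp in zip(P, w))


def coreset_bound_holds(P, w, Q, v, C, D, eps: float) -> bool:
    """Check (1-eps)COST(P,w,C) <= COST(Q,v,C) <= (1+eps)COST(P,w,C) for ONE set C.

    The definition requires this for every set C of k points, so testing a few
    candidate sets can refute the coreset property but never certify it.
    """
    full = cost(P, w, C, D)
    approx = cost(Q, v, C, D)
    return (1 - eps) * full <= approx <= (1 + eps) * full


k_median = m_estimator(lambda x: x)     # psi(x) = x
k_means = m_estimator(lambda x: x * x)  # psi(x) = x^2

P = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
w = [1.0, 1.0, 1.0, 1.0]
C = [(0.5, 0.0), (10.5, 0.0)]  # one candidate set of k = 2 centers

print(cost(P, w, C, k_median))  # 2.0
print(cost(P, w, C, k_means))   # 1.0
# Collapsing each cluster onto these centers gives cost 0 for this C, so it is not a coreset.
print(coreset_bound_holds(P, w, [(0.5, 0.0), (10.5, 0.0)], [2.0, 2.0], C, k_means, 0.1))  # False
```

    The k-means case fits the rho-metric definition with rho = 2: since (a+b)^2 <= 2(a^2 + b^2), the squared distance satisfies D(x,z) <= 2(D(x,y) + D(y,z)).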

    New Frameworks for Offline and Streaming Coreset Constructions

    A coreset for a set of points is a small subset of weighted points that approximately preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q$ is a set of queries, and $f:P\times Q\to\mathbb{R}$ is a cost function, then a set $S\subseteq P$ with weights $w:P\to[0,\infty)$ is an $\epsilon$-coreset for some parameter $\epsilon>0$ if $\sum_{s\in S}w(s)f(s,q)$ is a $(1+\epsilon)$ multiplicative approximation to $\sum_{p\in P}f(p,q)$ for all $q\in Q$. Coresets are used to solve fundamental problems in machine learning under various big data models of computation. Many of the coresets suggested in the recent decade used, or could have used, a general framework for constructing coresets whose size depends quadratically on what is known as total sensitivity $t$. In this paper we improve this bound from $O(t^2)$ to $O(t\log t)$. Thus our results imply more space-efficient solutions to a number of problems, including projective clustering, $k$-line clustering, and subspace approximation. Moreover, we generalize the notion of sensitivity sampling to sup-sampling, which supports non-multiplicative approximations, negative cost functions and more. The main technical result is a generic reduction to the sample complexity of learning a class of functions with bounded VC dimension. We show that obtaining a $(\nu,\alpha)$-sample for this class of functions with appropriate parameters $\nu$ and $\alpha$ suffices to achieve space-efficient $\epsilon$-coresets. Our result implies more efficient coreset constructions for a number of interesting problems in machine learning; we show applications to $k$-median/$k$-means, $k$-line clustering, $j$-subspace approximation, and the integer $(j,k)$-projective clustering problem.
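    The abstract builds on the generic sensitivity-sampling framework without restating it. Below is a minimal Python sketch of that framework's standard importance-sampling step, assuming upper bounds on the sensitivities are already known: draw m points with probability proportional to their bounds and reweight each draw by the inverse of its probability. The function names are hypothetical, the sample size m is left as a free parameter (the abstract's O(t^2) versus O(t log t) bounds concern how large it must be), and neither sup-sampling nor the VC-dimension reduction is reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sensitivity_sample(
    points: Sequence[T], sens: Sequence[float], m: int, rng: random.Random
) -> List[Tuple[T, float]]:
    """Draw m i.i.d. samples with Pr[p_i] = s_i / t, where t = sum of the s_i.

    Each draw of p_i adds weight t / (m * s_i), so for every query q the weighted
    sum is an unbiased estimator of sum_{p in P} f(p, q). Only the ratios s_i / t
    matter, so the bounds may be given at any positive scale.
    """
    t = sum(sens)
    weights: dict = {}
    for i in rng.choices(range(len(points)), weights=sens, k=m):
        # Repeated draws of the same point are merged by summing their weights.
        weights[i] = weights.get(i, 0.0) + t / (m * sens[i])
    return [(points[i], wt) for i, wt in weights.items()]


def weighted_cost(S, q, f) -> float:
    """Sum of w(s) * f(s, q) over the weighted sample S."""
    return sum(wt * f(s, q) for s, wt in S)


rng = random.Random(0)
pts = list(range(1, 11))
f = lambda p, q: q * p  # hypothetical cost function for illustration
S = sensitivity_sample(pts, pts, m=5, rng=rng)  # bounds proportional to f
print(weighted_cost(S, 3.0, f), 3.0 * sum(pts))  # equal up to floating-point rounding
```

    When the bounds are exactly proportional to f(p, q), as here, every draw contributes the same amount and the estimate is exact for any sample. With looser bounds the estimator remains unbiased but its variance grows, which is why the required sample size depends on the total sensitivity t.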

    The Temperature and Density Structure of the Solar Corona. I. Observations of the Quiet Sun with the EUV Imaging Spectrometer (EIS) on Hinode

    Measurements of the temperature and density structure of the solar corona provide critical constraints on theories of coronal heating. Unfortunately, the complexity of the solar atmosphere, observational uncertainties, and the limitations of current atomic calculations, particularly those for Fe, all conspire to make this task very difficult. A critical assessment of plasma diagnostics in the corona is essential to making progress on the coronal heating problem. In this paper we present an analysis of temperature and density measurements above the limb in the quiet corona using new observations from the EUV Imaging Spectrometer (EIS) on Hinode. By comparing the Si and Fe emission observed with EIS we are able to identify emission lines that yield consistent emission measure distributions. With these data we find that the distribution of temperatures in the quiet corona above the limb is strongly peaked near 1 MK, consistent with previous studies. We also find, however, that there is a tail in the emission measure distribution that extends to higher temperatures. EIS density measurements from several density-sensitive line ratios are found to be generally consistent with each other and with previous measurements in the quiet corona. Our analysis, however, also indicates that a significant fraction of the weaker emission lines observed in the EIS wavelength ranges cannot be understood with current atomic data.
    Comment: Submitted to Ap

    A grammar of Awtuw

    The aim of this thesis is to describe the structure of the Awtuw language, spoken by about 400 people in the southern foothills of the Torricelli Mountains of northwestern Papua New Guinea. A brief preface presents my theoretical assumptions and methodological orientation. Language is viewed as a cultural phenomenon which, while by no means discrete from other facets of culture, has a distinct central focus that may be described independently without severe distortion. Grammatical classes and categories are isolated on the basis of language-internal morphosyntactic criteria and correlated with semantic functions. The introductory chapter places Awtuw in its geographical, cultural, and linguistic context, identifies the three dialects of Awtuw, and discusses the ubiquitous phenomenon of multilingualism in the Awtuw-speaking and surrounding area. Chapter 2 presents a brief description of Awtuw's phonemes and formalizes the major morphophonological processes. Awtuw has eleven phonemic consonants and seven vowels isolated on the basis of minimal pairs. Morphophonemic rules simplify geminates and certain other consonant clusters, elide vowels, assimilate nasals to following stops, and insert epenthetic vowels. There are also a number of vowel harmony rules that assimilate affix vowels to stem vowels. Chapters 3 through 6 present an analysis of various morphosyntactic phenomena. Chapter 3 devises a number of formal identifying criteria which are used as binary features to analyze Awtuw's parts-of-speech classes. Chapter 4 describes the structure of the verb complex and the categories represented by verbal affixes, and presents a feature-based analysis of the Tense, Mood, and Aspect system. Chapter 5 begins with a discussion of grammatical relations, classifies verb roots on the basis of the case frames that they occur in, and correlates these classes with inherent aspect and other semantic categories. Chapter 6 describes the case-marking suffixes and their functions. 
    Chapters 7 through 10 focus on aspects of Awtuw syntactic structure. Chapter 7 describes the structure of the Noun Phrase. Chapter 8 presents a classification of verbless predication types. Chapter 9 discusses a variety of operations on the clause, including question formation, negation, reflexivization, and focusing of constituents. Chapter 10 analyzes interpredicate and interclausal relations, including various types of verb serialization, complementation, relative clauses, adverbial clauses, conditionals, and coordinate constructions. Chapter 11 begins with an analysis of Awtuw kinship terminology and goes on to discuss color terminology, numeration and measurement, body part terminology, and the terms for major biological classes. Finally, Chapter 12 presents a brief description of a variety of paralinguistic phenomena.