
    A tight lower bound instance for k-means++ in constant dimension

    The k-means++ seeding algorithm is one of the most popular algorithms for finding the initial $k$ centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: Pick the first center randomly from the given points. For $i > 1$, pick a point to be the $i^{th}$ center with probability proportional to the square of the Euclidean distance of this point to the closest of the $(i-1)$ previously chosen centers. The k-means++ seeding algorithm is not only simple and fast but also gives an $O(\log k)$ approximation in expectation, as shown by Arthur and Vassilvitskii. There are datasets on which this seeding algorithm gives an approximation factor of $\Omega(\log k)$ in expectation. However, it is not clear from these results whether the algorithm achieves a good approximation factor with reasonably high probability (say $1/\mathrm{poly}(k)$). Brunsch and Röglin gave a dataset where the k-means++ seeding algorithm achieves an $O(\log k)$ approximation ratio with probability that is exponentially small in $k$. However, this and all other known lower-bound examples are high dimensional. So, an open problem was to understand the behavior of the algorithm on low dimensional datasets. In this work, we give a simple two dimensional dataset on which the seeding algorithm achieves an $O(\log k)$ approximation ratio with probability exponentially small in $k$. This solves open problems posed by Mahajan et al. and by Brunsch and Röglin. Comment: To appear in TAMC 2014. arXiv admin note: text overlap with arXiv:1306.420
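
The sampling procedure described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `sq_dist` and `kmeanspp_seed` are hypothetical, the first center is assumed to be drawn uniformly at random, and the fallback for the degenerate case where every point coincides with a chosen center is an added assumption.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by squared-distance sampling."""
    rng = rng or random.Random()
    # First center: assumed to be drawn uniformly at random.
    centers = [rng.choice(points)]
    # d2[j] holds the squared distance from points[j] to its closest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Degenerate case (assumption): all points sit on chosen centers.
            nxt = rng.choice(points)
        else:
            # Sample an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            nxt = points[idx]
        centers.append(nxt)
        # Only the newest center can shrink a point's closest distance.
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers
```

For example, `kmeanspp_seed([(0, 0), (0, 1), (10, 0), (10, 1)], 2)` returns two of the four input points. Because a point that is already a center has weight zero, it is not drawn again while any point still lies off the chosen centers.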

    Seismic stratigraphy and history of deep circulation and sediment drift development in Baffin Bay and the Labrador Sea

    Drilling results and seismic-reflection records at and across Ocean Drilling Program (ODP) Sites 645 (western Baffin Bay), 646, and 647 (Labrador Sea) provide important constraints on the history of deep-water circulation and sedimentation in response to Cenozoic climatic change, as well as the tectonic evolution of the region. Sites 646 and 647 were drilled on the flanks of two sediment drift deposits—the Eirik Ridge and Gloria Drift, respectively. Age control at Site 645 was poor because of the restricted biotas there, but the drill site provides a continuous sequence from the lower Miocene to the present. Sediment at Site 646 was deposited at high rates, providing a high resolution record of the last 8.5 Ma. At Site 647 sedimentation was variable and discontinuous, but a complete upper-lower Eocene through lower Oligocene sequence was recovered, whereas the upper Oligocene to Holocene sequence was interrupted by several hiatuses. The drift sequence at Site 646 was constructed after the middle to early Pliocene (ca. 4.5 Ma). Before that time, evidence exists for variable bottom-current activity, with events at about 7.5 Ma (a change in water-mass characteristics and decreasing velocities) and 5.6 Ma (an increase in current velocity preceding the major 4.5-Ma event; R2 regional reflector). The 7.5-Ma event produced a major regional reflector (R3/R4), which was originally thought to be Eocene/Oligocene in age. A major water-mass change also occurred at the onset of ice-rafting at about 2.5 Ma in the late Pliocene. In seismic records no evidence exists of drift building before the early Pliocene, but a probable late-middle Miocene erosional event occurred on the south flank of Eirik Ridge and along the West Greenland margin. Sediment supply from the Imarssuak mid-ocean canyon (IMOC) increased concurrently with the advent of drift construction. Gloria Drift also was built largely after the late Miocene. A major increase in sediment supply occurred in the early Pliocene, following a major hiatus (5.6 to 2.5 Ma; equivalent to the youngest possible age for the R2 reflector underlying Gloria Drift), and most seismic records exhibit sediment waves above this horizon. This increased sediment supply results from hemipelagic deposition from encroaching deposits of the North Atlantic mid-ocean canyon, as well as from the supply of ice-rafted detritus in the late Pliocene. A hiatus encompasses the interval from approximately 17.5 to 8.2 Ma, and the interval between the two major hiatuses is extremely condensed. A deeper reflector (R3) corresponds to a change from calcareous (below) to opal-rich hemipelagic strata in the lower Oligocene, not to a regional unconformity reflecting increased bottom-water activity, as previously thought. However, some evidence exists to support a latest Eocene-earliest Oligocene increase in bottom-current activity on Gloria Drift. In Baffin Bay, there is evidence for bottom-water activity from textural studies of cores and from apparent drift features exhibited in multichannel lines along the western margin. Probable contour currents have been active since at least the late middle Miocene, with episodes of decreasing intensity that apparently occurred in the late Miocene and Quaternary. The record from Site 645 and in seismic lines may indicate that formation of bottom water occurred in the late Neogene in Baffin Bay in conjunction with climatic deterioration, but Baffin Bay was not an important source of deep-water masses to the Labrador Sea after the late Pliocene. Not surprisingly, many of the Labrador Sea deep-circulation events correspond closely to major North Atlantic events and to important global climatic and paleoceanographic events, but a major drift-building episode may have occurred later in the Labrador Sea than it did in either the eastern North Atlantic or the western North Atlantic.

    The Embodied Statistician

    How do infants, children, and adults learn grammatical rules from the mere observation of grammatically structured sequences? We present an embodied hypothesis that (a) people covertly imitate stimuli; (b) imitation tunes the particular neuromuscular systems used in the imitation, facilitating transitions between the states corresponding to the successive grammatical stimuli; and (c) the discrimination between grammatical and ungrammatical stimuli is based on differential ease of imitation of the sequences. We report two experiments consistent with the embodied account of statistical learning. Experiment 1 demonstrates that sequences composed of stimuli imitated with different neuromuscular systems were more difficult to learn than sequences imitated within a single neuromuscular system. Experiment 2 provides further evidence by showing that selectively interfering with the tuned neuromuscular system while attempting to discriminate between grammatical and ungrammatical sequences disrupted performance only on sequences imitated by that particular neuromuscular system. Together these results are difficult for theories postulating that grammatical rule learning is based primarily on abstract statistics representing transition probabilities.

    Measurement of the Spatial Cross-Correlation Function of Damped Lyman Alpha Systems and Lyman Break Galaxies

    We present the first spectroscopic measurement of the spatial cross-correlation function between damped Lyman alpha systems (DLAs) and Lyman break galaxies (LBGs). We obtained deep u'BVRI images of nine QSO fields with 11 known z ~ 3 DLAs and spectroscopically confirmed 211 R < 25.5 photometrically selected z > 2 LBGs. We find strong evidence for an overdensity of LBGs near DLAs relative to a random distribution, similar to the overdensity of LBGs near other LBGs. A maximum likelihood cross-correlation analysis found a best-fit correlation length of $r_0 = 2.9^{+1.4}_{-1.5}\,h^{-1}$ Mpc using a fixed value of $\gamma = 1.6$. The implications of the DLA-LBG clustering amplitude on the average dark matter halo mass of DLAs are discussed. Comment: 12 pages, 2 figures, accepted for publication in Astrophysical Journal Letters
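
The abstract above quotes a correlation length and a slope without stating the functional form being fitted. Assuming the conventional power-law parameterization of the correlation function (an assumption here, not something the abstract confirms), the quoted values would enter as follows, with $r$ the comoving separation:

```latex
\xi(r) = \left(\frac{r}{r_0}\right)^{-\gamma},
\qquad \gamma = 1.6 \ \text{(held fixed)},
\qquad r_0 = 2.9^{+1.4}_{-1.5}\, h^{-1}\,\mathrm{Mpc}
```

Under this form, $r_0$ is the separation at which $\xi = 1$, so a larger $r_0$ corresponds to stronger clustering.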

    "Reporting of Two or More Races in the 1999 American Community Survey"

    This paper investigates the causes of western Germany's remarkably poor performance since 1992. The paper challenges the view that the poor record of the nineties, particularly the marked deterioration in public finances since unification, might be largely attributable to unification. Instead, the analysis highlights the role of ill-timed and overly ambitious fiscal consolidation in conjunction with tight monetary policies of an exceptional length and degree. The issue of fiscal sustainability and Germany's fiscal and monetary policies are assessed both in the light of economic theory and in comparison to the best practices of other more successful countries. The analysis concludes that Germany's dismal record of the nineties must not be seen as a direct and apparently inevitable result of unification. Rather, the record arose as a perfectly unnecessary consequence of unsound macro demand policies conducted under the Bundesbank's dictate in response to it, policies that caused the severe and protracted de-stabilization of western Germany in the first place.

    Reporting of Two or More Races In the 1999 American Community Survey

    This study presents data on race, collected at selected sites throughout the country for the 1999 American Community Survey (ACS). In particular, the distribution of the population by race and Hispanic or Latino origin is examined, as are the reporting of multiple races, number of races, and major race combinations and the extent to which the race and Hispanic/Latino questions were not answered. Although the ACS sites were not intended to be a nationally representative sample, the study's results provide important insights into what might be learned from Census 2000.

    COMET: A Recipe for Learning and Using Large Ensembles on Massive Data

    COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when learning from massive-scale data that is too large to fit on a single machine. To get the best accuracy, IVoting should be used instead of bagging to generate the training subset for each decision tree in the random forest. Experiments with two large datasets (5GB and 50GB compressed) show that COMET compares favorably (in both accuracy and training time) to learning on a subsample of data using a serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble evaluation which dynamically decides how many ensemble members to evaluate per data point; this can reduce evaluation cost by 100X or more.

    A thermal model for adaptive competition in a market

    New continuous and stochastic extensions of the minority game, devised as a fundamental model for a market of competitive agents, are introduced and studied in the context of statistical physics. The new formulation reproduces the key features of the original model, without the need for some of its special assumptions and, most importantly, it demonstrates the crucial role of stochastic decision-making. Furthermore, this formulation provides the exact but novel non-linear equations for the dynamics of the system. Comment: 4 RevTeX pages, 3 EPS figures. Revised version

    Coherent storage and phase modulation of single hard x-ray photons using nuclear excitons

    Coherent storage and phase modulation of x-ray single-photon wave packets in resonant scattering of light off nuclei is investigated theoretically. We show that by switching off and on again the magnetic field in the nuclear sample, phase-sensitive storage of photons in the keV regime can be achieved. Corresponding $\pi$ phase modulation of the stored photon can be accomplished if the retrieving magnetic field is rotated by $180^{\circ}$. The development of such x-ray single-photon control techniques is a first step towards forwarding quantum optics and quantum information to shorter wavelengths and more compact photonic devices. Comment: 12 pages, 6 figures; v2 modified to match the published version, condensed to 4 figures, results unchanged