313 research outputs found

    Scalable k-Means Clustering via Lightweight Coresets

    Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive data sets. While existing approaches generally only allow for multiplicative approximation errors, we propose a novel notion of lightweight coresets that allows for both multiplicative and additive errors. We provide a single algorithm to construct lightweight coresets for k-means clustering as well as soft and hard Bregman clustering. The algorithm is substantially faster than existing constructions, embarrassingly parallel, and the resulting coresets are smaller. We further show that the proposed approach naturally generalizes to statistical k-means clustering and that, compared to existing results, it can be used to compute smaller summaries for empirical risk minimization. In extensive experiments, we demonstrate that the proposed algorithm outperforms existing data summarization strategies in practice. Comment: To appear in the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD
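
    The abstract does not spell out the construction, so the following is only a minimal sketch of one importance-sampling scheme that fits the description: points are drawn with a probability that mixes a uniform term with a term proportional to squared distance from the data mean, and each draw is reweighted by the inverse of its sampling probability. The function name `lightweight_coreset`, the 50/50 mixing, and the parameter names are assumptions made for illustration.

```python
import numpy as np

def lightweight_coreset(X, m, rng=None):
    """Draw a weighted sample of m rows from X (hypothetical helper).

    Assumed sampling distribution: half uniform, half proportional to
    squared distance from the data mean.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    mean = X.mean(axis=0)
    sq_dist = ((X - mean) ** 2).sum(axis=1)
    total = sq_dist.sum()
    if total > 0:
        q = 0.5 / n + 0.5 * sq_dist / total
    else:
        # All points coincide: fall back to uniform sampling.
        q = np.full(n, 1.0 / n)
    idx = rng.choice(n, size=m, replace=True, p=q)
    # Inverse-probability weights keep weighted sums unbiased.
    weights = 1.0 / (m * q[idx])
    return X[idx], weights
```

    The sampled points and weights would then be passed to any weighted k-means solver. Each point is processed independently after one pass for the mean, which is consistent with the abstract's claim that the construction is embarrassingly parallel.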

    Gem-induced cytoskeleton remodeling increases cellular migration of HTLV-1-infected cells, formation of infected-to-target T-cell conjugates and viral transmission

    Efficient HTLV-1 viral transmission occurs through cell-to-cell contacts. The Tax viral transcriptional activator protein facilitates this process. Using a comparative transcriptomic analysis, we recently identified a series of genes up-regulated in HTLV-1 Tax-expressing T-lymphocytes. We focused on genes that are important for cytoskeleton dynamics and may therefore modulate cell-to-cell contacts. We first demonstrate that Gem, a member of the small GTP-binding proteins within the Ras superfamily, is expressed at both the RNA and protein levels in Tax-expressing cells and in HTLV-1-infected cell lines. Using a series of ChIP assays, we show that Tax recruits CREB and CREB Binding Protein (CBP) onto a cAMP Responsive Element (CRE) present in the gem promoter. This CRE sequence is required to drive Tax-activated gem transcription. Since Gem is involved in cytoskeleton remodeling, we investigated its role in the motility of infected cells. We show that Gem co-localizes with F-actin and is involved in both spontaneous T-cell migration and chemotaxis in the presence of SDF-1/CXCL12. Importantly, gem knock-down in HTLV-1-infected cells decreases cell migration and conjugate formation. Finally, we demonstrate that Gem plays an important role in cell-to-cell viral transmission

    Mining oral history collections using music information retrieval methods

    Recent work at the Sussex Humanities Lab, a digital humanities research program at the University of Sussex, has sought to address an identified gap in the provision and use of audio feature analysis for spoken word collections. Traditionally, oral history methodologies and practices have placed emphasis on working with transcribed textual surrogates, rather than the digital audio files created during the interview process. This provides pragmatic access to the basic semantic content, but precludes access to other potentially meaningful aural information; our work addresses the potential for methods to explore this extra-semantic information by working with the audio directly. Audio analysis tools, such as those developed within the established field of Music Information Retrieval (MIR), provide this opportunity. This paper describes the application of audio analysis techniques and methods to spoken word collections. We demonstrate an approach using freely available audio and data analysis tools, which have been explored and evaluated in two workshops. We hope to inspire new forms of content analysis which complement semantic analysis with investigation into the more nuanced properties carried in audio signals

    Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems

    We propose two new methods to address the weak scaling problems of kernel ridge regression (KRR): the Balanced KRR (BKRR) and K-means KRR (KKRR). These methods consider alternative ways to partition the input dataset into p different parts, generating p different models, and then selecting the best model among them. Compared to a conventional implementation, KKRR2 (optimized version of KKRR) improves the weak scaling efficiency from 0.32% to 38% and achieves a 591 times speedup at the same accuracy, using the same data and the same hardware (1536 processors). BKRR2 (optimized version of BKRR) achieves a higher accuracy than the current fastest method using less training time for a variety of datasets. For the applications requiring only approximate solutions, BKRR2 improves the weak scaling efficiency to 92% and achieves 3505 times speedup (theoretical speedup: 4096 times). Comment: This paper has been accepted by ACM International Conference on Supercomputing (ICS) 201
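
    As a rough illustration of the partition-then-select idea, the sketch below splits the training set into p parts, fits one Gaussian-kernel KRR model per part, and keeps the model with the lowest error on held-out data. The abstract does not say how parts are formed or how the best model is chosen, so the random equal-sized split, the validation-MSE criterion, and all function and parameter names here are assumptions, not the BKRR or KKRR procedures themselves.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_krr(X, y, lam, gamma):
    """Solve (K + lam * I) alpha = y for one partition."""
    K = gaussian_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return X, alpha

def predict_krr(model, Xq, gamma):
    Xtr, alpha = model
    return gaussian_kernel(Xq, Xtr, gamma) @ alpha

def best_partition_model(X, y, Xval, yval, p, lam=1e-2, gamma=1.0, rng=None):
    """Fit one model per part and return (validation MSE, model) of the best."""
    rng = np.random.default_rng(rng)
    # Assumption: a random, near-equal-sized split stands in for the
    # partitioning strategies the paper actually studies.
    parts = np.array_split(rng.permutation(len(X)), p)
    best = None
    for idx in parts:
        model = fit_krr(X[idx], y[idx], lam, gamma)
        mse = np.mean((predict_krr(model, Xval, gamma) - yval) ** 2)
        if best is None or mse < best[0]:
            best = (mse, model)
    return best
```

    Each part requires solving a linear system of size n/p rather than n, and the p fits are independent of one another, which is what makes this kind of scheme attractive on distributed hardware.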

    STLV-1 co-infection is correlated with an increased SFV proviral load in the peripheral blood of SFV/STLV-1 naturally infected non-human primates

    Simian T-Leukemia Virus type 1 (STLV-1) and Simian Foamy Virus (SFV) infect non-human primates (NHPs). While STLV-1, like HTLV-1, causes Adult T-cell Leukemia/lymphoma, SFV infection is asymptomatic. Both retroviruses can be transmitted from NHPs to humans through bites that allow contact between infected saliva and recipient blood. Because both viruses infect CD4+ T-cells, they might interfere with each other's replication, and this might impact viral transmission. The impact of STLV-1 co-infection on SFV replication was analyzed in 18 SFV-positive/STLV-1-negative and 18 naturally SFV/STLV-1 co-infected Papio anubis. Although 9 animals were found STLV-1-positive in saliva, STLV-1 proviral load (PVL) was much higher in the blood. SFV proviruses were detected in the saliva of all animals. Interestingly, SFV proviral load was much higher in the blood of STLV-1/SFV co-infected animals, compared to STLV-1-negative animals. Given that soluble Tax protein can enter uninfected cells, we tested its effect on the foamy virus promoter and show that Tax protein can transactivate the foamy LTR. This demonstrates that either true STLV-1 co-infection or Tax alone has an impact on SFV replication and may influence the ability of the virus to be zoonotically transmitted as well as its ability to promote hematological abnormalities

    Clustering beat-chroma patterns in a large music database

    A musical style or genre implies a set of common conventions and patterns combined and deployed in different ways to make individual musical pieces; for instance, most would agree that contemporary pop music is assembled from a relatively small palette of harmonic and melodic patterns. The purpose of this paper is to use a database of tens of thousands of songs in combination with a compact representation of melodic-harmonic content (the beat-synchronous chromagram) and data-mining tools (clustering) to attempt to explicitly catalog this palette, at least within the limitations of the beat-chroma representation. We use online k-means clustering to summarize 3.7 million 4-beat bars in a codebook of a few hundred prototypes. By measuring how accurately such a quantized codebook can reconstruct the original data, we can quantify the degree of diversity (distortion as a function of codebook size) and temporal structure (i.e. the advantage gained by jointly quantizing multiple frames) in this music. The most popular codewords themselves reveal the common chords used in the music. Finally, the quantized representation of music can be used for music retrieval tasks such as artist and genre classification, and identifying songs that are similar in terms of their melodic-harmonic content
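
    The codebook idea can be sketched in a few lines. The example below is a generic online (sequential) k-means with a running-mean update, plus a distortion measure that scores how well a codebook reconstructs the data; it is not the authors' implementation. The function names, the initialization from random data points, and the use of mean squared error as the distortion are assumptions, and the rows of `X` stand in for flattened beat-chroma patches.

```python
import numpy as np

def online_kmeans(X, k, rng=None):
    """One pass of online k-means; returns a (k, d) codebook."""
    rng = np.random.default_rng(rng)
    # Assumption: initialize codewords from k distinct random data points.
    codebook = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.ones(k)
    for x in X[rng.permutation(len(X))]:
        j = np.argmin(((codebook - x) ** 2).sum(axis=1))  # nearest codeword
        counts[j] += 1
        # Running-mean update: move the winner toward the new sample.
        codebook[j] += (x - codebook[j]) / counts[j]
    return codebook

def distortion(X, codebook):
    """Mean squared distance from each row of X to its nearest codeword."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()
```

    Calling `distortion` on codebooks trained with increasing `k` gives a distortion-versus-codebook-size curve of the kind the abstract uses to quantify diversity. Because each sample is seen once and only one codeword is updated per sample, memory use is independent of the number of samples.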

    Deformation–sedimentation feedback and the development of anomalously thick aggradational turbidite lobes: Outcrop and subsurface examples from the Hikurangi Margin, New Zealand

    Concepts of the interaction between autogenic (e.g., flow process) and allogenic (e.g., tectonics) controls on sedimentation have advanced to a state that allows the controlling forces to be distinguished. Here we examine outcropping and subsurface Neogene deep-marine clastic systems that traversed the Hikurangi subduction margin via thrust-bounded trench-slope basins, providing an opportunity to examine the interplay of structural deformation and deep-marine sedimentation. Sedimentary logging and mapping of Miocene outcrops from the exhumed portion of the subduction wedge record heavily amalgamated, sand-rich lobe complexes, up to 200 m thick, which accumulated behind NE–SW-oriented growth structures. There was no significant deposition from low-density parts of the gravity flows in the basin center, although lateral fringes demonstrate fining and thinning indicative of deposits from low-density flows. Seismic data from the offshore portion of the margin show analogous lobate reflector geometries. These deposits accumulate into complexes up to 5 km wide, 8 km long, and 300 m thick, comparable in scale with the outcropping lobes on this margin. Mapping reveals lobe complexes that are vertically stacked behind thrusts. These results illustrate repeated trapping of the sandier parts of turbidity currents to form aggradational lobe complexes, with the finer-grained suspended load bypassing to areas downstream. However, the repeated development of lobes characterized by partial bypass implies that a feedback mechanism operates to perpetuate a partial confinement condition, via rejuvenation of accommodation. The mechanism proposed is a coupling of sediment loading and deformation rate, such that load-driven subsidence focuses stress on basin-bounding faults and perpetuates generation of accommodation in the basin, hence modulating tectonic forcing. 
    Recognition of such a mechanism has implications for understanding the tectono-stratigraphic evolution of deep-marine fold and thrust belts and the distribution of resources within them

    Anti-HTLV antibody profiling reveals an antibody signature for HTLV-I-Associated Myelopathy/Tropical Spastic Paraparesis (HAM/TSP)

    Background: HTLV-I is the causal agent of adult T cell leukemia (ATLL) and HTLV-I-associated myelopathy/tropical spastic paraparesis (HAM/TSP). Biomarkers are needed to diagnose and/or predict patients who are at risk for HAM/TSP or ATLL. Therefore, we used luciferase immunoprecipitation technology (LIPS) to investigate antibody responses to seven HTLV-I proteins in serum samples from non-infected controls, asymptomatic HTLV-I carriers, and ATLL and HAM/TSP patients. Antibody profiles were correlated with viral load and examined in longitudinal samples. Results: Anti-GAG antibody titers detected by LIPS differentiated HTLV-infected subjects from uninfected controls with 100% sensitivity and 100% specificity, but did not differ between HTLV-I infected subgroups. However, anti-Env antibody titers were over 4-fold higher in HAM/TSP compared to both asymptomatic HTLV-I (P < 0.0001) and ATLL patients (P < 0.0005). Anti-Env antibody titers above 100,000 LU had 75% positive predictive value and 79% negative predictive value for identifying the HAM/TSP sub-type. Anti-Tax antibody titers were also higher (P < 0.0005) in HAM/TSP patients compared to asymptomatic HTLV-I carriers. Proviral load correlated with anti-Env antibodies in asymptomatic carriers (R = 0.76), but not in HAM/TSP. Conclusion: These studies indicate that anti-HTLV-I antibody responses detected by LIPS are useful for diagnosis and suggest that elevated anti-Env antibodies are a common feature found in HAM/TSP patients.

    Earthworm management in tropical agroecosystems

    Collaborative research in the Macrofauna project has enabled development of some techniques that presently are at different stages of advancement, from promising pilot experiments (tomato production and inoculation in plant nursery bags at Yurimaguas and in India) to the fully developed technique of massive worm production and biofertilization of tea gardens in Tamil Nadu (India) (patent deposited). Failures have also helped to gain better insight into the potential feasibility of techniques that had been considered in the objectives of this project. Endogeic earthworms (Pontoscolex corethrurus) may be produced in large quantities, i.e. about 12000 worms (1.6-2.8 kg live wt)/m2/year in specific culture beds using either sawdust (Yurimaguas, Peru) or a mixture of high and low quality materials (Tamil Nadu, India) mixed into soil as substrates. The cost of producing 1 kg of earthworm biomass through bed culture is about 3.6 Euro, much lower than the cost of hand collection of worms from pastures/grasslands where these species are abundant (6-125 Euro depending on the cost of labour and earthworm density). The theoretical value of an active earthworm community with an average biomass of 400 kg live wt has been estimated at 1400 Euro, the price that it would cost to reintroduce an equivalent biomass produced in our culture units, indicating the cost of land restoration. Direct inoculation of earthworms in the field to improve production may only affect plant growth positively if a large biomass (greater than 30 g live wt/m2) is inoculated from the beginning. An alternative may be to concentrate the inoculum in small areas regularly distributed across the field... (From the author's abstract)