279 research outputs found

    A Cost-based Optimizer for Gradient Descent Optimization

    As the use of machine learning (ML) permeates into diverse application domains, there is an urgent need to support a declarative framework for ML. Ideally, a user will specify an ML task in a high-level and easy-to-use language and the framework will invoke the appropriate algorithms and system configurations to execute it. An important observation towards designing such a framework is that many ML tasks can be expressed as mathematical optimization problems, which take a specific form. Furthermore, these optimization problems can be efficiently solved using variations of the gradient descent (GD) algorithm. Thus, to decouple a user specification of an ML task from its execution, a key component is a GD optimizer. We propose a cost-based GD optimizer that selects the best GD plan for a given ML task. To build our optimizer, we introduce a set of abstract operators for expressing GD algorithms and propose a novel approach to estimate the number of iterations a GD algorithm requires to converge. Extensive experiments on real and synthetic datasets show that our optimizer not only chooses the best GD plan but also allows for optimizations that achieve orders of magnitude performance speed-up. Comment: Accepted at SIGMOD 201
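
    As a rough illustration of cost-based plan selection, the sketch below ranks candidate GD plans by estimated iterations multiplied by estimated per-iteration cost. The `GDPlan` class, the `choose_plan` helper, the plan names, and the product cost model are all assumptions made for this example; the abstract does not describe the paper's actual operators or cost formulas.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GDPlan:
    """Hypothetical description of one candidate gradient descent plan."""
    name: str
    est_iterations: int        # assumed output of some iteration estimator
    cost_per_iteration: float  # assumed per-iteration cost, e.g. in seconds

    @property
    def total_cost(self) -> float:
        # Assumed cost model: iterations to converge times cost per iteration.
        return self.est_iterations * self.cost_per_iteration


def choose_plan(plans: Iterable[GDPlan]) -> GDPlan:
    """Return the plan with the lowest estimated total cost."""
    return min(plans, key=lambda plan: plan.total_cost)


# Made-up numbers: batch GD needs few but expensive iterations,
# stochastic GD needs many cheap ones, mini-batch sits in between.
candidates = [
    GDPlan("batch", est_iterations=100, cost_per_iteration=2.0),
    GDPlan("stochastic", est_iterations=5000, cost_per_iteration=0.01),
    GDPlan("mini-batch", est_iterations=800, cost_per_iteration=0.1),
]
best = choose_plan(candidates)
```

    With these invented figures the stochastic plan has the lowest estimated cost; different estimates would change the ranking, which is the reason for estimating rather than fixing the choice.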

    Daily Eastern News: November 10, 1975


    Learning Generalized Linear Models Over Normalized Data

    Enterprise data analytics is a booming area in the data management industry. Many companies are racing to develop toolkits that closely integrate statistical and machine learning techniques with data management systems. Almost all such toolkits assume that the input to a learning algorithm is a single table. However, most relational datasets are not stored as single tables due to normalization. Thus, analysts often perform key-foreign key joins before learning on the join output. This strategy of learning after joins introduces redundancy avoided by normalization, which could lead to poorer end-to-end performance and maintenance overheads due to data duplication. In this work, we take a step towards enabling and optimizing learning over joins for a common class of machine learning techniques called generalized linear models that are solved using gradient descent algorithms in an RDBMS setting. We present alternative approaches to learn over a join that are easy to implement over existing RDBMSs. We introduce a new approach named factorized learning that pushes ML computations through joins and avoids redundancy in both I/O and computations. We study the tradeoff space for all our approaches both analytically and empirically. Our results show that factorized learning is often substantially faster than the alternatives, but is not always the fastest, necessitating a cost-based approach. We also discuss extensions of all our approaches to multi-table joins as well as to Hive.
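
    The idea of pushing computation through a join can be sketched for a least-squares gradient over a key-foreign key join. This is a minimal NumPy illustration, not the paper's implementation: the table names `S` (fact table) and `R` (dimension table), the foreign-key array `fk`, and both function names are assumptions for the example. The factorized version computes each inner product with an `R` row once and reuses it, instead of repeating `R` rows in a materialized join.

```python
import numpy as np


def materialized_gradient(S, fk, R, y, w):
    """Gradient of 0.5 * ||Xw - y||^2 on the explicitly joined table."""
    X = np.hstack([S, R[fk]])  # join output: each R row repeated per match
    residual = X @ w - y
    return X.T @ residual


def factorized_gradient(S, fk, R, y, w):
    """Same gradient, computed without materializing the join."""
    d_s = S.shape[1]
    w_s, w_r = w[:d_s], w[d_s:]
    partial_r = R @ w_r  # one inner product per R row, reused below
    residual = S @ w_s + partial_r[fk] - y
    grad_s = S.T @ residual
    # Sum the residuals of all S rows that reference each R row, so every
    # R row is scaled once rather than once per joining S row.
    residual_per_r = np.bincount(fk, weights=residual, minlength=R.shape[0])
    grad_r = R.T @ residual_per_r
    return np.concatenate([grad_s, grad_r])
```

    Both functions return the same vector; the factorized one touches each `R` row a constant number of times, which is where the savings come from when many `S` rows share a foreign key.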

    QuickSel: Quick Selectivity Learning with Mixture Models

    Estimating the selectivity of a query is a key step in almost any cost-based query optimizer. Most of today's databases rely on histograms or samples that are periodically refreshed by re-scanning the data as the underlying data changes. Since frequent scans are costly, these statistics are often stale and lead to poor selectivity estimates. As an alternative to scans, query-driven histograms have been proposed, which refine the histograms based on the actual selectivities of the observed queries. Unfortunately, these approaches are either too costly to use in practice (i.e., they require an exponential number of buckets) or quickly lose their advantage as they observe more queries. In this paper, we propose a selectivity learning framework, called QuickSel, which falls into the query-driven paradigm but does not use histograms. Instead, it builds an internal model of the underlying data, which can be refined significantly faster (e.g., only 1.9 milliseconds for 300 queries). This fast refinement allows QuickSel to continuously learn from each query and yield increasingly more accurate selectivity estimates over time. Unlike query-driven histograms, QuickSel relies on a mixture model and a new optimization algorithm for training its model. Our extensive experiments on two real-world datasets confirm that, given the same target accuracy, QuickSel is 34.0x-179.4x faster than state-of-the-art query-driven histograms, including ISOMER and STHoles. Further, given the same space budget, QuickSel is 26.8%-91.8% more accurate than periodically-updated histograms and samples, respectively.
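
    To make the mixture-model idea concrete, the sketch below estimates the selectivity of a one-dimensional range predicate from a mixture of uniform components. The choice of uniform components, the `(weight, low, high)` tuple layout, and the `selectivity` function are assumptions for illustration; the abstract does not specify the component family, and the training algorithm is not shown here.

```python
def selectivity(components, lo, hi):
    """Estimated fraction of rows with lo <= value < hi.

    `components` is a list of (weight, low, high) tuples describing uniform
    densities on [low, high); weights are assumed to sum to 1.
    """
    total = 0.0
    for weight, low, high in components:
        # Length of the overlap between the query range and this component.
        overlap = max(0.0, min(hi, high) - max(lo, low))
        total += weight * overlap / (high - low)
    return total


# Two equally weighted, overlapping components (made-up values).
model = [(0.5, 0.0, 10.0), (0.5, 5.0, 15.0)]
estimate = selectivity(model, 5.0, 10.0)
```

    Refining such a model as new queries arrive means adjusting the weights so that the estimates match the observed selectivities, which is cheaper than re-scanning the data.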

    Singing the same tune? International continuities and discontinuities in how police talk about using force

    This article focuses on a research project conducted in six jurisdictions: England, The Netherlands, Germany, Australia, Venezuela, and Brazil. These societies are very different ethnically, socially, politically, economically, and historically, and have wildly different levels of crime. Their policing arrangements also differ significantly: how they are organised; how their officers are equipped and trained; what routine operating procedures they employ; whether they are armed; and much else besides. Most relevant for this research, they represent policing systems with wildly different levels of police shootings. Police in the two Latin American countries represented here have a justified reputation for the frequency with which they shoot people, whereas at the other extreme the police in England do not routinely carry firearms and rarely shoot anyone. To probe whether these differences are reflected in the way that officers talk about the use of force, police officers in these different jurisdictions were invited to discuss in focus groups a scenario in which police are thwarted in their attempt to arrest two youths (one of whom is a known local criminal) by the youths driving off with the police in pursuit, and which concludes with the youths crashing their car and escaping in apparent possession of a gun. It might be expected that focus groups would prove starkly different, and indeed they were, but not in the way that might be expected. There was little difference in affirmation of normative and legal standards regarding the use of force. It was how officers in different jurisdictions envisaged the circumstances in which the scenario took place that led Latin American officers to anticipate that they would shoot the suspects, whereas officers in the other jurisdictions had little expectation that they would open fire in the conditions as they imagined them to be.

    Distinct Transcriptome Expression of the Temporal Cortex of the Primate Microcebus murinus during Brain Aging versus Alzheimer's Disease-Like Pathology

    Aging is the primary risk factor of neurodegenerative disorders such as Alzheimer's disease (AD). However, the molecular events occurring during brain aging are extremely complex and still largely unknown. For a better understanding of these age-associated modifications, animal models as close as possible to humans are needed. We thus analyzed the transcriptome of the temporal cortex of the primate Microcebus murinus using human oligonucleotide microarrays (Affymetrix). Gene expression profiles were assessed in the temporal cortex of 6 young adults, 10 healthy old animals and 2 old, “AD-like” animals that presented β-amyloid plaques and cortical atrophy, which are pathognomonic signs of AD in humans. Gene expression data of the 14,911 genes that were detected in at least 3 samples were analyzed. By SAM (significance analysis of microarrays), we identified 47 genes that discriminated young from healthy old and “AD-like” animals. These findings were confirmed by principal component analysis (PCA). ANOVA of the expression data from the three groups identified 695 genes (including the 47 genes previously identified by SAM and PCA) with significant changes of expression in old and “AD-like” in comparison to young animals. About one third of these genes showed similar changes of expression in healthy aging and in “AD-like” animals, whereas more than two thirds showed opposite changes in these two groups in comparison to young animals. Hierarchical clustering analysis of the 695 markers indicated that each group had distinct expression profiles which characterized each group, especially the “AD-like” group. Functional categorization showed that most of the genes that were up-regulated in healthy old animals and down-regulated in “AD-like” animals belonged to metabolic pathways, particularly protein synthesis. These data suggest the existence of compensatory mechanisms during physiological brain aging that disappear in “AD-like” animals. These results open the way to new exploration of physiological and “AD-like” aging in primates.

    Changes in muscle contractile characteristics and jump height following 24 days of unilateral lower limb suspension

    We measured changes in maximal voluntary and electrically evoked torque and rate of torque development because of limb unloading. We investigated whether these changes during single joint isometric muscle contractions were related to changes in jump performance involving dynamic muscle contractions and several joints. Six healthy male subjects (21 ± 1 years) underwent 3 weeks of unilateral lower limb suspension (ULLS) of the right limb. Plantar flexor and knee extensor maximal voluntary contraction (MVC) torque and maximal rate of torque development (MRTD), voluntary activation, and maximal triplet torque (thigh; 3 pulses at 300 Hz) were measured in addition to squat jump height before and after ULLS. MVC of plantar flexors and knee extensors (MVCke) and triplet torque decreased by 12% (P = 0.012), 21% (P = 0.001) and 11% (P = 0.016), respectively. Voluntary activation did not change (P = 0.192). Absolute MRTD during voluntary contractions decreased for plantar flexors (by 17%, P = 0.027) but not for knee extensors (P = 0.154). Absolute triplet MRTD decreased by 17% (P = 0.048). The reduction in MRTD disappeared following normalization to MVC. Jump height with the previously unloaded leg decreased significantly by 28%. No significant relationships were found between any muscle variable and jump height (r < 0.48), but decreases in torque were (triplet, r = 0.83, P = 0.04) or tended to be (MVCke r = 0.71, P = 0.11) related to decreases in jump height. Thus, reductions in isometric muscle torque following 3 weeks of limb unloading were significantly related to decreases in the more complex jump task, although torque in itself (without intervention) was not related to jump performance.