
    Compressed sensing and robust recovery of low rank matrices

    In this paper, we focus on compressed sensing and recovery schemes for low-rank matrices, asking under what conditions a low-rank matrix can be sensed and recovered from incomplete, inaccurate, and noisy observations. We consider three schemes, one based on a certain Restricted Isometry Property and two based on directly sensing the row and column space of the matrix. We study their properties in terms of exact recovery in the ideal case, as well as their robustness for approximately low-rank matrices and for noisy measurements.
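    To make the setting concrete, here is a minimal matrix-completion sketch in Python (numpy): it recovers a low-rank matrix from a random subset of its entries by alternating a data-consistency step with a hard rank projection. This is a generic illustration of low-rank recovery from incomplete observations, not the paper's three schemes; the matrix size, rank, and 40% sampling rate are arbitrary choices.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Ground-truth rank-2 matrix and a random 40% observation mask.
    n, r = 50, 2
    M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    mask = rng.random((n, n)) < 0.4

    def rank_project(X, rank):
        """Project X onto the set of matrices of at most the given rank."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return (U[:, :rank] * s[:rank]) @ Vt[:rank]

    # Iterative hard thresholding: re-impose the observed entries,
    # then project back onto the low-rank set.
    X = np.zeros((n, n))
    for _ in range(200):
        X[mask] = M[mask]        # data-consistency step
        X = rank_project(X, r)   # low-rank projection step

    print(f"relative error: {np.linalg.norm(X - M) / np.linalg.norm(M):.2e}")
    ```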

    A Cost-based Optimizer for Gradient Descent Optimization

    As the use of machine learning (ML) permeates diverse application domains, there is an urgent need to support a declarative framework for ML. Ideally, a user specifies an ML task in a high-level, easy-to-use language, and the framework invokes the appropriate algorithms and system configurations to execute it. An important observation towards designing such a framework is that many ML tasks can be expressed as mathematical optimization problems that take a specific form. Furthermore, these optimization problems can be solved efficiently using variations of the gradient descent (GD) algorithm. Thus, to decouple a user's specification of an ML task from its execution, a key component is a GD optimizer. We propose a cost-based GD optimizer that selects the best GD plan for a given ML task. To build our optimizer, we introduce a set of abstract operators for expressing GD algorithms and propose a novel approach to estimate the number of iterations a GD algorithm requires to converge. Extensive experiments on real and synthetic datasets show that our optimizer not only chooses the best GD plan but also enables optimizations that achieve orders-of-magnitude performance speed-ups.
    Comment: Accepted at SIGMOD 201
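    As a rough sketch of the idea, the Python snippet below chooses among GD plans by multiplying an estimated iteration count by a per-iteration cost, as a cost-based optimizer would. The plan names, cost figures, and the classical kappa * log(1/eps) iteration bound for strongly convex objectives are illustrative assumptions, not the paper's actual operators or cost model.

    ```python
    import math
    from dataclasses import dataclass

    @dataclass
    class GDPlan:
        """A hypothetical GD execution plan (illustrative fields)."""
        name: str
        cost_per_iter: float  # seconds per iteration, e.g. profiled on a sample
        est_iterations: int   # estimated iterations until convergence

    def estimate_iterations(lipschitz, strong_convexity, tolerance):
        """Classical bound for batch GD on a strongly convex objective:
        O(kappa * log(1/eps)) iterations, with kappa = L / mu. One plausible
        way to estimate convergence speed; the paper proposes its own model."""
        kappa = lipschitz / strong_convexity
        return math.ceil(kappa * math.log(1.0 / tolerance))

    def choose_plan(plans):
        """Pick the plan with the lowest estimated total cost."""
        return min(plans, key=lambda p: p.cost_per_iter * p.est_iterations)

    plans = [
        GDPlan("batch-GD", cost_per_iter=2.0,
               est_iterations=estimate_iterations(10.0, 1.0, 1e-4)),
        GDPlan("mini-batch-GD", cost_per_iter=0.05, est_iterations=3_000),
        GDPlan("SGD", cost_per_iter=0.001, est_iterations=200_000),
    ]
    print("chosen plan:", choose_plan(plans).name)
    ```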

    High-sensitivity microfluidic calorimeters for biological and chemical applications

    High-sensitivity microfluidic calorimeters raise the prospect of achieving high-throughput biochemical measurements with minimal sample consumption. However, it has been challenging to realize microchip-based calorimeters possessing both high sensitivity and precise sample-manipulation capabilities. Here, we report chip-based microfluidic calorimeters capable of characterizing the heat of reaction of 3.5-nL samples with 4.2-nW resolution. Our approach, based on a combination of hard- and soft-polymer microfluidics, provides both exceptional thermal response and the physical strength necessary to construct high-sensitivity calorimeters that can be scaled to automated, highly multiplexed array architectures. Polydimethylsiloxane microfluidic valves and pumps are interfaced to parylene channels and reaction chambers to automate the injection of analyte at 1 nL and below. We attained excellent thermal resolution via on-chip vacuum encapsulation, which provides unprecedented thermal isolation of the minute microfluidic reaction chambers. We demonstrate the performance of these calorimeters by resolving measurements of the heat of reaction of urea hydrolysis and the enthalpy of mixing of water with methanol. The device structure can be adapted easily to enable a wide variety of other standard calorimeter operations; one example, a flow calorimeter, is described.

    Radiation effects of silver and zinc battery electrodes Final report, Apr. 1965 - Oct. 1966

    Gamma radiation effects on silver and zinc battery electrodes.

    Recognizing the need for personalization of haemophilia patient‐reported outcomes in the prophylaxis era

    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/134854/1/hae13066.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/134854/2/hae13066_am.pd

    Asynchronous Training of Word Embeddings for Large Text Corpora

    Word embeddings are a powerful approach for analyzing language and are widely used in information retrieval and text mining. Training embeddings over huge corpora is computationally expensive because the input is typically processed sequentially and parameters are updated synchronously. Distributed architectures for asynchronous training that have been proposed either focus on scaling vocabulary sizes and dimensionality or suffer from expensive synchronization latencies. In this paper, we propose a scalable approach that trains word embeddings by partitioning the input space, in order to scale to massive text corpora without sacrificing the quality of the embeddings. Our training procedure involves no parameter synchronization except a final sub-model merge phase that typically executes in a few minutes. Our distributed training scales seamlessly to large corpus sizes, and models trained by our distributed procedure, which requires 1/10 of the time taken by the baseline approach, achieve comparable and sometimes up to 45% better performance on a variety of NLP benchmarks. Finally, we show that we are robust to missing words in sub-models and can effectively reconstruct word representations.
    Comment: This paper contains 9 pages and has been accepted at WSDM 201
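    The sketch below illustrates the partition-then-merge idea using gensim (4.x assumed): each corpus partition trains an independent Word2Vec sub-model with no synchronization, and the sub-models are merged at the end. The merge shown, orthogonal-Procrustes alignment over the shared vocabulary followed by averaging, is one plausible strategy chosen for illustration, not necessarily the paper's exact procedure.

    ```python
    import numpy as np
    from gensim.models import Word2Vec

    def train_partition(sentences):
        """Train an independent sub-model on one corpus partition
        (no parameter synchronization during training)."""
        return Word2Vec(sentences, vector_size=50, window=5,
                        min_count=1, workers=4, epochs=20)

    def merge(model_a, model_b):
        """Align sub-model B to A's vector space with orthogonal Procrustes
        over the shared vocabulary, then average shared vectors and keep
        partition-specific words as-is (an assumed merge strategy)."""
        shared = [w for w in model_a.wv.index_to_key if w in model_b.wv]
        A = np.stack([model_a.wv[w] for w in shared])
        B = np.stack([model_b.wv[w] for w in shared])
        U, _, Vt = np.linalg.svd(B.T @ A)   # Procrustes: R minimizes |BR - A|
        R = U @ Vt
        merged = {w: (model_a.wv[w] + model_b.wv[w] @ R) / 2 for w in shared}
        for w in model_b.wv.index_to_key:   # words seen only in partition B
            merged.setdefault(w, model_b.wv[w] @ R)
        for w in model_a.wv.index_to_key:   # words seen only in partition A
            merged.setdefault(w, model_a.wv[w])
        return merged

    # Toy usage: two tiny "partitions" of a corpus.
    part1 = [["the", "cat", "sat"], ["the", "dog", "ran"]] * 50
    part2 = [["a", "cat", "ran"], ["a", "dog", "sat"]] * 50
    vectors = merge(train_partition(part1), train_partition(part2))
    print(len(vectors), "merged word vectors")
    ```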

    BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees

    The rising volume of datasets has made training machine learning (ML) models a major computational cost in the enterprise. Given the iterative nature of model and parameter tuning, many analysts use a small sample of their entire data during their initial stage of analysis to make quick decisions (e.g., what features or hyperparameters to use) and use the entire dataset only in later stages (i.e., when they have converged to a specific model). This sampling, however, is performed in an ad-hoc fashion. Most practitioners cannot precisely capture the effect of sampling on the quality of their model, and eventually on their decision-making process during the tuning phase. Moreover, without systematic support for sampling operators, many optimizations and reuse opportunities are lost. In this paper, we introduce BlinkML, a system for fast, quality-guaranteed ML training. BlinkML allows users to make error-computation tradeoffs: instead of training a model on their full data (i.e., full model), BlinkML can quickly train an approximate model with quality guarantees using a sample. The quality guarantees ensure that, with high probability, the approximate model makes the same predictions as the full model. BlinkML currently supports any ML model that relies on maximum likelihood estimation (MLE), which includes Generalized Linear Models (e.g., linear regression, logistic regression, max entropy classifier, Poisson regression) as well as PPCA (Probabilistic Principal Component Analysis). Our experiments show that BlinkML can speed up the training of large-scale ML tasks by 6.26x-629x while guaranteeing the same predictions, with 95% probability, as the full model.
    Comment: 22 pages, SIGMOD 201
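    A minimal sketch of the statistical idea behind sample-based MLE training: fit the model on a sample, use the inverse observed Fisher information to gauge how far the sample coefficients may sit from the full-data MLE, and check prediction agreement. The synthetic data, sample size, and logistic-regression choice are assumptions for illustration; BlinkML's guarantee machinery is considerably more elaborate.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression  # sklearn >= 1.2

    rng = np.random.default_rng(0)

    # Synthetic "full" dataset (a stand-in; BlinkML targets far larger data).
    N, d = 200_000, 5
    X = rng.standard_normal((N, d))
    beta = rng.standard_normal(d)
    y = (rng.random(N) < 1 / (1 + np.exp(-X @ beta))).astype(int)

    def fit_on_sample(n):
        """Fit an unpenalized MLE on an n-row sample and estimate the
        coefficients' sampling covariance via the inverse observed Fisher
        information X'WX, with W = diag(p(1-p))."""
        idx = rng.choice(N, size=n, replace=False)
        Xs, ys = X[idx], y[idx]
        m = LogisticRegression(penalty=None, fit_intercept=False,
                               max_iter=1000).fit(Xs, ys)
        p = m.predict_proba(Xs)[:, 1]
        fisher = (Xs * (p * (1 - p))[:, None]).T @ Xs
        return m, np.linalg.inv(fisher)

    approx, cov = fit_on_sample(5_000)
    print("coef std errors:", np.sqrt(np.diag(cov)).round(4))

    # Empirically compare predictions against a model trained on all rows.
    full = LogisticRegression(penalty=None, fit_intercept=False,
                              max_iter=1000).fit(X, y)
    agreement = (approx.predict(X) == full.predict(X)).mean()
    print(f"prediction agreement with full model: {agreement:.4f}")
    ```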

    Methodology for vetting heavily doped semiconductors for intermediate band photovoltaics: A case study in sulfur-hyperdoped silicon

    We present a methodology for estimating the efficiency potential of candidate impurity-band photovoltaic materials from empirical measurements. This methodology employs both Fourier transform infrared spectroscopy and low-temperature photoconductivity to calculate a “performance figure of merit” and to determine both the position and bandwidth of the impurity band. We evaluate a candidate impurity-band material, silicon hyperdoped with sulfur; we find that the figure of merit is more than one order of magnitude too low for photovoltaic devices that exceed the thermodynamic efficiency limit for single-band-gap materials.
    National Science Foundation (U.S.) (Energy, Power, and Adaptive Systems Grant Contract ECCS-1102050)
    National Science Foundation (U.S.) (United States. Dept. of Energy NSF CA EEC-1041895)
    Center for Clean Water and Clean Energy at MIT and KFUP