
    An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws

    We study the compute-optimal trade-off between model and training data set sizes for large neural networks. Our result suggests a linear relation similar to that supported by the empirical analysis of Chinchilla. While that work studies transformer-based large language models trained on the MassiveText corpus (Gopher), as a starting point for the development of a mathematical theory we focus on a simpler learning model and data generating process, each based on a neural network with a sigmoidal output unit and a single hidden layer of ReLU activation units. We introduce general error upper bounds for a class of algorithms that incrementally update a statistic (for example, gradient descent). For a particular learning model inspired by Barron (1993), we establish an upper bound on the minimal information-theoretically achievable expected error as a function of model and data set sizes. We then derive allocations of computation that minimize this bound. We present empirical results which suggest that this approximation correctly identifies an asymptotic linear compute-optimal scaling. This approximation also generates new insights. Among other things, it suggests that, as the input dimension or latent space complexity grows (as might be the case, for example, if a longer history of tokens is taken as input to a language model), a larger fraction of the compute budget should be allocated to growing the learning model rather than the training data.
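
The data generating process described in this abstract (a single hidden layer of ReLU units feeding a sigmoidal output unit) can be sketched as follows. This is only an illustrative sketch, not the paper's construction: the Gaussian weight prior, the Gaussian inputs, the scaling factors, and the names `sample_teacher`, `teacher_prob`, and `generate_data` are all assumptions introduced here.

```python
import math
import random

def sample_teacher(input_dim, width, rng):
    """Draw weights for a single-hidden-layer network (assumed Gaussian prior)."""
    w1 = [[rng.gauss(0, 1) / math.sqrt(input_dim) for _ in range(input_dim)]
          for _ in range(width)]
    w2 = [rng.gauss(0, 1) / math.sqrt(width) for _ in range(width)]
    return w1, w2

def teacher_prob(w1, w2, x):
    """Probability of label 1: ReLU hidden layer followed by a sigmoid output."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    logit = sum(a * h for a, h in zip(w2, hidden))
    return 1.0 / (1.0 + math.exp(-logit))

def generate_data(w1, w2, n_samples, rng):
    """Sample (input, binary label) pairs from the teacher network."""
    input_dim = len(w1[0])
    data = []
    for _ in range(n_samples):
        x = [rng.gauss(0, 1) for _ in range(input_dim)]  # assumed input law
        y = 1 if rng.random() < teacher_prob(w1, w2, x) else 0
        data.append((x, y))
    return data

rng = random.Random(0)
w1, w2 = sample_teacher(input_dim=8, width=16, rng=rng)
data = generate_data(w1, w2, n_samples=1000, rng=rng)
```

In this sketch the model size corresponds to `width` (and `input_dim`), and the data set size to `n_samples`; these are the two quantities the abstract trades off under a compute budget.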

    Is Stochastic Gradient Descent Near Optimal?

    The success of neural networks over the past decade has established them as effective models for many relevant data generating processes. Statistical theory on neural networks indicates graceful scaling of sample complexity. For example, Joen & Van Roy (arXiv:2203.00246) demonstrate that, when data is generated by a ReLU teacher network with $W$ parameters, an optimal learner needs only $\tilde{O}(W/\epsilon)$ samples to attain expected error $\epsilon$. However, existing computational theory suggests that, even for single-hidden-layer teacher networks, to attain small error for all such teacher networks, the computation required to achieve this sample complexity is intractable. In this work, we fit single-hidden-layer neural networks to data generated by single-hidden-layer ReLU teacher networks with parameters drawn from a natural distribution. We demonstrate that stochastic gradient descent (SGD) with automated width selection attains small expected error with a number of samples and total number of queries both nearly linear in the input dimension and width. This suggests that SGD nearly achieves the information-theoretic sample complexity bounds of Joen & Van Roy (arXiv:2203.00246) in a computationally efficient manner. An important difference between our positive empirical results and the negative theoretical results is that the latter address worst-case error of deterministic algorithms, while our analysis centers on expected error of a stochastic algorithm. Comment: arXiv admin note: substantial text overlap with arXiv:2203.0024
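
The teacher-student setup in this abstract can be illustrated with a minimal sketch: a single-hidden-layer ReLU student is fit by plain SGD to noise-free outputs of a single-hidden-layer ReLU teacher. Several things here are assumptions rather than the paper's method: the student width is fixed (the automated width selection mentioned above is not implemented), the weights and inputs are Gaussian, the loss is squared error, and the names `init_net`, `forward`, `sgd_step`, and `mean_squared_error` are hypothetical.

```python
import math
import random

def init_net(input_dim, width, rng):
    """Random single-hidden-layer ReLU network (assumed Gaussian weights)."""
    w1 = [[rng.gauss(0, 1) / math.sqrt(input_dim) for _ in range(input_dim)]
          for _ in range(width)]
    w2 = [rng.gauss(0, 1) / math.sqrt(width) for _ in range(width)]
    return w1, w2

def forward(net, x):
    """Return the hidden activations and the scalar output."""
    w1, w2 = net
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return hidden, sum(a * h for a, h in zip(w2, hidden))

def sgd_step(net, x, y, lr):
    """One SGD update on the loss 0.5 * (prediction - y)^2 for a single sample."""
    w1, w2 = net
    hidden, pred = forward(net, x)
    err = pred - y
    for j, h in enumerate(hidden):
        if h > 0.0:  # ReLU passes gradient only through active units
            grad_pre = err * w2[j]
            for i, xi in enumerate(x):
                w1[j][i] -= lr * grad_pre * xi
        w2[j] -= lr * err * h

def mean_squared_error(student, teacher, inputs):
    """Average squared gap between student and teacher outputs."""
    return sum((forward(student, x)[1] - forward(teacher, x)[1]) ** 2
               for x in inputs) / len(inputs)

rng = random.Random(0)
input_dim = 4
teacher = init_net(input_dim, 3, rng)    # data generating network
student = init_net(input_dim, 12, rng)   # fixed width, chosen by hand
test_inputs = [[rng.gauss(0, 1) for _ in range(input_dim)] for _ in range(200)]

error_before = mean_squared_error(student, teacher, test_inputs)
for _ in range(5000):  # one fresh sample (one teacher query) per step
    x = [rng.gauss(0, 1) for _ in range(input_dim)]
    sgd_step(student, x, forward(teacher, x)[1], lr=0.02)
error_after = mean_squared_error(student, teacher, test_inputs)
```

Comparing `error_before` with `error_after` shows how far the student has moved toward the teacher. Repeating the run over several input dimensions and widths would be one way to examine how the number of samples needed grows, which is the question the abstract addresses.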

    Practical License Plate Recognition in Unconstrained Surveillance Systems with Adversarial Super-Resolution

    Although most current license plate (LP) recognition applications have advanced significantly, they are still limited to ideal environments where training data are carefully annotated with constrained scenes. In this paper, we propose a novel license plate recognition method to handle unconstrained real-world traffic scenes. To overcome these difficulties, we use adversarial super-resolution (SR) and one-stage character segmentation and recognition. Combined with a deep convolutional network based on VGG-net, our method provides a simple but reasonable training procedure. Moreover, we introduce GIST-LP, a challenging LP dataset where image samples are effectively collected from unconstrained surveillance scenes. Experimental results on the AOLP and GIST-LP datasets illustrate that our method, without any scene-specific adaptation, outperforms current LP recognition approaches in accuracy and provides visual enhancement in our SR results, which are easier to understand than the original data. Comment: Accepted at VISAPP, 201

    New-type of Multi-purpose Standard Radon Chamber in South Korea

    Radon is an inert, radioactive gas that is colorless, tasteless, and odorless. As radon decay proceeds, if DNA damage continues beyond the repair capacity of cells in the human body, it can cause severe long-term health problems such as lung cancer. Countries with strict legal restrictions on radon tend to have various radon-related studies under way. In South Korea, radon has been regulated at the recommendation level. Even though there are about three standard radon chambers in Korea, they have not been in active use because of a lack of demand. Also, most of them are specialized for the calibration of radon detectors only. Recently, the Korean government has started to pay attention to the radon issue and to support radon research. Thus, this study was carried out to develop a new type of multi-purpose radon chamber for 1) measurement of the radon emission rate from natural and artificial radon sources; 2) calibration of radon detectors; and 3) evaluation of radon mitigation efficiency. Keywords: Radon, Radon Chamber, Indoor Air Quality, Chamber Desig

    A Study of Parameters Related to the Etch Rate for a Dry Etch Process Using NF3

    The characteristics of the dry etching of SiNx:H thin films for display devices using SF6/O2 and NF3/O2 were investigated using a dual-frequency capacitively coupled plasma reactive ion etching (CCP-RIE) system. The investigation was carried out by varying the RF power ratio (13.56 MHz/2 MHz), pressure, and gas flow ratio. For the SiNx:H film, the etch rates obtained using NF3/O2 were higher than those obtained using SF6/O2 under various process conditions. The relationships between the etch rates and the usual monitoring parameters, namely the optical emission spectroscopy (OES) intensity of atomic fluorine (685.1 nm and 702.89 nm) and the voltages VH and VL, were investigated. The OES intensity data indicated a correlation between the bulk plasma density and the atomic fluorine density. The etch rate was proportional to the product of the OES intensity of atomic fluorine, I(F), and the square root of the sum of the voltages (VH + VL), on the assumption that the velocity of the reactive fluorine was proportional to the square root of the voltages.
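
The proportionality stated in this abstract can be written out as a short formula. The symbols $\mathrm{ER}$ for the etch rate, $k$ for the proportionality constant, and $n_F$ and $v_F$ for the density and velocity of reactive fluorine are notation assumed here for illustration; the abstract itself names only $I(F)$, $V_H$, and $V_L$.

```latex
% Etch rate taken as proportional to the flux of reactive fluorine,
% i.e. density times velocity (notation assumed for illustration):
%   n_F \propto I(F)              (OES intensity tracks fluorine density)
%   v_F \propto \sqrt{V_H + V_L}  (assumption stated in the abstract)
\mathrm{ER} \;\propto\; n_F \, v_F
\quad\Longrightarrow\quad
\mathrm{ER} \;=\; k \, I(F) \, \sqrt{V_H + V_L}
```

Under this reading, a plot of the measured etch rate against $I(F)\sqrt{V_H + V_L}$ would be expected to fall on a straight line through the origin with slope $k$.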