1,688 research outputs found
An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws
We study the compute-optimal trade-off between model and training data set
sizes for large neural networks. Our result suggests a linear relation similar
to that supported by the empirical analysis of Chinchilla. While that work
studies transformer-based large language models trained on the MassiveText
corpus (introduced with Gopher), as a starting point for the development of a mathematical theory, we
focus on a simpler learning model and data generating process, each based on a
neural network with a sigmoidal output unit and single hidden layer of ReLU
activation units. We introduce general error upper bounds for a class of
algorithms which incrementally update a statistic (for example gradient
descent). For a particular learning model inspired by Barron (1993), we establish
an upper bound on the minimal information-theoretically achievable expected
error as a function of model and data set sizes. We then derive allocations of
computation that minimize this bound. We present empirical results which
suggest that this approximation correctly identifies an asymptotic linear
compute-optimal scaling. This approximation also generates new insights. Among
other things, it suggests that, as the input dimension or latent space
complexity grows, as might be the case for example if a longer history of
tokens is taken as input to a language model, a larger fraction of the compute
budget should be allocated to growing the learning model rather than training
data.
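As a rough illustration of what "deriving allocations of computation that minimize a bound" can look like, the sketch below uses a hypothetical error bound of the form a/p + b/t, where p is the model size and t is the number of training samples, under a compute budget C = p * t. The functional form, the constants `a` and `b`, and the function names are assumptions made for illustration and are not taken from the paper. For this toy bound the optimum has the closed form p* = sqrt(aC/b) and t* = sqrt(bC/a), so t*/p* = b/a stays constant as C grows, which is one simple instance of a linear relation between model and data set sizes.

```python
def toy_error_bound(params, samples, a=1.0, b=1.0):
    """Hypothetical bound: one term shrinks with model size, one with data size."""
    return a / params + b / samples


def best_allocation(compute, a=1.0, b=1.0, grid=2001):
    """Grid-search log-spaced model sizes p, with samples t = compute / p."""
    best = None
    for i in range(grid):
        p = compute ** (i / (grid - 1))  # p ranges from 1 to compute
        t = compute / p                  # spend the rest of the budget on data
        err = toy_error_bound(p, t, a, b)
        if best is None or err < best[2]:
            best = (p, t, err)
    return best


if __name__ == "__main__":
    for compute in (1e6, 1e8, 1e10):
        p, t, err = best_allocation(compute)
        print(f"C={compute:.0e}  params={p:.3g}  samples={t:.3g}  ratio={t / p:.3g}")
```

Changing `a` and `b` shifts the ratio t/p but not the fact that it is independent of the budget in this toy setting.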
Is Stochastic Gradient Descent Near Optimal?
The success of neural networks over the past decade has established them as
effective models for many relevant data generating processes. Statistical
theory on neural networks indicates graceful scaling of sample complexity. For
example, Jeon & Van Roy (arXiv:2203.00246) demonstrate that, when data is
generated by a ReLU teacher network with $W$ parameters, an optimal learner
needs only $\tilde{O}(W/\epsilon)$ samples to attain expected error $\epsilon$.
However, existing computational theory suggests that, even for
single-hidden-layer teacher networks, to attain small error for all such
teacher networks, the computation required to achieve this sample complexity is
intractable. In this work, we fit single-hidden-layer neural networks to data
generated by single-hidden-layer ReLU teacher networks with parameters drawn
from a natural distribution. We demonstrate that stochastic gradient descent
(SGD) with automated width selection attains small expected error with a number
of samples and total number of queries both nearly linear in the input
dimension and width. This suggests that SGD nearly achieves the
information-theoretic sample complexity bounds of Jeon & Van Roy
(arXiv:2203.00246) in a computationally efficient manner. An important
difference between our positive empirical results and the negative theoretical
results is that the latter address worst-case error of deterministic
algorithms, while our analysis centers on expected error of a stochastic
algorithm. Comment: arXiv admin note: substantial text overlap with arXiv:2203.0024
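The teacher-student setup described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' experimental code: the Gaussian weight distribution, the noise-free squared-error regression target, the fixed student width (the abstract mentions automated width selection, which is omitted here), and all names and hyperparameters are hypothetical.

```python
import numpy as np


def relu_net(x, w1, w2):
    """Single hidden layer of ReLU units followed by a linear output."""
    return np.maximum(x @ w1, 0.0) @ w2


def train_student(dim=5, teacher_width=4, student_width=16,
                  n_samples=2000, lr=0.01, epochs=20, seed=0):
    """Fit a student network to teacher-generated data with plain SGD."""
    rng = np.random.default_rng(seed)
    # Teacher weights drawn from a scaled Gaussian (an assumed distribution).
    tw1 = rng.normal(size=(dim, teacher_width)) / np.sqrt(dim)
    tw2 = rng.normal(size=teacher_width) / np.sqrt(teacher_width)
    x = rng.normal(size=(n_samples, dim))
    y = relu_net(x, tw1, tw2)

    w1 = rng.normal(size=(dim, student_width)) / np.sqrt(dim)
    w2 = rng.normal(size=student_width) / np.sqrt(student_width)
    # losses[0] is the mean squared error before any training.
    losses = [float(np.mean((relu_net(x, w1, w2) - y) ** 2))]
    for _ in range(epochs):
        for i in rng.permutation(n_samples):
            h = np.maximum(x[i] @ w1, 0.0)   # hidden activations
            err = h @ w2 - y[i]              # prediction residual
            # Gradients of 0.5 * err**2, computed before either update.
            grad_w2 = err * h
            grad_w1 = err * np.outer(x[i], w2 * (h > 0))
            w2 -= lr * grad_w2
            w1 -= lr * grad_w1
        losses.append(float(np.mean((relu_net(x, w1, w2) - y) ** 2)))
    return losses


if __name__ == "__main__":
    history = train_student()
    print(f"initial MSE {history[0]:.4f}, final MSE {history[-1]:.6f}")
```

The student is deliberately wider than the teacher so that the target function is representable; the training error it reaches is only a proxy for the expected error studied in the abstract.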
Practical License Plate Recognition in Unconstrained Surveillance Systems with Adversarial Super-Resolution
Although most current license plate (LP) recognition applications have been
significantly advanced, they are still limited to ideal environments where
training data are carefully annotated with constrained scenes. In this paper,
we propose a novel license plate recognition method to handle unconstrained
real-world traffic scenes. To overcome these difficulties, we use adversarial
super-resolution (SR), and one-stage character segmentation and recognition.
Combined with a deep convolutional network based on VGG-net, our method
provides a simple but reasonable training procedure. Moreover, we introduce
GIST-LP, a challenging LP dataset where image samples are effectively collected
from unconstrained surveillance scenes. Experimental results on the AOLP and
GIST-LP datasets illustrate that our method, without any scene-specific
adaptation, outperforms current LP recognition approaches in accuracy and
produces visually enhanced SR results that are easier to interpret than the
original data. Comment: Accepted at VISAPP, 201
New-type of Multi-purpose Standard Radon Chamber in South Korea
Radon is an inert, radioactive gas that is colorless, tasteless, and odorless. As radon decay proceeds, and if DNA damage continues beyond the repair capacity of cells in the human body, it can cause severe long-term health problems such as lung cancer. Countries with strict legal restrictions on radon tend to have various radon-related studies under way. In South Korea, radon has been regulated only at the recommendation level. Although there are about 3 standard radon chambers in Korea, they have not been in active use because of a lack of demand, and most of them are specialized for the calibration of radon detectors only. Recently, the Korean government has started paying attention to the radon issue and supporting radon research. This study was therefore carried out to develop a new type of multi-purpose radon chamber for 1) measuring radon emission rates from natural and artificial radon sources; 2) calibrating radon detectors; and 3) evaluating radon mitigation efficiency. Keywords: Radon, Radon Chamber, Indoor Air Quality, Chamber Desig
A Study of Parameters Related to the Etch Rate for a Dry Etch Process Using NF3
The characteristics of the dry etching of SiNx:H thin films for display devices using SF6/O2 and NF3/O2 were investigated using a dual-frequency capacitively coupled plasma reactive ion etching (CCP-RIE) system. The investigation was carried out by varying the RF power ratio (13.56 MHz/2 MHz), pressure, and gas flow ratio. For the SiNx:H film, the etch rates obtained using NF3/O2 were higher than those obtained using SF6/O2 under various process conditions. The relationships between the etch rates and the usual monitoring parameters, namely the optical emission spectroscopy (OES) intensity of atomic fluorine (685.1 nm and 702.89 nm) and the voltages VH and VL, were investigated. The OES intensity data indicated a correlation between the bulk plasma density and the atomic fluorine density. The etch rate was proportional to the product of the OES intensity of atomic fluorine, I(F), and the square root of the summed voltages (VH + VL), on the assumption that the velocity of the reactive fluorine was proportional to the square root of the voltages.
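The proportionality stated in this abstract can be written out in one line. The symbols $R$ (etch rate), $n_F$ (atomic fluorine density), and $v_F$ (velocity of the reactive fluorine) are introduced here for illustration and do not appear in the source; treating the OES intensity $I(F)$ as a proxy for $n_F$, and the etch rate as a flux $n_F v_F$, are assumptions of this sketch.

```latex
R \;\propto\; n_F \, v_F, \qquad
n_F \propto I(F), \qquad
v_F \propto \sqrt{V_H + V_L}
\quad\Longrightarrow\quad
R \;\propto\; I(F)\,\sqrt{V_H + V_L}
```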