Deep regression learning with optimal loss function
In this paper, we develop a novel efficient and robust nonparametric
regression estimator within a feedforward neural network framework. The
proposed estimator has several interesting characteristics. First, the loss
function is built upon an estimated maximum likelihood function, which
integrates information from the observed data as well as from the data
structure. Consequently, the resulting estimator has desirable optimality
properties, such as efficiency. Second, unlike traditional maximum
likelihood estimation (MLE), the proposed method avoids specifying the
distribution and is therefore flexible with respect to any kind of
distribution, such as heavy-tailed, multimodal, or heterogeneous
distributions. Third, the proposed loss function relies on probabilities
rather than on direct observations as in least squares, contributing to the
robustness of the proposed estimator. Finally, the proposed loss function
involves only the nonparametric regression function. This enables a direct
application of existing packages, simplifying computation and programming.
We establish the large-sample properties of the proposed estimator in terms
of its excess risk and minimax near-optimal rate. The theoretical results
demonstrate that the proposed estimator is equivalent to the true MLE, in
which the density function is known. Our simulation studies show that the
proposed estimator outperforms existing methods in terms of prediction
accuracy, efficiency, and robustness. In particular, it is comparable to the
true MLE and even improves upon it as the sample size increases. This
implies that the adaptive, data-driven loss function obtained from the
estimated density may offer an additional avenue for capturing valuable
information. We further apply the proposed method to four real data
examples, obtaining significantly reduced out-of-sample prediction errors
compared to existing methods.
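The idea of a loss built from an estimated density, rather than from squared residuals, can be illustrated with a minimal sketch (our own toy illustration under simplifying assumptions, not the paper's actual estimator): fit a Gaussian-kernel density estimate to the residuals and use its negative log-likelihood as the loss. The `kde_nll_loss` name and the fixed bandwidth are hypothetical choices for the sketch.

```python
import numpy as np

def kde_nll_loss(residuals, bandwidth=0.5):
    """Negative log-likelihood of residuals under a Gaussian-kernel
    density estimate of their own distribution (leave-in KDE).
    A data-driven loss: it adapts to whatever shape the residual
    density takes, instead of assuming Gaussian errors as MSE does."""
    r = np.asarray(residuals, dtype=float)
    # Pairwise differences r_i - r_j for the kernel sum.
    diffs = r[:, None] - r[None, :]
    kernel = np.exp(-0.5 * (diffs / bandwidth) ** 2) / (
        bandwidth * np.sqrt(2.0 * np.pi)
    )
    # Estimated density at each residual: average kernel over the sample.
    density = kernel.mean(axis=1)
    return -np.mean(np.log(density))
```

Residuals concentrated near zero receive a lower loss than spread-out residuals, and because the loss depends on estimated probabilities rather than raw magnitudes, a single extreme residual moves it far less than it moves a squared-error loss.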
Type-IV DCT, DST, and MDCT algorithms with reduced numbers of arithmetic operations
We present algorithms for the type-IV discrete cosine transform (DCT-IV) and
discrete sine transform (DST-IV), as well as for the modified discrete cosine
transform (MDCT) and its inverse, that achieve a lower count of real
multiplications and additions than previously published algorithms, without
sacrificing numerical accuracy. Asymptotically, the operation count is reduced
from ~2N log N to ~(17/9)N log N for a power-of-two transform size N, and the exact
count is strictly lowered for all N > 4. These results are derived by
considering the DCT to be a special case of a DFT of length 8N, with certain
symmetries, and then pruning redundant operations from a recent improved fast
Fourier transform algorithm (based on a recursive rescaling of the
conjugate-pair split radix algorithm). The improved algorithms for DST-IV and
MDCT follow immediately from the improved count for the DCT-IV.
Comment: 11 pages
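The embedding of the DCT-IV into a DFT of length 8N can be checked numerically. The sketch below (our own illustration of the embedding, not the paper's pruned algorithm) places the N inputs at odd indices of a zero-padded length-8N array, so that the real parts of the odd-indexed DFT outputs are exactly the DCT-IV; the paper's fast algorithm additionally exploits the symmetries of the extended sequence to prune redundant operations and reach the ~(17/9)N log N count.

```python
import numpy as np

def dct4_naive(x):
    """DCT-IV by definition: X_k = sum_n x_n cos(pi (2n+1)(2k+1) / (4N))."""
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    return np.cos(np.pi * (2 * n + 1) * (2 * k + 1) / (4 * N)) @ x

def dct4_via_dft(x):
    """DCT-IV read off a length-8N DFT: with x_n at index 2n+1, the DFT
    output at index 2k+1 is sum_n x_n exp(-2i pi (2n+1)(2k+1) / (8N)),
    whose real part is the DCT-IV sum above."""
    N = len(x)
    z = np.zeros(8 * N, dtype=complex)
    z[1:2 * N:2] = x              # inputs at indices 1, 3, ..., 2N-1
    Z = np.fft.fft(z)
    return Z[1:2 * N:2].real      # outputs at indices 1, 3, ..., 2N-1
```

Both routines agree to machine precision; the naive version costs O(N^2) operations while the DFT route inherits the FFT's O(N log N) scaling even before any pruning.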
Type-II/III DCT/DST algorithms with reduced number of arithmetic operations
We present algorithms for the discrete cosine transform (DCT) and discrete
sine transform (DST), of types II and III, that achieve a lower count of real
multiplications and additions than previously published algorithms, without
sacrificing numerical accuracy. Asymptotically, the operation count is reduced
from ~ 2N log_2 N to ~ (17/9) N log_2 N for a power-of-two transform size N.
Furthermore, we show that a further N multiplications may be saved by a certain
rescaling of the inputs or outputs, generalizing a well-known technique for N=8
by Arai et al. These results are derived by considering the DCT to be a special
case of a DFT of length 4N, with certain symmetries, and then pruning redundant
operations from a recent improved fast Fourier transform algorithm (based on a
recursive rescaling of the conjugate-pair split radix algorithm). The improved
algorithms for DCT-III, DST-II, and DST-III follow immediately from the
improved count for the DCT-II.
Comment: 9 pages
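The analogous embedding for the DCT-II uses a DFT of length 4N. The sketch below (again our own illustration of the embedding, not the pruned fast algorithm) places the N inputs at odd indices of a zero-padded length-4N array and reads the unnormalized DCT-II off the real parts of the first N DFT outputs.

```python
import numpy as np

def dct2_naive(x):
    """Unnormalized DCT-II by definition:
    X_k = sum_n x_n cos(pi (2n+1) k / (2N))."""
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    return np.cos(np.pi * (2 * n + 1) * k / (2 * N)) @ x

def dct2_via_dft(x):
    """DCT-II read off a length-4N DFT: with x_n at index 2n+1, the DFT
    output at index k is sum_n x_n exp(-2i pi (2n+1) k / (4N)), whose
    real part is the DCT-II sum above."""
    N = len(x)
    z = np.zeros(4 * N, dtype=complex)
    z[1:2 * N:2] = x          # inputs at indices 1, 3, ..., 2N-1
    return np.fft.fft(z)[:N].real
```

The rescaling trick mentioned in the abstract (absorbing N multiplications into scale factors on the outputs, as in the N=8 scheme of Arai et al.) operates on the pruned transform and is not shown here.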
Qwen Technical Report
Large language models (LLMs) have revolutionized the field of artificial
intelligence, enabling natural language processing tasks that were previously
thought to be exclusive to humans. In this work, we introduce Qwen, the first
installment of our large language model series. Qwen is a comprehensive
language model series that encompasses distinct models with varying parameter
counts. It includes Qwen, the base pretrained language models, and Qwen-Chat,
the chat models finetuned with human alignment techniques. The base language
models consistently demonstrate superior performance across a multitude of
downstream tasks, and the chat models, particularly those trained using
Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The
chat models possess advanced tool-use and planning capabilities for creating
agent applications, showcasing impressive performance even when compared to
bigger models on complex tasks like utilizing a code interpreter. Furthermore,
we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as
well as mathematics-focused models, Math-Qwen-Chat, which are built upon base
language models. These models demonstrate significantly improved performance in
comparison with open-source models, and fall only slightly behind the
proprietary models.
Comment: 59 pages, 5 figures
Towards Semantics-Enhanced Pre-Training: Can Lexicon Definitions Help Learning Sentence Meanings?
Self-supervised pre-training techniques, albeit relying on large amounts of text, have enabled rapid growth in learning language representations for natural language understanding. However, as radically empirical models of sentences, they are subject to the input data distribution, inevitably incorporating data bias and reporting bias, which may lead to inaccurate understanding of sentences. To address this problem, we propose to adopt a human learner's approach: when we cannot make sense of a word in a sentence, we often consult a dictionary for its specific meanings; but can the same work for empirical models? In this work, we try to inform pre-trained masked language models of word meanings for semantics-enhanced pre-training. To achieve a contrastive and holistic view of word meanings, a definition pair of two related words is presented to the masked language model so that the model can better associate a word with its crucial semantic features. Both intrinsic and extrinsic evaluations validate the proposed approach on semantics-oriented tasks, with an almost negligible increase in training data.
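One way to picture the definition-pair input is a small sketch of the preprocessing step. Everything here is a hypothetical illustration: the `build_definition_pair_input` helper, the `[MASK]`/`[SEP]` layout, and the inline `word: definition` format are our assumptions, not the paper's actual implementation.

```python
def build_definition_pair_input(sentence, target, related, definitions,
                                mask_token="[MASK]", sep_token="[SEP]"):
    """Mask the target word in the sentence, then append the dictionary
    definitions of the target and of a related word, so a masked LM can
    contrast the two meanings while recovering the masked token."""
    # Mask only the first occurrence of the target word.
    masked = sentence.replace(target, mask_token, 1)
    return (f"{masked} {sep_token} "
            f"{target}: {definitions[target]} {sep_token} "
            f"{related}: {definitions[related]}")
```

A masked LM fed such an input sees both definitions side by side, which is the "contrastive and holistic view" the abstract describes; the actual tokenization and segment handling would depend on the model used.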
Process of overburden failure in steeply inclined multi-seam mining: insights from physical modelling
Ground surface damage caused by steeply inclined coal seam mining is widely distributed in China, but there has been little research on the failure process and movement mechanism of strata induced by steeply inclined multi-seam mining. In this paper, a physical model test is carried out to study the failure process and movement mechanism of the overburden in steeply inclined multi-seam stepwise mining. The results show that at the initial stage, the main failure of the rock mass is small-scale collapse at the initial cut and the roof (stability stage of the rock mass). After the roof is exposed over a certain range, the rock mass in the downhill direction slips into the goaf and gradually destroys the interburdens of the goaf, similar to the displacement effect of dominoes (severe failure stage of the rock mass). When the structure of the goaf fails, the overburden subsides, causing extensive damage to the ground surface. The surface damage directly above the goaf is mainly caused by severe subsidence deformation, while the surface damage in the downhill direction is dominated by cracks.