Tokenization with Factorized Subword Encoding
In recent years, language models have become increasingly larger and more
complex. However, the input representations for these models continue to rely
on simple and greedy subword tokenization methods. In this paper, we propose a
novel tokenization method that factorizes subwords onto discrete triplets using
a VQ-VAE model. The effectiveness of the proposed tokenization method, referred
to as the Factorizer, is evaluated on language modeling and morpho-syntactic
tasks for 7 diverse languages. Results indicate that this method is more
appropriate and robust for morphological tasks than the commonly used byte-pair
encoding (BPE) tokenization algorithm.
Comment: Findings of ACL 202
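To make the factorization idea concrete, here is a minimal, purely illustrative sketch of the kind of mechanism involved: quantizing a subword embedding into a triplet of discrete codes by nearest-neighbor lookup in three small codebooks, as a VQ-VAE quantizer does. This is not the Factorizer implementation; the embedding size, codebook size, and all names are hypothetical, and the codebooks are random stand-ins rather than learned.

# Illustrative sketch (not the paper's code): factorize a subword embedding
# into a triplet of discrete codes via nearest-neighbor lookup in three small
# codebooks, in the spirit of a VQ-VAE quantizer. All sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 96          # dimensionality of the (hypothetical) subword embedding
CODEBOOK_SIZE = 256   # entries per codebook
SLICE = EMB_DIM // 3  # each codebook quantizes one third of the embedding

# In a real VQ-VAE these codebooks are learned; here they are random stand-ins.
codebooks = [rng.normal(size=(CODEBOOK_SIZE, SLICE)) for _ in range(3)]

def factorize(subword_embedding):
    """Map one subword embedding to a discrete triplet (c1, c2, c3)."""
    codes = []
    for i, codebook in enumerate(codebooks):
        chunk = subword_embedding[i * SLICE:(i + 1) * SLICE]
        # Nearest codebook entry by Euclidean distance (the VQ quantization step).
        distances = np.linalg.norm(codebook - chunk, axis=1)
        codes.append(int(distances.argmin()))
    return tuple(codes)

def reconstruct(codes):
    """Decode a triplet back to an approximate embedding."""
    return np.concatenate([codebooks[i][c] for i, c in enumerate(codes)])

embedding = rng.normal(size=EMB_DIM)  # stand-in for an encoder output
triplet = factorize(embedding)
print("discrete triplet:", triplet)
print("reconstruction error:", float(np.linalg.norm(embedding - reconstruct(triplet))))

In a setup like this, each subword is represented by three small indices rather than a single id drawn from one large flat vocabulary.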
Architectural support for probabilistic branches
A plethora of research efforts have focused on fine-tuning branch predictors to increasingly higher levels of accuracy. However, several important optimization, financial, and statistical data analysis algorithms rely on probabilistic computation. These applications draw random values from a distribution and steer control flow based on those values. Such probabilistic branches are challenging to predict because of their inherent probabilistic nature. As a result, probabilistic codes significantly suffer from branch mispredictions.
This paper proposes Probabilistic Branch Support (PBS), a hardware/software cooperative technique that leverages the observation that the outcome of probabilistic branches needs to be correct only in a statistical sense. PBS stores the outcome and the probabilistic values that lead to the outcome of the current execution to direct the next execution of the probabilistic branch, thereby completely removing the penalty for mispredicted probabilistic branches. PBS relies on marking probabilistic branches in software for hardware to exploit. Our evaluation shows that PBS reduces MPKI by 45% on average (and up to 99%) and improves IPC by 6.7% (up to 17%) over the TAGE-SC-L predictor. PBS requires 193 bytes of hardware overhead and introduces statistically negligible algorithmic inaccuracy.
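The statistical-correctness observation can be illustrated with a small software analogue. This is a toy model of the principle only, not the proposed hardware: the value deciding the branch in one execution is precomputed during the previous execution, so the decision never waits on a fresh random draw and can never be mispredicted, while the stream of outcomes remains Bernoulli-distributed with the intended probability. The probability and loop count below are arbitrary.

# Toy software model of the idea behind PBS (not the hardware design itself).
import random

random.seed(42)
p = 0.3                       # probability that the branch is taken
stored = random.random() < p  # outcome precomputed for the first execution

taken = 0
N = 100_000
for _ in range(N):
    outcome = stored              # use the stored outcome: no misprediction possible
    stored = random.random() < p  # precompute the outcome for the next execution
    if outcome:
        taken += 1

print(f"empirical taken rate: {taken / N:.3f} (target {p})")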
Trained on 100 million words and still in shape: BERT meets British National Corpus
While modern masked language models (LMs) are trained on ever larger corpora,
we here explore the effects of down-scaling training to a modestly-sized but
representative, well-balanced, and publicly available English text source --
the British National Corpus. We show that pre-training on this carefully
curated corpus can reach better performance than the original BERT model. We
argue that this type of corpus has great potential as a language modeling
benchmark. To showcase this potential, we present fair, reproducible and
data-efficient comparative studies of LMs, in which we evaluate several
training objectives and model architectures and replicate previous empirical
results in a systematic way. We propose an optimized LM architecture called
LTG-BERT.
Comment: Accepted to EACL 202
Archer: A Community Distributed Computing Infrastructure for Computer Architecture Research and Education
This paper introduces Archer, a community-based computing resource for
computer architecture research and education. The Archer infrastructure
integrates virtualization and batch scheduling middleware to seamlessly deliver
high-throughput computing capacity aggregated from resources that are distributed
across wide-area networks and owned by different participating entities.
The paper discusses the motivations leading to the design of
Archer, describes its core middleware components, and presents an analysis of
the functionality and performance of a prototype wide-area deployment running a
representative computer architecture simulation workload.
Comment: 11 pages, 2 figures. Describes the Archer project,
http://archer-project.or
Prostate specific antigen concentration at age 60 and death or metastasis from prostate cancer: case-control study
Objective: To determine the relation between concentrations of prostate specific antigen at age 60 and subsequent diagnosis of clinically relevant prostate cancer in an unscreened population, in order to evaluate whether screening for prostate cancer and chemoprevention could be stratified by risk.
A reconfigurable stochastic architecture for highly reliable computing
Mounting concerns over variability, defects and noise motivate a new approach for integrated circuits: the design of stochastic logic, that is to say, digital circuitry that operates on probabilistic signals, and so can cope with errors and uncertainty. Techniques for probabilistic analysis are well established. We advocate a strategy for synthesis. In this paper, we present a reconfigurable architecture that implements the computation of arbitrary continuous functions with stochastic logic. We analyze the sources of error: approximation, quantization, and random fluctuations. We demonstrate the effectiveness of our method on a collection of benchmarks for image processing. Synthesis trials show that our stochastic architecture requires less area than conventional hardware implementations. It achieves a large speedup compared to conventional software implementations. Most importantly, it is much more tolerant of soft errors (bit flips) than these deterministic implementations.
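For readers unfamiliar with stochastic logic, the short sketch below shows the standard encoding it builds on; this is textbook stochastic computing, not the paper's reconfigurable architecture, and the stream length and operand values are arbitrary. A value in [0, 1] is represented by the fraction of 1s in a random bitstream, multiplication reduces to a bitwise AND, and a single flipped bit changes the decoded value by only 1/N, which is where the tolerance to soft errors comes from.

# Textbook stochastic-computing sketch (not the paper's architecture).
import random

random.seed(1)
N = 10_000  # bitstream length; longer streams reduce random-fluctuation error

def encode(value):
    """Encode a value in [0, 1] as a Bernoulli bitstream."""
    return [1 if random.random() < value else 0 for _ in range(N)]

def decode(stream):
    """Decode a bitstream back to a value: the fraction of 1s."""
    return sum(stream) / len(stream)

a, b = encode(0.8), encode(0.5)
product = [x & y for x, y in zip(a, b)]          # multiplication is a single AND gate
print("0.8 * 0.5 ~", round(decode(product), 3))  # close to 0.4

product[0] ^= 1  # a soft error (bit flip) shifts the decoded value by only 1/N
print("after one bit flip:", round(decode(product), 3))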