
    Tokenization with Factorized Subword Encoding

    In recent years, language models have become ever larger and more complex. However, the input representations for these models continue to rely on simple and greedy subword tokenization methods. In this paper, we propose a novel tokenization method that factorizes subwords onto discrete triplets using a VQ-VAE model. The effectiveness of the proposed tokenization method, referred to as the Factorizer, is evaluated on language modeling and morpho-syntactic tasks for 7 diverse languages. Results indicate that this method is more appropriate and robust for morphological tasks than the commonly used byte-pair encoding (BPE) tokenization algorithm. Comment: Findings of ACL 2023.
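
    The core idea can be pictured with a short sketch (ours, not the authors' code; the codebook size and embedding dimension below are illustrative assumptions): each subword is indexed by a triplet of discrete codes from three small codebooks, so three tiny embedding tables can address a very large effective vocabulary.

    import torch
    import torch.nn as nn

    CODEBOOK_SIZE = 256  # illustrative; the paper learns the codes with a VQ-VAE

    class FactorizedEmbedding(nn.Module):
        """Embed a subword given as a triplet of discrete codes."""
        def __init__(self, dim: int):
            super().__init__()
            # One small embedding table per triplet position.
            self.tables = nn.ModuleList(
                nn.Embedding(CODEBOOK_SIZE, dim) for _ in range(3)
            )

        def forward(self, triplets: torch.Tensor) -> torch.Tensor:
            # triplets: (batch, seq, 3) integer codes; sum the three embeddings
            return sum(t(triplets[..., i]) for i, t in enumerate(self.tables))

    emb = FactorizedEmbedding(dim=768)
    codes = torch.randint(0, CODEBOOK_SIZE, (1, 4, 3))  # 4 subwords, 3 codes each
    vectors = emb(codes)                                # shape (1, 4, 768)

    With 256-entry codebooks, 256^3 (about 16.8 million) distinct subwords are addressable using only 3 x 256 embedding rows.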

    Architectural support for probabilistic branches

    A plethora of research efforts have focused on fine-tuning branch predictors to increasingly higher levels of accuracy. However, several important optimization, financial, and statistical data analysis algorithms rely on probabilistic computation. These applications draw random values from a distribution and steer control flow based on those values. Such probabilistic branches are challenging to predict because of their inherent probabilistic nature. As a result, probabilistic codes suffer significantly from branch mispredictions. This paper proposes Probabilistic Branch Support (PBS), a hardware/software cooperative technique that leverages the observation that the outcome of a probabilistic branch needs to be correct only in a statistical sense. PBS stores the outcome and the probabilistic values that lead to the outcome of the current execution to direct the next execution of the probabilistic branch, thereby completely removing the penalty for mispredicted probabilistic branches. PBS relies on marking probabilistic branches in software for hardware to exploit. Our evaluation shows that PBS reduces MPKI by 45% on average (and by up to 99%) and improves IPC by 6.7% (up to 17%) over the TAGE-SC-L predictor. PBS requires 193 bytes of hardware overhead and introduces statistically negligible algorithmic inaccuracy.
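
    A software analogy makes the trick concrete (a minimal sketch of the statistical-correctness idea; PBS itself is a hardware mechanism, and the function below is ours): the random outcome for the next execution of a probabilistic branch is drawn ahead of time, so the branch direction is always known and never mispredicted, while the sequence of outcomes still follows the target distribution.

    import random

    def count_heads(p: float, n: int) -> int:
        # Pre-draw the outcome for the first execution of the branch.
        next_outcome = random.random() < p
        taken = 0
        for _ in range(n):
            outcome = next_outcome              # direction known in advance: no misprediction
            next_outcome = random.random() < p  # draw the value for the *next* execution
            if outcome:                         # the "probabilistic branch"
                taken += 1
        return taken

    # Statistically indistinguishable from branching on a fresh draw each time.
    print(count_heads(0.3, 100_000) / 100_000)  # prints approximately 0.3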

    Trained on 100 million words and still in shape: BERT meets British National Corpus

    While modern masked language models (LMs) are trained on ever larger corpora, we here explore the effects of down-scaling training to a modestly sized but representative, well-balanced, and publicly available English text source: the British National Corpus. We show that pre-training on this carefully curated corpus can reach better performance than the original BERT model. We argue that this type of corpus has great potential as a language modeling benchmark. To showcase this potential, we present fair, reproducible, and data-efficient comparative studies of LMs, in which we evaluate several training objectives and model architectures and replicate previous empirical results in a systematic way. We propose an optimized LM architecture called LTG-BERT. Comment: Accepted to EACL 2023.
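
    As one concrete example of such a training objective, here is a minimal sketch of BERT-style token masking (the standard 15%/80/10/10 defaults, not necessarily the exact recipe compared in the paper):

    import random

    MASK, P_MASK = "[MASK]", 0.15

    def mask_tokens(tokens: list[str]) -> tuple[list[str], list[str | None]]:
        """Select ~15% of tokens; replace with [MASK] 80% of the time,
        a random token 10%, or keep the original 10%. The model is then
        trained to reconstruct the originals at the selected positions."""
        inputs, labels = [], []
        for tok in tokens:
            if random.random() < P_MASK:
                labels.append(tok)
                r = random.random()
                if r < 0.8:
                    inputs.append(MASK)
                elif r < 0.9:
                    inputs.append(random.choice(tokens))  # random replacement
                else:
                    inputs.append(tok)                    # keep original
            else:
                inputs.append(tok)
                labels.append(None)  # position not predicted
        return inputs, labels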

    Archer: A Community Distributed Computing Infrastructure for Computer Architecture Research and Education

    This paper introduces Archer, a community-based computing resource for computer architecture research and education. The Archer infrastructure integrates virtualization and batch scheduling middleware to seamlessly deliver high-throughput computing resources aggregated from machines distributed across wide-area networks and owned by different participating entities. The paper discusses the motivations leading to the design of Archer, describes its core middleware components, and presents an analysis of the functionality and performance of a prototype wide-area deployment running a representative computer architecture simulation workload. Comment: 11 pages, 2 figures. Describes the Archer project, http://archer-project.org

    Prostate specific antigen concentration at age 60 and death or metastasis from prostate cancer: case-control study

    Objective: To determine the relation between concentrations of prostate specific antigen at age 60 and subsequent diagnosis of clinically relevant prostate cancer in an unscreened population, to evaluate whether screening for prostate cancer and chemoprevention could be stratified by risk.

    A reconfigurable stochastic architecture for highly reliable computing

    Mounting concerns over variability, defects, and noise motivate a new approach for integrated circuits: the design of stochastic logic, that is to say, digital circuitry that operates on probabilistic signals and so can cope with errors and uncertainty. Techniques for probabilistic analysis are well established. We advocate a strategy for synthesis. In this paper, we present a reconfigurable architecture that implements the computation of arbitrary continuous functions with stochastic logic. We analyze the sources of error: approximation, quantization, and random fluctuations. We demonstrate the effectiveness of our method on a collection of benchmarks for image processing. Synthesis trials show that our stochastic architecture requires less area than conventional hardware implementations. It achieves a large speedup compared to conventional software implementations. Most importantly, it is much more tolerant of soft errors (bit flips) than these deterministic implementations.
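
    The core trick of stochastic logic can be sketched in a few lines (a generic illustration of the representation, not this paper's architecture): a value x in [0, 1] is encoded as a random bitstream with P(bit = 1) = x; a single AND gate then multiplies two independent streams, and flipping a few bits only nudges the decoded estimate.

    import random

    def encode(x: float, n: int) -> list[int]:
        """Encode x in [0, 1] as an n-bit stream with P(1) = x."""
        return [int(random.random() < x) for _ in range(n)]

    def decode(stream: list[int]) -> float:
        return sum(stream) / len(stream)

    n = 10_000
    a, b = encode(0.5, n), encode(0.4, n)
    product = [x & y for x, y in zip(a, b)]  # one AND gate per bit multiplies the values
    print(decode(product))                   # approximately 0.5 * 0.4 = 0.2, robust to a few bit flips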