
    Supervised Learning Through the Lens of Compression


    Language Modeling Is Compression

    It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.
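    The closing claim above, that the prediction-compression equivalence lets any compressor act as a conditional generative model, can be illustrated with a toy sketch. The snippet below is an assumption-laden illustration, not the paper's method: it uses Python's zlib (DEFLATE, the same family as gzip) and greedily picks the candidate continuation that adds the fewest extra compressed bytes to the context, whereas the paper works with exact code lengths and proper sampling; byte-level granularity makes the scores coarse.

```python
import zlib

def gzip_conditional_sample(context: str, candidates: list[str]) -> str:
    """Pick the continuation that adds the fewest extra compressed bytes
    to the context -- a toy use of a generic compressor as a conditional
    generative model (greedy argmin, not true sampling)."""
    base = len(zlib.compress(context.encode("utf-8")))
    extra_bytes = {
        cand: len(zlib.compress((context + cand).encode("utf-8"))) - base
        for cand in candidates
    }
    return min(extra_bytes, key=extra_bytes.get)

if __name__ == "__main__":
    context = "the quick brown fox jumps over the lazy dog. the quick brown "
    # The repeated prefix makes "fox" cheap to encode, so it is usually chosen.
    print(gzip_conditional_sample(context, ["fox", "cat", "dog", "zebra"]))
```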

    ARCHANGEL: Tamper-proofing Video Archives using Temporal Content Hashes on the Blockchain

    We present ARCHANGEL, a novel distributed ledger based system for assuring the long-term integrity of digital video archives. First, we describe a novel deep network architecture for computing compact temporal content hashes (TCHs) from audio-visual streams with durations of minutes or hours. Our TCHs are sensitive to accidental or malicious content modification (tampering) but invariant to the codec used to encode the video. This is necessary due to the curatorial requirement for archives to format-shift video over time to ensure future accessibility. Second, we describe how the TCHs (and the models used to derive them) are secured via a proof-of-authority blockchain distributed across multiple independent archives. We report on the efficacy of ARCHANGEL within the context of a trial deployment in which the national government archives of the United Kingdom, Estonia and Norway participated. Comment: Accepted to CVPR Blockchain Workshop 2019.
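    The pipeline sketched in this abstract, hash the audio-visual content into a compact digest and anchor that digest on a ledger shared by the archives, can be pictured with a heavily simplified stand-in. In the sketch below, SHA-256 over pre-computed per-frame feature vectors replaces ARCHANGEL's learned, codec-invariant deep hashes, and a toy append-only chain replaces a real proof-of-authority blockchain; the function names and the example data are hypothetical.

```python
import hashlib
import json
import time

def temporal_content_hash(frame_features):
    """Fold per-frame feature vectors into one compact digest. A simplified
    stand-in for ARCHANGEL's learned temporal content hash (TCH); the
    codec-invariant per-frame features are assumed to be given."""
    h = hashlib.sha256()
    for features in frame_features:
        # Coarse quantisation so small numerical noise does not flip the hash.
        h.update(bytes(int(round(x * 16)) & 0xFF for x in features))
    return h.hexdigest()

def append_block(chain, video_id, tch, authority):
    """Append a TCH record to a toy append-only ledger: each block commits to
    the previous block's hash and names the signing authority (a placeholder
    for a real proof-of-authority blockchain)."""
    prev_hash = chain[-1]["block_hash"] if chain else "0" * 64
    block = {
        "video_id": video_id,
        "tch": tch,
        "authority": authority,
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    block["block_hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    chain.append(block)
    return block

if __name__ == "__main__":
    chain = []
    tch = temporal_content_hash([[0.1, 0.5, -0.3], [0.2, 0.4, -0.1]])
    append_block(chain, "video-001", tch, authority="example-archive")
```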

    UVSD: Software for Detection of Color Underwater Features

    Underwater Video Spot Detector (UVSD) is a software package designed to analyze underwater video for continuous spatial measurements (path traveled, distance to the bottom, roughness of the surface, etc.). Laser beams of known geometry are often used in underwater imagery to estimate the distance to the bottom. This estimation relies on manual detection of the laser spots, which is labor-intensive and time-consuming, so usually only a few frames can be processed this way. Manual detection allows spatial measurements on single frames (distance to the bottom, size of objects on the sea-bottom), but not over a whole video transect. We propose algorithms, and a software package implementing them, for the semi-automatic detection of laser spots throughout a video, which can significantly increase the effectiveness of spatial measurements. The spot-detection algorithm is based on the Support Vector Machine (SVM) approach to machine learning. The user only needs to mark, on a few frames, the points he or she considers to be laser dots (to train an SVM model); the program then uses this model to detect the laser dots in the rest of the video. As a result, a precise spatial scale (with precision limited only by the quality of the video) is established for every frame, which can be used to improve video mosaics of the sea-bottom. The temporal correlation between spot movements and changes in their shape provides information about sediment roughness: simultaneous spot movements indicate a changing distance to the bottom, while uncorrelated changes indicate small local bumps. UVSD can be applied to quickly identify and quantify seafloor habitat patches, help visualize habitats and benthic organisms within large-scale landscapes, and estimate transect length and area surveyed along video transects.
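    As a rough picture of how an SVM-based spot detector of this kind can work, the sketch below (not the UVSD code; scikit-learn and raw RGB patch features are assumptions) trains on colour patches around user-clicked laser dots plus random background patches, then scans new frames for pixels the model labels as spots. The real package additionally exploits temporal continuity across frames.

```python
import numpy as np
from sklearn.svm import SVC

def train_spot_model(frames, clicked_points, patch=2):
    """Train an SVM on small colour patches: positives are centred on
    user-clicked laser dots, negatives are random background patches.
    Assumes uint8 RGB frames and clicks at least `patch` pixels from the border."""
    rng = np.random.default_rng(0)
    X, y = [], []
    for frame, points in zip(frames, clicked_points):
        h, w, _ = frame.shape
        for (r, c) in points:                      # user-labelled laser dots
            X.append(frame[r - patch:r + patch + 1,
                           c - patch:c + patch + 1].ravel())
            y.append(1)
        for _ in range(10 * max(len(points), 1)):  # random background samples
            r = rng.integers(patch, h - patch)
            c = rng.integers(patch, w - patch)
            X.append(frame[r - patch:r + patch + 1,
                           c - patch:c + patch + 1].ravel())
            y.append(0)
    model = SVC(kernel="rbf", gamma="scale", class_weight="balanced")
    model.fit(np.asarray(X, dtype=np.float64) / 255.0, y)
    return model

def detect_spots(model, frame, patch=2, stride=4):
    """Slide over a frame and return pixel coordinates the SVM labels as laser spots."""
    h, w, _ = frame.shape
    hits = []
    for r in range(patch, h - patch, stride):
        for c in range(patch, w - patch, stride):
            x = frame[r - patch:r + patch + 1,
                      c - patch:c + patch + 1].ravel()[None, :] / 255.0
            if model.predict(x)[0] == 1:
                hits.append((r, c))
    return hits
```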