Studying concept drift in 150-year English literature
The meaning of a concept or a word changes over time, and such concept drift also reflects shifts in social consensus. Studying concept drift over time is valuable for researchers interested in the evolution of language or culture. Recent word embedding technologies make it possible to detect concept drift automatically in large-scale corpora. However, comparing embeddings generated from different corpora is a complex task. In this paper, we propose a simple approach for detecting concept drift based on the change in word contexts across different time periods, and apply it to subsequent time periods so that detailed drift can be detected and visualised. We examine selected words to track how their meaning changes gradually over a long time span, relating the changes to relevant historical events, which demonstrates the effectiveness of our method.
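As a hedged illustration of the context-based idea (not necessarily the paper's exact implementation), drift for a word can be scored by comparing its co-occurrence counts between two period corpora; all names below are illustrative:

from collections import Counter
import math

def context_counts(corpus, target, window=5):
    # Count words co-occurring with `target` within a +/- window span.
    counts = Counter()
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok == target:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                counts.update(sent[lo:i] + sent[i + 1:hi])
    return counts

def cosine(c1, c2):
    # Cosine similarity between two sparse count vectors.
    dot = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def drift_score(corpus_a, corpus_b, target, window=5):
    # 1 - similarity: higher means the word's contexts changed more
    # between the two periods.
    return 1.0 - cosine(context_counts(corpus_a, target, window),
                        context_counts(corpus_b, target, window))

Applying drift_score to each pair of adjacent periods yields a drift curve for a word that can be aligned against historical events.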
Adaptive Encoding Strategies for Erasing-Based Lossless Floating-Point Compression
Lossless floating-point time series compression is crucial for a wide range
of critical scenarios. Nevertheless, compressing time series losslessly is a significant challenge due to the complex underlying bit layouts of floating-point
values. The state-of-the-art erasing-based compression algorithm Elf
demonstrates impressive performance. We give an in-depth exploration
of the encoding strategies of Elf, and find that there is still much room for
improvement. In this paper, we propose Elf*, which employs a set of
optimizations for leading zeros, center bits, and the sharing condition.
Specifically, we develop a dynamic programming algorithm with a set of pruning
strategies to compute the adaptive approximation rules efficiently. We
theoretically prove that the adaptive approximation rules are globally optimal.
We further extend Elf* to Streaming Elf*, i.e., SElf*, which achieves almost
the same compression ratio as Elf*, while enjoying even higher efficiency in
streaming scenarios. We compare Elf* and SElf* with 8 competitors using 22
datasets. The results demonstrate that SElf* achieves 9.2% relative compression
ratio improvement over the best streaming competitor while maintaining similar
efficiency, and that Elf* ranks among the most competitive batch compressors.
All source code is publicly released.
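For intuition about the quantities these encoders optimize, the sketch below (illustrative only, not the Elf* algorithm) measures the leading zeros, trailing zeros, and center-bit width of the XOR between consecutive doubles:

import struct

def double_bits(x):
    # Reinterpret a double as its 64-bit IEEE-754 pattern.
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def xor_shape(prev, cur):
    # Leading zeros, trailing zeros, and center-bit width of the XOR.
    x = double_bits(prev) ^ double_bits(cur)
    if x == 0:
        return 64, 0, 0  # identical values: a one-bit flag suffices
    lead = 64 - x.bit_length()
    trail = (x & -x).bit_length() - 1
    return lead, trail, 64 - lead - trail

series = [3.14, 3.15, 3.17, 3.17, 3.20]
for prev, cur in zip(series, series[1:]):
    print(xor_shape(prev, cur))

The fewer center bits an encoder has to store per value, the better the compression ratio, which is why the encoding of leading zeros and center bits is worth optimizing.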
Erasing-based lossless compression method for streaming floating-point time series
Floating-point time series data are generated in prohibitively large volumes
at an unprecedentedly high rate. Efficient, compact, and lossless
compression of time series data is of great importance for a wide range of
scenarios. Most existing lossless floating-point compression methods are based
on the XOR operation, but they do not fully exploit the trailing zeros, which
usually results in an unsatisfactory compression ratio. This paper proposes an
Erasing-based Lossless Floating-point compression algorithm, i.e., Elf. The
main idea of Elf is to erase the last few bits (i.e., set them to zero) of
floating-point values, so that the XORed values are expected to contain many
trailing zeros. The challenges of the erasing-based method are three-fold.
First, how to quickly determine the erased bits? Second, how to losslessly
recover the original data from the erased ones? Third, how to compactly encode
the erased data? Through rigorous mathematical analysis, Elf can directly
determine the erased bits and restore the original values without losing any
precision. To further improve the compression ratio, we propose a novel
encoding strategy for the XORed values with many trailing zeros. Furthermore,
observing that the values in a time series usually have similar significand
counts, we propose an upgraded version of Elf named Elf+ by optimizing the significand
count encoding strategy, which improves the compression ratio and reduces the
running time further. Both Elf and Elf+ work in a streaming fashion. They take
only O(N) time (where N is the length of the time series) and O(1) space,
and achieve a notable compression ratio with a theoretical guarantee. Extensive
experiments using 22 datasets show the powerful performance of Elf and Elf+
compared with 9 advanced competitors, for both double-precision and
single-precision floating-point values.
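A toy version of the erasing idea, under the simplifying assumption that each value is known to carry a fixed number of decimal places (Elf infers this; the bit-count rule below is a simplification, not the paper's exact analysis):

import math
import struct

def to_bits(x):
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def from_bits(b):
    return struct.unpack('<d', struct.pack('<Q', b))[0]

def erase(x, decimals):
    # Zero low mantissa bits whose total weight stays below half of the
    # last kept decimal place, so rounding recovers the original value.
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    e = math.frexp(x)[1]  # x = m * 2**e with 0.5 <= |m| < 1
    drop = min(52, int(52 - e - decimals * math.log2(10)))
    if drop <= 0:
        return x
    return from_bits(to_bits(x) & ~((1 << drop) - 1))

a, b = erase(3.17, 2), erase(3.18, 2)
assert round(a, 2) == 3.17 and round(b, 2) == 3.18  # lossless via rounding
xor = to_bits(a) ^ to_bits(b)
print((xor & -xor).bit_length() - 1, "trailing zeros in the XOR")

Because both erased values share a long run of zeroed low bits, their XOR gains the trailing zeros that the subsequent encoding step exploits.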
Recovering Sign Bits of DCT Coefficients in Digital Images as an Optimization Problem
Recovering unknown, missing, damaged, distorted or lost information in DCT
coefficients is a common task in multiple applications of digital image
processing, including image compression, selective image encryption, and image
communications. This paper investigates recovery of a special type of
information in DCT coefficients of digital images: sign bits. This problem can
be modelled as a mixed integer linear programming (MILP) problem, which is
NP-hard in general. To efficiently solve the problem, we propose two
approximation methods: 1) a relaxation-based method that converts the MILP
problem to a linear programming (LP) problem; 2) a divide-and-conquer method
which splits the target image into sufficiently small regions, each of which
can be more efficiently solved as an MILP problem, and then conducts a global
optimization phase as a smaller MILP problem or an LP problem to maximize
smoothness across different regions. To the best of our knowledge, we are the
first to consider using global optimization to recover sign bits of DCT
coefficients. We also considered how the proposed methods can be applied to
JPEG-encoded images and conducted extensive experiments to validate the
performance of our proposed methods. The experimental results showed that the
proposed methods worked well, especially when the number of unknown sign bits
per DCT block is not too large. Compared with other existing methods, which are
all based on simple error-concealment strategies, our proposed methods
outperformed them by a substantial margin, according to both objective
quality metrics (PSNR and SSIM) and our subjective evaluation. Our work
has a number of profound implications; e.g., more sign bits can be discarded to
develop more efficient image compression methods, and image encryption methods
based on sign bit encryption can be less secure than previously understood.
Comment: 13 pages, 8 figures
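The toy sketch below conveys the divide-and-conquer intuition on a single 8x8 block, replacing the MILP solver with brute-force enumeration and scoring candidates by pixel smoothness; it is illustrative, not the paper's method (requires numpy and scipy):

import itertools
import numpy as np
from scipy.fft import idctn

def smoothness(px):
    # Sum of absolute differences between neighbouring pixels.
    return np.abs(np.diff(px, axis=0)).sum() + np.abs(np.diff(px, axis=1)).sum()

def recover_signs(coeffs, unknown):
    # coeffs: 8x8 DCT block; `unknown` lists (i, j) positions whose sign
    # bits are missing. Try every assignment, keep the smoothest block.
    best_score, best_signs = np.inf, None
    for signs in itertools.product((-1.0, 1.0), repeat=len(unknown)):
        c = coeffs.copy()
        for s, (i, j) in zip(signs, unknown):
            c[i, j] = s * abs(c[i, j])
        score = smoothness(idctn(c, norm='ortho'))
        if score < best_score:
            best_score, best_signs = score, signs
    return best_signs

Enumeration is exponential in the number of unknown signs, which is exactly why the paper resorts to LP relaxation and a global optimization phase across regions instead.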
Scalable Geometric Fracture Assembly via Co-creation Space among Assemblers
Geometric fracture assembly presents a challenging practical task in
archaeology and 3D computer vision. Previous methods have focused solely on
assembling fragments based on semantic information, which has limited the
quantity of objects that can be effectively assembled. Therefore, there is a
need to develop a scalable framework for geometric fracture assembly without
relying on semantic information. To improve the effectiveness of assembling
geometric fractures without semantic information, we propose a co-creation
space comprising several assemblers capable of gradually and unambiguously
assembling fractures. Additionally, we introduce a novel loss function, i.e.,
the geometric-based collision loss, to address collision issues during the
fracture assembly process and enhance the results. Our framework exhibits
better performance on both PartNet and Breaking Bad datasets compared to
existing state-of-the-art frameworks. Extensive experiments and quantitative
comparisons demonstrate the effectiveness of our proposed framework, which
features linear computational complexity, enhanced abstraction, and improved
generalization. Our code is publicly available at https://github.com/Ruiyuan-Zhang/CCS.
Comment: AAAI202
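The abstract does not spell out the geometric-based collision loss; one common way to write such a penalty, given here purely as an assumed sketch, is to penalise points of different posed fragments that come closer than a margin:

import torch

def collision_loss(pts_a, pts_b, margin=0.01):
    # pts_a: (N, 3), pts_b: (M, 3) point clouds of two posed fragments.
    # Pairs closer than `margin` are treated as interpenetrating.
    d = torch.cdist(pts_a, pts_b)  # (N, M) pairwise distances
    return torch.relu(margin - d).pow(2).mean()

loss = collision_loss(torch.rand(128, 3), torch.rand(128, 3))

Being differentiable in the fragment poses, a term of this form can be added to the assembly objective to push overlapping fragments apart during optimization.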
Real-time frequency measurement based on parallel pipeline FFT for time-stretched acquisition system
Real-time frequency measurement for non-repetitive and statistically rare
signals is a challenging problem in electronic measurement, placing high
demands on the bandwidth, sampling rate, data processing, and transmission
capabilities of the measurement system. Time-stretched sampling systems
overcome the bandwidth and sampling rate limitations of electronic digitizers,
allowing continuous ultra-high-speed acquisition at refresh rates of billions
of frames per second. However, processing signals sampled at rates of hundreds
of GHz is an extremely challenging task, which becomes the bottleneck of
real-time analysis for non-stationary
signals. In this work, a real-time frequency measurement system is designed
based on a parallel pipelined FFT structure. Tens of FFT channels are pipelined
to process the incoming high sampling rate signals in sequence, and a
simplified parabola fitting algorithm is implemented in each FFT channel to
improve the frequency precision. The frequency results of these FFT channels
are reorganized and finally uploaded to an industrial personal computer for
visualization and offline data mining. A real-time transmission datapath is
designed to provide high-throughput transmission, ensuring that the frequency
results are uploaded without interruption. Several experiments are performed
to evaluate the designed real-time frequency measurement system: the input
signal has a bandwidth of 4 GHz, and the frame repetition rate is 22 MHz.
Experimental results show that the frequency of the signal can be measured at
a high sampling rate of 20 GSPS, with a frequency precision better than 1 MHz.
Comment: 11 pages, 14 figures
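The parabola-fitting refinement mentioned above is a standard sub-bin interpolation of the FFT peak; a minimal sketch (the 4096-point frame length is an assumption for illustration):

import numpy as np

def refined_frequency(frame, fs):
    # Fit a parabola through the log-magnitude of the FFT peak bin and
    # its two neighbours to estimate the frequency between bins.
    spec = np.abs(np.fft.rfft(frame))
    k = int(np.argmax(spec[1:-1])) + 1       # peak bin, skipping the edges
    a, b, c = np.log(spec[k - 1:k + 2] + 1e-12)
    delta = 0.5 * (a - c) / (a - 2 * b + c)  # sub-bin offset in [-0.5, 0.5]
    return (k + delta) * fs / len(frame)

fs = 20e9  # 20 GSPS, as in the experiments
t = np.arange(4096) / fs
print(refined_frequency(np.sin(2 * np.pi * 3.2e9 * t), fs) / 1e9, "GHz")

Since the refinement only needs the peak bin and its two neighbours, it adds little logic per FFT channel, which suits the pipelined hardware structure.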
The compensation incentive effect of athletes: A structural equation model
This study explores the compensation incentive effect for athletes. Based on the related literature, we proposed theoretical hypotheses on the compensation incentive effect and established an assessment index system of the compensation incentive effect for athletes. A structural equation model was used to test survey data from 352 athletes in six provinces to examine the compensation incentive effect. The results suggested that direct economic compensation satisfaction, direct non-economic compensation satisfaction, and indirect non-economic compensation satisfaction had significant positive effects on the compensation incentive effect of athletes, while indirect economic compensation satisfaction showed no significant effect. Moreover, the evaluation results showed that direct economic compensation satisfaction contributed the most to the compensation incentive effect. Therefore, the evaluation of athletes' compensation incentive effect should focus on the variables of direct economic compensation satisfaction, i.e., basic compensation satisfaction, bonus income satisfaction, and subsidy satisfaction. Finally, some strategies and recommendations were suggested to improve compensation design for athletes.
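For readers who want to reproduce this kind of analysis, the hypothesised structure could be specified with the semopy package in lavaan-style syntax; the indicator names and random data below are placeholders, not the study's actual survey items:

import numpy as np
import pandas as pd
import semopy

# Placeholder data: 352 respondents, 8 assumed satisfaction items plus
# an observed incentive-effect score (the real study uses survey items).
rng = np.random.default_rng(0)
cols = [f"s{i}" for i in range(1, 9)] + ["incentive"]
data = pd.DataFrame(rng.normal(size=(352, 9)), columns=cols)

desc = """
DirectEcon =~ s1 + s2
DirectNonEcon =~ s3 + s4
IndirectEcon =~ s5 + s6
IndirectNonEcon =~ s7 + s8
incentive ~ DirectEcon + DirectNonEcon + IndirectEcon + IndirectNonEcon
"""

model = semopy.Model(desc)
model.fit(data)
print(model.inspect())  # path coefficients and their significance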