
    Study concept drift in 150-year English literature

    The meaning of a concept or a word changes over time. Such concept drift reflects changes in the social consensus as well. Studying concept drift over time is valuable for researchers interested in the evolution of language or culture. Recent word embedding technologies inspire us to automatically detect concept drift in large-scale corpora. However, comparing embeddings generated from different corpora is a complex task. In this paper, we propose a simple approach for detecting concept drift based on the change in word contexts across different time periods, and apply it to subsequent time periods so that detailed drift can be detected and visualised. We dive into certain words to track how the meaning of a word changes gradually over a long time span, relating the drift to relevant historical events, which demonstrates the effectiveness of our method.
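A minimal sketch of the context-based idea described above (not the paper's implementation): measure how much a target word's co-occurrence context changes between two time periods using Jaccard distance over context-word sets. The window size, tokenisation, and distance measure are all illustrative assumptions.

```python
from collections import Counter

def context_counts(tokens, target, window=2):
    """Count words co-occurring with `target` within +/- `window` positions."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts

def context_shift(tokens_a, tokens_b, target, window=2):
    """Jaccard distance between the target word's context sets in two periods.

    0.0 means identical contexts; 1.0 means fully disjoint contexts,
    suggesting the word is used in entirely different surroundings.
    """
    a = set(context_counts(tokens_a, target, window))
    b = set(context_counts(tokens_b, target, window))
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)
```

Applied to corpora from consecutive periods, a rising shift score for a word flags a candidate drift to inspect against historical events.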

    Adaptive Encoding Strategies for Erasing-Based Lossless Floating-Point Compression

    Lossless floating-point time series compression is crucial for a wide range of critical scenarios. Nevertheless, it is a big challenge to compress time series losslessly due to the complex underlying layouts of floating-point values. The state-of-the-art erasing-based compression algorithm Elf demonstrates rather impressive performance. We give an in-depth exploration of the encoding strategies of Elf and find that there is still much room for improvement. In this paper, we propose Elf*, which employs a set of optimizations for leading zeros, center bits and the sharing condition. Specifically, we develop a dynamic programming algorithm with a set of pruning strategies to compute the adaptive approximation rules efficiently. We theoretically prove that the adaptive approximation rules are globally optimal. We further extend Elf* to Streaming Elf*, i.e., SElf*, which achieves almost the same compression ratio as Elf* while enjoying even higher efficiency in streaming scenarios. We compare Elf* and SElf* with 8 competitors using 22 datasets. The results demonstrate that SElf* achieves a 9.2% relative compression ratio improvement over the best streaming competitor while maintaining similar efficiency, and that Elf* ranks among the most competitive batch compressors. All source code is publicly released.

    Erasing-based lossless compression method for streaming floating-point time series

    Floating-point time series data are generated in prohibitively large volumes at an unprecedentedly high rate. An efficient, compact and lossless compression method for time series data is of great importance in a wide range of scenarios. Most existing lossless floating-point compression methods are based on the XOR operation, but they do not fully exploit the trailing zeros, which usually results in an unsatisfactory compression ratio. This paper proposes an Erasing-based Lossless Floating-point compression algorithm, i.e., Elf. The main idea of Elf is to erase the last few bits (i.e., set them to zero) of floating-point values, so the XORed values are supposed to contain many trailing zeros. The challenges of the erasing-based method are three-fold. First, how to quickly determine the erased bits? Second, how to losslessly recover the original data from the erased ones? Third, how to compactly encode the erased data? Through rigorous mathematical analysis, Elf can directly determine the erased bits and restore the original values without losing any precision. To further improve the compression ratio, we propose a novel encoding strategy for the XORed values with many trailing zeros. Furthermore, observing that the values in a time series usually have similar significand counts, we propose an upgraded version of Elf named Elf+ that optimizes the significand count encoding strategy, further improving the compression ratio and reducing the running time. Both Elf and Elf+ work in a streaming fashion. They take only O(N) time (where N is the length of the time series) and O(1) space, and achieve a notable compression ratio with a theoretical guarantee. Extensive experiments using 22 datasets show the powerful performance of Elf and Elf+ compared with 9 advanced competitors for both double-precision and single-precision floating-point values.
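A toy illustration of why erasing low-order bits helps XOR-based compression (this is not Elf itself: the erasure below is lossy, whereas Elf derives the erased bits so the originals are recovered exactly). Zeroing the lowest mantissa bits of each value guarantees that XORs of consecutive values end in long runs of trailing zeros, which encode compactly.

```python
import struct

def bits(x: float) -> int:
    """Reinterpret a double as its 64-bit IEEE-754 integer pattern."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def trailing_zeros(v: int) -> int:
    """Number of trailing zero bits in a 64-bit value (64 if v == 0)."""
    return 64 if v == 0 else (v & -v).bit_length() - 1

def erase_low_bits(x: float, n: int) -> float:
    """Zero the lowest n bits (lossy here; Elf recovers them losslessly)."""
    return struct.unpack('<d', struct.pack('<Q', bits(x) & ~((1 << n) - 1)))[0]

series = [3.17, 3.22, 3.18, 3.25]
# trailing zeros of XORs between consecutive raw values: typically few
raw_tz = [trailing_zeros(bits(a) ^ bits(b)) for a, b in zip(series, series[1:])]
# after erasing 32 low bits, every XOR has at least 32 trailing zeros
erased = [erase_low_bits(x, 32) for x in series]
er_tz = [trailing_zeros(bits(a) ^ bits(b)) for a, b in zip(erased, erased[1:])]
```

The number of erased bits trades reconstruction cost against trailing-zero gain; Elf's analysis picks it so no precision is actually lost.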

    Recovering Sign Bits of DCT Coefficients in Digital Images as an Optimization Problem

    Recovering unknown, missing, damaged, distorted or lost information in DCT coefficients is a common task in multiple applications of digital image processing, including image compression, selective image encryption, and image communications. This paper investigates recovery of a special type of information in DCT coefficients of digital images: sign bits. This problem can be modelled as a mixed integer linear programming (MILP) problem, which is NP-hard in general. To efficiently solve the problem, we propose two approximation methods: 1) a relaxation-based method that converts the MILP problem to a linear programming (LP) problem; 2) a divide-and-conquer method which splits the target image into sufficiently small regions, each of which can be more efficiently solved as an MILP problem, and then conducts a global optimization phase as a smaller MILP problem or an LP problem to maximize smoothness across different regions. To the best of our knowledge, we are the first to consider how global optimization can be used to recover sign bits of DCT coefficients. We considered how the proposed methods can be applied to JPEG-encoded images and conducted extensive experiments to validate the performance of our proposed methods. The experimental results showed that the proposed methods worked well, especially when the number of unknown sign bits per DCT block is not too large. Compared with other existing methods, which are all based on simple error-concealment strategies, our proposed methods outperformed them by a substantial margin, both according to objective quality metrics (PSNR and SSIM) and also our subjective evaluation. Our work has a number of profound implications, e.g., more sign bits can be discarded to develop more efficient image compression methods, and image encryption methods based on sign bit encryption can be less secure than previously understood.
    Comment: 13 pages, 8 figures
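The core idea above, sketched in one dimension as a toy (the paper formulates 2-D images as MILP/LP; the brute-force search and total-variation objective here are illustrative stand-ins): given coefficient magnitudes with some sign bits unknown, choose the signs whose inverse DCT is smoothest.

```python
import math
from itertools import product

def dct2(x):
    """Unnormalised DCT-II of a real sequence."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N))
            for k in range(N)]

def idct2(X):
    """Inverse of dct2 (DCT-III with matching scaling)."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * k * (n + 0.5) / N)
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

def total_variation(x):
    """Smoothness objective: sum of absolute differences of neighbours."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

def recover_signs(coeffs, unknown):
    """Brute-force the unknown sign bits, minimising total variation.

    `coeffs` holds signed values for known coefficients and magnitudes
    for the indices listed in `unknown`.
    """
    best = None
    for signs in product((1.0, -1.0), repeat=len(unknown)):
        X = list(coeffs)
        for s, k in zip(signs, unknown):
            X[k] = s * coeffs[k]
        tv = total_variation(idct2(X))
        if best is None or tv < best[0]:
            best = (tv, X)
    return best[1]
```

Brute force is exponential in the number of unknown signs, which is exactly why the paper relaxes the integer signs to an LP or splits the image into small regions.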

    Real-time frequency measurement based on parallel pipeline FFT for time-stretched acquisition system

    Real-time frequency measurement for non-repetitive and statistically rare signals is a challenging problem in the electronic measurement area, which places high demands on the bandwidth, sampling rate, data processing and transmission capabilities of the measurement system. The time-stretching sampling system overcomes the bandwidth and sampling rate limitations of electronic digitizers, allowing continuous ultra-high-speed acquisition at refresh rates of billions of frames per second. However, processing high-sampling-rate signals of hundreds of GHz is an extremely challenging task, which becomes the bottleneck of real-time analysis for non-stationary signals. In this work, a real-time frequency measurement system is designed based on a parallel pipelined FFT structure. Tens of FFT channels are pipelined to process the incoming high-sampling-rate signal in sequence, and a simplified parabola fitting algorithm is implemented in each FFT channel to improve the frequency precision. The frequency results of these FFT channels are reorganized and finally uploaded to an industrial personal computer for visualization and offline data mining. A real-time transmission datapath is designed to provide high-throughput transmission, ensuring that the frequency results are uploaded without interruption. Several experiments are performed to evaluate the designed real-time frequency measurement system; the input signal has a bandwidth of 4 GHz, and the repetition rate of frames is 22 MHz. Experimental results show that the frequency of the signal can be measured at a high sampling rate of 20 GSPS, and the frequency precision is better than 1 MHz.
    Comment: 11 pages, 14 figures
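The parabola-fitting refinement mentioned above is a standard trick: fit a parabola through the FFT magnitude peak and its two neighbours, and use the vertex as a sub-bin frequency estimate. A software sketch follows (a plain DFT stands in for the hardware pipelined FFT; the exact fitting variant in the paper may differ).

```python
import cmath
import math

def dft(x):
    """Plain DFT for clarity; the real system uses pipelined FFT cores."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def parabolic_peak_freq(x, fs):
    """Estimate a tone's frequency: magnitude peak refined by parabola fitting."""
    N = len(x)
    mags = [abs(v) for v in dft(x)[: N // 2]]        # positive-frequency bins
    k = max(range(1, len(mags) - 1), key=lambda i: mags[i])
    a, b, c = mags[k - 1], mags[k], mags[k + 1]
    delta = 0.5 * (a - c) / (a - 2 * b + c)          # sub-bin vertex offset
    return (k + delta) * fs / N
```

Without the fit, resolution is limited to one FFT bin (fs/N); the vertex offset recovers a fraction-of-a-bin estimate cheaply enough for a per-channel hardware implementation.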

    The compensation incentive effect of athletes: A structural equation model

    This study explores the compensation incentive effect of athletes. Based on the related literature, we proposed theoretical hypotheses on the compensation incentive effect and established an assessment index system of the compensation incentive effect for athletes. A structural equation model was used to test survey data from 352 athletes in six provinces to reveal the compensation incentive effect. The results suggested that direct economic compensation satisfaction, direct non-economic compensation satisfaction, and indirect non-economic compensation satisfaction had significant positive effects on the compensation incentive effect of athletes, while indirect economic compensation satisfaction showed no significant effect. Moreover, the evaluation results showed that direct economic compensation satisfaction contributed the most to the compensation incentive effect. Therefore, the evaluation of athletes' compensation incentive effect should focus on variables of direct economic compensation satisfaction, i.e., basic compensation satisfaction, bonus income satisfaction, and subsidy satisfaction. Finally, some strategies and recommendations were suggested to improve the compensation design for athletes.