A General SIMD-based Approach to Accelerating Compression Algorithms
Compression algorithms are important for data-oriented tasks, especially in
the era of Big Data. Modern processors, equipped with powerful SIMD instruction
sets, provide an opportunity to achieve better compression performance.
Previous research has shown that SIMD-based optimizations can multiply decoding
speeds. Following these pioneering studies, we propose a general approach to
accelerate compression algorithms. By instantiating the approach, we have
developed several novel integer compression algorithms, called Group-Simple,
Group-Scheme, Group-AFOR, and Group-PFD, and implemented their corresponding
vectorized versions. We evaluate the proposed algorithms on two public TREC
datasets, a Wikipedia dataset and a Twitter dataset. With competitive
compression ratios and encoding speeds, our SIMD-based algorithms outperform
state-of-the-art non-vectorized algorithms with respect to decoding speeds.
Statistical Modeling to Information Retrieval for Searching from Big Text Data and Higher Order Inference for Reliability
This thesis examined two research projects: probabilistic information retrieval modeling and third-order inference on reliability.
In the first part of this dissertation, two research topics in information retrieval are investigated and evaluated on large-scale text data sets. First, we conduct an in-depth study of the relationship between document length and document relevance to the user's need. Two statistical methods are proposed that incorporate document length as a substantial weighting factor to achieve higher retrieval performance. Second, we use the properties of the survival function to propose a cost-based re-ranking method that promotes ranking diversity for biomedical information retrieval, and to model the proximity between query terms to improve retrieval performance. In extensive experiments on standard TREC collections, our proposed models perform significantly better than the classical probabilistic information retrieval models.
In the second part of this dissertation, a small-sample asymptotic method is proposed for higher-order inference in the stress-strength reliability model, R = P(Y < X), where X and Y are independently distributed. A penalized likelihood method is proposed to handle the numerical complications of maximizing the constrained likelihood model. Simulation studies are conducted on two distributions: the Burr type X distribution and the exponentiated exponential distribution. Results from the simulation studies show that the proposed method is very accurate even when the sample sizes are small.
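To make the stress-strength quantity R = P(Y < X) concrete, the following is a minimal sketch that is not taken from the thesis. It assumes plain exponential distributions for X and Y (rather than the Burr type X or exponentiated exponential distributions studied there), because that case has a simple closed form, R = rate_y / (rate_x + rate_y), against which a Monte Carlo estimate can be checked. The function names and parameters are hypothetical, and the thesis's third-order inference and penalized likelihood methods are not reproduced here.

```python
import random


def stress_strength_exact(rate_x: float, rate_y: float) -> float:
    """Closed-form R = P(Y < X) for independent exponentials.

    Assumption (not from the thesis): X ~ Exp(rate_x), Y ~ Exp(rate_y).
    Integrating f_Y(y) * P(X > y) over y >= 0 gives rate_y / (rate_x + rate_y).
    """
    return rate_y / (rate_x + rate_y)


def stress_strength_mc(rate_x: float, rate_y: float,
                       n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of R = P(Y < X) from n independent (Y, X) pairs."""
    rng = random.Random(seed)
    # Count the draws in which the stress Y falls below the strength X.
    hits = sum(
        rng.expovariate(rate_y) < rng.expovariate(rate_x)
        for _ in range(n)
    )
    return hits / n


if __name__ == "__main__":
    print("exact      :", stress_strength_exact(1.0, 3.0))
    print("monte carlo:", stress_strength_mc(1.0, 3.0))
```

With many draws the simulated value should sit close to the closed form. The small-sample setting described in the abstract is the regime where such simple estimates become unreliable, which is what motivates higher-order methods.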