2,308 research outputs found
Diversification Based Static Index Pruning - Application to Temporal Collections
Nowadays, web archives preserve the history of large portions of the web. As
medias are shifting from printed to digital editions, accessing these huge
information sources is drawing increasingly more attention from national and
international institutions, as well as from the research community. These
collections are intrinsically big, leading to index files that do not fit into
the memory and an increase query response time. Decreasing the index size is a
direct way to decrease this query response time.
Static index pruning methods reduce the size of indexes by removing a part of
the postings. In the context of web archives, it is necessary to remove
postings while preserving the temporal diversity of the archive. None of the
existing pruning approaches take (temporal) diversification into account.
In this paper, we propose a diversification-based static index pruning
method. It differs from the existing pruning approaches by integrating
diversification within the pruning context. We aim at pruning the index while
preserving retrieval effectiveness and diversity by pruning while maximizing a
given IR evaluation metric like DCG. We show how to apply this approach in the
context of web archives. Finally, we show on two collections that search
effectiveness in temporal collections after pruning can be improved using our
approach rather than diversity oblivious approaches
Object Detection in 20 Years: A Survey
Object detection, as of one the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc, and makes an in-deep
analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
Parallel Streaming Frequency-Based Aggregates
We present efficient parallel streaming algorithms for fundamental frequency-based aggregates in both the sliding window and the infinite window settings. In the sliding window setting, we give a parallel algorithm for maintaining a space-bounded block counter (SBBC). Using SBBC, we derive algorithms for basic counting, frequency estimation, and heavy hitters that perform no more work than their best sequential counterparts. In the infinite window setting, we present algorithms for frequency estimation, heavy hitters, and count-min sketch. For both the infinite window and sliding window settings, our parallel algorithms process a minibatch of items using linear work and polylog parallel depth. We also prove a lower bound showing that the work of the parallel algorithm is optimal in the case of heavy hitters and frequency estimation. To our knowledge, these are the first parallel algorithms for these problems that are provably work efficient and have low depth
XONN: XNOR-based Oblivious Deep Neural Network Inference
Advancements in deep learning enable cloud servers to provide
inference-as-a-service for clients. In this scenario, clients send their raw
data to the server to run the deep learning model and send back the results.
One standing challenge in this setting is to ensure the privacy of the clients'
sensitive data. Oblivious inference is the task of running the neural network
on the client's input without disclosing the input or the result to the server.
This paper introduces XONN, a novel end-to-end framework based on Yao's Garbled
Circuits (GC) protocol, that provides a paradigm shift in the conceptual and
practical realization of oblivious inference. In XONN, the costly
matrix-multiplication operations of the deep learning model are replaced with
XNOR operations that are essentially free in GC. We further provide a novel
algorithm that customizes the neural network such that the runtime of the GC
protocol is minimized without sacrificing the inference accuracy.
We design a user-friendly high-level API for XONN, allowing expression of the
deep learning model architecture in an unprecedented level of abstraction.
Extensive proof-of-concept evaluation on various neural network architectures
demonstrates that XONN outperforms prior art such as Gazelle (USENIX
Security'18) by up to 7x, MiniONN (ACM CCS'17) by 93x, and SecureML (IEEE
S&P'17) by 37x. State-of-the-art frameworks require one round of interaction
between the client and the server for each layer of the neural network,
whereas, XONN requires a constant round of interactions for any number of
layers in the model. XONN is first to perform oblivious inference on Fitnet
architectures with up to 21 layers, suggesting a new level of scalability
compared with state-of-the-art. Moreover, we evaluate XONN on four datasets to
perform privacy-preserving medical diagnosis.Comment: To appear in USENIX Security 201
Explainable adaptation of time series forecasting
A time series is a collection of data points captured over time, commonly found in many fields such as healthcare, manufacturing, and transportation. Accurately predicting the future behavior of a time series is crucial for decision-making, and several Machine Learning (ML) models have been applied to solve this task. However, changes in the time series, known as concept drift, can affect model generalization to future data, requiring thus online adaptive forecasting methods. This thesis aims to extend the State-of-the-Art (SoA) in the ML literature for time series forecasting by developing novel online adaptive methods.
The first part focuses on online time series forecasting, including a framework for selecting time series variables and developing ensemble models that are adaptive to changes in time series data and model performance. Empirical results show the usefulness and competitiveness of the developed methods and their contribution to the explainability of both model selection and ensemble pruning processes. Regarding the second part, the thesis contributes to the literature on online ML model-based quality prediction for three Industry 4.0 applications: NC-milling, bolt installation in the automotive industry, and Surface Mount Technology (SMT) in electronics manufacturing. The thesis shows how process simulation can be used to generate additional knowledge and how such knowledge can be integrated efficiently into the ML process. The thesis also presents two applications of explainable model-based quality prediction and their impact on smart industry practices
- …