819 research outputs found

    Voronoi-Based Compact Image Descriptors: Efficient Region-of-Interest Retrieval With VLAD and Deep-Learning-Based Descriptors

    We investigate the problem of image retrieval based on visual queries when the latter comprise arbitrary regions-of-interest (ROI) rather than entire images. Our proposal is a compact image descriptor that combines the state-of-the-art in content-based descriptor extraction with a multi-level, Voronoi-based spatial partitioning of each dataset image. The proposed multi-level Voronoi-based encoding uses a spatial hierarchical K-means over interest-point locations, and computes a content-based descriptor over each cell. In order to reduce the matching complexity with minimal or no sacrifice in retrieval performance: (i) we utilize the tree structure of the spatial hierarchical K-means to perform a top-to-bottom pruning for local similarity maxima; (ii) we propose a new image similarity score that combines relevant information from all partition levels into a single measure of similarity; (iii) we combine our proposal with a novel and efficient approach for optimal bit allocation within quantized descriptor representations. By deriving both a Voronoi-based VLAD descriptor (termed Fast-VVLAD) and a Voronoi-based deep convolutional neural network (CNN) descriptor (termed Fast-VDCNN), we demonstrate that our Voronoi-based framework is agnostic to the descriptor basis and can easily be slotted into existing frameworks. Via a range of ROI queries in two standard datasets, it is shown that the Voronoi-based descriptors achieve comparable or higher mean Average Precision against conventional grid-based spatial search, while offering more than two-fold reduction in complexity. Finally, beyond ROI queries, we show that Voronoi partitioning improves the geometric invariance of compact CNN descriptors, thereby resulting in competitive performance to the current state-of-the-art on whole-image retrieval.
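
    The sketch below is a rough illustration, not the authors' implementation: it partitions an image's interest-point locations with a two-level spatial K-means and pools one descriptor per Voronoi cell. The branching factor, the number of levels, the random data, and the use of mean-pooling in place of VLAD/CNN aggregation are all assumptions made for brevity.

```python
# Minimal sketch (not the authors' code): two-level spatial K-means over
# keypoint locations, with one pooled descriptor per Voronoi cell.
# Cell counts, pooling (mean instead of VLAD/CNN), and data are assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
keypoint_xy = rng.uniform(0, 1, size=(500, 2))   # interest-point locations
local_desc  = rng.normal(size=(500, 64))         # one local descriptor per keypoint

def spatial_partition(xy, desc, branching=3, levels=2):
    """Recursively split keypoints by spatial K-means; return one
    pooled descriptor per cell at every level of the tree."""
    cells = [(xy, desc)]
    all_levels = []
    for _ in range(levels):
        next_cells = []
        for cell_xy, cell_desc in cells:
            if len(cell_xy) < branching:
                continue
            labels = KMeans(n_clusters=branching, n_init=5,
                            random_state=0).fit_predict(cell_xy)
            for k in range(branching):
                mask = labels == k
                next_cells.append((cell_xy[mask], cell_desc[mask]))
        all_levels.append([d.mean(axis=0) for _, d in next_cells if len(d)])
        cells = next_cells
    return all_levels

levels = spatial_partition(keypoint_xy, local_desc)
print([len(lvl) for lvl in levels])   # number of cell descriptors per level
```

    In a retrieval setting, an ROI query descriptor would then be compared against the per-cell descriptors, with the tree structure used to prune cells whose parents score poorly, which is the complexity reduction the abstract refers to.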

    Algorithms for continuous queries: A geometric approach

    There has been an unprecedented growth in both the amount of data and the number of users interested in different types of data. Users often want to keep track of the data that match their interests over a period of time. A continuous query, once issued by a user, maintains the matching results for the user as new data (as well as updates to the existing data) continue to arrive in a stream. However, supporting potentially millions of continuous queries is a huge challenge. This dissertation addresses the problem of scalably processing a large number of continuous queries over a wide-area network.

    Conceptually, the task of supporting distributed continuous queries can be divided into two components: event processing (computing the set of affected users for each data update) and notification dissemination (notifying the set of affected users). The first part of this dissertation focuses on event processing. Since interacting with large-scale data can easily frustrate and overwhelm users, top-k queries have attracted considerable interest from the database community, as they allow users to focus on the top-ranked results only. However, it is nearly impossible to find a common set of top-ranked data that everyone is interested in; therefore, users are allowed to specify their interests in different forms of preferences, such as personalized ranking functions and range selections. This dissertation presents geometric frameworks, data structures, and algorithms for answering several types of preference queries efficiently. Experimental evaluations show that our approaches outperform previous ones by orders of magnitude.

    The second part of the dissertation presents comprehensive solutions to the problem of processing and notifying a large number of continuous range top-k queries across a wide-area network. Simple solutions include using a content-driven network to notify all continuous queries whose ranges contain the update (ignoring top-k), or using a server to compute only the affected continuous queries and notify them individually. The former solution generates too much network traffic, while the latter overwhelms the server. This dissertation presents a geometric framework which allows the set of affected continuous queries to be described succinctly with messages that can be efficiently disseminated using content-driven networks. Fast algorithms are also developed to reformulate each update into a set of messages whose number is provably optimal, with or without knowledge of all continuous queries.

    The final component of this dissertation is the design of a wide-area dissemination network for continuous range queries. In particular, this dissertation addresses the problem of assigning users to servers in a wide-area content-based publish/subscribe system. A good assignment should consider both users' interests and locations, and balance multiple performance criteria including bandwidth, delay, and load balance. This dissertation presents a Monte Carlo approximation algorithm as well as a simple greedy algorithm. The Monte Carlo algorithm jointly considers multiple performance criteria to find a broker-subscriber assignment and provides theoretical performance guarantees. Using this algorithm as a yardstick, the greedy algorithm is also concluded to work well across a wide range of workloads.
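
    As a rough illustration of the event-processing step (the naive "server computes the affected queries" baseline mentioned above, not the dissertation's geometric framework), the sketch below maintains a set of continuous range top-k queries and, for each update, determines which queries are affected. The one-dimensional selection attribute, the query ranges, and the scores are assumptions chosen only to keep the example small.

```python
# Minimal sketch (assumed data model, not the dissertation's framework):
# brute-force event processing for continuous range top-k queries. An update
# "affects" a query if it falls inside the query's range and beats the
# query's current k-th best score.
import heapq
from dataclasses import dataclass, field

@dataclass
class RangeTopK:
    low: float
    high: float
    k: int
    top: list = field(default_factory=list)   # min-heap of current top-k scores

    def affected_by(self, attr: float, score: float) -> bool:
        if not (self.low <= attr <= self.high):
            return False
        return len(self.top) < self.k or score > self.top[0]

    def apply(self, score: float) -> None:
        heapq.heappush(self.top, score)
        if len(self.top) > self.k:
            heapq.heappop(self.top)            # drop the displaced result

queries = [RangeTopK(0.0, 5.0, k=2), RangeTopK(3.0, 9.0, k=2)]
for attr, score in [(1.0, 0.7), (4.0, 0.9), (8.0, 0.4), (4.5, 0.95)]:
    hit = [q for q in queries if q.affected_by(attr, score)]
    for q in hit:
        q.apply(score)
    print(f"update ({attr}, {score}) -> notify {len(hit)} queries")
```

    The dissertation's contribution is precisely to avoid this per-query scan on the server: the affected set is described geometrically and disseminated as a provably small number of messages over a content-driven network.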

    Pervasive Data Access in Wireless and Mobile Computing Environments

    The rapid advance of wireless and portable computing technology has brought a great deal of research interest and momentum to the area of mobile computing. One of the research focuses is pervasive data access: with wireless connections, users can access information at any place, at any time. However, various constraints such as limited client capability, limited bandwidth, weak connectivity, and client mobility impose many challenging technical issues. In the past years, tremendous research effort has been put forth to address these issues, and a number of interesting research results have been reported in the literature. This survey paper reviews important works in two key dimensions of pervasive data access: data broadcast and client caching. In addition, data access techniques aimed at various application requirements (such as time, location, semantics, and reliability) are covered.
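
    To make the two surveyed dimensions concrete, the sketch below is an illustrative toy model (not taken from the survey): a server broadcasts items in a flat round-robin cycle, while a client keeps a small LRU cache and only tunes in to the channel on a cache miss. The item count, cache size, and request pattern are assumptions.

```python
# Minimal sketch (illustrative only): a flat broadcast cycle plus a small LRU
# client cache, measuring the average wait for a stream of client requests.
from collections import OrderedDict
import random

ITEMS, CACHE_SIZE, REQUESTS = 20, 4, 200
cache = OrderedDict()                         # client-side LRU cache
random.seed(1)

total_wait, clock = 0, 0
for _ in range(REQUESTS):
    want = random.randrange(ITEMS)            # client request
    if want in cache:
        cache.move_to_end(want)               # cache hit: no wait on the channel
    else:
        # wait until the item next appears on the round-robin broadcast channel
        pos = clock % ITEMS
        wait = (want - pos) % ITEMS
        total_wait += wait
        clock += wait
        cache[want] = True
        if len(cache) > CACHE_SIZE:
            cache.popitem(last=False)         # evict the least-recently-used item
    clock += 1

print("average wait (broadcast slots):", total_wait / REQUESTS)
```

    Even this toy model shows the interplay the survey discusses: a better broadcast schedule shortens the wait on a miss, while a better client cache reduces how often the client has to wait at all.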

    Multidimensional access methods


    Data Management for Dynamic Multimedia Analytics and Retrieval

    Multimedia data in its various manifestations poses a unique challenge from a data storage and data management perspective, especially if search, analysis, and analytics in large data corpora are considered. The inherently unstructured nature of the data itself and the curse of dimensionality that afflicts the representations we typically work with in its stead are cause for a broad range of issues that require sophisticated solutions at different levels. This has given rise to a huge corpus of research that focuses on techniques for effective and efficient multimedia search and exploration. Many of these contributions have led to an array of purpose-built multimedia search systems. However, recent progress in multimedia analytics and interactive multimedia retrieval has demonstrated that several of the assumptions usually made for such multimedia search workloads do not hold once a session has a human user in the loop. Firstly, many of the required query operations cannot be expressed by mere similarity search, and since the concrete requirements cannot always be anticipated, one needs a flexible and adaptable data management and query framework. Secondly, the widespread notion of staticity of data collections does not hold if one considers analytics workloads, whose purpose is to produce and store new insights and information. And finally, it is impossible even for an expert user to specify exactly how a data management system should produce and arrive at the desired outcomes of the potentially many different queries. Guided by these shortcomings and motivated by the fact that similar questions have once been answered for structured data in classical database research, this Thesis presents three contributions that seek to mitigate the aforementioned issues. We present a query model that generalises the notion of proximity-based query operations and formalises the connection between those queries and high-dimensional indexing. We complement this with a cost model that makes the often implicit trade-off between query execution speed and result quality transparent to the system and the user. And we describe a model for the transactional and durable maintenance of high-dimensional index structures. All contributions are implemented in the open-source multimedia database system Cottontail DB, on top of which we present an evaluation that demonstrates the effectiveness of the proposed models. We conclude by discussing avenues for future research in the quest for converging the fields of databases on the one hand and (interactive) multimedia retrieval and analytics on the other.
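
    The sketch below is not Cottontail DB code or its query model; it is a minimal illustration of the speed-versus-quality trade-off that the abstract's cost model is meant to expose, comparing an exact proximity (k-NN) query against a cheaper plan that scans only a random sample. Vector dimensionality, collection size, and the 25% sampling ratio are assumptions.

```python
# Minimal sketch (not Cottontail DB's API): an exact k-NN query versus a
# sampled plan, reporting the recall lost in exchange for a smaller scan.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(10_000, 128))      # high-dimensional feature vectors
query   = rng.normal(size=128)
k = 10

def knn(data, q, k):
    """Return the indices of the k nearest vectors to q under L2 distance."""
    dist = np.linalg.norm(data - q, axis=1)
    return np.argsort(dist)[:k]

exact = knn(vectors, query, k)                # full scan: slow but exact

sample_idx = rng.choice(len(vectors), size=len(vectors) // 4, replace=False)
approx = sample_idx[knn(vectors[sample_idx], query, k)]   # scan 25% of the data

recall = len(set(exact) & set(approx)) / k
print(f"recall@{k} of the sampled plan: {recall:.2f}")
```

    A cost model of the kind described in the abstract would let the system (or the user) pick between such plans explicitly, rather than hiding the quality loss behind a faster execution path.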