Local selection of features and its applications to image search and annotation
In multimedia applications, direct representations of data objects typically involve hundreds or thousands of features. Given a query object, the similarity between the query object and a database object can be computed as the distance between their feature vectors. The neighborhood of the query object consists of those database objects that are close to the query object. The semantic quality of the neighborhood, which can be measured as the proportion of neighboring objects that share the same class label as the query object, is crucial for many applications, such as content-based image retrieval and automated image annotation. However, due to the existence of noisy or irrelevant features, errors introduced into similarity measurements are detrimental to the neighborhood quality of data objects.
One way to alleviate the negative impact of noisy features is to use feature selection techniques in data preprocessing. From the original vector space, feature selection techniques select a subset of features, which can be used subsequently in supervised or unsupervised learning algorithms for better performance. However, their performance on improving the quality of data neighborhoods is rarely evaluated in the literature. In addition, most traditional feature selection techniques are global, in the sense that they compute a single set of features across the entire database. As a consequence, the possibility that the feature importance may vary across different data objects or classes of objects is neglected.
To compute a better neighborhood structure for objects in high-dimensional feature spaces, this dissertation proposes several techniques for selecting features that are important to the local neighborhood of individual objects. These techniques are then applied to image applications such as content-based image retrieval and image label propagation. Firstly, an iterative K-NN graph construction method for image databases is proposed. A local variant of the Laplacian Score is designed for the selection of features for individual images. Noisy features are detected and sparsified iteratively from the original standardized feature vectors. This technique is incorporated into an approximate K-NN graph construction method so as to improve the semantic quality of the graph. Secondly, in a content-based image retrieval system, a generalized version of the Laplacian Score is used to compute different feature subspaces for images in the database. For online search, a query image is ranked in the feature spaces of database images. Those database images for which the query image is ranked highly are selected as the query results. Finally, a supervised method for the local selection of image features is proposed, for refining the similarity graph used in an image label propagation framework. By using only the selected features to compute the edges leading from labeled image nodes to unlabeled image nodes, better annotation accuracy can be achieved.
Experimental results on several datasets are provided in this dissertation, to demonstrate the effectiveness of the proposed techniques for the local selection of features, and for the image applications under consideration.
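The per-object scoring and sparsification step can be sketched as follows. This is a hedged illustration only: the variance-based criterion, the function names, and the fixed keep-fraction are assumptions for exposition, not the dissertation's exact local Laplacian Score or its iterative sparsification schedule.

```python
import numpy as np

def local_feature_scores(X, idx, k=10):
    """Score each feature by its variance within the k-NN neighbourhood
    of object `idx` after global standardization.  A low score means the
    feature is locally 'smooth' for this object; a high score flags a
    locally noisy feature."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    d = np.linalg.norm(Xs - Xs[idx], axis=1)
    nn = np.argsort(d)[1:k + 1]          # k nearest, excluding the object
    # After standardization every feature has unit global variance, so the
    # local variance itself serves as the (lower-is-better) score.
    return Xs[nn].var(axis=0)

def sparsify_noisy_features(X, idx, keep=0.5, k=10):
    """Zero out the locally noisiest features of object `idx`, keeping
    the `keep` fraction with the lowest local scores."""
    scores = local_feature_scores(X, idx, k)
    m = max(1, int(keep * X.shape[1]))
    keep_idx = np.argsort(scores)[:m]
    out = np.zeros_like(X[idx], dtype=float)
    out[keep_idx] = X[idx, keep_idx]
    return out
```

In the iterative K-NN graph setting described above, scoring and sparsification would alternate with neighbourhood recomputation; the sketch shows a single pass.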
Combinatorial Bayesian Optimization using the Graph Cartesian Product
This paper focuses on Bayesian Optimization (BO) for objectives on
combinatorial search spaces, including ordinal and categorical variables.
Despite the abundance of potential applications of Combinatorial BO, including
chipset configuration search and neural architecture search, only a handful of
methods have been proposed. We introduce COMBO, a new Gaussian Process (GP) based BO method.
COMBO quantifies "smoothness" of functions on combinatorial search spaces by
utilizing a combinatorial graph. The vertex set of the combinatorial graph
consists of all possible joint assignments of the variables, while edges are
constructed using the graph Cartesian product of the sub-graphs that represent
the individual variables. On this combinatorial graph, we propose an ARD
diffusion kernel with which the GP is able to model high-order interactions
between variables, leading to better performance. Moreover, using the Horseshoe
prior for the scale parameter in the ARD diffusion kernel results in an
effective variable selection procedure, making COMBO suitable for high
dimensional problems. Computationally, in COMBO the graph Cartesian product
allows the Graph Fourier Transform calculation to scale linearly instead of
exponentially. We validate COMBO in a wide array of realistic benchmarks,
including weighted maximum satisfiability problems and neural architecture
search. COMBO consistently outperforms the latest state-of-the-art methods
while maintaining computational and statistical efficiency.
Comment: Accepted to NeurIPS 2019, code: https://github.com/QUVA-Lab/COMB
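The linear scaling of the Graph Fourier Transform rests on a standard spectral fact: the Laplacian of a graph Cartesian product is the Kronecker sum of the factors' Laplacians, so the eigenvalues of the huge product graph are pairwise sums of the small sub-graph eigenvalues. A minimal numpy check (path sub-graphs stand in for ordinal variables; sizes and names are illustrative):

```python
import numpy as np

def path_laplacian(n):
    """Laplacian of a path graph on n vertices -- a natural sub-graph
    for an ordinal variable with n ordered levels."""
    A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(A.sum(axis=1)) - A

def product_laplacian(L1, L2):
    """Laplacian of the graph Cartesian product = Kronecker sum."""
    return np.kron(L1, np.eye(len(L2))) + np.kron(np.eye(len(L1)), L2)

# Two variables with 3 and 4 levels: the product graph has 12 vertices.
L1, L2 = path_laplacian(3), path_laplacian(4)
Lp = product_laplacian(L1, L2)

# The product spectrum is exactly the set of pairwise sums of factor
# eigenvalues, so the Graph Fourier basis of the big graph can be
# assembled from eigendecompositions of the small factors.
lam1, lam2 = np.linalg.eigvalsh(L1), np.linalg.eigvalsh(L2)
sums = np.sort((lam1[:, None] + lam2[None, :]).ravel())
spec = np.linalg.eigvalsh(Lp)
```

Diagonalizing each factor costs a cube of its (small) size, versus a cube of the exponentially large joint assignment count for the product graph.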
IMPROVING EFFICIENCY AND SCALABILITY IN VISUAL SURVEILLANCE APPLICATIONS
We present four contributions to visual surveillance: (a) an action recognition method based on the characteristics of human motion in image space; (b) a study of the strengths of five regression techniques for monocular pose estimation that highlights the advantages of kernel PLS; (c) a learning-based method for detecting objects carried by humans requiring minimal annotation; (d) an interactive video segmentation system that reduces supervision by using occlusion and long term spatio-temporal structure information.
We propose a representation for human actions that is based solely on motion information and that leverages the characteristics of human movement in the image space. The representation is best suited to visual surveillance settings in which the actions of interest are highly constrained, but also works on more general problems if the actions are ballistic in nature. Our computationally efficient representation achieves good recognition performance on both a commonly used action recognition dataset and on a dataset we collected to simulate a checkout counter.
We study discriminative methods for 3D human pose estimation from single images, which build a map from image features to pose. The main difficulty with these methods is the insufficiency of training data due to the high dimensionality of the pose space. However, real datasets can be augmented with data from character animation software, so the scalability of existing approaches becomes important. We argue that Kernel Partial Least Squares approximates Gaussian Process regression robustly, enabling the use of larger datasets, and we show in experiments that kPLS outperforms two state-of-the-art methods based on GP.
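The kPLS-versus-GP argument can be illustrated with a minimal numpy sketch. Here kernel PLS is realised, as is common, by running linear PLS1 (NIPALS) on the empirical kernel map; the toy data, kernel width, and component count are illustrative assumptions, not the experimental setup of this work.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    # Squared Euclidean distances -> Gaussian (RBF) kernel matrix.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def pls1_coefficients(X, y, n_comp):
    """NIPALS PLS1 regression coefficients for centred X and y."""
    Xr, yr = X.copy(), y.astype(float).copy()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr                      # direction of max covariance
        w /= np.linalg.norm(w)
        t = Xr @ w                         # score vector
        tt = t @ t
        p = Xr.T @ t / tt                  # X loading
        q = (yr @ t) / tt                  # y loading
        Xr -= np.outer(t, p)               # deflate
        yr -= q * t
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(Q))

# Toy 1-D regression: learn y = sin(x) from samples.
x = np.linspace(0.0, 6.0, 80)
X, y = x[:, None], np.sin(x)

K = rbf(X, X)                              # empirical kernel map as features
K_mean, y_mean = K.mean(axis=0), y.mean()
beta = pls1_coefficients(K - K_mean, y - y_mean, n_comp=8)
y_hat = (K - K_mean) @ beta + y_mean
```

For a new point `x_new`, the prediction is `(rbf(x_new, X) - K_mean) @ beta + y_mean`. Unlike exact GP regression, no cubic-cost inverse of the full kernel matrix is needed, only a fixed number of matrix-vector passes, which is what makes larger training sets tractable.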
The high variability in the appearance of carried objects suggests using their relation to the human silhouette to detect them. We adopt a generate-and-test approach that produces candidate regions from protrusion, color contrast and occlusion boundary cues and then filters them with a kernel SVM classifier on context features. Our method exceeds state-of-the-art accuracy and has good generalization capability. We also propose a Multiple Instance Learning framework for the classifier that reduces annotation effort by two orders of magnitude while maintaining comparable accuracy.
Finally, we present an interactive video segmentation system that trades off a small amount of segmentation quality for significantly less supervision than comparable systems in the literature. While applications like video editing could not directly use the output of our system, reasoning about the trajectories of objects in a scene or learning coarse appearance models is still possible. The unsupervised segmentation component at the base of our system effectively employs occlusion boundary cues and achieves competitive results on an unsupervised segmentation dataset. On videos used to evaluate interactive methods, our system requires less interaction time than others, does not rely on appearance information, and can extract multiple objects at the same time.
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Transformer-based Large Language Models (LLMs) have been applied in diverse
areas such as knowledge bases, human interfaces, and dynamic agents, marking
a stride towards achieving Artificial General Intelligence (AGI).
However, current LLMs are predominantly pretrained on short text snippets,
which compromises their effectiveness in processing the long-context prompts
that are frequently encountered in practical scenarios. This article offers a
comprehensive survey of the recent advancement in Transformer-based LLM
architectures aimed at enhancing the long-context capabilities of LLMs
throughout the entire model lifecycle, from pre-training through to inference.
We first delineate and analyze the problems of handling long-context input and
output with the current Transformer-based models. We then provide a taxonomy
and the landscape of upgrades on Transformer architecture to solve these
problems. Afterwards, we investigate widely used evaluation resources
tailored for long-context LLMs, including datasets, metrics, and
baseline models, as well as optimization toolkits such as libraries,
frameworks, and compilers to boost the efficacy of LLMs across different
runtime stages. Finally, we discuss the challenges and potential avenues for future
research. A curated repository of relevant literature, continuously updated, is
available at https://github.com/Strivin0311/long-llms-learning.
Comment: 40 pages, 3 figures, 4 tables
Selection of cortical dynamics for motor behaviour by the basal ganglia
The basal ganglia and cortex are strongly implicated in the control of motor preparation and execution. Re-entrant loops between these two brain areas are thought to determine the selection of motor repertoires for instrumental action. The nature of neural encoding and processing in the motor cortex, as well as the way in which selection by the basal ganglia acts on it, is currently debated. The classic view of the motor cortex as implementing a direct mapping from perception to muscular responses is challenged by proposals viewing it as a set of dynamical systems controlling muscles. Consequently, the common idea that a competition between relatively segregated cortico-striato-nigro-thalamo-cortical channels selects patterns of activity in the motor cortex is no longer sufficient to explain how action selection works. Here, we contribute to developing the dynamical view of the basal ganglia-cortical system by proposing a computational model in which a thalamo-cortical dynamical neural reservoir is modulated by disinhibitory selection from the basal ganglia, guided by top-down information, so that it responds with different dynamics to the same bottom-up input. The model shows how different motor trajectories can thus be produced by controlling the same set of joint actuators. Furthermore, the model shows how the basal ganglia might modulate cortical dynamics by preserving coarse-grained spatiotemporal information throughout cortico-cortical pathways.
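The core mechanism, one reservoir producing different dynamics under different disinhibitory gates, can be sketched with a toy echo-state-style reservoir. All sizes, the multiplicative gate, and the parameters below are illustrative assumptions, not the actual architecture of the proposed model.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                  # reservoir size (illustrative)

# Random recurrent weights, scaled to a subcritical spectral radius so
# the reservoir has fading memory (echo-state property).
W = rng.standard_normal((N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.standard_normal(N)            # input weights

def run_reservoir(u, gate, steps=50):
    """Drive the reservoir with a constant bottom-up input u while a
    multiplicative `gate` (a stand-in for basal-ganglia disinhibition)
    scales each unit's net input."""
    x, traj = np.zeros(N), []
    for _ in range(steps):
        x = np.tanh(gate * (W @ x + W_in * u))
        traj.append(x.copy())
    return np.array(traj)

# Same bottom-up input, two different selections: all units disinhibited
# versus half of them strongly suppressed -> two different dynamics.
gate_open = np.ones(N)
gate_half = np.ones(N)
gate_half[: N // 2] = 0.2
traj_a = run_reservoir(1.0, gate_open)
traj_b = run_reservoir(1.0, gate_half)
```

In the full model, readout weights trained on such gated trajectories would drive the joint actuators, so that switching the gate switches the motor trajectory without changing the reservoir or the input.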
Multilevel Combinatorial Optimization Across Quantum Architectures
Emerging quantum processors provide an opportunity to explore new approaches
for solving traditional problems in the post Moore's law supercomputing era.
However, the limited number of qubits makes it infeasible to tackle massive
real-world datasets directly in the near future, leading to new challenges in
utilizing these quantum processors for practical purposes. Hybrid
quantum-classical algorithms that leverage both quantum and classical types of
devices are considered as one of the main strategies to apply quantum computing
to large-scale problems. In this paper, we advocate the use of multilevel
frameworks for combinatorial optimization as a promising general paradigm for
designing hybrid quantum-classical algorithms. In order to demonstrate this
approach, we apply this method to two well-known combinatorial optimization
problems, namely, the Graph Partitioning Problem, and the Community Detection
Problem. We develop hybrid multilevel solvers with quantum local search on
D-Wave's quantum annealer and IBM's gate-model based quantum processor. We
carry out experiments on graphs that are orders of magnitude larger than the
current quantum hardware size, and we observe results comparable to
state-of-the-art solvers in terms of solution quality.
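The multilevel paradigm itself is classical and can be sketched end-to-end for graph partitioning. In the sketch below a greedy flip refinement stands in for the quantum local search run on the D-Wave or IBM devices; the heavy-edge matching, the balance slack, and the toy graph are all illustrative assumptions.

```python
import itertools

def coarsen(adj):
    """One level of heavy-edge matching: each node merges with its
    heaviest unmatched neighbour; returns the coarse graph and the
    fine-to-coarse node mapping."""
    matched, mapping = set(), {}
    for u in adj:
        if u in matched:
            continue
        cands = [(w, v) for v, w in adj[u].items() if v not in matched]
        if cands:
            _, v = max(cands)
            matched.update((u, v))
            mapping[u] = mapping[v] = u          # v collapses into u
        else:
            matched.add(u)
            mapping[u] = u
    coarse = {}
    for u in adj:
        coarse.setdefault(mapping[u], {})
        for v, w in adj[u].items():
            cu, cv = mapping[u], mapping[v]
            if cu != cv:
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse, mapping

def cut_weight(adj, part):
    """Total weight of edges crossing the two-way partition."""
    return sum(w for u in adj for v, w in adj[u].items()
               if part[u] != part[v]) / 2

def local_search(adj, part, slack=2, sweeps=10):
    """Greedy refinement: flip a node when that lowers the cut and keeps
    the sides roughly balanced.  In the hybrid solvers this is the slot
    where a quantum local search over a small subproblem plugs in."""
    for _ in range(sweeps):
        improved = False
        for u in adj:
            gain = sum(w * (1 if part[u] != part[v] else -1)
                       for v, w in adj[u].items())
            if gain <= 0:
                continue
            part[u] ^= 1
            ones = sum(part.values())
            if abs(2 * ones - len(part)) > slack:
                part[u] ^= 1                     # undo: breaks balance
            else:
                improved = True
        if not improved:
            break
    return part

def multilevel_partition(adj, coarsest=4):
    """Coarsen until small, solve, then uncoarsen with refinement."""
    if len(adj) <= coarsest:
        return local_search(adj, {u: i % 2 for i, u in enumerate(adj)})
    coarse, mapping = coarsen(adj)
    if len(coarse) == len(adj):                  # matching stalled
        return local_search(adj, {u: i % 2 for i, u in enumerate(adj)})
    cpart = multilevel_partition(coarse, coarsest)
    part = {u: cpart[mapping[u]] for u in adj}   # project upward
    return local_search(adj, part)

# Toy instance: two 5-cliques (edge weight 2) joined by one unit bridge.
adj = {u: {} for u in range(10)}
for a, b in itertools.combinations(range(5), 2):
    adj[a][b] = adj[b][a] = 2
    adj[a + 5][b + 5] = adj[b + 5][a + 5] = 2
adj[0][5] = adj[5][0] = 1

part = multilevel_partition(adj)                 # separates the cliques
```

Only the coarsest problem (here at most 4 super-nodes) need fit the solver's capacity, which is why graphs orders of magnitude larger than current quantum hardware remain tractable in this framework.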