Second generation sparse models
Sparse data models, where data is assumed to be well represented as a linear combination of a few elements from a learned dictionary, have gained considerable attention in recent years, and their use has led to state-of-the-art results in many applications. The success of these models is largely attributed to two critical features: the use of sparsity as a robust mechanism for regularizing the linear coefficients that represent the data, and the flexibility provided by overcomplete dictionaries that are learned from the data. These features are controlled by two critical hyper-parameters: the desired sparsity of the coefficients, and the size of the dictionaries to be learned. However, lacking theoretical guidelines for selecting these critical parameters, applications based on sparse models often require hand-tuning and cross-validation to select them for each application and each data set. This can be both inefficient and ineffective. On the other hand, there are multiple scenarios in which imposing additional constraints on the produced representations, including the sparse codes and the dictionary itself, can result in further improvements. This thesis is about improving and/or extending current sparse models by addressing the two issues discussed above, providing the elements for a new generation of more powerful and flexible sparse models. First, we seek to gain a better understanding of sparse models as data modeling tools, so that critical parameters can be selected automatically, efficiently, and in a principled way. Second, we explore new sparse modeling formulations for effectively exploiting the prior information present in different scenarios. To achieve these goals, we combine ideas and tools from information theory, statistics, machine learning, and optimization theory. The theoretical contributions are complemented with applications in audio, image, and video processing.
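The core idea, that each data point is approximated by a few atoms of a learned dictionary, can be illustrated with a minimal greedy sparse-coding sketch (orthogonal matching pursuit; the dictionary here is random for illustration rather than learned, and the dimensions are made up):

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: approximate x as a k-sparse
    linear combination of the columns (atoms) of dictionary D."""
    residual = x.copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares fit on the chosen atoms, then refresh the residual
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs[support] = sol
    return coeffs

# toy setup: a random unit-norm dictionary and a truly 2-sparse signal
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
x = 0.7 * D[:, 3] + 0.3 * D[:, 17]
a = omp(D, x, k=2)
```

The two hyper-parameters the abstract highlights appear directly here as `k` (the desired sparsity) and the number of dictionary columns (its size).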
A Comparison Study of Saliency Models for Fixation Prediction on Infants and Adults
Various saliency models have been developed over the years. Their performance is typically evaluated on databases of experimentally recorded adult eye fixations. Although studies of infant gaze patterns have attracted much attention recently, saliency-based models have not been widely applied to the prediction of infant gaze patterns. In this study, we conduct a comprehensive comparison of eight state-of-the-art saliency models on predicting experimentally captured fixations from infants and adults. Seven evaluation metrics are used to evaluate and compare the performance of the saliency models. The results demonstrate that the saliency models consistently predict adult fixations better than infant fixations in terms of overlap, center fitting, intersection, information loss of approximation, and spatial distance between the distributions of the saliency map and the fixation map. In the performance ranking of the saliency and baseline models, the results show that the GBVS and Itti models are among the top three contenders, that both infants and adults show a bias toward the centers of images, and that all models, together with the center baseline model, outperform the chance baseline model.
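As an illustration of the kind of score such comparisons rely on, the sketch below implements one common fixation-prediction metric, Normalized Scanpath Saliency (NSS); the toy map and fixation coordinates are hypothetical:

```python
import numpy as np

def nss(saliency_map, fixations):
    """Normalized Scanpath Saliency: z-score the saliency map and
    average its values at the recorded fixation locations (row, col)."""
    s = (saliency_map - saliency_map.mean()) / saliency_map.std()
    return float(np.mean([s[r, c] for r, c in fixations]))

# toy example: a map with a single salient spot at (5, 5)
smap = np.zeros((10, 10))
smap[5, 5] = 1.0
```

With this toy map, `nss(smap, [(5, 5)])` is large and positive (fixations land on the salient region), while `nss(smap, [(0, 0)])` is slightly negative.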
Learning effective binary representation with deep hashing technique for large-scale multimedia similarity search
The explosive growth of multimedia data in modern times has inspired research into efficient large-scale multimedia similarity search for existing information retrieval systems. Over the past decades, hashing-based nearest-neighbor search methods have drawn extensive attention in this field. By representing the original data with compact hash codes, they enable efficient similarity retrieval using only bitwise operations to compute the Hamming distance. Moreover, less memory is required to process and store the massive amounts of features for search engines, owing to the compact nature of binary codes. These advantages make hashing a competitive option in large-scale visual retrieval tasks. Motivated by previous dedicated works, this thesis focuses on learning compact binary representations via hashing techniques for large-scale multimedia similarity search tasks. In particular, several novel frameworks are proposed for popular hashing-based applications: a local binary descriptor for patch-level matching (Chapter 3), video-to-video retrieval (Chapter 4), and cross-modality retrieval (Chapter 5). The thesis starts by addressing the problem of learning a local binary descriptor for better patch/image matching performance. To this end, we propose a novel local descriptor termed Unsupervised Deep Binary Descriptor (UDBD) for patch-level matching tasks, which learns a transformation-invariant binary descriptor by embedding the original visual data and their transformed sets into a common Hamming space. By imposing an l2,1-norm regularizer on the objective function, the learned binary descriptor gains robustness against noise. Moreover, a weak-bit scheme is applied to address ambiguous matching in the local binary descriptor: the best match is determined for each query by comparing a series of weak bits between the query instance and the candidates, thus improving the matching performance.
Furthermore, Unsupervised Deep Video Hashing (UDVH) is proposed to facilitate large-scale video-to-video retrieval. To tackle the imbalanced distribution of the video features, a balanced rotation is developed to identify a proper projection matrix such that the information in each dimension is balanced under fixed-bit quantization, dramatically improving retrieval performance through better code quality. To provide comprehensive insights into the proposed rotation, two different video feature learning structures, stacked LSTM units (UDVH-LSTM) and the Temporal Segment Network (UDVH-TSN), are presented in Chapter 4. Lastly, we extend the research topic from single-modality to cross-modality retrieval, where Self-Supervised Deep Multimodal Hashing (SSDMH), based on matrix factorization, is proposed to learn unified binary codes for different modalities directly, without the need for relaxation. By minimizing a graph regularization loss, it tends to produce discriminative hash codes that preserve the original data structure. Moreover, Binary Gradient Descent (BGD) accelerates the discrete optimization compared with the conventional bit-by-bit fashion. Finally, an unsupervised variant termed Unsupervised Deep Cross-Modal Hashing (UDCMH) is proposed to tackle large-scale cross-modality retrieval when prior knowledge is unavailable.
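The bitwise trick behind such Hamming-distance retrieval can be sketched in a few lines (the short integer codes below are made up for illustration):

```python
def hamming(a: int, b: int) -> int:
    # XOR leaves a 1 exactly where the two codes differ; count those bits
    return bin(a ^ b).count("1")

def rank(query: int, database: list[int]) -> list[int]:
    # brute-force linear scan: order database codes by Hamming distance
    return sorted(database, key=lambda code: hamming(query, code))
```

For example, `hamming(0b10110010, 0b10010110)` is 2, and `rank` returns the nearest codes first; real systems replace the linear scan with lookup tables or multi-index structures, but the distance computation stays this cheap.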
The Telecommunications and Data Acquisition Report
Archival reports on developments in programs managed by the Jet Propulsion Laboratory's (JPL) Office of Telecommunications and Data Acquisition (TDA) are given. Reported topics include space communications, radio navigation, radio science, and ground-based radio and radar astronomy, as well as the activities of the Deep Space Network (DSN) and its associated Ground Communications Facility (GCF) in planning, supporting research and technology, implementation, and operations. Also included are TDA-funded activity at JPL on data and information systems and reimbursable DSN work performed for other space agencies through NASA.
Oriented Object Detection in Optical Remote Sensing Images using Deep Learning: A Survey
Oriented object detection is one of the most fundamental and challenging
tasks in remote sensing, aiming at locating the oriented objects of numerous
predefined object categories. Recently, deep learning based methods have
achieved remarkable performance in detecting oriented objects in optical remote
sensing imagery. However, a thorough review of the literature in remote sensing
has not yet emerged. Therefore, we give a comprehensive survey of recent
advances and cover many aspects of oriented object detection, including problem
definition, commonly used datasets, evaluation protocols, detection frameworks,
oriented object representations, and feature representations. Besides, the
state-of-the-art methods are analyzed and discussed. Finally, we discuss future
research directions to offer useful guidance. We believe that this survey will
be valuable to researchers across academia and industry.
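For instance, a widely used oriented-object representation encodes a box as center, size, and rotation angle; converting it to corner points is a small exercise (the parameterization below is one common convention, not tied to any specific method in the survey):

```python
import math

def obb_corners(cx, cy, w, h, theta):
    """Corner coordinates of an oriented bounding box given its center
    (cx, cy), width w, height h, and rotation theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # rotate each half-extent offset, then translate by the center
    return [(cx + c * x - s * y, cy + s * x + c * y) for x, y in half]
```

With `theta = 0` this reduces to an ordinary axis-aligned box, which is why angle-based representations are a natural extension of horizontal detectors.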
Multimedia
The ubiquitous and effortless digital data capture and processing capabilities offered by the majority of today's devices have led to an unprecedented penetration of multimedia content into our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content require constant re-evaluation and adaptation of multimedia methodologies in order to meet the relentlessly changing requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that point out such important aspects. Some of the main topics covered in this book include: multimedia management in peer-to-peer structures and wireless networks, security characteristics in multimedia, bridging the semantic gap for multimedia content, and novel multimedia applications.
Semantics-Empowered Communication: A Tutorial-cum-Survey
With the rapid rise of semantics-empowered communication (SemCom) research,
the field is witnessing unprecedented growth of interest in a wide range of
aspects (e.g., theories, applications, metrics and
implementations) in both academia and industry. In this work, we primarily aim
to provide a comprehensive survey on both the background and research taxonomy,
as well as a detailed technical tutorial. Specifically, we start by reviewing
the literature and answering the "what" and "why" questions in semantic
transmissions. Afterwards, we present the ecosystems of SemCom, including
history, theories, metrics, datasets and toolkits, on top of which the taxonomy
for research directions is presented. Furthermore, we propose to categorize the
critical enabling techniques by explicit and implicit reasoning-based methods,
and elaborate on how they evolve and contribute to modern content & channel
semantics-empowered communications. Besides reviewing and summarizing the
latest efforts in SemCom, we discuss the relations with other communication
levels (e.g., conventional communications) from a holistic and unified
viewpoint. Subsequently, in order to facilitate future developments and
industrial applications, we also highlight advanced practical techniques for
boosting semantic accuracy, robustness, and large-scale scalability, just to
mention a few. Finally, we discuss the technical challenges that shed light on
future research opportunities. Comment: Submitted to an IEEE journal; copyright
might be transferred without further notice.
Wave Front Sensing and Correction Using Spatial Modulation and Digitally Enhanced Heterodyne Interferometry
This thesis is about light. Specifically, it explores a new way of sensing the spatial distribution of amplitude and phase across the wavefront of a propagating laser. It uses spatial light modulators to tag spatially distinct regions of the beam, a single diode to collect the resulting light, and digitally enhanced heterodyne interferometry to decode the phase and amplitude information across the wavefront. It also demonstrates how these methods can be used to maximise the transmission of light through a cavity, and shows how minor aberrations in the beam can be corrected in real time. Finally, it demonstrates the preferential transmission of higher-order modes.
Wavefront sensing is becoming increasingly important as the demands on modern interferometers increase. Land-based systems such as the Laser Interferometer Gravitational-Wave Observatory (LIGO) use it to maximise the amount of power in the arm cavities during operation and to reduce noise, while space-based missions such as the Laser Interferometer Space Antenna (LISA) will use it to align distant partner satellites and ensure that the maximum amount of signal is exchanged. Conventionally, wavefront sensing is accomplished using either Hartmann sensors or multi-element diodes. These are well-proven and very effective techniques, but they bring with them a number of well-understood limitations. Critically, while they can map a wavefront in detail, they are strictly sensors and can do nothing to correct it.
Our new technique is based on a single-element photo-diode and the spatial modulation of the local oscillator beam. We encode orthogonal codes spatially onto this light and use these to separate the phases and amplitudes of different parts of the signal beam in post-processing. This technique shifts complexity from the optical hardware into deterministic digital signal processing. Notably, the use of a single analogue channel (photo-diode, connections and analogue-to-digital converter) avoids some low-frequency error sources. The technique can also sense the wavefront phase at many points, limited only by the number of actuators on the spatial light modulator, in contrast to the standard four points from a quadrant photo-diode. For ground-based systems, our technique could be used to identify and eliminate higher-order modes, while, for space-based systems, it provides a measure of wavefront tilt which is less susceptible to low-frequency noise.
In the future it may be possible to couple the technique with an artificial intelligence engine to automate more of the beam alignment process in arrangements involving multiple cavities, preferentially select (or reject) specific higher-order modes, and begin to reduce the burgeoning requirements for human control of these complex instruments.
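The separation step can be illustrated with a toy numerical model: each wavefront region is tagged with a row of a Hadamard matrix (mutually orthogonal +/-1 codes), a single detector sums the tagged contributions, and correlating against each code recovers the per-region amplitudes. The four regions and their amplitudes below are hypothetical:

```python
import numpy as np

def hadamard(n):
    # Sylvester construction: rows are mutually orthogonal +/-1 codes
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

codes = hadamard(4)                          # one code per wavefront region
amplitudes = np.array([0.9, 0.4, 0.7, 0.1])  # hypothetical per-region signal

# a single diode sums all code-tagged regions at each time step
detector = codes.T @ amplitudes

# correlating the summed signal with each code separates the regions again
recovered = codes @ detector / codes.shape[1]
```

Because the code rows are orthogonal, the correlation cleanly demultiplexes the single analogue channel into one amplitude per tagged region, which is the essence of moving complexity from optics into digital signal processing.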
Online summarization of dynamic graphs using subjective interestingness for sequential data
Algorithms and the Foundations of Software Technology
Learning Discriminative Feature Representations for Visual Categorization
Learning discriminative feature representations has attracted a great deal of attention due to its potential value and wide usage in a variety of areas, such as image/video recognition and retrieval, human activity analysis, intelligent surveillance, and human-computer interaction.
In this thesis we first introduce a new boosted key-frame selection scheme for action recognition. Specifically, we propose to select a subset of key poses to represent each action via AdaBoost, and a new classifier, termed WLNBNN, is then developed for final classification. The proposed method performs 0.6%-13.2% better than previous work. Next, a domain-adaptive learning approach based on multiobjective genetic programming (MOGP) is developed for image classification. In this method, a set of primitive 2-D operators is randomly combined to construct feature descriptors through MOGP evolution and then evaluated by two objective fitness criteria, i.e., the classification error and the tree complexity, after which the (near-)optimal feature descriptor can be obtained. The proposed approach achieves 0.9%-25.9% better performance compared with state-of-the-art methods. Moreover, effective dimensionality reduction algorithms have also been widely used for obtaining better representations. In this thesis, we propose a novel linear unsupervised algorithm, termed Discriminative Partition Sparsity Analysis (DPSA), which explicitly considers the different probabilistic distributions that exist over the data points while simultaneously preserving the natural locality relationships among the data. All of the above methods have been systematically evaluated on several public datasets, showing accurate and robust performance (0.44%-6.69% better than previous methods) for action and image categorization.

Targeting efficient image classification, we also introduce a novel unsupervised framework termed evolutionary compact embedding (ECE), which automatically learns task-specific binary hash codes. It is regarded as an optimization algorithm combining genetic programming (GP) with a boosting trick. The experimental results show that ECE significantly outperforms others, by 1.58%-2.19%, on classification tasks. In addition, a supervised framework, bilinear local feature hashing (BLFH), is also proposed to learn highly discriminative binary codes on local descriptors for large-scale image similarity search. We formulate it as a nonconvex optimization problem that seeks orthogonal projection matrices for hashing, which can successfully preserve the pairwise similarity between different local features while simultaneously taking image-to-class (I2C) distances into consideration. BLFH produces outstanding results (0.017%-0.149% better) compared to state-of-the-art hashing techniques.
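A minimal baseline in this family of methods is sign-random-projection hashing, which preserves cosine similarity in expectation; the sketch below (with made-up dimensions and data) is a generic baseline, not the BLFH or ECE methods themselves:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hasher(dim, n_bits):
    """Sign-random-projection hashing: each of n_bits random hyperplanes
    contributes one bit, the sign of the projection onto its normal."""
    W = rng.standard_normal((dim, n_bits))
    return lambda x: (x @ W > 0).astype(np.uint8)

def hamming(a, b):
    # number of differing bits between two binary code vectors
    return int(np.count_nonzero(a != b))

hash_fn = make_hasher(dim=32, n_bits=16)
x = np.ones(32)
y = x + 0.01 * rng.standard_normal(32)  # a near-duplicate of x
z = -x                                  # maximally dissimilar direction
```

Near-duplicates land on the same side of almost every hyperplane and so receive nearly identical codes, while an opposite-direction vector flips every bit; learned hashing methods improve on this baseline by fitting the projections to the data.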