32 research outputs found

    Second generation sparse models

    Sparse data models, where data is assumed to be well represented as a linear combination of a few elements from a learned dictionary, have gained considerable attention in recent years, and their use has led to state-of-the-art results in many applications. The success of these models is largely attributed to two critical features: the use of sparsity as a robust mechanism for regularizing the linear coefficients that represent the data, and the flexibility provided by overcomplete dictionaries that are learned from the data. These features are controlled by two critical hyper-parameters: the desired sparsity of the coefficients, and the size of the dictionaries to be learned. However, lacking theoretical guidelines for selecting these critical parameters, applications based on sparse models often require hand-tuning and cross-validation to select them for each application and each data set. This can be both inefficient and ineffective. On the other hand, there are multiple scenarios in which imposing additional constraints on the produced representations, including the sparse codes and the dictionary itself, can result in further improvements. This thesis is about improving and/or extending current sparse models by addressing the two issues discussed above, providing the elements for a new generation of more powerful and flexible sparse models. First, we seek to gain a better understanding of sparse models as data modeling tools, so that critical parameters can be selected automatically, efficiently, and in a principled way. Second, we explore new sparse modeling formulations for effectively exploiting the prior information present in different scenarios. To achieve these goals, we combine ideas and tools from information theory, statistics, machine learning, and optimization theory. The theoretical contributions are complemented with applications in audio, image and video processing.
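
    To make the role of the sparsity hyper-parameter concrete, the following is a minimal sketch of sparse coding against a fixed dictionary. It is not the thesis's method: it assumes the common l1-regularized formulation (minimize 0.5*||x - D a||^2 + lam*||a||_1) and solves it with plain ISTA (iterative soft-thresholding). The names `sparse_code`, `soft_threshold`, and `lam` are illustrative choices, not taken from the source.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x, D, lam, n_iter=200):
    """ISTA sketch for min_a 0.5*||x - D a||^2 + lam*||a||_1.

    x   : signal, list of m floats
    D   : dictionary, list of m rows with k entries each (columns are atoms)
    lam : sparsity weight (assumed hyper-parameter; larger means sparser codes)
    """
    m, k = len(D), len(D[0])
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the
    # gradient, so its inverse is a safe (conservative) step size.
    step = 1.0 / sum(d * d for row in D for d in row)
    a = [0.0] * k
    for _ in range(n_iter):
        residual = [sum(D[i][j] * a[j] for j in range(k)) - x[i] for i in range(m)]
        grad = [sum(D[i][j] * residual[i] for i in range(m)) for j in range(k)]
        a = [soft_threshold(a[j] - step * grad[j], step * lam) for j in range(k)]
    return a
```

    With an identity dictionary the result reduces to soft-thresholding the signal, which shows how `lam` directly zeroes out small coefficients; selecting that value automatically is the kind of problem the abstract describes.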

    A Comparison Study of Saliency Models for Fixation Prediction on Infants and Adults

    Various saliency models have been developed over the years. The performance of saliency models is typically evaluated on databases of experimentally recorded adult eye fixations. Although studies on infant gaze patterns have attracted much attention recently, saliency-based models have not been widely applied to the prediction of infant gaze patterns. In this study, we conduct a comprehensive comparison of eight state-of-the-art saliency models on predictions of experimentally captured fixations from infants and adults. Seven evaluation metrics are used to evaluate and compare the performance of the saliency models. The results show that saliency models consistently predict adult fixations better than infant fixations in terms of overlap, center fitting, intersection, information loss of approximation, and spatial distance between the distributions of the saliency map and the fixation map. In the performance ranking of saliency and baseline models, the results show that the GBVS and Itti models are among the top three contenders, that infants and adults both have a bias toward the centers of images, and that all models and the center baseline model outperform the chance baseline model.

    Learning effective binary representation with deep hashing technique for large-scale multimedia similarity search

    The explosive growth of multimedia data in modern times motivates research into efficient large-scale multimedia similarity search in existing information retrieval systems. Over the past decades, hashing-based nearest neighbor search methods have drawn extensive attention in this research field. Representing the original data with compact hash codes enables efficient similarity retrieval, because computing the Hamming distance requires only bitwise operations. Moreover, owing to the compact binary codes, less memory is required to process and store the massive amounts of features used by search engines. These advantages make hashing a competitive option in large-scale visual retrieval tasks. Motivated by previous work, this thesis focuses on learning compact binary representations via hashing techniques for large-scale multimedia similarity search tasks. In particular, several novel frameworks are proposed for popular hashing-based applications: a local binary descriptor for patch-level matching (Chapter 3), video-to-video retrieval (Chapter 4) and cross-modality retrieval (Chapter 5). This thesis starts by addressing the problem of learning a local binary descriptor for better patch/image matching performance. To this end, we propose a novel local descriptor termed Unsupervised Deep Binary Descriptor (UDBD) for patch-level matching tasks, which learns a transformation-invariant binary descriptor by embedding the original visual data and their transformed sets into a common Hamming space. By imposing an ℓ2,1-norm regularizer on the objective function, the learned binary descriptor gains robustness against noise. Moreover, a weak bit scheme is applied to address ambiguous matching in the local binary descriptor, where the best match for each query is determined by comparing a series of weak bits between the query instance and the candidates, thus improving the matching performance. 
Furthermore, Unsupervised Deep Video Hashing (UDVH) is proposed to facilitate large-scale video-to-video retrieval. To tackle the imbalanced distribution of the video features, balanced rotation is developed to identify a proper projection matrix such that the information in each dimension is balanced in the fixed-bit quantization, thus improving the retrieval performance dramatically with better code quality. To provide comprehensive insights into the proposed rotation, two different video feature learning structures, stacked LSTM units (UDVH-LSTM) and Temporal Segment Network (UDVH-TSN), are presented in Chapter 4. Lastly, we extend the research topic from single-modality to cross-modality retrieval, where Self-Supervised Deep Multimodal Hashing (SSDMH) based on matrix factorization is proposed to learn unified binary codes for different modalities directly, without the need for relaxation. By minimizing a graph regularization loss, it tends to produce discriminative hash codes that preserve the original data structure. Moreover, Binary Gradient Descent (BGD) accelerates the discrete optimization compared with bit-by-bit updates. In addition, an unsupervised version termed Unsupervised Deep Cross-Modal Hashing (UDCMH) is proposed to tackle large-scale cross-modality retrieval when prior knowledge is unavailable.
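
The claim that Hamming-distance retrieval needs only bitwise operations can be illustrated with a short sketch. This is not code from the thesis; it assumes hash codes are already available as 0/1 sequences and simply shows the XOR-then-popcount pattern. The helper names `pack_bits`, `hamming_distance`, and `nearest_codes` are hypothetical.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one int (first bit is most significant)."""
    code = 0
    for b in bits:
        code = (code << 1) | (1 if b else 0)
    return code


def hamming_distance(a, b):
    """Count differing bits: XOR marks the differences, then count the set bits."""
    return bin(a ^ b).count("1")


def nearest_codes(query, database, k=1):
    """Indices of the k database codes closest to the query in Hamming distance.

    A linear scan is used for clarity; ties keep their original database order.
    """
    order = sorted(range(len(database)),
                   key=lambda i: hamming_distance(query, database[i]))
    return order[:k]
```

For example, `nearest_codes(pack_bits([1, 0, 1, 1]), [0b0000, 0b1010, 0b1111], k=2)` ranks the two codes that differ from the query in a single bit ahead of the all-zero code. Real systems would typically use hardware popcount instructions and indexing structures rather than a Python-level scan.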

    The Telecommunications and Data Acquisition Report

    Archival reports on developments in programs managed by the Jet Propulsion Laboratory's (JPL) Office of Telecommunications and Data Acquisition (TDA) are given. Activities of the Deep Space Network (DSN) and its associated Ground Communications Facility (GCF) in space communications, radio navigation, radio science, and ground-based radio and radar astronomy are reported, covering planning, supporting research and technology, implementation, and operations. Also included is TDA-funded activity at JPL on data and information systems and reimbursable DSN work performed for other space agencies through NASA.

    Oriented Object Detection in Optical Remote Sensing Images using Deep Learning: A Survey

    Oriented object detection is one of the most fundamental and challenging tasks in remote sensing, aiming at locating the oriented objects of numerous predefined object categories. Recently, deep learning based methods have achieved remarkable performance in detecting oriented objects in optical remote sensing imagery. However, a thorough review of the literature in remote sensing has not yet emerged. Therefore, we give a comprehensive survey of recent advances and cover many aspects of oriented object detection, including problem definition, commonly used datasets, evaluation protocols, detection frameworks, oriented object representations, and feature representations. In addition, the state-of-the-art methods are analyzed and discussed. Finally, we discuss future research directions to offer some useful research guidance. We believe that this survey will be valuable to researchers across academia and industry.

    Multimedia

    The now ubiquitous and effortless digital data capture and processing capabilities offered by the majority of devices lead to an unprecedented penetration of multimedia content in our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content requires constant re-evaluation and adaptation of multimedia methodologies in order to meet the relentless change of requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that highlight these aspects. The main topics of the book include multimedia management in peer-to-peer structures and wireless networks, security characteristics in multimedia, semantic gap bridging for multimedia content, and novel multimedia applications.

    Semantics-Empowered Communication: A Tutorial-cum-Survey

    With the rise of semantics-empowered communication (SemCom) research, there is unprecedented and growing interest in a wide range of its aspects (e.g., theories, applications, metrics and implementations) in both academia and industry. In this work, we primarily aim to provide a comprehensive survey of both the background and the research taxonomy, as well as a detailed technical tutorial. Specifically, we start by reviewing the literature and answering the "what" and "why" questions in semantic transmissions. Afterwards, we present the ecosystem of SemCom, including history, theories, metrics, datasets and toolkits, on top of which the taxonomy of research directions is presented. Furthermore, we propose to categorize the critical enabling techniques into explicit and implicit reasoning-based methods, and elaborate on how they evolve and contribute to modern content and channel semantics-empowered communications. Besides reviewing and summarizing the latest efforts in SemCom, we discuss the relations with other communication levels (e.g., conventional communications) from a holistic and unified viewpoint. Subsequently, to facilitate future developments and industrial applications, we also highlight advanced practical techniques for boosting semantic accuracy, robustness, and large-scale scalability, among others. Finally, we discuss the technical challenges that shed light on future research opportunities. Comment: Submitted to an IEEE journal. Copyright might be transferred without further notice.

    Wave Front Sensing and Correction Using Spatial Modulation and Digitally Enhanced Heterodyne Interferometry

    This thesis is about light. Specifically, it explores a new way of sensing the spatial distribution of amplitude and phase across the wavefront of a propagating laser. It uses spatial light modulators to tag spatially distinct regions of the beam, a single diode to collect the resulting light, and digitally enhanced heterodyne interferometry to decode the phase and amplitude information across the wavefront. It also demonstrates how these methods can be used to maximise the transmission of light through a cavity and shows how minor aberrations in the beam can be corrected in real time. Finally, it demonstrates the preferential transmission of higher-order modes. Wavefront sensing is becoming increasingly important as the demands on modern interferometers increase. Land-based systems such as the Laser Interferometer Gravitational-Wave Observatory (LIGO) use it to maximise the amount of power in the arm cavities during operation and to reduce noise, while space-based missions such as the Laser Interferometer Space Antenna (LISA) will use it to align distant partner satellites and ensure that the maximum amount of signal is exchanged. Conventionally, wavefront sensing is accomplished using either Hartmann sensors or multi-element diodes. These are well-proven and very effective techniques, but they bring with them a number of well-understood limitations. Critically, while they can map a wavefront in detail, they are strictly sensors and can do nothing to correct it. Our new technique is based on a single-element photo-diode and the spatial modulation of the local oscillator beam. We encode orthogonal codes spatially onto this light and use these to separate the phases and amplitudes of different parts of the signal beam in post-processing. This technique shifts complexity from the optical hardware into deterministic digital signal processing. 
Notably, the use of a single analogue channel (photo-diode, connections and analogue-to-digital converter) avoids some low-frequency error sources. The technique can also sense the wavefront phase at many points, limited only by the number of actuators on the spatial light modulator, in contrast to the standard 4 points from a quadrant photo-diode. For ground-based systems, our technique could be used to identify and eliminate higher-order modes, while for space-based systems it provides a measure of wavefront tilt that is less susceptible to low-frequency noise. In the future, it may be possible to couple the technique with an artificial intelligence engine to automate more of the beam alignment process in arrangements involving multiple cavities, preferentially select (or reject) specific higher-order modes, and start to reduce the burgeoning requirements for human control of these complex instruments.
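
The idea of separating regions of a beam with orthogonal codes and a single detector can be sketched with a toy model. This is an illustration only, not the thesis's signal chain: it assumes idealized noise-free samples, represents each region's amplitude and phase as one complex number, and uses Walsh-Hadamard codes, whereas the abstract does not say which orthogonal codes are used. All function names are hypothetical.

```python
import cmath


def hadamard(n):
    """Sylvester construction: n x n matrix of +1/-1 with mutually orthogonal rows."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def encode(fields, codes):
    """Simulate the single detector: each time step sums every region's
    complex field, weighted by that region's code chip for the step."""
    n = len(fields)
    return [sum(codes[i][t] * fields[i] for i in range(n))
            for t in range(len(codes[0]))]


def decode(samples, codes):
    """Correlate the detector samples with each code to recover each region's field."""
    length = len(samples)
    return [sum(c * s for c, s in zip(code, samples)) / length for code in codes]
```

Given four regions such as `fields = [cmath.rect(1.0, 0.1), cmath.rect(0.5, -0.3), cmath.rect(2.0, 1.2), cmath.rect(0.8, 0.0)]`, calling `decode(encode(fields, hadamard(4)), hadamard(4))` returns the four complex fields, from which `abs` and `cmath.phase` give per-region amplitude and phase. The recovery works because the rows are orthogonal, so the cross terms cancel in the correlation.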

    Online summarization of dynamic graphs using subjective interestingness for sequential data

    Algorithms and the Foundations of Software technology

    Learning Discriminative Feature Representations for Visual Categorization

    Learning discriminative feature representations has attracted a great deal of attention due to its potential value and wide usage in a variety of areas, such as image/video recognition and retrieval, human activity analysis, intelligent surveillance and human-computer interaction. In this thesis we first introduce a new boosted key-frame selection scheme for action recognition. Specifically, we propose to select a subset of key poses for the representation of each action via AdaBoost, and a new classifier, namely WLNBNN, is then developed for final classification. The experimental results of the proposed method are 0.6% - 13.2% better than previous work. After that, a domain-adaptive learning approach based on multiobjective genetic programming (MOGP) is developed for image classification. In this method, a set of primitive 2-D operators are randomly combined to construct feature descriptors through MOGP evolution and then evaluated by two objective fitness criteria, i.e., the classification error and the tree complexity, from which the (near-)optimal feature descriptor is obtained. The proposed approach achieves 0.9% - 25.9% better performance compared with state-of-the-art methods. Moreover, effective dimensionality reduction algorithms have also been widely used for obtaining better representations. In this thesis, we propose a novel linear unsupervised algorithm, termed Discriminative Partition Sparsity Analysis (DPSA), which explicitly considers the different probabilistic distributions that exist over the data points while simultaneously preserving the natural locality relationship among the data. All of the above methods have been systematically evaluated on several public datasets, showing accurate and robust performance (0.44% - 6.69% better than previous work) for action and image categorization. 
Targeting efficient image classification, we also introduce a novel unsupervised framework termed evolutionary compact embedding (ECE), which can automatically learn task-specific binary hash codes. It can be regarded as an optimization algorithm that combines genetic programming (GP) and a boosting trick. The experimental results show that ECE significantly outperforms other methods by 1.58% - 2.19% for classification tasks. In addition, a supervised framework, bilinear local feature hashing (BLFH), is proposed to learn highly discriminative binary codes on the local descriptors for large-scale image similarity search. We formulate it as a nonconvex optimization problem that seeks orthogonal projection matrices for hashing, which can successfully preserve the pairwise similarity between different local features and simultaneously take image-to-class (I2C) distances into consideration. BLFH produces outstanding results (0.017% - 0.149% better) compared to state-of-the-art hashing techniques.