140 research outputs found

    Robust Subspace Learning: Robust PCA, Robust Subspace Tracking, and Robust Subspace Recovery

    Full text link
    PCA is one of the most widely used dimension reduction techniques. A related easier problem is "subspace learning" or "subspace estimation". Given relatively clean data, both are easily solved via singular value decomposition (SVD). The problem of subspace learning or PCA in the presence of outliers is called robust subspace learning or robust PCA (RPCA). For long data sequences, if one tries to use a single lower dimensional subspace to represent the data, the required subspace dimension may end up being quite large. For such data, a better model is to assume that it lies in a low-dimensional subspace that can change over time, albeit gradually. The problem of tracking such data (and the subspaces) while being robust to outliers is called robust subspace tracking (RST). This article provides a magazine-style overview of the entire field of robust subspace learning and tracking. In particular solutions for three problems are discussed in detail: RPCA via sparse+low-rank matrix decomposition (S+LR), RST via S+LR, and "robust subspace recovery (RSR)". RSR assumes that an entire data vector is either an outlier or an inlier. The S+LR formulation instead assumes that outliers occur on only a few data vector indices and hence are well modeled as sparse corruptions.Comment: To appear, IEEE Signal Processing Magazine, July 201

    Exploratory search through large video corpora

    Get PDF
    Activity retrieval is a growing field in electrical engineering that specializes in the search and retrieval of relevant activities and events in video corpora. With the affordability and popularity of cameras for government, personal and retail use, the quantity of available video data is rapidly outscaling our ability to reason over it. Towards the end of empowering users to navigate and interact with the contents of these video corpora, we propose a framework for exploratory search that emphasizes activity structure and search space reduction over complex feature representations. Exploratory search is a user driven process wherein a person provides a system with a query describing the activity, event, or object he is interested in finding. Typically, this description takes the implicit form of one or more exemplar videos, but it can also involve an explicit description. The system returns candidate matches, followed by query refinement and iteration. System performance is judged by the run-time of the system and the precision/recall curve of of the query matches returned. Scaling is one of the primary challenges in video search. From vast web-video archives like youtube (1 billion videos and counting) to the 30 million active surveillance cameras shooting an estimated 4 billion hours of footage every week in the United States, trying to find a set of matches can be like looking for a needle in a haystack. Our goal is to create an efficient archival representation of video corpora that can be calculated in real-time as video streams in, and then enables a user to quickly get a set of results that match. First, we design a system for rapidly identifying simple queries in large-scale video corpora. Instead of focusing on feature design, our system focuses on the spatiotemporal relationships between those features as a means of disambiguating an activity of interest from background. We define a semantic feature vocabulary of concepts that are both readily extracted from video and easily understood by an operator. As data streams in, features are hashed to an inverted index and retrieved in constant time after the system is presented with a user's query. We take a zero-shot approach to exploratory search: the user manually assembles vocabulary elements like color, speed, size and type into a graph. Given that information, we perform an initial downsampling of the archived data, and design a novel dynamic programming approach based on genome-sequencing to search for similar patterns. Experimental results indicate that this approach outperforms other methods for detecting activities in surveillance video datasets. Second, we address the problem of representing complex activities that take place over long spans of space and time. Subgraph and graph matching methods have seen limited use in exploratory search because both problems are provably NP-hard. In this work, we render these problems computationally tractable by identifying the maximally discriminative spanning tree (MDST), and using dynamic programming to optimally reduce the archive data based on a custom algorithm for tree-matching in attributed relational graphs. We demonstrate the efficacy of this approach on popular surveillance video datasets in several modalities. Finally, we design an approach for successive search space reduction in subgraph matching problems. Given a query graph and archival data, our algorithm iteratively selects spanning trees from the query graph that optimize the expected search space reduction at each step until the archive converges. We use this approach to efficiently reason over video surveillance datasets, simulated data, as well as large graphs of protein data

    Dagstuhl News January - December 2008

    Get PDF
    "Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News give a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented by a small abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic

    Reconstructing Graph Diffusion History from a Single Snapshot

    Full text link
    Diffusion on graphs is ubiquitous with numerous high-impact applications. In these applications, complete diffusion histories play an essential role in terms of identifying dynamical patterns, reflecting on precaution actions, and forecasting intervention effects. Despite their importance, complete diffusion histories are rarely available and are highly challenging to reconstruct due to ill-posedness, explosive search space, and scarcity of training data. To date, few methods exist for diffusion history reconstruction. They are exclusively based on the maximum likelihood estimation (MLE) formulation and require to know true diffusion parameters. In this paper, we study an even harder problem, namely reconstructing Diffusion history from A single SnapsHot} (DASH), where we seek to reconstruct the history from only the final snapshot without knowing true diffusion parameters. We start with theoretical analyses that reveal a fundamental limitation of the MLE formulation. We prove: (a) estimation error of diffusion parameters is unavoidable due to NP-hardness of diffusion parameter estimation, and (b) the MLE formulation is sensitive to estimation error of diffusion parameters. To overcome the inherent limitation of the MLE formulation, we propose a novel barycenter formulation: finding the barycenter of the posterior distribution of histories, which is provably stable against the estimation error of diffusion parameters. We further develop an effective solver named DIffusion hiTting Times with Optimal proposal (DITTO) by reducing the problem to estimating posterior expected hitting times via the Metropolis--Hastings Markov chain Monte Carlo method (M--H MCMC) and employing an unsupervised graph neural network to learn an optimal proposal to accelerate the convergence of M--H MCMC. We conduct extensive experiments to demonstrate the efficacy of the proposed method.Comment: Full version of the KDD 2023 paper. Our code is available at https://github.com/q-rz/KDD23-DITT

    A Survey of Self-supervised Learning from Multiple Perspectives: Algorithms, Applications and Future Trends

    Full text link
    Deep supervised learning algorithms generally require large numbers of labeled examples to achieve satisfactory performance. However, collecting and labeling too many examples can be costly and time-consuming. As a subset of unsupervised learning, self-supervised learning (SSL) aims to learn useful features from unlabeled examples without any human-annotated labels. SSL has recently attracted much attention and many related algorithms have been developed. However, there are few comprehensive studies that explain the connections and evolution of different SSL variants. In this paper, we provide a review of various SSL methods from the perspectives of algorithms, applications, three main trends, and open questions. First, the motivations of most SSL algorithms are introduced in detail, and their commonalities and differences are compared. Second, typical applications of SSL in domains such as image processing and computer vision (CV), as well as natural language processing (NLP), are discussed. Finally, the three main trends of SSL and the open research questions are discussed. A collection of useful materials is available at https://github.com/guijiejie/SSL

    Shape Representations Using Nested Descriptors

    Get PDF
    The problem of shape representation is a core problem in computer vision. It can be argued that shape representation is the most central representational problem for computer vision, since unlike texture or color, shape alone can be used for perceptual tasks such as image matching, object detection and object categorization. This dissertation introduces a new shape representation called the nested descriptor. A nested descriptor represents shape both globally and locally by pooling salient scaled and oriented complex gradients in a large nested support set. We show that this nesting property introduces a nested correlation structure that enables a new local distance function called the nesting distance, which provides a provably robust similarity function for image matching. Furthermore, the nesting property suggests an elegant flower like normalization strategy called a log-spiral difference. We show that this normalization enables a compact binary representation and is equivalent to a form a bottom up saliency. This suggests that the nested descriptor representational power is due to representing salient edges, which makes a fundamental connection between the saliency and local feature descriptor literature. In this dissertation, we introduce three examples of shape representation using nested descriptors: nested shape descriptors for imagery, nested motion descriptors for video and nested pooling for activities. We show evaluation results for these representations that demonstrate state-of-the-art performance for image matching, wide baseline stereo and activity recognition tasks

    Spectral Embedding Norm: Looking Deep into the Spectrum of the Graph Laplacian

    Full text link
    The extraction of clusters from a dataset which includes multiple clusters and a significant background component is a non-trivial task of practical importance. In image analysis this manifests for example in anomaly detection and target detection. The traditional spectral clustering algorithm, which relies on the leading KK eigenvectors to detect KK clusters, fails in such cases. In this paper we propose the {\it spectral embedding norm} which sums the squared values of the first II normalized eigenvectors, where II can be significantly larger than KK. We prove that this quantity can be used to separate clusters from the background in unbalanced settings, including extreme cases such as outlier detection. The performance of the algorithm is not sensitive to the choice of II, and we demonstrate its application on synthetic and real-world remote sensing and neuroimaging datasets
    • …
    corecore