Vision systems with the human in the loop
The emerging cognitive vision paradigm deals with vision systems that apply machine learning and automatic reasoning in order to learn from what they perceive. Cognitive vision systems can rate the relevance and consistency of newly acquired knowledge, and they can adapt to their environment and thus exhibit high robustness. This contribution presents vision systems that aim at flexibility and robustness. One is tailored for content-based image retrieval; the others are cognitive vision systems that constitute prototypes of visual active memories, which evaluate, gather, and integrate contextual knowledge for visual analysis. All three systems are designed to interact with human users. After discussing adaptive content-based image retrieval and object and action recognition in an office environment, we raise the issue of assessing cognitive systems. Experiences from psychologically evaluated human-machine interactions are reported, and the promising potential of psychologically based usability experiments is stressed.
Graceful Forgetting II. Data as a Process
Data are rapidly growing in size and importance for society, a trend motivated by their enabling power. The accumulation of new data, sustained by progress in technology, leads to a boundless expansion of stored data, in some cases with an exponential increase in the accrual rate itself. Massive data are hard to process, transmit, store, and exploit, and it is particularly hard to keep abreast of the data store as a whole. This paper distinguishes three phases in the life of data: acquisition, curation, and exploitation. Each involves a distinct process that may be separated from the others in time, with a different set of priorities. The function of the second phase, curation, is to maximize the future value of the data given limited storage. I argue that this requires that (a) the data take the form of summary statistics and (b) these statistics follow an endless process of rescaling. The summary may be more compact than the original data, but its data structure is more complex and it requires an ongoing computational process that is much more sophisticated than mere storage. Rescaling results in dimensionality reduction that may be beneficial for learning, but that must be carefully controlled to preserve relevance. Rescaling may be tuned based on feedback from usage, with the proviso that our memory of the past serves the future, the needs of which are not fully known.

Comment: 30 pages, 17 figures
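The abstract's idea of summary statistics under an endless process of rescaling can be pictured with a toy sketch. This is purely illustrative and not the paper's algorithm: keep per-bin (count, mean) summaries, and whenever capacity is exceeded, merge the two oldest bins, so temporal resolution decays with age while totals and overall means are preserved.

```python
class RescalingMemory:
    """Illustrative sketch: bounded memory of (count, mean) summaries.

    When capacity is exceeded, the two oldest bins are merged, so old
    data are kept at coarser resolution rather than discarded outright.
    """

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.bins = []  # newest last: list of (count, mean) tuples

    def add(self, x):
        self.bins.append((1, float(x)))
        if len(self.bins) > self.capacity:
            # Merge the two oldest bins into one count-weighted summary
            (n1, m1), (n2, m2) = self.bins[0], self.bins[1]
            merged = (n1 + n2, (n1 * m1 + n2 * m2) / (n1 + n2))
            self.bins = [merged] + self.bins[2:]

mem = RescalingMemory(capacity=3)
for x in [1, 2, 3, 4, 5]:
    mem.add(x)
print(mem.bins)  # → [(3, 2.0), (1, 4.0), (1, 5.0)]
```

Note that the total count (5) and the overall mean (3.0) are exactly recoverable from the bins; only the fine temporal structure of the oldest observations has been rescaled away.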
Exploiting Emergence of New Topics via Anomaly Detection: A Survey
Detecting and generating new concepts has attracted much attention in data mining. The emergence of new topics in news data is a major challenge; the problem can be framed as "finding breaking news." Years ago, the emergence of new stories was detected and followed up by domain experts, but manually reading stories and identifying the anomalies is a critical and time-consuming task, and mapping these anomalies to various stories requires excellent knowledge of the news and of old concepts. Automatically modeling breaking news is therefore of much interest in data mining. The anomalies in news published in newspapers are the basic clues for concluding that a new story has emerged. The anomalies are the keywords or phrases that do not match the overall concept of the news item. These anomalies are then processed and mapped to the stories in which those keywords and phrases do not behave as anomalies. After mapping these anomalies, one can conclude that the topics linked by them can generate a new concept, which eventually can be modeled as an emerging story. We survey techniques that can be used to efficiently model such new concepts. News classification, anomaly detection, and concept detection and generation are some of the techniques that collectively can form the basis of modeling breaking news. We further discuss data sources that can be processed and used as input stories or news for modeling the emergence of new stories.
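A minimal sketch of the keyword-anomaly idea described above: score each word in a new article by how rare it is in a background corpus, and treat the rarest words as candidate anomalies. The function name, the tiny corpus, and the Laplace-smoothed scoring rule are all illustrative assumptions, not a method taken from the survey.

```python
from collections import Counter
import math

def anomalous_keywords(new_doc, background_docs, top_k=3):
    """Return the top_k words of new_doc that are rarest in the
    background corpus -- a crude stand-in for 'keywords that do not
    match the established concepts'. Scoring rule is illustrative."""
    bg = Counter(w for d in background_docs for w in d.lower().split())
    total = sum(bg.values())
    scores = {}
    for w in set(new_doc.lower().split()):
        # Laplace-smoothed background probability: rarer => higher score
        p = (bg[w] + 1) / (total + len(bg) + 1)
        scores[w] = -math.log(p)
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [w for w, _ in ranked[:top_k]]

history = ["markets steady as trade talks continue",
           "trade ministers meet to discuss markets"]
breaking = "volcano eruption disrupts trade markets"
print(sorted(anomalous_keywords(breaking, history)))
# → ['disrupts', 'eruption', 'volcano']
```

In a real system the anomalous keywords would then be mapped across articles, as the survey describes, to decide whether they jointly signal an emerging story rather than isolated noise.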
Object detection and activity recognition in digital image and video libraries
This thesis is a comprehensive study of object-based image and video retrieval, specifically for car and human detection and activity recognition purposes. The thesis focuses on the problem of connecting low level features to high level semantics by developing relational object and activity presentations. With the rapid growth of multimedia information in forms of digital image and video libraries, there is an increasing need for intelligent database management tools. The traditional text based query systems based on manual annotation process are impractical for today's large libraries requiring an efficient information retrieval system. For this purpose, a hierarchical information retrieval system is proposed where shape, color and motion characteristics of objects of interest are captured in compressed and uncompressed domains. The proposed retrieval method provides object detection and activity recognition at different resolution levels from low complexity to low false rates.
The thesis first examines extraction of low level features from images and videos using intensity, color and motion of pixels and blocks. Local consistency based on these features and geometrical characteristics of the regions is used to group object parts. The problem of managing the segmentation process is solved by a new approach that uses object based knowledge in order to group the regions according to a global consistency. A new model-based segmentation algorithm is introduced that uses a feedback from relational representation of the object. The selected unary and binary attributes are further extended for application specific algorithms. Object detection is achieved by matching the relational graphs of objects with the reference model. The major advantages of the algorithm can be summarized as improving the object extraction by reducing the dependence on the low level segmentation process and combining the boundary and region properties.
The thesis then addresses the problem of object detection and activity recognition in the compressed domain in order to reduce computational complexity. New algorithms for object detection and activity recognition in JPEG images and MPEG videos are developed. It is shown that significant information can be obtained from the compressed domain in order to connect to high level semantics. Since our aim is to retrieve information from images and videos compressed using standard algorithms such as JPEG and MPEG, our approach differs from previous compressed-domain object detection techniques, where the compression algorithms are governed by characteristics of the objects of interest to be retrieved. An algorithm is developed using principal component analysis of MPEG motion vectors to detect human activities, namely walking, running, and kicking. Object detection in JPEG compressed still images and MPEG I-frames is achieved by using the DC-DCT coefficients of the luminance and chrominance values in the graph-based object detection algorithm. The thesis finally addresses the problem of object detection in lower-resolution and monochrome images. Specifically, it is demonstrated that the structural information of human silhouettes can be captured from AC-DCT coefficients.
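The PCA-over-motion-vectors step can be sketched generically. The code below is a plain PCA feature extractor applied to synthetic motion-vector fields; the array shapes, class statistics, and function names are invented for illustration and are not the thesis's algorithm.

```python
import numpy as np

def pca_features(mv_fields, k=2):
    """Project flattened motion-vector fields onto the top-k principal
    components. mv_fields: (n_clips, h, w, 2) per-macroblock vectors."""
    X = mv_fields.reshape(len(mv_fields), -1).astype(float)
    X = X - X.mean(axis=0)  # center before extracting directions
    # SVD yields principal directions without forming the covariance matrix
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T

rng = np.random.default_rng(0)
# Hypothetical clips: small steady motion vs. large fast motion
walking = rng.normal(0.5, 0.1, size=(10, 4, 4, 2))
running = rng.normal(2.0, 0.3, size=(10, 4, 4, 2))
feats = pca_features(np.concatenate([walking, running]))
```

Because the two synthetic activities differ strongly in overall motion magnitude, their projections separate cleanly along the first principal component, which is the property a downstream classifier would exploit.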
Learning to See with Minimal Human Supervision
Deep learning has significantly advanced computer vision in the past decade, paving the way for practical applications such as facial recognition and autonomous driving. However, current techniques depend heavily on human supervision, limiting their broader deployment. This dissertation tackles this problem by introducing algorithms and theories to minimize human supervision in three key areas: data, annotations, and neural network architectures, in the context of various visual understanding tasks such as object detection, image restoration, and 3D generation.
First, we present self-supervised learning algorithms to handle in-the-wild images and videos that traditionally require time-consuming manual curation and labeling. We demonstrate that when a deep network is trained to be invariant to geometric and photometric transformations, representations from its intermediate layers are highly predictive of object semantic parts such as eyes and noses. This insight offers a simple unsupervised learning framework that significantly improves the efficiency and accuracy of few-shot landmark prediction and matching. We then present a technique for learning single-view 3D object pose estimation models by utilizing in-the-wild videos where objects turn (e.g., cars in roundabouts). This technique achieves performance competitive with the existing state of the art without requiring any manual labels during training. We also contribute an Accidental Turntables Dataset, containing a challenging set of 41,212 images of cars with cluttered backgrounds, motion blur, and illumination changes, which serves as a benchmark for 3D pose estimation.
Second, we address variations in labeling styles across different annotators, which lead to a type of noisy label referred to as a heterogeneous label. This variability in human annotation can cause subpar performance during both the training and testing phases. To mitigate this, we have developed a framework that models the labeling styles of individual annotators, reducing the impact of human annotation variations and enhancing the performance of standard object detection models. We have also applied this framework to analyze ecological data, which are often collected opportunistically across different case studies without consistent annotation guidelines. Through this application, we have obtained several insights into large-scale bird migration behaviors and their relationship to climate change.
Our next study explores the challenges of designing neural networks, an area that lacks a comprehensive theoretical understanding. By linking deep neural networks with Gaussian processes, we propose a novel Bayesian interpretation of the deep image prior, which parameterizes a natural image as the output of a convolutional network with random parameters and random input. This approach offers valuable insights to optimize the design of neural networks for various image restoration tasks.
Lastly, we introduce several machine-learning techniques to reconstruct and edit 3D shapes from 2D images with minimal human effort. We first present a generic multi-modal generative model that bridges 2D images and 3D shapes via a shared latent space, and demonstrate its applications on versatile 3D shape generation and manipulation tasks. Additionally, we develop a framework for joint estimation of a 3D neural scene representation and camera poses. This approach outperforms prior work and allows us to operate in the general SE(3) camera pose setting, unlike the baselines. The results also indicate that this method can be complementary to classical structure-from-motion (SfM) pipelines, as it compares favorably to SfM on low-texture and low-resolution images.
Continual Learning in Practice
This paper describes a reference architecture for self-maintaining systems that can learn continually as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML, or Automatically Adaptive Machine Learning. We describe the challenges and propose a reference architecture.

Comment: Presented at the NeurIPS 2018 workshop on Continual Learning, https://sites.google.com/view/continual2018/hom
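The retrain-when-necessary loop at the heart of such an architecture can be sketched minimally: watch a rolling window of prediction errors and flag retraining when recent error drifts past a threshold. The class name, window size, and thresholding rule are illustrative assumptions, not the paper's reference architecture.

```python
from collections import deque

class ModelMonitor:
    """Illustrative drift monitor: flags retraining when the rolling
    mean absolute error over the last `window` predictions exceeds
    `threshold`."""

    def __init__(self, window=50, threshold=0.3):
        self.errors = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, y_true, y_pred):
        self.errors.append(abs(y_true - y_pred))
        return self.needs_retrain()

    def needs_retrain(self):
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough evidence yet
        return sum(self.errors) / len(self.errors) > self.threshold

monitor = ModelMonitor(window=10, threshold=0.5)
# Stable period with small errors, then a distribution shift
flags = [monitor.observe(y, y + e) for y, e in
         [(1.0, 0.1)] * 10 + [(1.0, 1.0)] * 10]
```

Here the flag stays False through the stable period and flips to True a few steps into the shift, once the large errors dominate the window; a production system would hook this signal to a retraining pipeline rather than act on each flag directly.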
The theory of multiple measurements techniques in distributed parameter systems
A comprehensive theory of multiple measurements for optimum on-line state estimation and parameter identification in a class of noisy, dynamic distributed systems is developed in this study. Often in practical monitoring and control problems, accurate measurements of a critical variable are not available in a desired form or at a desired sampling rate. Rather, noisy independent measurements of related forms of the variable may be available at different sampling rates. Multiple measurements theory thus involves the optimum weighting and combination of different types of available measurements. One of the contributions of this work is the development of a unique measurement projection method by which off-line measurements may be optimally utilized for on-line estimation and control.
The analysis of distributed systems often requires the establishment of monitoring stations. Another contribution of this study is the development of a measurement strategy, based on statistical experimental design techniques, for the optimum placement of spatial monitoring stations in a class of distributed systems.
By incorporating into the optimization criterion terms representing the realistic costs of making observations, an algorithm is developed for an estimator indicator whose values dictate an observation strategy for the optimum number and temporal intervals of observations. This, along with the optimum measurement stations, provides a comprehensive monitoring policy on which the estimation and control of a distributed system may be based.
By employing the measurement projection scheme and the monitoring policy, algorithms are further developed for Kalman-type distributed filters for the estimation of state profiles based on all available on-line and off-line measurements.
In the interest of a realistic engineering application, the developments in this study are based on a specific class of distributed systems representable by the mass transport models in environmental pollution systems. However, the techniques developed are equally applicable to a broader class of systems, including process control, where measurements may be characterized by noisy on-line instrumentation and off-line empirical laboratory tests.
Although pertinent field data were not available for the research, the multiple measurements techniques developed were applied to several simulated numerical examples that represent typical engineering problems. The results obtained demonstrate the consistent superiority of the techniques over existing estimation methods. Methods by which the results of this work may be integrated into real engineering problems are also discussed.
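The core operation behind optimally weighting measurement streams of different accuracies and sampling rates can be illustrated with a scalar Kalman measurement update. This is a static-state toy, not the thesis's distributed-parameter filter; the sensor variances and sampling rates are invented: a fast but noisy on-line sensor reports every step, while a precise off-line lab measurement arrives every fifth step.

```python
import random

def kalman_update(x, P, z, R):
    """Scalar Kalman measurement update: fuse the current estimate
    (mean x, variance P) with an observation z of noise variance R."""
    K = P / (P + R)                      # Kalman gain: trust vs. sensor noise
    return x + K * (z - x), (1 - K) * P  # updated mean and shrunken variance

random.seed(1)
true_value = 10.0
x, P = 0.0, 100.0  # vague prior on the monitored variable
for t in range(1, 21):
    # On-line sensor every step: cheap but noisy (variance 4.0)
    z_fast = true_value + random.gauss(0, 2.0)
    x, P = kalman_update(x, P, z_fast, 4.0)
    if t % 5 == 0:
        # Off-line lab result every 5th step: precise (variance 0.25)
        z_lab = true_value + random.gauss(0, 0.5)
        x, P = kalman_update(x, P, z_lab, 0.25)
```

Each update adds the measurement's precision (1/R) to the estimate's precision (1/P), so the rare, accurate lab measurements contribute far more weight per sample than the frequent noisy ones, which is exactly the optimum-weighting behavior the abstract describes.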