Quantitative Analysis of Saliency Models
Previous saliency detection research required the reader to evaluate
performance qualitatively, based on renderings of saliency maps on a few
shapes. This qualitative approach meant it was unclear which saliency models
were better, or how well they compared to human perception. This paper provides
a quantitative evaluation framework that addresses this issue. In the first
quantitative analysis of 3D computational saliency models, we evaluate four
computational saliency models and two baseline models against ground-truth
saliency collected in previous work.
Comment: 10 pages
SonoNet: Real-Time Detection and Localisation of Fetal Standard Scan Planes in Freehand Ultrasound
Identifying and interpreting fetal standard scan planes during 2D ultrasound
mid-pregnancy examinations are highly complex tasks which require years of
training. Apart from guiding the probe to the correct location, it can be
equally difficult for a non-expert to identify relevant structures within the
image. Automatic image processing can provide tools to help experienced as well
as inexperienced operators with these tasks. In this paper, we propose a novel
method based on convolutional neural networks which can automatically detect 13
fetal standard views in freehand 2D ultrasound data as well as provide a
localisation of the fetal structures via a bounding box. An important
contribution is that the network learns to localise the target anatomy using
weak supervision based on image-level labels only. The network architecture is
designed to operate in real-time while providing optimal output for the
localisation task. We present results for real-time annotation, retrospective
frame retrieval from saved videos, and localisation on a very large and
challenging dataset consisting of images and video recordings of full clinical
anomaly screenings. We found that the proposed method achieved an average
F1-score of 0.798 in a realistic classification experiment modelling real-time
detection, and obtained a 90.09% accuracy for retrospective frame retrieval.
Moreover, an accuracy of 77.8% was achieved on the localisation task.
Comment: 12 pages, 8 figures, published in IEEE Transactions on Medical Imaging
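The real-time detection result above is reported as an average F1-score. As background only (the function and variable names below are ours, not from the paper), a per-class F1-score is the harmonic mean of precision and recall:

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for one class,
    from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: 80 true positives, 20 false positives,
# 20 false negatives give precision = recall = 0.8, so F1 = 0.8.
print(round(f1_score(80, 20, 20), 3))
```

In a multi-class setting like the 13 standard views, the reported average is typically taken over per-class F1-scores.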
The role of saliency and error propagation in visual object recognition
Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 1995. Includes bibliographical references (p. 162-171). By Tao Daniel Alter, Ph.D.
Content Recognition and Context Modeling for Document Analysis and Retrieval
The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, the processing of documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge.
In this dissertation, we first present a novel approach for document image content categorization, using a lexicon of shape features. Each lexical word corresponds to a scale and rotation invariant local shape feature that is generic enough to be detected repeatably and is segmentation free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine printed text and handwriting.
Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (e.g., signatures and logos) provides a practical and reliable supplement to OCR of printed text. We propose a novel multi-scale framework to detect and segment signatures jointly from document images, based on the structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification.
Third, we present a model-based approach for extracting relevant named entities from unstructured documents. In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results due to the absence of linguistic context. Our approach enables learning of inference rules collectively based on contextual information from both page layout and text features.
Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and the overall web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance.
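The importance estimation above builds on surfer models. For background only, here is a minimal power-iteration sketch of the classic random-surfer (PageRank-style) computation; the dissertation's intentional surfer model differs by incorporating observed user behavior, and all names here are ours:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power iteration for random-surfer page importance.
    links: dict mapping each page to its list of outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # every page gets the teleportation share, then link shares
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Tiny illustrative graph: "a" is linked from both "b" and "c".
r = pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]})
```

Since "a" receives links from both other pages, it ends up with the highest importance; the ranks always sum to 1.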
Positive Data Clustering based on Generalized Inverted Dirichlet Mixture Model
Recent advances in the processing and networking capabilities of computers have led to the accumulation of immense amounts of multimodal multimedia data (image, text, video). These data are generally represented as high-dimensional feature vectors. The availability of these high-dimensional data sets has provided the input to a large variety of statistical learning applications, including clustering, classification, feature selection, outlier detection, and density estimation. In this context, a finite mixture offers a formal approach to clustering and a powerful tool for data modeling. A mixture model assumes that the data are generated by a set of parametric probability distributions. The learning process of a mixture model consists of two main parts: parameter estimation and model selection (estimating the number of components). In addition, other issues may be considered during the learning process of mixture models, such as a) feature selection and b) outlier detection. The main objective of this thesis is to work with different kinds of estimation criteria and to incorporate these challenges into a single framework.
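The parameter-estimation part described above is typically carried out by alternating E- and M-steps. As a hedged illustration only, the sketch below runs EM on a two-component 1D Gaussian mixture rather than the GID mixture the thesis actually uses; the E-step/M-step alternation is the same, but the density and all names are our stand-ins:

```python
import math

def em_gmm_1d(data, iters=50):
    """EM for a two-component 1D Gaussian mixture (illustrative
    stand-in for the GID mixture learning described above)."""
    # crude initialisation from the data range
    m1, m2 = min(data), max(data)
    v1 = v2 = (m2 - m1) ** 2 / 4 or 1.0
    w1 = 0.5
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point
        r = []
        for x in data:
            p1 = w1 * math.exp(-(x - m1) ** 2 / (2 * v1)) / math.sqrt(2 * math.pi * v1)
            p2 = (1 - w1) * math.exp(-(x - m2) ** 2 / (2 * v2)) / math.sqrt(2 * math.pi * v2)
            r.append(p1 / (p1 + p2))
        # M-step: re-estimate weight, means, variances from responsibilities
        n1 = sum(r)
        n2 = len(data) - n1
        w1 = n1 / len(data)
        m1 = sum(ri * x for ri, x in zip(r, data)) / n1
        m2 = sum((1 - ri) * x for ri, x in zip(r, data)) / n2
        v1 = sum(ri * (x - m1) ** 2 for ri, x in zip(r, data)) / n1 + 1e-6
        v2 = sum((1 - ri) * (x - m2) ** 2 for ri, x in zip(r, data)) / n2 + 1e-6
    return w1, m1, m2

# Two well-separated clusters around 0 and 5
data = [0.1, 0.0, -0.1, 0.2, 5.0, 5.1, 4.9, 5.2]
w1, m1, m2 = em_gmm_1d(data)
```

On this toy data the component means converge near the two cluster centers with a roughly equal mixing weight; model selection (choosing the number of components) is the separate problem the thesis handles with MML and BIC.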
The first contribution of this thesis is a statistical framework that tackles parameter estimation, model selection, feature selection, and outlier rejection in a unified model. We propose to use feature saliency and introduce an expectation-maximization (EM) algorithm for the estimation of the Generalized Inverted Dirichlet (GID) mixture model. Using the Minimum Message Length (MML) criterion, we can identify how much each feature contributes to the model as well as determine the number of components. The presence of outliers is an added challenge and is handled by incorporating an auxiliary outlier component, to which we associate a uniform density. Experimental results on synthetic data, as well as real-world applications involving visual scenes and object classification, indicate that the proposed approach is promising, even when a low-dimensional representation of the data is used. They also show the importance of embedding an outlier component in the proposed model. EM learning, however, suffers from significant drawbacks.
To overcome those drawbacks, our second contribution is a learning approach based on a Bayesian framework. This learning estimates the posterior distributions of the parameters while taking prior knowledge about them into account. The posterior distribution of each parameter in the model is computed using Markov chain Monte Carlo (MCMC) simulation methods, namely Gibbs sampling and Metropolis-Hastings. The Bayesian Information Criterion (BIC) is used for model selection. The proposed model is validated on object classification and forgery detection applications. For the first two contributions, we developed finite GID mixtures. In the third contribution, however, we propose an infinite GID mixture model that simultaneously tackles the clustering and feature selection problems; its learning is based on Gibbs sampling. The effectiveness of the proposed method is shown on an image categorization application. Our last contribution in this thesis is another fully Bayesian approach for learning a finite GID mixture model, using the Reversible Jump Markov Chain Monte Carlo (RJMCMC) technique. The proposed algorithm allows simultaneous handling of model selection and parameter estimation for high-dimensional data. The merits of this approach are investigated using synthetic data and data generated from a challenging application, namely object detection.
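The BIC-based model selection mentioned for the second contribution penalises log-likelihood by model complexity: BIC = k ln n − 2 ln L, where k is the number of free parameters, n the number of data points, and L the maximised likelihood. A minimal sketch with illustrative numbers of our own choosing (not results from the thesis):

```python
import math

def bic(log_likelihood, num_params, num_points):
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L); lower is better."""
    return num_params * math.log(num_points) - 2.0 * log_likelihood

# Hypothetical comparison of a 2-component vs 3-component mixture fit
# to 100 points: the 3-component model fits slightly better but pays
# a larger complexity penalty, so BIC prefers the 2-component model.
bic_2 = bic(log_likelihood=-250.0, num_params=5, num_points=100)
bic_3 = bic(log_likelihood=-248.0, num_params=8, num_points=100)
best = 2 if bic_2 < bic_3 else 3
```

The MML criterion used in the first contribution plays an analogous role but additionally weighs how much each feature contributes to the model.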