Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences
Given the lack of word delimiters in written Japanese, word segmentation is
generally considered a crucial first step in processing Japanese texts. Typical
Japanese segmentation algorithms rely either on a lexicon and syntactic
analysis or on pre-segmented data; but these are labor-intensive, and the
lexico-syntactic techniques are vulnerable to the unknown word problem. In
contrast, we introduce a novel, more robust statistical method utilizing
unsegmented training data. Despite its simplicity, the algorithm yields
performance on long kanji sequences comparable to and sometimes surpassing that
of state-of-the-art morphological analyzers over a variety of error metrics.
The algorithm also outperforms another mostly-unsupervised statistical
algorithm previously proposed for Chinese.
Additionally, we present a two-level annotation scheme for Japanese to
incorporate multiple segmentation granularities, and introduce two novel
evaluation metrics, both based on the notion of a compatible bracket, that can
account for multiple granularities simultaneously.
Comment: 22 pages. To appear in Natural Language Engineerin
Quickshift++: Provably Good Initializations for Sample-Based Mean Shift
We provide initial seedings to the Quick Shift clustering algorithm, which
approximate the locally high-density regions of the data. Such seedings act as
more stable and expressive cluster-cores than the singleton modes found by
Quick Shift. We establish statistical consistency guarantees for this
modification. We then show strong clustering performance on real datasets as
well as promising applications to image segmentation.
Comment: ICML 2018. Code release: https://github.com/google/quickshif
UPC-BMIC-VDU system description for the IWSLT 2010: testing several collocation segmentations in a phrase-based SMT system
This paper describes the UPC-BMIC-VMU participation in the IWSLT 2010 evaluation campaign. The SMT system is a standard phrase-based system enriched with novel segmentations. These novel segmentations are computed using statistical measures such as Log-likelihood, T-score, Chi-squared, Dice, Mutual Information or Gravity-Counts. The analysis of translation results allows us to divide the measures into three groups. First, Log-likelihood, Chi-squared and T-score tend to combine high-frequency words, and the collocation segments are very short. They improve the SMT system by adding new translation units. Second, Mutual Information and Dice tend to combine low-frequency words, and the collocation segments are short. They improve the SMT system by smoothing the translation units. Third, Gravity-Counts tends to combine high- and low-frequency words, and the collocation segments are long. However, in this case, the SMT system is not improved. Thus, the road-map for translation system improvement is to introduce new phrases with either low-frequency or high-frequency words. It is hard to improve translation quality by introducing new phrases that mix low- and high-frequency words. Experimental results are reported for the French-to-English IWSLT 2010 evaluation, where our system was ranked 3rd out of nine systems.
Postprint (published version
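As a rough illustration of two of the association measures named above (Dice and Mutual Information, here in its pointwise form), the sketch below scores adjacent word pairs from raw counts. The function name `association_scores`, the toy sentence, and the simple maximum-likelihood estimates are assumptions made for illustration; this is not the segmentation pipeline used in the paper.

```python
import math
from collections import Counter


def association_scores(tokens):
    """Return {(x, y): (dice, pmi)} for every adjacent word pair in tokens.

    Hypothetical helper: probabilities are plain relative frequencies, and
    the token count n is used as the total for both unigrams and bigrams
    (a common simplification).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (x, y), c_xy in bigrams.items():
        c_x, c_y = unigrams[x], unigrams[y]
        # Dice coefficient: 2 * joint count / sum of the marginal counts.
        dice = 2 * c_xy / (c_x + c_y)
        # Pointwise mutual information: log2( p(x, y) / (p(x) * p(y)) ).
        pmi = math.log2(c_xy * n / (c_x * c_y))
        scores[(x, y)] = (dice, pmi)
    return scores


scores = association_scores("new york is new york".split())
```

In this toy input, "new york" occurs twice and each word never occurs outside the pair, so its Dice score reaches the maximum of 1.0. Pairs with a high score under the chosen measure would then be candidates for merging into a single segment.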
Shape and data-driven texture segmentation using local binary patterns
We propose a shape- and data-driven texture segmentation method using local binary patterns (LBP) and active contours. In particular, we pass textured images through a new LBP-based filter, which produces non-textured images. In this “filtered” domain, each textured region of the original image exhibits a characteristic intensity distribution. In this domain we pose the segmentation problem as an optimization problem in a Bayesian framework. The cost functional contains a data-driven term, as well as a term that brings in information about the shapes of the objects to be segmented. We solve the optimization problem using level set-based active contours. Our experimental results on synthetic and real textures demonstrate the effectiveness of our approach in segmenting challenging textures as well as its robustness to missing data and occlusions.
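For readers unfamiliar with local binary patterns, the sketch below computes the basic 8-neighbour LBP code of a single 3x3 patch. It shows the standard textbook operator only; the abstract's "new LBP-based filter" is not specified here, and the function name `lbp_code` and the clockwise bit ordering are assumptions.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code for a 3x3 patch given as a list of 3 rows.

    Each neighbour contributes one bit: 1 if it is >= the centre pixel,
    else 0. Bit order (an assumed convention): clockwise from top-left.
    """
    center = patch[1][1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2),
               (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


flat_patch = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
peak_patch = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
```

A perfectly flat patch sets every bit, while an isolated bright centre sets none. Histograms of such codes over a region are what give each texture a characteristic signature.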
Mesh-to-raster based non-rigid registration of multi-modal images
Region of interest (ROI) alignment in medical images plays a crucial role in
diagnostics, procedure planning, treatment, and follow-up. Frequently, a model
is represented as triangulated mesh while the patient data is provided from CAT
scanners as pixel or voxel data. Previously, we presented a 2D method for
curve-to-pixel registration. This paper contributes (i) a general
mesh-to-raster (M2R) framework to register ROIs in multi-modal images; (ii) a
3D surface-to-voxel application, and (iii) a comprehensive quantitative
evaluation in 2D using ground truth provided by the simultaneous truth and
performance level estimation (STAPLE) method. The registration is formulated as
a minimization problem where the objective consists of a data term, which
involves the signed distance function of the ROI from the reference image, and
a higher order elastic regularizer for the deformation. The evaluation is based
on quantitative light-induced fluoroscopy (QLF) and digital photography (DP) of
decalcified teeth. STAPLE is computed on 150 image pairs from 32 subjects, each
showing one corresponding tooth in both modalities. The ROI in each image is
manually marked by three experts (900 curves in total). In the QLF-DP setting,
our approach significantly outperforms the mutual information-based
registration algorithm implemented with the Insight Segmentation and
Registration Toolkit (ITK) and Elastix.
Rapid Online Analysis of Local Feature Detectors and Their Complementarity
A vision system that can assess its own performance and take appropriate actions online to maximize its effectiveness would be a step towards achieving the long-cherished goal of imitating humans. This paper proposes a method for performing an online performance analysis of local feature detectors, the primary stage of many practical vision systems. It advocates the spatial distribution of local image features as a good performance indicator and presents a metric that can be calculated rapidly, concurs with human visual assessments and is complementary to existing offline measures such as repeatability. The metric is shown to provide a measure of complementarity for combinations of detectors, correctly reflecting the underlying principles of individual detectors. Qualitative results on well-established datasets for several state-of-the-art detectors are presented based on the proposed measure. Using a hypothesis testing approach and a newly-acquired, larger image database, statistically-significant performance differences are identified. Different detector pairs and triplets are examined quantitatively and the results provide a useful guideline for combining detectors in applications that require a reasonable spatial distribution of image features. A principled framework for combining feature detectors in these applications is also presented. Timing results reveal the potential of the metric for online applications. © 2013 by the authors; licensee MDPI, Basel, Switzerland
Prostate MR image segmentation using 3D active appearance models
This paper presents a method for automatic segmentation of the prostate from transversal T2-weighted images based on 3D Active Appearance Models (AAM). The algorithm consists of two stages. Firstly, Shape Context based non-rigid surface registration of the manually segmented images is used to obtain the point correspondence between the given training cases. Subsequently, an AAM is used to segment the prostate on 50 training cases. The method is evaluated using a 5-fold cross validation over 5 repetitions. The mean Dice similarity coefficient and 95% Hausdorff distance are 0.78 and 7.32 mm, respectively.
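The Dice similarity coefficient reported above measures the overlap between a predicted and a reference segmentation. A minimal sketch follows, assuming both masks are flattened binary sequences of equal length; the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions, not taken from the paper.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) of two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2 * intersection / total


predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
overlap = dice_coefficient(predicted, reference)
```

Here the masks share one foreground voxel out of two each, so the overlap is 0.5. A value of 1.0 means the masks are identical and 0.0 means they do not overlap at all.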
Sequential pattern formation governed by signaling gradients
Rhythmic and sequential segmentation of the embryonic body plan is a vital
developmental patterning process in all vertebrate species. However, a
theoretical framework capturing the emergence of dynamic patterns of gene
expression from the interplay of cell oscillations with tissue elongation and
shortening and with signaling gradients, is still missing. Here we show that a
set of coupled genetic oscillators in an elongating tissue that is regulated by
diffusing and advected signaling molecules can account for segmentation as a
self-organized patterning process. This system can form a finite number of
segments and the dynamics of segmentation and the total number of segments
formed depend strongly on kinetic parameters describing tissue elongation and
signaling molecules. The model accounts for existing experimental perturbations
to signaling gradients, and makes testable predictions about novel
perturbations. The variety of different patterns formed in our model can
account for the variability of segmentation between different animal species.
Comment: 12 pages, 5 figure
A Replica Inference Approach to Unsupervised Multi-Scale Image Segmentation
We apply a replica inference based Potts model method to unsupervised image
segmentation on multiple scales. This approach was inspired by the statistical
mechanics problem of "community detection" and its phase diagram. Specifically,
the problem is cast as identifying tightly bound clusters ("communities" or
"solutes") against a background or "solvent". Within our multiresolution
approach, we compute information theory based correlations among multiple
solutions ("replicas") of the same graph over a range of resolutions.
Significant multiresolution structures are identified by replica correlations
as manifest in information theory overlaps. With the aid of these correlations
as well as thermodynamic measures, the phase diagram of the corresponding Potts
model is analyzed both at zero and finite temperatures. Optimal parameters
corresponding to a sensible unsupervised segmentation correspond to the "easy
phase" of the Potts model. Our algorithm is fast and shown to be at least as
accurate as the best algorithms to date and to be especially suited to the
detection of camouflaged images.
Comment: 26 pages, 22 figure