95,992 research outputs found
Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for
the automatic segmentation of speech into topically coherent units. We propose
two methods for combining lexical and prosodic information using hidden Markov
models and decision trees. Lexical information is obtained from a speech
recognizer, and prosodic features are extracted automatically from speech
waveforms. We evaluate our approach on the Broadcast News corpus, using the
DARPA-TDT evaluation metrics. Results show that the prosodic model alone is
competitive with word-based segmentation methods. Furthermore, we achieve a
significant reduction in error by combining the prosodic and word-based
knowledge sources.Comment: 27 pages, 8 figure
Effective Use of Dilated Convolutions for Segmenting Small Object Instances in Remote Sensing Imagery
Thanks to recent advances in CNNs, solid improvements have been made in
semantic segmentation of high resolution remote sensing imagery. However, most
of the previous works have not fully taken into account the specific
difficulties that exist in remote sensing tasks. One of such difficulties is
that objects are small and crowded in remote sensing imagery. To tackle with
this challenging task we have proposed a novel architecture called local
feature extraction (LFE) module attached on top of dilated front-end module.
The LFE module is based on our findings that aggressively increasing dilation
factors fails to aggregate local features due to sparsity of the kernel, and
detrimental to small objects. The proposed LFE module solves this problem by
aggregating local features with decreasing dilation factor. We tested our
network on three remote sensing datasets and acquired remarkably good results
for all datasets especially for small objects
GeoSay: A Geometric Saliency for Extracting Buildings in Remote Sensing Images
Automatic extraction of buildings in remote sensing images is an important
but challenging task and finds many applications in different fields such as
urban planning, navigation and so on. This paper addresses the problem of
buildings extraction in very high-spatial-resolution (VHSR) remote sensing (RS)
images, whose spatial resolution is often up to half meters and provides rich
information about buildings. Based on the observation that buildings in VHSR-RS
images are always more distinguishable in geometry than in texture or spectral
domain, this paper proposes a geometric building index (GBI) for accurate
building extraction, by computing the geometric saliency from VHSR-RS images.
More precisely, given an image, the geometric saliency is derived from a
mid-level geometric representations based on meaningful junctions that can
locally describe geometrical structures of images. The resulting GBI is finally
measured by integrating the derived geometric saliency of buildings.
Experiments on three public and commonly used datasets demonstrate that the
proposed GBI achieves the state-of-the-art performance and shows impressive
generalization capability. Additionally, GBI preserves both the exact position
and accurate shape of single buildings compared to existing methods
Computerized Analysis of Magnetic Resonance Images to Study Cerebral Anatomy in Developing Neonates
The study of cerebral anatomy in developing neonates is of great importance for
the understanding of brain development during the early period of life. This
dissertation therefore focuses on three challenges in the modelling of cerebral
anatomy in neonates during brain development. The methods that have been
developed all use Magnetic Resonance Images (MRI) as source data.
To facilitate study of vascular development in the neonatal period, a set of image
analysis algorithms are developed to automatically extract and model cerebral
vessel trees. The whole process consists of cerebral vessel tracking from
automatically placed seed points, vessel tree generation, and vasculature
registration and matching. These algorithms have been tested on clinical Time-of-
Flight (TOF) MR angiographic datasets.
To facilitate study of the neonatal cortex a complete cerebral cortex segmentation
and reconstruction pipeline has been developed. Segmentation of the neonatal
cortex is not effectively done by existing algorithms designed for the adult brain
because the contrast between grey and white matter is reversed. This causes pixels
containing tissue mixtures to be incorrectly labelled by conventional methods. The
neonatal cortical segmentation method that has been developed is based on a novel
expectation-maximization (EM) method with explicit correction for mislabelled
partial volume voxels. Based on the resulting cortical segmentation, an implicit
surface evolution technique is adopted for the reconstruction of the cortex in
neonates. The performance of the method is investigated by performing a detailed
landmark study.
To facilitate study of cortical development, a cortical surface registration algorithm
for aligning the cortical surface is developed. The method first inflates extracted
cortical surfaces and then performs a non-rigid surface registration using free-form
deformations (FFDs) to remove residual alignment. Validation experiments using
data labelled by an expert observer demonstrate that the method can capture local
changes and follow the growth of specific sulcus
Correcting curvature-density effects in the Hamilton-Jacobi skeleton
The Hainilton-Jacobi approach has proven to be a powerful and elegant method for extracting the skeleton of two-dimensional (2-D) shapes. The approach is based on the observation that the normalized flux associated with the inward evolution of the object boundary at nonskeletal points tends to zero as the size of the integration area tends to zero, while the flux is negative at the locations of skeletal points. Nonetheless, the error in calculating the flux on the image lattice is both limited by the pixel resolution and also proportional to the curvature of the boundary evolution front and, hence, unbounded near endpoints. This makes the exact location of endpoints difficult and renders the performance of the skeleton extraction algorithm dependent on a threshold parameter. This problem can be overcome by using interpolation techniques to calculate the flux with subpixel precision. However, here, we develop a method for 2-D skeleton extraction that circumvents the problem by eliminating the curvature contribution to the error. This is done by taking into account variations of density due to boundary curvature. This yields a skeletonization algorithm that gives both better localization and less susceptibility to boundary noise and parameter choice than the Hamilton-Jacobi method
Prosody-Based Automatic Segmentation of Speech into Sentences and Topics
A crucial step in processing speech audio data for information extraction,
topic detection, or browsing/playback is to segment the input into sentence and
topic units. Speech segmentation is challenging, since the cues typically
present for segmenting text (headers, paragraphs, punctuation) are absent in
spoken language. We investigate the use of prosody (information gleaned from
the timing and melody of speech) for these tasks. Using decision tree and
hidden Markov modeling techniques, we combine prosodic cues with word-based
approaches, and evaluate performance on two speech corpora, Broadcast News and
Switchboard. Results show that the prosodic model alone performs on par with,
or better than, word-based statistical language models -- for both true and
automatically recognized words in news speech. The prosodic model achieves
comparable performance with significantly less training data, and requires no
hand-labeling of prosodic events. Across tasks and corpora, we obtain a
significant improvement over word-only models using a probabilistic combination
of prosodic and lexical information. Inspection reveals that the prosodic
models capture language-independent boundary indicators described in the
literature. Finally, cue usage is task and corpus dependent. For example, pause
and pitch features are highly informative for segmenting news speech, whereas
pause, duration and word-based cues dominate for natural conversation.Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2),
Special Issue on Accessing Information in Spoken Audio, September 200
- …