Topic-based mixture language modelling
This paper describes an approach for constructing a mixture of language models based on simple statistical notions of semantics using probabilistic models developed for information retrieval. The approach encapsulates corpus-derived semantic information and is able to model varying styles of text. Using such information, the corpus texts are clustered in an unsupervised manner and a mixture of topic-specific language models is automatically created. The principal contribution of this work is to characterise the document space resulting from information retrieval techniques and to demonstrate the approach for mixture language modelling.
A comparison is made between manual and automatic clustering in order to elucidate how the global content information is expressed in the space. We also compare (in terms of association with manual clustering and language modelling accuracy) alternative term-weighting schemes and the effect of singular value decomposition dimension reduction (latent semantic analysis). Test set perplexity results using the British National Corpus indicate that the approach can improve the potential of statistical language modelling. Using an adaptive procedure, the conventional model may be tuned to track text data with a slight increase in computational cost.
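The mixture idea in the abstract above can be illustrated with a minimal sketch. It assumes unigram component models and linear interpolation, which the abstract does not specify; the topic names, probabilities, and weights are made up for illustration and are not taken from the paper.

```python
import math

def mixture_prob(word, models, weights):
    """Probability of `word` under a linear interpolation of component models."""
    return sum(w * m.get(word, 0.0) for m, w in zip(models, weights))

def perplexity(words, models, weights):
    """Perplexity of a word sequence under the mixture.

    Assumes every word has non-zero mixture probability; a real system
    would need smoothing for unseen words.
    """
    log_sum = sum(math.log(mixture_prob(w, models, weights)) for w in words)
    return math.exp(-log_sum / len(words))

# Two hypothetical topic-specific unigram models (illustrative numbers only).
sport = {"goal": 0.5, "match": 0.4, "bank": 0.1}
finance = {"goal": 0.1, "match": 0.1, "bank": 0.8}

text = ["bank", "bank", "goal"]
equal_mix = perplexity(text, [sport, finance], [0.5, 0.5])
sport_only = perplexity(text, [sport, finance], [1.0, 0.0])
```

In this toy case the text is mostly about finance, so the equal-weight mixture gives a lower perplexity than the sport model alone. Adapting the weights to the text being tracked is one plausible reading of the adaptive procedure the abstract mentions.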
Bidirectional-Convolutional LSTM Based Spectral-Spatial Feature Learning for Hyperspectral Image Classification
This paper proposes a novel deep learning framework named bidirectional-convolutional long short term memory (Bi-CLSTM) network to automatically learn the spectral-spatial feature from hyperspectral images (HSIs). In the network, the issue of spectral feature extraction is considered as a sequence learning problem, and a recurrent connection operator across the spectral domain is used to address it. Meanwhile, inspired by the widely used convolutional neural network (CNN), a convolution operator across the spatial domain is incorporated into the network to extract the spatial feature. In addition, to sufficiently capture the spectral information, a bidirectional recurrent connection is proposed. In the classification phase, the learned features are concatenated into a vector and fed to a softmax classifier via a fully-connected operator. To validate the effectiveness of the proposed Bi-CLSTM framework, we compare it with several state-of-the-art methods, including the CNN framework, on three widely used HSIs. The obtained results show that Bi-CLSTM can improve the classification performance as compared to other methods.
Recognition and reconstruction of coherent energy with application to deep seismic reflection data
Reflections in deep seismic reflection data tend to be visible on only a limited number of traces in a common midpoint gather. To prevent stack degeneration, any noncoherent reflection energy has to be removed. In this paper, a standard classification technique in remote sensing is presented to enhance data quality. It consists of a recognition technique to detect and extract coherent energy in both common shot gathers and final stacks. This technique uses the statistics of a picked seismic phase to obtain the likelihood distribution of its presence. Multiplication of this likelihood distribution with the original data results in a “cleaned up” section. Application of the technique to data from a deep seismic reflection experiment enhanced the visibility of all reflectors considerably.
Because the recognition technique cannot produce an estimate of “missing” data, it is extended with a reconstruction method. Two methods are proposed: application of semblance weighted local slant stacks after recognition, and direct recognition in the linear tau-p domain. In both cases, the power of the stacking process to increase the signal-to-noise ratio is combined with the direct selection of only specific seismic phases. The joint application of recognition and reconstruction resulted in data images which showed reflectors more clearly than application of a single technique.
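The recognition step in the abstract above, multiplying a likelihood distribution with the original data, can be sketched as follows. This is only an illustration under strong assumptions: a single scalar feature per sample and a Gaussian likelihood built from the mean and standard deviation of the picked phase. The abstract does not say which statistics or which distribution the authors use, and the function names are hypothetical.

```python
import math

def gaussian_likelihood(x, mean, std):
    """Unnormalised Gaussian likelihood in (0, 1], equal to 1 when x == mean."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2)

def clean_trace(samples, features, mean, std):
    """Weight each sample by the likelihood that the picked phase is present.

    `samples` are the original amplitudes; `features` holds one scalar
    attribute per sample (an assumed stand-in for the phase statistics).
    """
    return [s * gaussian_likelihood(f, mean, std)
            for s, f in zip(samples, features)]

# Toy input: the first sample resembles the picked phase, the second does not.
cleaned = clean_trace([2.0, 2.0], [1.0, 5.0], mean=1.0, std=1.0)
```

Samples whose feature matches the picked phase pass almost unchanged, while the rest are strongly attenuated. Nothing is added where the data are absent, which fits the abstract's point that recognition alone cannot estimate missing data.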
Advanced correlation-based character recognition applied to the Archimedes Palimpsest
The Archimedes Palimpsest is a manuscript containing the partial text of seven treatises by Archimedes that were copied onto parchment and bound in the tenth century AD. This work is aimed at providing tools that allow scholars of ancient Greek mathematics to retrieve as much information as possible from images of the remaining degraded text. A correlation pattern recognition (CPR) system has been developed to recognize distorted versions of Greek characters in problematic regions of the palimpsest imagery, which have been obscured by damage from mold and fire, overtext, and natural aging. Feature vectors for each class of characters are constructed using a series of spatial correlation algorithms and corresponding performance metrics. Principal components analysis (PCA) is employed prior to classification to remove features corresponding to filtering schemes that performed poorly for the spatial characteristics of the selected region-of-interest (ROI). A probability is then assigned to each class, forming a character probability distribution based on relative distances from the class feature vectors to the ROI feature vector in principal component (PC) space. However, the current CPR system does not produce a single classification decision, as is common in most target detection problems, but instead has been designed to provide intermediate results that allow the user to apply his or her own decisions (or evidence) to arrive at a conclusion. To achieve this result, a probabilistic network has been incorporated into the recognition system. A probabilistic network represents a method for modeling the uncertainty in a system, and for this application, it allows information from the existing partial transcription and contextual knowledge from the user to be an integral part of the decision-making process.
The CPR system was designed to provide a framework for future research in the area of spatial pattern recognition by accommodating a broad range of applications and the development of new filtering methods. For example, during preliminary testing, the CPR system was used to confirm the publication date of a fifteenth-century Hebrew colophon, and demonstrated success in the detection of registration markers in three-dimensional MRI breast imaging. In addition, a new correlation algorithm that exploits the benefits of linear discriminant analysis (LDA) and the inherent shift invariance of spatial correlation has been derived, implemented, and tested. Results show that this composite filtering method provides a high level of class discrimination while maintaining tolerance to within-class distortions. With the integration of this algorithm into the existing filter library, this work completes each stage of a cyclic workflow using the developed CPR system, and provides the necessary tools for continued experimentation.
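The abstract above describes assigning each character class a probability from its relative distance to the ROI feature vector in PC space. A minimal sketch of one way to do that follows. It assumes normalised inverse Euclidean distance, which is not necessarily the mapping the author used; the class labels and coordinates are hypothetical.

```python
import math

def class_probabilities(roi, class_vectors, eps=1e-12):
    """Turn distances in feature space into a probability distribution.

    Each class gets a weight of 1 / distance to the ROI vector, and the
    weights are normalised to sum to 1. `eps` avoids division by zero
    when the ROI coincides with a class vector.
    """
    inverse = {label: 1.0 / (math.dist(roi, vec) + eps)
               for label, vec in class_vectors.items()}
    total = sum(inverse.values())
    return {label: value / total for label, value in inverse.items()}

# Hypothetical two-dimensional PC-space coordinates for two character classes.
classes = {"alpha": (1.0, 0.0), "beta": (3.0, 0.0)}
probs = class_probabilities((0.0, 0.0), classes)
```

Returning the whole distribution rather than only the nearest class matches the abstract's design goal of giving the user intermediate results instead of a single classification decision.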
Damage and repair classification in reinforced concrete beams using frequency domain data
This research aims at developing a new vibration-based damage classification technique that can be applied efficiently to large volumes of real-time data. The statistical pattern recognition paradigm is well suited to building a reliable site-location damage diagnosis system. By adopting this paradigm, finite element and other inverse models, with their intensive computations, corrections and inherent inaccuracies, can be avoided. In this research, a two-stage combination of principal component analysis and the Karhunen-Loève transformation (also known as canonical correlation analysis) was proposed as a statistical damage classification technique. Vibration measurements from the frequency domain were tested as possible damage-sensitive features. The performance of the proposed system was tested and verified on real vibration measurements collected from five laboratory-scale reinforced concrete beams modelled with various ranges of defects. The results of the system helped in distinguishing between normal and damaged patterns in structural vibration data. Most importantly, the system further divided each main damage group reasonably well into subgroups according to their severity of damage. Its efficiency was demonstrated on data from both frequency response functions and response-only functions. The outcomes of this two-stage system showed realistic detection and classification and outperformed the results from principal component analysis alone. The success of this classification model is credible because the observed clusters come from well-controlled and known state conditions.