
    Topic-based mixture language modelling

    This paper describes an approach for constructing a mixture of language models based on simple statistical notions of semantics using probabilistic models developed for information retrieval. The approach encapsulates corpus-derived semantic information and is able to model varying styles of text. Using such information, the corpus texts are clustered in an unsupervised manner and a mixture of topic-specific language models is automatically created. The principal contribution of this work is to characterise the document space resulting from information retrieval techniques and to demonstrate the approach for mixture language modelling. A comparison is made between manual and automatic clustering in order to elucidate how the global content information is expressed in the space. We also compare (in terms of association with manual clustering and language modelling accuracy) alternative term-weighting schemes and the effect of singular value decomposition dimension reduction (latent semantic analysis). Test set perplexity results using the British National Corpus indicate that the approach can improve the potential of statistical language modelling. Using an adaptive procedure, the conventional model may be tuned to track text data with a slight increase in computational cost.
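
As a rough illustration of the mixture idea and of test-set perplexity, the sketch below interpolates two topic-specific models and compares perplexity under two sets of mixture weights. It is a minimal sketch, not the paper's method: it assumes unigram models for brevity, and the topic names, probabilities, weights, and function names are all hypothetical.

```python
import math


def mixture_prob(word, components, weights):
    """Interpolated probability: sum over k of weights[k] * P_k(word)."""
    return sum(w * model.get(word, 0.0) for w, model in zip(weights, components))


def perplexity(words, components, weights):
    """Test-set perplexity, exp(-mean log P(word)), under the mixture.

    Assumes every test word has non-zero mixture probability (no smoothing here).
    """
    log_sum = sum(math.log(mixture_prob(w, components, weights)) for w in words)
    return math.exp(-log_sum / len(words))


# Two hypothetical topic-specific unigram models over a toy vocabulary.
sport = {"goal": 0.5, "match": 0.4, "bank": 0.1}
finance = {"goal": 0.1, "match": 0.1, "bank": 0.8}

# A toy test text that is mostly about sport.
text = ["goal", "match", "goal", "bank"]

uniform = perplexity(text, [sport, finance], [0.5, 0.5])
tuned = perplexity(text, [sport, finance], [0.9, 0.1])
```

On this toy text, shifting weight towards the sport component lowers perplexity, which is the kind of effect an adaptive reweighting of the mixture is meant to exploit.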

    Neural networks and spectra feature selection for retrival of hot gases temperature profiles

    Proceeding of: International Conference on Computational Intelligence for Modelling, Control and Automation, 2005 and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, Vienna, Austria, 28-30 Nov. 2005. Neural networks appear to be a promising tool for solving so-called inverse problems, which aim to retrieve certain physical properties related to the radiative transfer of energy. In this paper, the capability of neural networks to retrieve the temperature profile in a combustion environment is examined. The temperature profile is retrieved from measurements of the spectral distribution of energy radiated by the hot gases (combustion products) at wavelengths in the infrared region. High spectral resolution is usually needed to achieve acceptable accuracy in the retrieval process. However, this large amount of information makes a reduction of the dimensionality of the problem mandatory, so a careful selection of wavelengths in the spectrum must be performed. For this purpose, principal component analysis is used to automatically determine the wavelengths in the spectrum that carry relevant information on the temperature distribution. A multilayer perceptron is then trained with the energies associated with the selected wavelengths. The results presented show that a multilayer perceptron combined with principal component analysis is a suitable alternative in this field.

    Web news classification using neural networks based on PCA

    In this paper, we propose a news web page classification method (WPCM). The WPCM uses a neural network whose inputs are obtained from both principal components and class profile-based features (CPBF). A fixed number of regular words from each class is used as feature vectors, together with the reduced features from the PCA. These feature vectors are then used as the input to the neural network for classification. The experimental evaluation demonstrates that the WPCM provides acceptable classification accuracy on the sports news datasets.

    A D.C. Programming Approach to the Sparse Generalized Eigenvalue Problem

    In this paper, we consider the sparse eigenvalue problem wherein the goal is to obtain a sparse solution to the generalized eigenvalue problem. We achieve this by constraining the cardinality of the solution to the generalized eigenvalue problem and obtain sparse principal component analysis (PCA), sparse canonical correlation analysis (CCA) and sparse Fisher discriminant analysis (FDA) as special cases. Unlike the $\ell_1$-norm approximation to the cardinality constraint, which previous methods have used in the context of sparse PCA, we propose a tighter approximation that is related to the negative log-likelihood of a Student's t-distribution. The problem is then framed as a d.c. (difference of convex functions) program and is solved as a sequence of convex programs by invoking the majorization-minimization method. The resulting algorithm is proved to exhibit \emph{global convergence} behavior, i.e., for any random initialization, the sequence (subsequence) of iterates generated by the algorithm converges to a stationary point of the d.c. program. The performance of the algorithm is empirically demonstrated on both sparse PCA (finding few relevant genes that explain as much variance as possible in a high-dimensional gene dataset) and sparse CCA (cross-language document retrieval and vocabulary selection for music retrieval) applications. Comment: 40 pages
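
One way to write down the kind of problem this abstract describes is sketched below. The symbols are assumptions, since none appear in the listing: $A$ and $B$ are the matrices of the generalized eigenvalue problem, $x$ is the solution vector, $\rho > 0$ is a sparsity penalty, $\varepsilon > 0$ is a smoothing parameter, and $x^{(l)}$ is the current iterate. The log-based surrogate is one plausible form of a tighter-than-$\ell_1$ approximation and may differ in detail from the one used in the paper.

```latex
% Cardinality-penalised generalized eigenvalue problem (assumed form)
\max_{x}\; x^{\top} A x - \rho\,\|x\|_{0}
\quad \text{s.t.} \quad x^{\top} B x \le 1

% Log-based surrogate for the cardinality (assumed form)
\|x\|_{0} \;\approx\; \sum_{i} \frac{\log\!\left(1 + |x_{i}|/\varepsilon\right)}{\log\!\left(1 + 1/\varepsilon\right)}

% Majorization step: the logarithm is concave in |x_i|, so its tangent
% at the current iterate x^{(l)} is an upper bound
\log\!\left(1 + \frac{|x_{i}|}{\varepsilon}\right)
\;\le\;
\log\!\left(1 + \frac{|x_{i}^{(l)}|}{\varepsilon}\right)
+ \frac{|x_{i}| - |x_{i}^{(l)}|}{|x_{i}^{(l)}| + \varepsilon}
```

Replacing each logarithm by its tangent bound turns the penalty into a weighted $\ell_1$ term with weights $1/(|x_{i}^{(l)}| + \varepsilon)$, which is why each majorization-minimization step can be posed as a simpler subproblem than the original.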

    Spectral high resolution feature selection for retrieval of combustion temperature profiles

    Proceeding of: 7th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2006 (Burgos, Spain, September 20-23, 2006). Using high spectral resolution measurements to retrieve certain physical properties related to the radiative transfer of energy should, a priori, lead to better accuracy. However, this improvement in accuracy is not easy to achieve because of the large amount of data, which makes any processing of the data and its redundancies difficult. To solve this problem, a pick selection based on principal component analysis has been adopted to perform the necessary feature selection over the different channels. In this paper, the capability to retrieve the temperature profile in a combustion environment using neural networks together with this high spectral resolution feature selection method is studied.

    Multivariate texture discrimination using a principal geodesic classifier

    A new texture discrimination method is presented for classification and retrieval of colored textures represented in the wavelet domain. The interband correlation structure is modeled by multivariate probability models which constitute a Riemannian manifold. The presented method considers the shape of the class on the manifold by determining the principal geodesic of each class. The method, which we call principal geodesic classification, then determines the shortest distance from a test texture to the principal geodesic of each class. We use the Rao geodesic distance (GD) for calculating distances on the manifold. We compare the performance of the proposed method with distance-to-centroid and k-nearest neighbor classifiers, and that of the GD with the Euclidean distance. The principal geodesic classifier coupled with the GD yields better results, indicating the usefulness of effectively and concisely quantifying the variability of the classes in the probabilistic feature space.

    Semantic distillation: a method for clustering objects by their contextual specificity

    Techniques for data-mining, latent semantic analysis, contextual search of databases, etc. were developed long ago by computer scientists working on information retrieval (IR). Experimental scientists from all disciplines, who have to analyse large collections of raw experimental data (astronomical, physical, biological, etc.), have developed powerful methods for their statistical analysis and for clustering, categorising, and classifying objects. Finally, physicists have developed a theory of quantum measurement, unifying the logical, algebraic, and probabilistic aspects of queries into a single formalism. The purpose of this paper is twofold: first, to show that when formulated at an abstract level, problems from IR, from statistical data analysis, and from physical measurement theories are very similar and hence can profitably be cross-fertilised, and, secondly, to propose a novel method of fuzzy hierarchical clustering, termed \textit{semantic distillation} (strongly inspired by the theory of quantum measurement), which we developed to analyse raw data coming from various types of experiments on DNA arrays. We illustrate the method by analysing DNA array experiments and clustering the genes of the array according to their specificity. Comment: Accepted for publication in Studies in Computational Intelligence, Springer-Verlag