30 research outputs found

    Novelty Detection by Latent Semantic Indexing

    As a new topic in text mining, novelty detection is a natural extension of information retrieval systems, or search engines. By refining raw search results, filtering out old news and keeping only the novel messages, it spares readers from information overload. One of the difficulties in novelty detection is the inherent ambiguity of language, which is the carrier of information. Among the sources of ambiguity, synonymy proves to be a notable factor. To address this issue, previous studies mainly employed WordNet, a lexical database which can be perceived as a thesaurus. Rather than borrowing a dictionary, we proposed a statistical approach employing Latent Semantic Indexing (LSI) to learn semantic relationships automatically with the help of language resources. An immediate problem in applying LSI, which involves matrix factorization, is that the dataset in novelty detection is dynamic and constantly changing. To imitate a real-world scenario, texts are ranked in chronological order and examined one by one. Each text is compared only with those that appeared earlier, while later ones remain unknown. As a result, the data matrix starts as a one-row vector representing the first report, and gains a new row at the bottom every time a new document is read. Such a changing dataset makes it hard to employ matrix methods directly. Although LSI has long been acknowledged as an effective text mining method when considering semantic structure, it has never been used in novelty detection, nor have other statistical treatments. We tried to change this situation by introducing an external text source to build the latent semantic space, onto which the incoming news vectors were projected. We used the Reuters-21578 dataset and the TREC data as sources of latent semantic information. Topics were divided into years and types in order to take the differences between them into account.
Results showed that LSI, though very effective in traditional information retrieval tasks, yielded only a slight improvement in performance for some data types. The extent of improvement depended on the similarity between the news data and the external information. An examination of the co-occurrence matrix attributed this limited performance to the unique features of microblogs. Their short sentence lengths and restricted dictionary made it very hard to recover and exploit latent semantic information via traditional data structure
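
The projection idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy four-term vocabulary, the use of cosine similarity, and the "one minus maximum similarity" novelty score are all assumptions made for illustration. The latent basis is taken from a truncated SVD of an external documents-by-terms matrix, and each incoming document is compared only with documents seen earlier.

```python
import numpy as np

def build_latent_space(external_docs_by_terms, k):
    """Truncated SVD of an external (documents x terms) matrix.
    Returns the top-k right singular vectors as a (terms x k) basis."""
    _, _, vt = np.linalg.svd(external_docs_by_terms, full_matrices=False)
    return vt[:k].T

def novelty_scores(stream_docs_by_terms, basis):
    """Project each incoming document onto the latent basis and score it
    against earlier documents only: score = 1 - max cosine similarity.
    (Hypothetical scoring rule; the first document is scored 1.0.)"""
    scores = []
    seen = []  # unit-length latent vectors of documents read so far
    for row in stream_docs_by_terms:
        z = row @ basis
        norm = np.linalg.norm(z)
        z = z / norm if norm > 0 else z
        if seen:
            scores.append(1.0 - max(float(z @ s) for s in seen))
        else:
            scores.append(1.0)
        seen.append(z)
    return scores

# Toy vocabulary (assumed): car, automobile, banana, fruit.
external = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 1, 1],
                     [0, 0, 1, 1]], dtype=float)
basis = build_latent_space(external, k=2)

# Stream: "car", then "automobile", then "banana".
stream = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 1, 0]], dtype=float)
scores = novelty_scores(stream, basis)
```

In this toy case the second document shares no term with the first, so a raw term-vector comparison would treat it as entirely new, while in the latent space the two coincide because "car" and "automobile" co-occur in the external corpus. Some fold-in variants also rescale by the inverse singular values; that step is omitted here.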

    Interfacial instability and spray heat transfer problems of two phase flow.

    This thesis describes detailed investigations of two different problems in gas-liquid two-phase flow, namely, a study of interfacial stability in a partially filled cylinder subjected to vertical oscillations and a study of heat and mass transfer from hot spray droplets injected into a closed vessel. The interfacial instability study considers experimental data taken from the author's previous work. Cylinders of various diameters, partially filled with water, ethanol or glycerol, were subjected to a sinusoidal vertical motion. The critical acceleration, causing the interfacial wave to grow unstable, was found to be approximately constant for a given cylinder diameter, independent of the amplitude of the forcing oscillations. The experiments also indicate that the critical acceleration always decreases with increasing cylinder diameter. A mathematical analysis of the interfacial instability is based on a stability investigation of a Mathieu equation. It is shown that the experimental data fall into unstable regions for a single, first mode of oscillations. This finding is supported by the experimental analysis given by Ciliberto and Gollub. The analysis shows the effects of the liquid column height on the interfacial instability to be dependent on tanh (k..l.). This multiplier is equal to 1 for the column heights investigated (250 mm, 500 mm and 750 mm) and a given cylinder diameter, thus having no effect on the results. A computational analysis of the interfacial problem is developed, based on the simplified MAC method incorporating the Continuum Surface Force (CSF) model for simulating the effects of surface tension. Computational experiments were run for water and glycerol, two liquids of significantly different properties. The results are presented in the form of time-sequenced plots showing the interfacial positions and graphs relating the interfacial wave amplitude and time.
Stability of the interface is found to be dependent on the initial surface disturbance. Growth of the interfacial wave is observed in some cases. In the range of situations investigated, surface tension effects are found to have only a small influence on both the stability and the frequency of the interfacial oscillations. The period of interfacial oscillations with no forcing vibrations is found to be in good agreement with the period predicted by mathematical analysis. The influence of the initial disturbance profile was also investigated. The results indicate that the interfacial wave adopts oscillatory behaviour similar to the other cases. The oscillation frequency of the interfacial wave undergoing forcing vibrations is found to match the findings of the mathematical analysis. The wave oscillates with an angular velocity equal to multiples of half the forcing vibration angular velocity, ω/2. In the second investigation a testing rig was constructed to investigate the heat and mass transfer processes in dense hot sprays injected into an enclosed cylindrical vessel. Heat and mass transfer rates were investigated indirectly from measurements of the gas-vapour mixture pressure rise in the cylinder. The experiments covered different combinations of the parameters influencing the processes. The number and size of spray nozzles, the vessel volume, the type of gas and the initial pressure level in the cylinder were investigated. The experimental results indicate that, for the range of solid cone nozzles tested, the heat and mass transfer characteristics are, to a first approximation, independent of the size of the nozzles. The results also show that the rise of spray chamber internal pressure is directly proportional to liquid temperature and flowrate.
An analysis, based on energy balances for the whole cylinder, has yielded a new dimensionless group incorporating the important parameters of droplet heat transfer, namely the droplet velocity and radius, spray chamber dimensions, gravity, conductivity and convectivity. A good match has been found between the analytical results and experimental findings. An improved analysis, incorporating the effect of evaporation from drops, is also presented. It is based on simultaneous solution of energy and mass balance equations for a single droplet. Again, good agreement with the experimental results is found. Both analyses indicate that, for this particular case of dense, evaporative spray, the Nusselt number tends to have a value equal to I
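
For orientation on the interfacial part of this abstract, the textbook form of the Mathieu equation for a vertically forced free surface is sketched below. This is the standard inviscid result, not necessarily the thesis's own notation: the symbols a (mode amplitude), k (wavenumber), h (liquid depth), g (gravity), σ (surface tension), ρ (density), f (forcing acceleration amplitude) and ω (forcing angular velocity) are all assumptions introduced here.

```latex
% Amplitude equation for one surface mode under vertical forcing
\ddot{a} + k \tanh(kh)\left[ g + \frac{\sigma k^{2}}{\rho} - f\cos(\omega t) \right] a = 0

% Natural frequency of the unforced mode
\omega_{0}^{2} = k \tanh(kh)\left( g + \frac{\sigma k^{2}}{\rho} \right)

% Standard Mathieu form with \tau = \omega t / 2
\frac{d^{2}a}{d\tau^{2}} + \left( p - 2q\cos 2\tau \right) a = 0,
\qquad p = \frac{4\omega_{0}^{2}}{\omega^{2}},
\qquad q = \frac{2 k \tanh(kh)\, f}{\omega^{2}}
```

In this form the first unstable region lies near p = 1, that is ω₀ = ω/2, which is consistent with the half-frequency response reported in the abstract. For deep liquid columns the tanh factor approaches 1, so the depth drops out of the stability condition.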

    Design and analysis of numerical algorithms for the solution of linear systems on parallel and distributed architectures

    The increasing availability of parallel computers is having a very significant impact on all aspects of scientific computation, including algorithm research and software development in numerical linear algebra. In particular, the solution of linear systems, which lies at the heart of most calculations in scientific computing, is an important computation found in many engineering and scientific applications. In this thesis, well-known parallel algorithms for the solution of linear systems are compared with implicit parallel algorithms, or the Quadrant Interlocking (QI) class of algorithms, to solve linear systems. These implicit algorithms are (2x2) block algorithms expressed in explicit point form notation. [Continues.]

    Optimization based clustering and classification algorithms in analysis of microarray gene expression data sets

    Doctor of Philosophy. Bioinformatics and computational biology are relatively new areas that involve the use of techniques from different fields, including computer science, informatics, biochemistry and applied mathematics, to solve biological problems. In recent years the development of new molecular genetics technologies, such as DNA microarrays, has led to the simultaneous measurement of expression levels of thousands and even tens of thousands of genes. Microarray gene expression technology has facilitated the study of genomic structure and the investigation of biological systems. The numerical output of this technology takes the form of microarray gene expression data sets. These data sets contain a very large number of genes and a relatively small number of samples, and their precise analysis requires robust and suitable computer software. Because of this, only a few existing algorithms are applicable to them, so more efficient methods for solving clustering, gene selection and classification problems on gene expression data sets are required, and those methods need to be computationally applicable and less expensive. The aim of this thesis is to develop new algorithms for solving clustering, gene selection and data classification problems on gene expression data sets. Clustering in gene expression data sets is a challenging problem. The increasing use of DNA microarray-based tumour gene expression profiles for cancer diagnosis requires more efficient methods to solve clustering problems of these profiles. Different algorithms for clustering of genes have been proposed; however, few algorithms can be applied to the clustering of samples. The k-means algorithm is one of the very few clustering algorithms applicable to microarray gene expression data sets; however, it is not efficient for solving clustering problems when the number of genes is in the thousands, and it is very sensitive to the choice of a starting point.
Additionally, when the number of clusters is relatively large, this algorithm gives local minima which can differ significantly from the global solution. Over the last several years different approaches have been proposed to improve the global search properties of the k-means algorithm. One of them is the global k-means algorithm; however, this algorithm is not efficient when data are sparse. In this thesis we developed a new version of the global k-means algorithm, the modified global k-means algorithm, which is effective for solving clustering problems in gene expression data sets. In a microarray gene expression data set, in many cases only a small fraction of genes are informative, whereas most of them are non-informative and contribute noise. Therefore the development of gene selection algorithms that allow us to remove as many non-informative genes as possible is very important. In this thesis we developed a new overlapping gene selection algorithm. This algorithm is based on calculating overlaps of different genes. It considerably reduces the number of genes and is efficient in finding a subset of informative genes. Over the last decade different approaches have been proposed to solve supervised data classification problems in gene expression data sets. In this thesis we developed a new approach which is based on the so-called max-min separability and is compared with the other approaches. The max-min separability algorithm is an equivalent of piecewise linear separability. An incremental algorithm is presented to compute piecewise linear functions separating two sets. This algorithm is applied along with a special gene selection algorithm. In this thesis, all new algorithms have been tested on 10 publicly available gene expression data sets, and our numerical results demonstrate the efficiency of the new algorithms that were developed in the framework of this research
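
The incremental idea behind the global k-means algorithm mentioned in this abstract can be sketched as follows. This is a minimal version of the commonly described baseline scheme (grow the solution one centre at a time, trying every data point as the new centre), not the thesis's modified algorithm; the function names and the toy data are assumptions for illustration.

```python
import numpy as np

def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations from the given starting centres.
    Returns (centres, total squared error)."""
    centers = centers.copy()
    for _ in range(iters):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members):  # leave an empty cluster's centre unchanged
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.min(axis=1).sum()

def global_kmeans(points, k):
    """Start from the data centroid, then add one centre at a time,
    keeping the candidate start point that gives the lowest error."""
    centers = points.mean(axis=0, keepdims=True)
    for _ in range(1, k):
        best = None
        for candidate in points:
            trial = np.vstack([centers, candidate])
            result = kmeans(points, trial)
            if best is None or result[1] < best[1]:
                best = result
        centers = best[0]
    return centers

# Toy data (assumed): three well-separated pairs of points.
pts = np.array([[0, 0], [0, 1], [10, 0], [10, 1], [5, 20], [5, 21]],
               dtype=float)
found = global_kmeans(pts, 3)
```

Because every data point is tried as the next starting centre, the result does not depend on a random initialisation, at the cost of many more k-means runs. That cost is one reason faster or modified variants are of interest for data sets with thousands of genes.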

    Integrated models approach for the prediction of aerosols in biomass power generation systems

    The present work is aimed at the prediction of gaseous alkali sulphates with the RNA approach. In the last year, FLUENT has also developed a version of the reactor network, and this thesis focuses mainly on the formation of the reactors and the implementation of the kinetics. The first part of the work was dedicated to a literature study of the IPFR and 500 kW experimental apparatus, both located at the ENEL laboratories in Livorno. An analysis of the experimental data was carried out with regard to the major pollutants and the production of particulate, with the purpose of deriving any trends for the prediction of particulates as the experimental apparatus and the fuel are varied. The second part of the work was dedicated to the CFD modeling of the KVSA experimental apparatus, a 500 kW biomass-powered unit located at the University of Stuttgart. Using the ANSYS 16.0 package, the geometry and grid domain were built and simulations were carried out. The modeling adopted some strategies to simplify the calculation of the reaction kinetics and decrease the computational cost: the high number of computational cells (about 10^6) and equations (necessary for the description of a multiphase turbulent reactive system) makes it unworkable to adopt the complex kinetic scheme needed to describe the formation of aerosols. The goal is to obtain the fields of temperature, chemical species and density; these data are the input for the RNA models. The third part of the work was dedicated to the post-processing, that is, searching the literature for the kinetic schemes and thermodynamic data and writing them in the format required by FLUENT. The RNA technique divides the domain into a series of macro-regions that are almost homogeneous in terms of temperature, density and major chemical species. These regions are treated as perfectly stirred reactors in which the formation of aerosols can be calculated with detailed kinetics, including hundreds of species and thousands of chemical reactions.
Once the results were obtained, the various chemical species were compared while varying the number of reactors. The last part of the work was dedicated to the formulation of detailed parametric models of deposition in the convection zone of the combustion chambers for all the experimental apparatus examined, namely IPFR, 500 kW and KVSA. The objective is to be able to study the deposition tendency of alkali on tube bank surfaces by applying a condensation mechanism. The deposition models can quantify the deposition rate of alkali compounds under specific conditions of the convective pass and give an estimate of the factors most favourable to deposition

    Nonclassicality detection and communication bounds in quantum networks

    Quantum information investigates the possibility of enhancing our ability to process and transmit information by directly exploiting quantum mechanical laws. When searching for improvement opportunities, one typically starts by assessing the range of outcomes classically attainable, and then investigates to what extent control over the quantum features of the system could be helpful, as well as the best performance that could be achieved. In this thesis we provide examples of these aspects, in linear optics, quantum metrology, and quantum communication. We start by providing a criterion able to certify whether the outcome of a linear optical evolution cannot be explained by the classical wave-like theory of light. We do so by identifying a tight lower bound on the amount of correlations that could be detected among output intensities, when classical electrodynamics theory is used to describe the fields. Rather than simply detecting nonclassicality, we then focus on its quantification. In particular, we consider the characterisation of the amount of squeezing encoded on selected quantum probes by an unknown external device, without prior information on the direction of application. We identify the single-mode Gaussian probes leading to the largest average precision in noiseless and noisy conditions, and discuss the advantages arising from the use of correlated two-mode probes. Finally, we improve current bounds on the ultimate performance attainable in a quantum communication scenario. Specifically, we bound the number of maximally entangled qubits, or private bits, shared by two parties after a communication protocol over a quantum network, without restrictions on their classical communication. As in previous investigations, our approach is based on the evaluation of the maximum amount of entanglement that could be generated by the channels in the network, but it includes the possibility of changing entanglement measure on a channel-by-channel basis. 
Examples where this is advantageous are discussed. Open Access

    Evolutionary genomics : statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.