15 research outputs found

    Statistical Modelling

    Get PDF
    The book collects the proceedings of the 19th International Workshop on Statistical Modelling, held in Florence in July 2004. Statistical modelling is an important cornerstone of many scientific disciplines, and the workshop has provided a rich environment for the cross-fertilization of ideas across disciplines. The proceedings comprise four invited lectures, 48 contributed papers and 47 posters. The contributions are arranged in the following sessions: Statistical Modelling; Statistical Modelling in Genomics; Semi-parametric Regression Models; Generalized Linear Mixed Models; Correlated Data Modelling; Missing Data, Measurement Error and Survival Analysis; Spatial Data Modelling; and Time Series and Econometrics.

    Molecular insights to crustacean phylogeny

    Get PDF
    This thesis aims to resolve the internal relationships of the major crustacean groups by inferring phylogenies from molecular data. New molecular and neuroanatomical data support the scenario that the Hexapoda may have evolved from within the Crustacea. Most molecular studies of crustaceans have relied on single-gene or multigene analyses, in most cases using partly sequenced rRNA genes. However, intensive data-quality and alignment assessments prior to phylogenetic reconstruction are not conducted in most studies. One methodological aim of this thesis was therefore to implement new tools to assess data quality, to improve alignment quality and to test the impact of complex modelling of the data. Two of the three phylogenetic analyses in this thesis are also based on rRNA genes. In analysis (A), 16S rRNA, 18S rRNA and COI sequences were analysed. To improve data quality, the COI fragment was RY-coded, an alignment procedure that considers the secondary structure of RNA molecules was applied, and alignment positions of ambiguous positional homology were excluded. Nevertheless, extensive network reconstructions showed that, despite this extensive data processing and optimization, the signal quality in these commonly used markers is not sufficient to infer crustacean phylogeny. This result casts new light on previous studies relying on these markers. In analysis (B), completely sequenced 18S and 28S rRNA genes were used to reconstruct the phylogeny. Based on the findings of analysis (A), base compositional heterogeneity was taken into account, in addition to secondary-structure-based alignment optimization and alignment assessment. The complex modelling required to compare time-heterogeneous versus time-homogeneous processes, in combination with mixed models implementing secondary structure, was only possible using the Bayesian software package PHASE. The results clearly demonstrated that complex modelling matters and that ignoring time-heterogeneous processes can mislead phylogenetic reconstruction. Some results illuminate crustacean phylogeny: for the first time the Cephalocarida (Hutchinsoniella macracantha) were placed in a clade with the Branchiopoda, which is morphologically plausible. Compared with the time-homogeneous tree, the time-heterogeneous tree gives lower support values for some nodes, suggesting that incorporating base compositional heterogeneity in phylogenetic analysis improves the reliability of the topology. The Pancrustacea are maximally supported in both approaches, but internal relationships are not reliably reconstructed. One conclusion of this analysis is that the phylogenetic signal in rRNA data may be eroded for crustaceans. Recent publications have presented analyses based on phylogenomic data, mainly to reconstruct metazoan phylogeny; the supermatrix method seems to outperform the supertree approach. In this thesis the supermatrix approach was applied. Crustaceans were collected for EST sequencing projects, and the resulting sequences were combined with public sequence data in a phylogenomic analysis (C). New and innovative reduction heuristics were applied to condense the dataset. The results showed that the matrix built from the reduced dataset yields a more reliable topology in which most nodes are highly supported. In analysis (C) the Branchiopoda were placed as the sister group to the Hexapoda, a result that differs from analyses (A) and (B) but is in line with other phylogenomic studies.
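To illustrate the RY coding step mentioned above, the following minimal Python sketch recodes a nucleotide sequence into the two-state purine/pyrimidine alphabet; the function name and the handling of gaps and ambiguity codes are illustrative assumptions, not the pipeline actually used in the thesis.

RY_MAP = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}

def ry_recode(sequence: str) -> str:
    # Purines (A, G) -> R, pyrimidines (C, T/U) -> Y; gaps and ambiguity
    # codes are passed through unchanged.
    return "".join(RY_MAP.get(base, base) for base in sequence.upper())

print(ry_recode("ATGCGT-ACN"))  # -> RYRYRY-RYN

Because both purines map to R and both pyrimidines to Y, transitions become invisible and only transversions contribute signal, which reduces the impact of saturation and compositional bias at fast-evolving positions such as the third codon positions of COI.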

    Annual Report

    Get PDF

    Separability between signal and noise components using the distribution of scaled Hankel matrix eigenvalues with application in biomedical signals.

    Get PDF
    Biomedical signals are records from human and animal bodies. These records are nonlinear time series that hold important information about the physiological activities of organisms and embrace many subjects of interest. However, biomedical signals are often corrupted by artifacts and noise, which require separation or signal extraction before any statistical evaluation. Another challenge is that biomedical data are often non-stationary, particularly when an abnormal event such as an epileptic seizure is observed within the signal, and can also exhibit chaotic behaviour. The literature suggests that distinguishing chaos from noise remains a highly contentious issue, as it has been historically, because chaos and noise share common properties that can make them indistinguishable. We seek to provide a viable solution to this problem by presenting a novel approach for separating signal and noise components and for differentiating noise from chaos. Several methods have been used for the analysis of, and discrimination between, different categories of biomedical signals, but many of these rest on restrictive assumptions of normality, stationarity and linearity of the observed data. An improved technique that is robust in its analysis of non-stationary time series is therefore of paramount importance for the accurate diagnosis of human diseases. Singular Spectrum Analysis (SSA) does not depend on these assumptions, which makes it attractive for analysing and modelling biomedical data. The main aim of the thesis is therefore to provide a novel approach for developing the SSA technique and then to apply it to the analysis of biomedical signals. SSA is a reliable technique for separating an arbitrary signal from a noisy time series (signal + noise). It is based on two main choices: the window length, L, and the number of eigenvalues, r. These values play an important role in the reconstruction and forecasting stages, and the main issue in extracting signals with SSA lies in identifying the optimal values of L and r required for signal reconstruction. The aim of this thesis is to develop theoretical and methodological aspects of the SSA technique, to present a novel approach to distinguishing between deterministic and stochastic processes, and to present an algorithm for identifying the eigenvalues corresponding to the noise component, thereby choosing the optimal value of r relating to the desired signal so that signal and noise can be separated. The algorithm can be considered an enhanced version of the SSA method, decomposing a noisy series into the sum of a signal and noise. Although the main focus of this thesis is the selection of the optimal value of r, we also provide some results and recommendations on the choice of L for separability. Several criteria are introduced which characterise this separability. The proposed approach is based on the distribution of the eigenvalues of a scaled Hankel matrix, and on dynamical systems, the embedding theorem, matrix algebra and statistical theory. The research demonstrates that the proposed approach can be considered an alternative and promising technique for choosing the optimal values of r and L in SSA, especially for biomedical signals and genetic time series.
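As a rough illustration of the decomposition described above, the following Python/NumPy sketch embeds a series into an L x K trajectory (Hankel) matrix, computes a scaled eigenvalue spectrum, and reconstructs a rank-r signal by diagonal averaging; the function name and the scaling by the trace are assumptions made for the example, not the exact formulation used in the thesis.

import numpy as np

def ssa_reconstruct(x, L, r):
    # Embed the series into the L x K trajectory (Hankel) matrix.
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - L + 1
    X = np.column_stack([x[j:j + L] for j in range(K)])
    # Squared singular values are the eigenvalues of X X^T; dividing by
    # their sum gives a scaled spectrum that can be split into a signal
    # group and a noise group.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scaled_eigvals = s**2 / np.sum(s**2)
    # Keep the r leading eigentriples as the presumed signal subspace.
    X_r = (U[:, :r] * s[:r]) @ Vt[:r, :]
    # Diagonal (anti-diagonal) averaging maps the matrix back to a series.
    signal = np.array([np.fliplr(X_r).diagonal(K - 1 - t).mean() for t in range(N)])
    return signal, scaled_eigvals

A call such as ssa_reconstruct(series, L=len(series) // 2, r=2) reflects the two choices the abstract highlights: the window length L (here roughly half the series length) and the number of retained eigenvalues r.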
For the theoretical development of the approach, we present new theoretical results on the eigenvalues of a scaled Hankel matrix, provide some properties of the eigenvalues, and show the effect of the window length and the rank of the Hankel matrix on the eigenvalues. The new theoretical results are examined using simulated and real time series. Furthermore, the effect of the window length on the distribution of the largest and smallest eigenvalues of the scaled Hankel matrix is also considered for the white noise process. The results indicate that the distribution of the largest eigenvalue for the white noise process is positively skewed for different series lengths and different values of the window length, whereas the distribution of the smallest eigenvalue shows a different pattern with L: the distribution shifts from left to right as L increases. These results, together with other results obtained by the different criteria introduced and used in this research, are very promising for the identification of the signal subspace. For the practical aspect and empirical results, various biomedical signals and genetic time series are used. First, to achieve the objectives of the thesis, a comprehensive study has been made of the distribution, pattern and behaviour of the scaled Hankel matrix eigenvalues. Furthermore, the normal distribution with different parameters is considered and the effects of the scale and shape parameters are evaluated. The correlation between eigenvalues is also assessed, using parametric and non-parametric association criteria. In addition, the distributions of eigenvalues for synthetic time series generated from some well-known low-dimensional chaotic systems are analysed in depth. The results yield several important properties with broad application, enabling the distinction between chaos and noise in time series analysis. At this stage, the main result of the simulation study is that the findings for series generated from a normal distribution with mean zero (the white noise process) are totally different from those obtained for the other series considered in this research, which makes a novel contribution to the area of signal processing and noise reduction. Second, the proposed approach and its criteria are applied to a number of simulated and real data sets with different levels of noise and different structures. Our results are compared with those obtained by common and well-known criteria in order to evaluate, enhance and confirm the accuracy of the approach and its criteria. The results indicate that the proposed approach has the potential to split the eigenvalues into two groups: the first corresponding to the signal and the second to the noise component. In addition, based on the results, the optimal value of L needed for the reconstruction of a noise-free signal from a noisy series should be the median of the series length. The results confirm that the proposed approach can improve the quality of the reconstruction step for signal extraction. Finally, the thesis explores the applicability of the proposed approach for discriminating between normal and epileptic-seizure electroencephalography (EEG) signals, and for filtering the signal segments to make them free from noise. Various criteria based on the largest eigenvalue are also presented and used as features to distinguish between normal and epileptic EEG segments. These features can be considered useful information for classifying brain signals.
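As a hedged illustration of how the white-noise result above could be examined empirically, the sketch below simulates the largest scaled eigenvalue of the Hankel trajectory matrix for Gaussian white noise and estimates its skewness; the series length, window length and number of replications are arbitrary choices for the example, not values used in the thesis.

import numpy as np

def largest_scaled_eigenvalue(x, L):
    # Largest eigenvalue of X X^T divided by the trace, where X is the
    # L x K Hankel (trajectory) matrix of the series x.
    K = len(x) - L + 1
    X = np.column_stack([x[j:j + L] for j in range(K)])
    eigvals = np.linalg.eigvalsh(X @ X.T)  # returned in ascending order
    return eigvals[-1] / eigvals.sum()

rng = np.random.default_rng(0)
N, L, reps = 200, 50, 2000
samples = np.array([largest_scaled_eigenvalue(rng.standard_normal(N), L) for _ in range(reps)])
# A positive sample skewness here is consistent with the positively skewed
# distribution the abstract reports for the white noise process.
skewness = np.mean((samples - samples.mean())**3) / samples.std()**3
print(f"mean = {samples.mean():.4f}, skewness = {skewness:.3f}")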
In addition, the approach is applied to the removal of nonspecific noise from Drosophila segmentation genes. Our findings indicate that when extracting signal from different genes, a different number of eigenvalues needs to be chosen for each gene to optimise the separation of signal and noise.

    Hidden Markov Models

    Get PDF
    Hidden Markov Models (HMMs), although known for decades, have become very widely used in recent years and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neurosciences, computational biology, bioinformatics, seismology, environmental protection and engineering. I hope that readers will find this book useful and helpful for their own research.

    Book of abstracts

    Get PDF

    Wheat Improvement

    Get PDF
    This open-access textbook provides a comprehensive, up-to-date guide for students and practitioners wishing to access, in a single volume, the key disciplines and principles of wheat breeding. Wheat is a cornerstone of food security: it is the most widely grown of any crop and provides 20% of all human calories and protein. The authorship of this book includes world-class researchers and breeders whose expertise spans cutting-edge academic science all the way to impacts in farmers' fields. The book's themes and authors were selected to provide a didactic work that considers the background to wheat improvement, current mainstream breeding approaches, and translational research and avant-garde technologies that enable new breakthroughs in science to impact productivity. While the volume provides an overview for professionals interested in wheat, many of the ideas and methods presented are equally relevant to small-grain cereals and crop improvement in general. The book is affordable and, because it is open access, can be readily shared and translated, in whole or in part, for university classes, members of breeding teams (from directors to technicians), conference participants, extension agents and farmers. Given the challenges currently faced by academia, industry and national wheat programs in producing higher crop yields, often with fewer inputs and under increasingly harsh climates, this volume is a timely addition to their toolkit.