671 research outputs found

    Application of Wavelet Packet Transform to detect genetic polymorphisms by the analysis of inter-Alu PCR patterns

    Abstract

    Background: The analysis of inter-Alu PCR patterns obtained from human genomic DNA samples is a promising technique for the simultaneous analysis of many genomic loci flanked by Alu repetitive sequences, with the aim of detecting genetic polymorphisms. Inter-Alu PCR products may be separated and analyzed by capillary electrophoresis using an automatic sequencer that generates a complex pattern of peaks. We propose an algorithmic method based on the Haar-Walsh Wavelet Packet Transformation (WPT) for the efficient detection of fingerprint-type patterns generated by PCR-based methodologies. We tested our approach on inter-Alu patterns obtained from the genomic DNA of three pairs of monozygotic twins, expecting that differences between the inter-Alu patterns within each twin pair would arise from unavoidable experimental variability. By contrast, differences among samples from different twin pairs are expected to originate from genetic variability. Our goal is to automatically detect regions in the inter-Alu pattern that are likely associated with genetic polymorphisms.

    Results: We show that the WPT algorithm provides a reliable tool for identifying sample-to-sample differences in complex peak patterns, reducing the errors and limitations associated with subjective evaluation. The redundant decomposition of the WPT algorithm allows a best-basis selection procedure that maximizes the pattern differences at the lowest possible scale. Our analysis identifies a few classifying signal regions that could indicate the presence of genetic polymorphisms.

    Conclusions: The WPT algorithm based on the Haar-Walsh wavelet is an efficient tool for unsupervised pattern classification of inter-Alu signals provided by a genetic analyzer, although it was not possible to estimate the power and false positive rate owing to the lack of a suitable database. The identification of non-reproducible peaks is usually accomplished by comparing different experimental replicates of each sample. Moreover, although we developed and optimized an algorithm to analyze patterns obtained through inter-Alu PCR, the method is in principle applicable to any fingerprint-type pattern obtained by analyzing anonymous DNA fragments through capillary electrophoresis, and it could be usefully applied to a wide range of fingerprint-type methodologies.
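To make the idea in the abstract above concrete, here is a minimal sketch of a full Haar wavelet packet decomposition applied to two toy peak patterns. It is an illustration under stated assumptions, not the authors' implementation: the function names, the made-up signal values, and the per-node squared-difference score are all hypothetical, the nodes are left in filter-bank order rather than Walsh (sequency) order, and the best-basis selection step is not implemented.

```python
import math


def haar_split(signal):
    """Split an even-length signal into orthonormal Haar averages and differences."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def wavelet_packet(signal, levels):
    """Return the packet tree as a list of levels; level k holds 2**k nodes."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    tree = [[list(signal)]]
    for _ in range(levels):
        next_level = []
        for node in tree[-1]:
            # Unlike the plain wavelet transform, BOTH halves are split again.
            low, high = haar_split(node)
            next_level.extend([low, high])
        tree.append(next_level)
    return tree


def energy(nodes):
    """Total squared magnitude over all coefficients in a list of nodes."""
    return sum(x * x for node in nodes for x in node)


# Two toy "peak patterns" that differ only around one peak (made-up values).
pattern_a = [0.0, 1.0, 4.0, 1.0, 0.0, 2.0, 0.0, 0.0]
pattern_b = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]

tree_a = wavelet_packet(pattern_a, 2)
tree_b = wavelet_packet(pattern_b, 2)

# Hypothetical score: squared difference between the patterns, node by node,
# at the deepest level of the tree.
node_diffs = [
    sum((x - y) ** 2 for x, y in zip(node_a, node_b))
    for node_a, node_b in zip(tree_a[-1], tree_b[-1])
]
```

Because each Haar split is orthonormal, every level of the tree carries the same total energy as the input, so the per-node differences simply redistribute the overall difference between the two patterns across sub-bands.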

    Use of wavelet-packet transforms to develop an engineering model for multifractal characterization of mutation dynamics in pathological and nonpathological gene sequences

    This study uses dynamical analysis to examine quantitatively the information coding mechanism in DNA sequences. This goes beyond the simple dichotomy of modeling the mechanism by comparing DNA sequence walks as fractional Brownian motion (fBm) processes. The 2-D mappings of the DNA sequences used in this research come from Iterated Function System (IFS) mappings, also known as the Chaos Game Representation (CGR). This technique converts a 1-D sequence into a 2-D representation that preserves subsequence structure and provides a visual representation. The second step of the analysis applies Wavelet Packet Transforms, a recently developed technique from the field of signal processing. A multi-fractal model is built by using wavelet transforms to estimate the Hurst exponent, H. The Hurst exponent is a non-parametric measurement of the dynamism of a system. This procedure is used to evaluate gene-coding events in the DNA sequence of cystic fibrosis mutations. The H exponent is calculated for various mutation sites in this gene. The results indicate the presence of anti-persistent, random-walk, and persistent sub-periods in the sequence, which suggests that the hypothesis of a multi-fractal model of DNA information encoding warrants further consideration. This work examines the model's behavior in both pathological (mutated) and non-pathological (healthy) base pair sequences of the cystic fibrosis gene. The mutations, both natural and synthetic, were introduced by computer manipulation of the original base pair text files. The results show that disease severity and system information dynamics correlate. These results have implications for genetic engineering as well as for mathematical biology. They suggest that there is scope for more multi-fractal models to be developed.
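The Chaos Game Representation mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration, not the study's code: the corner assignment (A, C, G, T at the four corners of the unit square) is one common convention and is an assumption here, as are the function name and the toy input.

```python
# Assumed corner layout; other assignments of bases to corners are also in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(sequence, start=(0.5, 0.5)):
    """Map a DNA string to 2-D points by moving halfway to each base's corner."""
    x, y = start
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


points = chaos_game("ACGT")
```

Each point depends on the whole prefix read so far, with the most recent bases weighted most heavily, which is why sequences sharing a suffix land in the same sub-square and subsequence structure is preserved.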

    Wavelet Transform-Based Phylogenetic Analysis of Protein Sequences

    With the acceleration of gene sequencing studies, large amounts of biological data are emerging. Analyzing these data contributes greatly to understanding metabolic disorders in organisms and to increasing the efficacy of drugs. For this purpose, it is critical to classify the data accurately, quickly, and at low cost according to its characteristics and relationships. Besides experimental methods, machine learning and bioinformatics methods are used; artificial neural networks, support vector machines, and flexible calculation methods are frequently applied. However, the effectiveness of these methods on biosequence data depends on using the method with the most appropriate parameters and on how protein sequences are converted into numerical sequences. When the sequences are transformed using amino acid frequencies, the properties of the amino acids are ignored. Taking the physicochemical properties of amino acids (hydrophobicity, hydrophilicity ...) into account therefore improves the performance of classification techniques. The phylogenetic tree is the best way to visualize the classification among species. In this project, the wavelet transform used in the analysis of digital signals was adapted to protein sequences represented by hydrophobicity values. Each protein sequence was treated as a signal, the signal was decomposed by the wavelet transform into approximation and detail components, the similarities between them were calculated, and the phylogenetic tree of the species was created. As an application, phylogenetic trees of the ND5 protein sequences of 22 species were created in MATLAB R2017 using the Neighbor-Joining (NJ) and Unweighted Pair Group Method with Arithmetic Mean (UPGMA) methods.
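The pipeline described in the abstract above (protein to hydrophobicity signal, then wavelet approximation and detail components, then a similarity measure) might be sketched as follows. Everything here is an assumption for illustration: the abstract does not say which hydrophobicity scale, wavelet, or similarity measure was used, so the sketch uses the Kyte-Doolittle scale, a one-level Haar transform, and a Euclidean distance truncated to the shorter profile. The peptides are made up, and tree building (NJ or UPGMA) is left out.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; not named in the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_signal(protein):
    """Convert a one-letter amino acid string into a numeric signal."""
    return [KYTE_DOOLITTLE[residue] for residue in protein.upper()]


def haar_dwt(signal):
    """One-level Haar transform: returns (approximation, detail) coefficients."""
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])  # pad odd lengths by repeating the last value
    s = 1.0 / math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) * s for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) * s for i in range(0, len(values), 2)]
    return approx, detail


def profile_distance(a, b):
    """Euclidean distance over the common prefix (a crude length alignment)."""
    n = min(len(a), len(b))
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(n)))


# Two made-up peptides that differ at two positions.
approx_1, detail_1 = haar_dwt(hydropathy_signal("MKTLLVAG"))
approx_2, detail_2 = haar_dwt(hydropathy_signal("MKSLIVAG"))
dist = profile_distance(approx_1, approx_2)
```

A pairwise matrix of such distances over all species is the kind of input that NJ or UPGMA tree construction expects.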

    The Bioinformatics Tools for Discovery of Genetic Diversity by Means of Elastic Net and Hurst Exponent

    The genome era has allowed us to evaluate different aspects of genetic variation precisely, providing valuable guidance for advancing knowledge and improving human life. To examine these valuable resources, bioinformatics tools permit a deep exploration of the data. Among them, we show the importance of the discrete non-decimated wavelet transform (NDWT). Wavelets are well suited to capturing hidden components of biological data and provide an efficient link between biological systems and the mathematical objects used to describe them. Decomposing signals or sequences at different levels of resolution yields distinct characteristics at each level. Wavelet-based analysis is increasingly used in the study of genomes. One of the great advantages of this method is the computational gain: the analyses are processed almost in real time. It is applicable in several areas of science, such as physics, mathematics, engineering, and genetics, among others. In this context, we believe that using R software and applying the NDWT coupled with elastic net domains and the Hurst exponent will be a valuable guideline for genetics researchers investigating genetic variability.

    WAVELET ANALYSIS OF SHORT GLOBULAR HOMOLOGOUS PROTEINS IN MESOPHILE AND THERMOPHILE PROKARYOTES

    This study looked to identify features related to thermal stability and function in the amino acid chains of short globular proteins from mesophile and thermophile species, within the constraint that the protein fold to perform a specific function. To do so, 540 homologous pairs of proteins were studied. The amino acid chains were converted to hydrophobicity signals by assigning a hydropathy score to each residue in the polypeptide. The hydrophobicity signals were passed through a wavelet packet transform and the resulting spectra analyzed. Bootstrapping was used to generate a control data set to determine if the true ordering of amino acids codes for a non-random fluctuation in hydropathy along the length of the polypeptide. A method to relate the spectral characteristics to the function of a protein making use of gene ontologies was developed as a proof of concept. As a group, mesophile and thermophile proteins have very similar total power. However, on a protein-to-protein basis the thermophile contains a greater total power in 489 of the 540 pairs (90.56%). The hydrophobicity scale used in this study is strongly correlated with Gibbs free energy. The total power of a protein is also strongly correlated to the Gibbs free energy, so that the thermophile protein contains a greater free energy than its corresponding mesophile partner. It has been noted in the experimental literature that thermophile proteins are stabilized by increasing their Gibbs free energy. The statistical measures skew and kurtosis were adapted so that a spectrum of skew and kurtosis values was generated for each protein. These values indicate that the fluctuation in hydropathy is non-random and position dependent. Thermophile proteins have larger power at frequency bands 21 through 31 (average intervals of 100 to 77 amino acids), and 44 to 56 (on average 46 to 19 amino acids), which may contribute to their having greater total power in 90.56% of the pairs. Increases to the fluctuation in hydropathy within certain lengths throughout the total amino acid chain of a protein may be a means of raising the temperature at which a protein denatures.
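The control-data idea in the abstract above, testing whether the true residue ordering produces a non-random hydropathy fluctuation, can be illustrated with a small resampling sketch. The abstract does not describe the actual bootstrap procedure, so this is a stand-in under assumptions: it shuffles the signal (a permutation-style control), uses the power in the first-level Haar detail band as a hypothetical order-sensitive statistic, and runs on a made-up alternating signal.

```python
import random


def detail_power(signal):
    """Power in the first-level Haar detail band: sum of (x[2i] - x[2i+1])**2 / 2."""
    return sum(
        (signal[i] - signal[i + 1]) ** 2 / 2.0
        for i in range(0, len(signal) - 1, 2)
    )


def shuffle_control(signal, n_shuffles=200, seed=0):
    """Compare the observed band power against shuffled orderings of the same values.

    Returns the observed statistic and an empirical p-value, i.e. the fraction
    of orderings (including the observed one) with power at least as large.
    """
    rng = random.Random(seed)
    observed = detail_power(signal)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        shuffled = list(signal)
        rng.shuffle(shuffled)  # same composition, randomized order
        if detail_power(shuffled) >= observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (n_shuffles + 1)
    return observed, p_value


# Made-up signal that alternates strongly hydrophobic and hydrophilic scores.
toy_signal = [4.5, -4.5] * 8
observed, p_value = shuffle_control(toy_signal)
```

Shuffling keeps the amino acid composition fixed, so the total power (sum of squares) of the signal is unchanged, while any band-limited statistic such as the one above responds only to the ordering.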

    Human Promoter Prediction Using DNA Numerical Representation

    With the emergence of genomic signal processing, numerical representation techniques for the DNA alphabet set {A, G, C, T} play a key role in applying digital signal processing and machine learning techniques to the processing and analysis of DNA sequences. The choice of numerical representation affects how well the biological properties of a DNA sequence are reflected in the numerical domain for detecting and identifying the characteristics of special regions of interest within the sequence. This dissertation presents a comprehensive study of various DNA numerical and graphical representation methods and their applications in processing and analyzing long DNA sequences. Discussions of the relative merits and demerits of the various methods, experimental results, and possible future developments are also included. Another focus of the research is promoter prediction in human (Homo sapiens) DNA sequences with a neural-network-based multi-classifier system using DNA numerical representation methods. In spite of the recent development of several computational methods for human promoter prediction, there is a need for performance improvement. In particular, the high false positive rate of the feature-based approaches decreases the prediction reliability and leads to erroneous results in gene annotation. To improve the prediction accuracy and reliability, DigiPromPred, a numerical-representation-based promoter prediction system, is proposed to characterize DNA alphabets in different regions of a DNA sequence. The DigiPromPred system is found to predict promoters with a sensitivity of 90.8% while reducing the false prediction rate for non-promoter sequences, with a specificity of 90.4%. The comparative study with state-of-the-art promoter prediction systems for human chromosome 22 shows that the proposed system maintains a good balance between prediction accuracy and reliability. To reduce the system architecture and computational complexity compared to the existing system, a simple feed-forward neural network classifier known as SDigiPromPred is proposed. The SDigiPromPred system is found to predict promoters with sensitivities of 87%, 87%, and 99% while reducing the false prediction rate for non-promoter sequences, with specificities of 92%, 94%, and 99% for Human, Drosophila, and Arabidopsis sequences respectively, with reconfigurable capability compared to the existing system.
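As a small illustration of what a DNA numerical representation looks like, and of how the sensitivity and specificity figures quoted above are defined, here is a minimal sketch. The binary indicator (Voss-style) mapping is one well-known representation chosen for illustration; the abstract does not state which representations DigiPromPred uses, and the function names are hypothetical.

```python
def indicator_sequences(sequence):
    """Binary indicator representation: one 0/1 sequence per nucleotide."""
    seq = sequence.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in "ACGT"}


def sensitivity(true_positives, false_negatives):
    """Fraction of real promoters that the classifier detects."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of non-promoters that the classifier correctly rejects."""
    return true_negatives / (true_negatives + false_positives)


indicators = indicator_sequences("ACGTA")
```

The four indicator sequences can be fed to standard signal-processing tools (for example a Fourier or wavelet transform) or concatenated into a feature vector for a classifier, and sensitivity and specificity are then computed from the classifier's confusion counts on labeled promoter and non-promoter sequences.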

    An investigation into the requirements for an efficient image transmission system over an ATM network

    This thesis looks into the problems arising in an image transmission system when transmitting over an ATM network. Two main areas were investigated: (i) an alternative coding technique to reduce the bit rate required; and (ii) concealment of errors due to cell loss, with emphasis on processing in the transform domain of DCT-based images. [Continues.]

    Multimodal Biometrics Enhancement Recognition System based on Fusion of Fingerprint and PalmPrint: A Review

    This article is an overview of current multimodal biometrics research based on fingerprint and palm-print. It explains the previous studies for each modality separately and its fusion technique with another biometric modality. The basic biometric system consists of four stages: firstly, the sensor which is used for enrolmen