1,231 research outputs found

    Information decomposition of symbolic sequences

    We developed a non-parametric method of Information Decomposition (ID) of the content of any symbolic sequence. The method is based on calculating the Shannon mutual information between the analyzed sequence and artificial symbolic sequences, and it reveals latent periodicity in any symbolic sequence. We show that the ID method remains stable when a large number of random letter changes are introduced into the analyzed sequence. We demonstrate the capabilities of the method by analyzing poems as well as DNA and protein sequences. We show that many DNA and amino acid sequences have latent periodicity of different types and lengths. The possible origin of latent periodicity in different symbolic sequences is discussed. Comment: 18 pages, 8 figures
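
    As a rough illustration of the idea in this abstract, the sketch below computes the Shannon mutual information between a sequence and an artificial periodic sequence. It assumes the artificial sequence is simply the repeating index 0, 1, ..., n-1 for a trial period n; the abstract does not spell out the construction, and the function names `mutual_information` and `period_spectrum` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(seq, period):
    """Mutual information (bits) between each symbol and its phase i % period.

    Assumption: the artificial sequence is the repeating index 0..period-1.
    """
    n = len(seq)
    joint = Counter((i % period, s) for i, s in enumerate(seq))
    phase = Counter(i % period for i in range(n))
    symbol = Counter(seq)
    mi = 0.0
    for (p, s), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * log2(count * n / (phase[p] * symbol[s]))
    return mi


def period_spectrum(seq, max_period):
    """Mutual information for every trial period from 2 to max_period."""
    return {p: mutual_information(seq, p) for p in range(2, max_period + 1)}
```

    A period whose value stands well above the others in the spectrum is a candidate latent period. Judging significance would require a comparison against shuffled sequences, which this sketch leaves out.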

    Detecting short adjacent repeats in multiple sequences: a Bayesian approach.

    Li, Qiwei. Thesis (M.Phil.)--Chinese University of Hong Kong, 2010. Includes bibliographical references (p. 75-85). Abstracts in English and Chinese.
    Contents:
    Abstract
    Acknowledgement
    Chapter 1. Introduction
      1.1 Repetitive DNA Sequence
        1.1.1 Definition and Categorization of Repetitive DNA Sequence
        1.1.2 Definition and Categorization of Tandem Repeats
        1.1.3 Definition and Categorization of Interspersed Repeats
      1.2 Research Significance
      1.3 Contributions
      1.4 Thesis Organization
    Chapter 2. Literature Review and Overview of Our Method
      2.1 Existing Methods
      2.2 Overview of Our Method
    Chapter 3. Theoretical Background
      3.1 Multinomial Distributions
      3.2 Dirichlet Distribution
      3.3 Metropolis-Hastings Sampling
      3.4 Gibbs Sampling
    Chapter 4. Problem Description
      4.1 Generative Model
        4.1.1 Input Data R
        4.1.2 Parameters A (Repeat Segment Starting Positions)
        4.1.3 Parameters S (Repeat Segment Structures)
        4.1.4 Parameters θ (Motif Matrix)
        4.1.5 Parameters Φ (Background Distribution)
        4.1.6 An Example of the Model Schematic Diagram
      4.2 Parameter Structure
      4.3 Posterior Distribution
        4.3.1 The Full Posterior Distribution
        4.3.2 The Collapsed Posterior Distribution
      4.4 Conclusion
    Chapter 5. Methodology
      5.1 Schematic Procedure
        5.1.1 The Basic Schematic Procedure
        5.1.2 The Improved Schematic Procedure
      5.2 Initialization
      5.3 Predictive Update Step for θ_n and Φ_n
      5.4 Gibbs Sampling Step for a_n
      5.5 Metropolis-Hastings Sampling Step for s_n
        5.5.1 Rear Indel Move
        5.5.2 Partial Shift Move
        5.5.3 Front Indel Move
      5.6 Phase Shifts
      5.7 Conclusion
    Chapter 6. Results and Discussion
      6.1 Settings
      6.2 Experiment on Synthetic Data
      6.3 Experiment on Real Data
    Chapter 7. Conclusion and Future Work
      7.1 Conclusion
      7.2 Future Work
    Bibliography

    Statistical methods for differential proteomics at peptide and protein level


    Concept of Template Synthesis of Proteoglycans


    MIDAS: A software application for identifying exact and inexact microsatellites in genomic sequences

    Microsatellites are short tandemly repeated sequences that are frequent and diverse in the genomes of all species, and they are important markers in many areas of genomics-based research. These markers have been associated with a significant number of human diseases. In vaccine development, it has been shown that pathogens can evade the immune response simply by altering the composition of the repeated sequences in their genes. Numerous software applications exist for detecting these sequences; however, they do not meet all expectations because of the diverging criteria and approaches applied to the detection problem. MIDAS implements a non-heuristic solution based on two combinatorial algorithms run in series: the first detects exact microsatellites, and the second, if the model parameters allow it, extends the sequences to their optimal inexact version. The application takes a genomic sequence in GBFF or FASTA format as input, and its output gives the positions of the microsatellites in the genomic sequence, along with sizes, alignments, flanking regions, etc. The algorithm is highly efficient and exhaustive, detecting all possible repeated sequences regardless of their nucleotide composition. Keywords: SSR; molecular marker; microsatellite; data mining; algorithm

    Searching repetitive DNA in nucleotide sequences

    This bachelor thesis deals with repetitive DNA and algorithms for finding tandem repeats. Tandem repeats play an important role in the biological industry. They serve as genetic markers for building genetic maps, for DNA profiling in paternity testing, and in forensics. Another reason to search for them is that they cause several serious human diseases. Algorithms for finding them are therefore the subject of many studies. The algorithms fall into two main groups: those based on string matching of DNA and those based on processing a numerical representation of DNA (digital signal processing). The task of this thesis is to choose one representative from each group, design its implementation, and then demonstrate it in the Matlab environment. The result should be a comparison of the two programs according to selected criteria, using several sequences: both programs process the selected sequences, and their results are compared with each other.

    New label-free methods for protein relative quantification applied to the investigation of an animal model of Huntington Disease

    Spectral Counts (SpCs) approaches are widely employed to compare protein expression profiles in label-free (LF) differential proteomics applications. Like other comparative methods, SpCs-based approaches require a normalization procedure before Fold Change (FC) calculation. Here, we propose new Complexity Based Normalization (CBN) methods that introduce a variable adjustment factor (f) related to the complexity of the sample, expressed either as the total number of identified proteins (CBN(P)) or as the total number of spectral counts (CBN(S)). Both new methods were compared with the Normalized Spectral Abundance Factor (NSAF) and the Spectral Counts log Ratio (Rsc) using standard protein mixtures. Finally, to test the robustness and effectiveness of the CBN methods, they were applied to the comparative analysis of cortical protein extracts from the brains of zQ175 mice, a model of Huntington Disease (HD), and of control animals (raw data available via ProteomeXchange with identifier PXD017471). LF data were also validated by western blot and MRM-based experiments. On standard mixtures, both CBN methods showed excellent reproducibility and coefficients of variation (CVs) in comparison with the other SpCs approaches. Overall, the CBN(P) method proved to be the most reliable and sensitive in detecting small differences in protein amounts when applied to biological samples.
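
    For orientation, the sketch below shows the NSAF baseline that the abstract compares against, using its commonly cited definition: each protein's spectral count divided by its length, normalized by the sum of those ratios over all proteins in the sample. The CBN adjustment factor (f) is not defined in the abstract, so it is not implemented here. The function names, the dictionary inputs, and the choice to skip proteins with a zero value are assumptions made for illustration.

```python
from math import log2


def nsaf(spectral_counts, lengths):
    """Normalized Spectral Abundance Factor for each protein in one sample.

    spectral_counts and lengths are dicts keyed by protein identifier.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def log2_fold_change(sample_a, sample_b):
    """log2(a / b) for proteins with a positive value in both samples.

    Assumption: proteins that are missing or zero in either sample are skipped.
    """
    return {
        p: log2(sample_a[p] / sample_b[p])
        for p in sample_a
        if sample_a[p] > 0 and sample_b.get(p, 0) > 0
    }
```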

    Detecting exact and approximate repeats in DNA based on string matching

    The theoretical part of this work describes repetitive DNA, its types, and methods for finding them in a DNA sequence. The practical part designs and describes an algorithm for finding tandem repeats using the Hamming distance, followed by its evaluation and possible future use.
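
    A minimal sketch of how Hamming distance can be used to find tandem repeats is given below. It is not the algorithm from the thesis, whose details the abstract does not give. It assumes a fixed unit length and compares each following unit with the first unit of the run; `find_tandem_repeats` and its parameters are hypothetical names.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_tandem_repeats(seq, unit_len, max_mismatch=0, min_copies=2):
    """Return (start, copies) for each run of adjacent similar units.

    A unit joins the run when its Hamming distance to the first unit
    of the run is at most max_mismatch.
    """
    hits = []
    i = 0
    while i + min_copies * unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        j = i + unit_len
        while (j + unit_len <= len(seq)
               and hamming(unit, seq[j:j + unit_len]) <= max_mismatch):
            copies += 1
            j += unit_len
        if copies >= min_copies:
            hits.append((i, copies))
            i = j  # continue after the run just reported
        else:
            i += 1
    return hits
```

    With `max_mismatch=0` the search reports only exact repeats. Raising it admits approximate copies, at the cost of more spurious matches for short units.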

    LONG SPAN DNA PAIRED-END TAGS (DNA-PET) FOR UNRAVELING GENOMIC REARRANGEMENTS IN CANCER GENOMES

    Ph.D. DOCTOR OF PHILOSOPHY

    The metabolic enzyme hexokinase 2 localizes to the nucleus in AML and normal haematopoietic stem and progenitor cells to maintain stemness

    Thomas, Egan et al. report that hexokinase 2 localizes to the nucleus of leukaemic and normal haematopoietic cells to maintain stemness by interacting with nuclear proteins and modulating chromatin accessibility independently of its kinase activity. Mitochondrial metabolites regulate leukaemic and normal stem cells by affecting epigenetic marks. How mitochondrial enzymes localize to the nucleus to control stem cell function is less well understood. We discovered that the mitochondrial metabolic enzyme hexokinase 2 (HK2) localizes to the nucleus in leukaemic and normal haematopoietic stem cells. Overexpression of nuclear HK2 increases leukaemic stem cell properties and decreases differentiation, whereas selective nuclear HK2 knockdown promotes differentiation and decreases stem cell function. Nuclear HK2 localization is phosphorylation-dependent, requires active import and export, and regulates differentiation independently of its enzymatic activity. HK2 interacts with nuclear proteins that regulate chromatin openness, increasing chromatin accessibility at leukaemic stem cell-positive signature and DNA-repair sites. Nuclear HK2 overexpression decreases double-strand breaks and confers chemoresistance, which may contribute to the mechanism by which leukaemic stem cells resist DNA-damaging agents. Thus, we describe a non-canonical mechanism by which mitochondrial enzymes influence stem cell function independently of their metabolic function.