5,198 research outputs found

    Phylogenetic evidence based on Trypanosoma cruzi nuclear gene sequences and information entropy suggest that inter-strain intragenic recombination is a basic mechanism underlying the allele diversity of hybrid strains

    The diversity of Trypanosoma cruzi is categorized into six discrete typing units (DTUs), T. cruzi I to VI. Several studies indicate that T. cruzi I and II are ancestors of the remaining DTUs (T. cruzi III-VI), which are considered products of independent hybridization events. The individual haplotypes or alleles of these hybrids cluster in three groups, either closer to T. cruzi I or T. cruzi II, or forming a midpoint clade between T. cruzi I and II in network phylogenies. To understand the origins of these different sets of haplotypes and to test the hypothesis of a direct correlation between high entropy and positive selection, we analyzed four nuclear protein-coding genes. We show that hybrid strains contain haplotypes that are mosaics, probably originated by intragenic recombination. Accordingly, in phylogenies, the hybrid haplotypes are closer to one or both parentals (T. cruzi I and II) depending on the proportion of parental sequences composing the mosaics. In addition, Shannon entropy, used to measure sequence diversity, is highly correlated with positive selection in the four genes analyzed here. Our data on recombination patterns also support the hypothesis of two hybridization events in the hybrid structures of T. cruzi. Data presented and discussed here are consistent with a scenario where TcI and TcII are phylogenetically divergent, forming a hybrid zone in between (T. cruzi III-VI). We predict that, because of the quasi-random nature of T. cruzi I and II hybridization, more DTUs with different haplotype combinations will be discovered in the hybrid zone. (C) 2012 Elsevier B.V. All rights reserved.
    Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Howard Hughes Medical Institute
    Universidade Federal de São Paulo, Dept Microbiol Imunol & Parasitol, BR-04023062 São Paulo, Brazil; Universidade Federal de São Paulo, Dept Med, Disciplina Infectol, BR-04023900 São Paulo, Brazil; Universidade Federal de São Paulo, Lab Genom Evolut & Biocomplexidade, BR-04039032 São Paulo, Brazil
    Web of Scienc
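
    The abstract above uses Shannon entropy as a measure of sequence diversity. As a rough illustration only, the sketch below computes per-column Shannon entropy (in bits) for a small set of aligned sequences. The function names and the toy alignment are assumptions made for this example; they are not the authors' pipeline or data, and the abstract does not say how gaps or ambiguous bases were handled (here gaps are simply ignored).

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(seqs):
    """Per-column entropies for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*seqs)]


# Toy alignment (hypothetical haplotypes, not data from the paper).
toy = ["ACGT", "ACGA", "ACCA", "ATCA"]
profile = alignment_entropy(toy)
```

    A fully conserved column scores 0 bits, and a column split evenly between two bases scores 1 bit; higher values mark more variable positions.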

    Computational representation and discovery of transcription factor binding sites

    Thesis by compendium of publications. Understanding how, when, and where proteins are produced has been one of the major challenges in molecular biology. Studies of the control of gene expression are essential for a better understanding of protein synthesis. Gene regulation is a highly controlled process that starts with DNA transcription. This process operates at the level of the gene, the basic unit of heredity, which is copied into primary ribonucleic acid (RNA). This first step is controlled by the binding of specific proteins, called transcription factors (TFs), to a DNA (deoxyribonucleic acid) sequence in the regulatory region of the gene. These DNA sequences are known as binding sites (BS). Binding-site motifs are usually very short (5 to 20 bp long) and highly degenerate. Such sequences are expected to occur at random every few hundred base pairs. Moreover, a single TF can bind to different sites. Because of this high variability, it is difficult to establish a consensus sequence. Studying and identifying binding sites is therefore important for clarifying the control of gene expression. Because of this importance, projects such as ENCODE (Encyclopedia of DNA Elements) have dedicated efforts to mapping binding sites for a large set of transcription factors in order to identify regulatory regions. In this thesis, we have approached the problem of binding-site detection from another angle. We have developed a toolkit for motif detection based on linear and non-linear models. First, we characterized binding sites using two different approaches. The first is based on the information contained in each binding-site position. The second is based on the covariance model of an aligned set of binding-site sequences. From these motif characterizations, we have proposed a new set of computational methods to detect binding sites.
First, a new method was developed based on a parametric uncertainty measure (Rényi entropy). This detection algorithm evaluates the variation in the total Rényi entropy of a set of sequences when a candidate sequence is assumed to be a true binding site belonging to the set. This method was found to perform especially well for transcription factors whose binding sites showed no correlation among positions. Correlation among binding-site positions was considered through a linear model, Q-residuals, and two non-linear models, alpha-Divergence and SIGMA. Q-residuals is a novel motif-finding method that constructs a subspace based on the covariance of numerical DNA sequences. When the number of available sequences was small, Q-residuals performed significantly better and faster than all the other methodologies. Alpha-Divergence is based on the variation of the total parametric divergence in a set of aligned sequences with binding evidence when a candidate sequence is added. Given an optimal q-value, alpha-Divergence outperformed the other methodologies for most of the transcription factor binding sites studied. Finally, a new computational tool, SIGMA, was developed as a trade-off between the good generalisation properties of pure entropy methods and the ability of position-dependency metrics to improve detection power. In approximately 70% of the cases considered, SIGMA exhibited better performance properties, at comparable levels of computational resources, than the methods with which it was compared. This toolkit and the models for the detection of a set of transcription factor binding sites (TFBS) have been included in an R package called MEET.
Understanding how, when, and where proteins are produced has been one of the major challenges in molecular biology. Studies of the control of gene expression are essential for better understanding the process of protein synthesis.
Gene regulation is a highly controlled process that begins with the transcription of DNA. In this process, genes, the basic units of heredity, are copied into ribonucleic acid (RNA). The first step is controlled by the binding of proteins, called transcription factors (TFs), to a DNA (deoxyribonucleic acid) sequence in the regulatory region of the gene. These sequences are called binding sites and are specific to each protein. The binding of transcription factors to their corresponding binding sites initiates transcription. Binding sites are very short sequences (5 to 20 base pairs long) and highly degenerate. Such sequences can occur at random every hundred or so base pairs. In addition, a transcription factor can bind to different sites. As a consequence of this high variability, it is difficult to establish a consensus sequence. The study and identification of binding sites is therefore important for understanding the control of gene expression. The importance of identifying regulatory sequences has led projects such as ENCODE (Encyclopedia of DNA Elements) to devote great effort to mapping the binding sequences of a large set of transcription factors in order to identify regulatory regions. Access to genomic sequences and advances in gene expression analysis technologies have also enabled the development of computational methods for motif discovery. Thanks to these advances, a large number of algorithms have been applied in recent years to motif discovery in prokaryotes and simple eukaryotes. Despite the simplicity of these organisms, the rate of false positives is high relative to true positives. Studying more complex organisms therefore requires methods with greater sensitivity. In this thesis we have approached the problem of binding-sequence detection from different angles.
Specifically, we have developed a set of tools for motif detection based on linear and non-linear models. Transcription factor binding sequences were characterized using two approaches. The first is based on the information inherently contained in each position of the binding sequences. The second, in contrast, characterizes the binding sequence by means of a covariance model. From both characterizations, we have proposed a new set of computational methods for detecting binding sequences. First, a new method was developed based on a parametric measure of uncertainty (Rényi entropy). This detection algorithm evaluates the total variation in the Rényi entropy of a set of binding sequences when a candidate sequence is added to the set. This method performed well for binding sequences with little or no correlation between positions. Correlation between positions was considered through a linear model, Q-residuals, and two non-linear models, alpha-Divergence and SIGMA. Q-residuals is a new motif-finding methodology based on constructing a subspace from the covariance of numerical DNA sequences. When the number of available sequences is small, Q-residuals performed significantly better and faster than the methodologies it was compared with. Alpha-Divergence evaluates the total variation of the parametric divergence in a set of binding sequences when a candidate sequence is added. Given an optimal q-value, alpha-Divergence outperformed the compared methodologies for most of the transcription factor binding sequences considered. Finally, a new computational method, SIGMA, was developed in order to improve detection power
Postprint (published version)
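
    The entropy-based detection idea described in this abstract (score a candidate by how much the total Rényi entropy of a set of known binding sites changes when the candidate is added) can be sketched as follows. This is a minimal illustration under stated assumptions, not the MEET implementation: the function names, the pseudocount of 0.5, the default order `q=2.0`, and the toy sites are all hypothetical choices made for the example.

```python
import math

ALPHABET = "ACGT"


def renyi_entropy(probs, q):
    """Renyi entropy of order q (bits); falls back to Shannon entropy as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log2(p) for p in probs if p > 0)
    return math.log2(sum(p ** q for p in probs if p > 0)) / (1.0 - q)


def column_probs(column, pseudocount=0.5):
    """Base frequencies of one aligned column, smoothed by an assumed pseudocount."""
    total = len(column) + pseudocount * len(ALPHABET)
    return [(column.count(base) + pseudocount) / total for base in ALPHABET]


def total_entropy(sites, q):
    """Sum of per-position Renyi entropies over an aligned set of sites."""
    return sum(renyi_entropy(column_probs(col), q) for col in zip(*sites))


def entropy_shift(sites, candidate, q=2.0):
    """Change in total entropy when `candidate` is added to the set of sites."""
    return total_entropy(sites + [candidate], q) - total_entropy(sites, q)


# Hypothetical aligned binding sites and two candidates (illustrative only).
known_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
shift_similar = entropy_shift(known_sites, "TATAAT")
shift_unrelated = entropy_shift(known_sites, "GCGCGC")
```

    In this toy setting a candidate that matches the motif lowers the total entropy, while an unrelated sequence raises it, so a smaller shift suggests a more plausible site. Treating positions independently, as here, is only reasonable when correlation among positions is weak, which is the case the abstract says this kind of method suits best.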

    Drawing Elena Ferrante's Profile. Workshop Proceedings, Padova, 7 September 2017

    Elena Ferrante is an internationally acclaimed Italian novelist whose real identity has been kept secret by E/O publishing house for more than 25 years. Owing to her popularity, major Italian and foreign newspapers have long tried to discover her real identity. However, only a few attempts have been made to foster a scientific debate on her work. In 2016, Arjuna Tuzzi and Michele Cortelazzo led an Italian research team that conducted a preliminary study and collected a well-founded, large corpus of Italian novels comprising 150 works published in the last 30 years by 40 different authors. Moreover, they shared their data with a select group of international experts on authorship attribution, profiling, and analysis of textual data: Maciej Eder and Jan Rybicki (Poland), Patrick Juola (United States), Vittorio Loreto and his research team, Margherita Lalli and Francesca Tria (Italy), George Mikros (Greece), Pierre Ratinaud (France), and Jacques Savoy (Switzerland). The chapters of this volume report the results of this endeavour that were first presented during the international workshop Drawing Elena Ferrante's Profile in Padua on 7 September 2017 as part of the 3rd IQLA-GIAT Summer School in Quantitative Analysis of Textual Data. The fascinating research findings suggest that Elena Ferrante’s work definitely deserves “many hands” as well as an extensive effort to understand her distinct writing style and the reasons for her worldwide success.

    CAD Tools for DNA Micro-Array Design, Manufacture and Application

    Motivation: As the human genome project progresses and some microbial and eukaryotic genomes are recognized, numerous biotechnological processes have recently attracted an increasing number of biologists, bioengineers and computer scientists. Biotechnological processes profoundly involve the production and analysis of high-throughput experimental data. Numerous sequence libraries of DNA and protein structures of a large number of micro-organisms, and a variety of other databases related to biology and chemistry, are available. For example, microarray technology, a novel biotechnology, promises to monitor the whole genome at once, so that researchers can study the whole genome at the global level and have a better picture of the expression of millions of genes simultaneously. Today, it is widely used in many fields: disease diagnosis, gene classification, gene regulatory networks, and drug discovery. For example, designing organism-specific microarrays and analyzing experimental data require combining heterogeneous computational tools that usually differ in data format, such as GeneMark for ORF extraction, Promide for DNA probe selection, Chip for probe placement on the microarray chip, BLAST to compare sequences, MEGA for phylogenetic analysis, and ClustalX for multiple alignments. Solution: Surprisingly enough, despite huge research efforts invested in DNA array applications, very few works are devoted to computer-aided optimization of DNA array design and manufacturing. Current design practices are dominated by ad-hoc heuristics incorporated in proprietary tools with unknown suboptimality. This will soon become a bottleneck for the new generation of high-density arrays, such as the ones currently being designed at Perlegen [109]. The goal of the already accomplished research was to develop highly scalable tools, with predictable runtime and quality, for cost-effective, computer-aided design and manufacturing of DNA probe arrays.
We illustrate the utility of our approach by taking a concrete example of combining the design tools of microarray technology for Herpes B virus DNA data.

    Characterization of the longitudinal HIV-1 quasispecies evolution in HIV-1 infected individuals co-infected with Mycobacterium tuberculosis

    One of the earliest and most striking observations made about HIV is the extensive genetic variation that the virus has within individual hosts, particularly in the hypervariable regions of the env gene, which is divided into 5 variable regions (V1-V5) and 5 more constant regions (C1-C5). HIV evolves at any time over the course of an individual’s infection, and infected individuals harbour a population of genetically related but non-identical viruses that are under constant change and ready to adapt to changes in their environment. These genetically heterogeneous populations of closely related genomes are called quasispecies [65]. Tuberculosis, or tubercle-forming disease, is an acute and/or chronic bacterial infection that primarily attacks the lungs, but which may also affect the kidneys, bones, lymph nodes, and brain. The disease is caused by Mycobacterium tuberculosis (MTB), a slow-growing, rod-shaped, acid-fast bacterium. It is transmitted from person to person through inhalation of bacteria-carrying air droplets. Worldwide, one person out of three is infected with Mycobacterium tuberculosis, two billion people in total. TB currently holds the seventh place in the global ranking of causes of death [73]. In 2008, there were an estimated 9.4 million (range, 8.9–9.9 million) incident cases of TB globally (equivalent to 139 cases per 100 000 population) [75]. A complex biological interplay occurs between M. tuberculosis and HIV in the coinfected host that results in the worsening of both pathologies. HIV promotes progression of M. tuberculosis either by endogenous reactivation or exogenous reinfection [77, 78], and the course of HIV-1 infection is accelerated subsequent to the development of TB [80]. Active TB is associated with an increase in intra-patient HIV-1 diversity both systemically and at the infected lung sites [64,122]. The sustainability or reversal of the HIV-1 quasispecies heterogeneity after TB treatment is not known.
Tetanus toxoid vaccinated HIV-1 infected patients developed a transient increase in HIV-1 heterogeneity which was reversed after a few weeks [121]. Emergence of a heterogeneous HIV-1 population within a patient may be one of the mechanisms to escape strong immune or drug pressure [65,128]. The existence of better-fitting and/or immune-escape HIV variants can lead to an increase in HIV-1 replication [129,130]. It might be that TB favourably selected HIV-1 variants which are sources for consistent HIV-1 replication. Understanding the mechanisms underlying the impacts of TB on HIV-1 is essential for the development of effective measures to reduce TB-related morbidity and mortality in HIV-1 infected individuals. In the present study we examined whether the increase in HIV-1 quasispecies diversity during active TB is reversed or preserved throughout the course of antituberculous chemotherapy. For this purpose, HIV-1 quasispecies were evaluated at two time points by comparing HIV-1 infected patients with active tuberculosis (HIV-1/TB) and HIV-1 infected patients without tuberculosis (HIV-1/non TB). Plasma samples were obtained from the Frankfurt HIV cohort and HIV-1 RNA was isolated. C2V5 env was amplified by PCR and molecular cloning was performed. Eight to twenty-five clones were sequenced from each patient. Various phylogenetic analyses were performed, including tree inference, intra-patient viral diversity and divergence, selective pressure, co-receptor usage prediction, and comparison of quasispecies identity between the two time points using Mantel’s test. We found that: 1) active TB sustains HIV-1 quasispecies diversity for a longer period; 2) active TB increases the rate of HIV-1 divergence; 3) TB might slow down the evolution of X4 variants. We concluded that active TB has an impact on HIV-1 viral diversity and divergence over time. The influence of active TB on the longitudinal evolution of HIV-1 may be predominant for R5 viruses.
The use of CCR5 co-receptor inhibitors as a therapeutic approach for HIV-1/TB patients needs further investigation.
One of the first and most surprising observations made in the analysis of HIV is its pronounced genetic variability, particularly in the hypervariable region of the env gene. This gene is divided into 5 variable regions (V1-V5) and 5 more conserved regions (C1-C5). HIV changes at every point in the course of infection, and each infected individual carries a population of genetically related but non-identical viruses that change continuously and adapt to the demands of their environment. These genetically heterogeneous but closely related populations are called quasispecies. Tuberculosis is a mycobacterial infection that shows both acute and chronic courses. In addition to the lungs as the primary site of manifestation, the kidneys, bones, and other organs can also be affected. One in three people worldwide is infected with Mycobacterium tuberculosis, 2 billion people in total. In HIV/TB co-infected people, a complex interplay arises between HIV and M. tuberculosis that leads to a worsening of both diseases. Through endogenous reactivation or exogenous reinfection, HIV leads to progression of tuberculosis, which in turn accelerates HIV disease progression. Both morbidity and mortality are increased in HIV-1/TB co-infected people. Active pulmonary tuberculosis and miliary tuberculosis are accompanied by an increase in the diversity of HIV viruses within a host. How long this increased heterogeneity of the HIV quasispecies persists after successful treatment of tuberculosis is still unclear.
Understanding the mechanism underlying the interplay of HIV and TB is essential for developing effective measures to reduce morbidity and mortality in HIV/TB co-infected people. The present research therefore addressed the question of whether an increase in the diversity of HIV-1 quasispecies can be observed during active TB infection, and whether this diversity is maintained or recedes during TB therapy. To this end, HIV-1 quasispecies were examined at two time points, comparing samples from HIV-1 infected patients with active tuberculosis (HIV-1/TB) and HIV-infected patients without tuberculosis (HIV-1/non TB). HIV-1 RNA was isolated from plasma samples of the Frankfurt HIV cohort. C2V5 env was amplified by PCR and molecularly cloned. Eight to twenty-five clones were sequenced for each patient. Several phylogenetic analyses were performed, including tree inference, intra-patient viral diversity and divergence, selection pressure analyses, prediction of co-receptor usage, and two-time-point analyses of quasispecies identity using Mantel's test. The analyses yielded the following results: 1) Active TB maintains the diversity of HIV-1 quasispecies over a longer period. 2) Active TB increases HIV-1 divergence. 3) TB could lead to slower evolution of X4 variants. Conclusion: active TB influences the development of viral diversity and divergence of HIV-1 over the course of the disease. The influence of active TB on the longitudinal evolution of HIV-1 may particularly affect R5 viruses. The use of CCR5 co-receptor inhibitors in HIV-1/TB co-infected patients should therefore be investigated in long-term studies.
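
    Intra-patient viral diversity, one of the measures listed in this abstract, is commonly summarized as the mean pairwise distance among the clones sequenced from one sample. The sketch below uses the simple p-distance (proportion of differing aligned sites) as a stand-in; the abstract does not state which distance model or software was used, so the function names, the choice of p-distance, and the toy clones are assumptions for illustration only.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites between the two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_pairwise_diversity(clones):
    """Mean p-distance over all pairs of aligned clones from one sample."""
    if len(clones) < 2:
        raise ValueError("need at least two clones")
    dists = [p_distance(a, b) for a, b in combinations(clones, 2)]
    return sum(dists) / len(dists)


# Hypothetical aligned clones from two time points (not data from the study).
time_point_1 = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
time_point_2 = ["ACGTACGT", "ACGTACGT", "ACGTACGT"]
diversity_1 = mean_pairwise_diversity(time_point_1)
diversity_2 = mean_pairwise_diversity(time_point_2)
```

    Comparing such a value between two time points for the same patient gives a crude view of whether diversity was sustained or fell; a real analysis would use a substitution model and account for sampling depth.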

    Sequential Bottlenecks Drive Viral Evolution in Early Acute Hepatitis C Virus Infection

    Hepatitis C is a pandemic human RNA virus, which commonly causes chronic infection and liver disease. The characterization of viral populations that successfully initiate infection, and also those that drive progression to chronicity, is instrumental for understanding pathogenesis and vaccine design. A comprehensive and longitudinal analysis of the viral population was conducted in four subjects followed from very early acute infection to resolution of disease outcome. By means of next generation sequencing (NGS) and standard cloning/Sanger sequencing, genetic diversity and viral variants were quantified over the course of the infection at frequencies as low as 0.1%. Phylogenetic analysis of reassembled viral variants revealed that acute infection was dominated by two sequential bottleneck events, irrespective of subsequent chronicity or clearance. The first bottleneck was associated with transmission, with one to two viral variants successfully establishing infection. The second occurred approximately 100 days post-infection, and was characterized by a decline in viral diversity. In the two subjects who developed chronic infection, this second bottleneck was followed by the emergence of a new viral population, which evolved from the founder variants via a selective sweep with fixation in a small number of mutated sites. The diversity at sites with non-synonymous mutation was higher in predicted cytotoxic T cell epitopes, suggesting immune-driven evolution. These results provide the first detailed analysis of early within-host evolution of HCV, indicating that strong selective forces limit viral evolution in the acute phase of infection.