46 research outputs found

    Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping

    Sequence libraries that cover all k-mers enable universal, unbiased measurements of binding to both oligonucleotides and peptides. While the number of k-mers grows exponentially in k, space on all experimental platforms is limited. Here, we shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet simultaneously. We present the JokerCAKE (joker covering all k-mers) algorithm for generating a short sequence such that each k-mer appears at least p times with at most one joker character per k-mer. By running our algorithm on a range of parameters and alphabets, we show that JokerCAKE produces near-optimal sequences. Moreover, through comparison with data from hundreds of DNA-protein binding experiments and with new experimental results for both standard and JokerCAKE libraries, we establish that accurate binding scores can be inferred for high-affinity k-mers using JokerCAKE libraries. JokerCAKE libraries allow researchers to search a significantly larger sequence space using the same number of experimental measurements and at the same cost. We present a new compact sequence design that covers all k-mers utilizing joker characters and develop an efficient algorithm to generate such designs. We show through simulations and experimental validation that these sequence designs are useful for identifying high-affinity binding sites at significantly reduced cost and space. Keywords: sequence libraries; microarray design; de Bruijn graph. National Institutes of Health (U.S.) (Grant R01GM081871
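
The coverage condition stated in the abstract above (each k-mer appears at least p times, with at most one joker character per k-mer) can be illustrated with a small checker. This is a minimal sketch and not the JokerCAKE algorithm itself: it only verifies a given sequence, it does not generate one. The joker symbol `N`, the function names, and the decision to ignore reverse complements are all assumptions made for illustration.

```python
from itertools import product

JOKER = "N"  # hypothetical joker symbol; the abstract does not name one


def kmer_coverage(seq, k, alphabet="ACGT", joker=JOKER):
    """Count how often each k-mer over `alphabet` is covered by `seq`.

    A window with no joker covers exactly one k-mer. A window with one
    joker covers every k-mer obtained by substituting the joker with a
    letter of the alphabet. Windows with two or more jokers are skipped.
    """
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        n_jokers = window.count(joker)
        if n_jokers > 1:
            continue
        if n_jokers == 0:
            candidates = [window]
        else:
            j = window.index(joker)
            candidates = [window[:j] + c + window[j + 1:] for c in alphabet]
        for kmer in candidates:
            if kmer in counts:  # ignore windows with unknown symbols
                counts[kmer] += 1
    return counts


def covers_all(seq, k, p=1, alphabet="ACGT", joker=JOKER):
    """Return True if every k-mer is covered at least `p` times."""
    counts = kmer_coverage(seq, k, alphabet, joker)
    return all(c >= p for c in counts.values())
```

For the two-letter alphabet `AC` and k = 2, the 4-character string `ANCA` covers all four 2-mers under these rules, whereas a joker-free sequence needs at least 5 characters to do the same.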

    BIOINFORMATIC TOOLS FOR NEXT GENERATION GENOMICS

    New sequencing strategies have redefined the concept of “high-throughput sequencing”, and many companies, researchers, and recent reviews use the term “Next-Generation Sequencing” (NGS) instead of high-throughput sequencing. These advances have introduced a new era in genomics and bioinformatics. During my years as a PhD student I developed various software tools, algorithms and procedures for the analysis of Next-Generation Sequencing data required for distinct biological research projects and collaborations in which our research group was involved. The tools and algorithms are thus presented in their appropriate biological contexts. Initially I dedicated myself to the development of scripts and pipelines which were used to assemble and annotate the mitochondrial genome of the model plant Vitis vinifera. The sequence was subsequently used as a reference to study the RNA editing of mitochondrial transcripts, using data produced by the Illumina and SOLiD platforms. I subsequently developed a new approach and a new software package for the detection of relatively small indels between a donor and a reference genome, using NGS paired-end (PE) data and machine learning algorithms. I was able to show that suitable paired-end data, contrary to previous assertions, can be used to detect, with high confidence, very small indels in low-complexity genomic contexts. Finally, I participated in a project aimed at the reconstruction of the genomic sequences of two distinct strains of the biotechnologically relevant fungus Fusarium. In this context I performed the sequence assembly to obtain the initial contigs and devised and implemented a new scaffolding algorithm which has proved to be particularly efficient.

    Transcriptional Regulatory Logic of Cilium Formation in C. Elegans

    Cilia are complex, evolutionarily conserved eukaryotic structures that project from cell surfaces and perform a variety of biological roles. Cilia are traditionally classified as motile or sensory, and hundreds of proteins take part in their composition. The set of genes coding for ciliary components is known as the ciliome. Mutations in the ciliome underlie an ever-growing group of highly pleiotropic multisystemic diseases collectively termed ciliopathies. These diseases are characterized, among other symptoms, by mental retardation, sensory defects and/or metabolic disorders. Although an estimated 1 in 1,000 people is affected by these diseases, the molecular bases of the ciliopathies are still poorly understood. Proper cilium assembly and functionality require the tightly co-regulated expression of ciliary components; however, little is known about the regulatory logic controlling ciliome transcription. Most ciliome genes are shared between motile and sensory cilia. RFX transcription factors (TFs) have an evolutionarily conserved role in the transcriptional regulation of both the motile and the sensory ciliome. In vertebrates, transcription of the motile ciliome is also directly regulated by FoxJ1, a Forkhead (FKH) TF. However, to date, TFs working together with RFX in the transcription of the sensory ciliome are unknown in any organism. In this work, we have identified FKH-8, an FKH TF, as a terminal selector of the sensory ciliome in C. elegans. fkh-8 is consistently expressed in the sixty ciliated sensory neurons of C. elegans, binds the regulatory regions of the sensory ciliome genes, is required for correct ciliome gene expression, and acts synergistically with DAF-19/RFX, the known master regulator of ciliogenesis. Accordingly, fkh-8 mutants display a wide range of behavioural defects in a plethora of sensory-mediated paradigms, including olfaction, gustation, and mechano-sensation. Thus, we have identified, for the first time, a TF that acts together with RFX TFs in the direct regulation of the sensory ciliome. Moreover, our results, together with previous work, show that FKH and RFX TFs act together in the regulation of both motile and sensory cilia, suggesting that this regulatory logic could be an ancient trait pre-dating the functional sub-specialization of cilia. Finally, we hope our results will help to better understand the biological basis of orphan ciliopathies. This thesis project has been made possible thanks to a pre-doctoral fellowship from the FPI Programme (BES-2015-072799) conferred by the (now defunct) Spanish Ministry of Economy and Competitiveness. The following grants also provided funding throughout the whole research process: “Estudio de los mecanismos transcripcionales que regulan la diferenciación de las neuronas monoaminérgicas y su conservación evolutiva.” (SAF2014-56877-R); “Dissecting the gene regulatory mechanisms that generate serotonergic neurons and their link to mental disorders.” (ERC-St 281920); “Programas de regulación transcripcional asociados a enfermedades genéticas.” (SAF2017-84790-R); “Regulatory rules and evolution of neuronal gene expression.” (ERC-Co 101002203). Brocal Ruiz, R. (2022). Transcriptional Regulatory Logic of Cilium Formation in C. Elegans [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/181667

    Computational Methods for Inferring Transcriptome Dynamics

    The sequencing of the human genome paved the way for a new type of medicine, in which a molecular-level, cell-by-cell understanding of the genomic control system informs diagnosis and treatment. A key experimental approach for achieving such understanding is measuring gene expression dynamics across a range of cell types and biological conditions. The raw outputs of these experiments are millions of short DNA sequences, and computational methods are required to draw scientific conclusions from such experimental data. In this dissertation, I present computational methods to address some of the challenges involved in inferring dynamic transcriptome changes. My work focuses on two types of challenges: (1) discovering important biological variation within a population of single cells and (2) robustly extracting information from sequencing reads. Three of the methods are designed to identify biologically relevant differences among a heterogeneous mixture of cells. SingleSplice uses a statistical model to detect true biological variation in alternative splicing within a population of single cells. SLICER elucidates transcriptome changes during a sequential biological process by positing the process as a nonlinear manifold embedded in high-dimensional gene expression space. MATCHER uses manifold alignment to infer what multiple types of single-cell measurements obtained from different individual cells would look like if they were performed simultaneously on the same cell. These methods gave insight into several important biological systems, including embryonic stem cells and cardiac fibroblasts undergoing reprogramming. To enable study of the pseudogene ceRNA effect, I developed a computational method for robustly computing pseudogene expression levels in the presence of high sequence similarity that confounds sequencing read alignment. 
AppEnD, an algorithm for detecting untemplated additions, allowed the study of transcript modifications during RNA degradation. Doctor of Philosophy

    Genotype-Phenotype Maps in Complex Living Systems

    Alternative isoform regulation in myotonic dystrophy

    Thesis (Ph. D.)--Harvard-MIT Program in Health Sciences and Technology, 2012. Cataloged from PDF version of thesis. Includes bibliographical references. Myotonic dystrophy (DM) is the most common form of adult-onset muscular dystrophy, affecting more than 1 in 8000 individuals globally. The symptoms of DM are multi-systemic and include myotonia, severe muscle wasting, cardiac arrhythmias, cataracts, gastrointestinal dysfunction, and cognitive deficits. DM is caused by the expansion of CTG or CCTG repeat sequences expressed in noncoding portions of RNA, which sequester or activate RNA splicing factor proteins, leading to widespread deleterious changes in transcriptome isoform usage. We developed a method for studying transcriptomes, RNAseq, which provides a high-resolution, digital inventory of gene and isoform expression. By applying RNAseq to human tissues and cell lines, we discovered that 92-94% of all human genes are alternatively spliced, 86% of them with a minor isoform frequency of 15% or more. We found that the majority of alternative splicing and alternative polyadenylation and cleavage events are tissue-regulated, and that patterns of these RNA processing events are strongly correlated across tissues, implicating protein factors that may regulate both types of events. We applied this method towards the goal of identifying transcriptome changes occurring in DM, focusing on the Muscleblind-like (MBNL) family of RNA binding proteins, which are functionally inactivated by CUG or CCUG repeats. Using RNAseq to profile tissues and cells depleted of MBNLs, we found that MBNL1 and MBNL2 co-regulate hundreds of redundant targets. MBNL1 UV cross-linking and immunoprecipitation, followed by sequencing (CLIPseq), was used to identify the in vivo transcriptome-wide binding locations of MBNL1, and facilitated the construction of a context-dependent RNA map for MBNL1 splicing regulation. 
Extensive 3' UTR binding of MBNL1 was found to localize mRNAs to membrane compartments of mouse myoblasts, suggesting a new global function for MBNLs, and additional mechanisms by which MBNL depletion can lead to DM symptoms. By Eric T. Wang. Ph.D.