556 research outputs found

    Design and implementation of a cyberinfrastructure for RNA motif search, prediction and analysis

    RNA secondary and tertiary structure motifs play important roles in cells. However, very few web servers are available for RNA motif search and prediction. In this dissertation, a cyberinfrastructure named RNAcyber, capable of performing RNA motif search and prediction, is proposed, designed and implemented. The first component of RNAcyber is a web-based search engine named RmotifDB. This tool integrates an RNA secondary structure comparison algorithm with the secondary structure motifs stored in the Rfam database. With a user-friendly interface, RmotifDB can search for ncRNA structure motifs by both structure and sequence. The second component of RNAcyber is an enhanced version of RmotifDB that combines data from multiple sources, incorporates a variety of well-established structure-based search methods, and is integrated with the Gene Ontology. To display RmotifDB's search results graphically, a software tool called RSview was developed. Finally, RNAcyber contains a web-based tool called Junction-Explorer, which employs a data mining method for predicting tertiary motifs in RNA junctions. Specifically, the tool is trained on solved RNA tertiary structures obtained from the Protein Data Bank and predicts the configuration of coaxial helical stacks and families (topologies) in RNA junctions at the secondary structure level. Junction-Explorer employs several algorithms for motif prediction, including a random forest classification algorithm, a pseudoknot removal algorithm, and a feature ranking algorithm based on the Gini impurity measure. A series of experiments, including 10-fold cross-validation, has been conducted to evaluate the performance of the Junction-Explorer tool. Experimental results demonstrate the effectiveness of the proposed algorithms and the superiority of the tool over existing methods.
The RNAcyber infrastructure is fully operational, with all of its components accessible on the Internet.
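
To make the Gini-based feature ranking mentioned in this abstract concrete, here is a minimal sketch in plain Python. It is not Junction-Explorer's code: the feature names (`loop_len`, `helix_len`), the family labels and the toy rows are hypothetical, and it scores each feature by its best single split rather than by averaging impurity decreases over the trees of a random forest.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Drop in Gini impurity when splitting samples on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


def rank_features(rows, labels):
    """Rank features by their best single-split impurity decrease."""
    scores = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        scores[name] = max(
            impurity_decrease(values, labels, t) for t in set(values)
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two junction features and a family label per sample.
rows = [
    {"loop_len": 1, "helix_len": 5},
    {"loop_len": 2, "helix_len": 9},
    {"loop_len": 7, "helix_len": 5},
    {"loop_len": 8, "helix_len": 9},
]
labels = ["A", "A", "B", "B"]

ranking = rank_features(rows, labels)
```

On this toy data `loop_len` separates the two families perfectly and ranks first, while `helix_len` carries no information about the label.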

    Barry Smith an sich

    Festschrift in Honor of Barry Smith on the occasion of his 65th Birthday. Published as issue 4:4 of the journal Cosmos + Taxis: Studies in Emergent Order and Organization. Includes contributions by Wolfgang Grassl, Nicola Guarino, John T. Kearns, Rudolf Lüthe, Luc Schneider, Peter Simons, Wojciech Żełaniec, and Jan Woleński

    Pharmacogenomics of sickle cell disease therapeutics: pain and drug metabolism associated gene variants and hydroxyurea-induced post-transcriptional expression of miRNAs

    Sickle cell disease (SCD) is a common blood disease caused by a single nucleotide substitution (c.20A>T, p.Glu6Val) in the beta globin gene on chromosome 11. The prevalence of the disease is high throughout large areas of sub-Saharan Africa, the Mediterranean basin, the Middle East, and India because of the protection that the sickle cell trait provides against severe malaria. Approximately 300,000 infants are born each year with sickle cell anemia, which is defined as homozygosity for sickle hemoglobin (HbS). The majority (nearly 75%) of these births occur in sub-Saharan Africa, particularly in two countries with poorly resourced healthcare systems: Nigeria and the Democratic Republic of the Congo (DRC). Early diagnosis, penicillin prophylaxis, blood transfusions, hydroxyurea, and hematopoietic stem-cell transplantation can dramatically improve survival and quality of life for patients with SCD. However, our understanding of the role of genetic and clinical factors in explaining the complex phenotypic diversity of this disease is still limited. Early prediction of disease severity and of patients' responses to specific SCD therapeutics could lead to more precise treatment and management. Beyond well-known modifiers of disease severity, such as fetal hemoglobin (HbF) levels and α-thalassemia, other genetic variants might influence specific sub-phenotypes. New treatments and management strategies accounting for these genetic and non-genetic factors could substantially and rapidly improve the quality of life and reduce health care costs for patients with SCD. Patients with SCD are subjected to long-term administration of drugs, yet there are limited data on the pharmacogenomics of SCD therapeutics. Vaso-occlusive crises (VOC) are the main clinical events of SCD and are associated with recurrent and long-term use of analgesics/opioids and hydroxyurea (HU).
This project aimed to investigate the clinical and genetic predictors of painful vaso-occlusive crisis (VOC) among Cameroonian SCD patients by exploring pharmacokinetic determinants of treatment responses as well as post-transcriptional signatures triggered by hydroxyurea treatment, particularly miRNA expression. SCD patients were recruited from Yaounde Central Hospital and Laquintinie Hospital in Douala (Wonkam et al., 2018; Mnika et al., 2019 (b)), and recent migrant SCD patients from the DRC were recruited at the Haematology Clinic, Groote Schuur Hospital in Cape Town, South Africa (Mnika et al., 2019 (a); Mnika et al., 2019 (b)). Sociodemographic and clinical data were collected by means of a structured questionnaire. Patients' medical records were reviewed to extract their clinical features over the past 3 years. Specifically, the occurrences of VOC, hematological parameters, hospital outpatient visits, hospitalisation, overt strokes, blood transfusions, and administration of hydroxyurea were recorded. Height, weight, body mass index (BMI), and systolic and diastolic blood pressures (SBP and DBP) were measured. Detailed descriptions of patients and sampling methods used in the Cameroonian patients have been reported previously (Wonkam et al., 2018; Mnika et al., 2019 (a); Mnika et al., 2019 (b)). For the purpose of comparing frequencies of variants, ethnically matched Cameroonian controls were randomly recruited from apparently healthy blood donors in Yaounde. All blood samples were collected for genomic characterisation and analysis. DNA was extracted from peripheral blood following the instructions of a commercial kit [QIAamp DNA Blood Maxi Kit® (Qiagen, United States)]. Genotyping (TaqMan and MassArray) was performed for 40 variants in 17 pain-related genes, three fetal haemoglobin (HbF)-promoting loci, two kidney dysfunction-related genes, and HBA1/HBA2 genes for 436 patients.
A subset of these samples was also genotyped to analyse 32 core and 267 extended pharmacogenes using the commercially available PharmacoScan® platform, in order to characterise pharmacokinetic determinants of response. We also compared the pharmacogene variants from these African groups with data extracted from the 1000 Genomes Project. Moreover, association studies were carried out between pharmacogene variants and SCD clinical variability. Additionally, the protein-protein interaction (PPI) network and enriched biological processes and pathways were investigated. For the association studies, regression models for the 40 variants were fitted in R®. For miRNA expression, total RNA was isolated using the miRNeasy kit according to the manufacturer's protocol (QIAGEN, Hilden, Germany) and sequenced by the Genomic and RNA Profiling Core at Baylor College of Medicine, United States, using the NanoString Platform (NanoString Technologies, Inc., Seattle, WA, United States), according to the manufacturer's instructions. Genes with statistically significant changes in expression were analysed using the significance analysis of microarrays (SAM) tools. Female sex, body mass index, Hb/HbF, blood transfusions, leucocytosis and consultation or hospitalisation rates significantly correlated with VOC. Three pain-related gene variants correlated with VOC (CACNA2D3-rs6777055, P = 0·025; DRD2-rs4274224, P = 0·037; KCNS1-rs734784, P = 0·01). Five pain-related gene variants correlated with hospitalisation/consultation rates (COMT-rs6269, P = 0·027; FAAH-rs4141964, P = 0·003; OPRM1-rs1799971, P = 0·031; ADRB2-rs1042713, P < 0·001; UGT2B7-rs7438135, P = 0·037). The 3·7 kb HBA1/HBA2 deletion correlated with increased VOC (P = 0·002). HbF-promoting loci variants correlated with decreased hospitalisation (BCL11A-rs4671393, P = 0·026; HBS1L-MYB-rs28384513, P = 0·01). APOL1 G1/G2 correlated with increased hospitalisation (P = 0·048).
A commercial genotyping array platform (PharmacoScan®) with 4627 markers located in 1191 genes was used to investigate 299 pharmacogenes (32 ADME core and 267 extended pharmacogenes). Based on the PharmacoScan analyses, no statistically significant differences in allele frequencies were detected between SCD cases and controls from Cameroon. A principal component analysis (PCA) revealed that the Cameroonian data clustered with other African populations but were significantly distinct from American, European and Asian population data. Variant allele frequencies in 21/32 core pharmacogenes were significantly different between the two SCD groups (Cameroon vs. Congo). No correlation between clinical variability and variants in the core genes was detected in either population under study. An association study of the core and extended PharmacoScan variants with VOC identified statistically significant associations between two single nucleotide polymorphisms (SNPs) and VOC after correction for multiple testing. These two SNPs mapped to 50 genes, with two SNPs located in core pharmacogenes (SLCO4A1-rs118042746, p = 1.21e-07; UGT1A10, UGT1A8-rs10176426, p = 1.22e-07). Functional enrichment analyses revealed that these 50 genes are involved in three biological processes and four pathways relevant to SCD pathophysiology, including xenobiotic glucuronidation (GO:0052697, p = 2.3e-03) and drug metabolism - other enzymes (p = 2.1e-02). Further analyses of the 50 genes identified key genes in human protein-protein networks: NTSR1, LRMDA, SMAD4 and CDH2. These four genes also interacted with three core pharmacogenes associated with VOC: UGT1A8, UGT1A10 and SLCO4A1.
We found 22/798 miRNAs to be differentially expressed under HU treatment, with the majority (13/22) being functionally associated with HbF-regulatory genes, including BCL11A (miR-148b-3p, miR-32-5p, miR-340-5p, miR-29c-3p), MYB (miR-105-5p), KLF-3 (miR-106b-5), and SP1 (miR-29b-3p, miR-625-5p, miR-324-5p, miR-125a-5p, miR-99b-5p, miR-374b-5p, miR-145-5p). The present thesis started by highlighting the scarcity of studies investigating variable responses to pain in SCD patients and then proceeded to address this research gap. To our knowledge, this is the first body of work from Africa to provide evidence supporting the possible development of a genetic risk model for pain in SCD. This is also the first body of work to report an association between these two SNPs and VOC in core and extended pharmacogenes. Our data reveal that the commercial pharmacogene arrays investigated might need additional evidence of their appropriateness among Africans. Therefore, it advocates investing in research exploring population-specific arrays, drug design, targeting, and efficacy, for improved clinical management of patients of African descent. Previous studies have investigated various mechanisms to understand the genomic variations affecting responses to HU, but a full understanding of the variable HU-mediated HbF production among individuals affected by SCD remains elusive. The present study showed that HbF production in response to HU could be mediated in particular through miRNA regulation. The data reveal some alternative perspectives and routes towards identifying new therapeutic targets and approaches for SCD. However, this study needs to be replicated in larger samples in multiple African populations.
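
The abstract above reports associations that survive correction for multiple testing without naming the correction used. As a worked illustration only, the sketch below applies a Bonferroni threshold, under the assumption (not stated in the abstract) that one test was run per marker on the 4627-marker array and that the family-wise error rate was 0.05. The two p-values are the ones reported above; the third entry is a hypothetical marker added for contrast.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def significant_after_correction(p_values, alpha, n_tests):
    """Return the entries whose p-value falls below the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, n_tests)
    return {name: p for name, p in p_values.items() if p < cutoff}


# Assumptions for illustration: alpha = 0.05 and one test per array marker.
ALPHA = 0.05
N_TESTS = 4627

p_values = {
    "SLCO4A1-rs118042746": 1.21e-07,        # reported in the abstract
    "UGT1A10/UGT1A8-rs10176426": 1.22e-07,  # reported in the abstract
    "hypothetical-marker": 3.0e-04,         # made up, for contrast
}

cutoff = bonferroni_threshold(ALPHA, N_TESTS)
hits = significant_after_correction(p_values, ALPHA, N_TESTS)
```

Under these assumptions the cutoff is roughly 1.1e-05, so both reported SNPs pass and the hypothetical marker does not. A different correction, such as a false discovery rate procedure, would give a different cutoff.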

    Understanding the Code of Life: Holistic Conceptual Modeling of the Genome

    Over the last few decades, advances in sequencing technology have produced significant amounts of genomic data, which has revolutionised our understanding of biology. However, the amount of data generated has far exceeded our ability to interpret it. Deciphering the code of life is a grand challenge. Despite our progress, our understanding of it remains minimal, and we are just beginning to uncover its full potential, for instance, in areas such as precision medicine or pharmacogenomics. The main objective of this thesis is to advance our understanding of life by proposing a holistic, model-based approach consisting of three artifacts: i) a conceptual schema of the genome, ii) a method for its application in the real world, and iii) the use of foundational ontologies to represent domain knowledge in a more precise and explicit way. The first two contributions have been validated by implementing genome information systems based on conceptual models. The third contribution has been validated by empirical experiments assessing whether using foundational ontologies leads to a better understanding of the genomic domain. The artifacts generated offer significant benefits. First, more efficient data management processes were produced, leading to better knowledge extraction processes. Second, a better understanding and communication of the domain was achieved.
The fruitful discussions and the results arising from projects INNEST2021 /57, MICIN/AEI/10.13039/501100011033, PID2021-123824OB-I00, CIPROM/2021/023 and PDC2021-121243-I00 contributed greatly to the final quality of this thesis.
García Simón, A. (2022). Understanding the Code of Life: Holistic Conceptual Modeling of the Genome [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/19143

    Denatured: Emergent realities of encyclopedic DNA elements

    The Human Genome Project was the center of much controversy in the 1990s, as creating a map of the human genome drew into question the boundaries between nature and nurture, or science and society. Fifteen years have now passed since the Human Genome Project's completion, and the new paradigm of genetics is no longer governed by a strict nature/nurture dualism. This project looks at one of the Human Genome Project's successors: the Encyclopedia of DNA Elements (ENCODE) project, which has created new boundaries and limitations in this new phase of genetic thinking. Using a frame analysis and Actor-Network Theory approach to follow how ENCODE has formed and reformed over the years, this project traces the ENCODE project as a new way of translating genetic code from the cell to the world around it, and ultimately back into the cell. Throughout these processes, the ENCODE project brings into question the meaning of human, creates a platform for viewing the genome as a moldable substance, and ultimately presents itself as the end of human disease.

    Generation and Applications of Knowledge Graphs in Systems and Networks Biology

    The acceleration in the generation of data in the biomedical domain has necessitated the use of computational approaches to assist in its interpretation. However, these approaches rely on the availability of high-quality, structured, formalized biomedical knowledge. This thesis has two goals: to improve methods for curation and semantic data integration in order to generate high-granularity biological knowledge graphs, and to develop novel methods for using prior biological knowledge to propose new biological hypotheses. The first two publications describe an ecosystem for handling biological knowledge graphs encoded in the Biological Expression Language throughout the stages of curation, visualization, and analysis. The next two publications describe the reproducible acquisition and integration of high-granularity knowledge with low contextual specificity from structured biological data sources on a massive scale, and support the semi-automated curation of new content at high speed and precision. After building the ecosystem and acquiring content, the last three publications in this thesis demonstrate three different applications of biological knowledge graphs in modeling and simulation. The first demonstrates the use of agent-based modeling for simulation of neurodegenerative disease biomarker trajectories using biological knowledge graphs as priors. The second applies network representation learning to prioritize nodes in biological knowledge graphs based on corresponding experimental measurements to identify novel targets. Finally, the third uses biological knowledge graphs and develops algorithms to deconvolute the mechanism of action of drugs, which could also serve to identify drug repositioning candidates. Ultimately, this thesis lays the groundwork for production-level applications of drug repositioning algorithms and other knowledge-driven approaches to analyzing biomedical experiments.

    Automated retrieval and extraction of training course information from unstructured web pages

    Web Information Extraction (WIE) is the discipline dealing with the discovery, processing and extraction of specific pieces of information from semi-structured or unstructured web pages. The World Wide Web comprises billions of web pages, and there is much need for systems that will locate, extract and integrate the acquired knowledge into organisations' practices. Some commercial, automated web extraction software packages exist; however, their success comes from heavily involving their users in the process of finding the relevant web pages, preparing the system to recognise items of interest on these pages and manually dealing with the evaluation and storage of the extracted results. This research has explored WIE, specifically with regard to the automation of the extraction and validation of online training information. The work also includes research and development in the area of automated Web Information Retrieval (WIR), more specifically in Web Searching (or Crawling) and Web Classification. After different technologies were considered, Naïve Bayes Networks were chosen as the most suitable for the development of the classification system. The extraction part of the system used Genetic Programming (GP) for the generation of web extraction solutions. Specifically, GP was used to evolve Regular Expressions, which were then used to extract specific training course information from the web, such as course names, prices, dates and locations. The experimental results indicate that all three aspects of this research perform very well, with the Web Crawler outperforming existing crawling systems, the Web Classifier performing with an accuracy of over 95% and a precision of over 98%, and the Web Extractor achieving an accuracy of over 94% for the extraction of course titles and an accuracy of just under 67% for the extraction of other course attributes such as dates, prices and locations.
Furthermore, the overall work is of great significance to the sponsoring company, as it simplifies and improves the existing time-consuming, labour-intensive and error-prone manual techniques, as will be discussed in this thesis. The prototype developed in this research works in the background and requires very little, often no, human assistance
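
Evolving regular expressions with Genetic Programming, as described above, needs a fitness function that scores each candidate pattern against labelled examples. The sketch below shows one plausible form of that scoring step only. It is not the thesis's system: the example snippets, the candidate patterns and the exact-match criterion are all assumptions made for illustration, and the evolutionary loop (selection, crossover, mutation) is left out.

```python
import re


def score_pattern(pattern, examples):
    """Fraction of examples whose first match equals the expected string."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0.0  # syntactically invalid candidates get the worst fitness
    hits = 0
    for text, expected in examples:
        match = compiled.search(text)
        if match and match.group(0) == expected:
            hits += 1
    return hits / len(examples)


# Hypothetical labelled snippets: (page text, price that should be extracted).
examples = [
    ("Course fee: $450 per delegate", "$450"),
    ("Price $1,200 + tax", "$1,200"),
    ("Cost: $95", "$95"),
]

# Hypothetical candidates, standing in for one GP generation.
candidates = [
    r"\d+",          # misses the currency symbol
    r"\$\d+",        # stops at the thousands separator
    r"\$\d[\d,]*",   # handles all three examples
    r"\$(\d+",       # invalid: unbalanced parenthesis
]

best = max(candidates, key=lambda p: score_pattern(p, examples))
```

A GP run would call a function like `score_pattern` on every individual in each generation and keep the higher-scoring patterns as parents for the next one.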

    A teachable semi-automatic web information extraction system based on evolved regular expression patterns

    This thesis explores Web Information Extraction (WIE) and how it has been used in decision making and to support businesses in their daily operations. The research focuses on a WIE system based on Genetic Programming (GP) with an extensible model to enhance the automatic extractor. This uses a human as a teacher to identify and extract relevant information from the semi-structured HTML webpages. Regular expressions, which have been chosen as the pattern matching tool, are automatically generated based on the training data to provide an improved grammar and lexicon. This particularly benefits the GP system which may need to extend its lexicon in the presence of new tokens in the web pages. These tokens allow the GP method to produce new extraction patterns for new requirements

    Epigenomics of Cell Fate in Development and Disease

    Epigenetic features at regulatory elements provide instructive cues for transcriptional regulation during development. However, the particular epigenetic alterations necessary for proper cell fate acquisition and differentiation are not well understood. This dissertation explores the epigenetic dynamics of regulatory elements during development and uses epigenome annotations to document inappropriate transcriptional regulation in disease. First, I summarize my contributions to developing a new algorithm for detecting differential DNA methylation, M&M. I report the application of the M&M algorithm to identify distinct classes of DNA methylation dynamics in surface ectoderm (SE) progenitor cells and SE-derived lineages: epigenome alterations, and differential DNA methylation in particular, that are present in progenitor cells are transmitted to daughter cells and consequently observed in differentiated cells. I exploit this property of DNA methylation to characterize DNA methylation dynamics in surface ectoderm embryonic tissue and SE-derived cells. Next, I use zebrafish to investigate the biological relevance of the classes of DNA methylation dynamics described in the SE context. In zebrafish, I use the pigment cell development system to understand the contribution of DNA methylation to a particular cell fate choice: melanocyte or iridophore cell fate. Next, I investigate the consequence of somatic mutations in primary liver cancer by utilizing epigenomic annotations of human tissues to distinguish putatively functional mutations from passenger mutations. Here I present support for the hypothesis that transcriptional regulatory instructions for heterologous cell types are co-opted by cancer cells during malignant tumorigenesis. Finally I present a review of the evolution of epigenetic regulation over regulatory elements. 
Altogether, this dissertation advances our understanding of epigenetic regulation in cell fate decisions by integrating functional genomics with developmental biology and cancer genetics