
    Network-based approach for post genome-wide association study analysis in admixed populations

    In this project, we review existing pathway-based approaches for genome-wide association (GWA) study analysis, exploring the methods they implement for combining the effects of multiple modest genetic variants at the gene and pathway levels. We then propose a graph-based method, ancGWAS, that incorporates the GWA study signal and the locus-specific ancestry into the human protein-protein interaction (PPI) network to identify significant sub-networks or pathways associated with the trait of interest. This network-based method applies centrality measures within linkage disequilibrium (LD) on the network to search for pathways, then applies a scoring summary statistic to the resulting pathways to identify those most enriched for association with complex diseases.

    Deciphering the genetic background of quantitative traits using machine learning and bioinformatics frameworks

    In this thesis, I developed two frameworks that help to highlight the genetic mechanisms underlying quantitative traits. My focus was to design efficient methodologies for discovering genotype-phenotype associations and then to use these associations to describe the regulatory mechanisms that cause phenotypic differences among individuals. In the first framework, I investigated key regulatory mechanisms governing the development of eggshell strength. The aim was to highlight the temporal changes in the signaling cascades that regulate eggshell strength over the life of a bird. I considered chicken eggshell strength at two time points during the egg production cycle and studied the genotype-phenotype associations by applying the Random Forest algorithm to genotypic data. For the analysis of the corresponding genes, a well-established systems biology approach was adopted to delineate the gene regulatory pathways and master regulators underlying this important trait. My results indicate that, while some master regulators (e.g., Slc22a1 and Sox11) and pathways are common to different laying stages, others (e.g., Scn11a, St8sia2, or the TGF-beta pathway) were found in only one stage and thus represent age-specific functions. Overall, my results provide (i) significant insights into age-specific and common molecular mechanisms underlying the regulation of eggshell strength, and (ii) new breeding targets for improving eggshell quality during the later stages of the chicken production cycle. In my second framework, I combined Random Forests with a signal detection strategy to identify robust genotype-phenotype associations. The objective was to improve the efficiency of single-SNP-based association analysis. Genome-wide association studies (GWAS) are a well-established methodology for identifying genomic variants and genes responsible for traits of interest in all branches of the life sciences. Despite the long time this methodology has had to mature, the reliable detection of genotype-phenotype associations is still a challenge for many quantitative traits, mainly because of the large number of genomic loci with weak individual effects on the trait under investigation. It can therefore be hypothesized that many genomic variants with a small but real effect remain unnoticed in many GWAS approaches. Here, we propose a two-step procedure to address this problem. In the first step, cubic splines are fitted to the test statistic values, and genomic regions with spline peaks higher than expected by chance are considered quantitative trait loci (QTL). In the second step, the SNPs in these QTLs are prioritized by the strength of their association with the phenotype using a Random Forests approach. As a case study, we applied our procedure to real data sets and found a plausible number of partly novel genomic variants and genes involved in various egg quality traits. 2021-10-1

    Consequences of refining biological networks through detailed pathway information: From genes to proteoforms

    Biological networks can be used to model molecular processes, understand disease progression, and find new treatment strategies. This thesis investigated how refining the design of biological networks influences their structure, and how this can be used to improve the specificity of pathway analyses. First, we investigated the potential to use more detailed molecular data in current human biological pathways. We verified that there are enough proteoform annotations, i.e., information about proteins in specific post-translational states, for systematic analyses, and we characterized the structure of gene-centric versus proteoform-centric network representations of pathways. Next, we enabled the programmatic search and mining of pathways using different models for biomolecules, including proteoforms. Notably, we designed a generic proteoform matching algorithm that enables the flexible mapping of experimental data to the theoretical representation in reference databases. Finally, we constructed pathway-based networks using different degrees of detail in the representation of biochemical reactions. We included information overlooked in most standard network representations: small molecules, isoforms, and post-translational modifications. Structural properties such as network size, degree distribution, and connectivity in both global and local subnetworks were analysed to quantify the impact of the added molecular entities. Doktorgradsavhandlin

    Attention is more than prediction precision [Commentary on target article]

    A cornerstone of the target article is that, in a predictive coding framework, attention can be modelled by weighting prediction error with a measure of precision. We argue that this is not a complete explanation, especially in the light of event-related potential (ERP) data showing large evoked responses for frequently presented target stimuli, which are therefore predicted.

    Text Mining for Pathway Curation

    Biological knowledge often involves understanding the interactions between molecules, such as proteins and genes, that form functional networks called pathways. New knowledge about pathways is typically communicated through publications and later condensed into structured formats such as textbooks, pathway databases, or mathematical models. However, curating updated pathway models can be labour-intensive because of the growing volume of publications. This thesis investigates text mining methods to support pathway curation. We present PEDL (Protein-Protein-Association Extraction with Deep Language Models), a machine learning model designed to extract protein-protein associations (PPAs) from biomedical text. PEDL uses distant supervision and pre-trained language models to achieve higher accuracy than the state of the art. An expert evaluation confirms its usefulness for pathway curators. We also present PEDL+, a command-line tool that allows non-expert users to extract PPAs efficiently. When PEDL+ was applied to pathway curation tasks, three curators rated 55.6% to 79.6% of its extractions as useful for their work. The large number of PPAs identified by text mining can be overwhelming for researchers. To help, we present PathComplete, a model that suggests potential extensions to a pathway. It is the first method for this task based on supervised machine learning, and it uses transfer learning from pathway databases. Our evaluations show that PathComplete significantly outperforms existing methods. Finally, we generalise pathway extension from PPAs to more realistic complex events. Here, our novel method for conditional graph modification outperforms the current best method by 13-24% accuracy on three benchmarks. We also present a new dataset for event-based pathway extension. Overall, our results show that deep learning-based information extraction is a promising basis for supporting pathway curators.

    Scientific Kenyon: Neuroscience Edition (Full Issue)


    Unveiling the extracellular APE1 role in hepatocellular carcinoma tumor biology

    Tumor cells can develop drug resistance through repair mechanisms that counteract the DNA damage caused by chemotherapy or radiation therapy. Apurinic/apyrimidinic endonuclease 1 (APE1) is an enzyme involved in the DNA base excision repair (BER) pathway. It confers resistance to chemotherapy and radiotherapy in different kinds of tumors, such as breast cancer, hepatocellular carcinoma (HCC), and lung cancer; for this reason, it could be considered a possible target for novel anticancer strategies. Many other non-repair activities are ascribable to APE1, such as the cell response to oxidative stress, the regulation of gene expression, and miRNA processing. There is also consistent recent evidence that APE1, whose elevated intracellular levels in cancer are linked to poor prognosis, is secreted. It has in fact been demonstrated that APE1 is a non-classically secreted protein and that its extracellular release is regulated by the acetylation of the K6/K7 residues of the N-terminal domain. No data on secreted APE1 in HCC have been available so far. In this study, we showed that APE1 is secreted and found in the serum of HCC patients, so that serum secreted APE1 (sAPE1) could be considered a diagnostic biomarker in HCC. We provided indications about the biological role of sAPE1 in HCC, elucidating its paracrine function in the regulation of IL-6 and IL-8 mRNA expression. We also elucidated the mechanisms responsible for APE1 secretion using an HCC cell line. Our findings suggest a role for extracellular APE1 as a paracrine pro-inflammatory molecule and provide a characterization of the exogenous function of APE1, which may modulate the inflammatory status of the tumor microenvironment and thereby contribute to the evolution of HCC. In line with previous evidence of APE1 involvement in oncogenic miRNA processing under genotoxic stress, we also provide new indications about its role in miRNA biology, elucidating its important contribution to the regulation of miRNA expression and processing and to miRNA sorting into extracellular vesicles (EVs), which could have major implications for tumor progression and chemoresistance.