70 research outputs found

    Predicting enhancers using a small subset of high confidence examples and co-training

    Enhancers are important regulatory regions located throughout the genome, primarily in non-coding regions. Several experimental methods have been developed in recent years to identify their locations, but the search space is large and the overlap between the putative enhancers identified by these methods tends to be very small. Computational methods for enhancer prediction often use one large set of experimentally identified enhancer regions as input, and therefore rely critically on the correctness of that set. We took a different approach, starting with a high-confidence set of 21 enhancers that lie in the intersection of enhancers identified by three completely unrelated experimental approaches: deepCAGE, HiCap and classical enhancer reporter assays. Because this starting set is so small, we use a semi-supervised approach called co-training, rather than a fully supervised one, to progressively predict enhancers from unlabeled regions. With this approach we outperform supervised learning as well as simpler semi-supervised learning methods, achieving an average area under the ROC curve of 0.84.
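
    To show how co-training grows a small seed set, here is a minimal sketch of a simplified co-training loop. It assumes each region is described by two separate feature views. The views, the toy nearest-centroid base learner and the function names are all hypothetical stand-ins, not the authors' actual features or classifiers. In each round, the classifier for each view pseudo-labels the unlabeled example it is most confident about. That example then joins the shared labeled pool, so the other view's classifier trains on it next.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroid:
    """Toy binary base learner (hypothetical): predicts the class whose centroid is closer."""

    def fit(self, xs: List[Vector], ys: List[int]) -> "NearestCentroid":
        # Requires at least one example of each class (0 and 1) in the labeled pool.
        self.centroids = {c: centroid([x for x, y in zip(xs, ys) if y == c]) for c in (0, 1)}
        return self

    def predict(self, x: Vector) -> Tuple[int, float]:
        d0 = distance(x, self.centroids[0])
        d1 = distance(x, self.centroids[1])
        # The gap between the two distances serves as a crude confidence score.
        return (1 if d1 < d0 else 0), abs(d0 - d1)


def co_train(view_a: List[Vector], view_b: List[Vector],
             seed_labels: Dict[int, int], unlabeled: List[int],
             rounds: int = 10) -> Dict[int, int]:
    """Simplified co-training over two feature views of the same examples.

    In every round, a classifier is trained on each view in turn using the
    current labeled pool. It pseudo-labels the single unlabeled example it is
    most confident about, and that example joins the shared labeled pool.
    """
    labels = dict(seed_labels)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (view_a, view_b):
            if not pool:
                return labels
            ids = sorted(labels)
            clf = NearestCentroid().fit([view[i] for i in ids], [labels[i] for i in ids])
            best = max(pool, key=lambda i: clf.predict(view[i])[1])
            labels[best] = clf.predict(view[best])[0]
            pool.remove(best)
    return labels
```

    The seed set must contain both classes. With two seeds and four unlabeled points drawn from two well-separated clusters, the loop labels all six points within two rounds.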

    Direct long-read RNA sequencing uncovers functional variation affecting transcript production and RNA modifications

    The production of multiple transcripts per gene is regulated by inherited genetic variants and epitranscriptomic modifications, and plays a prominent role in modulating complex traits and diseases. To simultaneously characterize the effect of genetic variants on transcript abundance and N6-methyladenosine (m6A) modifications, we produced long-read native poly(A) RNA-seq data for 60 genetically different lymphoblastoid cell lines (LCLs) from the 1000 Genomes/Geuvadis project. We identified a high diversity of both annotated (31%) and unannotated (61%) transcripts, with only a small proportion expressed across individuals (35% and 7%, respectively). In a genome-wide genetic analysis of transcripts, we identified 105 trQTLs, 76 of which were not detected as eQTLs in a larger published short-read RNA-seq dataset (317 samples). A population-wide characterization of m6A methylation at DRACH motifs identified an average of 40.1 m6A modifications on 6,222 genes. Genetic association analysis of highly variable modifications from 1,155 genes identified m6A modification quantitative trait loci (m6A-QTLs) for 16 transcripts. Colocalization analysis of trQTLs and m6A-QTLs identified 33 candidate transcripts mediating GWAS traits, with 46.4% of the colocalized trQTLs implicating novel risk transcripts. Overall, the simultaneous characterization of transcripts and post-transcriptional modifications identified genetic effects on transcription that are often missed by other sequencing technologies.
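
    In the DRACH consensus motif, D = A/G/U, R = A/G and H = A/C/U, and the central adenosine is the candidate m6A site. A minimal sketch of locating such motifs in a transcript sequence follows. It assumes a plain sequence string as input, and the function name is hypothetical. It only finds candidate positions. Deciding which candidates are actually methylated requires the sequencing data itself.

```python
import re
from typing import List

# DRACH written in the DNA alphabet (U -> T): D = A/G/T, R = A/G, then A, C, H = A/C/T.
# Wrapping the pattern in a lookahead lets finditer report overlapping motifs.
DRACH = re.compile(r"(?=([AGT][AG]AC[ACT]))")


def drach_sites(seq: str) -> List[int]:
    """0-based positions of the central adenosine (candidate m6A site) of every DRACH motif."""
    seq = seq.upper().replace("U", "T")
    return [m.start() + 2 for m in DRACH.finditer(seq)]
```

    In GGACAGACT, the two motifs share one base, and the function reports both central adenosines (positions 2 and 6). A non-overlapping search would miss the second motif.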

    Die Bibliothek als Erfolgsfaktor - 10 Jahre danach (The Library as a Success Factor: 10 Years Later)

    In 2022, the Universitätsbibliothek Bochum (Bochum University Library) celebrates its 60th anniversary. On the campus of Ruhr-Universität Bochum, the UB Bochum is a professional service provider for study, teaching and research. It has also long been an attractive place for learning and meeting: geographically central, and forward-looking in digitization, networking and cooperation.

    Proteomic analysis of 92 circulating proteins and their effects in cardiometabolic diseases

    BACKGROUND: Human plasma contains a wide variety of circulating proteins. These proteins can be important clinical biomarkers of disease and possible drug targets. Large-scale genomic studies of circulating proteins can identify genetic variants that influence relative protein abundance. METHODS: We conducted a meta-analysis of genome-wide association studies of the autosomal chromosomes in 22,997 individuals of primarily European ancestry across 12 cohorts to identify protein quantitative trait loci (pQTLs) for 92 cardiometabolic-associated plasma proteins. RESULTS: We identified 503 (337 cis and 166 trans) conditionally independent pQTLs, including several novel variants not previously reported in the literature. In a sex-stratified analysis, 118 (23.5%) of the pQTLs showed heterogeneity between sexes: the direction of effect was preserved, but effect sizes and significance differed. Additionally, we annotate trans-pQTLs with their nearest genes and report plausible biological relationships. Using Mendelian randomization, we identified causal associations for 18 proteins across 19 phenotypes, 10 of which have additional genetic colocalization evidence. We highlight proteins associated with a constellation of cardiometabolic traits, including angiopoietin-related protein 7 (ANGPTL7) and semaphorin 3F (SEMA3F). CONCLUSION: Through large-scale analysis of protein quantitative trait loci, we provide a comprehensive overview of common variants associated with plasma proteins. We highlight possible biological relationships that may serve as a basis for further investigation into possible causal roles in cardiometabolic diseases.
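
    The abstract does not state which meta-analysis model or sex-heterogeneity test was used. As a hedged illustration only, the sketch below shows two common choices for per-variant summary statistics. The first is fixed-effect inverse-variance weighting of effect estimates across cohorts. The second is a z-test comparing male- and female-specific effects, assuming the two strata are independent samples. Both function names are hypothetical.

```python
import math
from typing import List, Tuple


def ivw_fixed_effect(betas: List[float], ses: List[float]) -> Tuple[float, float]:
    """Inverse-variance-weighted fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def sex_difference_z(b_m: float, se_m: float, b_f: float, se_f: float) -> Tuple[float, float]:
    """z-statistic and two-sided p-value for a difference between sex-specific effects."""
    z = (b_m - b_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|)) for a standard normal
    return z, p
```

    For example, two cohorts with effects 0.2 and 0.4 and equal standard errors of 0.1 pool to 0.3, with a standard error of about 0.071.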

    Genetic Landscape of the ACE2 Coronavirus Receptor

    Background: SARS-CoV-2, the causal agent of COVID-19, enters human cells using the ACE2 (angiotensin-converting enzyme 2) protein as a receptor. ACE2 is thus key to both infection by and treatment of the coronavirus. ACE2 is highly expressed in the heart and in the respiratory and gastrointestinal tracts, and plays important regulatory roles in the cardiovascular and other biological systems. However, the genetic basis of ACE2 protein levels is not well understood. Methods: We have conducted the largest genome-wide association meta-analysis of plasma ACE2 levels, in >28,000 individuals of the SCALLOP Consortium (Systematic and Combined Analysis of Olink Proteins). We summarize the cross-sectional epidemiological correlates of circulating ACE2. Using the summary statistics–based high-definition likelihood method, we estimate relevant genetic correlations with cardiometabolic phenotypes, COVID-19, and other complex human traits and diseases. We perform causal inference of soluble ACE2 on vascular disease outcomes and COVID-19 severity using Mendelian randomization. We also perform in silico functional analysis by integrating other types of omics data. Results: We identified 10 loci, including 8 novel ones, capturing 30% of the heritability of the protein. Plasma ACE2 was genetically correlated with vascular diseases, severe COVID-19, and a wide range of complex human diseases and medications. An X-chromosome cis–protein quantitative trait loci–based Mendelian randomization analysis suggested a causal effect of elevated ACE2 levels on COVID-19 severity (odds ratio, 1.63 [95% CI, 1.10–2.42]; P=0.01), hospitalization (odds ratio, 1.52 [95% CI, 1.05–2.21]; P=0.03), and infection (odds ratio, 1.60 [95% CI, 1.08–2.37]; P=0.02). Tissue- and cell type–specific transcriptomic and epigenomic analysis revealed that the ACE2 regulatory variants were enriched for DNA methylation sites in blood immune cells. Conclusions: Human plasma ACE2 shares a genetic basis with cardiovascular disease, COVID-19, and other related diseases. We map the genetic architecture of the ACE2 protein, providing a useful resource for further biological and clinical studies on this coronavirus receptor.
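
    The results above are reported as odds ratios from a cis-pQTL instrument, but the exact estimator is not stated. As a hedged sketch, the simplest single-instrument Mendelian randomization estimator is the Wald ratio. It divides the variant's effect on the outcome (log-odds) by its effect on the exposure (here, plasma ACE2). The first-order standard error below ignores uncertainty in the exposure effect. The function names and example numbers are made up for illustration and are not taken from the study.

```python
import math
from typing import Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float) -> Tuple[float, float]:
    """Single-instrument MR causal estimate (log-odds per unit exposure) and first-order SE."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)  # ignores the standard error of beta_exposure
    return estimate, se


def to_odds_ratio(log_or: float, se: float, z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds estimate into an odds ratio with a 95% confidence interval."""
    return math.exp(log_or), math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se)
```

    Because the interval is computed on the log scale and then exponentiated, it is asymmetric around the odds ratio. The intervals quoted in the abstract show the same asymmetry.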

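    Creación y Simulación de Metodologías de Análisis, Clasificación e Integración de Nuevos Requerimientos a Software Propietario (Creating and Simulating Methodologies for the Analysis, Classification and Integration of New Requirements into Proprietary Software)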

    Prioritizing the new requirements to be implemented in proprietary software is fundamental to its maintenance, to preserving its quality, and to observing the company's business rules and standards. Prioritization tools based on proven and recognized techniques exist, but they require each requirement to be scored beforehand. If the company receives requests from several clients for the same product, more factors affect the company. The available tools do not take these aspects into account, which makes the scoring task much more complex. This research comprises a survey of the methods used by companies in the Rosario area to prioritize and select new requirements. It also defines a methodology for selecting a new requirement, which involves analyzing and evaluating all of its implications for the software product and the company while respecting the company's business rules. The resulting methodology leads to the definition of the processes for building a tool to score and prioritize new requirements in proprietary software that receives requests from several clients at the same time. The tool will include scoring instruments that consider all related aspects, will provide current prioritization techniques, and will issue customized reports from different company perspectives. Track: Software Engineering. Red de Universidades con Carreras en Informática (RedUNCI)

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall, “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC). After five years of activity, it has clearly established itself as the premier national forum for research and development in Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Relatório de estágio em farmácia comunitária (Community Pharmacy Internship Report)

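    Internship report prepared as part of the Integrated Master's degree in Pharmaceutical Sciences (Mestrado Integrado em Ciências Farmacêuticas), submitted to the Faculty of Pharmacy of the University of Coimbra (Faculdade de Farmácia da Universidade de Coimbra).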

    Enhancer Vorhersage Basierend auf Epigenomischen Daten (Enhancer Prediction Based on Epigenomic Data)

    In this thesis, we show how to exploit current knowledge of enhancers and integrate different types of epigenomic data to make condition-specific predictions of the location of active enhancers. First, we introduce a novel method for genome-wide enhancer prediction that is based solely on histone modification data. Our method combines two random forest classifiers: one learns the difference between active and inactive genomic regions, and the other concentrates on the more difficult task of distinguishing active enhancers from active promoters. We model and optimize the corresponding features taking into account the local chromatin structure. For an active enhancer, this is in essence an accessible region flanked by nucleosomes with specific histone modifications. To avoid circular reasoning, our training enhancers are defined by characteristics independent of the feature set: accessibility and bidirectional transcription. We thoroughly validate our method on mouse embryonic stem cell data and achieve very good performance on a constructed test set as well as on a validated set of enhancers. Moreover, our genome-wide enhancer predictions have high spatial resolution. We also cluster proximal enhancers and show that the resulting regions of high enhancer density agree well with a published list of super-enhancers in mouse embryonic stem cells. In contrast to many other methods, we offer a pre-trained classifier with integrated data normalization that can be used to reliably predict enhancers across different cell types and species. This classifier outperforms the prominent unsupervised method ChromHMM and performs similarly to the recent supervised REPTILE approach when applied within the same cell type. In terms of transferability to other conditions, our method outperforms REPTILE. Finally, we demonstrate how our pre-trained classifier can be embedded into a comprehensive framework to predict condition-specific regulatory units (pairs of enhancers and putative target genes) from histone modification and gene expression data.
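
    The method chains two classifiers. A minimal sketch of that decision cascade follows, assuming each stage is a callable that returns True or False for a region's feature values. The thesis uses trained random forests for both stages. Here they are replaced by hypothetical threshold rules on illustrative histone-mark signals so the example runs without extra libraries. The feature names, cut-offs and function names are assumptions, not the thesis's actual features.

```python
from typing import Callable, Dict

Features = Dict[str, float]
Classifier = Callable[[Features], bool]


def cascade_predict(x: Features, is_active: Classifier, is_enhancer: Classifier) -> str:
    """Stage 1 separates active from inactive regions; only active regions reach
    stage 2, which separates active enhancers from active promoters."""
    if not is_active(x):
        return "inactive"
    return "enhancer" if is_enhancer(x) else "promoter"


# Hypothetical stand-ins for the two trained classifiers (illustrative thresholds only).
def toy_is_active(x: Features) -> bool:
    # H3K27ac marks both active enhancers and active promoters.
    return x["H3K27ac"] > 1.0


def toy_is_enhancer(x: Features) -> bool:
    # Enhancers typically carry more H3K4me1 relative to H3K4me3 than promoters do.
    return x["H3K4me1"] > x["H3K4me3"]
```

    Splitting the problem this way lets the second classifier concentrate on the harder enhancer-versus-promoter distinction, because it only sees regions the first stage already judged active.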