112 research outputs found

    Pybedtools: a flexible Python library for manipulating genomic datasets and annotations

    Summary: pybedtools is a flexible Python software library for manipulating and exploring genomic datasets in many common formats. It provides an intuitive Python interface that extends upon the popular BEDTools genome arithmetic tools. The library is well documented and efficient, and allows researchers to quickly develop simple, yet powerful scripts that enable complex genomic analyses
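
    The "genome arithmetic" mentioned above centers on interval operations such as intersection. The following is a minimal pure-Python sketch of that idea, not pybedtools code: the `intersect` function, the tuple layout, and the example intervals are hypothetical and chosen only for illustration. Coordinates are assumed to be 0-based and half-open, as in the BED format.

```python
from typing import List, Tuple

# Hypothetical interval layout for this sketch: (chrom, start, end),
# 0-based and half-open, following the BED convention.
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair of intervals from a and b.

    Naive O(len(a) * len(b)) comparison; real tools use sorted sweeps or
    interval trees, which this sketch does not attempt to reproduce.
    """
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # half-open intervals that only touch do not overlap
                out.append((chrom_a, start, end))
    return out


# Made-up example data
genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
peaks = [("chr1", 150, 550), ("chr2", 80, 120)]
overlaps = intersect(genes, peaks)
print(overlaps)
```

    In this example the two chr2 intervals share only the boundary position 80, so under half-open coordinates they are not reported as overlapping.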

    GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations

    Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important, yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command line and graphical tools for common analyses. We demonstrate GEMINI's utility for exploring variation in personal genomes and family based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility and our goal is to provide researchers with a standard framework for medical genomics

    A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer

    BACKGROUND: The MinION™ is a new, portable single-molecule sequencer developed by Oxford Nanopore Technologies. It measures four inches in length and is powered from the USB 3.0 port of a laptop computer. The MinION™ measures the change in current resulting from DNA strands interacting with a charged protein nanopore. These measurements can then be used to deduce the underlying nucleotide sequence. FINDINGS: We present a read dataset from whole-genome shotgun sequencing of the model organism Escherichia coli K-12 substr. MG1655 generated on a MinION™ device during the early-access MinION™ Access Program (MAP). Sequencing runs of the MinION™ are presented, one generated using R7 chemistry (released in July 2014) and one using R7.3 (released in September 2014). CONCLUSIONS: Base-called sequence data are provided to demonstrate the nature of data produced by the MinION™ platform and to encourage the development of customised methods for alignment, consensus and variant calling, de novo assembly and scaffolding. FAST5 files containing event data within the HDF5 container format are provided to assist with the development of improved base-calling methods

    Implementación de un prototipo funcional de aprendizaje de máquina para identificar correos electrónicos de Spear Phishing

    Research work. This work aims to detect spear phishing emails by means of a web prototype, because social engineering techniques are widely used today to steal users' personal identity data and/or the credentials of their financial accounts; for this reason, everyone should adopt a measure to detect these social engineering attacks. Contents: 2 Justification, 3 Problem statement, 4 Objectives, 5 Reference frameworks, 6 State of the art, 7 Methodology, 8 Development of the proposal, 9 Installation and required equipment, 10 Results, 11 Conclusions, 12 Future work, 13 Bibliography, 14 Appendices. Undergraduate degree, Systems Engineer

    Combating subclonal evolution of resistant cancer phenotypes

    Metastatic breast cancer remains challenging to treat, and most patients ultimately progress on therapy. This acquired drug resistance is largely due to drug-refractory sub-populations (subclones) within heterogeneous tumors. Here, we track the genetic and phenotypic subclonal evolution of four breast cancers through years of treatment to better understand how breast cancers become drug-resistant. Recurrently appearing post-chemotherapy mutations are rare. However, bulk and single-cell RNA sequencing reveal acquisition of malignant phenotypes after treatment, including enhanced mesenchymal and growth factor signaling, which may promote drug resistance, and decreased antigen presentation and TNF-α signaling, which may enable immune system avoidance. Some of these phenotypes pre-exist in pre-treatment subclones that become dominant after chemotherapy, indicating selection for resistance phenotypes. Post-chemotherapy cancer cells are effectively treated with drugs targeting acquired phenotypes. These findings highlight cancer's ability to evolve phenotypically and suggest a phenotype-targeted treatment strategy that adapts to cancer as it evolves

    Clique-Finding for Heterogeneity and Multidimensionality in Biomarker Epidemiology Research: The CHAMBER Algorithm

    Commonly occurring disease etiology may involve complex combinations of genes and exposures resulting in etiologic heterogeneity. We present a computational algorithm that employs clique-finding for heterogeneity and multidimensionality in biomedical and epidemiological research (the "CHAMBER" algorithm). This algorithm uses graph-building to (1) identify genetic variants that influence disease risk and (2) predict individuals at risk for disease based on inherited genotype. We use a set-covering algorithm to identify optimal cliques and a Boolean function that identifies etiologically heterogeneous groups of individuals. We evaluated this approach using simulated case-control genotype-disease associations involving two- and four-gene patterns. The CHAMBER algorithm correctly identified these simulated etiologies. We also used two population-based case-control studies of breast and endometrial cancer in African American and Caucasian women considering data on genotypes involved in steroid hormone metabolism. We identified novel patterns in both cancer sites that involved genes that sulfate or glucuronidate estrogens or catecholestrogens. These associations were consistent with the hypothesized biological functions of these genes. We also identified cliques representing the joint effect of multiple candidate genes in all groups, suggesting the existence of biologically plausible combinations of hormone metabolism genes in both breast and endometrial cancer in both races. The CHAMBER algorithm may have utility in exploring the multifactorial etiology and etiologic heterogeneity in complex disease
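
    The abstract above mentions a set-covering step but does not say which set-covering algorithm is used. As a rough illustration only, here is the standard greedy set-cover heuristic, which is an assumption and not the CHAMBER authors' method. The function name `greedy_set_cover`, the clique labels, and the individuals `p1`..`p5` are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_set_cover(universe: Set[str], subsets: Dict[str, Set[str]]) -> List[str]:
    """Greedy heuristic: repeatedly pick the subset covering the most uncovered items.

    This is the textbook approximation, not necessarily an optimal cover.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Iterate over sorted names so ties are broken deterministically.
        best = max(sorted(subsets), key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            break  # remaining items cannot be covered by any subset
        chosen.append(best)
        uncovered -= gained
    return chosen


# Made-up example: each hypothetical clique "covers" a set of individuals.
cases = {"p1", "p2", "p3", "p4", "p5"}
cliques = {
    "A": {"p1", "p2", "p3"},
    "B": {"p3", "p4"},
    "C": {"p4", "p5"},
    "D": {"p1"},
}
cover = greedy_set_cover(cases, cliques)
print(cover)
```

    Here the heuristic first takes clique A (three individuals), then clique C, which covers the remaining two.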

    Population Genomic Inferences from Sparse High-Throughput Sequencing of Two Populations of Drosophila melanogaster

    Short-read sequencing techniques provide the opportunity to capture genome-wide sequence data in a single experiment. A current challenge is to identify questions that shallow-depth genomic data can address successfully and to develop corresponding analytical methods that are statistically sound. Here, we apply the Roche/454 platform to survey natural variation in strains of Drosophila melanogaster from an African (n = 3) and a North American (n = 6) population. Reads were aligned to the reference D. melanogaster genomic assembly, single nucleotide polymorphisms were identified, and nucleotide variation was quantified genome wide. Simulations and empirical results suggest that nucleotide diversity can be accurately estimated from sparse data with as little as 0.2× coverage per line. The unbiased genomic sampling provided by random short-read sequencing also allows insight into distributions of transposable elements and copy number polymorphisms found within populations and demonstrates that short-read sequencing methods provide an efficient means to quantify variation in genome organization and content. Continued development of methods for statistical inference of shallow-depth genome-wide sequencing data will allow such sparse, partial data sets to become the norm in the emerging field of population genomics
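
    To make the idea of estimating nucleotide diversity from sparse data concrete, the sketch below computes a pooled pairwise-difference estimate that skips sites where either line lacks coverage. It is a generic illustration under stated assumptions, not the estimator used in the study: the function name `nucleotide_diversity`, the use of `N` to mark uncovered sites, and the toy sequences are all hypothetical.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str], missing: str = "N") -> float:
    """Pooled proportion of differing sites over all pairwise comparisons.

    Sequences are assumed to be aligned and of equal length. A site is
    compared for a pair only if both sequences are covered there.
    """
    diffs = 0
    compared = 0
    for s1, s2 in combinations(seqs, 2):
        for x, y in zip(s1, s2):
            if x == missing or y == missing:
                continue  # no coverage in at least one line at this site
            compared += 1
            if x != y:
                diffs += 1
    return diffs / compared if compared else float("nan")


# Toy aligned sequences with gaps in coverage marked as N
lines = ["ACGTN", "ACGAA", "NCGTA"]
pi = nucleotide_diversity(lines)
print(pi)
```

    In the toy data, 11 site comparisons are possible across the three pairs and 2 of them differ, so the estimate is 2/11.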

    Pulmonary Endpoints (Lung Carcinomas and Asbestosis) Following Inhalation Exposure to Asbestos

    Lung carcinomas and pulmonary fibrosis (asbestosis) occur in asbestos workers. Understanding the pathogenesis of these diseases is complicated because of potential confounding factors, such as smoking, which is not a risk factor in mesothelioma. The modes of action (MOA) of various types of asbestos in the development of lung cancers, asbestosis, and mesotheliomas appear to be different. Moreover, asbestos fibers may act differentially at various stages of these diseases, and have different potencies as compared to other naturally occurring and synthetic fibers. This literature review describes patterns of deposition and retention of various types of asbestos and other fibers after inhalation, methods of translocation within the lung, and dissolution of various fiber types in lung compartments and cells in vitro. Comprehensive dose-response studies at fiber concentrations inhaled by humans as well as bivariate size distributions (lengths and widths), types, and sources of fibers are rarely defined in published studies and are needed. Species-specific responses may occur. Mechanistic studies have some of these limitations, but have suggested that changes in gene expression (either fiber-catalyzed directly or by cell elaboration of oxidants), epigenetic changes, and receptor-mediated or other intracellular signaling cascades may play roles in various stages of the development of lung cancers or asbestosis