268 research outputs found

    Parallelization of Mapping Algorithms for Next Generation Sequencing Applications

    With the advent of next-generation high-throughput sequencing instruments, large volumes of short sequence data are generated at an unprecedented rate. Processing and analyzing these massive data sets requires overcoming several challenges. A particular challenge addressed in this abstract is the mapping of short sequences (reads) to a reference genome while allowing mismatches. This is a highly time-consuming combinatorial problem in many applications, including whole-genome resequencing, targeted sequencing, transcriptome/small RNA, DNA methylation and ChIP sequencing, and it takes time on the order of days using existing sequential techniques on large-scale datasets. In this work, we introduce six parallelization methods, each having different scalability characteristics, to speed up short sequence mapping. We also address an associated load balancing problem that involves grouping nodes of a tree from different levels. This problem arises due to a trade-off between computational cost and granularity while partitioning the workload. We comparatively present the proposed parallelization methods and give theoretical cost models for each of them. Experimental results on real datasets demonstrate the effectiveness of the methods and indicate that they are successful at reducing the execution time from the order of days to under just a few hours for large datasets. To the best of our knowledge, this is the first study on parallelization of the short sequence mapping problem.

    The invisible power of fairness. How machine learning shapes democracy

    Many machine learning systems make extensive use of large amounts of data regarding human behaviors. Several researchers have found various discriminatory practices related to the use of human-related machine learning systems, for example in the field of criminal justice, credit scoring and advertising. Fair machine learning is therefore emerging as a new field of study to mitigate biases that are inadvertently incorporated into algorithms. Data scientists and computer engineers are making various efforts to provide definitions of fairness. In this paper, we provide an overview of the most widespread definitions of fairness in the field of machine learning, arguing that the ideas highlighting each formalization are closely related to different ideas of justice and to different interpretations of democracy embedded in our culture. This work intends to analyze the definitions of fairness that have been proposed to date to interpret the underlying criteria and to relate them to different ideas of democracy. Comment: 12 pages, 1 figure, preprint version, submitted to The 32nd Canadian Conference on Artificial Intelligence that will take place in Kingston, Ontario, May 28 to May 31, 201

    Micro-Environment Causes Reversible Changes in DNA Methylation and mRNA Expression Profiles in Patient-Derived Glioma Stem Cells

    In vitro and in vivo models are widely used in cancer research. Characterizing the similarities and differences between a patient's tumor and corresponding in vitro and in vivo models is important for understanding the potential clinical relevance of experimental data generated with these models. Towards this aim, we analyzed the genomic aberrations, DNA methylation and transcriptome profiles of five parental tumors and their matched in vitro isolated glioma stem cell (GSC) lines and xenografts generated from these same GSCs using high-resolution platforms. We observed that the methylation and transcriptome profiles of in vitro GSCs were significantly different from their corresponding xenografts, which were actually more similar to their original parental tumors. This points to the potentially critical role of the brain microenvironment in influencing methylation and transcriptional patterns of GSCs. Consistent with this possibility, ex vivo cultured GSCs isolated from xenografts showed a tendency to return to their initial in vitro states even after a short time in culture, supporting a rapid dynamic adaptation to the in vitro microenvironment. These results show that methylation and transcriptome profiles are highly dependent on the microenvironment and that growth in orthotopic sites partially reverses the changes caused by in vitro culturing.

    Age-Specific Signatures of Glioblastoma at the Genomic, Genetic, and Epigenetic Levels

    Age is a powerful predictor of survival in glioblastoma multiforme (GBM), yet the biological basis for the difference in clinical outcome is mostly unknown. Discovering genes and pathways that would explain age-specific survival differences could generate opportunities for novel therapeutics for GBM. Here we have integrated gene expression, exon expression, microRNA expression, copy number alteration, SNP, whole exome sequence, and DNA methylation data sets of a cohort of GBM patients in The Cancer Genome Atlas (TCGA) project to discover age-specific signatures at the transcriptional, genetic, and epigenetic levels, and validated our findings on the REMBRANDT data set. We found major age-specific signatures at all levels, including age-specific hypermethylation in polycomb group protein target genes and the upregulation of angiogenesis-related genes in older GBMs. These age-specific differences in GBM, which are independent of molecular subtypes, may in part explain the preferential effects of anti-angiogenic agents in older GBM and pave the way to a better understanding of the unique biology and clinical behavior of older versus younger GBMs.

    GliomaPredict: a clinically useful tool for assigning glioma patients to specific molecular subtypes

    Background: Advances in generating genome-wide gene expression data have accelerated the development of molecular-based tumor classification systems. Tools that allow the translation of such molecular classification schemas from research into clinical applications are still missing in the emerging era of personalized medicine. Results: We developed GliomaPredict as a computational tool that allows the fast and reliable classification of glioma patients into one of six previously published stratified subtypes based on sets of extensively validated classifiers derived from hundreds of glioma transcriptomic profiles. Our tool utilizes a principal component analysis (PCA)-based approach to generate a visual representation of the analyses, quantifies the confidence of the underlying subtype assessment and presents results as a printable PDF file. GliomaPredict is implemented as a plugin application for the widely used GenePattern framework. Conclusions: GliomaPredict provides a user-friendly, clinically applicable novel platform for instantly assigning gene expression-based subtypes to patients with gliomas, thereby aiding in clinical trial design and therapeutic decision-making. Implemented as a user-friendly diagnostic tool, we expect that in time GliomaPredict, and tools like it, will become routinely used in translational/clinical research and in the clinical care of patients with gliomas.

    Relation between the frequency of CD34+ bone marrow derived circulating progenitor cells and the number of diseased coronary arteries in patients with myocardial ischemia and diabetes

    Background: Bone marrow-derived circulating progenitor cells (BM-CPCs) in patients with coronary heart disease are impaired with respect to number and mobilization. However, it is unknown whether the mobilization of BM-CPCs depends on the number of diseased coronary arteries. Therefore, in our study, we analysed the correlation between the number of diseased coronary arteries and the frequency of CD34/45+ BM-CPCs in peripheral blood (PB) in patients with ischemic heart disease (IHD). Methods: The frequency of CD34/45+ BM-CPCs was measured by flow cytometry in 120 patients with coronary 1 vessel (IHD1, n = 40), coronary 2 vessel (IHD2, n = 40) or coronary 3 vessel disease (IHD3, n = 40) and in a control group of healthy subjects (n = 40). There was no significant difference in the total number of cardiovascular risk factors between IHD groups, except for diabetes mellitus (DM), which was significantly higher in the IHD3 group compared to the IHD2 and IHD1 groups. Results: The frequency of CD34/45+ BM-CPCs was significantly reduced in patients with IHD compared to the control group (CD34/45+; p < 0.001). The frequency of BM-CPCs was impaired in patients with IHD3 compared to IHD1 (CD34/45+; p < 0.001) and to IHD2 (CD34/45+; p = 0.001). However, there was no significant difference in the frequency of BM-CPCs between the patients with IHD2 and IHD1 (CD34/45+; p = 0.28). In a subgroup we observed a significant negative correlation between levels of hemoglobin A1c (HbA1c) and the frequency of BM-CPCs (CD34/45+; p < 0.001, r = -0.8). Conclusions: The frequency of CD34/45+ BM-CPCs in PB is impaired in patients with IHD. This impairment may increase with the number of diseased coronary arteries. Moreover, the frequency of CD34/45+ BM-CPCs in ischemic tissue is further impaired by diabetes in patients with IHD.

    Seismic Constraints on the Thickness and Structure of the Martian Crust from InSight

    NASA's InSight mission [1] has for the first time placed a very broad-band seismometer on the surface of Mars. The Seismic Experiment for Interior Structure (SEIS) [2] has been collecting continuous data since early February 2019. The main focus of InSight is to enhance our understanding of the internal structure and dynamics of Mars, which includes the goal to better constrain the crustal thickness of the planet [3]. Knowing the present-day crustal thickness of Mars has important implications for its thermal evolution [4] as well as for the partitioning of silicates and heat-producing elements between the different layers of Mars. Current estimates for the crustal thickness of Mars are based on modeling the relationship between topography and gravity [5,6], but these studies rely on different assumptions, e.g. on the density of the crust and upper mantle, or the bulk silicate composition of the planet and the crust. The resulting values for the average crustal thickness differ by more than 100%, from 30 km to more than 100 km [7]. New independent constraints from InSight will be based on seismically determining the crustal thickness at the landing site. This single firm measurement of crustal thickness at one point on the planet will allow us to constrain both the average crustal thickness of Mars and thickness variations across the planet when combined with constraints from gravity and topography [8]. Here we describe the determination of the crustal structure and thickness at the InSight landing site based on seismic receiver functions for three marsquakes compared with autocorrelations of InSight data [9]. We acknowledge NASA, CNES, partner agencies and institutions (UKSA, SSO, DLR, JPL, IPGP-CNRS, ETHZ, IC, MPS-MPG) and the operators of JPL, SISMOC, MSDS, IRIS-DMC and PDS for providing SEED SEIS data. InSight data is archived in the PDS, and a full list of archives in the Geosciences, Atmospheres, and Imaging nodes is at https://pds-geosciences.wustl.edu/missions/insight/. This work was partially carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. ©2021, California Institute of Technology. Government sponsorship acknowledged.