40 research outputs found

    Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.

    The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the volume of biological data has become so great that traditional data analysis platforms and methods can no longer perform data analysis tasks in the life sciences quickly enough. As a result, both biologists and computer scientists face the challenge of gaining profound insight into the deepest biological functions from big biological data. This in turn requires massive computational resources. High performance computing (HPC) platforms are therefore needed, along with efficient and scalable algorithms that can take advantage of them. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. We then provide a taxonomy of biological data analysis applications and survey how they have been mapped onto various computing platforms. After that, we present a case study comparing the efficiency of different computing platforms on the classical biological sequence alignment problem. Finally, we discuss the open issues in big biological data analytics.
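
As an illustration of the sequence alignment problem named in the abstract above, the following is a minimal sketch of a global alignment score computed by dynamic programming (Needleman-Wunsch with a linear gap penalty). The abstract does not say which alignment algorithm or scoring scheme the case study uses, so the function name `global_alignment_score` and the default scores are assumptions chosen for illustration only.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev[j] is the best score for aligning the processed prefix of `a` with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with the empty string costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap.
            up = prev[j] + gap
            # Align b[j-1] with a gap.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

The table has one cell per pair of sequence positions, so the work grows with the product of the two sequence lengths, which is one reason this problem is a common benchmark for comparing computing platforms.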

    Integrative bioinformatics applications for complex human disease contexts

    This thesis presents new methods for the analysis of high-throughput data from modern sources in the context of complex human diseases, using the example of a bioinformatics analysis workflow. New measurement techniques improve the resolution with which cellular and molecular processes can be monitored. While RNA sequencing (RNA-seq) measures mRNA expression, single-cell RNA-seq (scRNA-seq) resolves this on a per-cell basis. Long-read sequencing is increasingly used in genomics. With imaging mass spectrometry (IMS), protein levels in tissues are measured with spatial resolution. All these techniques pose specific challenges, which need to be addressed with new computational methods. Collecting knowledge with contextual annotations is important for integrative data analyses. Such knowledge is available through large literature repositories, from which information, such as miRNA-gene interactions, can be extracted using text mining methods. After aggregating this information in new databases, specific questions can be answered with traceable evidence. The combination of experimental data with these databases offers new possibilities for data integrative methods and for answering questions relevant for complex human diseases. Several data sources are made available, such as literature for text mining miRNA-gene interactions (Chapter 2), next- and third-generation sequencing data for genomics and transcriptomics (Chapters 4.1, 5), and IMS for spatially resolved proteomics (Chapter 4.4). For these data sources new methods for information extraction and pre-processing are developed. For instance, third-generation sequencing runs can be monitored and evaluated using the poreSTAT and sequ-into methods. The integrative (down-stream) analyses make use of these (heterogeneous) data sources. The cPred method (Chapter 4.2) for cell type prediction from scRNA-seq data was successfully applied in the context of the SARS-CoV-2 pandemic. 
The robust differential expression (DE) analysis pipeline RoDE (Chapter 6.1) contains a large set of methods for (differential) data analysis, reporting and visualization of RNA-seq data. Topics of accessibility of bioinformatics software are discussed using practical applications (Chapter 3). The developed miRNA-gene interaction database gives valuable insights into atherosclerosis-relevant processes and serves as a regulatory network for the prediction of active miRNA regulators in RoDE (Chapter 6.1). The cPred predictions, RoDE results, scRNA-seq and IMS data are unified as input for the 3D-index Aorta3D (Chapter 6.2), which makes atherosclerosis-related datasets browsable. Finally, the scRNA-seq analysis with subsequent cPred cell type prediction, and the robust analysis of bulk RNA-seq datasets, led to novel insights into COVID-19. Taken together, the methods discussed improve integrative analysis for complex human disease contexts at essential points.

    DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

    Data movement between the CPU and main memory is a first-order obstacle to improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data Processing (NDP), where some computation is moved close to memory. Our goal is to methodically identify potential sources of data movement over a broad set of applications and to comprehensively compare traditional compute-centric data movement mitigation techniques to more memory-centric techniques, thereby developing a rigorous understanding of the best techniques to mitigate each source of data movement. With this goal in mind, we perform the first large-scale characterization of a wide variety of applications, across a wide range of application domains, to identify fundamental program properties that lead to data movement to/from main memory. We develop the first systematic methodology to classify applications based on the sources contributing to data movement bottlenecks. From our large-scale characterization of 77K functions across 345 applications, we select 144 functions to form the first open-source benchmark suite (DAMOV) for main memory data movement studies. We select a diverse range of functions that (1) represent different types of data movement bottlenecks, and (2) come from a wide range of application domains. Using NDP as a case study, we identify new insights about the different data movement bottlenecks and use these insights to determine the most suitable data movement mitigation mechanism for a particular application. We open-source DAMOV and the complete source code for our new characterization methodology at https://github.com/CMU-SAFARI/DAMOV.

    Enabling Hyperscale Web Services

    Modern web services such as social media, online messaging, web search, video streaming, and online banking often support billions of users, requiring data centers that scale to hundreds of thousands of servers, i.e., hyperscale. In fact, the world continues to expect hyperscale computing to drive more futuristic applications such as virtual reality, self-driving cars, conversational AI, and the Internet of Things. This dissertation presents technologies that will enable tomorrow’s web services to meet the world’s expectations. The key challenge in enabling hyperscale web services arises from two important trends. First, over the past few years, there has been a radical shift in hyperscale computing due to an unprecedented growth in data, users, and web service software functionality. Second, modern hardware can no longer support this growth in hyperscale trends due to a decline in hardware performance scaling. To enable this new hyperscale era, hardware architects must become more aware of hyperscale software needs and software researchers can no longer expect unlimited hardware performance scaling. In short, systems researchers can no longer follow the traditional approach of building each layer of the systems stack separately. Instead, they must rethink the synergy between the software and hardware worlds from the ground up. This dissertation establishes such a synergy to enable futuristic hyperscale web services. This dissertation bridges the software and hardware worlds, demonstrating the importance of that bridge in realizing efficient hyperscale web services via solutions that span the systems stack. The specific goal is to design software that is aware of new hardware constraints and architect hardware that efficiently supports new hyperscale software requirements. 
This dissertation spans two broad thrusts: (1) a software and (2) a hardware thrust to analyze the complex hyperscale design space and use insights from these analyses to design efficient cross-stack solutions for hyperscale computation. In the software thrust, this dissertation contributes uSuite, the first open-source benchmark suite of web services built with a new hyperscale software paradigm, that is used in academia and industry to study hyperscale behaviors. Next, this dissertation uses uSuite to study software threading implications in light of today’s hardware reality, identifying new insights in the age-old research area of software threading. Driven by these insights, this dissertation demonstrates how threading models must be redesigned at hyperscale by presenting an automated approach and tool, uTune, that makes intelligent run-time threading decisions. In the hardware thrust, this dissertation architects both commodity and custom hardware to efficiently support hyperscale software requirements. First, this dissertation characterizes commodity hardware’s shortcomings, revealing insights that influenced commercial CPU designs. Based on these insights, this dissertation presents an approach and tool, SoftSKU, that enables cheap commodity hardware to efficiently support new hyperscale software paradigms, improving the efficiency of real-world web services that serve billions of users, saving millions of dollars, and meaningfully reducing the global carbon footprint. This dissertation also presents a hardware-software co-design, uNotify, that redesigns commodity hardware with minimal modifications by using existing hardware mechanisms more intelligently to overcome new hyperscale overheads. Next, this dissertation characterizes how custom hardware must be designed at hyperscale, resulting in industry-academia benchmarking efforts, commercial hardware changes, and improved software development. 
Based on this characterization’s insights, this dissertation presents Accelerometer, an analytical model that estimates gains from hardware customization. Multiple hyperscale enterprises and hardware vendors use Accelerometer to make well-informed hardware decisions.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169802/1/akshitha_1.pd

    Evolutionary genomics : statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

    Applied Bioinformatics for ncRNA Characterization - Case Studies Combining Next Generation Sequencing & Genomics

    Non-coding RNAs (ncRNAs) are a diverse class of functional molecules inherent in virtually all forms of cellular life. Next to the canonical protein-encoding mRNAs, the role of these abundant transcripts has been overlooked for decades. Defined by their highly conserved structure, ncRNAs are resistant to degradation and perform various regulatory functions. Despite the poor sequence conservation, comparative genomics can be employed to identify homologous ncRNAs based on their structure in related species. Through the availability of next generation sequencing techniques, a rich corpus of datasets is available which grants a detailed look into cellular processes. The combination of genomic and transcriptomic data allows for a detailed understanding of molecular mechanisms as well as characterization of individual gene functions and their evolution. However, analytical processing of modern high-throughput data is only made viable through optimized bioinformatic algorithms and reproducible automation pipelines. This thesis consists of four major parts highlighting the diverse roles of ncRNAs concerning the transcription process viewed from different vantage points. The first part concerns an unusually long untranslated region in Rhodobacter which harbors a ncRNA that regulates the expression of the downstream division cell wall cluster. Second, the degradation of 6S RNA in Bacillus subtilis is experimentally reconstructed to shed light on this final part of the RNA life cycle. This ncRNA is ubiquitous among bacteria and known to be a global transcription regulator itself. Next, the focus moves to the eukaryotic system and RNase P, an ancient ribozyme that is involved in tRNA maturation. Due to differences in composition with an optional RNA and multiple protein subunits, its phylogenetic distribution and deviant characteristics throughout the eukaryotic lineage are examined in order to trace its evolution. 
Finally, a diverse subgroup of non-translated RNAs are circRNAs, which recently received increased attention due to their abundance in neural tissue. Resulting from post-transcriptional back-splicing events, circRNAs compete with their host gene for expression. In a zoological study of social insects, circRNAs were identified in honeybees for the first time. The goal was to find task-related differences in circRNA expression between nurse bees and foragers and thus pinpoint potential functions of these elusive ncRNAs. The combination of genomic methods and transcriptomic data makes in-depth functional analysis of ncRNAs possible and enables us to understand the molecular mechanisms on multiple levels. Through structural predictions, a riboswitch-like transcriptional control of UpsM was revealed that is unique to Rhodobacteraceae. Transcriptomic analysis showed that 6S RNA is primarily processed by RNase J1 for maturation and degraded at internal loops by RNase Y. Evolutionary comparison of organellar RNase P revealed that the RNA subunit is potentially less conserved than thought, while organellar protein-only variants are widespread, potentially due to horizontal gene transfer. In the case of circRNA, an entire group of ncRNAs was characterized in the social model organism of honeybees, and evidence was found of at least one gene where circRNA levels are significantly reduced during the nurse-to-forager transition. Moreover, an unexpected link between elevated DNA methylation and RNA circularization was discovered. The bioinformatic findings in all of these cases provide a foundation for further experimental research and illustrate how scientific endeavors cannot be automated completely but require rigorous investigation with customized tools.

    Adaptive Prefetching and Cache Partitioning for Multicore Processors

    Accessing main memory is a major performance bottleneck in current processors, since the cores compete for the limited off-chip bandwidth, further aggravating the so-called memory wall. Several techniques have been applied to narrow the core-memory performance gap, the most prominent being prefetching and hierarchical caching. Hierarchical caches leverage the temporal and spatial locality of the accessed data, mitigating the large main memory access latencies. To limit the number of accesses to the off-chip DRAM memory, current processors feature large Last Level Caches (LLCs). These caches are shared among all the cores to improve the utilization of the cache space and reduce cost. This approach significantly improves the performance of most applications compared to using smaller private caches. Cache sharing, however, has an important shortcoming: interference between applications. Prefetching, on the other hand, brings data blocks into the caches before they are requested, hiding the main memory latency. Unfortunately, since prefetching is a speculative technique, inaccurate prefetches may pollute the cache with blocks that will not be used. In addition, prefetches interfere with regular memory requests, both those of the application running on the core that issued the prefetches and those of the other cores. This thesis focuses on reducing inter-application interference, both in the shared cache and in the access to main memory. 
To reduce inter-application interference in the access to main memory, the proposed approach regulates the aggressiveness of each core's prefetcher and selectively activates or deactivates some of them, depending on their individual performance and the main memory bandwidth requirements of the other cores. With respect to interference in shared caches, this thesis proposes two LLC partitioning techniques that give more cache space to the applications whose progress is slowed by inter-application interference. The first cache partitioning proposal requires dedicated hardware not available in commercial processors, so it has been evaluated using a simulation framework. The second cache partitioning proposal presents a family of partitioning policies that overcome the limitations in the number of partitions and the number of available ways by grouping applications into clusters and overlapping cache partitions, so that multiple applications share the same ways. Since it has been implemented using the cache partitioning features of modern Intel processors, it has been evaluated on a real machine. Experimental results show that the proposed selective prefetching mechanism reduces the number of main memory requests by 20%, which translates into improvements in fairness, performance, and energy consumption. Regarding the proposed partitioning schemes, compared to a system with no partitioning, both reduce unfairness by more than 25% on average, regardless of the number of applications running on the multicore, and this reduction in unfairness does not negatively affect performance.
Selfa Oliver, V. (2018). Adaptive Prefetching and Cache Partitioning for Multicore Processors [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/112423
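
The idea of overlapping cache partitions described in this abstract can be sketched as follows. This is a hypothetical illustration, not one of the thesis's policies: it assumes each cluster of applications receives a contiguous range of cache ways expressed as a bitmask, and that a cluster which no longer fits is shifted back so that it shares ways with its predecessor. The names `contiguous_mask` and `assign_way_masks`, and the 8-way cache in the usage note, are assumptions made for the example.

```python
from typing import List


def contiguous_mask(start: int, length: int) -> int:
    """Return a bitmask with `length` consecutive bits set, starting at bit `start`."""
    return ((1 << length) - 1) << start


def assign_way_masks(cluster_ways: List[int], total_ways: int) -> List[int]:
    """Assign each cluster a contiguous way mask (hypothetical policy).

    Clusters are placed left to right. When a cluster would run past the
    last way, its start is clamped so the mask ends at the last way, which
    makes it overlap (share ways with) earlier clusters.
    """
    masks = []
    cursor = 0  # next free way if no overlap were needed
    for ways in cluster_ways:
        if not 1 <= ways <= total_ways:
            raise ValueError("each cluster needs between 1 and total_ways ways")
        # Clamp the start so the mask never extends beyond the cache.
        start = min(cursor, total_ways - ways)
        masks.append(contiguous_mask(start, ways))
        cursor = start + ways
    return masks
```

For example, `assign_way_masks([4, 3, 3], total_ways=8)` requests ten ways from an eight-way cache, so the third cluster is shifted back and shares two ways with the second.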