Parallel swarm intelligence strategies for large-scale clustering based on MapReduce with application to epigenetics of aging
Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, it becomes challenging because the huge volume of recently collected data makes conventional clustering algorithms inappropriate. Swarm intelligence algorithms have shown promising results when applied to clustering data of moderate size, owing to their decentralized and self-organized behavior. However, these algorithms exhibit limited capabilities when large data sets are involved. In this paper, we develop a decentralized, distributed big data clustering solution using three swarm intelligence algorithms within the MapReduce framework. The developed framework allows cooperation between the three algorithms, namely particle swarm optimization, ant colony optimization and artificial bee colony, to achieve highly scalable data partitioning through a migration strategy. This strategy takes advantage of the combined exploration and exploitation capabilities of these algorithms to foster diversity. The framework is tested on the Amazon Elastic MapReduce (EMR) service, deploying up to 192 computer nodes and 30 gigabytes of data. Parallel metrics such as speed-up, size-up and scale-up are used to measure the elasticity and scalability of the framework. Our results are compared with those of comparable big data clustering approaches and show a significant improvement in terms of time and convergence to good-quality solutions. The developed model has been applied to clustering epigenetics data according to methylation features in CpG islands, gene bodies, and gene promoters in order to study the impact of epigenetics on aging. Experimental results reveal that DNA methylation changes slightly and not aberrantly with aging, corroborating previous studies.
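The abstract names speed-up, size-up and scale-up without defining them. The sketch below uses the definitions commonly found in the parallel data mining literature, which may differ in detail from the ones the authors applied. All function names and timings are hypothetical and serve only as illustration.

```python
def speed_up(time_one_node: float, time_m_nodes: float) -> float:
    """Fixed data size: how much faster m nodes are than one node."""
    return time_one_node / time_m_nodes


def size_up(time_base_data: float, time_m_times_data: float) -> float:
    """Fixed node count: how much longer a data set m times larger takes."""
    return time_m_times_data / time_base_data


def scale_up(time_base: float, time_scaled: float) -> float:
    """Nodes and data both grow by m: a value near 1.0 means the
    run time stayed roughly constant."""
    return time_base / time_scaled


# Hypothetical wall-clock timings in seconds (not taken from the paper).
print(speed_up(120.0, 30.0))   # one node vs. several nodes, same data
print(size_up(30.0, 90.0))     # same nodes, larger data
print(scale_up(120.0, 150.0))  # nodes and data scaled together
```

Under these definitions, linear speed-up on m nodes corresponds to a value of m, and ideal scale-up corresponds to 1.0.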
Using MapReduce Streaming for Distributed Life Simulation on the Cloud
Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
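To make the idea of expressing a life simulation as map and reduce steps concrete, here is a minimal in-process sketch of one generation of discrete Conway's life. It is a generic cell-wise formulation, not the authors' streaming algorithms, and it does not implement strip partitioning. The names `map_cell`, `reduce_cell` and `life_step` are hypothetical, and the shuffle phase is simulated with a dictionary instead of a real MapReduce runtime.

```python
from collections import defaultdict
from typing import Iterable, Iterator

Cell = tuple[int, int]


def map_cell(cell: Cell) -> Iterator[tuple[Cell, str]]:
    """Map: a live cell reports itself and notifies its 8 neighbors."""
    x, y = cell
    yield cell, "alive"
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx or dy:
                yield (x + dx, y + dy), "neighbor"


def reduce_cell(values: Iterable[str]) -> bool:
    """Reduce: apply the standard rules to all messages for one cell."""
    alive = False
    neighbors = 0
    for value in values:
        if value == "alive":
            alive = True
        else:
            neighbors += 1
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return neighbors == 3 or (alive and neighbors == 2)


def life_step(live: set[Cell]) -> set[Cell]:
    """One generation: map, group by key (simulated shuffle), reduce."""
    groups: dict[Cell, list[str]] = defaultdict(list)
    for cell in live:
        for key, value in map_cell(cell):
            groups[key].append(value)
    return {cell for cell, values in groups.items() if reduce_cell(values)}


# A horizontal "blinker" flips to vertical and back again.
blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(life_step(blinker)))
```

Only live cells emit messages, so the work per generation scales with the number of live cells and not with the size of the lattice.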
Bioinformatics
This book is divided into research areas relevant to bioinformatics, such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics, and intelligent data analysis. Each section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here.
Computational Methods for the Analysis of Genomic Data and Biological Processes
In recent decades, new technologies have made remarkable progress in helping to understand biological systems. Rapid advances in genomic profiling techniques such as microarrays or high-performance sequencing have brought new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques allow large amounts of data to be produced, whose analysis and cross-integration could provide a complete view of organisms. As a result, it is necessary to develop new techniques and algorithms that analyze these data reliably and efficiently. This Special Issue collected the latest advances in the field of computational methods for the analysis of gene expression data and, in particular, the modeling of biological processes. Here we present eleven works selected for publication in this Special Issue due to their interest, quality, and originality.
Going viral: an integrated view on virological data analysis from basic research to clinical applications
Viruses are of considerable interest for several fields of life science research. The genomic richness of these entities, their environmental abundance, as well as their high adaptability and, potentially, pathogenicity make treatment of viral diseases challenging. This thesis proposes three novel contributions to antiviral research that each concern analysis procedures of high-throughput experimental genomics data. First, a sensitive approach for detecting viral genomes and transcripts in sequencing data of human cancers is presented that improves upon prior approaches by allowing detection of viral nucleotide sequences that consist of human-viral homologs or are diverged from known reference sequences. Second, a computational method for inferring physical protein contacts from experimental protein complex purification assays is put forward that allows statistically meaningful integration of multiple data sets and is able to infer protein contacts of transiently binding protein classes such as kinases and molecular chaperones. Third, an investigation of minute changes in viral genomic populations upon treatment of patients with the mutagen ribavirin is presented that, for the first time, characterizes the mutagenic effect of this drug on the hepatitis C virus based on deep sequencing data.
Mechanisms and Novel Therapeutic Approaches for Gynecologic Cancer
This book, entitled “Mechanisms and Novel Therapeutic Approaches for Gynecologic Cancer”, was edited as a Special Issue of Biomedicines, focusing on basic research such as genomics, epigenomics, and proteomics, as well as clinical research in the field of gynecologic oncology. The number of patients with gynecological cancer has been increasing worldwide due to its high lethality and lack of early detection tools and effective therapeutic interventions. In this regard, basic research on its pathophysiology and novel molecular targeting intervention is required to improve the prognosis of gynecologic cancer. This book contains 13 papers, including 8 original research papers and 5 reviews focusing on the basic research of gynecologic oncology. The reader can learn about state-of-the-art research and obtain extensive knowledge of the current advances in the field of gynecologic oncology. It is my hope that this book contributes towards the progress of gynecologic oncology.