Algorithms for Analysis of Heterogeneous Cancer and Viral Populations Using High-Throughput Sequencing Data
Next-generation sequencing (NGS) technologies have made giant leaps in recent years. Short-read samples now reach millions of reads, and the number of samples has grown enormously in the wake of the COVID-19 pandemic. These data can expose essential aspects of disease transmission and development and reveal the key to treatment. At the same time, single-cell sequencing has progressed from dozens to tens of thousands of cells per sample. These technological advances bring new challenges for computational biology and require the development of scalable, robust methods to deal with a wide range of problems, from epidemiology to cancer studies.
The first part of this work focuses on processing viral NGS data. It proposes algorithms that facilitate the initial data analysis steps by filtering genetically related sequencing data, and a tool for investigating intra-host virus diversity, which is vital for biomedical research and epidemiology.
The second part addresses single-cell data in cancer studies. It develops evolutionary cancer models involving new quantitative parameters of cancer subclones to better understand the underlying processes of cancer development.
AIRO 2016. 46th Annual Conference of the Italian Operational Research Society. Emerging Advances in Logistics Systems Trieste, September 6-9, 2016 - Abstracts Book
The AIRO 2016 book of abstracts collects the contributions from the conference participants.
The AIRO 2016 Conference is a special occasion for the Italian Operations Research community, as the AIRO annual conference reaches its 46th edition in 2016. To reflect this special occasion, the Programme and Organizing Committee, chaired by Walter Ukovich, prepared a high-quality Scientific Programme including the first initiative of AIRO Young, the new AIRO poster section that aims to promote the work of students, PhD students, and Postdocs with an interest in Operations Research.
The Scientific Programme of the Conference offers a broad spectrum of contributions covering the variety of OR topics and research areas with an emphasis on “Emerging Advances in Logistics Systems”.
The event aims at stimulating integration of existing methods and systems, fostering communication amongst different research groups, and laying the foundations for OR integrated research projects in the next decade.
Distinct thematic sections follow the AIRO 2016 days, starting with an initial presentation of the objectives and features of the Conference. In addition, three internationally known invited speakers, Gianni Di Pillo, Frédéric Semet and Stefan Nickel, will present Plenary Lectures, gathering AIRO 2016 participants together to offer key presentations on the latest advances and developments in OR research.
Visualization Tools for Comparative Genomics applied to Convergent Evolution in Ash Trees
Assembly and analysis of whole genomes is now a routine part of genetic
research, but effective tools for the visualization of whole genomes and their
alignments are few. Here we present two approaches to allow such visualizations
to be done in an efficient and user-friendly manner. These allow researchers to
spot problems and patterns in their data and present them effectively.
First, FluentDNA is developed to tackle single full genome visualization and
assembly tasks by representing nucleotides as colored pixels in a zooming
interface. This enables users to identify features without relying on algorithmic
annotation. FluentDNA also supports visualizing pairwise alignments of well-assembled whole genomes from chromosome to nucleotide resolution.
Second, Pantograph is developed to tackle the problem of visualizing variation
among large numbers of whole genome sequences. This uses a graph genome
approach, which addresses many of the technical challenges of whole genome
multiple sequence alignments by representing aligned sequences as nodes which
can be shared by many individuals. Pantograph is capable of scaling to thousands
of individuals and is applied to SARS and A. thaliana pangenomes.
Alongside the development of these new genomics tools, comparative genomic
research was undertaken on worldwide species of ash trees. I assembled 13 ash
genomes and used FluentDNA to quality check the results and discovered
contaminants and a mitochondrial integration. I annotated protein coding genes
in 28 ash assemblies and aligned their gene families. Using phylogenetic analysis,
I identified gene duplications that likely occurred in an ancient whole genome
duplication shared by all ash species. I examined the fate of these duplicated
genes, showing that losses are concentrated in a subset of gene families more
often than predicted by a null model simulation. I conclude that convergent
evolution has occurred in the loss and retention of duplicated genes in different
ash species.
BBSRC BB/S004661/
The effect of genetic variation at the immunoglobulin heavy chain variable region gene loci on biases in the generation of the human primary antibody repertoire
The human primary antibody repertoire must be incredibly diverse in order to combat a constantly evolving array of pathogens. Random events contribute to the repertoire diversity that is created within an individual on a daily basis. Similarly, much of the genetic variation existing between individuals and populations at the immunoglobulin loci has been generated via stochastic processes. However, deviations from randomness have been detected both during the processes that generate the primary repertoire within an individual and during the evolution of the immunoglobulin genes. This complex set of events and genes has been notoriously difficult to investigate. However, next-generation sequencing technologies have recently allowed the creation of datasets containing thousands of rearranged sequences. This has allowed great insight into the events taking place during the formation of a B cell in the bone marrow and into genetic diversity within and between human populations. The information contained in such large datasets allows many questions to be asked. Preferential IGHD-IGHJ pairing has been reported, but the mechanism involved is unclear. Using very large datasets and the fact that VDJ rearrangement is an intrachromosomal event, complex patterns were detected. These patterns changed in a predictable way in individuals carrying IGHD deletion polymorphisms, suggesting a strong positional influence and the involvement of other factors too. Results also suggest that the recombinase associates with an IGHJ gene before associating with an IGHD partner. Allele frequencies of structural IGH polymorphisms vary between different ethnic human populations. Differentiation of sequence variants of the IGH locus was investigated in individuals from six different ethnic backgrounds, including a rarely studied Amerindian population. Significant differentiation was observed between several pairs of populations. 
This work has important implications for a more personalised approach to vaccines and therapeutics. Analysis of these diverse populations suggested the existence of many novel unreported allelic variants. These alleles were then included in the repertoire used previously to study the evolution of the subregions of IGHV genes. The analysis produced remarkably similar results, suggesting that the implications for selection in these regions will not change regardless of the number of extra alleles that remain to be reported.
High-Performance Modelling and Simulation for Big Data Applications
This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. When their level of abstraction rises to give a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.
Introducing distributed dynamic data-intensive (D3) science: Understanding applications and infrastructure
A common feature across many science and engineering applications is the
amount and diversity of data and computation that must be integrated to yield
insights. Data sets are growing larger and becoming distributed; and their
location, availability and properties are often time-dependent. Collectively,
these characteristics give rise to dynamic distributed data-intensive
applications. While "static" data applications have received significant
attention, the characteristics, requirements, and software systems for the
analysis of large volumes of dynamic, distributed data, and data-intensive
applications have received relatively less attention. This paper surveys
several representative dynamic distributed data-intensive application
scenarios, provides a common conceptual framework to understand them, and
examines the infrastructure used in support of applications.
Comment: 38 pages, 2 figures
Molecular analysis of the breeding biology of the Asian arowana (Scleropages formosus)
Ph.D. (Doctor of Philosophy)
Bioinformatics and Next Generation Sequencing: Applications of Arthropod Genomes
Over the past decade, Next Generation Sequencing (NGS) technology has been broadly applied in many areas such as genomics, medical diagnosis, biotechnology, virology, biological systematics, forensic biology, and anthropology. Taken together, these applications have offered brilliant insights into the life sciences. Most of the work presented in this thesis describes NGS applications in genome assembly, genome annotation, and comparative genomics, using arthropods as case studies: (1) by sequencing and analyzing the genomes of three Tetranychus spider mites with three completely different feeding behaviors, we uncovered genomic signature variations indicative of pest adaptations; (2) we sequenced, assembled and annotated five Brevipalpus flat mite genomes and their corresponding endosymbiont Cardinium genomes, and comparative genomics reveals herbivorous pest adaptations and parthenogenesis; (3) the complete genomic analysis of the parasitoid wasp Copidosoma floridanum indicates the mechanism of polyembryony in this primary parasite of moths. Using bioinformatics and genomics approaches, my study provides a genomic basis and establishes hypotheses for future biological research on pests and arthropods. These NGS applications to arthropod genomes will offer new insights into arthropod evolution and plant-herbivore interactions, open unique opportunities to develop novel plant protection strategies, and provide arthropod genomic resources.