Computational methods for single cell RNA and genome assembly resolution using genetic variation
Genetic variation and natural selection have driven evolutionary history on this planet and are responsible for creating us and all other life as we know it. Over the past several decades, the genomic revolution has allowed us to assess population variation across humans and other species, and to use it to link genotypes with phenotypes and infer evolutionary histories. In this thesis, I explore computational methods that use genetic variation to demultiplex and disambiguate complex data.
In single-cell RNA-seq, batch effects, doublets, and ambient RNA are each sources of noise that impede our ability to infer the functional states of cells and to compare them between experiments. One popular new experimental design that promises to address all three, while also reducing experimental costs, is mixing cells from multiple individuals into a single experiment. In chapter 2, I present a method for clustering cells by genotype, calling doublets, and using the cross-genotype signal in singletons to estimate and remove ambient RNA. I compare this method with existing methods, including one that requires a priori information about the genotypes and two that do not. I find that my method outperforms each of these across a wide range of data parameters and sample types.
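The idea of using cross-genotype signal in singletons can be illustrated with a toy estimator. This is a minimal sketch, not the chapter 2 method. It assumes that each singleton has already been assigned to a genotype, that the per-locus alt-allele fractions of that genotype (`cell_genotype_af`) and of the pooled sample (`pool_af`) are known, and that a cell's expected alt-allele fraction is the simple mixture `(1 - rho) * g + rho * p`. The function name and this mixture model are hypothetical.

```python
from typing import Sequence


def estimate_ambient_fraction(
    alt_counts: Sequence[int],
    total_counts: Sequence[int],
    cell_genotype_af: Sequence[float],
    pool_af: Sequence[float],
) -> float:
    """Estimate the ambient RNA fraction rho for one singleton cell (toy model).

    Assumed model: at locus l the expected alt-allele fraction of the cell is
    (1 - rho) * g_l + rho * p_l, where g_l is the alt-allele fraction of the
    cell's assigned genotype (0, 0.5 or 1) and p_l is the alt-allele fraction
    of the pooled ambient RNA. rho is fitted by read-weighted least squares.
    """
    num = 0.0
    den = 0.0
    for alt, n, g, p in zip(alt_counts, total_counts, cell_genotype_af, pool_af):
        if n == 0 or p == g:
            continue  # no reads, or ambient RNA is indistinguishable from the cell here
        f = alt / n  # observed alt-allele fraction
        num += n * (f - g) * (p - g)
        den += n * (p - g) ** 2
    if den == 0.0:
        return 0.0  # no informative loci
    return min(1.0, max(0.0, num / den))  # clamp to a valid fraction


# A cell homozygous ref at one locus and homozygous alt at another,
# where the pool is 50% alt at both loci.
rho = estimate_ambient_fraction([5, 95], [100, 100], [0.0, 1.0], [0.5, 0.5])
print(round(rho, 3))  # prints 0.1
```

Only loci where the cell's genotype differs from the pool carry information, which is why a design that mixes several individuals makes this kind of estimate possible at all.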
In genome assembly, the recent increase in throughput and drop in cost of long-read sequencing have revolutionized our ability to create reference-quality genomes and have revitalized the assembly community. Massive efforts are now under way in the Darwin Tree of Life project and the Earth BioGenome Project to create reference genomes for all multicellular eukaryotic life. This will create a scientific resource for the next generation of biological science, preserve data that could otherwise be lost in this time of mass extinction, and allow a much broader understanding of evolution and the evolutionary history of life on Earth. While much progress has been made in data quality and assembly algorithms, some problems remain. Until recently, the DNA input requirements of long-read sequencing technologies made it impossible to sequence single individuals of many of these species with long reads. High heterozygosity also makes assembly more difficult, because when homology is inexact it is inherently ambiguous whether two sequences are heterozygous alleles or paralogs. One solution to the DNA input requirements would be to pool individuals, but this only increases the heterozygosity of the sample and reduces assembly quality. In chapter 3, we present the first high-quality assembly of a single mosquito, made possible by new library preparation methods with reduced DNA requirements. Using a single individual reduces the number of haplotypes to two, improving assembly quality. In chapter 4, we further address the problems that heterozygosity causes in assembly. I present a suite of tools that use the phasing consistency of multiple heterozygous sequences as a signal for physical linkage, thus turning genetic variation into an advantage rather than a challenge to overcome. These tools create phased, linked assemblies and perform phasing-aware scaffolding. Further, I provide a tool for phasing-aware scaffolding of existing assemblies.
This includes a novel haplotype phasing algorithm with several useful properties: it is robust to non-heterozygous variants in the input, it can detect and correct those genotypes, and it naturally extends to polyploid genomes.
Wellcome Trust
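Using phasing consistency as a linkage signal can be illustrated with a toy greedy phaser. For each pair of adjacent heterozygous variants, the sketch counts how many molecules (reads or linked-read barcodes) show the alleles in cis versus in trans, and it extends a phase block only when one orientation clearly dominates. This is a simplified stand-in, not the thesis algorithm. The `Molecule` representation and the `min_support` and `min_consistency` thresholds are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: one molecule (read or linked-read barcode)
# maps variant index -> observed allele (0 = ref, 1 = alt).
Molecule = Dict[int, int]


def cis_trans_counts(molecules: Sequence[Molecule], i: int, j: int) -> Tuple[int, int]:
    """Count molecules showing variants i and j on the same (cis) or opposite (trans) alleles."""
    cis = trans = 0
    for m in molecules:
        if i in m and j in m:
            if m[i] == m[j]:
                cis += 1
            else:
                trans += 1
    return cis, trans


def phase_blocks(
    molecules: Sequence[Molecule],
    variants: Sequence[int],
    min_support: int = 2,
    min_consistency: float = 0.8,
) -> List[List[Tuple[int, int]]]:
    """Greedily phase adjacent variants; each block lists (variant, allele on haplotype 1)."""
    blocks: List[List[Tuple[int, int]]] = [[(variants[0], 0)]]
    for prev, cur in zip(variants, variants[1:]):
        cis, trans = cis_trans_counts(molecules, prev, cur)
        total = cis + trans
        prev_allele = blocks[-1][-1][1]  # the last entry is always `prev`
        if total >= min_support and max(cis, trans) / total >= min_consistency:
            # Consistent linkage: keep the allele if cis, flip it if trans.
            allele = prev_allele if cis > trans else 1 - prev_allele
            blocks[-1].append((cur, allele))
        else:
            # Too little or contradictory evidence: start a new phase block.
            blocks.append([(cur, 0)])
    return blocks


molecules = [{0: 0, 1: 0, 2: 1}, {0: 1, 1: 1, 2: 0}, {0: 0, 1: 0}, {1: 1, 2: 0}]
print(phase_blocks(molecules, [0, 1, 2]))  # [[(0, 0), (1, 0), (2, 1)]]
```

In this toy, a site mis-called as heterozygous (actually homozygous) shows roughly equal cis and trans counts with its true heterozygous neighbours. The consistency threshold therefore isolates that site in its own block rather than propagating a phasing error.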
Office space allocation by using mathematical programming and meta-heuristics
Office Space Allocation (OSA) is the task of using an organisation's spatial resources efficiently. A common goal in a typical OSA problem is to minimise wasted space by limiting both the overuse and the underuse of the facilities. The problem also involves a myriad of hard and soft constraints based on the preferences of the respective organisation. This thesis investigates the OSA variant usually encountered in academic institutions, an area in which previous research is rather sparse. It provides a definition, an extension, and a literature review for the problem, as well as a new parametrised data instance generator.
In this thesis, two main algorithmic approaches for tackling OSA are proposed. The first is integer linear programming: based on the definition of several constraints and some additional variables, two different mathematical models are proposed. These two models are not strict alternatives to each other. One performs better on the instance types to which it applies but lacks generality; the other performs worse but is easier to apply to different OSA problems. The second approach is based on meta-heuristics, developed in a three-step process. In the first step, general local search techniques (descent methods, threshold acceptance, simulated annealing, and great deluge) traverse the neighbourhood via random relocation and swap moves. The second step greedily explores large sections of the whole neighbourhood within an evolutionary local search framework, using very fast procedures for cost calculation, cost update, and best-move search. The final step refines and hybridises the best-performing (in terms of solution quality) mathematical programming and meta-heuristic techniques developed in the earlier steps.
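The first-step local search can be sketched on a toy instance. The cost function below, which charges wasted space in each room plus a doubled penalty for overuse, is an assumption standing in for the thesis objective (which also includes hard and soft constraints). `great_deluge` accepts a move if it is no worse than the current solution or falls below a linearly decreasing "water level", and it picks relocation and swap moves at random with equal probability. All names and weights are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def space_misuse(assign: Sequence[int], sizes: Sequence[float], caps: Sequence[float],
                 over_w: float = 2.0, under_w: float = 1.0) -> float:
    """Hypothetical OSA cost: wasted (unused) space plus weighted overuse, summed over rooms."""
    used = [0.0] * len(caps)
    for entity, room in enumerate(assign):
        used[room] += sizes[entity]
    return sum(over_w * max(0.0, u - c) + under_w * max(0.0, c - u)
               for u, c in zip(used, caps))


def great_deluge(sizes: Sequence[float], caps: Sequence[float],
                 iters: int = 5000, seed: int = 0) -> Tuple[List[int], float]:
    """Great deluge local search over room assignments with relocation and swap moves."""
    rng = random.Random(seed)
    n, m = len(sizes), len(caps)
    current = [rng.randrange(m) for _ in range(n)]  # random initial assignment
    cur_cost = space_misuse(current, sizes, caps)
    best, best_cost = current[:], cur_cost
    level = cur_cost            # the "water level" starts at the initial cost
    decay = cur_cost / iters    # and falls linearly to zero
    for _ in range(iters):
        cand = current[:]
        if n < 2 or rng.random() < 0.5:
            cand[rng.randrange(n)] = rng.randrange(m)   # relocation move
        else:
            a, b = rng.sample(range(n), 2)              # swap move
            cand[a], cand[b] = cand[b], cand[a]
        cost = space_misuse(cand, sizes, caps)          # full recomputation, for clarity
        if cost <= cur_cost or cost <= level:           # great deluge acceptance
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand[:], cost
        level -= decay
    return best, best_cost


sizes = [10.0, 10.0, 5.0, 5.0, 20.0]   # space needed by each entity
caps = [20.0, 30.0]                    # room capacities
assignment, cost = great_deluge(sizes, caps)
print(assignment, cost)
```

For clarity the sketch recomputes the full cost after every move. The second development step described above instead relies on fast incremental cost updates, which matter once instances have hundreds of entities and rooms.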
This thesis aims to be one of the pioneering works in the research area of OSA. Its major contributions are an analysis of the problem, a new parametrised data instance generator, mathematical programming models, and meta-heuristic approaches that extend the state of the art in this area.
An investigation of multi-objective hyper-heuristics for multi-objective optimisation
In this thesis, we investigate and develop a number of online learning selection choice function based hyper-heuristic methodologies for solving unconstrained multi-objective optimisation problems. For the first time, we introduce an online learning selection choice function based hyper-heuristic framework for multi-objective optimisation. Our multi-objective hyper-heuristic controls and combines the strengths of three well-known multi-objective evolutionary algorithms (NSGA-II, SPEA2, and MOGA), which serve as the low-level heuristics. A choice function selection heuristic acts as a high-level strategy that adaptively ranks the performance of these low-level heuristics according to feedback received during the search, deciding which one to call at each decision point. Four performance measurements are integrated into a ranking scheme, which acts as a feedback learning mechanism that provides knowledge of the problem domain to the high-level strategy. To the best of our knowledge, this thesis is also the first to investigate the influence of the move acceptance component of selection hyper-heuristics for multi-objective optimisation. Three multi-objective choice function based hyper-heuristics are studied, each combined with a different move acceptance strategy: All-Moves as a deterministic move acceptance criterion, and the Great Deluge Algorithm (GDA) and Late Acceptance (LA) as non-deterministic ones.
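A choice function selection step can be sketched as follows. Here each low-level heuristic's performance score is a Borda count over its rank positions on several metrics, and the choice function adds the time elapsed since the heuristic was last called so that neglected heuristics are eventually retried. The Borda-style ranking, the weight `alpha`, the two-term form, and the example metric values are assumptions for illustration; the thesis's own ranking scheme over four measurements may differ.

```python
from typing import Dict, Sequence


def borda_scores(metrics: Dict[str, Sequence[float]],
                 higher_is_better: Sequence[bool]) -> Dict[str, int]:
    """Score each heuristic by its rank positions across all performance metrics."""
    names = list(metrics)
    scores = {h: 0 for h in names}
    for k, better_high in enumerate(higher_is_better):
        ordered = sorted(names, key=lambda h: metrics[h][k], reverse=better_high)
        for position, h in enumerate(ordered):
            scores[h] += len(names) - 1 - position  # best gets n-1 points, worst gets 0
    return scores


def choose_heuristic(metrics: Dict[str, Sequence[float]],
                     higher_is_better: Sequence[bool],
                     elapsed: Dict[str, float],
                     alpha: float = 100.0) -> str:
    """Choice function: alpha * performance score + time since the heuristic last ran."""
    perf = borda_scores(metrics, higher_is_better)
    return max(metrics, key=lambda h: alpha * perf[h] + elapsed[h])


# Hypothetical feedback: two metrics per heuristic (first maximised, second minimised).
metrics = {"NSGAII": [0.9, 0.2], "SPEA2": [0.8, 0.3], "MOGA": [0.5, 0.6]}
print(choose_heuristic(metrics, [True, False], {"NSGAII": 0.0, "SPEA2": 0.0, "MOGA": 0.0}))
print(choose_heuristic(metrics, [True, False], {"NSGAII": 0.0, "SPEA2": 0.0, "MOGA": 500.0}))
```

The first call picks the best-ranked heuristic (`NSGAII`); in the second, a long idle time lets `MOGA` win despite its poor ranks. A large `alpha` favours exploitation of good heuristics, while the elapsed-time term keeps some diversification.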
GDA and LA require the change in the value of a single objective at each step, so a well-known hypervolume-based metric, referred to as the D metric, is proposed to make them applicable to multi-objective optimisation problems. The D metric compares two non-dominated sets with respect to the objective space. The performance of the proposed multi-objective selection choice function based hyper-heuristics is evaluated on the Walking Fish Group (WFG) test suite, a common benchmark for multi-objective optimisation. The proposed approaches are also applied to the vehicle crashworthiness design problem to test their effectiveness on a real-world multi-objective problem. The results on both the WFG suite and the crashworthiness problem demonstrate the capability and potential of multi-objective hyper-heuristic approaches for solving continuous multi-objective optimisation problems. The multi-objective choice function Great Deluge Hyper-Heuristic (HHMO_CF_GDA) turns out to be the best choice for solving these types of problems.
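For two objectives, the D metric can be computed from hypervolumes. This sketch assumes that both objectives are minimised, that a reference point `ref` bounds the dominated region, and the usual coverage-difference definition D(A, B) = H(A u B) - H(B), that is, the area dominated by A but not by B. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (f1, f2), both minimised


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded above by the reference point `ref`."""
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    prev_f2 = ref[1]
    for f1, f2 in pts:           # ascending f1, so f2 must fall for a point to count
        if f2 < prev_f2:         # skip points dominated by an earlier point
            area += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return area


def d_metric(a: Sequence[Point], b: Sequence[Point], ref: Point) -> float:
    """D(A, B) = H(A u B) - H(B): the area dominated by A but not by B."""
    return hypervolume_2d(list(a) + list(b), ref) - hypervolume_2d(b, ref)


A = [(1.0, 3.0), (2.0, 2.0)]
B = [(3.0, 1.0)]
ref = (4.0, 4.0)
print(d_metric(A, B, ref), d_metric(B, A, ref))  # 3.0 1.0
```

Because D(A, B) and D(B, A) are single numbers, they can stand in for the scalar change in quality that GDA or LA compares at each step; how exactly the thesis feeds them into the acceptance rule is not reproduced here.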
LIPIcs, Volume 277, GIScience 2023, Complete Volume
2013 GREAT Day Program
SUNY Geneseo’s Seventh Annual GREAT Day.
2019 EURēCA Abstract Book
Listing of student participant abstracts
Mathematical model of the interactions between the immune system and Mycobacterium tuberculosis
Tuberculosis (TB) remains a public health problem worldwide because of its increasing prevalence and less than satisfactory treatment outcomes. About 3 million people die of TB each year, and an estimated one third of the world's population carries a latent Mycobacterium tuberculosis (M.tb) infection. This is apparently related to an incomplete understanding of the immune response to M.tb infection. It is known that the immune components that help control M.tb are macrophages and T lymphocytes, with cytokines acting as mediators. However, how these two cell populations and the various cytokines interact to suppress the growth of M.tb remains unclear. A model of the host immune response is needed to better understand the dynamics of M.tb infection, and one promising way to study the interaction of the immune system with M.tb is through mathematical modeling. Mathematical models are good tools for understanding the dynamic behavior of a system, and such a model is expected to reveal which variables are most responsible for suppressing the growth of M.tb. This could point to more appropriate targets for treatment and prevention, such as vaccine development. This research aims to build a dynamic model of the interactions between macrophages (resting, activated, and infected), T lymphocytes (CD4+ and CD8+ T cells), and cytokines (IL-2, IL-4, IL-10, IL-12, IFN-γ, and TNF-α) during TB infection in the lung. Parameter values derived from the experimental literature are used to examine how each variable changes. Identifying the variables most responsible for defense against M.tb could provide a basis for developing a vaccine or targeted drug delivery and so, it is hoped, improve the management of patients with tuberculosis.
The model is built as a system of first-order nonlinear ordinary differential equations (ODEs) containing functions commonly used to describe biological systems, such as the Hill function, the Monod function, and Michaelis-Menten kinetics. To validate the model, the system is solved with the fourth-order Runge-Kutta method, implemented in MATLAB or Maple, and the behavior and size of each cell population are examined.
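The numerical approach can be illustrated with a small toy system. The three-variable model, its rate constants, and names such as `toy_tb` are hypothetical and far simpler than the thesis model, which tracks three macrophage states, two T-cell types, and six cytokines. The sketch only shows how Hill and Monod/Michaelis-Menten saturation terms enter first-order nonlinear ODEs, and how a classical fixed-step fourth-order Runge-Kutta integrator advances them.

```python
from typing import Callable, List

State = List[float]


def hill(x: float, k: float, n: float) -> float:
    """Hill function x^n / (k^n + x^n): sigmoidal switch, half-maximal at x = k."""
    return x ** n / (k ** n + x ** n)


def monod(x: float, k: float) -> float:
    """Monod / Michaelis-Menten saturation x / (k + x), half-maximal at x = k."""
    return x / (k + x)


def rk4(f: Callable[[float, State], State], y0: State,
        t0: float, t1: float, steps: int) -> List[State]:
    """Classical fixed-step fourth-order Runge-Kutta; returns the state at every step."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    trajectory = [list(y)]
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
        trajectory.append(list(y))
    return trajectory


def toy_tb(t: float, y: State) -> State:
    """Hypothetical toy: resting macrophages mr, activated macrophages ma, bacteria b."""
    mr, ma, b = y
    activation = 0.5 * mr * hill(b, 50.0, 2)          # bacteria activate resting cells
    d_mr = 10.0 - 0.1 * mr - activation               # recruitment, death, activation
    d_ma = activation - 0.2 * ma                      # activation, deactivation/death
    d_b = 0.3 * b * (1 - b / 1000.0) - 0.05 * ma * monod(b, 100.0)  # growth, killing
    return [d_mr, d_ma, d_b]


traj = rk4(toy_tb, [100.0, 0.0, 10.0], 0.0, 50.0, 500)
print([round(v, 2) for v in traj[-1]])
```

The same `rk4` function works for any right-hand side with this signature, so the integrator can be checked on a problem with a known solution (for example dy/dt = -y) before it is trusted on a larger immune-response model.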