177 research outputs found

    On coding labeled trees

    Trees are probably the most studied class of graphs in Computer Science. In this thesis we study bijective codes that represent labeled trees by means of strings of node labels. We contribute to the understanding of their algorithmic tractability, their properties, and their applications. The thesis is divided into two parts. In the first part we focus on two types of tree codes, namely Prüfer-like codes and Transformation codes. We study optimal encoding and decoding algorithms, both in a sequential and in a parallel setting. We propose a unified approach that works for all Prüfer-like codes and a more generic scheme based on the transformation of a tree into a functional digraph suitable for all bijective codes. Our results in this area close a variety of open problems. We also consider possible applications of tree encodings, discussing how to exploit these codes in Genetic Algorithms and in the generation of random trees. Moreover, we introduce a modified version of a known code that, in Genetic Algorithms, outperforms all the other known codes. In the second part of the thesis we focus on two possible generalizations of our work. We first take into account the classes of k-trees and k-arch graphs (both superclasses of trees): we study bijective codes for these classes of graphs and their algorithmic feasibility. Then, we shift our attention to Informative Labeling Schemes. In this context labels are no longer considered simple unique node identifiers; rather, they convey information useful to achieve efficient computations on the tree. We exploit this idea to design a concurrent data structure for the lowest common ancestor problem on dynamic trees. We also present an experimental comparison between our labeling scheme and the one proposed by Peleg for static trees.
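
To make the idea of a bijective tree code concrete, here is a minimal sketch of the classical Prüfer code, which maps a labeled tree on n nodes to a string of n - 2 node labels and back. It is an illustration only, not the thesis's algorithms: it assumes nodes are labeled 0..n-1, uses a binary heap (so it runs in O(n log n) rather than the optimal time the abstract refers to), and the function names `prufer_encode` and `prufer_decode` are hypothetical.

```python
import heapq


def prufer_encode(n, edges):
    """Return the Prüfer sequence of a labeled tree on nodes 0..n-1 (n >= 2)."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Min-heap of current leaves, so the smallest-labeled leaf is removed first.
    leaves = [v for v in range(n) if len(adj[v]) == 1]
    heapq.heapify(leaves)
    code = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        parent = next(iter(adj[leaf]))  # a leaf has exactly one neighbor
        code.append(parent)
        adj[parent].discard(leaf)
        adj[leaf].clear()
        if len(adj[parent]) == 1:  # the parent has just become a leaf
            heapq.heappush(leaves, parent)
    return code


def prufer_decode(code):
    """Rebuild the edge list of the tree encoded by a Prüfer sequence."""
    n = len(code) + 2
    # Each node appears in the code (its degree - 1) times.
    degree = [1] * n
    for v in code:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in code:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges
```

Because the mapping is a bijection, decoding the code of a tree returns the same set of edges, and every string of n - 2 labels decodes to a valid tree. This is the property that makes such codes attractive as chromosome representations in Genetic Algorithms.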

    Non-weighted aggregate evaluation function of multi-objective optimization for knock engine modeling

    In decision theory, the weighted sum model (WSM) is the best known Multi-Criteria Decision Analysis (MCDA) approach for evaluating a number of alternatives in terms of a number of decision criteria. Assigning weights is a difficult task, especially if the number of criteria is large and the criteria are very different in character. Some real-world problems involve conflicting criteria that influence one another. In the automotive field, the knocking phenomenon in internal combustion or spark ignition engines limits the efficiency of the engine. Power and fuel economy can be maximized by optimizing some factors that affect the knocking phenomenon, such as temperature, throttle position sensor, spark ignition timing, and revolutions per minute. Detecting knocks and controlling the above factors or criteria may allow the engine to run at the best power and fuel economy. The best decision must arise from selecting the optimum trade-off among the above criteria. The main objective of this study was to propose a new Non-Weighted Aggregate Evaluation Function (NWAEF) model for non-linear multi-objective functions which will simulate the engine knock behavior (non-linear dependent variable) in order to optimize non-linear decision factors (non-linear independent variables). This study focuses on the construction of an NWAEF model by using a curve fitting technique and partial derivatives. It also aims to optimize the nonlinear nature of the factors by using a Genetic Algorithm (GA) as well as to investigate the behavior of such a function. This study assumes that a partial and mutual influence between factors is required before such factors can be optimized. The Akaike Information Criterion (AIC) is used to balance the complexity of the model against the information loss, which can help assess the range of the tested models and choose the best ones. Some statistical tools are also used in this thesis to assess and identify the most powerful explanation in the model. 
The first derivative is used to simplify the form of the evaluation function. The NWAEF model was compared to the Random Weights Genetic Algorithm (RWGA) model by using five data sets taken from different internal combustion engines. There was a relatively large variation between the two models in the elapsed time needed to reach the best solution. Experimental results on internal combustion engines show that the new model helps decrease the elapsed time. This research provides a form of knock control within the subspace that can enhance the efficiency and performance of the engine, improve fuel economy, and reduce regulated emissions and pollution. Combined with new concepts in engine design, this model can be used for improving the control strategies and providing accurate information to the Engine Control Unit (ECU), which will control the knock faster and ensure the perfect condition of the engine.
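
As a rough illustration of how AIC trades goodness of fit against model complexity when comparing fitted curves, the sketch below uses the common least-squares form, AIC = n ln(RSS/n) + 2k, which holds up to an additive constant under the assumption of Gaussian errors. The abstract does not say which form of AIC the study uses, so this is an assumption; the function name and the residual values are hypothetical.

```python
import math


def aic_least_squares(residuals, num_params):
    """AIC (up to an additive constant) for a least-squares fit.

    residuals:  observed minus fitted values, one per data point
    num_params: number of fitted parameters k in the model
    Assumes Gaussian errors and a non-zero residual sum of squares.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * num_params


# Hypothetical residuals from two candidate curve fits to the same data.
simple_fit = [0.9, -1.1, 1.0, -1.0]    # 2 parameters, looser fit
complex_fit = [0.8, -1.0, 0.9, -0.9]   # 5 parameters, slightly tighter fit

# The model with the lower AIC is preferred.
best = min(
    [("simple", aic_least_squares(simple_fit, 2)),
     ("complex", aic_least_squares(complex_fit, 5))],
    key=lambda item: item[1],
)
```

In this made-up comparison the small improvement in fit does not justify three extra parameters, so the simpler model has the lower AIC.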

    An Empirical Study on the Influence of Genetic Operators for Molecular Docking Optimization

    Evolutionary approaches to molecular docking typically use a real-value encoding with standard genetic operators. Mutation is usually based on Gaussian and Cauchy distributions whereas for crossover no special considerations are made. The choice of operators is important for an efficient algorithm for this problem. We investigate their effect by performing a locality, heritability and heuristic bias analysis. Our investigation focuses on encoding properties and how the different variation operators affect them. It is important to understand the behavior and influence of these components in order to design new and more efficient evolutionary algorithms for the molecular docking problem. Results confirm that high locality is important and explain the behavior of different crossover and mutation operators. In addition, the heritability and heuristic bias study provides some insights into how the different crossover operators perform. Optimization runs on different instances of the problem support the analysis findings. The performance and behavior of the variation operators are consistent across several molecules.

    Genetics and Genomics of Forest Trees

    Forest tree genetics and genomics are advancing at an accelerated rate, thanks to recent developments in high-throughput, next-generation sequencing capabilities, and novel biostatistical tools. Population and landscape genetics and genomics have seen the rise of new approaches implemented in large-scale studies that employ genome-wide sampling. Such studies have started to discern the dynamics of neutral and adaptive variation in nature and the processes that underlie spatially explicit patterns of genetic and genomic variation. The continuous development of genetic maps in forest trees and the expansion of QTL and association mapping approaches contribute to the unravelling of the genotype-phenotype relationship and lead to marker-assisted and genome-wide selection. However, major challenges lie ahead. Recent literature suggests that species demography and genetic diversity have been affected both by climatic oscillations and anthropogenically induced stresses in a way that calls into question the possibility of future adaptation. Moreover, the pace of contemporary environmental change presents a great challenge to forest tree populations and their ability to adapt, taking into consideration their life history characteristics. Several questions emerge that include, but are not limited to, the interpretation of forest tree genome surveillance and their structural/functional properties, the adaptive and neutral processes that have shaped forest tree genomes, the analysis of phenotypic traits relevant to adaptation (especially adaptation under contemporary climate change), the link between epigenetics/epigenomics and phenotype/genotype, and the use of genetics/genomics as well as genetic monitoring to advance conservation priorities.

    Optimal distribution network reconfiguration using meta-heuristic algorithms

    Finding the optimal configuration of a power distribution system's topology is an NP-hard combinatorial optimization problem. It becomes more complex when the time-varying nature of loads in large-scale distribution systems is taken into account. In the second chapter of this dissertation, a systematic approach is proposed to tackle the computational burden of the procedure. To solve the optimization problem, a novel adaptive fuzzy-based parallel genetic algorithm (GA) is proposed that employs the concept of parallel computing in identifying the optimal configuration of the network. The integration of fuzzy logic into GA enhances the efficiency of the parallel GA by adaptively modifying the migration rates between different processors during the optimization process. A computationally efficient graph encoding method based on the Dandelion coding strategy is developed which automatically generates radial topologies and prevents the construction of infeasible radial networks during the optimization process. The main shortcoming of the algorithm proposed in Chapter 2 is that it identifies only a single solution, so the system operator has no option but to rely on the solution found. That is why a novel hybrid optimization algorithm is proposed in the third chapter of this dissertation that determines Pareto frontiers, as candidate solutions, for the multi-objective distribution network reconfiguration problem. With this model, the system operator will have more flexibility in choosing the best configuration among the alternative solutions. The proposed hybrid optimization algorithm combines the concept of fuzzy Pareto dominance (FPD) with the shuffled frog leaping algorithm (SFLA) to recognize non-dominated suboptimal solutions identified by SFLA. 
The local search step of SFLA is also customized for power systems applications so that it automatically creates and analyzes only the feasible and radial configurations in its optimization procedure, which significantly increases the convergence speed of the algorithm. In the fourth chapter, the problem of optimal network reconfiguration is solved for the case in which the system operator employs an optimization algorithm that automatically modifies its parameters during the optimization process. By defining three fuzzy functions, the probabilities of crossover and mutation are adaptively tuned as the algorithm proceeds, and premature convergence is avoided without decreasing the speed at which the optimal configuration is identified. This modified genetic algorithm is considered a step towards making the parallel GA, presented in the second chapter of this dissertation, more robust against getting stuck in local optima. In the fifth chapter, the focus is on finding a potential smart grid solution that yields more high-quality suboptimal configurations of distribution networks. This chapter is considered an improvement on the third chapter of this dissertation for two reasons: (1) Fuzzy logic is used in the partitioning step of SFLA to improve the proposed optimization algorithm and to yield a more accurate classification of frogs. (2) The problem of system reconfiguration is solved considering the presence of distributed generation (DG) units in the network. In order to study the new paradigm of integrating smart grids into power systems, the chapter analyzes how the quality of suboptimal solutions is affected when DG units are continuously added to the distribution network. 
The heuristic optimization algorithm which is proposed in Chapter 3 and improved in Chapter 5 is implemented on a smaller case study in Chapter 6 to demonstrate that the solution identified through the optimization process is the same as the optimal solution found by an exhaustive search.

    Efficient strategies for epistasis detection in genome-wide data

    Genome-Wide Association Studies have been carried out with SNP array technology since 2005, identifying thousands of loci for a great many traits and diseases. There are now large data sources, such as UK Biobank, that provide medical and genetic data of hundreds of thousands of people. However, there is a shortfall in the heritability explained for the phenotypes that have been assessed. One of the explanations for this deficit is interactions between genes, called epistasis, that are not detected, so part of the causation is missed. In this thesis, I carry out a comprehensive review of the large number of available epistasis detection tools in the literature. This is followed by a simulation benchmarking study to assess the ability of a representative group of these tools to detect epistatic interactions. Of these tools, BOOST, MDR and MPI3SNP found the most interactions in this simulation study. Next, I set out three possible strategies for searching biobank-scale data in order to find a best practices workflow. These were exhaustive searching, an approach tailored to the tools' strengths, and splitting the data into linkage disequilibrium-based haplotype blocks to reduce the computational load. A simulation study was devised that favored a mixed approach, using BOOST and MDR for different types of interactions. The final pipeline initially uses the BOOST algorithm to find pure epistatic interactions and filter out insignificant pairs of SNPs. Those remaining variants with large single-locus effect sizes are assessed with MDR for impure interactions. Those interactions that are identified are assessed for significance, effect size and heritability explained. Finally, validation is carried out across each interacting pair, incorporating numerous sources of a priori knowledge. This was applied to Atrial Fibrillation, Alzheimer's Disease and Parkinson's Disease, three diseases that have previously been assessed for interactions. 
Although no statistically significant results were identified, this approach demonstrated an increased amount of heritability explained, showing that some of the missing heritability could be accounted for this way. A downstream analysis method was devised, finding genes in linkage with the interacting loci, applying a number of functional annotations and searching STRING-db for evidence of known interactions. Finally, the study was extended to examine rare variants in the rare disease congenital hypothyroidism. As a systemic disorder, it could potentially have pathological interacting mutations. After variant calling, four de novo variants were identified, potentially explaining the condition. Six related interactions were found, with one not present in the parents, so possibly explaining the condition. The mutations, present in TG and PDIA4, have evidence of an interaction in STRING-db, and both genes are involved in thyroid hormone synthesis according to the KEGG database. These contributions provide a novel, tested pipeline for identifying epistasis from GWAS data, as well as a corpus of simulated data for future researchers. A robust methodology is applied for testing resulting interactions statistically, as well as an approach for validating interactions by incorporating numerous data sources to find significant commonalities between variants.

    Phylogeography and Population Structure of the Prairie Skink (Eumeces Septentrionalis)

    The geographical and genealogical limits of species are firmly rooted in historical and current processes. Phylogenetic studies focus on the historical aspect and examine character states to estimate ancestor-descendant relationships. The number of described species in the world has been estimated at 1.4 to 1.8 million and other estimates suggest that as many as 30 million species may exist (May, 1990; Wilson, 1992). Phylogenetic studies provide the information needed to delimit and classify these species based on their historical relationships. Studies on species population structure focus on the current and historical processes acting on a species. These studies use a variety of methods and estimate gene flow and allele frequencies across the species range. This thesis is composed of three chapters that examine the phylogeny and population structure of the prairie skink, Eumeces septentrionalis. Chapter 1 reviews some of the current methods available for examining population structure and justifies the methods used in this study. Chapter 2 examines the phylogenetic relationships within Eumeces septentrionalis using DNA sequence data from two portions of the mitochondrial genome. Specifically, populations from the northern subspecies, E. s. septentrionalis, are compared with the southern subspecies, E. s. obtusirostris. These data along with the phylogenetic species concept are then used to examine the placement of E. s. septentrionalis and E. s. obtusirostris as one or two distinct species. Chapter 3 focuses on the population structure of E. septentrionalis specifically with respect to the northern populations to examine the recolonization pattern following Pleistocene glaciation events. Two field seasons in the spring and summer of 2001 and 2002 were conducted for this study, during which sixty-four tissue samples were collected from individuals in Canada, North Dakota, South Dakota, Minnesota, Wisconsin, and Kansas. 
The ND4 (807 bp) and d-loop (~747 bp) regions of the mitochondrial genome were sequenced from the collected samples, and these data were used in both phylogenetic and population structure analyses. Phylogenetic analyses demonstrated a substantial sequence divergence and reciprocal monophyly between the northern and southern subspecies of E. septentrionalis. Uncorrected pairwise distance values between the northern and southern subspecies ranged from 6.7% to 7.0%, and the monophyly of the northern and southern subspecies, E. s. septentrionalis and E. s. obtusirostris, was strongly supported by both maximum parsimony (bootstrap = 100) and maximum likelihood analyses. These results support the morphological differences found in previous studies and suggest that these two subspecies may be on separate evolutionary trajectories. The population structure of the prairie skink, E. septentrionalis, was examined using nested clade analyses, which revealed isolation by distance with restricted gene flow as the inferred geographical pattern for northern populations (E. s. septentrionalis). This pattern reflects the lack of overlapping haplotypes in distant populations and was found at both the haplotype and upper clade levels. These results indicate that E. septentrionalis was likely subject to one or more vicariant events, and subsequently several localities probably acted as refugia and source populations during times of glacial advance and retreat.
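
For readers unfamiliar with the term, an uncorrected pairwise distance (p-distance) is simply the proportion of aligned sites at which two sequences differ, with no correction for multiple substitutions at the same site. The sketch below shows one way to compute it. It assumes the sequences are already aligned and skips sites with gaps or ambiguous bases, which is one common convention and not necessarily the one used in the thesis; the function name and the toy sequences are hypothetical and are not the thesis data.

```python
def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned DNA sequences differ.

    Sites where either sequence has a gap or an ambiguous base
    (anything other than A, C, G, T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned fragments: one difference over ten comparable sites.
northern = "ACGTACGTAC"
southern = "ACGTACGTAA"
distance = p_distance(northern, southern)
```

On real data the same calculation would be run over the full aligned ND4 and d-loop regions, and a value of 0.067 would correspond to the 6.7% divergence reported above.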