
    Quality Diversity: Harnessing Evolution to Generate a Diversity of High-Performing Solutions

    Evolution in nature has designed countless solutions to innumerable interconnected problems, giving rise to the impressive array of complex modern life observed today. Inspired by this success, the practice of evolutionary computation (EC) abstracts evolution artificially as a search operator to find solutions to problems of interest, primarily through the adaptive mechanism of survival of the fittest, where stronger candidates are pursued at the expense of weaker ones until a solution of satisfying quality emerges. At the same time, research in open-ended evolution (OEE) draws different lessons from nature, seeking to identify and recreate processes that lead to the type of perpetual innovation and indefinitely increasing complexity observed in natural evolution. New algorithms in EC such as MAP-Elites and Novelty Search with Local Competition harness the toolkit of evolution for a related purpose: finding as many types of good solutions as possible (rather than merely the single best solution). With the field in its infancy, no empirical studies previously existed comparing these so-called quality diversity (QD) algorithms. This dissertation (1) contains the first extensive and methodical effort to compare different approaches to QD (including both existing published approaches and some new methods presented for the first time here) and to understand how they operate, to help inform better approaches in the future. It also (2) introduces a new indirect-encoding technique for evolving neural networks that contain multiple sensory or output modalities. Further, it (3) explores the idea that QD can act as an engine of open-ended discovery by introducing an expressive platform called Voxelbuild, where QD algorithms continually evolve robots that stack blocks in new ways. A culminating experiment (4) investigates evolution in Voxelbuild over a very long timescale. This research thus stands to advance the OEE community's goal of creating and understanding open-ended systems, while also laying the groundwork for QD to realize its potential within EC as a means to automatically generate an endless progression of new content in real-world applications.
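
The idea of keeping many types of good solutions rather than a single best one can be illustrated with a minimal MAP-Elites-style loop. This is only a sketch under stated assumptions, not the dissertation's implementation: the toy problem (points in the unit square), the 5x5 behavior grid, the 100-sample warm-up, and the helper names `random_solution`, `mutate`, and `evaluate` are all hypothetical choices made for illustration.

```python
import random


def map_elites(evaluate, random_solution, mutate, iterations=2000, seed=0):
    """Minimal MAP-Elites-style loop: keep the best solution seen per behavior cell."""
    rng = random.Random(seed)
    archive = {}  # behavior cell -> (fitness, solution)
    for i in range(iterations):
        if archive and i >= 100:
            # After a random warm-up, mutate a uniformly chosen elite.
            _, parent = rng.choice(list(archive.values()))
            candidate = mutate(parent, rng)
        else:
            candidate = random_solution(rng)
        fitness, cell = evaluate(candidate)
        # A candidate enters the archive only if its cell is empty
        # or it beats the current occupant of that cell.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, candidate)
    return archive


# Hypothetical toy problem: solutions are points in [0, 1]^2.
def random_solution(rng):
    return (rng.random(), rng.random())


def mutate(solution, rng):
    # Gaussian perturbation, clipped back into the unit square.
    return tuple(min(1.0, max(0.0, x + rng.gauss(0, 0.1))) for x in solution)


def evaluate(solution):
    x, y = solution
    fitness = -((x - 0.5) ** 2 + (y - 0.5) ** 2)  # higher is better
    cell = (min(int(x * 5), 4), min(int(y * 5), 4))  # 5x5 behavior grid
    return fitness, cell


archive = map_elites(evaluate, random_solution, mutate)
print(len(archive), "cells filled out of 25")
```

The archive returned here holds one elite per occupied cell, so its size is a rough measure of diversity while the stored fitness values measure quality within each niche.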

    A High-Throughput DNA Sequence Aligner for Microbial Ecology Studies

    As the scope of microbial surveys expands with the parallel growth in sequencing capacity, a significant bottleneck in data analysis is the ability to generate a biologically meaningful multiple sequence alignment. The most commonly used aligners have varying alignment quality and speed, tend to depend on a specific reference alignment, or lack a complete description of the underlying algorithm. The purpose of this study was to create and validate an aligner that quickly generates a high-quality alignment and has the flexibility to use any reference alignment. Using the simple Nearest Alignment Space Termination (NAST) algorithm, the resulting aligner operates in linear time, requires a small memory footprint, and generates a high-quality alignment. In addition, the alignments generated for variable regions were of as high a quality as the alignments of full-length sequences. As implemented, the method was able to align 18 full-length 16S rRNA gene sequences and 58 V2 region sequences per second to the 50,000-column SILVA reference alignment. Most importantly, the resulting alignments were of a quality equal to SILVA-generated alignments. The aligner described in this study will enable scientists to rapidly generate robust multiple sequence alignments that are implicitly based upon the predicted secondary structure of the 16S rRNA molecule. Furthermore, because the implementation is not connected to a specific database, it is easy to generalize the method to reference alignments for any DNA sequence.

    Recovering Faces from Portraits with Auxiliary Facial Attributes

    Recovering a photorealistic face from an artistic portrait is a challenging task since crucial facial details are often distorted or completely lost in artistic compositions. To handle this loss, we propose an Attribute-guided Face Recovery from Portraits (AFRP) method that utilizes a Face Recovery Network (FRN) and a Discriminative Network (DN). FRN consists of an autoencoder with residual block-embedded skip-connections and incorporates facial attribute vectors into the feature maps of input portraits at the bottleneck of the autoencoder. DN has multiple convolutional and fully-connected layers, and its role is to force FRN to generate authentic face images with the facial attributes dictated by the input attribute vectors. To preserve identities, we constrain the recovered and ground-truth faces to share similar visual features. Specifically, DN determines whether the recovered image looks like a real face and checks whether the facial attributes extracted from the recovered image are consistent with the given attributes. Our method can recover photorealistic identity-preserving faces with desired attributes from unseen stylized portraits, artistic paintings, and hand-drawn sketches. On large-scale synthesized and sketch datasets, we demonstrate that our face recovery method achieves state-of-the-art results. Comment: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)

    Aligning Multiple Sequences with Genetic Algorithm

    The alignment of biological sequences is a crucial tool in molecular biology and genome analysis. It helps to build a phylogenetic tree of related DNA sequences and also to predict the function and structure of unknown protein sequences by aligning them with other sequences whose function and structure are already known. However, finding an optimal multiple sequence alignment requires time and space that grow exponentially as the length or number of sequences increases. Genetic Algorithms (GAs) are random search strategies that optimize an objective function, here a measure of alignment quality (distance). They combine exploratory search through the solution space with exploitation of current results.
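
As a rough illustration of the kind of objective function a GA could optimize, the sketch below scores a candidate alignment with a sum-of-pairs measure. The abstract does not say which objective the authors used, so the sum-of-pairs choice, the scoring values (`match=1`, `mismatch=-1`, `gap=-2`), and the function name `sum_of_pairs` are assumptions made purely for illustration.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score an alignment by summing column scores over every pair of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    score = 0
    for a, b in combinations(alignment, 2):
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue  # a gap aligned with a gap contributes nothing
            elif x == "-" or y == "-":
                score += gap
            elif x == y:
                score += match
            else:
                score += mismatch
    return score


# Two hypothetical candidate alignments of the same three sequences.
candidate_a = ["ACGT-", "ACGTA", "AC-TA"]
candidate_b = ["ACGT-", "ACGTA", "ACTA-"]

print(sum_of_pairs(candidate_a))  # 3
print(sum_of_pairs(candidate_b))  # 0
```

A GA would treat such a score as the fitness of each individual in the population, so `candidate_a` would be preferred over `candidate_b` during selection.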

    A Comparative Genomics Study of Subtelomere Evolution and Phenotypic Variation in Nematodes

    Thesis (Ph.D.), Seoul National University Graduate School, College of Natural Sciences, School of Biological Sciences, 2020. 2. 이준호 (Junho Lee).
    Long-read sequencing technologies have contributed greatly to comparative genomics among species and can also be applied to study genomics within a species. In this study, to determine how substantial genomic changes are generated and tolerated within a species, I sequenced CB4856, a C. elegans strain that is one of the most genetically divergent from the N2 reference strain. For this comparison, the Pacific Biosciences (PacBio) RSII platform (80×, N50 read length 11.8 kb) was used, and a de novo genome assembly was generated to the level of pseudochromosomes containing 76 contigs (N50 contig = 2.8 Mb). I identified structural variations that affected as many as 2,694 genes, most of which are on chromosome arms. Subtelomeric regions contained the most extensive genomic rearrangements, which even created new subtelomeres in some cases. The subtelomere structure of Chromosome VR implies that ancestral telomere damage was repaired by alternative lengthening of telomeres (ALT) even in the presence of a functional telomerase gene, and that a new subtelomere was formed by break-induced replication (BIR). My study demonstrates that substantial genomic changes, including structural variations and new subtelomeres, can be tolerated within a species, and that these changes may increase genetic diversity within a species. Second, I assembled draft genomes of two species related to C. elegans, Auanema freiburgensis and Auanema sp. APS14, which have a distinct reproductive mode (three sexes: male, female, and hermaphrodite) and a distinct behavioral repertoire (tube nictation). A. freiburgensis and Auanema sp. APS14 were sequenced using the PacBio RSII (270×, N50 read length 12.5 kb) and the Oxford Nanopore Technologies (ONT) MinION (113×, N50 read length 3.6 kb) platforms, respectively, and their reads were assembled into genomes (55 and 69 Mb, respectively) that are smaller than that of C. elegans (~100 Mb). Comparative genomic studies of these genomes will help explain how genomic changes in closely related species affect the evolution of novel traits.
    Contents: Chapter I. Introduction (Long-read sequencing and de novo genome assembly; Caenorhabditis and Caenorhabditis elegans as a model system for comparative genomics; Repetitive nature of subtelomere and the trace of alternative lengthening of telomeres (ALT) in subtelomeric regions; Phenotypic diversity in the genus Auanema; Purposes of the study). Materials and Methods. Chapter II. De novo genome assembly of the CB4856 genome and subtelomere evolution via past ALT events in C. elegans. Part I. De novo genome assembly of the CB4856 genome and structural variants compared to the reference strain, N2 (Long-read sequencing and de novo assembly of the CB4856 genome; Long-read sequencing identified new structural variations). Part II. Subtelomere evolution via past ALT events in C. elegans (Long-read sequencing revealed the hypervariable nature of subtelomeres; The structure of Chr VR subtelomere is unique, in consequence of past ALT and BIR events; New genes in the subtelomeric region). Chapter III. Phenotypic characterization of Korean nematodes and draft genome assembly of two Auanema species (Korean nematode collection; Phenotypic diversification in the genus Auanema; Highly contiguous genome assembly using two long-read sequencing technologies). Chapter IV. Discussion (Enrichment of genetic variations in chromosome arms and subtelomeres by background selection and error-prone recombination; New subtelomere formation by ALT and BIR). References. Abstract in Korean. Acknowledgement.
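
The abstract reports N50 values for both reads and contigs. For readers unfamiliar with the statistic, the sketch below computes it under the usual definition: the length L such that sequences of length at least L together cover at least half of the total bases. The function name `n50` and the toy lengths are illustrative assumptions, not data from the thesis.

```python
def n50(lengths):
    """Return the N50 of a collection of positive sequence lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), total = 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70, because 80 + 70 = 150 reaches half of 300
```

A higher N50 for the same total size generally indicates a more contiguous assembly, which is why it is quoted alongside the contig count.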

    Segmentally Variable Genes: A New Perspective on Adaptation

    Genomic sequence variation is the hallmark of life and is key to understanding diversity and adaptation among the numerous microorganisms on earth. Analysis of the sequenced microbial genomes suggests that genes are evolving at many different rates. We have attempted to derive a new classification of genes into three broad categories: lineage-specific genes that evolve rapidly and appear unique to individual species or strains; highly conserved genes that frequently perform housekeeping functions; and partially variable genes that contain highly variable regions, at least 70 amino acids long, interspersed among well-conserved regions. The latter we term segmentally variable genes (SVGs), and we suggest that they are especially interesting targets for biochemical studies. Among these genes are ones necessary to deal with the environment, including genes involved in host–pathogen interactions, defense mechanisms, and intracellular responses to internal and environmental changes. For the most part, the detailed function of these variable regions remains unknown. We propose that they are likely to perform important binding functions responsible for protein–protein, protein–nucleic acid, or protein–small molecule interactions. Discerning their function and identifying their binding partners may offer biologists new insights into the basic mechanisms of adaptation, context-dependent evolution, and the interaction between microbes and their environment. Segmentally variable genes show a mosaic pattern of one or more rapidly evolving, variable regions. Discerning their function may provide new insights into the forces that shape genome diversity and adaptation. Funding: National Science Foundation (998088, 0239435)

    Bioinformatics tools for analysing viral genomic data

    The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing

    MetaCRAM: an integrated pipeline for metagenomic taxonomy identification and compression

    Background: Metagenomics is a genomics research discipline devoted to the study of microbial communities in environmental samples and human and animal organs and tissues. Sequenced metagenomic samples usually comprise reads from a large number of different bacterial communities and hence tend to result in large file sizes, typically ranging between 1–10 GB. This leads to challenges in analyzing, transferring and storing metagenomic data. In order to overcome these data processing issues, we introduce MetaCRAM, the first de novo, parallelized software suite specialized for FASTA and FASTQ format metagenomic read processing and lossless compression. Results: MetaCRAM integrates algorithms for taxonomy identification and assembly, and introduces parallel execution methods; furthermore, it enables genome reference selection and CRAM based compression. MetaCRAM also uses novel reference-based compression methods designed through extensive studies of integer compression techniques and through fitting of empirical distributions of metagenomic read-reference positions. MetaCRAM is a lossless method compatible with standard CRAM formats, and it allows for fast selection of relevant files in the compressed domain via maintenance of taxonomy information. The performance of MetaCRAM as a stand-alone compression platform was evaluated on various metagenomic samples from the NCBI Sequence Read Archive, suggesting 2- to 4-fold compression ratio improvements compared to gzip. On average, the compressed file sizes were 2-13 percent of the original raw metagenomic file sizes. Conclusions: We described the first architecture for reference-based, lossless compression of metagenomic data. The compression scheme proposed offers significantly improved compression ratios as compared to off-the-shelf methods such as zip programs. 
Furthermore, it enables running different components in parallel and provides the user with taxonomic and assembly information generated during execution of the compression pipeline. Availability: The MetaCRAM software is freely available at http://web.engr.illinois.edu/~mkim158/metacram.html. The website also contains a README file and other relevant instructions for running the code. Note that running the code requires a minimum of 16 GB of RAM. In addition, a VirtualBox setup on a 4 GB RAM machine is provided for users to run a simple demonstration.