258 research outputs found

    BOSS: Bayesian Optimization over String Spaces

    This article develops a Bayesian optimization (BO) method which acts directly over raw strings, proposing the first uses of string kernels and genetic algorithms within BO loops. Recent applications of BO over strings have been hindered by the need to map inputs into a smooth and unconstrained latent space. Learning this projection is computationally and data-intensive. Our approach instead builds a powerful Gaussian process surrogate model based on string kernels, naturally supporting variable length inputs, and performs efficient acquisition function maximization for spaces with syntactical constraints. Experiments demonstrate considerably improved optimization over existing approaches across a broad range of constraints, including the popular setting where syntax is governed by a context-free grammar
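
A minimal Python sketch of how a string kernel compares variable-length inputs directly. It assumes a k-spectrum kernel (shared contiguous substrings of length k) as a simple stand-in, which is not necessarily the kernel used in BOSS; the function names and example strings are hypothetical. A Gaussian process surrogate could use the resulting Gram matrix as its covariance.

```python
from collections import Counter


def spectrum_features(s, k=2):
    """Count every contiguous substring of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Normalized k-spectrum kernel: cosine similarity of k-mer counts."""
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    dot = sum(fa[g] * fb[g] for g in fa.keys() & fb.keys())
    norm_a = sum(v * v for v in fa.values()) ** 0.5
    norm_b = sum(v * v for v in fb.values()) ** 0.5
    # Strings shorter than k have no k-mers; treat them as dissimilar.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gram_matrix(strings, k=2):
    """Pairwise kernel values, usable as a covariance matrix."""
    return [[spectrum_kernel(a, b, k) for b in strings] for a in strings]


if __name__ == "__main__":
    # Hypothetical candidate strings of different lengths.
    for row in gram_matrix(["abab", "ababab", "bbbb"]):
        print([round(v, 3) for v in row])
```

Because the kernel works on substring counts, no fixed-length encoding or learned latent space is needed for this comparison.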

    Concept Learning By Example Decomposition

    For efficient understanding and prediction in natural systems, even in artificially closed ones, we usually need to consider a number of factors that may combine in simple or complex ways. Additionally, many modern scientific disciplines face increasingly large datasets from which to extract knowledge (for example, genomics). Thus to learn all but the most trivial regularities in the natural world, we rely on different ways of simplifying the learning problem. One simplifying technique that is highly pervasive in nature is to break down a large learning problem into smaller ones; to learn the smaller, more manageable problems; and then to recombine them to obtain the larger picture. It is widely accepted in machine learning that it is easier to learn several smaller decomposed concepts than a single large one. Though many machine learning methods exploit it, the process of decomposition of a learning problem has not been studied adequately from a theoretical perspective. Typically such decomposition of concepts is achieved in highly constrained environments, or aided by human experts. In this work, we investigate concept learning by example decomposition in a general probably approximately correct (PAC) setting for Boolean learning. We develop sample complexity bounds for the different steps involved in the process. We formally show that if the cost of example partitioning is kept low then it is highly advantageous to learn by example decomposition. To demonstrate the efficacy of this framework, we interpret the theory in the context of feature extraction. We discover that many vague concepts in feature extraction, starting with what exactly a feature is, can be formalized unambiguously by this new theory of feature extraction. We analyze some existing feature learning algorithms in light of this theory, and finally demonstrate its constructive nature by generating a new learning algorithm from theoretical results
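
To illustrate why decomposition can reduce sample requirements, the sketch below uses the standard textbook PAC bound for a consistent learner over a finite hypothesis class, m >= (ln|H| + ln(1/delta)) / epsilon. This is an assumption for illustration only and not the bounds derived in the work. The split into two sub-concepts, the even division of epsilon and delta between them, and the omission of any example-partitioning cost are likewise hypothetical simplifications.

```python
import math


def pac_sample_bound(ln_hypotheses, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite class.

    ln_hypotheses is ln|H|, passed as a logarithm so that very large
    hypothesis classes do not overflow.
    """
    return math.ceil((ln_hypotheses + math.log(1.0 / delta)) / epsilon)


def ln_all_boolean_functions(n_vars):
    """ln|H| when H is every Boolean function on n_vars inputs (|H| = 2^(2^n))."""
    return (2 ** n_vars) * math.log(2.0)


if __name__ == "__main__":
    eps, delta = 0.1, 0.05
    # One concept over 10 Boolean variables, learned as a whole.
    whole = pac_sample_bound(ln_all_boolean_functions(10), eps, delta)
    # Two sub-concepts over 5 variables each; error and confidence
    # budgets are split evenly between them (union bound).
    parts = 2 * pac_sample_bound(ln_all_boolean_functions(5), eps / 2, delta / 2)
    print("whole:", whole, "decomposed:", parts)
```

The gap between the two numbers is the budget within which any cost of partitioning the examples would have to stay for decomposition to pay off.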

    Landscape generator : method to generate plausible landscape configurations for participatory spatial plan-making

    Contemporary regional spatial plan-making in the Netherlands is a complex process in which multiple actors, with different interests and demands, try to jointly develop a coherent and comprehensive set of future plan scenarios. The construction of the set of spatial plan scenarios is the core activity of each regional spatial planning process and is often unique and tailored to the specific context and policy objectives formulated for a plan area. Modern collaborative scenario construction is complex because of the variety of participating actors, such as public planners and domain experts, and non-experts such as interest groups and landowners. The level of participation of the non-expert group varies from process to process, but for effective spatial scenarios it is important to construct, in an ergonomic way, surprising and plausible scenarios with vivid, proximate and concrete content. Over the last decades, many attempts have been made to support plan scenario development with digital systems, with a strong emphasis on the analytical capabilities of computers. Little attention, however, has been given to the development of intuitive sketch and design tools and methods that support the interactive process of large-scale collaborative multi-level plan design by visualizing and modeling comprehensive landscape scenarios down to the level of cadastral lots. Therefore, the main objective of this research is to develop and evaluate a method that generates plausible landscape configurations from user-defined landscape typologies, as a digital support tool for participatory spatial plan-making. To enable the effective design and modeling of vivid and plausible future spatial scenarios, a method is needed that supports the two main steps of plan scenario construction in Simlandscape.
Simlandscape introduces a rich set of instruments and procedures for constructing a diverse and coherent scenario set that supports communication and social learning and facilitates a better-informed decision-making process. The central notion in Simlandscape is that the actual transformation of the landscape takes place at the level of the ownership lot. By constructing strategic spatial scenarios down to the level of individual or clustered lots, comprehensive qualitative and quantitative evaluation becomes possible. Design instruments are proposed that intuitively support the funneling creative design process, from abstract and general sketches to specific and detailed economic function allocation and landscape layout modeling. The latter activity is supported by the definition and allocation of landscape lot typologies with (non-spatial) attributes. The first step in plan scenario construction in Simlandscape is the distribution and allocation of landscape lot typologies to lot geometries. This step poses a complex problem, which can be solved manually as well as automatically, but it is not the core of this research. The second step assumes that a landscape lot typology has been allocated to a lot geometry, and consists of generating a plausible landscape configuration based on the attributes of the landscape lot typology. This step can also be done manually, but that is very time-consuming for an entire plan area. Therefore, the automatic generation of a plausible landscape configuration, based on the properties of the allocated landscape lot typology, is the central subject of this research. The automatic generation of landscape configurations is part of the research field called ‘generative modeling’.
In chapter 2, most of the established generative approaches in generative landscape modeling are reviewed for their applicability and relevance as the basis for a method to generate plausible landscape configurations from landscape lot typologies. In the spatial planning literature, four more or less distinct fields of research are identified that directly or indirectly offer approaches for developing a generative method: 1) procedural modeling, 2) spatial multi-objective optimization modeling, 3) cellular automata and 4) multi-agent systems. These approaches to generating landscape configurations provide several points of departure. Unfortunately, none of the current approaches is directly applicable to the objective of this research. Procedural modeling techniques such as shape or landscape grammars are able to produce, or support the creation of, detailed, appealing and realistic landscape visualizations. At this level of detail, however, the process of inference needed to identify relevant objects and their mutual relations in reality is complex because of the large number of objects and relations to be modeled. Moreover, the ambiguous character of the relations between objects makes it very difficult to identify objective and generic rules. Spatial multi-objective optimization techniques, such as linear integer programming, genetic algorithms and simulated annealing, have a strong theoretical basis and are applied frequently in the spatial planning literature to provide ‘the most favourable’ landscape and plan layout in terms of minimal development costs. More recently, general spatial shape objectives have also been included in the multi-objective functions devised. The research objectives in these studies, however, are often restricted to a level of layout planning that is less detailed than the objective stated in this research.
A direct consequence is that shape objectives are generally expressed in terms of compactness and defined solely at the land-use class level. Furthermore, the number of land uses to be allocated and the site to be modelled are kept relatively small. These features are enough to provide a proof of principle, but not to deal with realistic planning challenges. Cellular automata and multi-agent systems provide robust frameworks to realistically model subject and object interactions in space and time. However, the non-deterministic behavior and outcomes of the model runs make them less suitable for generating plausible landscape configurations as defined in this research. Chapter 3 describes the landscape generator and its development; the generator is compatible with the regional plan scenario development approach identified in Simlandscape. The landscape generator uses landscape types as building blocks of plan scenarios. A landscape typology describes a proposed future spatial development and contains spatial and non-spatial (descriptive) attributes. A 2D reference image indirectly provides objective compositional and configurational characteristics of the proposed development. In essence, users allocate a landscape typology to a cadastral lot typology and, based on this information, the landscape generator produces a comprehensive landscape configuration. The landscape generator is developed as a multi-objective heuristic optimization modeling approach. In this approach a sequentially updated multi-objective function is optimized for a two-dimensional allocation site. It is assumed that the site is homogeneous in physical characteristics (e.g. height, soil etc.). The multi-objective function is compiled from an available library of single spatial attributes. These spatial attributes and their target values are retrieved from the compositional and configurational characteristics present in the reference image of the landscape typology.
Examples of the available spatial attributes are the number of landscape component instances, the relative size of each component or each component instance, the compactness and shape of component instances, and direct adjacency between two different landscape components. The capabilities and behavior of the landscape generator are demonstrated in a hypothetical case study, in which it generates a variety of landscape configurations for a hypothetical allocation site (20x20 cells) with a rural forest estate as the allocated landscape typology. The reference image of the rural forest estate provides detailed information for the compilation of the multi-objective function. The landscape generator contains probabilistic elements (e.g. a random starting situation and near-random cell swaps), which result in different output each time it is run with identical input settings. The landscape generator is capable of producing a range of landscape configurations for a variety of situations. A unique situation is defined by the allocation of one landscape typology to one allocation site. Theoretically, since the method is based on the objective measurement of spatial characteristics present in a reference image, each user-defined typology can be used for a selected allocation site. A landscape typology cannot, however, be allocated to an allocation site of every imaginable dimension; it is bounded by the valid spatial extent specified for it. At the heart of the method lies the compilation of the multi-objective function. Ideally, this compilation can be executed completely objectively and without user interaction, as the reference image of the landscape typology provides the required information. In the current prototype version of the landscape generator, however, the compilation process is partly (and in advance) controlled by the modeler.
The modeler needs to specify which of the available spatial attributes to include, in which sequence to optimize them and what attribute target values to specify. In doing so, the modeler is informed by statistics calculated for the reference image. An important task is to define consistent guidelines for the compilation of the multi-objective function from each landscape typology, irrespective of the properties of a valid allocation site. In this research, the modeler has been able to define specific guidelines for each landscape typology. In the current state of the method, the modeler needs to assess continuously, through iterative testing, which compilation is sufficient to produce plausible configurations and which compilation process produces solutions within reasonable computation times. In chapter 4, a method is presented to gain insight into the usability of the landscape generator. The produced landscape configurations are evaluated in an extensive internet-based validation experiment. For a broad variety of situations, landscape configurations are generated by the landscape generator for realistically dimensioned and enclosed sites. The configurations are compared with professional hand-drawn configurations by a large group of planning professionals. The subjects are given an interactive, user-friendly web-based inquiry, in which they are asked to (graphically) rank a random selection from the total set of landscape configurations (hand-made or computer-generated) from ‘most to least plausible’. The respondents are not informed about the difference in the production process of each landscape configuration. In the experiment a distinction is made between subjective and objective plausibility, representing design quality aspects and representativeness of the landscape typology, respectively.
Eight different situations (three subjective and five objective) are assessed by the group of respondents and analyzed with a modified version of an established statistical method known as ‘the law of comparative judgement’. In addition, to indicate points of interest for further improvement of the methodology, the implicit and explicit dimensions of evaluation used by the respondents in each of the objective assessments are identified. The implicit dimensions are identified using linear regression analysis, with single spatial metric properties of the configurations as explanatory variables. To identify explicit dimensions of evaluation, the respondents are asked, for two of the earlier presented situations, to select five of the pre-defined dimensions of evaluation that they used. The current experiment setup provides a robust method as well as reliable results on the capability of the landscape generator to produce plausible landscape configurations. With its modern interactive web interface, its well-balanced data scheme (randomness, several situations) and its use of established statistical methods, the experiment strikes a balance between maximally effective information retrieval and an acceptable level of user workload. In chapter 5, the results of the validation experiment are presented, and in chapter 6 these results are analyzed. For each of the three assignments of the design quality test, it is concluded that the whole set of computer-generated configurations is not of comparable design quality to the whole set of professional configurations. Several individual computer-generated landscape configurations, however, have a design quality comparable to that of the professional configurations. The landscape generator is able to produce configurations whose landscape components are plausible with respect to their individual area, shape and relative adjacency. The overall structure is, however, often perceived as near-random.
In some situations this is regarded as plausible, while in other situations it is regarded as implausible. The results of the four analyzed assignments of the representativeness test show a more favorable view of the capabilities of the landscape generator. In half of the cases, the whole set of computer-generated configurations is considered comparable in representativeness to the professional configurations. In the other half, several individual computer-generated configurations are considered to be of comparable representativeness. The representativeness test is the most important in the plausibility validation of the landscape generator, as the primary objective of the research implies that each actor (with different levels of design experience) should be able to provide her development idea (described in the landscape typology) as a comprehensive visualization in an integrated plan scenario. In the initial planning phases in which the landscape generator is applied, it is more important to obtain a first impression of the impacts (visual and analytical) of a plan scenario than a completely well-modeled and calculated landscape design. Possible non-professional design choices in a landscape typology can be reflected in the generated landscape configurations. Analysis of the dimensions of evaluation gives insight into possible explanations for the plausibility ordering given by the subjects. A distinction is made between explicit and implicit dimensions of evaluation. Explicit dimensions are collected directly in the experiment and provide perceived dimensions of evaluation. The implicit dimensions, identified with linear regression analysis, are however uncertain in their reliability and ideally should be collected in relation to explicit dimensions. Results of the linear regression analysis can direct future research along different approaches. First, the attribute target values in the current compilation can be re-specified.
Second, unused but available spatial attributes can be added to the multi-objective function. Third, new spatial attributes may be developed for inclusion in the optimization process. In light of the main objective of this research, it is important to define consistent guidelines for generating landscape typologies for different situations. In this research, a start is made on identifying important choices with respect to the minimal selection of spatial attributes, the influence of their sequence and feasible attribute target value specification. The experiment results further provide detailed directions for improvements of the landscape generator. Other recommendations put forward in this research relate to: 1) modifying the current heuristic approach (to improve performance and avoid local trapping) by hybridization with existing heuristic approaches such as simulated annealing and evolutionary algorithms; 2) fully automatic translation of the main characteristics of a landscape typology into the compilation of the multi-objective optimization function, where this translation should be as generic as possible and the resulting configurations should be thoroughly validated for plausibility for a variety of representative situations (i.e. combinations of a proposed landscape typology with typical influential allocation site characteristics); 3) extending, if possible, the current library of available spatial attributes with functions that describe more overall organizational properties of landscape typologies, or investigating (parallel or sequential) optimization at different scale levels; 4) including or extending the method with representative infrastructure generation; 5) increasing the effectiveness of the validation experiment by standardizing the acquisition of professional configurations (e.g. design materials, formats and conditions, and automation of the conversion to images used in the inquiry); and 6) increasing the reliability of the validation experiment by separating the different parts of the experiment according to the prioritisation of the experiment objectives.
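
The random start and cell-swap heuristic described in the abstract can be sketched roughly as follows. This is a simplified illustration and not the thesis's landscape generator: the class shares, the single compactness proxy (count of equal 4-neighbour pairs) and the greedy acceptance rule are all assumptions, whereas the abstract describes a sequentially updated multi-objective function compiled from a library of spatial attributes.

```python
import random


def same_class_pairs(grid):
    """Compactness proxy: number of 4-neighbour cell pairs sharing a class."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows and grid[r][c] == grid[r + 1][c]:
                total += 1
            if c + 1 < cols and grid[r][c] == grid[r][c + 1]:
                total += 1
    return total


def generate(size=20, shares=(0.6, 0.3, 0.1), steps=3000, seed=None):
    """Random start, then random cell swaps kept only if the score does not drop."""
    rng = random.Random(seed)
    counts = [round(s * size * size) for s in shares]
    counts[0] += size * size - sum(counts)  # absorb rounding in the first class
    cells = [k for k, n in enumerate(counts) for _ in range(n)]
    rng.shuffle(cells)
    grid = [cells[r * size:(r + 1) * size] for r in range(size)]
    score = same_class_pairs(grid)
    for _ in range(steps):
        r1, c1, r2, c2 = (rng.randrange(size) for _ in range(4))
        if grid[r1][c1] == grid[r2][c2]:
            continue  # swapping equal classes changes nothing
        grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
        new_score = same_class_pairs(grid)
        if new_score >= score:
            score = new_score
        else:  # undo a swap that makes the layout less compact
            grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
    return grid, score


if __name__ == "__main__":
    grid, score = generate(seed=1)
    print("compactness score:", score)
    for row in grid:
        print("".join(str(v) for v in row))
```

Swapping cells rather than reassigning them keeps the relative size of each component fixed, so only the configuration changes. Running without a seed gives a different layout each time, which mirrors the probabilistic behaviour the abstract mentions.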

    Workflows for the Large-Scale Assessment of miRNA Evolution: Birth and Death of miRNA Genes in Tunicates

    As described over 20 years ago with the discovery of RNA interference (RNAi), double-stranded RNAs occupy key roles in regulation and as a line of defense in animal cells. This thesis focuses on metazoan microRNAs (miRNAs). These small non-coding RNAs are distinguished from their small-interfering RNA (siRNA) relatives by their tightly controlled, efficient and flexible biogenesis, together with a broader flexibility to target multiple mRNAs through imperfect seed base-pairing. As potent regulators, miRNAs are involved in mRNA stability and post-transcriptional regulation, a conserved mechanism used repeatedly in evolution, not only in metazoans but also in plants and unicellular organisms. A comprehensive review of the current animal miRNA model shows that the canonical pathway dominates the extensive literature on miRNAs and has served as a scaffold for understanding what lies behind the regulatory landscape of the cell. The characterization of a diverse set of non-canonical pathways has expanded this view, suggesting a diverse, rich and flexible regulatory landscape for generating mature miRNAs. The production of miRNAs, derived from isolated or clustered transcripts, is an efficient and highly conserved mechanism that can be traced back across animals with high fidelity at the family level. In evolutionary terms, expansions of miRNA families have been associated with increasing morphological and developmental complexity. In particular, the Chordata clade (the ancient cephalochordates, the highly derived and secondarily simplified tunicates, and the well-known vertebrates) represents an interesting scenario in which to study miRNA evolution. Despite clearly conserved miRNAs across these clades, tunicates display massive restructuring events, including the emergence of highly derived miRNAs. As shown in this thesis, a bias towards model organisms or vertebrates exists in current animal miRNA annotations, misrepresenting more diverse groups such as marine invertebrates.
Current miRNA databases, such as miRBase and Rfam, classify miRNAs under different definitions and hold annotations that are not simple to link. As an alternative, this thesis proposes a method to curate and merge those annotations, making use of miRBase precursor/mature annotations and genomes together with Rfam predicted sequences. This approach generated structural models for shared miRNA families, based on the alignment of their correctly positioned mature sequences as anchors. In this process, the developed structural curation steps flagged 33 miRNA families from Rfam as questionable. The curated Rfam and miRBase anchored structural alignments provided a rich resource for constructing predictive miRNA profiles, using the corresponding hidden Markov models (HMMs) and covariance models (CMs). Applied directly, these models are time-consuming to use, and the user has to deal with multiple iterations to achieve a genome-wide non-overlapping annotation. To resolve this, the proposed miRNAture pipeline provides an automatic and flexible solution for annotating miRNAs. It combines multiple homology approaches to generate the best candidates, validated at the sequence and structural levels. This increases the achievable sensitivity in annotating canonical miRNAs, and the evaluation against the human annotation shows that clear false positive calls are rare and that additional counterparts lie in retained introns, transcribed lncRNAs or repeat families. Further development of miRNAture suggests the inclusion of multiple rules to distinguish non-canonical miRNA families. This thesis describes multiple homology approaches to annotate the genomic information of a non-model chordate: the colonial tunicate Didemnum vexillum. High levels of genetic variance and unexpected levels of DNA degradation were revealed through a comprehensive analysis of genome assembly methods and gene annotation.
Despite those challenges, it was possible to find candidate homeobox and skeletogenesis-related genes. The ncRNA annotation, for its part, included the expected conserved families, and an extensive search for the Rhabdomyosarcoma 2-associated transcript (RMST) lncRNA family traced it back to the divergence of deuterostomes. In addition, a complete study of the annotation thresholds suggested variations for detecting miRNAs, which were later implemented in the miRNAture tool. This chapter showcases the usual workflow that a comprehensive sequencing, assembly and annotation project should follow, in light of the growing body of research that relies on DNA sequencing. In the last 10 years, the remarkable increase in tunicate sequencing projects has given access to an expanded miRNA annotation landscape. Accordingly, a comprehensive homology approach annotated the miRNA complement of 28 deuterostome genomes (including the 16 tunicates currently reported) using miRNAture. To obtain proper structural models as input, corrected miRBase structural alignments served as a scaffold for building the corresponding CMs, based on a genetic algorithm developed for this purpose. This automatic approach selected the set of sequences that composed the alignments, generating 2492 miRNA CMs. Despite the multiple sources and associated heterogeneity of the studied genomes, a clustering approach successfully gathered five groups of similar assemblies and highlighted low-quality assemblies. The overall reduction in families and loci in tunicates is notable: they show on average 374 miRNA loci, compared with other clades: Cephalochordata (2119), Vertebrata (3638), Hemichordata (1092) and Echinodermata (2737). The detection of 533 miRNA families at the divergence of tunicates shows an expanded landscape relative to currently annotated miRNA families. Shared sets of ancestral, chordate, Olfactores and clade-specific miRNAs were uncovered using phylogenetic conservation criteria.
Compared to current annotations, the family repertoires were expanded in all cases. Finally, relying on the elements adjacent to annotated miRNAs, this thesis proposes additional syntenic support for clustering miRNA loci. In this way, the structural alignment of miR-1497, originally annotated in three model tunicates, was expanded with clear syntenic support in tunicates.
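
The abstract mentions the need to reach a genome-wide non-overlapping annotation from many candidate hits. One common way to do this, shown here only as a hedged sketch and not as the procedure implemented in miRNAture, is to keep the best-scoring hit at each locus greedily. The hit tuple layout, coordinates, scores and family names below are hypothetical.

```python
def select_non_overlapping(hits):
    """Greedy choice of non-overlapping hits, best score first.

    Each hit is (start, end, score, family) on one strand of one contig,
    with end exclusive. Returns the kept hits sorted by start position.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h[2], reverse=True):
        start, end = hit[0], hit[1]
        # Keep the hit only if it clears every hit already accepted.
        if all(end <= k[0] or start >= k[1] for k in kept):
            kept.append(hit)
    return sorted(kept)


if __name__ == "__main__":
    # Hypothetical candidates from different homology searches.
    candidates = [
        (100, 180, 55.0, "mir-a"),
        (150, 230, 70.0, "mir-b"),  # overlaps mir-a and scores higher
        (300, 380, 40.0, "mir-c"),
    ]
    for hit in select_non_overlapping(candidates):
        print(hit)
```

A real pipeline would also have to compare scores across search methods and handle both strands, which this sketch leaves out.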

    Molecular determinants of antibiotic tolerance in the high-risk strain Pseudomonas aeruginosa AG1 using a multi-omics approach: from the genome to the transcriptomic network in response to ciprofloxacin

    Antibiotic resistance is a major threat to public health because it compromises the administration of adequate antibiotic therapy. Pseudomonas aeruginosa is an opportunistic pathogen that causes infections in immunocompromised hosts. P. aeruginosa AG1 (PaeAG1) is a Costa Rican strain resistant to multiple antibiotics, such as β-lactams (including carbapenems), aminoglycosides and fluoroquinolones. PaeAG1 was identified as the first P. aeruginosa isolate carrying the VIM-2 and IMP-18 genes, which encode metallo-β-lactamase (MBL) enzymes. According to the World Health Organization (WHO), this strain is considered critical, being classified in the Priority 1 group because of its resistance to carbapenems. PaeAG1 has particular characteristics at the genomic and phenomic levels, many of them related to antibiotic resistance. For this reason, it was of interest to study the molecular determinants of antibiotic tolerance in PaeAG1 using a multi-omics approach. First, genome assembly was the initial step toward understanding the genomic architecture of this high-risk strain. From the study with 13 different approaches, selection of the best assembly revealed that the PaeAG1 genome has 57 genomic islands that harbor six prophages and two complete integrons with the MBL genes. In addition, 250 virulence genes and 60 genes associated with antibiotic resistance were found. Second, a comparative genomics approach was implemented to define and update the phylogenetic relationship among complete P. aeruginosa genomes, the genomic island content in other strains, and the architecture of the genomic regions around the two MBL-carrying integrons. In the case of IMP-18, the integron containing it and the surrounding architecture had never been reported in the literature.
Next, we studied the proteomic profile of PaeAG1 after exposure to antibiotics using two-dimensional gel electrophoresis with an image analysis and machine learning (artificial intelligence) protocol. The proteomic profiles showed that ciprofloxacin (CIP) induces a protein pattern similar to the antibiotic-free control, in contrast to other antibiotics, which clustered separately. Fourth, to study the core response to multiple perturbations in P. aeruginosa, that is, the core perturbome, a machine learning approach was implemented. Using public transcriptomic data, we evaluated six approaches to classify and select genes. Molecular annotation of 46 core response genes revealed biological functions related to DNA damage repair, metabolism and aerobic respiration in the context of stress tolerance. Finally, to evaluate the effects of ciprofloxacin on PaeAG1, we performed a comparison of growth curves, differential expression analysis using RNA-Seq, and network analysis. The transcriptomic analysis showed differential expression of 518 genes over time after ciprofloxacin treatment, including resident phage genes that were upregulated. This last case was validated at the phenomic level using phage plaque assays, which explained the phenotypic observations of reduced growth curves. Overall, using a multi-omics approach (at the genomic, comparative genomic, perturbomic, transcriptomic, proteomic and phenomic levels), we provide new insights into the genomic and transcriptomic determinants associated with antibiotic tolerance in PaeAG1.
These results not only partly explain the high-risk condition of this strain, which allows it to conquer nosocomial environments, and its multidrug resistance profile, but this information may eventually be used as part of strategies to combat this pathogen. UCR::Vicerrectoría de Investigación::Sistema de Estudios de Posgrado::Interdisciplinarias::Doctorado Académico en Ciencia

    Whole genome duplication analysis of the invasive Lonicera maackii (Amur honeysuckle)

    Invasive Lonicera maackii (L. maackii) is one of the highly successful and problematic bush honeysuckles found in the central and eastern United States of America, and has been reported to pose a threat to native ecosystems by decreasing biodiversity. The mechanism by which L. maackii negatively impacts environments is typically either the direct effect of increased dominance or the indirect effect of territory modification. Numerous studies have documented the negative effects of L. maackii on native biota and the key traits, such as seed dispersal, phenology, resistance to herbivory, rapid growth and environmental plasticity, that contribute to the invasion of L. maackii. In past decades, studies mainly focused on the negative effects and management of L. maackii invasion, and little was done to explore the genetic traits that contribute to the devastation of native ecosystems. Chloroplast-based genomic and chemical diversity in L. maackii has been reported. However, genomic diversity at the whole genome level has not been reported for L. maackii because a whole genome sequence of L. maackii was not available. Advances in whole genome sequencing technologies and bioinformatic tools now allow the genomic diversity of L. maackii to be studied at the whole genome level. Genome duplication is a key evolutionary mechanism that provides new genetic material and new gene functions for plants, which play important roles in speciation and adaptation to biotic/abiotic stress. Given that L. maackii is closely related to Lonicera japonica (L. japonica), and that whole genome duplication in L. japonica has been reported (Pu et al., 2020; Yu et al., 2022), we hypothesize that a whole genome duplication is present in L. maackii. In this study, we aim to investigate whether there is a genome duplication in L. maackii, with the purpose of exploring the genomic diversity of L. maackii. We also compared genome duplication among species in the Lonicera genus.
With the completion of whole genome assembly of L. maackii (Kesel et al., 2022), we conducted the gene prediction using Exonerate and gene duplications analysis using MCScanX in L. maackii. As a result, we predicted 32,642 genes and identified 5,668 genes, 24,911 genes, 703 genes, 902 genes, and 458 genes deriving from Singleton, Dispersed, Proximal, Tandem, and WGD modes, respectively. To our knowledge, this is the first genome duplication analysis that has been reported in L. maackii. Compared to L. japonica, a higher prevalence of Singleton and Dispersed modes of gene duplication was observed in L. maackii. The different genome duplication patterns between L. maackii and L. japonica may result from the difference of whole genome assembly format. The future directions should focus on improving the chromosome-scale genome assembly and whole genome annotation, promoting our understanding on the genome diversity and evolutionary traits in L. maackii and controlling the expansion of L. maackii

    Learning semantic structures from in-domain documents

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 175-184). Semantic analysis is a core area of natural language understanding that has typically focused on predicting domain-independent representations. However, such representations are unable to fully realize the rich diversity of technical content prevalent in a variety of specialized domains. Taking the standard supervised approach to domain-specific semantic analysis requires expensive annotation effort for each new domain of interest. In this thesis, we study how multiple granularities of semantic analysis can be learned from unlabeled documents within the same domain. By exploiting in-domain regularities in the expression of text at various layers of linguistic phenomena, including lexicography, syntax, and discourse, the statistical approaches we propose induce multiple kinds of structure: relations at the phrase and sentence level, content models at the paragraph and section level, and semantic properties at the document level. Each of our models is formulated in a hierarchical Bayesian framework with the target structure captured as latent variables, allowing them to seamlessly incorporate linguistically-motivated prior and posterior constraints, as well as multiple kinds of observations. Our empirical results demonstrate that the proposed approaches can successfully extract hidden semantic structure over a variety of domains, outperforming multiple competitive baselines. by Harr Chen. Ph.D.

    The anthropometric, environmental and genetic determinants of right ventricular structure and function

    BACKGROUND: Measures of right ventricular (RV) structure and function have significant prognostic value. The right ventricle is currently assessed by global measures, or point surrogates, which are insensitive to regional and directional changes. We aim to create a high-resolution three-dimensional RV model to improve understanding of its structural and functional determinants. These may be particularly of interest in pulmonary hypertension (PH), a condition in which RV function and outcome are strongly linked. PURPOSE: To investigate the feasibility and additional benefit of applying three-dimensional phenotyping and contemporary statistical and genetic approaches to large patient populations. METHODS: Healthy subjects and incident PH patients were prospectively recruited. Using a semi-automated atlas-based segmentation algorithm, 3D models characterising RV wall position and displacement were developed, validated and compared with anthropometric, physiological and genetic influences. Statistical techniques were adapted from other high-dimensional approaches to deal with the problems of multiple testing, contiguity, sparsity and computational burden. RESULTS: 1527 healthy subjects successfully completed high-resolution 3D CMR and automated segmentation. Of these, 927 subjects underwent next-generation sequencing of the sarcomeric gene titin and 947 subjects completed genotyping of common variants for genome-wide association study. 405 incident PH patients were recruited, of whom 256 completed phenotyping. 3D modelling demonstrated significant reductions in sample size compared to two-dimensional approaches. 3D analysis demonstrated that RV basal-freewall function reflects global functional changes most accurately and that a similar region in PH patients provides stronger survival prediction than all anthropometric, haemodynamic and functional markers. Vascular stiffness, titin truncating variants and common variants may also contribute to changes in RV structure and function. CONCLUSIONS: High-resolution phenotyping coupled with computational analysis methods can improve insights into the determinants of RV structure and function in both healthy subjects and PH patients. Large, population-based approaches offer physiological insights relevant to clinical care in selected patient groups. Open Access

    Biomedical Image Processing and Classification

    Biomedical image processing is an interdisciplinary field involving a variety of disciplines, e.g., electronics, computer science, physics, mathematics, physiology, and medicine. Several imaging techniques have been developed, providing many approaches to the study of the human body. Biomedical image processing is finding an increasing number of important applications in, for example, the study of the internal structure or function of an organ and the diagnosis or treatment of a disease. If associated with classification methods, it can support the development of computer-aided diagnosis (CAD) systems, which could help medical doctors in refining their clinical picture