8 research outputs found

    Lossy compression of plant architectures

    Plants usually show intricate structures whose representation and management are an important source of model complexity. Yet plant structures are also repetitive: although not identical, the organs, axes, and branches at different positions are often highly similar. From a formal perspective, this repetitive character of plant structures was first exploited in fractal-based plant models (Barnsley, 2000; Ferraro et al., 2005; Prusinkiewicz and Hanan, 1989; Smith, 1984). In particular, L-systems have been used extensively in the last two decades to amplify parsimonious rule-based models into complex branching structures by specifying how fundamental units are repeatedly duplicated and modified in space and over time (Prusinkiewicz et al., 2001). However, the inverse problem of finding a compact representation of a branching structure has remained largely open, and it is now becoming a key issue in modeling applications, as it needs to be solved both to gain insight into the complex organization of plants and to decrease the time and space complexity of simulation algorithms. The idea is that a compressed version of a plant structure might be much more efficient to manipulate than the original extensive branching structure. For instance, Soler et al. (2003) have shown that the complexity of radiation simulation can be drastically reduced if self-similar representations of plants are used. Unfortunately, strict self-similarity has a limited range of applications, because neither real plants nor more sophisticated plant models are exactly self-similar. Consequently, we propose in this paper an algorithm that exploits approximate self-similarity to compress plant structures to various degrees, representing a tradeoff between compression rate and accuracy. This new compression method aims to make it possible to efficiently model, simulate, and analyze plants using these compressed representations.
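
    To make the idea of a compressed branching structure concrete, the sketch below covers only the strict self-similarity case that the abstract describes as too limited: identical unordered subtrees are stored once and shared, turning the tree into a DAG. It is an illustration under assumptions, not the paper's algorithm; the nested-tuple tree encoding and the names `compress_to_dag` and `count_nodes` are hypothetical, and the approximate (lossy) matching that the paper proposes is not attempted here.

```python
from typing import Dict, List, Tuple

# Assumed encoding: a tree is a tuple of child subtrees; a leaf is ().
Tree = Tuple["Tree", ...]


def compress_to_dag(tree: Tree) -> Tuple[int, List[Tuple[int, ...]]]:
    """Share identical unordered subtrees (exact self-similarity only).

    Returns (root_id, nodes), where nodes[i] is the sorted tuple of child
    ids of the i-th distinct subtree shape.
    """
    ids: Dict[Tuple[int, ...], int] = {}
    nodes: List[Tuple[int, ...]] = []

    def visit(node: Tree) -> int:
        # Sorting child ids makes the signature independent of child order.
        signature = tuple(sorted(visit(child) for child in node))
        if signature not in ids:
            ids[signature] = len(nodes)
            nodes.append(signature)
        return ids[signature]

    return visit(tree), nodes


def count_nodes(tree: Tree) -> int:
    """Number of nodes in the uncompressed tree."""
    return 1 + sum(count_nodes(child) for child in tree)


# A small, exactly self-similar "plant".
leaf: Tree = ()
twig: Tree = (leaf, leaf)
branch: Tree = (twig, twig, leaf)
plant: Tree = (branch, branch)

root_id, dag = compress_to_dag(plant)
print(count_nodes(plant), "tree nodes stored as", len(dag), "DAG nodes")
```

    A lossy variant would presumably merge subtrees that are merely similar according to some distance, which is where the tradeoff between compression rate and accuracy would come from.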

    Scala AST Persistence

    The Scala compiler uses ASTs (abstract syntax trees) as an intermediate representation before generating bytecode. With the development of Scala macros, which expand trees at compile time, being able to access, modify, and recompose ASTs within the compilation scope is becoming increasingly important. A common use of macros is to inspect the abstract syntax trees within reach in order to learn more about the code being transformed, to apply more powerful optimizations, and so on. However, arguments to macros can depend on third-party libraries, which are precompiled as bytecode and do not have their ASTs available. It would therefore be useful to have a way to publish ASTs along with the bytecode. Publishing those ASTs should be the programmer's choice, and the ASTs should take as little space as possible so that the process is transparent to the user.

    Learning from Partially Labeled Data: Unsupervised and Semi-supervised Learning on Graphs and Learning with Distribution Shifting

    This thesis focuses on two fundamental machine learning problems: unsupervised learning, where no label information is available, and semi-supervised learning, where a small number of labels is given in addition to unlabeled data. These problems arise in many real-world applications, such as Web analysis and bioinformatics, where a large amount of data is available but little or no labeled data exists. Obtaining classification labels in these domains is usually quite difficult because it involves either manual labeling or physical experimentation. This thesis approaches these problems from two perspectives: graph based and distribution based. First, I investigate a series of graph-based learning algorithms that are able to exploit information embedded in different types of graph structures. These algorithms allow label information to be shared between nodes in the graph, ultimately communicating information globally to yield effective unsupervised and semi-supervised learning. In particular, I extend existing graph-based learning algorithms, currently based on undirected graphs, to more general graph types, including directed graphs, hypergraphs, and complex networks. These richer graph representations allow one to capture more naturally the intrinsic data relationships that exist, for example, in Web data, relational data, bioinformatics, and social networks. For each of these generalized graph structures, I show how information propagation can be characterized by distinct random walk models, and then use this characterization to develop new unsupervised and semi-supervised learning algorithms. Second, I investigate a more statistically oriented approach that explicitly models a learning scenario where the training and test examples come from different distributions. This is a difficult situation for standard statistical learning approaches, since they typically assume that the distributions of the training and test sets are similar, if not identical. To achieve good performance in this scenario, I use unlabeled data to correct the bias between the training and test distributions. A key idea is to produce resampling weights for bias correction by working directly in a feature space, bypassing the problem of explicit density estimation. The technique can be easily applied to many different supervised learning algorithms, automatically adapting their behavior to cope with distribution shift between training and test data.
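
    As an illustration of how label information can spread between nodes through a random walk, the sketch below runs a standard clamped propagation on a small undirected graph: each unlabeled node repeatedly takes the transition-probability-weighted average of its neighbors' scores while labeled nodes stay fixed. This is the classic undirected case that the thesis takes as its starting point, not one of its generalized algorithms; the function name `propagate_labels`, the dense adjacency-list input, and the fixed iteration count are assumptions made for the example.

```python
from typing import Dict, List


def propagate_labels(
    adjacency: List[List[float]],
    labels: Dict[int, float],
    iterations: int = 200,
) -> List[float]:
    """Clamped random-walk label propagation on an undirected weighted graph.

    adjacency[i][j] is the edge weight between nodes i and j.
    labels maps a labeled node index to its score (+1.0 or -1.0 here).
    Assumption: every node has at least one edge, so its degree is nonzero.
    """
    n = len(adjacency)
    scores = [labels.get(i, 0.0) for i in range(n)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in labels:
                updated.append(labels[i])  # keep known labels fixed
                continue
            degree = sum(adjacency[i])
            # One random-walk step: weight/degree is the transition probability.
            step = sum(w * s for w, s in zip(adjacency[i], scores)) / degree
            updated.append(step)
        scores = updated
    return scores


# Path graph 0 - 1 - 2 - 3, with node 0 labeled +1 and node 3 labeled -1.
path = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
scores = propagate_labels(path, {0: 1.0, 3: -1.0})
print(scores)
```

    The sign of each unlabeled node's score then serves as its predicted class; on this path, node 1 ends up closer to the positive label and node 2 closer to the negative one.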

    Identification de motifs au sein des structures biologiques arborescentes (Pattern identification within tree-like biological structures)

    With the explosion in the quantity of available biological data, developing new and efficient processing methods is a major challenge in bioinformatics. Many biological structures are modeled by tree structures, such as RNA secondary structures and plant architectures. These structures contain repeated patterns, both within a single structure and from one structure to another. We propose to exploit this fundamental property to improve the storage and processing of such objects. Following the principle of sequence filtering, we define in this thesis a filtering method on ordered trees that efficiently retrieves from a database a set of ordered trees close to a query tree. The method is based on decomposing the tree into seeds and searching for seeds shared between the structures. We define and solve the maximum chaining problem on trees. For RNA secondary structures, we propose a definition of (l d) centered seeds. Second, building on instantiation techniques used, for example, in computer graphics, and on knowledge of the redundancy within biological structures, we present a compression method that reduces the memory needed to store unordered trees. After identifying the internal redundancies, we use a more compact data structure to represent, in particular, plant architecture, which can carry both topological and geometrical information.

    Memory-Efficient and Parallel Simulation of Super Carbon Nanotubes

    Carbon nanotubes (CNTs) have received much attention since their description in Nature in 1991. In principle, a carbon nanotube is a rolled-up sheet of graphene, which can be imagined as a honeycomb grid of carbon atoms. This allotrope of carbon has many interesting properties, such as high tensile strength at very low weight and high temperature resistance. This motivates the application of CNTs in materials science to create new carbon-nanotube-reinforced materials. CNTs also possess interesting electronic properties, since they show either metallic or semiconducting behavior depending on their configuration. The synthesis of branched carbon nanotubes allows straight CNTs to be connected into carbon nanotube networks, with branched tubes employed as junction elements. One such network is the so-called super carbon nanotube (SCNT), proposed in 2006. In an SCNT, each carbon-carbon bond within the honeycomb grid is replaced by a CNT of equal size, and each carbon atom by a Y-branched tube with three arms of equal length and a regular angle of 120° between the arms. This results in a structure that originates from tubes and regains the outer shape of a tube. It is also possible to repeat this process, replacing carbon-carbon bonds not with CNTs but with SCNTs, leading to very regular and self-similar structures of increasingly higher orders. Simulations demonstrate that SCNTs also exhibit very interesting mechanical properties. They are even more flexible than CNTs and are thus good candidates for high-strength composites or actuators with very low weight. Other applications arise in microelectronics, because of their configurable electronic behavior, and in biology, due to the biocompatibility of SCNTs. Despite progress in synthesis processes for straight and branched CNTs, the production of SCNTs is still beyond current technological capabilities. In addition, real experiments at the nanoscale are expensive and complex; hence, simulations are important to predict the properties of SCNTs and to guide experimental research. The atomic-scale finite element method (AFEM) already provides a well-established approach for simulating CNTs at the atomic level. However, the model size of SCNTs grows very fast for larger tubes, and the resulting n-body and linear equation systems quickly exceed the memory capacity of available computer systems. This renders the simulation of large SCNTs at the atomic level infeasible unless the regular structure of SCNTs can be taken into account to reduce the memory footprint. This thesis presents ways to exploit the symmetry and hierarchy within SCNTs, enabling the simulation of higher-order SCNTs. We develop structure-tailored, memory-saving data structures that allow the storage of very large SCNT models of up to several billion atoms while providing fast data access. We realize this with a novel graph data structure called Compressed Symmetric Graphs, which is able to dynamically recompute large parts of the structural information for tubes instead of storing them. We also present a new structure-aware, SMP-parallelized matrix-free solver for the linear equation systems involving the stiffness matrix, which employs an efficient caching mechanism for the data during the sparse matrix-vector multiplication. The matrix-free solver is twice as fast as a reference solver based on the compressed row storage format and requires only half the memory, while caching all contributions of the matrix employed. We demonstrate that this solver, in combination with the Compressed Symmetric Graphs, is able to instantiate equation systems with matrices of an order higher than 5 × 10^7 on a single compute node while still fully caching all matrix data.
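
    The notion of a matrix-free solver can be illustrated with a small sketch: the conjugate gradient method only needs a function that applies the matrix to a vector, so a regular structure can be recomputed on the fly instead of being stored. The example below is not the thesis's solver. It uses a 1D stencil [-1, 2, -1] as a hypothetical stand-in for a stiffness matrix and omits the caching, symmetry exploitation, and SMP parallelism described above; all function names are assumptions.

```python
from typing import Callable, List

Vector = List[float]


def laplacian_matvec(x: Vector) -> Vector:
    """Apply the tridiagonal stencil [-1, 2, -1] without storing a matrix."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(
    matvec: Callable[[Vector], Vector],
    b: Vector,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> Vector:
    """Solve A x = b for symmetric positive definite A given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old < tol:  # squared residual norm is small enough
            break
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


rhs = [1.0] * 5
solution = conjugate_gradient(laplacian_matvec, rhs)
print(solution)
```

    Because the solver never sees matrix entries, memory use is governed by the vectors and by whatever the `matvec` function chooses to cache, which is the tradeoff the abstract alludes to.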

    Efficient Lossless Compression of Trees and Graphs

    In this paper, we study the problem of compressing a data structure (e.g., a tree or an undirected or directed graph) efficiently while keeping a similar structure in the compressed form. To date, there has been no proven optimal algorithm for this problem. We use the idea of building an LZW tree, as in LZW compression, to compress a binary tree generated by a stationary ergodic source in an optimal manner. We also extend our tree compression algorithm to compress undirected graphs and directed acyclic graphs.
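
    For readers unfamiliar with the LZW dictionary that the abstract builds on, the sketch below shows standard LZW on a plain bit string: the phrase dictionary grows by one entry per emitted code, and its entries form the prefix tree usually called the LZW tree. This is only the textbook sequence algorithm, not the paper's tree or graph compressor; the function names and the explicit `alphabet` parameter are assumptions made for the example.

```python
from typing import Dict, List


def lzw_compress(text: str, alphabet: str) -> List[int]:
    """Standard LZW: emit dictionary codes while growing the phrase dictionary."""
    codes: Dict[str, int] = {symbol: i for i, symbol in enumerate(alphabet)}
    output: List[int] = []
    phrase = ""
    for symbol in text:
        candidate = phrase + symbol
        if candidate in codes:
            phrase = candidate              # keep extending the known phrase
        else:
            output.append(codes[phrase])    # emit the longest known phrase
            codes[candidate] = len(codes)   # add a new child in the phrase tree
            phrase = symbol
    if phrase:
        output.append(codes[phrase])
    return output


def lzw_decompress(codes_in: List[int], alphabet: str) -> str:
    """Rebuild the same dictionary from the code stream alone."""
    if not codes_in:
        return ""
    phrases: Dict[int, str] = {i: symbol for i, symbol in enumerate(alphabet)}
    previous = phrases[codes_in[0]]
    pieces = [previous]
    for code in codes_in[1:]:
        if code in phrases:
            entry = phrases[code]
        else:
            # The code refers to the phrase that is being defined right now.
            entry = previous + previous[0]
        pieces.append(entry)
        phrases[len(phrases)] = previous + entry[0]
        previous = entry
    return "".join(pieces)


bits = "0101010101010101"
encoded = lzw_compress(bits, "01")
decoded = lzw_decompress(encoded, "01")
print(len(bits), "symbols ->", len(encoded), "codes")
```

    Repetitive input yields long dictionary phrases and therefore few codes, which is the property a tree compressor built on the same idea would presumably rely on.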