
    A Join Index for XML Data Warehouses

    XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native XML database management systems (DBMSs) currently offer limited performance, and ways to optimize them need to be investigated. In this paper, we propose a new join index that is specifically adapted to the multidimensional architecture of XML warehouses. It eliminates join operations while preserving the information contained in the original warehouse. A theoretical study and experimental results demonstrate the efficiency of our join index. They also show that native XML DBMSs can compete with XML-compatible, relational DBMSs when warehousing and analyzing XML data. Comment: 2008 International Conference on Information Resources Management (Conf-IRM 08), Niagara Falls, Canada (2008)

    Distributed Graph Storage And Querying System

    Graph databases offer an efficient way to store and access inter-connected data. However, to query large graphs that no longer fit in memory, it becomes necessary to make multiple trips to the storage device to filter and gather data based on the query. I/O accesses are expensive operations: they greatly slow down query response time and prevent us from fully exploiting the graph-specific benefits that graph databases offer. The storage models of most existing graph database systems view graphs as indivisible structures and hence do not allow a hierarchical layering of the graph. This adversely affects query performance for large graphs, as there is no way to filter the graph at a higher level without accessing all of the information on disk. Distributing the storage and processing is one way to extract better performance. But current distributed solutions to this problem are not entirely effective, again due to the indivisible representation of graphs adopted in the storage format. This causes unnecessary latency due to increased inter-processor communication. In this dissertation, we propose an optimized distributed graph storage system for scalable and faster querying of big graph data. We start with our unique physical storage model, in which the graph is decomposed into three different levels of abstraction, each with a different storage hierarchy. We use a hybrid storage model to store the most critical component and restrict I/O trips to when they are absolutely necessary. This lets us actively make use of multi-level filters while querying, without the need for comprehensive indexes. Our results show that our system outperforms established graph databases for several classes of queries. We show that this separation also eases the difficulties in distributing graph data, and go on to propose a more efficient distributed model for querying general-purpose graph data using the Spark framework.

    Special Libraries, November 1980

    Volume 71, Issue 11 https://scholarworks.sjsu.edu/sla_sl_1980/1009/thumbnail.jp

    MIL primitives for querying a fragmented world

    In query-intensive database application areas, like decision support and data mining, systems that use vertical fragmentation have a significant performance advantage. In order to support relational or object-oriented applications on top of such a fragmented data model, a flexible yet powerful intermediate language is needed. This problem has been successfully tackled in Monet, a modern extensible database kernel developed by our group. We focus on the design choices made in the Monet Interpreter Language (MIL), its algebraic query language, and outline how its concept of tactical optimization enhances and simplifies the optimization of complex queries. Finally, we summarize the experience gained in Monet by creating a highly efficient implementation of MIL.
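
    The idea of vertical fragmentation mentioned in the abstract above can be illustrated with a minimal sketch. This is not MIL syntax and not Monet's actual storage layout; the table, the column names, and the helper functions `fragment_vertically` and `total_sales` are all hypothetical, chosen only to show why a query that needs two attributes benefits from column-wise storage.

```python
# Hypothetical row-oriented table; names and values are illustrative only.
rows = [
    {"id": 0, "region": "north", "sales": 120},
    {"id": 1, "region": "south", "sales": 80},
    {"id": 2, "region": "north", "sales": 50},
]


def fragment_vertically(table):
    """Split a list of row dicts into one value list per attribute."""
    columns = {}
    for row in table:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_sales(cols, region):
    """Sum sales for one region, reading only the two columns involved."""
    return sum(s for r, s in zip(cols["region"], cols["sales"]) if r == region)


columns = fragment_vertically(rows)
print(total_sales(columns, "north"))
```

    In this sketch the aggregation never touches the `id` column, which is the kind of saving that query-intensive workloads are said to gain from a fragmented model.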

    Solution techniques for a crane sequencing problem

    In shipyards and power plants, relocating resources (items) from existing positions to newly assigned locations is costly and may represent a significant portion of the overall project budget. Since the crane is the most popular material handling equipment for relocating bulky items, it is essential to develop a good crane route to ensure efficient utilization and lower cost. In this research, the crane sequencing problem (CSP) is defined as finding crane routes that minimize the total travel and loading/unloading costs of relocating resources over multiple time periods. In addition, the CSP considers the capacities of locations and intermediate drops (i.e., preemptions) during a multiple-period planning horizon. The CSP is therefore a unique problem with many applications, and it is computationally intractable. A mathematical model is developed to obtain optimal solutions for small problems. Since large CSPs are computationally intractable, construction algorithms as well as improvement heuristics (e.g., simulated annealing, hybrid ant systems and tabu search heuristics) are proposed to solve them. Two sets of test problems with different problem sizes are generated, and extensive computational experiments are conducted to evaluate the performance of the proposed heuristics.

    A treatment of stereochemistry in computer aided organic synthesis

    This thesis describes the author’s contributions to a new stereochemical processing module constructed for the ARChem retrosynthesis program. The purpose of the module is to add the ability to perform enantioselective and diastereoselective retrosynthetic disconnections and generate appropriate precursor molecules. The module uses evidence-based rules generated from a large database of literature reactions. Chapter 1 provides an introduction and critical review of the published body of work for computer aided synthesis design. The role of computer perception of key structural features (rings, functional groups etc.) and the construction and use of reaction transforms for generating precursors are discussed. Emphasis is also given to the application of strategies in retrosynthetic analysis. The availability of large reaction databases has enabled a new generation of retrosynthesis design programs to be developed that use automatically generated transforms assembled from published reactions. A brief description of the transform generation method employed by ARChem is given. Chapter 2 describes the algorithms devised by the author for handling the computer recognition and representation of the stereochemical features found in molecule and reaction scheme diagrams. The approach is generalised and uses flexible recognition patterns to transform information found in chemical diagrams into concise stereo descriptors for computer processing. An algorithm for efficiently comparing and classifying pairs of stereo descriptors is described. This algorithm is central for solving the stereochemical constraints in a variety of substructure matching problems addressed in chapter 3. The concise representation of reactions and transform rules as hyperstructure graphs is described. Chapter 3 is concerned with the efficient and reliable detection of stereochemical symmetry in molecules, reactions and rules. 
A novel symmetry perception algorithm, based on a constraint satisfaction problem (CSP) solver, is described. The use of a CSP solver to implement an isomorph‐free matching algorithm for stereochemical substructure matching is detailed. The prime function of this algorithm is to seek out unique retron locations in target molecules and then to generate precursor molecules without duplications due to symmetry. Novel algorithms for classifying asymmetric, pseudo‐asymmetric and symmetric stereocentres; meso, centro, and C2 symmetric molecules; and the stereotopicity of trigonal (sp2) centres are described. Chapter 4 introduces and formalises the annotated structural language used to create both retrosynthetic rules and the patterns used for functional group recognition. A novel functional group recognition package is described along with its use to detect important electronic features such as electron‐withdrawing or donating groups and leaving groups. The functional groups and electronic features are used as constraints in retron rules to improve transform relevance. Chapter 5 details the approach taken to design detailed stereoselective and substrate controlled transforms from organised hierarchies of rules. The rules employ a rich set of constraint annotations that concisely describe the keying retrons. The application of the transforms for collating evidence-based scoring parameters from published reaction examples is described. A survey of available reaction databases and the techniques for mining stereoselective reactions are presented. A data mining tool was developed for finding the best reputable stereoselective reaction types for coding as transforms. For various reasons it was not possible during the research period to fully integrate this work with the ARChem program. Instead, Chapter 6 introduces a novel one‐step retrosynthesis module to test the developed transforms. 
The retrosynthesis algorithms use the organisation of the transform rule hierarchy to efficiently locate the best retron matches using all applicable stereoselective transforms. This module was tested using a small set of selected target molecules and the generated routes were ranked using a series of measured parameters including: stereocentre clearance and bond cleavage; example reputation; estimated stereoselectivity with reliability; and evidence of tolerated functional groups. In addition, a method for detecting regioselectivity issues is presented. This work presents a number of algorithms using common set and graph theory operations and notations. Appendix A lists the set theory symbols and meanings. Appendix B summarises and defines the common graph theory terminology used throughout this thesis.

    QNRs: toward language for intelligent machines

    Impoverished syntax and nondifferentiable vocabularies make natural language a poor medium for neural representation learning and applications. Learned, quasilinguistic neural representations (QNRs) can upgrade words to embeddings and syntax to graphs to provide a more expressive and computationally tractable medium. Graph-structured, embedding-based quasilinguistic representations can support formal and informal reasoning, human and inter-agent communication, and the development of scalable quasilinguistic corpora with characteristics of both literatures and associative memory. To achieve human-like intellectual competence, machines must be fully literate, able not only to read and learn, but to write things worth retaining as contributions to collective knowledge. In support of this goal, QNR-based systems could translate and process natural language corpora to support the aggregation, refinement, integration, extension, and application of knowledge at scale. Incremental development of QNR-based models can build on current methods in neural machine learning, and as systems mature, could potentially complement or replace today’s opaque, error-prone “foundation models” with systems that are more capable, interpretable, and epistemically reliable. Potential applications and implications are broad.

    RICH AND EFFICIENT VISUAL DATA REPRESENTATION

    Increasing the size of training data has been shown to be very effective in many computer vision tasks. Using large-scale image datasets (e.g. ImageNet) with simple learning techniques (e.g. linear classifiers), one can achieve state-of-the-art performance in object recognition compared to sophisticated learning techniques on smaller image sets. Semantic search on visual data has become very popular. There are billions of images on the internet and the number is increasing every day. Dealing with large-scale image sets is demanding in itself: they take up so much memory that it becomes impossible to process the images with complex algorithms on single-CPU machines. Finding an efficient image representation can be a key to attacking this problem. Efficiency alone, however, is not enough for image understanding; the representation should also be comprehensive and rich in semantic information. In this proposal we develop an approach to computing binary codes that provide a rich and efficient image representation. We demonstrate several tasks in which binary features can be very effective. We show how binary features can speed up large-scale image classification. We present techniques for learning the binary features from a supervised image set (with different types of semantic supervision: class labels, textual descriptions). We propose several problems that are very important in finding and using efficient image representations.
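
    To make the efficiency argument for binary codes concrete, here is a minimal sketch. It assumes fixed sign-of-projection hashing, which is a swapped-in stand-in and not the learned codes the abstract proposes; the functions `binarize` and `hamming`, the hyperplanes, and the feature vectors are all hypothetical.

```python
def binarize(features, hyperplanes):
    """Pack one bit per hyperplane: 1 if the dot product is non-negative."""
    code = 0
    for weights in hyperplanes:
        dot = sum(f * w for f, w in zip(features, weights))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer-packed codes."""
    return bin(a ^ b).count("1")


# Illustrative hyperplanes and feature vectors (assumptions, not real data).
hyperplanes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
code_a = binarize([1.0, -2.0], hyperplanes)
code_b = binarize([1.0, 2.0], hyperplanes)
print(hamming(code_a, code_b))
```

    Each image is reduced to a small integer, and comparing two images costs one XOR plus a bit count, which is why such codes are attractive for large collections.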

    High-throughput phenotyping of plant leaf morphological, physiological, and biochemical traits on multiple scales using optical sensing

    Acquisition of plant phenotypic information facilitates plant breeding, sheds light on gene action, and can be applied to optimize the quality of agricultural and forestry products. Because leaves often show the fastest responses to external environmental stimuli, leaf phenotypic traits are indicators of plant growth, health, and stress levels. Combination of new imaging sensors, image processing, and data analytics permits measurement over the full life span of plants at high temporal resolution and at several organizational levels from organs to individual plants to field populations of plants. We review the optical sensors and associated data analytics used for measuring morphological, physiological, and biochemical traits of plant leaves on multiple scales. We summarize the characteristics, advantages and limitations of optical sensing and data-processing methods applied in various plant phenotyping scenarios. Finally, we discuss the future prospects of plant leaf phenotyping research. This review aims to help researchers choose appropriate optical sensors and data processing methods to acquire plant leaf phenotypes rapidly, accurately, and cost-effectively

    Optimal Planning of Container Terminal Operations

    Due to globalization and international trade, moving goods using a mixture of transportation modes has become the norm; today, large vessels transport 95% of international cargo. In the first part of this thesis, the emphasis is on sea-land intermodal transport. The availability of different modes of transportation (rail/road/direct) in sea-land intermodal transport and the container flows (import, export, transhipment) through the terminal are considered simultaneously within a given planning time horizon. We formulate this problem as an Integer Programming (IP) model whose objective is to minimise storage cost, loading and transportation cost from/to the customers. To further understand the computational complexity and performance of the model, we randomly generated a large number of test instances for extensive experimentation with the algorithm. Since CPLEX was unable to find the optimal solution for the large test problems, a heuristic algorithm has been devised based on the original IP model to find near "optimal" solutions with a relative error of less than 4%. Furthermore, we developed and implemented a Lagrangian Relaxation (LR) of the IP formulation of the original problem. The bounds derived from LR were improved using sub-gradient optimisation and computational results are presented. In the second part of the thesis, we consider the combined problems of container assignment and yard crane (YC) deployment within the container terminal. A new IP formulation has been developed using a unified approach with a view to determining optimal container flows and YC requirements within a given planning time horizon. We designed a Branch and Cut (B&C) algorithm to solve the problem to optimality, and evaluated it computationally. A novel heuristic approach based on the IP formulation was developed and implemented in C++. 
Detailed computational results are reported for both the exact and heuristic algorithms using a large number of randomly generated test problems. A practical application of the proposed model in the context of a real case study is also presented. Finally, a simulation model of container terminal operations based on discrete-event simulation has been developed and implemented with a view to validating the above optimisation model and using it as a test bed for evaluating different operational scenarios.
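
    The abstract above mentions improving Lagrangian Relaxation bounds by sub-gradient optimisation. The sketch below shows that general technique on a toy binary covering problem (minimise c·x subject to a·x >= b). It is not the thesis's terminal model; the data, the diminishing step rule 1/(k+1), and the function names `lagrangian_bound` and `brute_force_optimum` are assumptions made for illustration.

```python
from itertools import product


def lagrangian_bound(c, a, b, iterations=50):
    """Best lower bound found by relaxing a.x >= b with a multiplier lam >= 0."""
    lam = 0.0
    best = float("-inf")
    for k in range(iterations):
        # For fixed lam the relaxed problem separates: pick x_i = 1 only
        # when its reduced cost c_i - lam * a_i is negative.
        x = [1 if ci - lam * ai < 0 else 0 for ci, ai in zip(c, a)]
        value = sum((ci - lam * ai) * xi for ci, ai, xi in zip(c, a, x)) + lam * b
        best = max(best, value)
        # Sub-gradient of the dual function at lam is the constraint violation.
        subgradient = b - sum(ai * xi for ai, xi in zip(a, x))
        lam = max(0.0, lam + subgradient / (k + 1))
    return best


def brute_force_optimum(c, a, b):
    """Exact optimum by enumeration, usable only for tiny instances."""
    costs = [
        sum(ci * xi for ci, xi in zip(c, x))
        for x in product((0, 1), repeat=len(c))
        if sum(ai * xi for ai, xi in zip(a, x)) >= b
    ]
    return min(costs)


# Illustrative instance (assumed data).
c, a, b = [4, 3, 5], [2, 1, 3], 4
bound = lagrangian_bound(c, a, b)
optimum = brute_force_optimum(c, a, b)
print(bound, optimum)
```

    By weak duality every value of the multiplier gives a valid lower bound, so the sketch keeps the best one seen; the gap between that bound and a heuristic solution is what lets a relative error be estimated when the exact optimum is out of reach.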