75 research outputs found

    Scalable Similarity Search for Molecular Descriptors

    Similarity search over chemical compound databases is a fundamental task in the discovery and design of novel drug-like molecules. Such databases often encode molecules as non-negative integer vectors, called molecular descriptors, which represent rich information on various molecular properties. While efficient indexing structures exist for searching databases of binary vectors, solutions for more general integer vectors are still in their infancy. In this paper we present a time- and space-efficient index for the problem, which we call the succinct intervals-splitting tree algorithm for molecular descriptors (SITAd). Our approach extends efficient methods for binary-vector databases and uses ideas from succinct data structures. Our experiments on a large database of over 40 million compounds show that SITAd significantly outperforms alternative approaches in practice. Comment: To appear in the Proceedings of SISAP'1
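The abstract does not spell out how SITAd prunes candidates, so the following is only a minimal Python sketch of one simple idea that exact searches over non-negative integer vectors can exploit; it is not SITAd and uses no succinct data structures. It assumes similarity is measured with the generalized (min/max) Jaccard coefficient, J(x, y) = Σ min(x_i, y_i) / Σ max(x_i, y_i). Because the numerator is at most min(|x|, |y|) and the denominator is at least max(|x|, |y|), where |·| is the sum of the entries, J(q, x) ≥ t forces t·|q| ≤ |x| ≤ |q|/t, so only database vectors inside that length window need to be scored. The names `generalized_jaccard`, `build_length_index` and `range_search` are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple


def generalized_jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Min/max (generalized Jaccard) similarity of two non-negative vectors."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    # Convention (assumption): two all-zero vectors are treated as identical.
    return num / den if den else 1.0


def build_length_index(db: List[Sequence[int]]) -> List[Tuple[int, int]]:
    """Sort (L1 length, vector id) pairs so length windows can be found by bisection."""
    return sorted((sum(v), i) for i, v in enumerate(db))


def range_search(db, index, query, threshold):
    """Return (id, similarity) for every vector with J(query, x) >= threshold."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    q_len = sum(query)
    # Length-filter window, widened by a tiny slack against floating-point rounding.
    lo = threshold * q_len * (1 - 1e-12)
    hi = q_len / threshold * (1 + 1e-12)
    results = []
    pos = bisect.bisect_left(index, (lo, -1))  # first entry with length >= lo
    while pos < len(index) and index[pos][0] <= hi:
        vec_id = index[pos][1]
        sim = generalized_jaccard(query, db[vec_id])  # exact verification
        if sim >= threshold:
            results.append((vec_id, sim))
        pos += 1
    return results


db = [[1, 0, 2], [1, 1, 2], [0, 3, 0], [2, 0, 4]]
print(range_search(db, build_length_index(db), [1, 0, 2], 0.7))
```

A practical index would prune far more aggressively inside the window; this sketch only shows why the length filter never discards a true match.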

    Efficient identification of Tanimoto nearest neighbors; All Pairs Similarity Search Using the Extended Jaccard Coefficient

    Tanimoto, or extended Jaccard, is an important similarity measure that has seen prominent use in fields such as data mining and chemoinformatics. Many state-of-the-art methods for market basket analysis, plagiarism and anomaly detection, compound database search, and ligand-based virtual screening rely heavily on identifying Tanimoto nearest neighbors. Given the rapidly increasing size of the data that must be analyzed, new algorithms are needed that speed up nearest neighbor search while still providing reliable results. While many search algorithms reduce the complexity of the task by retrieving only some of the nearest neighbors, we propose a method that efficiently finds all of the exact nearest neighbors by leveraging recent advances in similarity search filtering. We provide tighter filtering bounds for the Tanimoto coefficient and show that our method, TAPNN, greatly outperforms existing baselines across a variety of real-world datasets and similarity thresholds.
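The abstract does not give TAPNN's bounds, so the sketch below is a hypothetical illustration of the general filter-then-verify idea with a much simpler bound; it is not TAPNN. For real vectors a and b, the Tanimoto coefficient is T(a, b) = a·b / (‖a‖² + ‖b‖² − a·b). For fixed norms T grows with a·b, and a·b ≤ ‖a‖‖b‖, so with r = ‖b‖/‖a‖ we get T ≤ r / (1 + r² − r). Requiring that bound to reach a threshold t gives t·r² − (1 + t)·r + t ≤ 0, so r must lie between the two roots of that quadratic, whose product is 1. Sorting vectors by norm then lets an exact all-pairs loop stop early. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Extended Jaccard (Tanimoto) similarity of two real-valued vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # denom is zero only when both vectors are zero; treat them as identical (assumption).
    return dot / denom if denom else 1.0


def norm_ratio_bounds(t: float) -> Tuple[float, float]:
    """Interval for r = |b| / |a| outside which T(a, b) >= t is impossible."""
    if not 0.0 < t <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    disc = (1 + t) ** 2 - 4 * t * t  # equals (1 - t)(1 + 3t), never negative here
    low = ((1 + t) - math.sqrt(disc)) / (2 * t)
    return low, 1.0 / low  # the two roots multiply to 1


def all_pairs_tanimoto(vectors: List[Sequence[float]], t: float):
    """Exact all-pairs search: every (i, j, T) with i < j and T >= t."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    order = sorted(range(len(vectors)), key=lambda i: norms[i])
    _, high = norm_ratio_bounds(t)
    pairs = []
    for pos, i in enumerate(order):
        # Small slack so floating-point rounding never prunes a boundary pair.
        limit = high * norms[i] * (1 + 1e-12)
        for j in order[pos + 1:]:
            if norms[j] > limit:
                break  # later vectors are even longer, so none can qualify
            sim = tanimoto(vectors[i], vectors[j])
            if sim >= t:
                pairs.append((min(i, j), max(i, j), sim))
    return pairs


vecs = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(all_pairs_tanimoto(vecs, 0.5))
```

Only the upper root is used in the loop because each pair is examined from its shorter vector; the lower root is its reciprocal.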

    Optimizing operation of a campus energy system for economic and environmental considerations

    The objective of this thesis is to determine the effective costs of campus utilities, to determine the optimal operation of energy conversion and production subsystems by minimizing economic and environmental costs, and to evaluate how changing electrical grid costs and sources will affect future optimal operation of a campus. Characteristic days were developed to typify campus activities and their impact on energy consumption. At current grid electricity and natural gas prices, running a cogeneration unit, a form of combined heat and power plant, is less expensive than purchasing equivalent amounts of electricity and gas to produce steam, as long as campus demand is sufficient to use the electricity and steam produced. Carbon dioxide emissions during cogeneration unit operation were nearly the same as those from purchasing equivalent amounts of electricity and gas to produce steam. Simulation of the economic and environmental performance of the cogeneration plant found only minor differences between the least expensive and the greenest operation. The analyses suggested that grid emissions will not become clean enough to justify decommissioning the cogeneration plant early. Operating the cogeneration plant is favorable on both economic and environmental grounds.
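The trade-off stated in this abstract, that cogeneration pays off only while campus demand absorbs both the electricity and the steam it produces, can be made concrete with a minimal Python sketch. It is not the thesis's optimization model: it ignores characteristic days, part-load behaviour, maintenance and capital costs, it assumes the cogeneration unit and the boiler both burn natural gas, and every efficiency, price and emission factor in the example is a hypothetical placeholder. The same function measures either cost or CO2, because only the per-kWh rates change. The names `CogenUnit` and `net_benefit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CogenUnit:
    elec_eff: float  # electricity out per kWh of fuel burned (hypothetical)
    heat_eff: float  # useful steam heat out per kWh of fuel burned (hypothetical)


def net_benefit(fuel_kwh, unit, elec_demand_kwh, steam_demand_kwh,
                grid_rate, gas_rate, boiler_eff):
    """Avoided purchases minus the cost of the cogeneration fuel.

    Rates are per kWh and may be prices ($/kWh) or emission factors
    (kg CO2/kWh). A positive result means running the unit beats buying
    grid electricity and firing a gas boiler for steam.
    """
    # Output only counts if the campus can actually use it.
    elec_used = min(fuel_kwh * unit.elec_eff, elec_demand_kwh)
    steam_used = min(fuel_kwh * unit.heat_eff, steam_demand_kwh)
    # Gas the boiler would have burned to raise the same steam.
    avoided = elec_used * grid_rate + (steam_used / boiler_eff) * gas_rate
    return avoided - fuel_kwh * gas_rate


unit = CogenUnit(elec_eff=0.30, heat_eff=0.45)
# Ample demand: both products are used, so the unit comes out ahead.
print(net_benefit(1000, unit, 1000, 1000, grid_rate=0.10, gas_rate=0.03, boiler_eff=0.80))
# No steam demand: the recovered heat is wasted and the advantage disappears.
print(net_benefit(1000, unit, 1000, 0, grid_rate=0.10, gas_rate=0.03, boiler_eff=0.80))
```

Passing grid and gas emission factors instead of prices gives the environmental comparison with no code changes.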

    Secure and Sustainable Energy System

    This special issue aims to contribute to climate action, which calls for addressing greenhouse gas (GHG) emissions and keeping global warming well below 2°C through various means, including accelerating the deployment of renewables, clean fuels, and clean technologies across the entire energy system. As long as fossil fuels (coal, gas, and oil) remain in use for the foreseeable future, it is vital to ensure that they are used cleanly through abatement technologies. Financing clean energy and energy transition technologies is vital to ensuring a smooth transition toward net-zero emissions by 2050 or beyond. The lack of long-term financing, low rates of return, various risks, and the limited capacity of market players are major challenges to developing sustainable energy systems. This special issue collects 17 high-quality empirical studies that assess the challenges of developing secure and sustainable energy systems and provide practical policy recommendations. The editors of this special issue wish to thank the Economic Research Institute for ASEAN and East Asia (ERIA) for funding several of the papers published in this special issue.

    Algorithms for Constructing Exact Nearest Neighbor Graphs

    University of Minnesota Ph.D. dissertation. June 2016. Major: Computer Science. Advisor: George Karypis. 1 computer file (PDF); xi, 151 pages. Nearest neighbor graphs (NNGs) contain, for each object in a set, its closest neighbors and their similarities. They are widely used in many real-world applications, such as clustering, online advertising, recommender systems, data cleaning, and query refinement. A brute-force method for constructing the graph requires O(n^2) similarity comparisons for a set of n objects. One way to reduce the number of comparisons is to ignore object pairs with low similarity, which are unimportant in many domains. Current methods for constructing the graph either prune the similarity search space, avoiding comparisons of objects that can be shown not to meet the similarity bounding conditions, or solve the problem approximately, which can miss some of the neighbors. This thesis addresses the problem of efficiently constructing the exact nearest neighbor graph for a large set of objects, i.e., the graph that would be found by comparing each object against all other objects in the set. In this context, we address two specific problems. The epsilon-nearest neighbor graph (epsilon-NNG) construction problem, also known as all-pairs similarity search (APSS), seeks to find, for each object, all other objects with a similarity of at least some threshold epsilon. The k-nearest neighbor graph (k-NNG) construction problem, on the other hand, seeks to find the k closest other objects to each object in the set. For both problems, we propose filtering techniques that are more effective than previous ones, as well as efficient serial and parallel algorithms to construct the graph. Our methods are ideally suited for sparse, high-dimensional data.
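The dissertation's filtering techniques are not described in the abstract, so the sketch below shows only the kind of plain inverted-index baseline that filtering methods for sparse data build on; it is not the thesis's algorithm. Assumptions: similarity is cosine, vectors are sparse dictionaries with non-negative weights, and the names `candidate_pairs`, `epsilon_nng` and `knng` are hypothetical. Dot products are accumulated only for pairs that share a feature; every other pair has similarity exactly 0, so for any epsilon > 0 the epsilon-NNG returned is exact. For the k-NNG, an object with fewer than k feature-sharing neighbors simply gets a shorter list.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

SparseVec = Dict[int, float]  # feature id -> non-negative weight


def _unit(v: SparseVec) -> SparseVec:
    """Scale a sparse vector to unit L2 norm (an empty vector stays empty)."""
    n = math.sqrt(sum(w * w for w in v.values()))
    return {f: w / n for f, w in v.items()} if n else {}


def candidate_pairs(vectors: List[SparseVec]) -> Iterator[Tuple[int, int, float]]:
    """Yield (j, i, cosine) with j < i for every pair sharing a feature."""
    index: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for i, v in enumerate(map(_unit, vectors)):
        acc: Dict[int, float] = defaultdict(float)
        for f, w in v.items():  # probe the index with object i ...
            for j, wj in index[f]:
                acc[j] += w * wj
        for j, sim in acc.items():
            yield j, i, sim
        for f, w in v.items():  # ... then index it for later objects
            index[f].append((i, w))


def epsilon_nng(vectors: List[SparseVec], eps: float) -> List[Tuple[int, int, float]]:
    """All pairs with cosine >= eps (exact for eps > 0)."""
    return [(j, i, s) for j, i, s in candidate_pairs(vectors) if s >= eps]


def knng(vectors: List[SparseVec], k: int) -> List[List[Tuple[float, int]]]:
    """For each object, up to k (similarity, neighbor) pairs, best first."""
    heaps: List[List[Tuple[float, int]]] = [[] for _ in vectors]  # min-heaps
    if k <= 0:
        return heaps
    for j, i, s in candidate_pairs(vectors):
        for a, b in ((i, j), (j, i)):  # the graph is symmetric
            if len(heaps[a]) < k:
                heapq.heappush(heaps[a], (s, b))
            elif s > heaps[a][0][0]:  # better than the current worst neighbor
                heapq.heapreplace(heaps[a], (s, b))
    return [sorted(h, reverse=True) for h in heaps]
```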

    Remote sensing of optically active marine components

    This is a digitised version of a thesis that was deposited in the University Library. Remote sensing is an efficient tool for monitoring aquatic ecology. The optical signature of coastal marine environments reflects the complex distribution of optically active marine components. To take advantage of the high-resolution remote sensing data available from spaceborne and airborne platforms, it is essential to understand the relationship between the remote sensing signal and the constituent marine material. The objective of this research was to develop a semi-analytical forward model to predict the remote sensing optical signature in coastal waters dominated by non-planktonic material. Laboratory and in situ measurements collected over a 5-year period (1998-2003) were used to compile a biogeooptical database for coastal waters. The database was exploited to derive various biogeophysical relationships. A major advance proposed in the thesis towards modelling the backscattering probability was the synthesis of knowledge from Mie theory with particulate composition from geochemical analysis. This approach was used to derive particulate backscattering from in situ absorption and attenuation measurements. Results show that this model produces more realistic backscattering values than the constant value proposed by Petzold. Absorption and backscattering values derived from ac-9 measurements were used to calculate radiance reflectance and remote sensing reflectance. The biogeophysical relationships developed were incorporated into the forward optics model to successfully simulate the inherent optical property ratio. Further development of the model and applications through inversion are discussed and outlined. Plymouth Marine Laboratory
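The processing chain in this abstract (absorption and attenuation from ac-9 measurements, then backscattering, then reflectance) can be sketched in Python under simplifying assumptions that are not the thesis's model. Scattering is taken as b = c − a; backscattering is a single backscattering ratio times b (the thesis instead derives this ratio from Mie theory and particle composition, and compares against the constant value proposed by Petzold); and reflectance uses a generic quadratic semi-analytical approximation r_rs = g0·u + g1·u² with u = b_b / (a + b_b). The coefficients g0 and g1, the 0.52/1.7 above-surface conversion, and all inputs are assumed placeholders, and the function names are hypothetical. In practice the calculation would be repeated at each ac-9 wavelength.

```python
# Assumed coefficients (placeholders, not values from the thesis).
G0, G1 = 0.089, 0.1245


def backscattering(a: float, c: float, bb_ratio: float) -> float:
    """Backscattering coefficient (1/m) from absorption a and attenuation c (1/m)."""
    b = c - a  # scattering = attenuation minus absorption
    if b < 0:
        raise ValueError("attenuation must be at least absorption")
    return bb_ratio * b


def remote_sensing_reflectance(a: float, bb: float) -> float:
    """Above-surface remote sensing reflectance (1/sr) from a and bb."""
    u = bb / (a + bb)
    rrs_below = G0 * u + G1 * u * u  # just below the surface
    return 0.52 * rrs_below / (1 - 1.7 * rrs_below)  # assumed above-surface conversion


bb = backscattering(a=0.5, c=2.5, bb_ratio=0.02)  # hypothetical inputs
print(bb, remote_sensing_reflectance(0.5, bb))
```

Keeping the backscattering ratio as an explicit parameter is what lets a composition-based ratio be swapped in for a constant one.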

    Controlled Ecological Life Support Systems: Natural and Artificial Ecosystems

    Scientists supported by the NASA-sponsored Controlled Ecological Life Support Systems (CELSS) program have played a major role in creating a Committee on Space Research (COSPAR) section devoted to the development of bioregenerative life support for use in space. The 22 papers in this series were sponsored by Subcommission F.4. They deal with many diverse aspects of life support, and with outgrowth technologies that may have commercial applications in fields such as biotechnology and bioengineering. Papers from researchers in France, Canada, Japan, and the USSR are also presented.