19,369 research outputs found

    The NASA Astrophysics Data System: Architecture

    Full text link
    The powerful discovery capabilities available in the ADS bibliographic services are possible thanks to the design of a flexible search and retrieval system based on a relational database model. Bibliographic records are stored as a corpus of structured documents containing fielded data and metadata, while discipline-specific knowledge is segregated in a set of files independent of the bibliographic data itself. The creation and management of links to both internal and external resources associated with each bibliography in the database is made possible by representing them as a set of document properties and their attributes. To improve global access to the ADS data holdings, a number of mirror sites have been created by cloning the database contents and software on a variety of hardware and software platforms. The procedures used to create and manage the database and its mirrors have been written as a set of scripts that can be run in either an interactive or unsupervised fashion. The ADS can be accessed at http://adswww.harvard.eduComment: 25 pages, 8 figures, 3 table

    Competitive function approximation for reinforcement learning

    Get PDF
    The application of reinforcement learning to problems with continuous domains requires representing the value function by means of function approximation. We identify two aspects of reinforcement learning that make the function approximation process hard: non-stationarity of the target function and biased sampling. Non-stationarity is the result of the bootstrapping nature of dynamic programming where the value function is estimated using its current approximation. Biased sampling occurs when some regions of the state space are visited too often, causing a reiterated updating with similar values which fade out the occasional updates of infrequently sampled regions. We propose a competitive approach for function approximation where many different local approximators are available at a given input and the one with expectedly best approximation is selected by means of a relevance function. The local nature of the approximators allows their fast adaptation to non-stationary changes and mitigates the biased sampling problem. The coexistence of multiple approximators updated and tried in parallel permits obtaining a good estimation much faster than would be possible with a single approximator. Experiments in different benchmark problems show that the competitive strategy provides a faster and more stable learning than non-competitive approaches.Preprin

    Incremental Entity Resolution from Linked Documents

    Full text link
    In many government applications we often find that information about entities, such as persons, are available in disparate data sources such as passports, driving licences, bank accounts, and income tax records. Similar scenarios are commonplace in large enterprises having multiple customer, supplier, or partner databases. Each data source maintains different aspects of an entity, and resolving entities based on these attributes is a well-studied problem. However, in many cases documents in one source reference those in others; e.g., a person may provide his driving-licence number while applying for a passport, or vice-versa. These links define relationships between documents of the same entity (as opposed to inter-entity relationships, which are also often used for resolution). In this paper we describe an algorithm to cluster documents that are highly likely to belong to the same entity by exploiting inter-document references in addition to attribute similarity. Our technique uses a combination of iterative graph-traversal, locality-sensitive hashing, iterative match-merge, and graph-clustering to discover unique entities based on a document corpus. A unique feature of our technique is that new sets of documents can be added incrementally while having to re-resolve only a small subset of a previously resolved entity-document collection. We present performance and quality results on two data-sets: a real-world database of companies and a large synthetically generated `population' database. We also demonstrate benefit of using inter-document references for clustering in the form of enhanced recall of documents for resolution.Comment: 15 pages, 8 figures, patented wor

    Seismic Ray Impedance Inversion

    Get PDF
    This thesis investigates a prestack seismic inversion scheme implemented in the ray parameter domain. Conventionally, most prestack seismic inversion methods are performed in the incidence angle domain. However, inversion using the concept of ray impedance, as it honours ray path variation following the elastic parameter variation according to Snell’s law, shows the capacity to discriminate different lithologies if compared to conventional elastic impedance inversion. The procedure starts with data transformation into the ray-parameter domain and then implements the ray impedance inversion along constant ray-parameter profiles. With different constant-ray-parameter profiles, mixed-phase wavelets are initially estimated based on the high-order statistics of the data and further refined after a proper well-to-seismic tie. With the estimated wavelets ready, a Cauchy inversion method is used to invert for seismic reflectivity sequences, aiming at recovering seismic reflectivity sequences for blocky impedance inversion. The impedance inversion from reflectivity sequences adopts a standard generalised linear inversion scheme, whose results are utilised to identify rock properties and facilitate quantitative interpretation. It has also been demonstrated that we can further invert elastic parameters from ray impedance values, without eliminating an extra density term or introducing a Gardner’s relation to absorb this term. Ray impedance inversion is extended to P-S converted waves by introducing the definition of converted-wave ray impedance. This quantity shows some advantages in connecting prestack converted wave data with well logs, if compared with the shearwave elastic impedance derived from the Aki and Richards approximation to the Zoeppritz equations. An analysis of P-P and P-S wave data under the framework of ray impedance is conducted through a real multicomponent dataset, which can reduce the uncertainty in lithology identification.Inversion is the key method in generating those examples throughout the entire thesis as we believe it can render robust solutions to geophysical problems. Apart from the reflectivity sequence, ray impedance and elastic parameter inversion mentioned above, inversion methods are also adopted in transforming the prestack data from the offset domain to the ray-parameter domain, mixed-phase wavelet estimation, as well as the registration of P-P and P-S waves for the joint analysis. The ray impedance inversion methods are successfully applied to different types of datasets. In each individual step to achieving the ray impedance inversion, advantages, disadvantages as well as limitations of the algorithms adopted are detailed. As a conclusion, the ray impedance related analyses demonstrated in this thesis are highly competent compared with the classical elastic impedance methods and the author would like to recommend it for a wider application

    Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising

    Full text link
    Sponsored search represents a major source of revenue for web search engines. This popular advertising model brings a unique possibility for advertisers to target users' immediate intent communicated through a search query, usually by displaying their ads alongside organic search results for queries deemed relevant to their products or services. However, due to a large number of unique queries it is challenging for advertisers to identify all such relevant queries. For this reason search engines often provide a service of advanced matching, which automatically finds additional relevant queries for advertisers to bid on. We present a novel advanced matching approach based on the idea of semantic embeddings of queries and ads. The embeddings were learned using a large data set of user search sessions, consisting of search queries, clicked ads and search links, while utilizing contextual information such as dwell time and skipped ads. To address the large-scale nature of our problem, both in terms of data and vocabulary size, we propose a novel distributed algorithm for training of the embeddings. Finally, we present an approach for overcoming a cold-start problem associated with new ads and queries. We report results of editorial evaluation and online tests on actual search traffic. The results show that our approach significantly outperforms baselines in terms of relevance, coverage, and incremental revenue. Lastly, we open-source learned query embeddings to be used by researchers in computational advertising and related fields.Comment: 10 pages, 4 figures, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Ital

    A Practical Searchable Symmetric Encryption Scheme for Smart Grid Data

    Full text link
    Outsourcing data storage to the remote cloud can be an economical solution to enhance data management in the smart grid ecosystem. To protect the privacy of data, the utility company may choose to encrypt the data before uploading them to the cloud. However, while encryption provides confidentiality to data, it also sacrifices the data owners' ability to query a special segment in their data. Searchable symmetric encryption is a technology that enables users to store documents in ciphertext form while keeping the functionality to search keywords in the documents. However, most state-of-the-art SSE algorithms are only focusing on general document storage, which may become unsuitable for smart grid applications. In this paper, we propose a simple, practical SSE scheme that aims to protect the privacy of data generated in the smart grid. Our scheme achieves high space complexity with small information disclosure that was acceptable for practical smart grid application. We also implement a prototype over the statistical data of advanced meter infrastructure to show the effectiveness of our approach
    corecore