
    A Brief Study of Open Source Graph Databases

    With the proliferation of large irregular sparse relational datasets, new storage and analysis platforms have arisen to fill gaps in performance and capability left by conventional approaches built on traditional database technologies and query languages. Many of these platforms apply graph structures and analysis techniques to enable users to ingest, update, query and compute on the topological structure of these relationships represented as set(s) of edges between set(s) of vertices. To store and process Facebook-scale datasets, they must be able to support data sources with billions of edges, update rates of millions of updates per second, and complex analysis kernels. These platforms must provide intuitive interfaces that enable graph experts and novice programmers to write implementations of common graph algorithms. In this paper, we explore a variety of graph analysis and storage platforms. We compare their capabilities, interfaces, and performance by implementing and computing a set of real-world graph algorithms on synthetic graphs with up to 256 million edges. In the spirit of full disclosure, several authors are affiliated with the development of STINGER.
    Comment: WSSSPE13, 4 pages, 18 pages with appendix, 25 figures
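
    To make the edge-set representation concrete, here is a minimal sketch (illustrative only, unrelated to STINGER or any of the surveyed platforms) of a graph stored as a set of edges between vertices, together with one common analysis kernel, breadth-first search, in Python:

        # Minimal sketch: a graph as a set of edges plus one analysis kernel,
        # breadth-first search. The tiny edge list is illustrative; the paper's
        # experiments use synthetic graphs with up to 256 million edges.
        from collections import defaultdict, deque

        edges = {(0, 1), (1, 2), (1, 3), (3, 4)}  # set of (source, target) vertex pairs

        # Build an undirected adjacency list from the edge set.
        adjacency = defaultdict(set)
        for u, v in edges:
            adjacency[u].add(v)
            adjacency[v].add(u)

        def bfs_distances(source):
            """Return the hop distance from `source` to every reachable vertex."""
            dist = {source: 0}
            queue = deque([source])
            while queue:
                u = queue.popleft()
                for v in adjacency[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        queue.append(v)
            return dist

        print(bfs_distances(0))  # e.g. {0: 0, 1: 1, 2: 2, 3: 2, 4: 3}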

    Security Economics: A Guide for Data Availability and Needs

    The rapid and accelerating development of security economics has generated great demand for more and better data to support the empirical research agenda. The present paper serves as a guide to security-related databases for policy makers and researchers. It focuses on two main issues. First, it takes stock of the existing databases, highlights their main components, and performs a brief statistical comparison. Second, it discusses the data shortages and needs that are considered essential for enhancing our understanding of the complex phenomenon of terrorism and for designing and evaluating policy.

    Exposing Provenance Metadata Using Different RDF Models

    A standard model for exposing structured provenance metadata of scientific assertions on the Semantic Web would increase interoperability, discoverability, reliability, and reproducibility for scientific discourse and evidence-based knowledge discovery. Several Resource Description Framework (RDF) models have been proposed to track provenance. However, provenance metadata may be not only verbose but also significantly redundant. Therefore, an appropriate RDF provenance model should be efficient for publishing, querying, and reasoning over Linked Data. In the present work, we have collected millions of pairwise relations between chemicals, genes, and diseases from multiple data sources, and demonstrated the extent of redundancy of provenance information in the life science domain. We also evaluated the suitability of several RDF provenance models for this crowdsourced data set, including the N-ary model, the Singleton Property model, and the Nanopublication model. We examined query performance against three commonly used large RDF stores: Virtuoso, Stardog, and Blazegraph. Our experiments demonstrate that query performance depends on both the RDF store and the RDF provenance model.
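
    A minimal sketch (not taken from the paper) of how a single chemical-disease assertion can carry its source under two of the models named above, using the rdflib Python library; all ex: URIs, and the simplified singletonPropertyOf predicate, are hypothetical placeholders:

        # Minimal sketch contrasting two provenance models with rdflib.
        # All ex: URIs (and the simplified singletonPropertyOf predicate)
        # are hypothetical placeholders, not terms from the paper's data set.
        from rdflib import Graph, Namespace, RDF

        EX = Namespace("http://example.org/")
        g = Graph()

        # N-ary model: reify the chemical-disease relation as a node that
        # carries the source of the assertion.
        assoc = EX.assoc1
        g.add((assoc, RDF.type, EX.Association))
        g.add((assoc, EX.subject, EX.chemicalX))
        g.add((assoc, EX.object, EX.diseaseY))
        g.add((assoc, EX.source, EX.databaseA))

        # Singleton Property model: mint a unique predicate instance for this
        # one statement and attach the source to that predicate.
        p1 = EX.associatedWith_1
        g.add((p1, EX.singletonPropertyOf, EX.associatedWith))
        g.add((EX.chemicalX, p1, EX.diseaseY))
        g.add((p1, EX.source, EX.databaseA))

        print(g.serialize(format="turtle"))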

    A Brief History of Web Crawlers

    Web crawlers visit internet applications, collect data, and learn about new web pages from visited pages. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics about the web and indexing applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on an application. The rapid expansion of the web and the growing complexity of web applications have made crawling a very challenging process. Throughout the history of web crawling, many researchers and industrial groups have addressed the different issues and challenges that web crawlers face, and different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl remains a challenging problem. Additionally, capturing the model of a modern web application and extracting data from it automatically is another open question. What follows is a brief history of the different techniques and algorithms used from the early days of crawling up to the present. We introduce criteria to evaluate the relative performance of web crawlers; based on these criteria, we plot the evolution of web crawlers and compare their performance.
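
    As an illustration of the basic crawl loop this history revolves around, here is a minimal sketch (Python standard library only, not any specific crawler from the literature) that fetches a page, extracts its links, and queues unvisited ones breadth-first; real crawlers add politeness (robots.txt, rate limits), large-scale deduplication, and handling of script-driven pages, none of which is shown:

        # Minimal sketch of a breadth-first crawl loop: fetch, extract links, queue.
        from collections import deque
        from html.parser import HTMLParser
        from urllib.parse import urljoin
        from urllib.request import urlopen

        class LinkExtractor(HTMLParser):
            """Collect href values from <a> tags."""
            def __init__(self):
                super().__init__()
                self.links = []

            def handle_starttag(self, tag, attrs):
                if tag == "a":
                    for name, value in attrs:
                        if name == "href" and value:
                            self.links.append(value)

        def crawl(seed, max_pages=10):
            """Crawl breadth-first from `seed`, tracking at most `max_pages` URLs."""
            seen, queue = {seed}, deque([seed])
            while queue and len(seen) <= max_pages:
                url = queue.popleft()
                try:
                    html = urlopen(url, timeout=5).read().decode("utf-8", errors="replace")
                except OSError:
                    continue  # skip unreachable or malformed pages
                parser = LinkExtractor()
                parser.feed(html)
                for href in parser.links:
                    absolute = urljoin(url, href)
                    if absolute not in seen:
                        seen.add(absolute)
                        queue.append(absolute)
            return seen

        # Example (hypothetical seed URL):
        # print(crawl("https://example.org/"))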

    COVID-19 publications: Database coverage, citations, readers, tweets, news, Facebook walls, Reddit posts

    © 2020 The Authors. Published by MIT Press. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.1162/qss_a_00066
    The COVID-19 pandemic requires a fast response from researchers to help address biological, medical and public health issues to minimize its impact. In this rapidly evolving context, scholars, professionals and the public may need to quickly identify important new studies. In response, this paper assesses the coverage of scholarly databases and impact indicators during 21 March to 18 April 2020. The rapidly increasing volume of research is particularly accessible through Dimensions, and less so through Scopus, the Web of Science, and PubMed. Google Scholar’s results included many false matches. A few COVID-19 papers from the 21,395 in Dimensions were already highly cited, with substantial news and social media attention. For this topic, in contrast to previous studies, there seems to be a high degree of convergence between articles shared on the social web and citation counts, at least in the short term. In particular, articles that are extensively tweeted on the day they are first indexed are likely to be highly read and relatively highly cited three weeks later. Researchers needing wide-scope literature searches (rather than health-focused PubMed or medRxiv searches) should start with Dimensions (or Google Scholar) and can use tweet and Mendeley reader counts as indicators of likely importance.
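
    As a purely illustrative sketch of the kind of relationship described above, with made-up numbers rather than the study's data, one could rank-correlate day-one tweet counts against citation counts three weeks later:

        # Illustrative only: hypothetical per-article counts, not the study's data.
        from scipy.stats import spearmanr

        day_one_tweets = [0, 2, 5, 12, 40, 95, 300]         # tweets on the day first indexed
        citations_after_3_weeks = [0, 1, 1, 3, 6, 14, 22]   # citations three weeks later

        rho, p_value = spearmanr(day_one_tweets, citations_after_3_weeks)
        print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")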

    Towards a Novel Cooperative Logistics Information System Framework

    Supply chains and logistics have a growing importance in the global economy. Supply chain information systems around the world are heterogeneous, and each one can both produce and receive massive amounts of structured and unstructured data in real time, usually generated by information systems, connected objects, or manually by humans. This heterogeneity arises because Logistics Information Systems components and processes are developed with different modelling methods and run on many platforms; hence, decision making is difficult in such a multi-actor environment. In this paper we identify some current challenges and integration issues between separately designed Logistics Information Systems (LIS), and we propose a Distributed Cooperative Logistics Platform (DCLP) framework based on NoSQL, which facilitates real-time cooperation between stakeholders and improves decision making in a multi-actor environment. We also include a case study of a Hospital Supply Chain (HSC) and a brief discussion of perspectives and future scope of work.
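
    As a minimal sketch of the NoSQL idea behind such a platform, and not the paper's DCLP design itself, heterogeneous events from different logistics actors can be pooled as schema-flexible documents; the MongoDB database, collection, and field names below are hypothetical:

        # Minimal sketch (hypothetical names, not the paper's DCLP): pooling
        # heterogeneous logistics events in a NoSQL document store via pymongo.
        from datetime import datetime, timezone
        from pymongo import MongoClient

        client = MongoClient("mongodb://localhost:27017")   # assumed local instance
        events = client["logistics"]["events"]              # hypothetical db/collection

        # Different actors can insert documents with different shapes; no shared
        # schema has to be agreed on up front.
        events.insert_one({
            "actor": "hospital_pharmacy",
            "type": "stock_level",
            "item": "saline_0.9pct",
            "quantity": 140,
            "recorded_at": datetime.now(timezone.utc),
        })
        events.insert_one({
            "actor": "carrier",
            "type": "shipment_update",
            "shipment_id": "SHP-001",
            "status": "in_transit",
            "gps": {"lat": 48.85, "lon": 2.35},
            "recorded_at": datetime.now(timezone.utc),
        })

        # Any stakeholder can then query the shared pool, e.g. recent carrier events.
        for doc in events.find({"actor": "carrier"}).sort("recorded_at", -1).limit(10):
            print(doc["type"], doc.get("status"))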