41 research outputs found

    BIP! NDR (NoDoiRefs): A Dataset of Citations From Papers Without DOIs in Computer Science Conferences and Workshops

    In the field of Computer Science, conference and workshop papers serve as important contributions, carrying substantially more weight in research assessment processes than in other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI); hence their citations are not reported in widely used citation datasets like OpenCitations and Crossref, limiting citation analysis. While the Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, its discontinuation has created a void in available data. BIP! NDR aims to alleviate this issue and enhance research assessment processes within the field of Computer Science. To accomplish this, it leverages a workflow that identifies and retrieves open access papers lacking DOIs from the DBLP Corpus and, by performing text analysis, extracts citation information directly from their full text. The current version of the dataset contains more than 510K citations made by approximately 60K open access Computer Science conference or workshop papers that, according to DBLP, do not have a DOI.

    SECRETA: A System for Evaluating and Comparing RElational and Transaction Anonymization algorithms

    Publishing data about individuals in a privacy-preserving way has led to a large body of research. Meanwhile, algorithms for anonymizing datasets with relational or transaction attributes, which preserve data truthfulness, have attracted significant interest from organizations. However, selecting the most appropriate algorithm is still far from trivial, and tools that assist data publishers in this task are needed. In response, we develop SECRETA, a system for analyzing the effectiveness and efficiency of anonymization algorithms. Our system allows data publishers to evaluate a specific algorithm, compare multiple algorithms, and combine algorithms for anonymizing datasets with both relational and transaction attributes. The analysis of the algorithm(s) is performed in an interactive and progressive way, and results, including attribute statistics and various data utility indicators, are summarized and presented graphically.

    Peer-to-peer techniques for web information retrieval and filtering

    Much information of interest to humans is today available on the Web. People can easily gain access to information, but at the same time they have to cope with the problem of information overload. Consequently, they have to rely on specialised tools and systems designed for searching, querying and retrieving information from the Web. Currently, Web search is controlled by a few search engines that shoulder the burden of following this information explosion by utilising centralised search infrastructures. Additionally, users are striving to stay informed by sifting through enormous amounts of new information, relying on tools and techniques that are not able to capture the dynamic nature of the Web. In this setting, peer-to-peer Web search seems an ideal candidate: it can offer adaptivity to high dynamics, scalability and resilience to failures, and can extend the functionality of traditional search engines with new features and services. In this thesis, we study the problem of peer-to-peer resource sharing in wide-area networks such as the Internet and the Web. In the architecture that we envision, each peer owns resources which it is willing to share: documents, web pages or files that are appropriately annotated and queried using constructs from information retrieval models. There are two kinds of basic functionality that we expect this architecture to offer: information retrieval and information filtering (also known as publish/subscribe or information dissemination). The main focus of our work is on providing models and languages for expressing publications, queries and subscriptions, protocols that regulate peer interactions in this distributed environment, and indexing mechanisms that are utilised locally by each of the peers. Initially, we present three progressively more expressive data models, WP, AWP and AWPS, that are based on information retrieval concepts, together with their respective query languages. 
    Then, we study the complexity of query satisfiability and entailment for models WP and AWP using techniques from propositional logic and computational complexity. Subsequently, we propose a peer-to-peer architecture designed to support full-fledged information retrieval and filtering functionality in a single unifying framework. In the context of this architecture, we focus on the problem of information filtering using the model AWPS, and present centralised and distributed algorithms for efficient, adaptive information filtering in a peer-to-peer environment. We use two levels of indexing to store queries submitted by users. The first level corresponds to the partitioning of the global query index across different peers using a distributed hash table as the underlying routing infrastructure. Each node is responsible for a fraction of the submitted user queries through a mapping of attribute values to peer identifiers. The distributed hash table infrastructure is used to define the mapping scheme and also manages the routing of messages between different nodes. Our set of protocols, collectively called DHTrie, extends the basic functionality of the distributed hash table to offer filtering functionality in a dynamic peer-to-peer environment. Additionally, the use of a self-maintainable routing table allows efficient communication between the peers, offering significantly lower network load and latency. This extra routing table uses only local information collected by each peer to speed up the retrieval and filtering process. The second level of our indexing mechanism is managed locally by each peer, and is used for indexing the user queries the peer is responsible for. At this level of the index, each peer is able to store large numbers of user queries and match them against incoming documents. We have proposed data structures and local indexing algorithms that enable us to solve the filtering problem efficiently for large databases of queries. 
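    The first-level mapping of attribute values to peer identifiers can be illustrated with a minimal Chord-style hash ring. This is a hypothetical sketch of the general technique, not the DHTrie protocols themselves; the class and method names are our own, and a real DHT would also handle routing, joins and departures.

    ```python
    import bisect
    import hashlib

    def _sha1_int(text):
        # Hash a string into the DHT's integer identifier space.
        return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)

    class HashRing:
        """Map keys (attribute values such as words) to peers.

        Each peer owns the arc of the identifier circle ending at its
        own hash, so every node independently agrees on which peer is
        responsible for the queries indexed under a given value."""

        def __init__(self, peers):
            self._ring = sorted((_sha1_int(p), p) for p in peers)
            self._hashes = [h for h, _ in self._ring]

        def responsible_peer(self, value):
            key = _sha1_int(value)
            i = bisect.bisect_left(self._hashes, key)
            if i == len(self._ring):  # wrap around the circle
                i = 0
            return self._ring[i][1]
    ```

    Because the mapping depends only on the hashes, any peer can compute it locally without global coordination, which is what lets the query index be partitioned deterministically across the network.
    
    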
    The main idea behind these algorithms is to store sets of words compactly by exploiting their common elements using trie-like data structures. Since these algorithms use heuristics to cluster user queries, we also consider the periodic re-organisation of the query database when the clustering of queries deteriorates. Our experimental results show the scalability and efficiency of the proposed algorithms in a dynamic setting. The distributed protocols manage to provide exact query answering functionality (precision and recall are the same as those of a centralised system) at low network cost and low latency. Additionally, the local algorithms we have proposed outperform solutions in the current literature. Our trie-based query indexing algorithms proved more than 20% faster than their counterparts, offering sophisticated clustering of user queries and mechanisms for the adaptive re-organisation of the query database when filtering performance drops.
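    The core trie idea, storing word sets along shared paths so that queries with common words share index nodes, can be sketched as follows. This is an illustrative reconstruction under conjunctive keyword semantics, not the thesis's actual algorithms; the names `QueryTrie` and `TrieNode` are invented for this example.

    ```python
    class TrieNode:
        def __init__(self):
            self.children = {}   # word -> child TrieNode
            self.query_ids = []  # queries whose word set ends at this node

    class QueryTrie:
        """Compact index of continuous queries, each a set of words.

        Words are inserted in sorted order, so queries sharing words
        share a prefix of trie nodes; this keeps the index small and
        lets matching visit each shared word only once."""

        def __init__(self):
            self.root = TrieNode()

        def insert(self, query_id, words):
            node = self.root
            for w in sorted(set(words)):
                node = node.children.setdefault(w, TrieNode())
            node.query_ids.append(query_id)

        def match(self, document_words):
            """Return ids of all queries whose words all occur
            in the document."""
            doc = set(document_words)
            matched, stack = [], [self.root]
            while stack:
                node = stack.pop()
                matched.extend(node.query_ids)
                for w, child in node.children.items():
                    if w in doc:  # only descend along words the document contains
                        stack.append(child)
            return matched
    ```

    Filtering an incoming document then amounts to a single traversal that prunes whole clusters of queries as soon as one of their shared words is missing from the document.
    
    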