
    Comparing and Benchmarking Semantic Measures Using SMComp

    Semantic measures compare pairs of concepts, words, sentences, or named entities. They are categorized by what they measure: a measure that considers only taxonomic relationships is a similarity measure; one that considers all types of relationships is a relatedness measure. The evaluation of these measures usually relies on semantic gold standards: datasets of word pairs, each rated by human judges, used to assess how well a semantic measure performs. A few frameworks provide tools to compute and analyze several well-known measures. This paper presents a novel tool, SMComp, a testbed designed for path-based semantic measures. In its current state, it is a domain-specific tool using three different versions of WordNet. SMComp has two views: one to compute semantic measures for a pair of words and another to assess a semantic measure using a dataset. The first view offers several measures described in the literature, as well as the possibility of creating a new measure by introducing Java code snippets in the GUI. The other view offers a large set of semantic benchmarks to use in the assessment process, along with the possibility of uploading a custom dataset to be used in the assessment.
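    As an illustration of this evaluation workflow, the following sketch computes a simple path-based similarity over WordNet and correlates it with human ratings. NLTK's WordNet stands in for SMComp here, and the three-pair gold standard is invented for illustration:

```python
# Sketch: evaluating a path-based similarity measure against a tiny
# gold standard. NLTK's WordNet stands in for SMComp, and the word
# pairs and human ratings below are invented for illustration.
from nltk.corpus import wordnet as wn
from scipy.stats import spearmanr

def path_sim(w1, w2):
    """Max path similarity over all synset pairs (a common convention)."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

# Illustrative gold standard: (word1, word2, human rating on a 0-10 scale).
gold = [("car", "automobile", 9.8), ("coast", "shore", 9.1),
        ("noon", "string", 0.5)]

human = [r for _, _, r in gold]
measured = [path_sim(a, b) for a, b, _ in gold]
rho, _ = spearmanr(human, measured)
print(f"Spearman correlation with human ratings: {rho:.3f}")
```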

    uFLIP: Understanding Flash IO Patterns

    Does the advent of flash devices constitute a radical change for secondary storage? How should database systems adapt to this new form of secondary storage? Before we can answer these questions, we need to fully understand the performance characteristics of flash devices. More specifically, we want to establish what kinds of IOs should be favored (or avoided) when designing algorithms and architectures for flash-based systems. In this paper, we focus on flash IO patterns, which capture the relevant distribution of IOs in time and space, and our goal is to quantify their performance. We define uFLIP, a benchmark for measuring the response time of flash IO patterns. We also present a benchmarking methodology which takes into account the particular characteristics of flash devices. Finally, we present the results obtained by measuring eleven flash devices, and derive a set of design hints that should drive the development of flash-based systems on current devices.
    Comment: CIDR 2009
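    The sketch below illustrates, under simplifying assumptions, the kind of per-pattern response-time measurement uFLIP systematizes: it times the same 4 KiB writes issued sequentially and in random order. It is not the uFLIP benchmark itself; a faithful measurement would bypass OS caching (e.g., O_DIRECT on a raw device) and follow the paper's methodology:

```python
# Sketch: timing two simple IO patterns (the same 4 KiB blocks written
# sequentially vs. in random order) and reporting mean response time.
# Illustration only: real flash measurements must bypass the OS cache
# (e.g., O_DIRECT on a raw device) and follow the uFLIP methodology.
import os, random, time

BLOCK, COUNT = 4096, 1024
buf = os.urandom(BLOCK)

def mean_response_time(path, offsets):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    times = []
    try:
        for off in offsets:
            start = time.perf_counter()
            os.pwrite(fd, buf, off)
            os.fsync(fd)                        # force the write to the device
            times.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return sum(times) / len(times)

sequential = [i * BLOCK for i in range(COUNT)]
random_order = random.sample(sequential, COUNT)  # same blocks, shuffled
print("sequential:", mean_response_time("bench.dat", sequential))
print("random:    ", mean_response_time("bench.dat", random_order))
```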

    WikiLinkGraphs: A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks

    Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the 9 largest language editions. The dataset contains yearly snapshots of the network and spans 17 years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph, which also includes links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and representing all and only the links intentionally added by editors. We describe in detail how the Wikipedia dumps have been processed and the challenges we have encountered, including the need to handle special pages such as redirects, i.e., alternative article titles. We present descriptive statistics of several snapshots of this network. Finally, we propose several research opportunities that can be explored using this new dataset.
    Comment: 10 pages, 3 figures, 7 tables, LaTeX. Final camera-ready version accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM 2019), Munich, Germany, 11-14 June 2019
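    A minimal sketch of the core extraction step described above: pulling wikilink targets out of raw wikitext and resolving redirects to canonical titles. The regex and the redirect map are simplified illustrations, not the authors' actual parser; the real pipeline runs this over every revision of every article:

```python
# Sketch: extracting wikilink targets from raw wikitext and resolving
# redirects, roughly the per-revision step described above. The regex
# and the redirect map are simplified illustrations, not the authors'
# actual parser.
import re

# Matches [[Target]], [[Target|label]], [[Target#Section|label]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def extract_links(wikitext, redirects):
    """Return resolved link targets appearing in the main text."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        target = target[0].upper() + target[1:]      # MediaWiki capitalizes the first letter
        links.append(redirects.get(target, target))  # follow redirect pages
    return links

redirects = {"NYC": "New York City"}   # redirect title -> canonical title
text = "[[NYC]] is linked, and so is [[Berlin|the German capital]]."
print(extract_links(text, redirects))  # ['New York City', 'Berlin']
```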

    A short survey on modern virtual environments that utilize AI and synthetic data

    Within a rather abstract computational framework, Artificial Intelligence (AI) may be defined as intelligence exhibited by machines. In computer science, though, the field of AI research defines itself as the study of "intelligent agents." In this context, interaction with popular virtual environments, as for instance in virtual game playing, has recently gained a lot of attention, as it exposes aspects of AI perception that had not previously occurred to researchers. Such aspects are typically formed by the computationally intelligent behavior captured through interaction with the virtual environment, as well as the study of graphic models and biologically inspired learning techniques such as evolutionary computation, neural networks, and reinforcement learning. In this short survey paper, we attempt to provide an overview of the most recent research works on these novel, yet quite interesting, research domains, which have come into sight over the last years and which we feel form an attractive candidate for fellow researchers. We initiate our study with a brief overview of our motivation and some basic information on recent virtual graphic models utilization and the state of the art in virtual environments, which constitute the two clearly identifiable components of the summarization attempted here. We then continue by briefly reviewing the interesting territory of video games, discerning and discriminating its useful types, thus envisioning possible further utilization scenarios for the collected information. A short discussion on the identified trends and a couple of future research directions conclude the paper.
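    As a concrete illustration of the agent-environment interaction underlying reinforcement learning in virtual environments, here is a minimal tabular Q-learning sketch on a toy one-dimensional corridor. The environment, rewards, and hyperparameters are illustrative, not drawn from any surveyed system:

```python
# Sketch: the agent-environment interaction loop at the heart of
# reinforcement learning in virtual environments, shown as tabular
# Q-learning on a toy 1-D corridor. Environment, rewards, and
# hyperparameters are illustrative, not from any surveyed system.
import random

N, ACTIONS = 6, (-1, +1)              # corridor cells; move left/right
EPS, ALPHA, GAMMA = 0.2, 0.5, 0.9     # exploration, learning rate, discount
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def choose(s):
    """Epsilon-greedy action selection with random tie-breaking."""
    if random.random() < EPS or Q[(s, -1)] == Q[(s, +1)]:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

for episode in range(300):
    s = 0
    while s != N - 1:                    # episode ends at the rightmost cell
        a = choose(s)
        s2 = min(max(s + a, 0), N - 1)   # environment: deterministic move
        r = 1.0 if s2 == N - 1 else 0.0  # reward only at the goal
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# The learned greedy policy should point right in every non-goal cell.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N - 1)})
```

    The same loop structure carries over to game-playing agents, where the toy corridor is replaced by the game engine and the table by a neural network.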

    Web development productivity improvement through object-oriented application framework

    Most commercial and industrial web applications are complex, difficult to implement, risky to maintain, and require a deep understanding of the requirements for customization. As today's software market grows more competitive, productivity has become a major concern in the software development industry. The aim of this research is to design and develop an application framework for accelerating web development productivity through object-oriented technology. It allows customization, design reuse, and automatic code generation to support productivity improvement as a breakthrough solution to the given problem. This research employed a systematic literature review (SLR) to identify the sources of complexity and the factors affecting productivity. An agile development methodology was used to design the framework, which was validated with empirical data from two commercial projects. Results showed that the object-oriented application framework (OOAF) addresses significant factors that affect productivity and delivers dramatically higher productivity than the traditional approach. It fulfills current needs by reducing complexity and development effort, and it accelerates web development productivity. This research contributes to the area of software engineering, specifically the field of software productivity improvement and software customization, leading to faster development times for the software industry.
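    To make the framework idea concrete, here is a minimal sketch of the inversion of control typical of object-oriented application frameworks: the framework owns the request pipeline, and an application customizes it by overriding hooks. All class and method names are illustrative, not taken from the OOAF described in the paper:

```python
# Sketch: the inversion of control typical of an object-oriented
# application framework. The framework base class owns the request
# pipeline; applications customize it by overriding hooks. Names are
# illustrative, not from the OOAF described in the paper.
class Controller:
    """Framework code: a fixed pipeline with overridable steps."""
    def handle(self, request):
        if not self.authorize(request):
            return {"status": 403}
        return self.render(self.process(request))

    def authorize(self, request):      # default hook: allow everything
        return True

    def process(self, request):        # the application-specific step
        raise NotImplementedError

    def render(self, data):            # default hook: wrap in a response
        return {"status": 200, "body": data}

class ProductListController(Controller):
    """Application code: only the variable step is written by hand."""
    def process(self, request):
        return ["keyboard", "mouse"]   # e.g., fetched from a database

print(ProductListController().handle({"path": "/products"}))
```

    Design reuse comes from the fixed pipeline in the base class; automatic code generation would then emit subclasses like ProductListController from a specification.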

    Fuzz testing containerized digital forensics and incident response tools

    Abstract. Open source digital forensics and incident response tools are increasingly important in detecting, responding to, and documenting hostile actions against organisations and systems. Programming errors in these tools can, at minimum, slow down incident response and, at maximum, constitute a major security issue potentially leading to arbitrary code execution. Many of these tools are developed by a single individual or a small team of developers and have not been comprehensively tested. The goal of this thesis was to find a way to fuzz test a large number of containerized open source digital forensics and incident response tools. A framework was designed and implemented that allows fuzz testing of any containerized command-line application. The framework was tested against 43 popular containerized open source digital forensics and incident response tools; 24 of the 43 tested tools had potential issues. The most critical issues were disclosed to the respective tools' authors via their preferred communication method. The results show that many open source digital forensics and incident response tools currently have robustness issues, and they reaffirm that fuzzing is an efficient software testing method.
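    The following sketch shows what the core loop of such a framework might look like: mutate a seed input, mount it read-only into a container, run the tool, and flag crashes and hangs. The image name, seed file, and crash heuristic are assumptions for illustration, not the thesis implementation:

```python
# Sketch: a core loop for fuzzing a containerized command-line tool:
# mutate a seed input, mount it read-only into a container, run the
# tool, and flag crashes and hangs. The image name, seed file, and
# crash heuristic are illustrative assumptions, not the thesis
# framework itself.
import os, random, shutil, subprocess, tempfile

def mutate(data, n_flips=8):
    """Flip a few random bytes of the seed input."""
    data = bytearray(data)
    for _ in range(n_flips):
        data[random.randrange(len(data))] = random.randrange(256)
    return bytes(data)

seed = open("seed.bin", "rb").read()             # hypothetical seed file
for i in range(100):
    workdir = tempfile.mkdtemp()
    try:
        with open(os.path.join(workdir, "input"), "wb") as f:
            f.write(mutate(seed))
        try:
            proc = subprocess.run(
                ["docker", "run", "--rm", "-v", f"{workdir}:/data:ro",
                 "forensic-tool:latest", "/data/input"],  # hypothetical image
                capture_output=True, timeout=30)
            if proc.returncode >= 128:           # docker reports signal deaths as 128+N
                print(f"case {i}: crashed (exit {proc.returncode})")
        except subprocess.TimeoutExpired:
            print(f"case {i}: hang (>30 s)")
    finally:
        shutil.rmtree(workdir)
```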