62,759 research outputs found

    Web Data Extraction, Applications and Techniques: A Survey

    Full text link
    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provided a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques allow to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users and this offers unprecedented opportunities to analyze human behavior at a very large scale. We discuss also the potential of cross-fertilization, i.e., on the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains.Comment: Knowledge-based System

    Lost in translation: Exposing hidden compiler optimization opportunities

    Get PDF
    Existing iterative compilation and machine-learning-based optimization techniques have been proven very successful in achieving better optimizations than the standard optimization levels of a compiler. However, they were not engineered to support the tuning of a compiler's optimizer as part of the compiler's daily development cycle. In this paper, we first establish the required properties which a technique must exhibit to enable such tuning. We then introduce an enhancement to the classic nightly routine testing of compilers which exhibits all the required properties, and thus, is capable of driving the improvement and tuning of the compiler's common optimizer. This is achieved by leveraging resource usage and compilation information collected while systematically exploiting prefixes of the transformations applied at standard optimization levels. Experimental evaluation using the LLVM v6.0.1 compiler demonstrated that the new approach was able to reveal hidden cross-architecture and architecture-dependent potential optimizations on two popular processors: the Intel i5-6300U and the Arm Cortex-A53-based Broadcom BCM2837 used in the Raspberry Pi 3B+. As a case study, we demonstrate how the insights from our approach enabled us to identify and remove a significant shortcoming of the CFG simplification pass of the LLVM v6.0.1 compiler.Comment: 31 pages, 7 figures, 2 table. arXiv admin note: text overlap with arXiv:1802.0984

    Segmentation and semantic labelling of RGBD data with convolutional neural networks and surface fitting

    Get PDF
    We present an approach for segmentation and semantic labelling of RGBD data exploiting together geometrical cues and deep learning techniques. An initial over-segmentation is performed using spectral clustering and a set of non-uniform rational B-spline surfaces is fitted on the extracted segments. Then a convolutional neural network (CNN) receives in input colour and geometry data together with surface fitting parameters. The network is made of nine convolutional stages followed by a softmax classifier and produces a vector of descriptors for each sample. In the next step, an iterative merging algorithm recombines the output of the over-segmentation into larger regions matching the various elements of the scene. The couples of adjacent segments with higher similarity according to the CNN features are candidate to be merged and the surface fitting accuracy is used to detect which couples of segments belong to the same surface. Finally, a set of labelled segments is obtained by combining the segmentation output with the descriptors from the CNN. Experimental results show how the proposed approach outperforms state-of-the-art methods and provides an accurate segmentation and labelling

    Integrated Support for Handoff Management and Context-Awareness in Heterogeneous Wireless Networks

    Get PDF
    The overwhelming success of mobile devices and wireless communications is stressing the need for the development of mobility-aware services. Device mobility requires services adapting their behavior to sudden context changes and being aware of handoffs, which introduce unpredictable delays and intermittent discontinuities. Heterogeneity of wireless technologies (Wi-Fi, Bluetooth, 3G) complicates the situation, since a different treatment of context-awareness and handoffs is required for each solution. This paper presents a middleware architecture designed to ease mobility-aware service development. The architecture hides technology-specific mechanisms and offers a set of facilities for context awareness and handoff management. The architecture prototype works with Bluetooth and Wi-Fi, which today represent two of the most widespread wireless technologies. In addition, the paper discusses motivations and design details in the challenging context of mobile multimedia streaming applications

    Automatic Model Based Dataset Generation for Fast and Accurate Crop and Weeds Detection

    Full text link
    Selective weeding is one of the key challenges in the field of agriculture robotics. To accomplish this task, a farm robot should be able to accurately detect plants and to distinguish them between crop and weeds. Most of the promising state-of-the-art approaches make use of appearance-based models trained on large annotated datasets. Unfortunately, creating large agricultural datasets with pixel-level annotations is an extremely time consuming task, actually penalizing the usage of data-driven techniques. In this paper, we face this problem by proposing a novel and effective approach that aims to dramatically minimize the human intervention needed to train the detection and classification algorithms. The idea is to procedurally generate large synthetic training datasets randomizing the key features of the target environment (i.e., crop and weed species, type of soil, light conditions). More specifically, by tuning these model parameters, and exploiting a few real-world textures, it is possible to render a large amount of realistic views of an artificial agricultural scenario with no effort. The generated data can be directly used to train the model or to supplement real-world images. We validate the proposed methodology by using as testbed a modern deep learning based image segmentation architecture. We compare the classification results obtained using both real and synthetic images as training data. The reported results confirm the effectiveness and the potentiality of our approach.Comment: To appear in IEEE/RSJ IROS 201

    XML Schema-based Minification for Communication of Security Information and Event Management (SIEM) Systems in Cloud Environments

    Get PDF
    XML-based communication governs most of today's systems communication, due to its capability of representing complex structural and hierarchical data. However, XML document structure is considered a huge and bulky data that can be reduced to minimize bandwidth usage, transmission time, and maximize performance. This contributes to a more efficient and utilized resource usage. In cloud environments, this affects the amount of money the consumer pays. Several techniques are used to achieve this goal. This paper discusses these techniques and proposes a new XML Schema-based Minification technique. The proposed technique works on XML Structure reduction using minification. The proposed technique provides a separation between the meaningful names and the underlying minified names, which enhances software/code readability. This technique is applied to Intrusion Detection Message Exchange Format (IDMEF) messages, as part of Security Information and Event Management (SIEM) system communication hosted on Microsoft Azure Cloud. Test results show message size reduction ranging from 8.15% to 50.34% in the raw message, without using time-consuming compression techniques. Adding GZip compression to the proposed technique produces 66.1% shorter message size compared to original XML messages.Comment: XML, JSON, Minification, XML Schema, Cloud, Log, Communication, Compression, XMill, GZip, Code Generation, Code Readability, 9 pages, 12 figures, 5 tables, Journal Articl

    Stateful Testing: Finding More Errors in Code and Contracts

    Full text link
    Automated random testing has shown to be an effective approach to finding faults but still faces a major unsolved issue: how to generate test inputs diverse enough to find many faults and find them quickly. Stateful testing, the automated testing technique introduced in this article, generates new test cases that improve an existing test suite. The generated test cases are designed to violate the dynamically inferred contracts (invariants) characterizing the existing test suite. As a consequence, they are in a good position to detect new errors, and also to improve the accuracy of the inferred contracts by discovering those that are unsound. Experiments on 13 data structure classes totalling over 28,000 lines of code demonstrate the effectiveness of stateful testing in improving over the results of long sessions of random testing: stateful testing found 68.4% new errors and improved the accuracy of automatically inferred contracts to over 99%, with just a 7% time overhead.Comment: 11 pages, 3 figure
    • …
    corecore