1,009 research outputs found

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives for measuring the performance of multimedia search engines. From a socio-economic perspective, we take stock of the impact and legal consequences of these technical advances and point out future directions of research.

    Compact semantic representations of observational data

    The Internet of Things (IoT) concept has been widely adopted in several domains to enable devices to interact with each other and perform certain tasks. IoT devices encompass different concepts, e.g., sensors, programs, computers, and actuators. IoT devices observe their surroundings to collect information and communicate with each other in order to perform mutual tasks. These devices continuously generate observational data streams, which become historical data when these observations are stored. Due to an increase in the number of IoT devices, a large amount of streaming and historical observational data is being produced. Moreover, several ontologies, like the Semantic Sensor Network (SSN) Ontology, have been proposed for semantic annotation of observational data, either streaming or historical. The Resource Description Framework (RDF) is a widely adopted data model for semantically describing datasets. Semantic annotation provides a shared understanding for processing and analysis of observational data. However, adding semantics further increases the data size, especially when the observation values are redundantly sensed by several devices. For example, several sensors can generate observations indicating the same value for relative humidity at a given timestamp and in a given city. This situation can be represented in an RDF graph using four RDF triples per observation, which describe the observed phenomenon, the unit of measurement, the timestamp, and the coordinates. The RDF triples of an observation are associated with the same subject. Such observations share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects.
When the number of these subject entities or properties in these star patterns is large, the size of the RDF graph and query processing are negatively impacted; we refer to these star patterns as frequent star patterns. This thesis addresses the problem of identifying frequent star patterns in RDF graphs and develops computational methods to identify frequent star patterns and generate a factorized RDF graph in which the number of frequent star patterns is minimized. Furthermore, we apply these factorized RDF representations to historical semantic sensor data described using the SSN ontology and present tabular representations of factorized semantic sensor data in order to exploit Big Data frameworks. In addition, this thesis devises a knowledge-driven approach named DESERT that is able to on-Demand factorizE and Semantically Enrich stReam daTa. We evaluate the performance of our proposed techniques on several RDF graph benchmarks. The outcomes show that our techniques are able to detect frequent star patterns effectively and efficiently, and that RDF graph size can be reduced by up to 66.56% while the data represented in the original RDF graph is preserved. Moreover, the compact representations are able to reduce the number of RDF triples by at least 53.25% in historical observational data and by up to 94.34% in observational data streams. Additionally, query evaluation over the factorized historical data reduces query execution time by up to three orders of magnitude. In observational data streams, the size of the data required to answer a query is reduced by 92.53%, which lowers the memory needed to answer the queries. These results provide evidence that IoT data can be efficiently represented using the proposed compact representations, thus reducing the negative impact that semantic annotations may have on IoT data management.
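
    The star-pattern idea in the abstract above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the function `factorize`, the linking property `ex:hasStar`, the blank-node names, and all sample IRIs and values are hypothetical, chosen only to show how observations that share the same property-object pairs could be collapsed onto one shared node.

```python
from collections import defaultdict

def factorize(triples, star_props):
    """Collapse subjects that share identical (property, object) pairs.

    Hypothetical sketch: subjects with the same pairs over star_props are
    linked to one surrogate node that carries those pairs once.
    """
    star_of = defaultdict(set)  # subject -> its (p, o) pairs within star_props
    out = []
    for s, p, o in triples:
        if p in star_props:
            star_of[s].add((p, o))
        else:
            out.append((s, p, o))  # triples outside the star are kept as-is

    groups = defaultdict(list)  # shared star pattern -> subjects matching it
    for s, pairs in star_of.items():
        groups[frozenset(pairs)].append(s)

    ordered = sorted(groups.items(), key=lambda kv: sorted(kv[0]))
    for i, (pairs, subjects) in enumerate(ordered):
        if len(subjects) < 2:  # nothing is shared, so keep the original triples
            out.extend((subjects[0], p, o) for p, o in sorted(pairs))
            continue
        node = f"_:star{i}"  # surrogate node (name is an assumption)
        out.extend((node, p, o) for p, o in sorted(pairs))
        out.extend((s, "ex:hasStar", node) for s in sorted(subjects))
    return out

# Three sample observations, each described by the four triples named above.
STAR_PROPS = {"ex:phenomenon", "ex:unit", "ex:time", "ex:location"}
SHARED = [
    ("ex:phenomenon", "ex:RelativeHumidity"),
    ("ex:unit", "ex:Percent"),
    ("ex:time", "2020-01-01T00:00:00Z"),
    ("ex:location", "ex:CityA"),
]
observations = [(f"ex:obs{i}", p, o) for i in range(3) for p, o in SHARED]

compact = factorize(observations, STAR_PROPS)
print(len(observations), "->", len(compact))  # 12 -> 7
```

    In this toy case the twelve original triples become seven: four for the shared star and one link per observation. The original triples remain recoverable by following each link.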

    A catalog of stream processing optimizations

    Various research communities have independently arrived at stream processing as a programming model for efficient and parallel computing. These communities include digital signal processing, databases, operating systems, and complex event processing. Since each community faces applications with challenging performance requirements, each of them has developed some of the same optimizations, but often with conflicting terminology and unstated assumptions. This article presents a survey of optimizations for stream processing. It is aimed both at users who need to understand and guide the system's optimizer and at implementers who need to make engineering tradeoffs. To consolidate terminology, this article is organized as a catalog, in a style similar to catalogs of design patterns or refactorings. To make assumptions explicit and help understand tradeoffs, each optimization is presented with its safety constraints (when does it preserve correctness?) and a profitability experiment (when does it improve performance?). We hope that this survey will help future streaming system builders to stand on the shoulders of giants from not just their own community. © 2014 ACM

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in follow-up works since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions. Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
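
    As a rough illustration of the programming model this abstract refers to, the sketch below runs the map, shuffle, and reduce phases in a single Python process. It is an assumption-laden toy, not any framework's API: `map_reduce`, `word_count_map`, and `word_count_reduce` are hypothetical names, and the distribution, scheduling, and fault tolerance that a real framework handles are left out entirely.

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Single-process toy: map each record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):  # map phase emits (key, value) pairs
            groups[key].append(value)      # shuffle phase groups values by key
    # reduce phase: one call per distinct key
    return {key: reduce_fn(key, values) for key, values in groups.items()}

def word_count_map(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1

def word_count_reduce(word, counts):
    """Sum the partial counts collected for one word."""
    return sum(counts)

counts = map_reduce(["a b a", "b c"], word_count_map, word_count_reduce)
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

    The application supplies only the two small functions; everything else lives in the driver, which is the separation of concerns the abstract attributes to the model.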

    The MASSIF platform : a modular and semantic platform for the development of flexible IoT services

    In the Internet of Things (IoT), data-producing entities sense their environment and transmit these observations to a data processing platform for further analysis. Applications can gain a notion of context awareness by combining this sensed data, or by processing the combined data. The process of combining data can consist both of merging the dynamic sensed data and of fusing the sensed data with background and historical data. Semantics can aid in this task, as they have proven their use in data integration, knowledge exchange, and reasoning. Semantic services that perform reasoning on the integrated sensed data, combined with background knowledge such as profile data, make it possible to extract useful information and support intelligent decision making. However, advanced reasoning on the combination of this sensed data and background knowledge is still hard to achieve. Furthermore, collaboration between semantic services allows complex decisions to be reached. The dynamic composition of such collaborative workflows, which can adapt to the current context, has not received much attention yet. In this paper, we present MASSIF, a data-driven platform for the semantic annotation of and reasoning on IoT data. It allows the integration of multiple modular reasoning services that can collaborate in a flexible manner to facilitate complex decision-making processes. Data-driven workflows are enabled by letting services specify the data they would like to consume. After thorough processing, these services can decide to share their decisions with other consumers. By defining the data these services would like to consume, they can operate on a subset of data, improving reasoning efficiency. Furthermore, each of these services can integrate the consumed data with background knowledge in its own context model, for rapid intelligent decision making. To show the strengths of the platform, two use cases are detailed and thoroughly evaluated.

    User-centric privacy preservation in Internet of Things Networks

    Recent trends show how the Internet of Things (IoT) and its services are becoming more omnipresent and popular. The end-to-end IoT services that are extensively used include everything from neighborhood discovery to smart home security systems, wearable health monitors, and connected appliances and vehicles. IoT leverages different kinds of networks, such as location-based social networks, mobile edge systems, Digital Twin Networks, and many more, to realize these services. Many of these services rely on a constant feed of user information. Depending on the network being used, how this data is processed can vary significantly. The key thing to note is that so much data is collected, and users have little to no control over how extensively their data is used and what information is being used. This causes many privacy concerns, especially for a naïve user who does not know the implications and consequences of severe privacy breaches. When designing privacy policies, we need to understand the different user data types used in these networks. These include user profile information, information from the queries users issue to get services (communication privacy), and location information, which is much needed in many on-the-go services. Depending on the context of the application and the service being provided, the user data at risk and the risks themselves vary. First, we dive deep into the networks and examine the different aspects of privacy for user data and the issues faced in each such aspect. We then propose different privacy policies for these networks and focus on two main aspects of designing privacy mechanisms: the quality of service the user expects and the private information from the user’s perspective. The novel contribution here is to focus on what the user thinks and needs instead of fixating on designing privacy policies that only satisfy the third-party applications’ requirement of quality of service.

    IDEAS-1997-2021-Final-Programs

    This document records the final program for each of the 26 meetings of the International Database and Engineering Application Symposium from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE (1997-2007) or ACM (2008-2021).