
    A practical and secure multi-keyword search method over encrypted cloud data

    Cloud computing technologies grow more popular every year, as many organizations outsource their data to robust and fast cloud services while lowering the cost of hardware ownership. Although these benefits are welcome, privacy remains a concern that needs to be addressed. We propose an efficient privacy-preserving search method over encrypted cloud data that utilizes minhash functions. Most of the work in the literature supports only a single search feature per query, which reduces effectiveness. One of the main advantages of our proposed method is the capability of multi-keyword search in a single query. The proposed method is proven to satisfy the adaptive semantic security definition. We also incorporate an effective ranking capability based on the term frequency-inverse document frequency (tf-idf) values of keyword-document pairs. Our analysis demonstrates that the proposed scheme is privacy-preserving, efficient, and effective.
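
    The two building blocks named above, minhash signatures for set similarity and tf-idf scores for keyword-document ranking, are standard techniques. A minimal Python sketch of both follows; the hash count, seeding scheme, and scoring details are illustrative assumptions, not the paper's actual secure construction, which operates over encrypted data:

        import hashlib
        import math

        def minhash_signature(keywords, num_hashes=16):
            """One min-hash value per seeded hash function over the keyword set."""
            return [min(int(hashlib.sha1(f"{seed}:{kw}".encode()).hexdigest(), 16)
                        for kw in keywords)
                    for seed in range(num_hashes)]

        def estimated_jaccard(sig_a, sig_b):
            """Fraction of matching positions approximates Jaccard similarity."""
            return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

        def tfidf(term, doc_tokens, corpus):
            """Classic tf-idf score for one keyword-document pair."""
            tf = doc_tokens.count(term) / len(doc_tokens)
            df = sum(term in d for d in corpus)
            return tf * math.log(len(corpus) / (1 + df))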

    A retrieval-based dialogue system utilizing utterance and context embeddings

    Finding semantically rich and computer-understandable representations for textual dialogues, utterances and words is crucial for dialogue systems (or conversational agents), as their performance mostly depends on understanding the context of conversations. Recent research aims at finding distributed vector representations (embeddings) for words, such that semantically similar words are relatively close within the vector space. Encoding the "meaning" of text into vectors is a current trend, and text can range from words, phrases and documents to actual human-to-human conversations. In recent research approaches, responses have been generated with a decoder architecture, given the vector representation of the current conversation. In this paper, the use of embeddings for answer retrieval is explored with a Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor (ANN) model, to find similar conversations in a corpus and rank possible candidates. Experimental results on the well-known Ubuntu Corpus (in English) and a customer service chat dataset (in Dutch) show that, in combination with a candidate selection method, retrieval-based approaches outperform generative ones, revealing promising future research directions towards the usability of such a system. (Comment: a shorter version was accepted at the ICMLA 2017 conference; acknowledgement added; typos corrected.)
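
    LSH-based answer retrieval hashes utterance embeddings so that similar conversations collide in the same buckets. A minimal random-hyperplane LSH sketch in Python (the paper uses an LSH Forest; this single-table variant, with its bit count and seed, is an illustrative simplification):

        import numpy as np

        class HyperplaneLSH:
            """Bucket embeddings by the sign pattern of random projections."""
            def __init__(self, dim, bits=16, seed=0):
                rng = np.random.default_rng(seed)
                self.planes = rng.standard_normal((bits, dim))
                self.buckets = {}

            def _key(self, v):
                return tuple((self.planes @ v > 0).astype(int))

            def index(self, vectors):
                for i, v in enumerate(vectors):
                    self.buckets.setdefault(self._key(v), []).append(i)

            def query(self, v):
                """Candidate response indices sharing v's bucket."""
                return self.buckets.get(self._key(v), [])

    Candidates from the same bucket can then be re-ranked with an exact similarity measure, keeping the expensive comparison restricted to a short list.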

    An Overlay Architecture for Personalized Object Access and Sharing in a Peer-to-Peer Environment

    Due to its exponential growth and decentralized nature, the Internet has evolved into a chaotic repository, making it difficult for users to discover and access resources of interest to them. As a result, users have to deal with the problem of information overload. The emergence of the Semantic Web provides Internet users with the ability to associate explicit, self-described semantics with resources. This ability will in turn facilitate the development of ontology-based resource discovery tools that help users retrieve information efficiently. However, it is widely believed that the Semantic Web of the future will be a complex web of smaller ontologies, mostly created by various groups of web users who share a similar interest, referred to as a Community of Interest. This thesis proposes a solution to the information overload problem using a user-driven framework, referred to as a Personalized Web, that allows individual users to organize themselves into Communities of Interest based on ontologies agreed upon by all community members. Within this framework, users can define and augment their personalized views of the Internet by associating specific properties and attributes with resources and defining constraint functions and rules that govern the interpretation of the semantics associated with the resources. Such views can then be used to capture the user's interests and be integrated into a user-defined Personalized Web. As a proof of concept, a Personalized Web architecture is developed that employs ontology-based semantics and a structured peer-to-peer overlay network to provide a foundation for semantically-based resource indexing and advertising. In order to investigate mechanisms that support the resource advertising and retrieval of the Personalized Web architecture, three agent-driven advertising and retrieval schemes, the Aggressive scheme, the Crawler-based scheme, and the Minimum-Cover-Rule scheme, were implemented and evaluated in both stable and churn environments. In addition to the development of a Personalized Web architecture that deals with typical web resources, this thesis uses a case study to explore the potential of the Personalized Web architecture to support future web service workflow applications. The results of this investigation demonstrate that the architecture can support the automation of service discovery, negotiation, and invocation, allowing service consumers to actualize a personalized web service workflow. Further investigation will be required to improve the performance of the automation and to allow it to be performed in a secure and robust manner. To support the next-generation Internet, further exploration will be needed to develop a Personalized Web that includes ubiquitous and pervasive resources.
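
    The overlay's role is to map semantic keys for resources onto responsible peers, in the style of a distributed hash table. A toy in-memory sketch in Python (the SHA-1 ring, node naming, and flat keyword keys are illustrative assumptions, not the thesis's actual indexing scheme):

        import hashlib
        from bisect import bisect_left

        class ToyDHT:
            """Consistent-hashing ring: a key lives on its successor node."""
            def __init__(self, node_names):
                self.ring = sorted(self._h(n) for n in node_names)
                self.store = {h: {} for h in self.ring}

            @staticmethod
            def _h(s):
                return hashlib.sha1(str(s).encode()).hexdigest()

            def _owner(self, key):
                i = bisect_left(self.ring, self._h(key))
                return self.ring[i % len(self.ring)]

            def advertise(self, keyword, resource):
                """Index a resource under the peer responsible for keyword."""
                self.store[self._owner(keyword)].setdefault(keyword, []).append(resource)

            def retrieve(self, keyword):
                return self.store[self._owner(keyword)].get(keyword, [])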

    Structured P2P Technologies for Distributed Command and Control

    The utility of Peer-to-Peer (P2P) systems extends far beyond traditional file sharing. This paper provides an overview of how P2P systems can provide robust command and control for Distributed Multi-Agent Systems (DMASs). Specifically, this article presents the evolution of P2P architectures to date, discussing the supporting technologies and applicability of each generation of P2P systems. It provides a detailed survey of fundamental design approaches found in modern large-scale P2P systems, highlighting design considerations for building and deploying scalable P2P applications. The survey covers unstructured P2P systems, content retrieval systems, communications structured P2P systems, flat structured P2P systems, and finally Hierarchical Peer-to-Peer (HP2P) overlays. It concludes with a presentation of design tradeoffs and opportunities for future research into P2P overlay systems.
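
    As one concrete instance of the flat structured category, Chord-style overlays route lookups through power-of-two "finger" shortcuts, reaching the node responsible for a key in O(log N) hops. A toy Python sketch with a global view of the ring (real nodes only know their own fingers; the 8-bit identifier space is an illustrative assumption):

        from bisect import bisect_left

        M = 8                     # identifier bits: toy ring of 2**M slots
        RING = 2 ** M

        def successor(nodes, x):
            """First node clockwise from x (nodes must be sorted)."""
            i = bisect_left(nodes, x % RING)
            return nodes[i % len(nodes)]

        def between(x, a, b):
            """True if x lies in the half-open ring interval (a, b]."""
            return x != a and (x - a) % RING <= (b - a) % RING

        def lookup(nodes, start, key):
            """Greedy routing via the finger closest to (but before) key."""
            node, path = start, [start]
            while True:
                succ = successor(nodes, (node + 1) % RING)
                if between(key, node, succ):
                    path.append(succ)        # succ owns the key: done
                    return path
                fingers = {successor(nodes, (node + 2 ** k) % RING)
                           for k in range(M)}
                node = max((f for f in fingers
                            if between(f, node, (key - 1) % RING)),
                           key=lambda f: (f - node) % RING)
                path.append(node)

        # lookup(sorted([5, 40, 90, 170, 220]), start=5, key=200)
        # -> [5, 170, 220]: two hops to the node owning identifier 200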

    Emergent relational schemas for RDF


    Audio-Based Retrieval of Musical Score Data

    Given an audio query, such as a polyphonic musical piece, this thesis addresses the problem of retrieving matching (similar) musical score data from a collection of musical scores. There are different techniques for measuring similarity between musical pieces, such as metadata-based similarity measures, collaborative filtering, and content-based similarity measures. In this thesis, we use the information in the digital music itself for similarity measurement; this technique is known as content-based similarity measurement. First, we extract chroma features to represent musical segments. The chroma feature captures both melodic and harmonic information and is robust to timbre variation. Tempo variation between performances of the same song may make them appear dissimilar. To address this issue, we extract beat sequences and combine them with chroma features to obtain beat-synchronous chroma features. Next, we use the Dynamic Time Warping (DTW) algorithm. This algorithm first computes the DTW matrix between two feature sequences and then calculates the cost of traversing from the starting point to the end point of the matrix. The lower the cost, the more similar the musical segments. The performance of DTW is improved by choosing suitable path constraints and path weights. Then, we implement the LSH algorithm, which first indexes the data and then searches for similar items. The processing time of LSH is shorter than that of DTW. For a smaller fragment of query audio, say 30 seconds, LSH outperformed DTW. The performance of LSH depends on the number of hash tables, the number of projections per table, and the width of the projection. Both algorithms were applied to two types of data sets: RWC (where audio and MIDI are from the same source) and TUT (where audio and MIDI are from different sources). The contribution of this thesis is twofold. First, we propose a suitable feature representation of a musical segment for melodic similarity. Then, we apply two different similarity measure algorithms and enhance their performance. This thesis work also includes the development of a mobile application capable of recording audio from the surroundings and displaying its acoustic features in real time.
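
    The DTW computation described above fills an accumulated-cost matrix and reads the alignment cost at its far corner. A minimal Python sketch using plain Euclidean frame distance and no path constraints or weights (both of which the thesis tunes for better performance):

        import numpy as np

        def dtw_cost(x, y):
            """Accumulated DTW cost between two feature sequences
            (arrays of shape: frames x feature dimensions)."""
            n, m = len(x), len(y)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    d = np.linalg.norm(x[i - 1] - y[j - 1])  # frame distance
                    D[i, j] = d + min(D[i - 1, j],       # insertion
                                      D[i, j - 1],       # deletion
                                      D[i - 1, j - 1])   # match
            return D[n, m]  # lower cost = more similar segments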

    Multimedia

    The ubiquitous and effortless digital data capture and processing capabilities offered by the majority of today's devices lead to an unprecedented penetration of multimedia content into our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content require constant re-evaluation and adaptation of multimedia methodologies in order to meet the relentlessly changing requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that point out these important aspects. Some of the main topics this book deals with include: multimedia management in peer-to-peer structures and wireless networks, security characteristics in multimedia, semantic-gap bridging for multimedia content, and novel multimedia applications.

    Service Abstractions for Scalable Deep Learning Inference at the Edge

    The deep learning driven intelligent edge has already become a reality, where millions of mobile, wearable, and IoT devices analyze real-time data and transform it into actionable insights on-device. Typical approaches for optimizing deep learning inference mostly focus on accelerating the execution of individual inference tasks, without considering the contextual correlation unique to edge environments or the statistical nature of learning-based computation. Specifically, they treat inference workloads as individual black boxes and apply canonical system optimization techniques, developed over the last few decades, to handle them as yet another type of computation-intensive application. As a result, deep learning inference on edge devices still faces the ever-increasing challenges of customization to edge device heterogeneity, fuzzy computation redundancy between inference tasks, and end-to-end deployment at scale. In this thesis, we propose the first framework that automates and scales the end-to-end process of deploying efficient deep learning inference from the cloud to heterogeneous edge devices. The framework consists of a series of service abstractions that handle DNN model tailoring, model indexing and query, and computation reuse for runtime inference, respectively. Together, these services bridge the gap between deep learning training and inference, eliminate computation redundancy during inference execution, and further lower the barrier for deep learning algorithm and system co-optimization. To build efficient and scalable services, we take a unique algorithmic approach of harnessing the semantic correlation between learning-based computations. Rather than viewing individual tasks as isolated black boxes, we optimize them collectively in a white-box approach, proposing primitives to formulate the semantics of deep learning workloads, algorithms to assess their hidden correlation (in terms of the input data, the neural network models, and the deployment trials), and techniques to merge common processing steps to minimize redundancy.
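
    The computation-reuse service described above exploits the statistical nature of inference: near-duplicate inputs can reuse earlier results instead of rerunning the model. A toy semantic cache in Python (the embedding function, cosine threshold, and unbounded linear-scan cache are illustrative assumptions, not the framework's actual design):

        import numpy as np

        class ReuseCache:
            """Reuse a cached result when a new input is a near-duplicate."""
            def __init__(self, embed, infer, threshold=0.95):
                self.embed, self.infer = embed, infer
                self.threshold = threshold
                self.keys, self.values = [], []

            def __call__(self, x):
                e = self.embed(x)
                e = e / np.linalg.norm(e)
                for k, v in zip(self.keys, self.values):
                    if float(k @ e) >= self.threshold:  # cosine-similar hit
                        return v                        # skip the DNN pass
                y = self.infer(x)                       # miss: run the model
                self.keys.append(e)
                self.values.append(y)
                return y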

    Towards Data Optimization in Storages and Networks

    Title from PDF of title page, viewed on August 7, 2015. Dissertation advisors: Sejun Song and Baek-Young Choi. Vita. Includes bibliographic references (pages 132-140). Thesis (Ph.D.)--School of Computing and Engineering, University of Missouri--Kansas City, 2015.
    We are encountering an explosion of data volume, as a study estimates that data will amount to 40 zettabytes by the end of 2020. This data explosion poses a significant burden not only on data storage space but also on access latency, manageability, and processing and network bandwidth. However, large portions of this huge data volume contain massive redundancies that are created by users, applications, systems, and communication models. Deduplication is a technique to reduce data volume by removing redundancies, and reliability can be further improved when data is replicated after deduplication. Many deduplication studies, such as storage data deduplication and network redundancy elimination, have been proposed to reduce storage and network bandwidth consumption. However, existing solutions are not efficient enough to optimize the data delivery path from clients to servers through the network. Hence, we propose a holistic deduplication framework that optimizes data along its entire path. Our deduplication framework consists of three components: data sources or clients, networks, and servers. The client component removes local redundancies at clients, the network component removes redundant transfers coming from different clients, and the server component removes redundancies coming from different networks. We designed and developed components for the proposed deduplication framework. For the server component, we developed the Hybrid Email Deduplication System, which achieves a trade-off between space savings and overhead for email systems. For the client component, we developed Structure-Aware File and Email Deduplication for Cloud-based Storage Systems, which is very fast while achieving good space savings by using structure-based granularity. For the network component, we developed Software-defined Deduplication as a Network and Storage service, an in-network deduplication system that chains storage data deduplication and network redundancy elimination functions using Software Defined Networking to achieve both storage space and network bandwidth savings with low processing time and memory footprint. We also discuss mobile deduplication for image and video files on mobile devices. Through system implementations and experiments, we show that the proposed framework effectively and efficiently optimizes data volume in a holistic manner encompassing the entire data path of clients, networks, and storage servers.
    Contents: Introduction -- Deduplication technology -- Existing deduplication approaches -- HEDS: Hybrid Email Deduplication System -- SAFE: Structure-Aware File and Email Deduplication for Cloud-based Storage Systems -- SoftDance: Software-defined Deduplication as a Network and Storage Service -- Mobile deduplication -- Conclusion
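
    Storage deduplication as surveyed in the thesis splits data into chunks and stores each unique chunk once, indexed by a content fingerprint. A minimal fixed-size-chunking sketch in Python (the chunk size and in-memory index are illustrative simplifications; production systems typically use content-defined chunking and persistent indexes):

        import hashlib

        class DedupStore:
            """Each unique chunk stored once; files become fingerprint lists."""
            def __init__(self, chunk_size=4096):
                self.chunk_size = chunk_size
                self.chunks = {}   # fingerprint -> chunk bytes
                self.recipes = {}  # file name -> list of fingerprints

            def put(self, name, data):
                fps = []
                for i in range(0, len(data), self.chunk_size):
                    chunk = data[i:i + self.chunk_size]
                    fp = hashlib.sha256(chunk).hexdigest()
                    self.chunks.setdefault(fp, chunk)  # duplicates stored once
                    fps.append(fp)
                self.recipes[name] = fps

            def get(self, name):
                """Reassemble a file from its chunk recipe."""
                return b"".join(self.chunks[fp] for fp in self.recipes[name])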