2,594 research outputs found

    Efficient memory management in VOD disk array servers usingPer-Storage-Device buffering

    Get PDF
    We present a buffering technique that reduces video-on-demand server memory requirements in more than one order of magnitude. This technique, Per-Storage-Device Buffering (PSDB), is based on the allocation of a fixed number of buffers per storage device, as opposed to existing solutions based on per-stream buffering allocation. The combination of this technique with disk array servers is studied in detail, as well as the influence of Variable Bit Streams. We also present an interleaved data placement strategy, Constant Time Length Declustering, that results in optimal performance in the service of VBR streams. PSDB is evaluated by extensive simulation of a disk array server model that incorporates a simulation based admission test.This research was supported in part by the National R&D Program of Spain, Project Number TIC97-0438.Publicad

    Distributed multimedia systems

    Get PDF
    A distributed multimedia system (DMS) is an integrated communication, computing, and information system that enables the processing, management, delivery, and presentation of synchronized multimedia information with quality-of-service guarantees. Multimedia information may include discrete media data, such as text, data, and images, and continuous media data, such as video and audio. Such a system enhances human communications by exploiting both visual and aural senses and provides the ultimate flexibility in work and entertainment, allowing one to collaborate with remote participants, view movies on demand, access on-line digital libraries from the desktop, and so forth. In this paper, we present a technical survey of a DMS. We give an overview of distributed multimedia systems, examine the fundamental concept of digital media, identify the applications, and survey the important enabling technologies.published_or_final_versio

    Indexing Metric Spaces for Exact Similarity Search

    Full text link
    With the continued digitalization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity and variety. Many studies address volume or velocity, while much fewer studies concern the variety. Metric space is ideal for addressing variety because it can accommodate any type of data as long as its associated distance notion satisfies the triangle inequality. To accelerate search in metric space, a collection of indexing techniques for metric data have been proposed. However, existing surveys each offers only a narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all the existing metric indexes that can support exact similarity search, by i) summarizing all the existing partitioning, pruning and validation techniques used for metric indexes, ii) providing the time and storage complexity analysis on the index construction, and iii) report on a comprehensive empirical comparison of their similarity query processing performance. Here, empirical comparisons are used to evaluate the index performance during search as it is hard to see the complexity analysis differences on the similarity query processing and the query performance depends on the pruning and validation abilities related to the data distribution. This article aims at revealing different strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate indexing technique for a given setting, and directing the future research for metric indexes

    Techniques for improving efficiency and scalability for the integration of information retrieval and databases

    Get PDF
    PhDThis thesis is on the topic of integration of Information Retrieval (IR) and Databases (DB), with particular focuses on improving efficiency and scalability of integrated IR and DB technology (IR+DB). The main purpose of this study is to develop efficient and scalable techniques for supporting integrated IR and DB technology, which is a popular approach today for handling complex queries over text and structured data. Our specific interest in this thesis is how to efficiently handle queries over large-scale text and structured data. The work is based on a technology that integrates probability theory and relational algebra, where retrievals for text and data are to be expressed in probabilistic logical programs such as probabilistic relational algebra or probabilistic Datalog. To support efficient processing of probabilistic logical programs, we proposed three optimization techniques that focus on aspects covered logical and physical layers, which include: scoring-driven query optimization using scoring expression, query processing with top-k incorporated pipeline, and indexing with relational inverted index. Specifically, scoring expressions are proposed for expressing the scoring or probabilistic semantics of implied scoring functions of PRA expressions, so that efficient query execution plan can be generated by rule-based scoring-driven optimizer. Secondly, to balance efficiency and effectiveness so that to improve query response time, we studied methods for incorporating topk algorithms into pipelined query execution engine for IR+DB systems. Thirdly, the proposed relational inverted index integrates IR-style inverted index and DB-style tuple-based index, which can be used to support efficient probability estimation and aggregation as well as conventional relational operations. Experiments were carried out to investigate the performances of proposed techniques. Experimental results showed that the efficiency and scalability of an IR+DB prototype have been improved, while the system can handle queries efficiently on considerable large data sets for a number of IR tasks

    Sixth Goddard Conference on Mass Storage Systems and Technologies Held in Cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems

    Get PDF
    This document contains copies of those technical papers received in time for publication prior to the Sixth Goddard Conference on Mass Storage Systems and Technologies which is being held in cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems at the University of Maryland-University College Inn and Conference Center March 23-26, 1998. As one of an ongoing series, this Conference continues to provide a forum for discussion of issues relevant to the management of large volumes of data. The Conference encourages all interested organizations to discuss long term mass storage requirements and experiences in fielding solutions. Emphasis is on current and future practical solutions addressing issues in data management, storage systems and media, data acquisition, long term retention of data, and data distribution. This year's discussion topics include architecture, tape optimization, new technology, performance, standards, site reports, vendor solutions. Tutorials will be available on shared file systems, file system backups, data mining, and the dynamics of obsolescence

    Fourth NASA Goddard Conference on Mass Storage Systems and Technologies

    Get PDF
    This report contains copies of all those technical papers received in time for publication just prior to the Fourth Goddard Conference on Mass Storage and Technologies, held March 28-30, 1995, at the University of Maryland, University College Conference Center, in College Park, Maryland. This series of conferences continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include new storage technology, stability of recorded media, performance studies, storage system solutions, the National Information infrastructure (Infobahn), the future for storage technology, and lessons learned from various projects. There also will be an update on the IEEE Mass Storage System Reference Model Version 5, on which the final vote was taken in July 1994

    Large Scale Hierarchical K-Means Based Image Retrieval With MapReduce

    Get PDF
    Image retrieval remains one of the most heavily researched areas in Computer Vision. Image retrieval methods have been used in autonomous vehicle localization research, object recognition applications, and commercially in projects such as Google Glass. Current methods for image retrieval become problematic when implemented on image datasets that can easily reach billions of images. In order to process these growing datasets, we distribute the necessary computation for image retrieval among a cluster of machines using Apache Hadoop. While there are many techniques for image retrieval, we focus on systems that use Hierarchical K-Means Trees. Successful image retrieval systems based on Hierarchical K-Means Trees have been built using the tree as a Visual Vocabulary to build an Inverted File Index and implementing a Bag of Words retrieval approach, or by building the tree as a Full Representation of every image in the database and implementing a K-Nearest Neighbor voting scheme for retrieval. Both approaches involve different levels of approximation, and each has strengths and weaknesses that must be weighed in accordance with the needs of the application. Both approaches are implemented with MapReduce, for the first time, and compared in terms of image retrieval precision, index creation run-time, and image retrieval throughput. Experiments that include up to 2 million images running on 20 virtual machines are shown

    ATOM : a distributed system for video retrieval via ATM networks

    Get PDF
    The convergence of high speed networks, powerful personal computer processors and improved storage technology has led to the development of video-on-demand services to the desktop that provide interactive controls and deliver Client-selected video information on a Client-specified schedule. This dissertation presents the design of a video-on-demand system for Asynchronous Transfer Mode (ATM) networks, incorporating an optimised topology for the nodes in the system and an architecture for Quality of Service (QoS). The system is called ATOM which stands for Asynchronous Transfer Mode Objects. Real-time video playback over a network consumes large bandwidth and requires strict bounds on delay and error in order to satisfy the visual and auditory needs of the user. Streamed video is a fundamentally different type of traffic to conventional IP (Internet Protocol) data since files are viewed in real-time, not downloaded and then viewed. This streaming data must arrive at the Client decoder when needed or it loses its interactive value. Characteristics of multimedia data are investigated including the use of compression to reduce the excessive bit rates and storage requirements of digital video. The suitability of MPEG-1 for video-on-demand is presented. Having considered the bandwidth, delay and error requirements of real-time video, the next step in designing the system is to evaluate current models of video-on-demand. The distributed nature of four such models is considered, focusing on how Clients discover Servers and locate videos. This evaluation eliminates a centralized approach in which Servers have no logical or physical connection to any other Servers in the network and also introduces the concept of a selection strategy to find alternative Servers when Servers are fully loaded. During this investigation, it becomes clear that another entity (called a Broker) could provide a central repository for Server information. Clients have logical access to all videos on every Server simply by connecting to a Broker. The ATOM Model for distributed video-on-demand is then presented by way of a diagram of the topology showing the interconnection of Servers, Brokers and Clients; a description of each node in the system; a list of the connectivity rules; a description of the protocol; a description of the Server selection strategy and the protocol if a Broker fails. A sample network is provided with an example of video selection and design issues are raised and solved including how nodes discover each other, a justification for using a mesh topology for the Broker connections, how Connection Admission Control (CAC) is achieved, how customer billing is achieved and how information security is maintained. A calculation of the number of Servers and Brokers required to service a particular number of Clients is presented. The advantages of ATOM are described. The underlying distributed connectivity is abstracted away from the Client. Redundant Server/Broker connections are eliminated and the total number of connections in the system are minimized by the rule stating that Clients and Servers may only connect to one Broker at a time. This reduces the total number of Switched Virtual Circuits (SVCs) which are a performance hindrance in ATM. ATOM can be easily scaled by adding more Servers which increases the total system capacity in terms of storage and bandwidth. In order to transport video satisfactorily, a guaranteed end-to-end Quality of Service architecture must be in place. The design methodology for such an architecture is investigated starting with a review of current QoS architectures in the literature which highlights important definitions including a flow, a service contract and flow management. A flow is a single media source which traverses resource modules between Server and Client. The concept of a flow is important because it enables the identification of the areas requiring consideration when designing a QoS architecture. It is shown that ATOM adheres to the principles motivating the design of a QoS architecture, namely the Integration, Separation and Transparency principles. The issue of mapping human requirements to network QoS parameters is investigated and the action of a QoS framework is introduced, including several possible causes of QoS degradation. The design of the ATOM Quality of Service Architecture (AQOSA) is then presented. AQOSA consists of 11 modules which interact to provide end-to-end QoS guarantees for each stream. Several important results arise from the design. It is shown that intelligent choice of stored videos in respect of peak bandwidth can improve overall system capacity. The concept of disk striping over a disk array is introduced and a Data Placement Strategy is designed which eliminates disk hot spots (i.e. Overuse of some disks whilst others lie idle.) A novel parameter (the B-P Ratio) is presented which can be used by the Server to predict future bursts from each video stream. The use of Traffic Shaping to decrease the load on the network from each stream is presented. Having investigated four algorithms for rewind and fast-forward in the literature, a rewind and fast-forward algorithm is presented. The method produces a significant decrease in bandwidth, and the resultant stream is very constant, reducing the chance that the stream will add to network congestion. The C++ classes of the Server, Broker and Client are described emphasizing the interaction between classes. The use of ATOM in the Virtual Private Network and the multimedia teaching laboratory is considered. Conclusions and recommendations for future work are presented. It is concluded that digital video applications require high bandwidth, low error, low delay networks; a video-on-demand system to support large Client volumes must be distributed, not centralized; control and operation (transport) must be separated; the number of ATM Switched Virtual Circuits (SVCs) must be minimized; the increased connections caused by the Broker mesh is justified by the distributed information gain; a Quality of Service solution must address end-to-end issues. It is recommended that a web front-end for Brokers be developed; the system be tested in a wide area A TM network; the Broker protocol be tested by forcing failure of a Broker and that a proprietary file format for disk striping be implemented
    corecore