
    Partial Replica Location And Selection For Spatial Datasets

As the size of scientific datasets continues to grow, we will not be able to store enormous datasets on a single grid node, but must distribute them across many grid nodes. The implementation of partial or incomplete replicas, which represent only a subset of a larger dataset, has been an active topic of research. Partial Spatial Replicas extend this functionality to spatial data, allowing us to distribute a spatial dataset in pieces over several locations. We investigate solutions to the partial spatial replica selection problems. First, we describe and develop two designs for a Spatial Replica Location Service (SRLS), which must return the set of replicas that intersect with a query region. Integrating a relational database, a spatial data structure and grid computing software, we build a scalable solution that works well even for several million replicas. In our SRLS, we have improved performance by designing an R-tree structure in the backend database, and by aggregating several queries into one larger query, which reduces overhead. We also use the Morton space-filling curve during R-tree construction, which improves spatial locality. In addition, we describe R-tree Prefetching (RTP), which effectively utilizes modern multi-processor architectures. Second, we present and implement a fast replica selection algorithm in which a set of partial replicas is chosen from a set of candidates so that retrieval performance is maximized. Using an R-tree-based heuristic algorithm, we achieve O(n log n) complexity for this NP-complete problem. We describe a model for disk access performance that takes filesystem prefetching into account and is sufficiently accurate for spatial replica selection. Making a few simplifying assumptions, we present a fast replica selection algorithm for partial spatial replicas. The algorithm uses a greedy approach that attempts to maximize performance by choosing a collection of replica subsets that allow fast data retrieval by a client machine. Experiments show that the performance of the solution found by our algorithm is on average at least 91% and 93.4% of the performance of the optimal solution in 4-node and 8-node tests, respectively.
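One ingredient the abstract highlights is ordering replicas along a Morton (Z-order) space-filling curve before building the R-tree, so that spatially adjacent replicas end up in nearby tree nodes. The sketch below illustrates that general Morton-ordering idea in Python; the class name, grid resolution, and sort-by-bounding-box-center heuristic are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: order replica bounding boxes by Morton (Z-order) code
# before R-tree bulk loading, so spatially nearby replicas land in nearby
# tree nodes. Names (Replica, GRID_BITS) are illustrative, not from the paper.

from dataclasses import dataclass

GRID_BITS = 16  # quantize each axis to 2^16 grid cells

@dataclass
class Replica:
    name: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float

def _part1by1(n: int) -> int:
    """Spread the lower 16 bits of n so one zero bit sits between each bit."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton_code(x: int, y: int) -> int:
    """Interleave the bits of x and y into a single Z-order key."""
    return (_part1by1(y) << 1) | _part1by1(x)

def morton_order(replicas, extent):
    """Sort replicas by the Morton code of their bounding-box centers."""
    ex0, ey0, ex1, ey1 = extent
    def key(r):
        cx = (r.min_x + r.max_x) / 2.0
        cy = (r.min_y + r.max_y) / 2.0
        gx = int((cx - ex0) / (ex1 - ex0) * ((1 << GRID_BITS) - 1))
        gy = int((cy - ey0) / (ey1 - ey0) * ((1 << GRID_BITS) - 1))
        return morton_code(gx, gy)
    return sorted(replicas, key=key)

# Example: ordered = morton_order(replicas, extent=(0.0, 0.0, 100.0, 100.0))
```

Bulk-loading the R-tree from the Morton-ordered list packs spatial neighbors into the same leaves, which is the locality benefit the abstract refers to.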

    Improving Security and Privacy in Online Social Networks

Online social networks (OSNs) have gained soaring popularity and are among the most popular sites on the Web. With OSNs, users around the world establish and strengthen connections by sharing thoughts, activities, photos, locations, and other personal information. However, the immense popularity of OSNs also raises significant security and privacy concerns. Storing millions of users' private information and their social connections, OSNs are susceptible to becoming the target of various attacks. In addition, user privacy will be compromised if the private data collected by OSNs are abused, inadvertently leaked, or under the control of adversaries. As a result, the tension between the value of joining OSNs and the security and privacy risks is rising. To make OSNs more secure and privacy-preserving, our work follows a bottom-up approach. OSNs are composed of three components: the infrastructure layer, the function layer, and the user data stored on OSNs. For each component of OSNs, in this dissertation, we analyze and address a representative security/privacy issue. Starting from the infrastructure layer of OSNs, we first consider how to improve the reliability of OSN infrastructures, and we propose Fast Mencius, a crash-fault tolerant state machine replication protocol that has low latency and high throughput in wide-area networks. For the function layer of OSNs, we investigate how to prevent the functioning of OSNs from being disturbed by adversaries, and we propose SybilDefender, a centralized sybil defense scheme that can effectively detect sybil nodes by analyzing social network topologies. Finally, we study how to protect user privacy on OSNs, and we propose two schemes. MobiShare is a privacy-preserving location-sharing scheme designed for location-based OSNs (LBSNs), which supports sharing locations between both friends and strangers. LBSNSim is a trace-driven LBSN model that can generate synthetic LBSN datasets used in place of real datasets. Combined, these contributions improve security and privacy in OSNs.
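The abstract only states that SybilDefender detects sybil nodes by analyzing the social network topology. One common family of topology-based defenses uses short random walks, exploiting the fact that walks started inside a small, poorly connected sybil region keep revisiting the same nodes. The sketch below shows that general heuristic; the function names, thresholds, and walk parameters are illustrative assumptions and not the scheme described in the dissertation.

```python
# Hedged sketch of a generic topology-based sybil heuristic (NOT the
# dissertation's algorithm): run many short random walks from a suspect node;
# walks confined to a small sybil region touch few distinct nodes overall.

import random

def random_walk(graph, start, length, rng):
    """graph: dict node -> list of neighbors. Return nodes visited on one walk."""
    visited = [start]
    node = start
    for _ in range(length):
        neighbors = graph.get(node)
        if not neighbors:
            break
        node = rng.choice(neighbors)
        visited.append(node)
    return visited

def distinct_nodes_reached(graph, suspect, num_walks=100, walk_len=20, seed=0):
    """Count distinct nodes touched by num_walks walks starting at the suspect."""
    rng = random.Random(seed)
    reached = set()
    for _ in range(num_walks):
        reached.update(random_walk(graph, suspect, walk_len, rng))
    return len(reached)

def looks_sybil(graph, suspect, honest_baseline, threshold=0.5):
    """Flag the suspect if it reaches far fewer distinct nodes than honest nodes do."""
    return distinct_nodes_reached(graph, suspect) < threshold * honest_baseline
```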

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also help provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    ElfStore: A Resilient Data Storage Service for Federated Edge and Fog Resources

Edge and fog computing have grown popular as IoT deployments become widespread. While application composition and scheduling on such resources are being explored, there is a gap: no distributed data storage service exists at the edge and fog layers, and applications instead depend solely on the cloud for data persistence. Such a service should reliably store and manage data on fog and edge devices, even in the presence of failures, and offer transparent discovery and access to data for use by edge computing applications. Here, we present ElfStore, a first-of-its-kind edge-local federated store for streams of data blocks. It uses reliable fog devices as a super-peer overlay to monitor the edge resources, offers federated metadata indexing using Bloom filters, locates data within two hops, and maintains approximate global statistics about the reliability and storage capacity of edges. Edges host the actual data blocks, and we use a unique differential replication scheme to select edges on which to replicate blocks, to guarantee a minimum reliability and to balance storage utilization. Our experiments on two IoT virtual deployments with 20 and 272 devices show that ElfStore has low overheads, is bound only by the network bandwidth, has scalable performance, and offers tunable resilience. Comment: 24 pages, 14 figures, to appear in IEEE International Conference on Web Services (ICWS), Milan, Italy, 2019
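The abstract describes Bloom-filter-based federated metadata indexing with data locatable within two hops. The minimal sketch below illustrates that general pattern, assuming each fog super-peer keeps a Bloom filter over the block IDs held by its edge devices; the class names and hash scheme are illustrative, not ElfStore's API.

```python
# Minimal Bloom-filter lookup sketch: each fog super-peer indexes the blocks on
# its edge devices; a lookup asks only peers whose filter reports a (possibly
# false-positive) match. Class and field names are illustrative assumptions.

import hashlib

class BloomFilter:
    def __init__(self, num_bits=4096, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.num_bits

    def add(self, key: str):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        return all(self.bits & (1 << p) for p in self._positions(key))

class Fog:
    """A super-peer that indexes the blocks stored on its edge devices."""
    def __init__(self, name):
        self.name = name
        self.index = BloomFilter()
        self.blocks = {}  # block_id -> edge device holding it

    def store(self, block_id, edge_device):
        self.blocks[block_id] = edge_device
        self.index.add(block_id)

def locate(block_id, local_fog, peer_fogs):
    """Hop 1: check the local fog; hop 2: query only peers whose filter matches."""
    if block_id in local_fog.blocks:
        return local_fog.name, local_fog.blocks[block_id]
    for fog in peer_fogs:
        if fog.index.might_contain(block_id) and block_id in fog.blocks:
            return fog.name, fog.blocks[block_id]
    return None
```

In a real deployment the second check would be a remote call to the matching fog; the Bloom filter only prunes which peers need to be asked.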

    A PARTIAL REPLICATION LOAD BALANCING TECHNIQUE FOR DISTRIBUTED DATA AS A SERVICE ON THE CLOUD

Data as a service (DaaS) is an important model on the Cloud, as DaaS provides clients with different types of large files and data sets in fields like finance, science, health, geography, astronomy, and many others. This includes all types of files with varying sizes, from a few kilobytes to hundreds of terabytes. DaaS can be implemented and provided using multiple data centers located at different locations and usually connected via the Internet. When data is provided using multiple data centers, it is referred to as distributed DaaS. DaaS providers must ensure that their services are fast, reliable, and efficient. However, these requirements must be met while considering the associated cost, which will be carried by the DaaS provider and most likely by the users as well. One traditional approach to support a large number of clients is to replicate the services on different servers. However, this requires full replication of all stored data sets, which requires a huge amount of storage. The huge storage consumption will result in increased costs. Therefore, the aim of this research is to provide fast, efficient distributed DaaS for clients while reducing the storage consumption on the Cloud servers used by the DaaS providers. The method I utilize in this research for fast distributed DaaS is the collaborative dual-direction download of file or dataset partitions from multiple servers to the client, which significantly enhances download speed. Moreover, I partially replicate the file partitions among Cloud servers using the previous download experiences I obtain for each partition. As a result, I generate partial sections of the data sets that will collectively be smaller than the total size needed if full replicas are stored on each server. My method is self-managed and operates only when more storage is needed. I evaluated my approach against other existing approaches and demonstrated that it provides an important enhancement over current approaches in both download performance and storage consumption. I also developed and analyzed the mathematical model supporting my approach and validated its accuracy.
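The core download idea, pulling partitions of one file from several servers concurrently, can be sketched with HTTP Range requests as below. This is a simplified illustration: the mirror URLs, partition size, and round-robin assignment are assumptions, and the dissertation's scheme additionally downloads in dual directions and partially replicates partitions based on past download experience.

```python
# Simplified sketch: fetch one file as byte-range partitions pulled concurrently
# from several mirror servers (assigned round-robin). Mirror URLs and partition
# size are illustrative only.

import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch_range(url, start, end):
    """Download bytes [start, end] of url using an HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return start, resp.read()

def parallel_download(mirrors, total_size, part_size=4 * 1024 * 1024):
    """Split [0, total_size) into partitions and assign them to mirrors round-robin."""
    ranges = [(s, min(s + part_size, total_size) - 1)
              for s in range(0, total_size, part_size)]
    buf = bytearray(total_size)
    with ThreadPoolExecutor(max_workers=len(mirrors) * 2) as pool:
        futures = [pool.submit(fetch_range, mirrors[i % len(mirrors)], s, e)
                   for i, (s, e) in enumerate(ranges)]
        for fut in futures:
            start, data = fut.result()
            buf[start:start + len(data)] = data
    return bytes(buf)

# Example (hypothetical URLs):
# data = parallel_download(["https://mirror-a/file", "https://mirror-b/file"], 64_000_000)
```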

Scalable, Data-intensive Network Computation

To enable groups of collaborating researchers at different locations to effectively share large datasets and investigate their spontaneous hypotheses on the fly, we are interested in developing a distributed system that can be easily leveraged by a variety of data-intensive applications. The system is composed of (i) a number of best effort logistical depots to enable large-scale data sharing and in-network data processing, (ii) a set of end-to-end tools to effectively aggregate, manage and schedule a large number of network computations with attendant data movements, and (iii) a Distributed Hash Table (DHT) on top of the generic depot services for scalable data management. The logistical depot is extended by following the end-to-end principles and is modeled with a closed queuing network model. Its performance characteristics are studied by solving the steady state distributions of the model using local balance equations. The modeling results confirm that the wide area network is the performance bottleneck and that running concurrent jobs can increase resource utilization and system throughput. As a novel contribution, techniques to effectively support resource-demanding data-intensive applications using the fine-grained depot services are developed. These techniques include instruction-level scheduling of operations, dynamic co-scheduling of computation and replication, and adaptive workload control. Experiments in volume visualization have proved the effectiveness of these techniques. Due to the unique characteristics of data-intensive applications and our co-scheduling algorithm, a DHT is implemented on top of the basic storage and computation services. It demonstrates the potential of the Logistical Networking infrastructure to serve as a service creation platform.
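The abstract mentions layering a DHT over the generic depot services for scalable data management. A common way to realize such a key-to-node mapping is consistent hashing, sketched below; this illustrates the general idea only, and the depot names, virtual-node count, and hash choice are assumptions rather than the dissertation's design.

```python
# Hedged sketch of a DHT-style key-to-depot mapping via consistent hashing:
# keys map stably to depots, and adding or removing a depot only remaps keys
# near it on the ring. Illustrative only, not the Logistical Networking design.

import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, depots, vnodes=64):
        # vnodes: virtual nodes per depot, to spread load evenly around the ring
        self._ring = sorted(
            (_hash(f"{depot}#{i}"), depot)
            for depot in depots for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def depot_for(self, key: str) -> str:
        """Return the depot responsible for key (first ring point clockwise)."""
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

# Example: map dataset blocks onto three (hypothetical) depots.
ring = ConsistentHashRing(["depot-a", "depot-b", "depot-c"])
print(ring.depot_for("volume-042/block-7"))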

    Accelerated Imaging Techniques for Chemical Shift Magnetic Resonance Imaging

Chemical shift imaging is a method for separating two or more chemical species. The cost of chemical shift encoding is increased acquisition time, as multiple acquisitions are acquired at different echo times. Image acceleration techniques, typically parallel imaging, are often used to improve the spatial coverage and resolution. This thesis describes a new technique for estimating the signal-to-noise ratio (SNR) of parallel imaging reconstructions and proposes new image reconstructions for accelerated chemical shift imaging using compressed sensing and/or parallel imaging for two applications: water-fat separation and metabolic imaging of hyperpolarized [1-13C] pyruvate. Spatially varying noise in parallel imaging reconstructions makes measurement of the SNR, a commonly used metric of image quality, difficult. Existing approaches have limitations: they are not applicable to all reconstructions, require significant computation time, or rely on repeated image acquisitions. An SNR estimation technique is proposed that does not exhibit these limitations. Water-fat imaging of highly undersampled datasets from the liver, calf, knee, and abdominal cavity is demonstrated using a customized IDEAL-SPGR pulse sequence and an integrated compressed sensing, parallel imaging, water-fat reconstruction. This method is shown to offer image quality comparable to fully sampled reference images for a range of acceleration factors. At high acceleration factors, this technique is shown to offer improved image quality compared to the current standard of parallel imaging. Accelerated chemical shift imaging was also demonstrated for metabolic imaging of hyperpolarized [1-13C] pyruvate. Pyruvate, lactate, alanine, and bicarbonate images were reconstructed from undersampled datasets. Phantoms were used to validate this technique, while retrospectively and prospectively accelerated 3D in vivo datasets were used to demonstrate it in vivo. Alternatively, acceleration was also achieved through the use of a high-performance magnetic field gradient set. This thesis addresses the inherently slow acquisition times of chemical shift imaging by examining the role compressed sensing and parallel imaging can play in chemical shift imaging. An approach to SNR assessment for parallel imaging reconstructions is proposed, and approaches to accelerated chemical shift imaging are described for applications in water-fat imaging and metabolic imaging of hyperpolarized [1-13C] pyruvate.
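As a rough illustration of the compressed-sensing side of such reconstructions, the sketch below alternates a sparsity-promoting soft-thresholding step with a k-space data-consistency step for a single-coil toy problem that is sparse directly in the image domain. Real water-fat and hyperpolarized 13C reconstructions add coil sensitivity maps, chemical-shift signal models, and sparsifying transforms, so this shows only the basic iteration pattern, not the thesis's method.

```python
# Minimal ISTA-style compressed-sensing sketch, assuming a single coil and
# sparsity in the image domain. Parameters (lam, iters) are illustrative.

import numpy as np

def soft_threshold(x, lam):
    """Complex soft-thresholding: shrink magnitudes toward zero by lam."""
    mag = np.abs(x)
    scale = np.maximum(mag - lam, 0) / np.maximum(mag, 1e-12)
    return scale * x

def cs_recon(kspace, mask, lam=0.01, iters=50):
    """kspace: undersampled k-space (zeros where unsampled); mask: boolean sampled points."""
    img = np.fft.ifft2(kspace)            # zero-filled starting estimate
    for _ in range(iters):
        img = soft_threshold(img, lam)    # promote sparsity
        k = np.fft.fft2(img)
        k[mask] = kspace[mask]            # enforce consistency with measured data
        img = np.fft.ifft2(k)
    return img
```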