1,077 research outputs found

    Parallel and distributed clustering framework for big spatial data mining

    Get PDF
    Clustering techniques are very attractive for identifying and extracting patterns of interests from datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensionality, heterogeneity, and high complexity of some algorithms. Distributed clustering techniques constitute a very good alternative to the Big Data challenges (e.g., Volume, Variety, Veracity, and Velocity). In this paper, we developed and implemented a Dynamic Parallel and Distributed clustering (DPDC) approach that can analyse Big Data within a reasonable response time and produce accurate results, by using existing and current computing and storage infrastructure, such as cloud computing. The DPDC approach consists of two phases. The first phase is fully parallel and it generates local clusters and the second phase aggregates the local results to obtain global clusters. The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process is efficient in time and memory allocation. DPDC was thoroughly tested and compared to well-known clustering algorithms BIRCH and CURE. The results show that the approach not only produces high-quality results but also scales up very well by taking advantage of the Hadoop MapReduce paradigm or any distributed system

    Dynamic re-optimization techniques for stream processing engines and object stores

    Get PDF
    Large scale data storage and processing systems are strongly motivated by the need to store and analyze massive datasets. The complexity of a large class of these systems is rooted in their distributed nature, extreme scale, need for real-time response, and streaming nature. The use of these systems on multi-tenant, cloud environments with potential resource interference necessitates fine-grained monitoring and control. In this dissertation, we present efficient, dynamic techniques for re-optimizing stream-processing systems and transactional object-storage systems.^ In the context of stream-processing systems, we present VAYU, a per-topology controller. VAYU uses novel methods and protocols for dynamic, network-aware tuple-routing in the dataflow. We show that the feedback-driven controller in VAYU helps achieve high pipeline throughput over long execution periods, as it dynamically detects and diagnoses any pipeline-bottlenecks. We present novel heuristics to optimize overlays for group communication operations in the streaming model.^ In the context of object-storage systems, we present M-Lock, a novel lock-localization service for distributed transaction protocols on scale-out object stores to increase transaction throughput. Lock localization refers to dynamic migration and partitioning of locks across nodes in the scale-out store to reduce cross-partition acquisition of locks. The service leverages the observed object-access patterns to achieve lock-clustering and deliver high performance. We also present TransMR, a framework that uses distributed, transactional object stores to orchestrate and execute asynchronous components in amorphous data-parallel applications on scale-out architectures

    Optimizing Virtual Machine I/O Performance in Cloud Environments

    Get PDF
    Maintaining closeness between data sources and data consumers is crucial for workload I/O performance. In cloud environments, this kind of closeness can be violated by system administrative events and storage architecture barriers. VM migration events are frequent in cloud environments. VM migration changes VM runtime inter-connection or cache contexts, significantly degrading VM I/O performance. Virtualization is the backbone of cloud platforms. I/O virtualization adds additional hops to workload data access path, prolonging I/O latencies. I/O virtualization overheads cap the throughput of high-speed storage devices and imposes high CPU utilizations and energy consumptions to cloud infrastructures. To maintain the closeness between data sources and workloads during VM migration, we propose Clique, an affinity-aware migration scheduling policy, to minimize the aggregate wide area communication traffic during storage migration in virtual cluster contexts. In host-side caching contexts, we propose Successor to recognize warm pages and prefetch them into caches of destination hosts before migration completion. To bypass the I/O virtualization barriers, we propose VIP, an adaptive I/O prefetching framework, which utilizes a virtual I/O front-end buffer for prefetching so as to avoid the on-demand involvement of I/O virtualization stacks and accelerate the I/O response. Analysis on the traffic trace of a virtual cluster containing 68 VMs demonstrates that Clique can reduce inter-cloud traffic by up to 40%. Tests of MPI Reduce_scatter benchmark show that Clique can keep VM performance during migration up to 75% of the non-migration scenario, which is more than 3 times of the Random VM choosing policy. In host-side caching environments, Successor performs better than existing cache warm-up solutions and achieves zero VM-perceived cache warm-up time with low resource costs. At system level, we conducted comprehensive quantitative analysis on I/O virtualization overheads. Our trace replay based simulation demonstrates the effectiveness of VIP for data prefetching with ignorable additional cache resource costs

    Privacy-preserving data analytics in cloud computing

    Get PDF
    The evolution of digital content and rapid expansion of data sources has raised the need for streamlined monitoring, collection, storage and analysis of massive, heterogeneous data to extract useful knowledge and support decision-making mechanisms. In this context, cloud computing o↵ers extensive, cost-e↵ective and on demand computing resources that improve the quality of services for users and also help service providers (enterprises, governments and individuals). Service providers can avoid the expense of acquiring and maintaining IT resources while migrating data and remotely managing processes including aggregation, monitoring and analysis in cloud servers. However, privacy and security concerns of cloud computing services, especially in storing sensitive data (e.g. personal, healthcare and financial) are major challenges to the adoption of these services. To overcome such barriers, several privacy-preserving techniques have been developed to protect outsourced data in the cloud. Cryptography is a well-known mechanism that can ensure data confidentiality in the cloud. Traditional cryptography techniques have the ability to protect the data through encryption in cloud servers and data owners can retrieve and decrypt data for their processing purposes. However, in this case, cloud users can use the cloud resources for data storage but they cannot take full advantage of cloud-based processing services. This raises the need to develop advanced cryptosystems that can protect data privacy, both while in storage and in processing in the cloud. Homomorphic Encryption (HE) has gained attention recently because it can preserve the privacy of data while it is stored and processed in the cloud servers and data owners can retrieve and decrypt their processed data to their own secure side. Therefore, HE o↵ers an end-to-end security mechanism that is a preferable feature in cloud-based applications. In this thesis, we developed innovative privacy-preserving cloud-based models based on HE cryptosystems. This allowed us to build secure and advanced analytic models in various fields. We began by designing and implementing a secure analytic cloud-based model based on a lightweight HE cryptosystem. We used a private resident cloud entity, called ”privacy manager”, as an intermediate communication server between data owners and public cloud servers. The privacy manager handles analytical tasks that cannot be accomplished by the lightweight HE cryptosystem. This model is convenient for several application domains that require real-time responses. Data owners delegate their processing tasks to the privacy manager, which then helps to automate analysis tasks without the need to interact with data owners. We then developed a comprehensive, secure analytical model based on a Fully Homomorphic Encryption (FHE), that has more computational capability than the lightweight HE. Although FHE can automate analysis tasks and avoid the use of the privacy manager entity, it also leads to massive computational overhead. To overcome this issue, we took the advantage of the massive cloud resources by designing a MapReduce model that massively parallelises HE analytical tasks. Our parallelisation approach significantly speeds up the performance of analysis computations based on FHE. We then considered distributed analytic models where the data is generated from distributed heterogeneous sources such as healthcare and industrial sensors that are attached to people or installed in a distributed-based manner. We developed a secure distributed analytic model by re-designing several analytic algorithms (centroid-based and distribution-based clustering) to adapt them into a secure distributed-based models based on FHE. Our distributed analytic model was developed not only for distributed-based applications, but also it eliminates FHE overhead obstacle by achieving high efficiency in FHE computations. Furthermore, the distributed approach is scalable across three factors: analysis accuracy, execution time and the amount of resources used. This scalability feature enables users to consider the requirements of their analysis tasks based on these factors (e.g. users may have limited resources or time constrains to accomplish their analysis tasks). Finally, we designed and implemented two privacy-preserving real-time cloud-based applications to demonstrate the capabilities of HE cryptosystems, in terms of both efficiency and computational capabilities for applications that require timely and reliable delivery of services. First, we developed a secure cloud-based billing model for a sensor-enabled smart grid infrastructure by using lightweight HE. This model handled billing analysis tasks for individual users in a secure manner without the need to interact with any trusted parties. Second, we built a real-time secure health surveillance model for smarter health communities in the cloud. We developed a secure change detection model based on an exponential smoothing technique to predict future changes in health vital signs based on FHE. Moreover, we built an innovative technique to parallelise FHE computations which significantly reduces computational overhead

    Enhancing Confidentiality and Privacy Preservation in e-Health to Enhanced Security

    Get PDF
    Electronic health (e-health) system use is growing, which has improved healthcare services significantly but has created questions about the privacy and security of sensitive medical data. This research suggests a novel strategy to overcome these difficulties and strengthen the security of e-health systems while maintaining the privacy and confidentiality of patient data by utilising machine learning techniques. The security layers of e-health systems are strengthened by the comprehensive framework we propose in this paper, which incorporates cutting-edge machine learning algorithms. The suggested framework includes data encryption, access control, and anomaly detection as its three main elements. First, to prevent unauthorised access during transmission and storage, patient data is secured using cutting-edge encryption technologies. Second, to make sure that only authorised staff can access sensitive medical records, access control mechanisms are strengthened using machine learning models that examine user behaviour patterns. This research's inclusion of machine learning-based anomaly detection is its most inventive feature. The technology may identify variations from typical data access and usage patterns, thereby quickly spotting potential security breaches or unauthorised activity, by training models on past e-health data. This proactive strategy improves the system's capacity to successfully address new threats. Extensive experiments were carried out employing a broad dataset made up of real-world e-health scenarios to verify the efficacy of the suggested approach. The findings showed a marked improvement in the protection of confidentiality and privacy, along with a considerable decline in security breaches and unauthorised access events
    corecore