
    Toward efficient and secure public auditing for dynamic big data storage on cloud

    University of Technology Sydney, Faculty of Engineering and Information Technology.
    Cloud and Big Data are two of the most attractive ICT research topics to have emerged in recent years. Requirements for big data processing are now everywhere, and the pay-as-you-go model of cloud systems is especially cost-efficient for big data applications. However, concerns still hinder the proliferation of the cloud, and data security/privacy is a top concern for data owners wishing to migrate their applications into the cloud environment. Compared to users of conventional systems, cloud users must surrender local control of their data to cloud servers. Another challenge for big data is the data dynamism found in most big data applications: because of frequent updates, efficiency becomes a major issue in data management. As security always comes at a cost in efficiency, it is difficult but nonetheless important to investigate how to efficiently address security challenges over dynamic cloud data. Data integrity is an essential aspect of data security. Apart from server-side integrity protection mechanisms, verification by a third-party auditor is of equal importance, because it enables users to verify the integrity of their data through the auditor at any user-chosen time. This type of verification is also called 'public auditing' of data. Existing public auditing schemes allow the integrity of a dataset stored in the cloud to be verified externally without retrieving the whole original dataset. In practice, however, many challenges hinder the application of such schemes. First, the server still has to aggregate a proof with the cloud controller from data blocks that are stored and processed across distributed cloud instances, so encrypting and transferring these data within the cloud becomes time-consuming. Second, security flaws exist in current designs: the verification processes are insecure against various attacks, which raises concerns about deploying these schemes in practice. Third, when the dataset is large, auditing of dynamic data becomes costly in terms of communication and storage, especially for large numbers of small data updates and for updates on multi-replica cloud data storage. In this thesis, the research problem of public auditing of dynamic data in the cloud is systematically investigated. After analysing the research problems, we address secure and efficient public auditing of dynamic big data in the cloud by developing, testing and publishing a series of security schemes and algorithms. Specifically, our work focuses on the following aspects: cloud-internal authenticated key exchange, authorisation of the third-party auditor, fine-grained update support, index verification, and efficient multi-replica public auditing of dynamic data. To the best of our knowledge, this thesis presents the first series of work to systematically analyse and address this research problem. Experimental results and analyses show that the solutions presented in this thesis are suitable for auditing dynamic big data storage on the cloud and represent significant improvements in cloud efficiency and security.
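    The challenge-response flavour of public auditing can be illustrated with a minimal, single-machine sketch: the owner tags each block before outsourcing, the auditor challenges a random subset of block indices, and the server answers only for those blocks, so the whole dataset never needs to be retrieved. The class layout, block size and per-block HMAC tags below are illustrative assumptions, not the schemes developed in the thesis (which must also handle dynamic updates and multiple replicas).

```python
# Toy sampling-based integrity audit (illustrative only).
import hashlib
import hmac
import os
import random

BLOCK_SIZE = 4096  # hypothetical block size

def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Cut the outsourced file into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

class Owner:
    """Data owner: tags every block before uploading it to the cloud."""
    def __init__(self, data: bytes):
        self.key = os.urandom(32)
        self.blocks = split_blocks(data)
        # One MAC tag per block; handed to the auditor, never to the server.
        self.tags = [hmac.new(self.key, b, hashlib.sha256).digest()
                     for b in self.blocks]

class Server:
    """Cloud server: stores the blocks and answers audit challenges."""
    def __init__(self, blocks):
        self.blocks = list(blocks)

    def respond(self, indices):
        return {i: self.blocks[i] for i in indices}

class Auditor:
    """Third-party auditor: spot-checks randomly chosen blocks."""
    def __init__(self, key, tags):
        self.key, self.tags = key, tags

    def audit(self, server, sample=5):
        indices = random.sample(range(len(self.tags)),
                                min(sample, len(self.tags)))
        proof = server.respond(indices)
        return all(hmac.compare_digest(
                       hmac.new(self.key, proof[i], hashlib.sha256).digest(),
                       self.tags[i])
                   for i in indices)

owner = Owner(os.urandom(100_000))
server = Server(owner.blocks)
auditor = Auditor(owner.key, owner.tags)
print(auditor.audit(server))  # True while the stored data is intact
```

    In the public auditing literature, per-block MACs are typically replaced by homomorphic authenticators so that the returned proof stays small no matter how many blocks are challenged; the overall challenge/verify flow remains the same.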

    Distributed Processing in Cloud Computing

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016.
    Cloud computing offers a wide range of resources and services through the Internet that can be used for various purposes. The rapid growth of cloud computing has relieved many companies and institutions of the burden of maintaining expensive hardware and software infrastructure. With characteristics such as high scalability, availability and fault tolerance, cloud computing meets the needs of the new era of massive data processing at an affordable cost. In our doctoral research we intend to study, analyze, evaluate and make proposals to further improve the performance of cloud computing.
    European Cooperation in Science and Technology (COST).

    Educational Technology and Related Education Conferences for June to December 2015

    The 33rd edition of the conference list covers selected events that primarily focus on the use of technology in educational settings and on teaching, learning, and educational administration. Only listings up to December 2015 are complete, as dates, locations, or Internet addresses (URLs) were not available for a number of events held from January 2016 onward. To protect the privacy of individuals, only URLs are used in the listing; this enables readers to obtain event information without submitting their e-mail addresses to anyone. A significant challenge in assembling this list is incomplete or conflicting information on websites and the lack of a link between conference websites from one year to the next.

    Log File Analysis in Cloud with Apache Hadoop and Apache Spark

    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015), Krakow (Poland), September 10-11, 2015.
    Log files are a very important set of data that can yield useful information through proper analysis. Because of the high production rate and the number of devices and software systems that generate logs, the use of cloud services for log analysis is almost a necessity. This paper reviews the cloud computational framework Apache Hadoop, highlights the differences and similarities between Hadoop MapReduce and Apache Spark, and evaluates their performance. Log file analysis applications were developed in both frameworks and performed SQL-type queries on real Apache Web Server log files. Various measurements were taken for each application and query with different parameters in order to draw reliable conclusions about the performance of the two frameworks.
    The authors would like to thank Okeanos, the GRNET cloud service, for the valuable resources.
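    As a rough illustration of the kind of application the paper benchmarks, the following PySpark sketch parses Apache common-log-format lines into a DataFrame and runs an SQL-type aggregation over them. The regular expression, field names and the input path ("access.log") are assumptions made for the example, not the queries or code used in the paper.

```python
# Minimal PySpark sketch: parse Apache access-log lines, query them with SQL.
import re
from pyspark.sql import Row, SparkSession

LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\S+)')

def parse(line):
    """Turn one common-log-format line into a Row, or None if malformed."""
    m = LOG_RE.match(line)
    if not m:
        return None
    host, ts, method, url, status, size = m.groups()
    return Row(host=host, timestamp=ts, method=method, url=url,
               status=int(status),
               content_size=0 if size == "-" else int(size))

spark = SparkSession.builder.appName("ApacheLogAnalysis").getOrCreate()

logs = spark.createDataFrame(
    spark.sparkContext.textFile("access.log")   # hypothetical input path
         .map(parse)
         .filter(lambda r: r is not None))
logs.createOrReplaceTempView("logs")

# Example SQL-type query: request count and traffic per HTTP status code.
spark.sql("""
    SELECT status, COUNT(*) AS requests, SUM(content_size) AS total_bytes
    FROM logs
    GROUP BY status
    ORDER BY requests DESC
""").show()
```

    An equivalent Hadoop MapReduce job would express the same grouping as a mapper emitting (status, size) pairs and a reducer summing them per key.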

    Improving Automatic Content Type Identification from a Data Set

    Data file layout inference refers to building the structure and determining the metadata of a text file. The text files dealt with in this research are personal information records that have a consistent structure. Traditionally, if the layout structure of a text file is unknown, a human user must perform the manual labor of identifying the metadata, which is inefficient and prone to error. Content-based oracles are the current state-of-the-art automation technology that attempts to solve the layout inference problem by using databases of known metadata. This paper builds upon the information and documentation of the content-based oracles, and improves the oracles' databases through experimentation.
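    A toy version of a content-based oracle conveys the idea: each column of a delimited personal-record file is matched against a small database of known value patterns, and the best-supported label wins. The patterns, labels and sample records below are illustrative assumptions rather than the oracle databases improved in the paper.

```python
# Toy content-based oracle: guess each column's type from sample values.
import re
from collections import Counter

ORACLE = {
    "phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    "zip":   re.compile(r"^\d{5}(-\d{4})?$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "date":  re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$"),
}

def guess_column_type(values):
    """Vote over the oracle patterns; fall back to plain text."""
    votes = Counter(label for v in values
                    for label, pattern in ORACLE.items() if pattern.match(v))
    return votes.most_common(1)[0][0] if votes else "text"

# Hypothetical personal-information records with a consistent structure.
rows = [
    ["John Doe", "john@example.com", "555-123-4567", "30301"],
    ["Jane Roe", "jane@example.org", "555.987.6543", "10001-0001"],
]
columns = list(zip(*rows))
print([guess_column_type(col) for col in columns])
# -> ['text', 'email', 'phone', 'zip']
```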

    An efficient MapReduce-based parallel clustering algorithm for distributed traffic subarea division

    Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs). Since existing methods may not be suitable for processing big traffic data, this paper presents a MapReduce-based Parallel Three-Phase K-Means (Par3PKM) algorithm for solving the traffic subarea division problem on the widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K-Means and then employ the MapReduce paradigm to redesign the optimized K-Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identification method that connects the borders of the clustering results for each cluster. Finally, we divide the traffic subareas of Beijing using the proposed approach on real-world trajectory data sets generated by 12,000 taxis over a period of one month. Experimental results indicate that, compared with K-Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, better accuracy, and better scalability, and can effectively divide traffic subareas with big taxi trajectory data.
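    The MapReduce formulation of K-Means that the paper parallelises can be sketched in a few lines: the map step assigns each point to its nearest centroid and emits (centroid index, point) pairs, and the reduce step averages the points per key to produce new centroids. The sketch below runs on a single machine with plain Euclidean distance and made-up points; Par3PKM's modified distance metric, initialization strategy and Hadoop deployment are not reproduced here.

```python
# Single-machine sketch of MapReduce-style K-Means rounds (illustrative).
from collections import defaultdict
from math import dist

def map_phase(points, centroids):
    """Mapper: emit (nearest-centroid index, point) pairs."""
    for p in points:
        k = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield k, p

def reduce_phase(pairs, old_centroids):
    """Reducer: average the points grouped under each centroid key."""
    groups = defaultdict(list)
    for key, point in pairs:
        groups[key].append(point)
    new_centroids = []
    for i, old in enumerate(old_centroids):
        members = groups[i]
        if members:
            new_centroids.append(tuple(sum(c) / len(members)
                                       for c in zip(*members)))
        else:
            new_centroids.append(old)  # keep empty clusters where they were
    return new_centroids

# Hypothetical 2-D points standing in for taxi-trajectory features.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9)]
centroids = [(0.1, 0.1), (4.0, 4.0)]  # naive initialization
for _ in range(10):
    centroids = reduce_phase(map_phase(points, centroids), centroids)
print(centroids)  # one centre per dense region
```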

    Stealth databases: ensuring user-controlled queries in untrusted cloud environments

    Sensitive data is increasingly being hosted online in ubiquitous cloud storage services. Recent advances in multi-cloud service integration through provider multiplexing and data dispersion have alleviated most of the associated risks for hosting files which are retrieved by users for further processing. However, for structured data managed in databases, many issues remain, including the need to perform operations directly on the remote data to avoid costly transfers. In this paper, we motivate the need for distributed stealth databases, which combine properties of structure-preserving dispersed file storage (capacity savings and increased availability) with emerging work on structure-preserving encryption (on-demand increased confidentiality with controllable performance degradation). We contribute an analysis of operators executing in map-reduce or map-carry-reduce phases and derive performance statistics. Our prototype, StealthDB, demonstrates that for typical amounts of personal structured data, stealth databases are a convincing concept for taming untrusted and unsafe cloud environments.
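    A deliberately simplified sketch of the dispersion idea: each row of a structured table is fragmented column-wise across two untrusted providers, so neither one holds a complete record, and the client rejoins the fragments on a shared row id to answer a query locally. The table, the split and the query are invented for illustration; StealthDB's actual dispersion and structure-preserving encryption are considerably more involved.

```python
# Toy vertical fragmentation of personal records across two providers.
records = [
    {"id": 1, "name": "Alice", "city": "Dresden", "salary": 52000},
    {"id": 2, "name": "Bob",   "city": "Leipzig", "salary": 48000},
]

# Identifying attributes and sensitive attributes go to different providers,
# so neither provider ever sees a complete record.
provider_a = [{"id": r["id"], "name": r["name"], "city": r["city"]} for r in records]
provider_b = [{"id": r["id"], "salary": r["salary"]} for r in records]

def query_avg_salary_by_city(city):
    """Client-side recombination: rejoin fragments on id, aggregate locally."""
    ids = {r["id"] for r in provider_a if r["city"] == city}
    salaries = [r["salary"] for r in provider_b if r["id"] in ids]
    return sum(salaries) / len(salaries) if salaries else None

print(query_avg_salary_by_city("Dresden"))  # 52000.0
```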

    Educational Technology and Education Conferences, January to June 2016


    A Systematic Mapping Study of Cloud Resources Management and Scalability in Brokering, Scheduling, Capacity Planning and Elasticity

    Cloud computing allows resources to be managed through various means, including brokering, scheduling, elasticity and capacity planning, and these processes help facilitate service utilization. Determining a particular research area, especially in terms of resource management and scalability in the cloud, is usually a cumbersome process for a researcher, hence the need for reviews and paper surveys to identify potential research gaps. The objective of this work was to carry out a systematic mapping study of resource management and scalability in the cloud. A systematic mapping study offers a summarized overview of studies that have been carried out in a particular area of interest and presents the results of such an overview graphically using a map. Although the systematic mapping process requires less effort, its results are more coarse-grained. In this study, publications were analyzed by topic, research type and contribution facet. These publications were research works that focused on resource management, scheduling, capacity planning, scalability and elasticity. The study classified publications into research facets (evaluation, validation, solution, philosophical, opinion and experience) and contribution facets (metrics, tools, processes, models and methods). The results showed that 31.3% of the considered publications focused on evaluation-based research, 19.85% on validation and 32% on processes. About 2.4% focused on metrics for capacity planning and 5.6% on tools relating to resource management, while 5.6% and 8% of the publications were on models for capacity planning and scheduling methods, respectively. Research works focusing on validating capacity planning and elasticity were the fewest, at 2.29% and 0.76%, respectively. This study clearly identifies gaps in the field of resource management and scalability in the cloud, which should stimulate interest in further studies by both researchers and industry practitioners.