
    Cloud Storage Performance and Security Analysis with Hadoop and GridFTP

    Even though cloud servers have been around for a few years, most web hosts today have not yet moved to the cloud. If the purpose of a cloud server is to distribute and store files on the internet, FTP servers predate the cloud and are sufficient for distributing content. Is it therefore worth shifting from an FTP server to a cloud server? Cloud storage providers declare high durability and availability for their users, and the ability to easily scale up storage space can save users a great deal of money. However, does the cloud provide higher performance and better security features? Hadoop is a very popular platform for cloud computing. It is free software under the Apache License, written in Java, and supports large-scale data processing in a distributed environment. Characteristics of Hadoop include partitioning of data, computing across thousands of hosts, and executing application computations in parallel. The Hadoop Distributed File System (HDFS) allows rapid data transfer at scales up to thousands of terabytes and is capable of operating even in the case of node failure. GridFTP supports high-speed data transfer over wide-area networks; it is based on FTP and features multiple data channels for parallel transfers. This report describes the technology behind HDFS and the enhancement of Hadoop's security features with Kerberos. Based on the data transfer performance and security features of HDFS and a GridFTP server, we can decide whether to replace the GridFTP server with HDFS. According to our experimental results, we conclude that the GridFTP server provides better throughput than HDFS, and that Kerberos has minimal impact on HDFS performance. We propose a solution in which users first authenticate with HDFS, then retrieve the file from the HDFS server to the client using GridFTP.
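
    The proposed hybrid flow (Kerberos authentication against HDFS, then file delivery over GridFTP) can be sketched as follows. This is an illustrative sketch only: the function names below are stubs I introduce for clarity, not real HDFS or GridFTP APIs; a real deployment would use `kinit`/GSSAPI for the first step and a tool such as `globus-url-copy` for the second.

```python
# Illustrative sketch of the abstract's proposed hybrid workflow.
# All functions here are hypothetical stubs, not real HDFS/GridFTP APIs.

def kerberos_authenticate(principal: str, keytab: str) -> bool:
    # Stub: a real system would obtain a Kerberos ticket (e.g. via kinit/GSSAPI)
    # and present it to the Kerberized HDFS NameNode.
    return bool(principal) and bool(keytab)

def fetch_via_gridftp(path: str, parallel_streams: int = 4) -> dict:
    # Stub: a real transfer would open multiple parallel data channels,
    # which is GridFTP's main throughput advantage over plain FTP.
    return {"path": path, "streams": parallel_streams, "ok": True}

def hybrid_download(principal: str, keytab: str, path: str) -> dict:
    # Step 1: authenticate with HDFS first (Kerberos).
    if not kerberos_authenticate(principal, keytab):
        raise PermissionError("Kerberos authentication with HDFS failed")
    # Step 2: move the actual bytes over GridFTP for better throughput.
    return fetch_via_gridftp(path)

result = hybrid_download("alice@EXAMPLE.COM", "/etc/alice.keytab", "/data/big.bin")
print(result["ok"])
```

    The design point mirrors the abstract's measurement: GridFTP wins on raw throughput, while Kerberized HDFS supplies the authentication step at little performance cost.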

    Only Aggressive Elephants are Fast Elephants

    Yellow elephants are slow. A major reason is that they consume their inputs entirely before responding to an elephant rider's orders. Some clever riders have trained their yellow elephants to consume only parts of the inputs before responding. However, the teaching time required to make an elephant do that is high; so high that the teaching lessons often do not pay off. We take a different approach: we make elephants aggressive, and only this will make them very fast. We propose HAIL (Hadoop Aggressive Indexing Library), an enhancement of HDFS and Hadoop MapReduce that dramatically improves the runtimes of several classes of MapReduce jobs. HAIL changes the upload pipeline of HDFS in order to create a different clustered index on each data block replica. An interesting feature of HAIL is that it typically creates a win-win situation: it improves both data upload to HDFS and the runtime of the actual Hadoop MapReduce job. In terms of data upload, HAIL improves over HDFS by up to 60% with the default replication factor of three. In terms of query execution, we demonstrate that HAIL runs up to 68x faster than Hadoop. In our experiments, we use six clusters, including physical and EC2 clusters of up to 100 nodes. A series of scalability experiments also demonstrates the superiority of HAIL.
    Comment: VLDB201
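
    HAIL's core trick, a different clustered index on each of the three block replicas, can be illustrated with a small sketch. This is an assumption-level toy, not HAIL's actual implementation: each "replica" is the same set of records sorted on a different attribute, so a selection predicate can be answered by binary search on whichever replica is clustered on the queried column.

```python
import bisect

# Toy illustration of per-replica clustered indexing (not the real HAIL code):
# the same records are stored three times, each copy sorted on a different
# column, matching the default HDFS replication factor of three.
records = [(3, "c", 30), (1, "a", 10), (2, "b", 20)]

# One "replica" per column, each clustered (sorted) on its own attribute.
replicas = {col: sorted(records, key=lambda r: r[col]) for col in range(3)}

def lookup(col, value):
    rep = replicas[col]                  # pick the replica clustered on `col`
    keys = [r[col] for r in rep]
    i = bisect.bisect_left(keys, value)  # O(log n) instead of a full scan
    return rep[i] if i < len(keys) and keys[i] == value else None

print(lookup(1, "b"))  # prints (2, 'b', 20)
```

    The win-win the abstract mentions follows from the same observation: the sorting work is folded into the upload pipeline that must write three replicas anyway, so the indexes come almost for free.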

    Lustre, Hadoop, Accumulo

    Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. There is a wide variety of technologies that can be used to store and process data through these different steps. The Lustre parallel file system, the Hadoop distributed file system, and the Accumulo database are all designed to address the largest and most challenging data storage problems. There have been many ad hoc comparisons of these technologies. This paper describes the foundational principles of each technology, provides simple models for assessing their capabilities, and compares the technologies on a hypothetical common cluster. These comparisons indicate that Lustre provides 2x more storage capacity, is less likely to lose data during three simultaneous drive failures, and provides higher bandwidth on general-purpose workloads. Hadoop can provide 4x greater read bandwidth on special-purpose workloads. Accumulo provides 10,000x lower latency on random lookups than either Lustre or Hadoop, but Accumulo's bulk bandwidth is 10x less. Significant recent work has been done to enable mix-and-match solutions that allow Lustre, Hadoop, and Accumulo to be combined in different ways.
    Comment: 6 pages; accepted to IEEE High Performance Extreme Computing conference, Waltham, MA, 201
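
    A back-of-the-envelope version of the kind of capacity model behind the "Lustre provides 2x more storage" comparison can be sketched as follows. The specific parameters are my assumptions, not the paper's exact model: HDFS keeps three full replicas of every block, while Lustre typically protects data with RAID-style parity (here, 8 data + 2 parity disks per group).

```python
# Simple usable-capacity model (illustrative assumptions, not the paper's
# exact parameters): HDFS triple replication vs. Lustre RAID6-style parity.

raw_tb = 1000.0                   # raw cluster capacity in TB

hdfs_usable = raw_tb / 3          # 3 replicas -> one third is usable
lustre_usable = raw_tb * 8 / 10   # 8 of every 10 disks hold data

ratio = lustre_usable / hdfs_usable
print(f"HDFS usable:   {hdfs_usable:.0f} TB")
print(f"Lustre usable: {lustre_usable:.0f} TB")
print(f"Lustre/HDFS capacity ratio: {ratio:.1f}x")
```

    Under these assumptions the ratio comes out around 2.4x, which is consistent in spirit with the roughly 2x figure quoted above; different parity group sizes shift the exact number.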

    Agent based modelling helps in understanding the rules by which fibroblasts support keratinocyte colony formation

    Background: Autologous keratinocytes are routinely expanded using irradiated mouse fibroblasts and bovine serum for clinical use. With growing concerns about the safety of these xenobiotic materials, it is desirable to culture keratinocytes in media without animal-derived products. An improved understanding of epithelial/mesenchymal interactions could assist in this. Methodology/Principal Findings: A keratinocyte/fibroblast co-culture model was developed by extending an agent-based keratinocyte colony formation model to include the response of keratinocytes to both fibroblasts and serum. The model was validated by comparison of the in virtuo and in vitro multicellular behaviour of keratinocytes and fibroblasts in single and co-culture in Green's medium. To test the robustness of the model, several properties of the fibroblasts were changed to investigate their influence on the multicellular morphogenesis of keratinocytes and fibroblasts. The model was then used to generate hypotheses exploring the interactions of both proliferative and growth-arrested fibroblasts with keratinocytes. The key predictions arising from the model, which were confirmed by in vitro experiments, were that 1) the ratio of fibroblasts to keratinocytes would critically influence keratinocyte colony expansion, 2) this ratio needed to be optimal at the beginning of the co-culture, 3) proliferative fibroblasts would be more effective than irradiated cells in expanding keratinocytes, and 4) in the presence of an adequate number of fibroblasts, keratinocyte expansion would be independent of serum. Conclusions: A closely associated computational and biological approach is a powerful tool for understanding complex biological systems such as the interactions between keratinocytes and fibroblasts. The key outcome of this study is the finding that the early addition of a critical ratio of proliferative fibroblasts can give rapid keratinocyte expansion without the use of irradiated mouse fibroblasts and bovine serum.
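
    The agent-based modelling approach described above can be caricatured in a few lines. The rules and numbers below are illustrative assumptions of mine, not the paper's calibrated model: each keratinocyte's chance of dividing per step grows with the fibroblast:keratinocyte ratio and saturates, so colony expansion depends strongly on the starting ratio, echoing prediction 1).

```python
import random

# Toy agent-based sketch (illustrative rules, not the paper's model):
# keratinocyte division probability rises with fibroblast support and
# saturates at a cap, so the fibroblast:keratinocyte ratio drives expansion.

def simulate(keratinocytes: int, fibroblasts: int, steps: int = 20, seed: int = 0) -> int:
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(steps):
        ratio = fibroblasts / max(keratinocytes, 1)
        p_divide = min(0.3, 0.15 * ratio)  # support saturates at an optimum
        births = sum(1 for _ in range(keratinocytes) if rng.random() < p_divide)
        keratinocytes += births
    return keratinocytes

low = simulate(100, 20)    # too few fibroblasts: little support
high = simulate(100, 200)  # ample fibroblasts: rapid expansion
print(low, high)
```

    Even this crude sketch reproduces the qualitative behaviour the paper tests in vitro: with an adequate fibroblast ratio at the start, the colony expands much faster.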