Cloud Storage Performance and Security Analysis with Hadoop and GridFTP
Even though cloud servers have been around for several years, most web hosts today have not yet moved to the cloud. If the purpose of a cloud server is to distribute and store files on the internet, FTP servers predate the cloud and are sufficient for distributing content. Is it therefore worth shifting from an FTP server to a cloud server? Cloud storage providers advertise high durability and availability for their users, and the ability to scale up to more storage space easily can save users a great deal of money. But do they provide higher performance and better security? Hadoop is a very popular platform for cloud computing. It is free software under the Apache License, written in Java, and supports large-scale data processing in a distributed environment. Characteristics of Hadoop include partitioning of data, computing across thousands of hosts, and executing application computations in parallel. The Hadoop Distributed File System (HDFS) allows rapid data transfer at scales of thousands of terabytes and is capable of operating even in the case of node failure. GridFTP supports high-speed data transfer over wide-area networks; it is based on FTP and features multiple data channels for parallel transfers. This report describes the technology behind HDFS and the enhancement of Hadoop's security features with Kerberos. Based on the data transfer performance and security features of HDFS and a GridFTP server, we can decide whether the GridFTP server should be replaced with HDFS. According to our experimental results, we conclude that the GridFTP server provides better throughput than HDFS, and that Kerberos has minimal impact on HDFS performance. We propose a solution in which users first authenticate with HDFS and then transfer the file from the HDFS server to the client using GridFTP.
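The trade-off the abstract describes can be sketched with a simple transfer-time model: GridFTP stripes a file across parallel data channels, while an authentication step (such as a Kerberos handshake) adds a fixed cost per transfer. All numbers below (per-channel bandwidth, link capacity, handshake overhead) are illustrative assumptions, not measurements from the report.

```python
def transfer_time(size_gb, n_channels, per_channel_gbps=1.0,
                  link_gbps=10.0, auth_overhead_s=0.5):
    """Estimate wall-clock seconds for a transfer striped across
    parallel data channels (as GridFTP does), plus a fixed
    authentication overhead (e.g. a Kerberos handshake)."""
    # Aggregate bandwidth is capped by the wide-area link itself.
    effective_gbps = min(n_channels * per_channel_gbps, link_gbps)
    return auth_overhead_s + (size_gb * 8) / effective_gbps

# Parallel channels help until the link is saturated; the fixed
# auth overhead stays negligible for large files, which is
# consistent with Kerberos having minimal impact on throughput.
t1 = transfer_time(100, n_channels=1)    # single channel
t4 = transfer_time(100, n_channels=4)    # four parallel channels
t16 = transfer_time(100, n_channels=16)  # link-limited
```

Under these assumptions the transfer is channel-limited up to 10 parallel streams, after which adding channels buys nothing.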
Only Aggressive Elephants are Fast Elephants
Yellow elephants are slow. A major reason is that they consume their inputs
entirely before responding to an elephant rider's orders. Some clever riders
have trained their yellow elephants to only consume parts of the inputs before
responding. However, the teaching time to make an elephant do that is high. So
high that the teaching lessons often do not pay off. We take a different
approach. We make elephants aggressive; only this will make them very fast. We
propose HAIL (Hadoop Aggressive Indexing Library), an enhancement of HDFS and
Hadoop MapReduce that dramatically improves runtimes of several classes of
MapReduce jobs. HAIL changes the upload pipeline of HDFS in order to create
different clustered indexes on each data block replica. An interesting feature
of HAIL is that we typically create a win-win situation: we improve both data
upload to HDFS and the runtime of the actual Hadoop MapReduce job. In terms of
data upload, HAIL improves over HDFS by up to 60% with the default replication
factor of three. In terms of query execution, we demonstrate that HAIL runs up
to 68x faster than Hadoop. In our experiments, we use six clusters including
physical and EC2 clusters of up to 100 nodes. A series of scalability
experiments also demonstrates the superiority of HAIL.
Comment: VLDB201
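The core idea of HAIL's upload pipeline, as the abstract describes it, is to exploit the replication HDFS performs anyway: each of the (by default three) replicas of a block is stored clustered on a different column, so a selective query can be routed to the replica sorted on its filter column and answered with a binary search instead of a full scan. The toy below is a minimal sketch of that routing idea, not HAIL's actual implementation; the records and column names are invented.

```python
import bisect

# Toy records: (order_id, customer, amount) -- invented for illustration.
records = [(3, "ada", 40), (1, "bob", 75), (2, "cyd", 10)]

# Mimic HAIL's idea: each of the three block replicas is stored
# sorted (clustered) on a different column, chosen at upload time.
replicas = {
    "order_id": sorted(records, key=lambda r: r[0]),
    "customer": sorted(records, key=lambda r: r[1]),
    "amount":   sorted(records, key=lambda r: r[2]),
}
COL = {"order_id": 0, "customer": 1, "amount": 2}

def lookup(column, value):
    """Route a selection to the replica clustered on `column` and
    binary-search it instead of scanning the whole block."""
    rep = replicas[column]
    keys = [r[COL[column]] for r in rep]
    i = bisect.bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return rep[i]
    return None
```

Because the sort happens during the upload that HDFS performs anyway, the indexing cost is largely hidden, which is why the abstract can claim a win on both upload time and query time.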
Lustre, Hadoop, Accumulo
Data processing systems impose multiple views on data as it is processed by
the system. These views include spreadsheets, databases, matrices, and graphs.
There are a wide variety of technologies that can be used to store and process
data through these different steps. The Lustre parallel file system, the Hadoop
distributed file system, and the Accumulo database are all designed to address
the largest and the most challenging data storage problems. There have been
many ad-hoc comparisons of these technologies. This paper describes the
foundational principles of each technology, provides simple models for
assessing their capabilities, and compares the various technologies on a
hypothetical common cluster. These comparisons indicate that Lustre provides 2x
more storage capacity, is less likely to lose data during 3 simultaneous drive
failures, and provides higher bandwidth on general purpose workloads. Hadoop
can provide 4x greater read bandwidth on special purpose workloads. Accumulo
provides 10,000x lower latency on random lookups than either Lustre or Hadoop
but Accumulo's bulk bandwidth is 10x less. Significant recent work has been
done to enable mix-and-match solutions that allow Lustre, Hadoop, and Accumulo
to be combined in different ways.
Comment: 6 pages; accepted to IEEE High Performance Extreme Computing conference, Waltham, MA, 201
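The claim that Lustre provides roughly 2x more usable storage follows directly from how each system protects data: HDFS keeps three full copies of every block, while Lustre typically sits on RAID-style parity arrays. The sketch below assumes an 8+2 parity layout purely for illustration; the paper's exact model may differ.

```python
def usable_fraction_replication(copies=3):
    """HDFS-style full replication: usable = raw / copies."""
    return 1.0 / copies

def usable_fraction_raid(data_disks=8, parity_disks=2):
    """Lustre-style RAID parity (e.g. an assumed 8+2 layout):
    usable = data disks / total disks."""
    return data_disks / (data_disks + parity_disks)

hdfs = usable_fraction_replication(3)  # ~0.33 of raw capacity
lustre = usable_fraction_raid(8, 2)    # 0.80 of raw capacity
ratio = lustre / hdfs                  # ~2.4x under these assumptions
```

Parity coding trades CPU (parity computation) and rebuild time for capacity, whereas triplication trades capacity for simple, fast recovery and data-local reads, which is the crux of the comparison above.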
Agent based modelling helps in understanding the rules by which fibroblasts support keratinocyte colony formation
Background: Autologous keratinocytes are routinely expanded using irradiated mouse fibroblasts and bovine serum for clinical use. With growing concerns about the safety of these xenobiotic materials, it is desirable to culture keratinocytes in media without animal-derived products. An improved understanding of epithelial/mesenchymal interactions could assist in this.
Methodology/Principal Findings: A keratinocyte/fibroblast co-culture model was developed by extending an agent-based keratinocyte colony formation model to include the response of keratinocytes to both fibroblasts and serum. The model was validated by comparison of the in virtuo and in vitro multicellular behaviour of keratinocytes and fibroblasts in single and co-culture in Green's medium. To test the robustness of the model, several properties of the fibroblasts were changed to investigate their influence on the multicellular morphogenesis of keratinocytes and fibroblasts. The model was then used to generate hypotheses to explore the interactions of both proliferative and growth-arrested fibroblasts with keratinocytes. The key predictions arising from the model, which were confirmed by in vitro experiments, were that 1) the ratio of fibroblasts to keratinocytes would critically influence keratinocyte colony expansion, 2) this ratio needed to be optimal at the beginning of the co-culture, 3) proliferative fibroblasts would be more effective than irradiated cells in expanding keratinocytes, and 4) in the presence of an adequate number of fibroblasts, keratinocyte expansion would be independent of serum.
Conclusions: A closely associated computational and biological approach is a powerful tool for understanding complex biological systems such as the interactions between keratinocytes and fibroblasts. The key outcome of this study is the finding that the early addition of a critical ratio of proliferative fibroblasts can give rapid keratinocyte expansion without the use of irradiated mouse fibroblasts and bovine serum.
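The kind of agent-based rule the abstract refers to can be illustrated with a deliberately tiny model: cells on a 1-D strip, where a keratinocyte divides into an adjacent empty site only when a fibroblast is close enough to "support" it. This toy is invented for illustration; the division rule, support radius, and geometry are not taken from the paper's model.

```python
# 1-D strip of culture dish: K = keratinocyte, F = fibroblast, . = empty.
def step(strip, support_radius=3):
    """One update: each keratinocyte divides into a neighbouring empty
    site only if a fibroblast lies within `support_radius` cells."""
    new = list(strip)
    for i, cell in enumerate(strip):
        if cell != "K":
            continue
        lo, hi = max(0, i - support_radius), min(len(strip), i + support_radius + 1)
        if "F" not in strip[lo:hi]:
            continue  # no fibroblast support: growth-arrested
        for j in (i - 1, i + 1):  # try to occupy a neighbouring site
            if 0 <= j < len(strip) and new[j] == ".":
                new[j] = "K"
                break
    return "".join(new)

def colony_size(strip, steps=5):
    for _ in range(steps):
        strip = step(strip)
    return strip.count("K")

# Colony expansion depends on fibroblast proximity, echoing the
# prediction that the fibroblast:keratinocyte ratio is critical.
with_f = colony_size("F..K..F...")
without_f = colony_size("...K......")
```

Even this minimal rule set reproduces the qualitative behaviour the study relies on: emergent colony-level outcomes (expansion versus arrest) fall out of purely local cell-to-cell interaction rules.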