36,047 research outputs found
SEARS: Space Efficient And Reliable Storage System in the Cloud
Today's cloud storage services must offer storage reliability and fast data
retrieval for large amount of data without sacrificing storage cost. We present
SEARS, a cloud-based storage system which integrates erasure coding and data
deduplication to support efficient and reliable data storage with fast user
response time. With proper association of data to storage server clusters,
SEARS provides flexible mixing of different configurations, suitable for
real-time and archival applications.
Our prototype implementation of SEARS over Amazon EC2 shows that it
outperforms existing storage systems in storage efficiency and file retrieval
time. For 3 MB files, SEARS delivers retrieval time of s compared to
s with existing systems.Comment: 4 pages, IEEE LCN 201
Locality-aware scientific workflow engine for fast-evolving spatiotemporal sensor data, A
2017 Spring.Includes bibliographical references.Discerning knowledge from voluminous data involves a series of data manipulation steps. Scientists typically compose and execute workflows for these steps using scientific workflow management systems (SWfMSs). SWfMSs have been developed for several research communities including but not limited to bioinformatics, biology, astronomy, computational science, and physics. Parallel execution of workflows has been widely employed in SWfMSs by exploiting the storage and computing resources of grid and cloud services. However, none of these systems have been tailored for the needs of spatiotemporal analytics on real-time sensor data with high arrival rates. This thesis demonstrates the development and evaluation of a target-oriented workflow model that enables a user to specify dependencies among the workflow components, including data availability. The underlying spatiotemporal data dispersion and indexing scheme provides fast data search and retrieval to plan and execute computations comprising the workflow. This work includes a scheduling algorithm that targets minimizing data movement across machines while ensuring fair and efficient resource allocation among multiple users. The study includes empirical evaluations performed on the Google cloud
Design Architecture-Based on Web Server and Application Cluster in Cloud Environment
Cloud has been a computational and storage solution for many data centric
organizations. The problem today those organizations are facing from the cloud
is in data searching in an efficient manner. A framework is required to
distribute the work of searching and fetching from thousands of computers. The
data in HDFS is scattered and needs lots of time to retrieve. The major idea is
to design a web server in the map phase using the jetty web server which will
give a fast and efficient way of searching data in MapReduce paradigm. For real
time processing on Hadoop, a searchable mechanism is implemented in HDFS by
creating a multilevel index in web server with multi-level index keys. The web
server uses to handle traffic throughput. By web clustering technology we can
improve the application performance. To keep the work down, the load balancer
should automatically be able to distribute load to the newly added nodes in the
server
- …