Evaluating Cassandra as a manager of large file sets
All companies developing their business on the Web, not only giants like Google or Facebook but also small companies focused on niche markets, face scalability issues in data management. The case study of this paper is content management systems for classified or commercial advertisements on the Web. The data involved has a very significant growth rate and a read-intensive access pattern with a low update rate. Typically, data is stored in traditional file systems hosted on dedicated servers or Storage Area Network devices because file systems are widely adopted and easy to use. However, this ease of implementation and usage has a disadvantage: the centralized nature of these systems leads to availability, elasticity and scalability problems. The scenario under study, undemanding in terms of the system's consistency and with a simple interaction model, is well suited to a distributed database such as Cassandra, conceived precisely to dynamically handle large volumes of data. In this paper, we analyze the suitability of Cassandra as a substitute for file systems in content management systems. The evaluation, conducted using real data from a production system, shows that with Cassandra one can easily obtain horizontal scalability of storage, redundancy across multiple independent nodes, and distribution of the load imposed by the periodic activities of safeguarding data, while ensuring performance comparable to that of a file system.
Design Architecture-Based on Web Server and Application Cluster in Cloud Environment
The cloud has been a computational and storage solution for many data-centric
organizations. The problem these organizations face today with the cloud
is searching data efficiently. A framework is required to
distribute the work of searching and fetching across thousands of computers. The
data in HDFS is scattered and takes a long time to retrieve. The main idea is
to design a web server in the map phase using the Jetty web server, which
provides a fast and efficient way of searching data in the MapReduce paradigm. For
real-time processing on Hadoop, a search mechanism is implemented in HDFS by
creating a multilevel index in the web server with multilevel index keys. The web
server is used to handle traffic throughput. With web clustering technology we can
improve application performance. To keep the work down, the load balancer
should automatically distribute load to the newly added nodes in the
server
- …