Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, the fundamental design decisions in big data systems design include
choosing appropriate storage and computing infrastructures. In this age of
heterogeneous systems that integrate different technologies into an optimized
solution for a specific real-world problem, big data systems are no exception.
As far as the storage aspect of a big data system is concerned, the primary
facet is the storage infrastructure, and NoSQL appears to be the technology
that best fulfills its requirements. However, every big data application has
different data characteristics, and its data therefore fits a different data
model. This paper presents a feature and use-case analysis and comparison of
the four main data models, namely document-oriented, key-value, graph, and
wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided,
elaborating on the criteria a developer must consider when making a choice.
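To make the contrast between these data models concrete, here is a minimal,
illustrative sketch (not tied to any particular NoSQL product; the record
fields and keys are invented for the example) of how the same user record
might be laid out under each of the four models:

```python
# Illustrative sketch: the same "user" record expressed under the four
# NoSQL data models compared in the paper. Field names are made up for
# the example; no specific database product is implied.

# Document-oriented: the record is a self-contained, nested document.
document = {
    "_id": "user:42",
    "name": "Alice",
    "orders": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
}

# Key-value: opaque values addressed only by key; the application decides
# how to serialize and interpret the value.
key_value = {
    "user:42": '{"name": "Alice", "orders": ["A-100", "B-200"]}',
}

# Wide-column: rows grouped into column families, with sparse columns.
wide_column = {
    "user:42": {
        "profile": {"name": "Alice"},
        "orders": {"A-100": 2, "B-200": 1},
    },
}

# Graph: entities as nodes, relationships as explicit edges.
graph = {
    "nodes": {"user:42": {"name": "Alice"}, "sku:A-100": {"title": "Widget"}},
    "edges": [("user:42", "ORDERED", "sku:A-100", {"qty": 2})],
}
```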
Typically, big data storage needs to communicate with the execution engine and
with other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the paper compares the
advantages, shortcomings, and possible use cases of the available big data
file formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed.
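As a rough sketch of the row-oriented versus columnar trade-off that such a
file-format comparison covers, the following example writes the same small
table as CSV-style text and as Parquet (assuming the pyarrow library is
available; the file names and fields are arbitrary):

```python
# Rough illustration of the row-oriented vs. columnar trade-off discussed
# for Hadoop file formats. Requires pyarrow; file names are arbitrary.
import csv
import pyarrow as pa
import pyarrow.parquet as pq

rows = [
    {"user_id": 1, "country": "DE", "amount": 19.99},
    {"user_id": 2, "country": "US", "amount": 5.49},
]

# Row-oriented text (CSV-like): simple and splittable, but every query
# has to read and parse whole rows.
with open("events.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["user_id", "country", "amount"])
    writer.writeheader()
    writer.writerows(rows)

# Columnar (Parquet): column pruning and per-column compression, which
# favors analytical scans over a few columns of a wide table.
table = pa.Table.from_pylist(rows)
pq.write_table(table, "events.parquet")

# An engine can then read only the columns it needs.
amounts = pq.read_table("events.parquet", columns=["amount"])
```

Columnar formats such as Parquet or ORC tend to pay off when analytical
queries touch only a few columns of a wide table, while row-oriented formats
remain convenient for record-at-a-time ingestion.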
Auto-Pipe and the X Language: A Toolset and Language for the Simulation, Analysis, and Synthesis of Heterogeneous Pipelined Architectures, Master's Thesis, August 2006
Pipelining an algorithm is a popular method of increasing the performance of many computation-intensive applications. Often, one wants to form pipelines composed mostly of commonly used simple building blocks such as DSP components, simple math operations, encryption, or pattern-matching stages. Additionally, one may desire to map these processing tasks to different computational resources based on their relative performance attributes (e.g., DSP operations on an FPGA). Auto-Pipe is composed of the X Language, a flexible interface language that aids the description of complex dataflow topologies (including pipelines); X-Com, a compiler for the X Language; X-Sim, a tool for modeling pipelined architectures based on measured, simulated, or derived task and communications behavior; X-Opt, a tool to optimize X applications under various metrics; and X-Dep, a tool for the automatic deployment of X-Com- or X-Sim-generated applications to real or simulated devices. This thesis presents an overview of the Auto-Pipe system, the design and use of the X Language, and an implementation of X-Com. Applications developed using the X Language are presented, demonstrating the effectiveness of describing algorithms in X and the effectiveness of the Auto-Pipe development flow in analyzing and improving the performance of an application.
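The X Language syntax itself is not reproduced here, but as a conceptual
sketch only (not X syntax and not the Auto-Pipe API; the block names and the
'resource' tags are invented), the following Python illustrates the underlying
idea of describing an application as a dataflow of reusable building blocks
and annotating which resource each block is mapped to:

```python
# Conceptual sketch only: a dataflow "pipeline of building blocks" in the
# spirit of what the X Language describes. This is not X syntax and not
# the Auto-Pipe toolset; the blocks and resource tags are invented.
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Block:
    name: str
    fn: Callable[[float], float]
    resource: str          # e.g. "cpu" or "fpga" -- the mapping decision

def run_pipeline(blocks: Iterable[Block], samples: Iterable[float]):
    """Push each sample through every block in order."""
    for x in samples:
        for b in blocks:
            x = b.fn(x)
        yield x

# Three simple stages standing in for DSP-style building blocks.
pipeline = [
    Block("scale",  lambda x: 2.0 * x, resource="cpu"),
    Block("offset", lambda x: x + 1.0, resource="cpu"),
    Block("square", lambda x: x * x,   resource="fpga"),
]

print(list(run_pipeline(pipeline, [1.0, 2.0, 3.0])))  # [9.0, 25.0, 49.0]
```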
Effective Performance Analysis and Debugging
Performance is once again a first-class concern. Developers can no longer wait for the next generation of processors to automatically optimize their software. Unfortunately, existing techniques for performance analysis and debugging cannot cope with complex modern hardware, concurrent software, or latency-sensitive software services.
While processor speeds have remained constant, increasing transistor counts have allowed architects to increase processor complexity. This complexity often improves performance, but the benefits can be brittle; small changes to a program’s code, inputs, or execution environment can dramatically change performance, resulting in unpredictable performance in deployed software and complicating performance evaluation and debugging. Developers seeking to improve performance must resort to manual performance tuning for large performance gains. Software profilers are meant to guide developers to important code, but conventional profilers do not produce actionable information for concurrent applications. These profilers report where a program spends its time, not where optimizations will yield performance improvements. Furthermore, latency is a critical measure of performance for software services and interactive applications, but conventional profilers measure only throughput. Many performance issues appear only when a system is under high load, but generating this load in development is often impossible. Developers need to identify and mitigate scalability issues before deploying software, but existing tools offer developers little or no assistance.
In this dissertation, I introduce an empirically-driven approach to performance analysis and debugging. I present three systems for performance analysis and debugging. Stabilizer mitigates the performance variability that is inherent in modern processors, enabling both predictable performance in deployment and statistically sound performance evaluation. Coz conducts performance experiments using virtual speedups to create the effect of an optimization in a running application. This approach accurately predicts the effect of hypothetical optimizations, guiding developers to code where optimizations will have the largest effect. Amp allows developers to evaluate system scalability using load amplification to create the effect of high load in a testing environment. In combination, Amp and Coz allow developers to pinpoint code where manual optimizations will improve the scalability of their software.
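As a back-of-the-envelope illustration of the gap between "where time is
spent" and "where an optimization helps" that virtual speedups are meant to
expose, consider the following toy model (purely illustrative; it is not Coz's
implementation or API, and the stage timings are invented) of a two-stage
producer/consumer pipeline:

```python
# Toy model only -- not Coz's implementation or API. It illustrates why
# "where time is spent" and "where an optimization helps" can differ in a
# concurrent pipeline, which is the gap causal profiling targets.

def throughput(workers_a, time_a, time_b):
    """Items/second of a two-stage pipeline: stage A runs on several worker
    threads feeding a single serial stage B; the slower stage limits it."""
    return min(workers_a / time_a, 1.0 / time_b)

WORKERS_A, TIME_A, TIME_B = 4, 2.0, 0.6   # invented per-item costs in seconds

base = throughput(WORKERS_A, TIME_A, TIME_B)

# A conventional time profile attributes most CPU time to stage A ...
cpu_share_a = TIME_A / (TIME_A + TIME_B)
print(f"stage A's share of CPU time: {cpu_share_a:.0%}")          # ~77%

# ... but a (virtual) 20% speedup of each stage shows where it pays off.
for name, t_a, t_b in [("speed up stage A by 20%", TIME_A * 0.8, TIME_B),
                       ("speed up stage B by 20%", TIME_A, TIME_B * 0.8)]:
    gain = throughput(WORKERS_A, t_a, t_b) / base - 1.0
    print(f"{name}: {gain:+.0%} end-to-end throughput")           # +0% vs +20%
```

In this toy model a time profile points at stage A, yet only speeding up
stage B improves end-to-end throughput, which is the kind of conclusion a
causal profiling experiment is designed to surface.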