1,576 research outputs found
PABED A Tool for Big Education Data Analysis
Cloud computing and big data have risen to become the most popular
technologies of the modern world. Apparently, the reason behind their immense
popularity is their wide range of applicability as far as the areas of interest
are concerned. Education and research remain one of the most obvious and
befitting application areas. This research paper introduces a big data
analytics tool, PABED Project Analyzing Big Education Data, for the education
sector that makes use of cloud-based technologies. This tool is implemented
using Google BigQuery and R programming language and allows comparison of
undergraduate enrollment data for different academic years. Although, there are
many proposed applications of big data in education, there is a lack of tools
that can actualize the concept into practice. PABED is an effort in this
direction. The implementation and testing details of the project have been
described in this paper. This tool validates the use of cloud computing and big
data technologies in education and shall head start development of more
sophisticated educational intelligence tools
Big Data Analytics: A Survey
Internet-based programs and communication techniques have become widely used and respected in the IT industry recently. A persistent source of "big data," or data that is enormous in volume, diverse in type, and has a complicated multidimensional structure, is internet applications and communications. Today, several measures are routinely performed with no assurance that any of them will be helpful in understanding the phenomenon of interest in an era of automatic, large-scale data collection. Online transactions that involve buying, selling, or even investing are all examples of e-commerce. As a result, they generate data that has a complex structure and a high dimension. The usual data storage techniques cannot handle those enormous volumes of data. There is a lot of work being done to find ways to minimize the dimensionality of big data in order to provide analytics reports that are even more accurate and data visualizations that are more interesting. As a result, the purpose of this survey study is to give an overview of big data analytics along with related problems and issues that go beyond technology
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solution to a specific real world problem, big data system are not an exception
to any such rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings forth second facet of big data storage, big data file
formats, into picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage and its challenges and future prospects have
also been discussed
Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture
We present the architecture behind Twitter's real-time related query
suggestion and spelling correction service. Although these tasks have received
much attention in the web search literature, the Twitter context introduces a
real-time "twist": after significant breaking news events, we aim to provide
relevant results within minutes. This paper provides a case study illustrating
the challenges of real-time data processing in the era of "big data". We tell
the story of how our system was built twice: our first implementation was built
on a typical Hadoop-based analytics stack, but was later replaced because it
did not meet the latency requirements necessary to generate meaningful
real-time results. The second implementation, which is the system deployed in
production, is a custom in-memory processing engine specifically designed for
the task. This experience taught us that the current typical usage of Hadoop as
a "big data" platform, while great for experimentation, is not well suited to
low-latency processing, and points the way to future work on data analytics
platforms that can handle "big" as well as "fast" data
- …