312,040 research outputs found
NOSQL design for analytical workloads: Variability matters
Big Data has recently gained popularity and has strongly questioned relational databases as universal storage systems, especially in the presence of analytical workloads. As a result, co-relational alternatives, commonly known as NOSQL (Not Only SQL) databases, are extensively used for Big Data. As the primary focus of NOSQL is on performance, NOSQL databases are directly designed at the physical level, and consequently the resulting schema is tailored to the dataset and access patterns of the problem at hand. However, we believe that NOSQL design can also benefit from traditional design approaches. In this paper we present a method to design databases for analytical workloads. Starting from the conceptual model and adopting the classical 3-phase design used for relational databases, we propose a novel design method considering the new features brought by NOSQL and encompassing relational and co-relational design altogether. Peer Reviewed. Postprint (author's final draft)
Big Data Dimensional Analysis
The ability to collect and analyze large amounts of data is a growing problem
within the scientific community. The growing gap between data and users calls
for innovative tools that address the challenges faced by big data volume,
velocity and variety. One of the main challenges associated with big data
variety is automatically understanding the underlying structures and patterns
of the data. Such an understanding is required as a pre-requisite to the
application of advanced analytics to the data. Further, big data sets often
contain anomalies and errors that are difficult to know a priori. Current
approaches to understanding data structure are drawn from the traditional
database ontology design. These approaches are effective, but often require too
much human involvement to be effective for the volume, velocity and variety of
data encountered by big data systems. Dimensional Data Analysis (DDA) is a
proposed technique that allows big data analysts to quickly understand the
overall structure of a big dataset and determine anomalies. DDA exploits
structures that exist in a wide class of data to quickly determine the nature
of the data and its statistical anomalies. DDA leverages existing schemas that are
employed in big data databases today. This paper presents DDA, applies it to a
number of data sets, and measures its performance. The overhead of DDA is low
and can be applied to existing big data systems without greatly impacting their
computing requirements. Comment: From IEEE HPEC 201
Exploiting microvariation: How to make the best of your incomplete data
In this article we discuss the use of large corpora or databases as a first step for qualitative analysis of linguistic data. We concentrate on ASIt, the Syntactic Atlas of Italy, and take into consideration the different types of dialectal data that can be collected from similar corpora and databases. We analyze the methodological problems derived from the necessary compromise between the strict requirements imposed by a scientific inquiry and the management of large amounts of data. As a possible solution, we propose that the type of variation is per se a tool to derive meaningful generalizations. To implement this idea, we examine three different types of variation patterns that can be used in the study of morpho-syntax: the geographical distribution of properties (and their total or partial overlapping, or complementary distribution), the so-called leopard spots variation, and the lexical variation index, which can be used to determine the internal complexity of functional items.
Big Data
Big data implies performing computation and database operations for massive amounts of data, remotely from the data owner's enterprise. Since a key value proposition of big data is access to data from multiple and diverse domains, security and privacy will play a very important role in big data research and technology. Making effective use of big data requires access from any domain to data in that domain, or any other domain it is authorized to access. Big data to date has been all about the technologies: NOSQL databases, HADOOP, in-memory processing, etc. However, at the end of the day, big data is about how to create value from data.
Evaluation of Hadoop/Mapreduce Framework Migration Tools
In distributed systems, database migration is not an easy task. Companies will encounter challenges moving data, including legacy data, to the big data platform. This paper reviews some tools for migrating from traditional databases to the big data platform and suggests a model based on that review.
Big Data Model Simulation on a Graph Database for Surveillance in Wireless Multimedia Sensor Networks
Sensors are present in various forms all around the world such as mobile
phones, surveillance cameras, smart televisions, intelligent refrigerators and
blood pressure monitors. Usually, most of the sensors are a part of some other
system with similar sensors that compose a network. One such network is
composed of millions of sensors connected to the Internet, which is called the
Internet of Things (IoT). With the advances in wireless communication
technologies, multimedia sensors and their networks are expected to be major
components in IoT. Many studies have already been done on wireless multimedia
sensor networks in diverse domains like fire detection, city surveillance,
early warning systems, etc. All those applications position sensor nodes and
collect their data for a long time period with real-time data flow, which is
considered as big data. Big data may be structured or unstructured and needs to
be stored for further processing and analyzing. Analyzing multimedia big data
is a challenging task requiring a high-level modeling to efficiently extract
valuable information/knowledge from data. In this study, we propose a big
database model based on the graph database model for handling data generated by
wireless multimedia sensor networks. We introduce a simulator to generate
synthetic data and to store and query big data using a graph model as a big
database. For this purpose, we evaluate the well-known graph-based NoSQL
databases, Neo4j and OrientDB, and a relational database, MySQL. We have run a
number of query experiments on our implemented simulator to show which
database system(s) for surveillance in wireless multimedia sensor networks are
efficient and scalable.
A Mathematical Theory of Big Data
This article presents a cardinality approach to big data, a fuzzy logic-based approach to big data, a similarity-based approach to big data, and a logical approach to the marketing strategy of social networking services. All these together constitute a mathematical theory of big data. This article also examines databases with infinite attributes. The research results reveal that relativity and infinity are two characteristics of big data. The relativity of big data is based on the theory of fuzzy sets. The relativity of big data leads to a continuum from small data to big data, and to big data-driven small data analytics becoming statistically significant. The infinity of big data is based on calculus and cardinality theory. The infinity of big data leads to the infinite similarity of big data. The theory proposed in this article might facilitate the mathematical research and development of big data, big data analytics, big data computing, and data science with applications in intelligent business analytics and business intelligence.