Schema Inference for Massive JSON Datasets
In recent years, JSON has established itself as a very popular data format for representing massive data collections. JSON data collections are usually schemaless. While this ensures several advantages, the absence of schema information has important negative consequences: the correctness of complex queries and programs cannot be statically checked, users cannot rely on schema information to quickly figure out the structural properties that could speed up the formulation of correct queries, and many schema-based optimizations are not possible.
In this paper we deal with the problem of inferring a schema from massive JSON datasets. We first identify a JSON type language which is simple and, at the same time, expressive enough to capture irregularities and to give complete structural information about input data. We then present our main contribution, which is the design of a schema inference algorithm, its theoretical study, and its implementation based on Spark, enabling reasonable schema inference time for massive collections. Finally, we report on an experimental analysis showing the effectiveness of our approach in terms of execution time, precision, and conciseness of inferred schemas, and scalability.
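The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general map-then-merge shape such an inference can take, under stated assumptions. The type representation (a dict keyed by kind) and the names `infer`, `fuse`, `fuse_records`, and `infer_schema` are hypothetical and are not taken from the paper. Each document is mapped to a structural type, and types are combined with an associative, commutative merge, which is the property that lets a framework such as Spark reduce partitions in any order.

```python
import json
from functools import reduce

# Hypothetical type representation: a dict keyed by kind.
# Several kinds in one dict stand for a union; {} is the empty type.
ATOMS = {type(None): "null", bool: "bool", int: "num", float: "num", str: "str"}

def infer(value):
    """Map step: infer a structural type for a single JSON value."""
    if isinstance(value, dict):
        return {"record": {k: {"type": infer(v), "optional": False}
                           for k, v in value.items()}}
    if isinstance(value, list):
        # All element types are merged into one element type.
        return {"array": reduce(fuse, map(infer, value), {})}
    return {ATOMS[type(value)]: True}

def fuse_records(r, s):
    """Merge two record types; a field missing on one side becomes optional."""
    out = {}
    for name in r.keys() | s.keys():
        if name in r and name in s:
            out[name] = {"type": fuse(r[name]["type"], s[name]["type"]),
                         "optional": r[name]["optional"] or s[name]["optional"]}
        else:
            field = r[name] if name in r else s[name]
            out[name] = {"type": field["type"], "optional": True}
    return out

def fuse(a, b):
    """Reduce step: associative, commutative merge of two types."""
    out = {}
    for kind in a.keys() | b.keys():
        if kind in a and kind in b:
            if kind == "array":
                out[kind] = fuse(a[kind], b[kind])
            elif kind == "record":
                out[kind] = fuse_records(a[kind], b[kind])
            else:
                out[kind] = True
        else:
            out[kind] = a[kind] if kind in a else b[kind]
    return out

def infer_schema(lines):
    """Infer one type for a collection of JSON documents (one per string)."""
    return reduce(fuse, (infer(json.loads(line)) for line in lines), {})

docs = ['{"id": 1, "tags": ["a", "b"]}', '{"id": "x7", "name": null}']
schema = infer_schema(docs)
```

In this sketch, irregularities show up as unions (`id` is a number in one document and a string in the other) and as optional fields (`tags` and `name` each appear in only one document).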
A Type System for Interactive JSON Schema Inference (Extended Abstract)
In this paper we present the first JSON type system that provides the possibility of inferring a schema by adopting different levels of precision/succinctness for different parts of the dataset, under user control. This feature gives the data analyst the possibility to have detailed schemas for the parts of the data of greater interest, while a more succinct schema is provided for other parts. The decision can be changed as many times as needed, in order to explore the schema in a gradual fashion, moving the focus to different parts of the collection, without the need to reprocess data and by only performing type rewriting operations on the most precise schema.
JSONoid: Monoid-based Enrichment for Configurable and Scalable Data-Driven Schema Discovery
Schema discovery is an important aspect of working with data in formats such
as JSON. Unlike relational databases, JSON data sets often do not have
associated structural information. Consumers of such datasets are often left to
browse through data in an attempt to observe commonalities in structure across
documents to construct suitable code for data processing. However, this process
is time-consuming and error-prone. Existing distributed approaches to mining
schemas present a significant usability advantage as they provide useful
metadata for large data sources. However, depending on the data source, ad hoc
queries for estimating other properties to help with crafting an efficient data
pipeline can be expensive. We propose JSONoid, a distributed schema discovery
process augmented with additional metadata in the form of monoid data
structures that are easily maintainable in a distributed setting. JSONoid
subsumes several existing approaches to distributed schema discovery with
similar performance. Our approach also adds significant useful additional
information about data values to discovered schemas with linear scalability.
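To illustrate why monoid-shaped metadata is easy to maintain in a distributed setting, here is a minimal sketch, assuming a simple numeric summary. The `NumStats` class and `summarize` function are hypothetical and are not JSONoid's actual data structures. A monoid needs an identity element and an associative combine operation, so each partition can be summarized independently and the partial results merged afterwards.

```python
import math
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class NumStats:
    """Hypothetical per-field summary: count, minimum, and maximum."""
    count: int = 0
    minimum: float = math.inf    # identity for min
    maximum: float = -math.inf   # identity for max

    @staticmethod
    def of(x):
        """Summary of a single observed value."""
        return NumStats(1, x, x)

    def combine(self, other):
        """Associative merge; NumStats() is the identity element."""
        return NumStats(self.count + other.count,
                        min(self.minimum, other.minimum),
                        max(self.maximum, other.maximum))

def summarize(values):
    """Fold a sequence of numbers into one summary."""
    return reduce(NumStats.combine, map(NumStats.of, values), NumStats())

# Summarize each partition on its own, then merge the partial summaries.
partitions = [[3, 9], [1], []]
partials = [summarize(p) for p in partitions]
merged = reduce(NumStats.combine, partials, NumStats())
```

Because `combine` is associative and has an identity, the merged result does not depend on how the data was partitioned, and an empty partition contributes nothing.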
Big Data Mining and Semantic Technologies: Challenges and Opportunities
Big data, a term coined in response to the explosion in the quantity and diversity of high-frequency digital data with potential for valuable insights, has drawn much attention in research and development. Converting big data into actionable insights requires an in-depth understanding of big data, its characteristics, its challenges, and current technological trends. The rise of big data is changing existing data storage, management, processing, and analytical mechanisms and is leading to new architectures and ecosystems for handling big data applications. This paper covers the findings of our research study on big data characteristics, the various types of analysis associated with them, and the basic big data types. First, we present the big data study from a data mining and analysis perspective and discuss the challenges; next, we present the results of our research study on the meaningful use of big data in the context of semantic technologies. Moreover, we discuss various case studies related to social media analysis and recent development trends to identify potential research directions for big data with semantic technologies.
DOI: 10.17762/ijritcc2321-8169.150711