16,401 research outputs found
Temporal JSON
JavaScript Object Notation (JSON) is a format for representing data. In this thesis we show how to capture the history of changes to a JSON document. Capturing the history is important in many applications that require not only the current version of a document but also all previous versions. Conceptually, the history can be thought of as a sequence of non-temporal JSON documents, one for each instant of time. Each document in the sequence is called a snapshot. Since changes to a document are few and infrequent, the sequence of snapshots largely duplicates a document across many time instants, so the snapshot model is (wildly) inefficient in terms of the space needed to represent the history and the time taken to navigate within it. A more efficient representation can be achieved by “gluing” the snapshots together to form a temporal model. Data that remains unchanged across snapshots is represented only once in a temporal model. However, we show that the temporal model is not a JSON document, and it is important to represent a history as JSON to ensure compatibility with web services and scripting languages that use JSON. We therefore describe a representational model that captures the information in a temporal model. We implement the representational model in Python and experiment with it extensively. Our experiments show that the model is efficient
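The idea of gluing snapshots so that unchanged data is stored only once can be illustrated with a minimal Python sketch. This is not the thesis's representational model: the function names `glue_snapshots` and `snapshot_at`, the `value`/`start`/`end` layout, the restriction to flat documents, and the use of consecutive integer time instants are all assumptions made for illustration.

```python
import json

def glue_snapshots(snapshots):
    """Merge (time, flat_dict) snapshots into one temporal dict.

    Each key maps to a list of {"value", "start", "end"} versions, so a value
    that is unchanged across consecutive snapshots is stored only once.
    Assumes snapshots are sorted by consecutive integer time instants.
    """
    temporal = {}
    for time, doc in snapshots:
        for key, value in doc.items():
            versions = temporal.setdefault(key, [])
            last = versions[-1] if versions else None
            if last is not None and last["value"] == value and last["end"] == time - 1:
                last["end"] = time  # unchanged: extend the current interval
            else:
                versions.append({"value": value, "start": time, "end": time})
    return temporal

def snapshot_at(temporal, time):
    """Recover the non-temporal document that held at the given instant."""
    return {
        key: v["value"]
        for key, versions in temporal.items()
        for v in versions
        if v["start"] <= time <= v["end"]
    }

# Hypothetical history: only "city" changes, at instant 3.
snapshots = [
    (1, {"name": "Ada", "city": "Paris"}),
    (2, {"name": "Ada", "city": "Paris"}),
    (3, {"name": "Ada", "city": "Rome"}),
]
history = glue_snapshots(snapshots)
print(json.dumps(history, indent=2))
```

In this sketch the glued history is itself a plain JSON value, and `snapshot_at(history, 2)` reconstructs the second snapshot from it.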
A Graphical Conceptual Model for Conventional and Time-varying JSON Data
Today, although there is an increasing interest in temporal JSON instance documents, since they allow tracking data changes, recovering past data versions, and executing temporal queries, there is no support (data model, modelling language, method, or tool) for conceptual modelling of temporal JSON data. Moreover, even though there are some graphical editors to build JSON Schemata (like JSON Schema Editor of Altova), they do not provide any built-in support for modelling temporal aspects of JSON data. Therefore, designers of JSON-based NoSQL data stores proceed in an ad hoc manner when they have to model temporal requirements. To fill this theoretical and practical gap, we propose in this paper a graphical conceptual model for time-varying JSON data, named Temporal JSON Conceptual Model (TempoJCM). To this end, we first define a graphical conceptual model for conventional (i.e., non-temporal) JSON data, called JSON Conceptual Model (JCM), and then extend it to support modelling of temporal aspects of JSON data. TempoJCM facilitates conceptual modelling of both conventional and temporal JSON data, in a graphical and user-friendly manner. An editor supporting TempoJCM is planned to become the user interface for temporal JSON schema design in the τJSchema framework
tauJUpdate: A Temporal Update Language for JSON Data
Time-varying JSON data are being used and exchanged in
various present-day application frameworks such as IoT platforms, Web services,
cloud computing, online social networks, and mobile systems. However,
in the state of the art of JSON data management, there is neither a
consensual nor a standard language for updating (i.e., inserting, modifying,
and deleting) temporal JSON data, comparable to the TSQL2 or SQL:2016
language for temporal relational data. Moreover, existing JSON-based
NoSQL DBMSs (e.g., MongoDB, Couchbase, CouchDB, OrientDB, and
Riak) and both commercial DBMSs (e.g., IBM DB2 12, Oracle 19c, and
MS SQL Server 2019) and open-source ones (e.g., PostgreSQL 15, and
MySQL 8.0) do not provide any support for maintaining temporal JSON
data. Also in our previously proposed temporal JSON framework, called
tauJSchema, there was no feature for temporal JSON instance update. For
these reasons, we propose in this paper a temporal update language,
named tauJUpdate (Temporal JUpdate), for JSON data in the tauJSchema
environment. We define it as a temporal extension of our previously introduced
non-temporal JSON update language, named JUpdate (JSON
Update). Both the syntax and the semantics of the data modification
operations of JUpdate have been extended to support temporal aspects.
tauJUpdate allows users (i) to specify temporal JSON updates in a user-friendly
manner, and (ii) to execute them efficiently
Right HTML, Wrong JSON: Challenges in Replaying Archived Webpages Built with Client-Side Rendering
Many web sites are changing how they construct their pages. In the
conventional model, the content is embedded server-side in the HTML and
returned to the client in an HTTP response. Increasingly, sites are moving to a
model where the initial HTTP response contains only an HTML skeleton plus
JavaScript that makes API calls to a variety of servers for the content
(typically in JSON format), and then builds out the DOM client-side, more
easily allowing for periodically refreshing the content in a page and allowing
dynamic modification of the content. This client-side rendering, now
predominant in social media platforms such as Twitter and Instagram, is also
being adopted by news outlets, such as CNN.com. When conventional web archiving
techniques, such as crawling with Heritrix, are applied to pages that render
their content client-side, the JSON responses can become out of sync with the
HTML page in which they are to be embedded, resulting in temporal violations on
replay. Because the violative JSON is not directly observable in the page
(i.e., in the same manner a violative embedded image is), the temporal
violations can be difficult to detect. We describe how the top-level CNN.com
page has used client-side rendering since April 2015 and the impact this has
had on web archives. Between April 24, 2015 and July 21, 2016, we found almost
15,000 mementos with a temporal violation of more than 2 days between the base
CNN.com HTML and the JSON responses used to deliver the content under the main
story. One way to mitigate this problem is to use browser-based crawling
instead of conventional crawlers like Heritrix, but browser-based crawling is
currently much slower than non-browser-based tools such as Heritrix.
Comment: 20 pages, preprint version of paper accepted at the 2023 ACM/IEEE
Joint Conference on Digital Libraries (JCDL
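The check for a gap of more than 2 days between a base HTML memento and the JSON memento used on replay can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: it assumes 14-digit `YYYYMMDDhhmmss` capture timestamps, treats any absolute gap above the threshold as a violation, and uses hypothetical helper names (`parse_ts`, `temporal_gap`, `find_violations`) and made-up timestamps.

```python
from datetime import datetime, timedelta

def parse_ts(ts):
    """Parse a 14-digit capture timestamp (assumed format: YYYYMMDDhhmmss)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")

def temporal_gap(html_ts, json_ts):
    """Absolute time difference between the HTML capture and the JSON capture."""
    return abs(parse_ts(json_ts) - parse_ts(html_ts))

def find_violations(pairs, threshold=timedelta(days=2)):
    """Return the (html_ts, json_ts) pairs whose gap exceeds the threshold."""
    return [(h, j) for h, j in pairs if temporal_gap(h, j) > threshold]

# Hypothetical (HTML capture, JSON capture) timestamp pairs.
pairs = [
    ("20150424120000", "20150424120500"),  # captured 5 minutes apart
    ("20150424120000", "20150428090000"),  # captured several days apart
]
violations = find_violations(pairs)
print(violations)
```

The threshold is a parameter so that stricter or looser definitions of a violation can be tried on the same data.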
ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation
Web archives are a valuable resource for researchers of various disciplines.
However, to use them as a scholarly source, researchers require a tool that
provides efficient access to Web archive data for extraction and derivation of
smaller datasets. Besides efficient access, we identify five other objectives
based on practical researcher needs, such as ease of use, extensibility and
reusability.
Towards these objectives we propose ArchiveSpark, a framework for efficient,
distributed Web archive processing that builds a research corpus by working on
existing and standardized data formats commonly held by Web archiving
institutions. Performance optimizations in ArchiveSpark, facilitated by the use
of a widely available metadata index, result in significant speed-ups of data
processing. Our benchmarks show that ArchiveSpark is faster than alternative
approaches without depending on any additional data stores while improving
usability by seamlessly integrating queries and derivations with external
tools.
Comment: JCDL 2016, Newark, NJ, US