
    Temporal JSON

    JavaScript Object Notation (JSON) is a format for representing data. In this thesis we show how to capture the history of changes to a JSON document. Capturing the history is important in many applications that require not only the current version of a document but also all previous versions. Conceptually, the history can be thought of as a sequence of non-temporal JSON documents, one for each instant of time. Each document in the sequence is called a snapshot. Since changes to a document are few and infrequent, the sequence of snapshots largely duplicates a document across many time instants, so the snapshot model is (wildly) inefficient in terms of the space needed to represent the history and the time taken to navigate within it. A more efficient representation can be achieved by “gluing” the snapshots together to form a temporal model. Data that remains unchanged across snapshots is represented only once in a temporal model. But we show that the temporal model is not a JSON document, and it is important to represent a history as JSON to ensure compatibility with web services and scripting languages that use JSON. So we describe a representational model that captures the information in a temporal model. We implement the representational model in Python and experiment extensively with it. Our experiments show that the model is efficient.
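
The idea of gluing snapshots so that unchanged data is stored once can be illustrated with a minimal Python sketch. This is not the thesis's representational model. The names `glue_snapshots` and `snapshot_at`, the `start`/`end`/`value` keys, integer time instants, and the restriction to flat (non-nested) documents are all assumptions made for illustration.

```python
import json


def glue_snapshots(snapshots):
    """Merge time-ordered (time, flat_dict) snapshots into one history.

    Hypothetical layout: each key maps to a list of versions, each valid
    over the half-open interval [start, end). Unchanged values are stored
    once instead of once per snapshot.
    """
    history = {}
    for time, doc in snapshots:
        for key, value in doc.items():
            versions = history.setdefault(key, [])
            last = versions[-1] if versions else None
            # Extend the latest version if the value is unchanged and contiguous.
            if last and last["value"] == value and last["end"] == time:
                last["end"] = time + 1
            else:
                versions.append({"start": time, "end": time + 1, "value": value})
    return history


def snapshot_at(history, time):
    """Rebuild the non-temporal document valid at the given instant."""
    return {
        key: version["value"]
        for key, versions in history.items()
        for version in versions
        if version["start"] <= time < version["end"]
    }


snapshots = [
    (0, {"name": "Ann", "city": "Oslo"}),
    (1, {"name": "Ann", "city": "Oslo"}),
    (2, {"name": "Ann", "city": "Rome"}),
]
history = glue_snapshots(snapshots)
print(json.dumps(history, indent=2))  # the glued history is itself valid JSON
```

In this sketch the unchanged `name` value occupies a single version covering all three instants, while `city` needs two versions. Calling `snapshot_at(history, 1)` recovers the original snapshot for instant 1.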

    A Graphical Conceptual Model for Conventional and Time-varying JSON Data

    Today, although there is an increasing interest in temporal JSON instance documents, since they allow tracking data changes, recovering past data versions, and executing temporal queries, there is no support (data model, modelling language, method, or tool) for conceptual modelling of temporal JSON data. Moreover, even though there are some graphical editors to build JSON Schemata (like JSON Schema Editor of Altova), they do not provide any built-in support for modelling temporal aspects of JSON data. Therefore, designers of JSON-based NoSQL data stores proceed in an ad hoc manner when they have to model temporal requirements. To fill this theoretical and practical gap, we propose in this paper a graphical conceptual model for time-varying JSON data, named Temporal JSON Conceptual Model (TempoJCM). To this end, we first define a graphical conceptual model for conventional (i.e., non-temporal) JSON data, called JSON Conceptual Model (JCM), and then extend it to support modelling of temporal aspects of JSON data. TempoJCM facilitates conceptual modelling of both conventional and temporal JSON data, in a graphical and user-friendly manner. An editor supporting TempoJCM is planned to become the user interface for temporal JSON schema design in the τJSchema framework.

    tauJUpdate: A Temporal Update Language for JSON Data

    Time-varying JSON data are used and exchanged in many of today's application frameworks, such as IoT platforms, Web services, cloud computing, online social networks, and mobile systems. However, in the state of the art of JSON data management, there is neither a consensual nor a standard language for updating (i.e., inserting, modifying, and deleting) temporal JSON data, like the TSQL2 or SQL:2016 language for temporal relational data. Moreover, existing JSON-based NoSQL DBMSs (e.g., MongoDB, Couchbase, CouchDB, OrientDB, and Riak) and both commercial DBMSs (e.g., IBM DB2 12, Oracle 19c, and MS SQL Server 2019) and open-source ones (e.g., PostgreSQL 15 and MySQL 8.0) do not provide any support for maintaining temporal JSON data. Also, our previously proposed temporal JSON framework, called tauJSchema, had no feature for temporal JSON instance update. For these reasons, we propose in this paper a temporal update language, named tauJUpdate (Temporal JUpdate), for JSON data in the tauJSchema environment. We define it as a temporal extension of our previously introduced non-temporal JSON update language, named JUpdate (JSON Update). Both the syntax and the semantics of the data modification operations of JUpdate have been extended to support temporal aspects. tauJUpdate allows users (i) to specify temporal JSON updates in a user-friendly manner and (ii) to execute them efficiently.
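
What distinguishes a temporal update from a conventional one can be shown with a short Python sketch. It does not use tauJUpdate syntax, which the abstract does not give. The function names, the `start`/`end`/`value` version layout, and the use of `None` for a still-valid version are assumptions made for illustration.

```python
def temporal_modify(versions, new_value, at):
    """Record new_value as valid from `at` onward without losing history.

    `versions` is a hypothetical list of {"start", "end", "value"} dicts;
    an "end" of None marks the currently valid version.
    """
    updated = [dict(v) for v in versions]  # copy so the input stays untouched
    if updated and updated[-1]["end"] is None:
        updated[-1]["end"] = at  # close the currently valid version
    updated.append({"start": at, "end": None, "value": new_value})
    return updated


def temporal_delete(versions, at):
    """Mark the value as no longer valid from `at`; past versions remain."""
    updated = [dict(v) for v in versions]
    if updated and updated[-1]["end"] is None:
        updated[-1]["end"] = at
    return updated


price = [{"start": "2020-01-01", "end": None, "value": 10}]
price_v2 = temporal_modify(price, 12, at="2021-06-01")
price_v3 = temporal_delete(price_v2, at="2022-03-15")
```

Unlike a conventional update, neither operation overwrites or discards the earlier value: a modification closes the current version and appends a new one, and a deletion only closes the current version.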

    Right HTML, Wrong JSON: Challenges in Replaying Archived Webpages Built with Client-Side Rendering

    Many web sites are transitioning how they construct their pages. The conventional model is where the content is embedded server-side in the HTML and returned to the client in an HTTP response. Increasingly, sites are moving to a model where the initial HTTP response contains only an HTML skeleton plus JavaScript that makes API calls to a variety of servers for the content (typically in JSON format), and then builds out the DOM client-side, more easily allowing for periodically refreshing the content in a page and allowing dynamic modification of the content. This client-side rendering, now predominant in social media platforms such as Twitter and Instagram, is also being adopted by news outlets, such as CNN.com. When conventional web archiving techniques, such as crawling with Heritrix, are applied to pages that render their content client-side, the JSON responses can become out of sync with the HTML page in which they are to be embedded, resulting in temporal violations on replay. Because the violative JSON is not directly observable in the page (i.e., in the same manner a violative embedded image is), the temporal violations can be difficult to detect. We describe how the top-level CNN.com page has used client-side rendering since April 2015 and the impact this has had on web archives. Between April 24, 2015 and July 21, 2016, we found almost 15,000 mementos with a temporal violation of more than 2 days between the base CNN.com HTML and the JSON responses used to deliver the content under the main story. One way to mitigate this problem is to use browser-based crawling instead of conventional crawlers like Heritrix, but browser-based crawling is currently much slower than non-browser-based tools such as Heritrix.
    Comment: 20 pages, preprint version of paper accepted at the 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL)
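
The temporal-violation check described in this abstract can be sketched in a few lines of Python. This is an illustration, not the authors' tooling. The function name, the `(url, capture_time)` input format, and the example URLs and timestamps are hypothetical. Only the 2-day threshold comes from the abstract.

```python
from datetime import datetime, timedelta


def find_temporal_violations(html_captured, json_captures,
                             threshold=timedelta(days=2)):
    """Return the JSON captures that are out of sync with the base HTML.

    A capture counts as violative when its capture time differs from the
    HTML capture time by more than `threshold`, in either direction.
    """
    return [
        (url, captured)
        for url, captured in json_captures
        if abs(captured - html_captured) > threshold
    ]


# Hypothetical capture times for one archived page and its JSON resources.
html_time = datetime(2015, 6, 1, 12, 0)
json_times = [
    ("https://example.org/api/top-stories.json", datetime(2015, 6, 1, 12, 5)),
    ("https://example.org/api/main-story.json", datetime(2015, 6, 9, 8, 30)),
]
violations = find_temporal_violations(html_time, json_times)
```

Here only the second resource is flagged, because it was captured more than a week after the base HTML. The first, captured five minutes later, is within the threshold.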

    ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation

    Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for extraction and derivation of smaller datasets. Besides efficient access we identify five other objectives based on practical researcher needs such as ease of use, extensibility and reusability. Towards these objectives we propose ArchiveSpark, a framework for efficient, distributed Web archive processing that builds a research corpus by working on existing and standardized data formats commonly held by Web archiving institutions. Performance optimizations in ArchiveSpark, facilitated by the use of a widely available metadata index, result in significant speed-ups of data processing. Our benchmarks show that ArchiveSpark is faster than alternative approaches without depending on any additional data stores while improving usability by seamlessly integrating queries and derivations with external tools.
    Comment: JCDL 2016, Newark, NJ, US