Search CORE

86 research outputs found

XStamps: A multiversion timestamps concurrency control protocol for XML data

Authors: LIM Ee Peng; NG Wee-Keong; WIN Khin-Myo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2003

Institutional Knowledge at Singapore Management University

Indexing Temporal XML documents

Authors: A MENDELZON; A VAISMAN; F RIZZOLO
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

Crossref

Doctor of Philosophy

Author: Le Wangchao
Publication venue: University of Utah
Publication date: 01/12/2013

Dissertation. Linked data are the de facto standard for publishing and sharing data on the web. To date, we have been inundated with large and ever-increasing amounts of linked data in constantly evolving structures. The proliferation of the data and the need to access and harvest knowledge from distributed data sources motivate us to revisit several classic problems in query processing and query optimization. The problem of answering queries over views is commonly encountered in a number of settings, including enforcing security policies for access to linked data and integrating data from disparate sources. We approach this problem by efficiently rewriting queries over the views into equivalent queries over the underlying linked data, thus avoiding the costs of view materialization and maintenance. An outstanding problem of query rewriting is that the number of rewritten queries is exponential in the size of the query and the views, which motivates us to study the problem of multiquery optimization in the context of linked data. Our solutions are declarative and make no assumptions about the underlying storage, i.e., they are store-independent. Unlike relational and XML data, linked data are schema-less. Since tracking the evolution of the schema of linked data is hard, keyword search is an ideal tool for data integration. Existing works make crippling assumptions about the data and hence fall short in handling massive linked data with tens to hundreds of millions of facts. Our study of keyword search on linked data brings together classical techniques from the literature and our novel ideas, leading to much better query efficiency and result quality. Linked data also contain rich temporal semantics.
To cope with the ever-increasing data, we have investigated how to partition and store large temporal or multiversion linked data for distributed and parallel computation, in an effort to achieve load balancing and support scalable data analytics over massive linked data.

The University of Utah: J. Willard Marriott Digital Library

Temporal modelling and management of normative documents in XML format

Authors: Ahn; Amagasa; Andersen; Bertino; C.E.; Cao; Chawathe; Chien; Chien; Combi; Currim; Fabio Grandi; Federica Mandreoli; Gadia; Gao; Grandi; Grandi; Jensen; Jones; Kane; Kim; Nørvåg; Palmirani; Paolo Tiberio; Pizzorusso; Vitali; Wang; Wong
Publication venue: 'Elsevier BV'

Crossref

Improving Software Project Health Using Machine Learning

Author: Partachi Profir-Petru
Publication venue: UCL (University College London)
Publication date: 28/12/2020

In recent years, systems that previously lived on different platforms have been integrated under a single umbrella. The increased use of GitHub, which offers pull requests, issue tracking and version history, and its integration with other solutions such as Gerrit or Travis, as well as the response from competitors, have created development environments that favour agile methodologies by increasingly automating non-coding tasks: automated build systems, automated issue triaging, etc. In essence, source-code hosting platforms shifted to continuous integration/continuous delivery (CI/CD) as a service. This facilitated a shift in development paradigms: adherents of agile methodology can now adopt a CI/CD infrastructure more easily. It has also created large, publicly accessible sources of source code together with related project artefacts: GHTorrent and similar datasets now offer programmatic access to the whole of GitHub.

Project health encompasses traceability, documentation and adherence to coding conventions: tasks that reduce maintenance costs and increase accountability, but may not directly impact features. Overfocus on health can slow velocity (new feature delivery), so the Agile Manifesto suggests developers should travel light, forgoing tasks focused on project health in favour of higher feature velocity. Obviously, injudiciously following this suggestion can undermine a project's chances of success.

Simultaneously, this shift to CI/CD has allowed the proliferation of programmatically accessible textual artefacts written in Natural Language, or in a mix of Natural and Formal Language: GitHub and its competitors allow API access to their infrastructure to enable the creation of CI/CD bots. This suggests that approaches from Natural Language Processing (NLP) and Machine Learning (ML) are now feasible and indeed desirable. This thesis aims to (semi-)automate tasks for this new paradigm and its attendant infrastructure by bringing the relevant NLP and ML techniques to the foreground.
Under this umbrella, I focus on three synergistic tasks from this domain: (1) improving issue-pull-request traceability, which can help existing systems automatically curate the issue backlog as pull requests are merged; (2) untangling commits in a version history, which can aid the aforementioned traceability task as well as make it easier to identify a fault-introducing commit with tools such as git bisect, or to cherry-pick commits; (3) mixed-text parsing, to allow better API mining and open new avenues for project-specific code-recommendation tools.

UCL Discovery

XIST: An XML Index Selection Tool

Authors: H.V. Jagadish; P. Valduriez; S.-Y. Chien
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Crossref

Indexing Temporal XML documents

Authors: Chien; Chien; Chung; Gao; Kaushik; Kaushik; Qun; Rizzolo; Salzberg; Santoro; Vaisman; World Wide Web Consortium
Publication venue: 'Elsevier BV'
Publication date: 01/01/2004

Crossref

Parallel text retrieval on temporally versioned document collections

Author: Gür Özlem
Publication venue: Bilkent University
Publication date: 01/01/2008

Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2008. Thesis (Master's), Bilkent University, 2008. Includes bibliographical references (leaves 57-61).
In recent years, as access to the Internet has become easier and cheaper, the amount and rate of change of the online data presented to Internet users have been increasing at an astonishing rate. This ever-changing nature of the Internet produces a constantly decaying and replenishing information collection in which newly presented data generally replace old and sometimes valuable data. Many recent studies aim to preserve this valuable temporal data, and the size and number of temporal Web data collections are increasing. We believe that information retrieval systems that answer time-range queries in a reasonable amount of time will soon emerge as a means of accessing vast temporal Web data collections. Due to the tremendous size of temporal data and the large number of queries submitted per unit time, temporal information retrieval systems will have to exploit parallelism as much as possible. In parallel systems that index collections with inverted indices, a strategy for distributing the inverted indices has to be chosen. In this study, the feasibility of time-based versus term-based partitioning of temporal Web inverted indices is analyzed, and a novel parallel text retrieval system for answering temporal Web queries is implemented, taking into account the number of queries processed per unit time. Moreover, we investigate the performance of skip-list-based and randomized-select-based ranking schemes on time-based and term-based partitioned inverted indices. Finally, we compare time-balanced and size-balanced time-based partitioning schemes. The experimental results with small to medium numbers of processors reveal that time-based partitioning works better for medium- to long-length queries.
Gür, Özlem. M.S.
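The abstract contrasts term-based and time-based partitioning of a temporal inverted index, and time-balanced versus size-balanced time-based splits. The following is a minimal toy sketch of those ideas, not the thesis's system: the posting format `(term, doc_id, timestamp)`, the byte-sum hash and every function name are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical toy postings: (term, doc_id, timestamp) with integer timestamps.

def stable_hash(term):
    # Deterministic toy hash; Python's built-in hash() of str varies between runs.
    return sum(term.encode("utf-8"))

def term_partition(postings, num_nodes):
    """Term-based: all postings of a given term live on a single node."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for term, doc_id, ts in postings:
        nodes[stable_hash(term) % num_nodes][term].append((doc_id, ts))
    return nodes

def time_balanced_boundaries(t_min, t_max, num_nodes):
    """Split [t_min, t_max] into num_nodes intervals of (roughly) equal length."""
    span = t_max - t_min + 1
    return [t_min + span * i // num_nodes for i in range(1, num_nodes)]

def size_balanced_boundaries(timestamps, num_nodes):
    """Pick split points so each node receives (roughly) the same number of postings."""
    ts = sorted(timestamps)
    return [ts[len(ts) * i // num_nodes] for i in range(1, num_nodes)]

def time_partition(postings, boundaries):
    """Time-based: node i holds postings with boundaries[i-1] <= ts < boundaries[i]."""
    nodes = [defaultdict(list) for _ in range(len(boundaries) + 1)]
    for term, doc_id, ts in postings:
        nodes[bisect_right(boundaries, ts)][term].append((doc_id, ts))
    return nodes

def nodes_for_time_query(boundaries, t_start, t_end):
    """With time-based partitioning, a time-range query touches only overlapping nodes."""
    first = bisect_right(boundaries, t_start)
    last = bisect_right(boundaries, t_end)
    return list(range(first, last + 1))

def node_sizes(nodes):
    """Number of postings stored on each node."""
    return [sum(len(plist) for plist in node.values()) for node in nodes]

postings = [("xml", 1, 1), ("xml", 2, 2), ("web", 1, 3),
            ("web", 3, 4), ("xml", 3, 5), ("index", 2, 10)]
timestamps = [ts for _, _, ts in postings]

tb = time_balanced_boundaries(min(timestamps), max(timestamps), 2)
sb = size_balanced_boundaries(timestamps, 2)
print(tb, node_sizes(time_partition(postings, tb)))  # [6] [5, 1]
print(sb, node_sizes(time_partition(postings, sb)))  # [4] [3, 3]
print(nodes_for_time_query(sb, 1, 3))                # [0]
print(node_sizes(term_partition(postings, 2)))       # [3, 3]
```

In this toy data most postings fall early in the time range, so equal-length (time-balanced) intervals leave one node with five postings and the other with one, while quantile (size-balanced) split points even the load out. Time-based partitioning also lets a time-range query skip nodes whose intervals do not overlap it; term-based partitioning instead routes a query to the nodes holding its terms.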

Bilkent University Institutional Repository

Cost models for overlapping and multiversion structures

Authors: Bercken J. V. D.; Bhide A.; Carey M.; Chien S.; Dimitris Papadias; Jiang L.; Jun Zhang; Papadias D.; Soo M.; Tao Y.; Tao Y.; Tao Y.; Tzouramanis T.; Xu X.; Yang J.; Yufei Tao; Zhang D.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref