Search CORE

11,155 research outputs found

DataHub: Collaborative Data Science & Dataset Version Management at Scale

Author: Bhardwaj Anant
Bhattacherjee Souvik
Chavan Amit
Deshpande Amol
Elmore Aaron J.
Madden Samuel
Parameswaran Aditya G.
Publication venue
Publication date: 02/09/2014
Field of study

Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.Comment: 7 page

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

XML content warehousing: Improving sociological studies of mailing lists and web data

Author: Colazzo Dario
Dudouet François-Xavier
Manolescu Ioana
Nguyen Benjamin
Senellart Pierre
Vion Antoine
Publication venue
Publication date: 01/01/2011
Field of study

In this paper, we present the guidelines for an XML-based approach for the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many adjunctions or crossings of data sources, without modifying existing data sets, while allowing possible structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows altogether exhaustive warehousing and recursive queries through contents, with far less dependence on the initial storage. We finally present the possibility of exporting the data stored in the warehouse to commonly-used advanced software devoted to sociological analysis

arXiv.org e-Print Archive

Base de publications de l'université Paris-Dauphine

Crossref

INRIA a CCSD electronic archive server

HAL UVSQ

HAL-Rennes 1

1st INCF Workshop on Sustainability of Neuroscience Databases

Author: Jaap van Pelt
Jack Van Horn
Publication venue
Publication date: 17/06/2008
Field of study

The goal of the workshop was to discuss issues related to the sustainability of neuroscience databases, identify problems and propose solutions, and formulate recommendations to the INCF. The report summarizes the discussions of invited participants from the neuroinformatics community as well as from other disciplines where sustainability issues have already been approached. The recommendations for the INCF involve rating, ranking, and supporting database sustainability

Crossref

Nature Precedings

Database integrated analytics using R : initial experiences with SQL-Server + R

Author: Berral Josep Ll.
Poggi Nicolas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Most data scientists use nowadays functional or semi-functional languages like SQL, Scala or R to treat data, obtained directly from databases. Such process requires to fetch data, process it, then store again, and such process tends to be done outside the DB, in often complex data-flows. Recently, database service providers have decided to integrate “R-as-a-Service” in their DB solutions. The analytics engine is called directly from the SQL query tree, and results are returned as part of the same query. Here we show a first taste of such technology by testing the portability of our ALOJA-ML analytics framework, coded in R, to Microsoft SQL-Server 2016, one of the SQL+R solutions released recently. In this work we discuss some data-flow schemes for porting a local DB + analytics engine architecture towards Big Data, focusing specially on the new DB Integrated Analytics approach, and commenting the first experiences in usability and performance obtained from such new services and capabilities.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Abmash: Mashing Up Legacy Web Applications by Automated Imitation of Human Actions

Author: Mezini Mira
Monperrus Martin
Ortac Alper
Publication venue: 'Wiley'
Publication date: 01/01/2013
Field of study

Many business web-based applications do not offer applications programming interfaces (APIs) to enable other applications to access their data and functions in a programmatic manner. This makes their composition difficult (for instance to synchronize data between two applications). To address this challenge, this paper presents Abmash, an approach to facilitate the integration of such legacy web applications by automatically imitating human interactions with them. By automatically interacting with the graphical user interface (GUI) of web applications, the system supports all forms of integrations including bi-directional interactions and is able to interact with AJAX-based applications. Furthermore, the integration programs are easy to write since they deal with end-user, visual user-interface elements. The integration code is simple enough to be called a "mashup".Comment: Software: Practice and Experience (2013)

arXiv.org e-Print Archive

HAL - Lille 3

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

SWISH: SWI-Prolog for Sharing

Author: Lager Torbjörn
Riguzzi Fabrizio
Wielemaker Jan
Publication venue
Publication date: 01/01/2015
Field of study

Recently, we see a new type of interfaces for programmers based on web technology. For example, JSFiddle, IPython Notebook and R-studio. Web technology enables cloud-based solutions, embedding in tutorial web pages, atractive rendering of results, web-scale cooperative development, etc. This article describes SWISH, a web front-end for Prolog. A public website exposes SWI-Prolog using SWISH, which is used to run small Prolog programs for demonstration, experimentation and education. We connected SWISH to the ClioPatria semantic web toolkit, where it allows for collaborative development of programs and queries related to a dataset as well as performing maintenance tasks on the running server and we embedded SWISH in the Learn Prolog Now! online Prolog book.Comment: International Workshop on User-Oriented Logic Programming (IULP 2015), co-located with the 31st International Conference on Logic Programming (ICLP 2015), Proceedings of the International Workshop on User-Oriented Logic Programming (IULP 2015), Editors: Stefan Ellmauthaler and Claudia Schulz, pages 99-113, August 201

arXiv.org e-Print Archive

VU Research Portal

Archivio istituzionale della ricerca - Università di Ferrara

A Shared Ontology Approach to Semantic Representation of BIM Data

Author: Karshenas Saeed
Niknam Mehrdad
Publication venue: e-Publications@Marquette
Publication date: 01/08/2017
Field of study

Architecture, engineering, construction and facility management (AEC-FM) projects involve a large number of participants that must exchange information and combine their knowledge for successful completion of a project. Currently, most of the AEC-FM domains store their information about a project in text documents or use XML, relational, or object-oriented formats that make information integration difficult. The AEC-FM industry is not taking advantage of the full potential of the Semantic Web for streamlining sharing, connecting, and combining information from different domains. The Semantic Web is designed to solve the information integration problem by creating a web of structured and connected data that can be processed by machines. It allows combining information from different sources with different underlying schemas distributed over the Internet. In the Semantic Web, all data instances and data schema are stored in a graph data store, which makes it easy to merge data from different sources. This paper presents a shared ontology approach to semantic representation of building information. The semantic representation of building information facilitates finding and integrating building information distributed in several knowledge bases. A case study demonstrates the development of a semantic based building design knowledge base

epublications@Marquette