254,594 research outputs found

    Observing observatories: web observatories should use linked data

    No full text
    Web Observatories are a major international scientific collaboration concerned with heterogeneous, and often very large, data sources. Of course, they are not the first such collaboration; the Web itself was born as a response to a similar scientific endeavor. It is therefore appropriate to look at other collaborative activities, and to try to learn from and apply the lessons they have learnt. We argue that Web Observatories should build in interoperability using current best practices right from the start. We also argue that Linked Data is such a best practice, and can provide the basis for a research environment that will deliver the vision of a large group of cooperating Observatories, sharing data and research results to the benefit of all. In addition, we argue that the activity should not start with a major standardization process, but should grow around appropriate standards as required.
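    The abstract does not include an example of what publishing observatory data as Linked Data looks like; the sketch below is a minimal, hedged illustration only, assuming the Python rdflib library and the DCAT/Dublin Core vocabularies. The example URIs and property choices are placeholders, not taken from the paper.

```python
# Minimal sketch: describing a hypothetical Web Observatory dataset as Linked Data.
# Assumes the rdflib package; all URIs and vocabulary choices are illustrative only.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

dataset = URIRef("http://example.org/observatory/datasets/web-crawl-2013")
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Example web crawl snapshot")))
g.add((dataset, DCTERMS.publisher, URIRef("http://example.org/observatory")))
g.add((dataset, DCAT.theme, URIRef("http://example.org/themes/social-web")))

# Serialising to Turtle yields a representation any other observatory can consume.
print(g.serialize(format="turtle"))
```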

    GLOBAL INDEX CONSTRUCTION FOR DATA INTEGRATION IN LARGE SCALE SYSTEM

    Get PDF
    Several scientific projects have focused on the creation of Peer-to-Peer data management systems. The main objective of these systems is to allow data sharing and integration among a large set of distributed, heterogeneous data sources. The emergence of large-scale systems provides solutions but also brings to the surface new, challenging, unsolved problems, among which we address the data integration problem. To address it, we propose a new data integration approach that allows the semantic integration of heterogeneous and distributed data sources in a Peer-to-Peer environment, with strong support for distribution and evolution. In this paper, we provide an introduction to the approaches, problems and research issues encountered when dealing with data integration. We then present our approach and describe the methods for constructing, using semantic similarities, the global index that is at the core of our approach. We conclude with an application example.
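    The abstract does not specify the index algorithm itself. As a hedged sketch only, the Python code below shows one plausible way a global index could be assembled from peer schemas using a crude semantic-similarity measure (simple token overlap, standing in for whatever similarity function the approach actually uses); the function names, threshold, and example peer schemas are all assumptions.

```python
# Hedged sketch: grouping schema terms from several peers into a global index
# via a toy similarity measure. The real approach's similarity function,
# threshold and data structures are not specified in the abstract.

def similarity(a: str, b: str) -> float:
    """Token-overlap similarity between two schema terms (illustrative only)."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)

def build_global_index(peer_schemas: dict[str, list[str]], threshold: float = 0.3):
    """Map each cluster of semantically similar terms to the peers that expose them."""
    index: list[dict] = []  # each entry: {"terms": set, "peers": set}
    for peer, terms in peer_schemas.items():
        for term in terms:
            for entry in index:
                if any(similarity(term, t) >= threshold for t in entry["terms"]):
                    entry["terms"].add(term)
                    entry["peers"].add(peer)
                    break
            else:
                index.append({"terms": {term}, "peers": {peer}})
    return index

# Example: three peers with overlapping, heterogeneous schemas.
peers = {
    "peer_a": ["customer_name", "order_date"],
    "peer_b": ["client_name", "purchase_date"],
    "peer_c": ["customer_name", "delivery_address"],
}
for entry in build_global_index(peers):
    print(sorted(entry["terms"]), "->", sorted(entry["peers"]))
```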

    Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies

    Full text link
    Grid is an infrastructure that involves the integrated and collaborative use of computers, networks, databases and scientific instruments owned and managed by multiple organizations. Grid applications often involve large amounts of data and/or computing resources that require secure resource sharing across organizational boundaries, which makes Grid application management and deployment a complex undertaking. Grid middleware provides users with seamless computing ability and uniform access to resources in this heterogeneous environment. Several software toolkits and systems have been developed all over the world, most of which are the results of academic research projects. This chapter focuses on four of these middleware systems: UNICORE, Globus, Legion and Gridbus. It also presents our implementation of a resource broker for UNICORE, as this functionality was not supported in it. A comparison of these systems on the basis of their architecture, implementation model and several other features is included. Comment: 19 pages, 10 figures.
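    The chapter's actual broker design is not reproduced here. Purely as a hedged sketch, the Python code below shows the kind of match-and-rank step a resource broker typically performs: filter resources that meet a job's requirements, then rank the survivors. The resource attributes, job fields and ranking rule are assumptions, not the UNICORE broker's actual logic.

```python
# Hedged sketch of a broker's match-and-rank step; not the UNICORE broker itself.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    free_cpus: int
    memory_gb: int
    cost_per_hour: float  # illustrative attribute, not from the chapter

@dataclass
class Job:
    cpus: int
    memory_gb: int

def select_resource(job: Job, resources: list[Resource]) -> Resource | None:
    """Pick the cheapest resource that satisfies the job's requirements."""
    eligible = [r for r in resources
                if r.free_cpus >= job.cpus and r.memory_gb >= job.memory_gb]
    return min(eligible, key=lambda r: r.cost_per_hour, default=None)

grid = [
    Resource("site-a", free_cpus=16, memory_gb=64, cost_per_hour=2.0),
    Resource("site-b", free_cpus=4, memory_gb=32, cost_per_hour=0.5),
    Resource("site-c", free_cpus=32, memory_gb=128, cost_per_hour=3.5),
]
print(select_resource(Job(cpus=8, memory_gb=48), grid))  # selects site-a
```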

    Moving from Data-Constrained to Data-Enabled Research: Experiences and Challenges in Collecting, Validating and Analyzing Large-Scale e-Commerce Data

    Get PDF
    Widespread e-commerce activity on the Internet has led to new opportunities to collect vast amounts of micro-level market and nonmarket data. In this paper we share our experiences in collecting, validating, storing and analyzing large Internet-based data sets in the areas of online auctions, music file sharing and online retailer pricing. We demonstrate how such data can advance knowledge by facilitating sharper and more extensive tests of existing theories and by offering observational underpinnings for the development of new theories. Just as experimental economics pushed the frontiers of economic thought by enabling the testing of numerous theories of economic behavior in the environment of a controlled laboratory, we believe that observing real-world agents participating in market and nonmarket activity on the Internet, often over extended periods of time, can lead us to develop and test a variety of new theories. Internet data gathering is not controlled experimentation: we cannot randomly assign participants to treatments or determine event orderings. It does, however, offer potentially large data sets with repeated observation of individual choices and actions, and automated data collection holds promise for greatly reduced cost per observation. Our methods rely on technological advances in automated data-collection agents. Significant challenges remain in developing appropriate sampling techniques, integrating data from heterogeneous sources in a variety of formats, constructing generalizable processes and understanding legal constraints. Despite these challenges, the early evidence from those who have harvested and analyzed large amounts of e-commerce data points toward a significant leap in our ability to understand the functioning of electronic commerce. Comment: Published at http://dx.doi.org/10.1214/088342306000000231 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
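    As a hedged, minimal sketch of the kind of automated collection agent the authors describe (not their actual crawler), the Python code below polls a hypothetical listing endpoint on a fixed schedule and appends timestamped observations to a CSV file. The URL, JSON fields, output path and polling interval are all placeholders.

```python
# Hedged sketch of a periodic data-collection agent; URL and fields are placeholders.
import csv
import time
from datetime import datetime, timezone

import requests  # third-party HTTP client

LISTING_URL = "https://example.org/api/auction-listings"  # hypothetical endpoint
POLL_SECONDS = 3600  # one observation per hour

def collect_once(out_path: str = "observations.csv") -> None:
    """Fetch the current listings and append one timestamped row per item."""
    response = requests.get(LISTING_URL, timeout=30)
    response.raise_for_status()
    items = response.json()  # assumes a JSON array of {"id": ..., "price": ...}
    with open(out_path, "a", newline="") as f:
        writer = csv.writer(f)
        ts = datetime.now(timezone.utc).isoformat()
        for item in items:
            writer.writerow([ts, item.get("id"), item.get("price")])

if __name__ == "__main__":
    while True:  # in practice an external scheduler (e.g. cron) is more robust
        collect_once()
        time.sleep(POLL_SECONDS)
```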

    Parallel Asynchronous Particle Swarm Optimization For Job Scheduling In Grid Environment

    Get PDF
    Grid computing creates a large, powerful, self-managing virtual computer out of a collection of connected heterogeneous systems sharing various combinations of resources. It combines computer resources from multiple administrative domains to achieve a common goal, and it is used to solve scientific, technical or business problems that require a great number of processing cycles and large amounts of data. One primary issue associated with the efficient utilization of heterogeneous resources in a grid environment is task scheduling, which is an important concern for current implementations of grid computing and is driven by the demand for high-performance computing. If a large number of tasks is to be computed on geographically distributed resources, a reasonable scheduling algorithm must be adopted in order to obtain the minimum completion time. Typically, it is difficult to find an optimal resource allocation for a specific job that minimizes the schedule length; the scheduling problem is therefore NP-complete and not trivial. Heuristic algorithms are used to solve the task scheduling problem in the grid environment and may provide high-performance or high-throughput computing, or both. In this paper, a parallel asynchronous particle swarm optimization algorithm is proposed for job scheduling. The proposed scheduler allocates the best suitable resources to each task with minimal makespan and execution time. The experimental results show that the algorithm produces better results than the existing ant colony algorithm.
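    The paper's parallel asynchronous variant is not reproduced here. As a hedged sketch only, the Python code below shows a plain (serial, synchronous) particle swarm optimizer for the underlying assignment problem: each particle encodes a task-to-resource mapping and fitness is the makespan, i.e. the latest finishing time over all resources. The parameter values and the task/resource model are illustrative, not the paper's experimental setup.

```python
# Hedged sketch: basic PSO for mapping tasks to resources to minimise makespan.
# This is a plain serial PSO, not the paper's parallel asynchronous variant.
import random

TASK_LENGTHS = [14, 7, 22, 9, 16, 5, 11, 18]   # illustrative workloads
RESOURCE_SPEEDS = [1.0, 1.5, 2.0]              # illustrative processing speeds

def makespan(position):
    """Decode a continuous position into task-to-resource choices and score it."""
    loads = [0.0] * len(RESOURCE_SPEEDS)
    for task_len, x in zip(TASK_LENGTHS, position):
        r = min(int(x), len(RESOURCE_SPEEDS) - 1)
        loads[r] += task_len / RESOURCE_SPEEDS[r]
    return max(loads)  # completion time of the busiest resource

def pso(n_particles=20, iterations=200, w=0.7, c1=1.5, c2=1.5):
    dim, hi = len(TASK_LENGTHS), len(RESOURCE_SPEEDS)
    X = [[random.uniform(0, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_val = [makespan(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                X[i][d] = min(max(X[i][d] + V[i][d], 0.0), hi - 1e-9)
            val = makespan(X[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = X[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = X[i][:], val
    return gbest, gbest_val

best_position, best_makespan = pso()
print("best makespan found:", round(best_makespan, 2))
```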

    A semantic framework for web-based accommodation information integration

    Full text link
    University of Technology, Sydney. Faculty of Engineering and Information Technology. With the tremendous growth of the Web, a broad spectrum of accommodation information is to be found on the Internet. In order to adequately support information users in collecting and sharing information online, it is important to create an effective information integration solution and to provide integrated access to the vast numbers of online information sources. In addition to the problem of distributed information sources, information users also need to cope with the heterogeneous nature of those sources, where individual information sources are stored and presented following their own structures and formats. In this thesis, we explore some of the challenges in the field of information integration and propose solutions to some of them. We focus on the utilization of ontology for integrating heterogeneous, structured and semi-structured information sources, where instance-level data are stored and presented according to meta-data-level schemas. In particular, we look at XML-based data that is stored according to XML schemas. As a first step towards a large-scale information integration solution, we propose a semantic integration framework. The proposed framework addresses the problem of information integration on three levels: the data level, the process level and the architecture level. On the data level, we leverage the benefits of ontology and use it as a mediator for enabling semantic interoperability among heterogeneous data sources. On the process level, we alter the process of information integration and propose a three-step integration process, the publish-combine-use mechanism. The primary goal is to distribute the effort of collecting and integrating information sources across various types of end users. In the proposed approach, information providers have more control over their own data sources, as data sources are able to join and leave the information sharing network according to their own preferences. On the architecture level, we combine the flexibility offered by the emerging distributed P2P approach with the query processing capability provided by the centralized approach; the joint architecture mirrors the structure of the online accommodation industry. This thesis also demonstrates the practical applicability of the proposed semantic integration framework by implementing a prototype system. The prototype, named the "accommodation hub", is specifically developed for integrating online accommodation information in a large, distributed, heterogeneous online environment. The proposed semantic integration solution and the implemented prototype system are evaluated to provide a measure of system performance and usage. Results show that the proposed solution delivers better performance, with respect to some of the evaluation criteria, than related approaches in information integration.
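    The thesis's ontology mediator and publish-combine-use mechanism are only summarised in the abstract. The Python sketch below is a hedged, simplified illustration of the general idea of mediated integration: two heterogeneous XML accommodation records are lifted onto a shared vocabulary via per-source mappings (plain dictionaries standing in for a real ontology). The record formats, element names and shared terms are all assumptions.

```python
# Hedged sketch: lifting two heterogeneous XML accommodation records onto a
# shared vocabulary via per-source mappings. A real ontology mediator would
# use an ontology language (e.g. OWL) rather than plain dictionaries.
import xml.etree.ElementTree as ET

SOURCE_A = "<listing><hotelName>Harbour View</hotelName><rate>180</rate></listing>"
SOURCE_B = "<room><name>City Stay</name><pricePerNight>95</pricePerNight></room>"

# Per-source mappings from local element names to shared (mediated) terms.
MAPPINGS = {
    "source_a": {"hotelName": "accommodationName", "rate": "nightlyPrice"},
    "source_b": {"name": "accommodationName", "pricePerNight": "nightlyPrice"},
}

def publish(xml_text: str, source: str) -> dict:
    """'Publish' step: translate one local record into the shared vocabulary."""
    record = {}
    for element in ET.fromstring(xml_text):
        shared_term = MAPPINGS[source].get(element.tag)
        if shared_term is not None:
            record[shared_term] = element.text
    return record

# 'Combine' and 'use' steps: integrated records can now be queried uniformly.
combined = [publish(SOURCE_A, "source_a"), publish(SOURCE_B, "source_b")]
cheapest = min(combined, key=lambda r: float(r["nightlyPrice"]))
print(cheapest["accommodationName"], cheapest["nightlyPrice"])
```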

    Music business models and piracy

    Get PDF
    The purpose of this paper is to estimate the scale of illegal file-sharing activity across ten countries and to correlate this activity with country revenues. The work aims to elucidate an under-explored business model challenge which exists in parallel with a music piracy challenge. The study data are drawn from a number of sources, including a survey of more than 44,000 consumers in ten different countries undertaken in 2010. Following analysis, all findings are validated by a panel of industry experts. Results show that non-legitimate file-sharing activity is a heterogeneous issue across countries: the scale of activity varies from 14 per cent in Germany to 44 per cent in Spain, with an average of 28 per cent. File-sharing activity correlates negatively with music industry revenue per capita. This research finds that many consumers are not engaging with online business models: almost one fourth of the population claim that they do not consume digital music in either legal or illegal form, a phenomenon that is also negatively correlated with sales per capita. Results support the need for policy makers to introduce strong intellectual property rights (IPR) regulation that reduces file-sharing activity. The work also identifies a large percentage of non-participants in the digital market who may be re-engaged with music through business model innovation. This research presents a map of current file-sharing activity in ten countries using a rich and unique dataset. The work identifies that a country's legal origin correlates with file-sharing activity, with countries of German legal origin file sharing the least. Approximately half of the survey respondents chose not to answer the question related to file-sharing activity, so different estimates of the true scale of file-sharing activity are given based upon three different assumptions about the file-sharing behaviour of non-respondents. The challenge of engaging consumers in the digital market through different business models is discussed in light of digital music's high-velocity environment. © 2013, Emerald Group Publishing Limited. All rights reserved.
