    Grid-Brick Event Processing Framework in GEPS

    Experiments like ATLAS at the LHC involve a scale of computing and data management that greatly exceeds the capability of existing systems, making it necessary to resort to Grid-based Parallel Event Processing Systems (GEPS). Traditional Grid systems concentrate the data in central data servers, which must be accessed by many nodes each time an analysis or processing job starts. These systems require very powerful central data servers and make little use of the distributed disk space available in commodity computers. The Grid-Brick system described in this paper follows a different approach: the data storage is split among all grid nodes, each holding a piece of the whole dataset. Users submit queries, and the system distributes the tasks across all nodes, retrieves the partial results, and merges them in the Job Submit Server. The main advantage of this system is the huge scalability it provides; its biggest disadvantage appears when one of the nodes fails, a problem worked around by data replication or backup.
    Comment: 6 pages; document for the CHEP'03 conference.
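
    As an illustration only, here is a minimal TypeScript sketch of the scatter-gather pattern the abstract describes: broadcast a query to every node, collect the partial results, and merge them at the submit server. The node URLs, the /query endpoint, and the EventRecord shape are assumptions, not part of the actual Grid-Brick system.

```typescript
// Hypothetical node addresses; each node holds one slice of the data.
const NODES = ["http://node1:8080", "http://node2:8080", "http://node3:8080"];

interface EventRecord {
  id: string;
  payload: unknown;
}

// Ask a single node to run the query over its local slice.
async function queryNode(node: string, query: string): Promise<EventRecord[]> {
  const res = await fetch(`${node}/query?q=${encodeURIComponent(query)}`);
  if (!res.ok) throw new Error(`node ${node} answered ${res.status}`);
  return (await res.json()) as EventRecord[];
}

// The Job Submit Server's role: fan the query out, then merge the pieces.
// allSettled tolerates single-node failures -- the weak point the paper notes.
async function submitQuery(query: string): Promise<EventRecord[]> {
  const results = await Promise.allSettled(NODES.map((n) => queryNode(n, query)));
  return results
    .filter((r): r is PromiseFulfilledResult<EventRecord[]> => r.status === "fulfilled")
    .flatMap((r) => r.value);
}
```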

    Exploring user and system requirements of linked data visualization through a visual dashboard approach

    One of the open problems in Semantic Web research is which tools should be provided to users to explore linked data. This is even more urgent now that massive amounts of linked data are being released by governments worldwide. The development of single dedicated visualization applications is increasing, but the problem of exploring unknown linked data to gain a good understanding of what it contains remains open. An effective generic solution must take into account the user's point of view, their tasks and interaction, as well as the system's capabilities and the technical constraints the technology imposes. This paper is a first step in understanding the implications of both user and system by evaluating our dashboard-based approach. Though we observe high user acceptance of the dashboard approach, the paper also highlights technical challenges, arising from complexities in the current infrastructure, that need to be addressed when visualising linked data. In light of the findings, guidelines for the development of linked data visualization (and manipulation) tools are provided.
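
    For context, a minimal sketch of the kind of data access such a dashboard widget performs: fetching linked data from a public SPARQL endpoint (DBpedia here) as JSON. The query and the row shape are illustrative and not taken from the paper.

```typescript
const ENDPOINT = "https://dbpedia.org/sparql";

const QUERY = `
  PREFIX dbo: <http://dbpedia.org/ontology/>
  SELECT ?country ?population WHERE {
    ?country a dbo:Country ;
             dbo:populationTotal ?population .
  }
  ORDER BY DESC(?population)
  LIMIT 10
`;

interface WidgetRow {
  country: string;
  population: number;
}

async function fetchWidgetData(): Promise<WidgetRow[]> {
  const res = await fetch(`${ENDPOINT}?query=${encodeURIComponent(QUERY)}`, {
    headers: { Accept: "application/sparql-results+json" },
  });
  const json = await res.json();
  // W3C SPARQL JSON results: one binding object per row, {value, type} per variable.
  return json.results.bindings.map((b: any) => ({
    country: b.country.value,
    population: Number(b.population.value),
  }));
}
```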

    A perspective on the Healthgrid initiative

    This paper presents a perspective on the Healthgrid initiative, which involves European projects deploying pioneering applications of grid technology in the health sector. In the last couple of years, several grid projects have been funded on health-related issues at national and European levels. A crucial issue is to maximize their cross-fertilization in the context of an environment where data of medical interest can be stored and made easily available to the different actors in healthcare: physicians, healthcare centres and administrations, and of course the citizens. The Healthgrid initiative, represented by the Healthgrid association (http://www.healthgrid.org), was launched to bring the necessary long-term continuity and to reinforce and promote awareness of the possibilities and advantages linked to the deployment of grid technologies in health. Technologies addressing the specific requirements of medical applications are under development. Results from the DataGrid and other projects are given as examples of early applications.
    Comment: 6 pages, 1 figure. Accepted by the Second International Workshop on Biomedical Computations on the Grid, at the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), Chicago, USA, April 2004.

    Data locality in Hadoop

    Current market trends show the need to store and process rapidly growing amounts of data, which implies demand for distributed storage and data-processing systems. Apache Hadoop is an open-source framework for managing such computing clusters in an effective, fault-tolerant way. When dealing with large volumes of data, Hadoop and its storage system HDFS (Hadoop Distributed File System) face the challenge of keeping efficiency high while completing computations in reasonable time. The typical Hadoop deployment therefore moves computation to the data rather than shipping data across the cluster, since moving large quantities of data through the network could significantly delay processing tasks. While a task is running, Hadoop favours local data access and chooses blocks from the nearest nodes; any remaining blocks are moved only at the moment they are needed by the given task. To support Hadoop's data-locality preferences, this thesis proposes adding an innovative functionality to its distributed file system (HDFS) that enables moving data blocks on request. Shipping data in advance makes it possible to forcibly redistribute data between nodes and so adapt its placement to the given processing tasks. The new functionality enables instructed movement of data blocks within the cluster: data can be shifted either by a user running the proper HDFS shell command or programmatically by another module, such as an appropriate scheduler. To develop this functionality, a detailed analysis of the Apache Hadoop source code and its components (specifically HDFS) was conducted. The research resulted in a deep understanding of the internal architecture, which made it possible to compare the candidate approaches to the desired solution and to develop the chosen one.
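
    A conceptual sketch of the in-advance shipping idea, assuming a hypothetical client API: before a task runs on its planned node, move any block that has no replica there. The blockLocations and moveBlock calls stand in for whatever interface the thesis actually adds to HDFS; their names and signatures are invented here.

```typescript
interface PlannedTask {
  node: string;        // node the scheduler intends to run the task on
  blockIds: string[];  // HDFS blocks the task will read
}

// Hypothetical stand-in for the block-movement interface the thesis adds.
interface HdfsClient {
  blockLocations(blockId: string): Promise<string[]>;         // invented name
  moveBlock(blockId: string, target: string): Promise<void>;  // invented name
}

// Ship blocks ahead of execution so that every read the task makes is local.
async function shipBlocksInAdvance(hdfs: HdfsClient, task: PlannedTask): Promise<void> {
  for (const blockId of task.blockIds) {
    const locations = await hdfs.blockLocations(blockId);
    // Only move blocks that have no replica on the target node already.
    if (!locations.includes(task.node)) {
      await hdfs.moveBlock(blockId, task.node);
    }
  }
}
```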

    A horizontally-scalable multiprocessing platform based on Node.js

    This paper presents a scalable web-based platform called Node Scala, which splits and handles requests on a parallel distributed system according to pre-defined use cases. We applied this platform to a client application that visualizes climate data stored in the NoSQL database MongoDB. The design of Node Scala leads to efficient usage of available computing resources, and the system scales simply by adding new workers. Performance evaluation of Node Scala demonstrated a gain of up to 74% compared to state-of-the-art techniques.
    Comment: 8 pages, 7 figures. Accepted for publication as a conference paper for the 13th IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA-15).
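
    The underlying Node.js primitive for this kind of horizontal scaling is the built-in cluster module, sketched below: a primary process forks one worker per core, and the workers share the listening socket so requests are spread across processes. Node Scala's actual scheduler and use-case routing are more elaborate than this minimal example.

```typescript
import cluster from "node:cluster";
import { createServer } from "node:http";
import { cpus } from "node:os";

if (cluster.isPrimary) {
  // One worker per core; scaling out simply means forking more workers.
  for (let i = 0; i < cpus().length; i++) {
    cluster.fork();
  }
} else {
  // Workers share the listening socket; requests are distributed among them.
  createServer((req, res) => {
    res.end(`request handled by worker ${process.pid}\n`);
  }).listen(3000);
}
```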

    Database independent Migration of Objects into an Object-Relational Database

    This paper reports on the CERN-based WISDOM project, which is studying the serialisation and deserialisation of data to and from an object database (Objectivity) and ORACLE 9i.
    Comment: 26 pages, 18 figures; CMS CERN Conference Report cr02_01
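
    As a rough illustration of the object-to-relational direction, here is a sketch that flattens an object into a parameterized INSERT and rebuilds it from a result-set row. The DetectorEvent type, table, and column names are invented; the actual WISDOM mapping between Objectivity and ORACLE 9i is far richer.

```typescript
// Invented example type standing in for a persistent physics object.
interface DetectorEvent {
  id: number;
  runNumber: number;
  energyGeV: number;
}

// Serialisation: flatten the object into a parameterized SQL statement.
function toInsertStatement(e: DetectorEvent): { sql: string; params: unknown[] } {
  return {
    sql: "INSERT INTO detector_event (id, run_number, energy_gev) VALUES (?, ?, ?)",
    params: [e.id, e.runNumber, e.energyGeV],
  };
}

// Deserialisation: rebuild the object from a result-set row.
function fromRow(row: Record<string, unknown>): DetectorEvent {
  return {
    id: Number(row["ID"]),
    runNumber: Number(row["RUN_NUMBER"]),
    energyGeV: Number(row["ENERGY_GEV"]),
  };
}
```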