Grid-Brick Event Processing Framework in GEPS
Experiments like ATLAS at LHC involve a scale of computing and data
management that greatly exceeds the capability of existing systems, making it
necessary to resort to Grid-based Parallel Event Processing Systems (GEPS).
Traditional Grid systems concentrate the data in central data servers which
have to be accessed by many nodes each time an analysis or processing job
starts. These systems require very powerful central data servers and make
little use of the distributed disk space that is available in commodity
computers. The Grid-Brick system, which is described in this paper, follows a
different approach. The data storage is split among all grid nodes, each
holding a piece of the whole data set. Users submit queries, and the system
distributes the tasks to all the nodes and retrieves the results, merging
them together in the Job Submit Server. The main advantage of using this system
is the huge scalability it provides, while its biggest disadvantage appears in
the case of failure of one of the nodes. A workaround for this problem involves
data replication or backup.
Comment: 6 pages; document for CHEP'03 conference
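The query flow described in this abstract (split the data across nodes, run the query on every node, merge the partial results at the submit server) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the `NODES` dictionary, the event fields, and the function names `run_on_node` and `submit_query` are hypothetical and are not taken from the Grid-Brick paper, and threads stand in for remote grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for grid nodes: each node stores one slice of the events.
NODES = {
    "node-a": [{"id": 1, "energy": 12.5}, {"id": 2, "energy": 48.0}],
    "node-b": [{"id": 3, "energy": 75.2}],
    "node-c": [{"id": 4, "energy": 5.1}, {"id": 5, "energy": 61.3}],
}


def run_on_node(node, predicate):
    """Apply the query only to the slice of data held by one node."""
    return [event for event in NODES[node] if predicate(event)]


def submit_query(predicate):
    """Scatter the query to every node, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        # One task per node; list() collects all partial results before merging.
        partials = list(pool.map(lambda n: run_on_node(n, predicate), NODES))
    merged = [event for part in partials for event in part]
    # Sort so the merged result does not depend on node ordering.
    return sorted(merged, key=lambda e: e["id"])


if __name__ == "__main__":
    print(submit_query(lambda e: e["energy"] > 40.0))
```

The sketch also makes the stated disadvantage visible: if one entry of `NODES` is unavailable, its slice of the result is simply missing unless the data is replicated elsewhere.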
Exploring user and system requirements of linked data visualization through a visual dashboard approach
One of the open problems in Semantic Web research is which tools should be provided to users to explore linked data. This is even more urgent now that massive amounts of linked data are being released by governments worldwide. The development of single dedicated visualization applications is increasing, but the problem of exploring unknown linked data to gain a good understanding of what is contained is still open. An effective generic solution must take into account the user’s point of view, their tasks and interaction, as well as the system’s capabilities and the technical constraints the technology imposes. This paper is a first step in understanding the implications for both user and system by evaluating our dashboard-based approach. Though we observe a high user acceptance of the dashboard approach, our paper also highlights technical challenges, arising from the complexity of current infrastructure, that need to be addressed when visualising linked data. In light of the findings, guidelines for the development of linked data visualization (and manipulation) are provided.
A perspective on the Healthgrid initiative
This paper presents a perspective on the Healthgrid initiative which involves
European projects deploying pioneering applications of grid technology in the
health sector. In the last couple of years, several grid projects on
health-related issues have been funded at national and European levels. A crucial
issue is to maximize their cross-fertilization in the context of an environment
where data of medical interest can be stored and made easily available to the
different actors in healthcare: physicians, healthcare centres and
administrations, and of course the citizens. The Healthgrid initiative,
represented by the Healthgrid association (http://www.healthgrid.org), was
initiated to bring the necessary long-term continuity and to reinforce and promote
awareness of the possibilities and advantages linked to the deployment of GRID
technologies in health. Technologies to address the specific requirements for
medical applications are under development. Results from the DataGrid and other
projects are given as examples of early applications.
Comment: 6 pages, 1 figure. Accepted by the Second International Workshop on
Biomedical Computations on the Grid, at the 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid (CCGrid 2004). Chicago USA, April
2004
Data locality in Hadoop
Current market trends show a need to store and process rapidly
growing amounts of data, which creates demand for distributed
storage and data processing systems. Apache Hadoop is an open-source
framework for managing such computing clusters in an effective, fault-tolerant
way.
When dealing with large volumes of data, Hadoop and its storage system HDFS
(Hadoop Distributed File System) face the challenge of staying efficient
and completing computations in a reasonable time. The typical Hadoop implementation
transfers computation to the data, rather than shipping data across the cluster,
because moving large quantities of data through the network could significantly
delay data processing tasks. However, once a task is already running,
Hadoop favours local data access and chooses blocks from the nearest nodes.
The necessary blocks are then moved only when they are needed by the given
task.
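The preference for the nearest copy of a block can be sketched as a simple distance-based choice among replica holders. This is a minimal illustration, not Hadoop code: the `RACK_OF` topology, the node names, and the functions `distance` and `pick_replica` are hypothetical, and the 0/2/4 distance values are only loosely modelled on the common same-node, same-rack, different-rack convention.

```python
# Hypothetical cluster topology: node name -> rack name.
RACK_OF = {"n1": "rack1", "n2": "rack1", "n3": "rack2", "n4": "rack2"}


def distance(reader, holder):
    """Assumed network distance: 0 same node, 2 same rack, 4 different rack."""
    if reader == holder:
        return 0
    if RACK_OF[reader] == RACK_OF[holder]:
        return 2
    return 4


def pick_replica(reader, holders):
    """Choose the replica holder closest to the node running the task.

    Ties are broken by node name so the choice is deterministic.
    """
    return min(holders, key=lambda h: (distance(reader, h), h))


if __name__ == "__main__":
    # A task on n1 reads a block replicated on n2 (same rack) and n3 (other rack).
    print(pick_replica("n1", ["n3", "n2"]))
```

When no holder is on the reader's node or rack, every candidate costs the same cross-rack transfer, which is the case that moving blocks ahead of time is meant to avoid.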
To support Hadoop’s data locality preferences, in this thesis we propose
adding innovative functionality to its distributed file system (HDFS) that
enables moving data blocks on request. Shipping data in advance makes it
possible to forcibly redistribute data between nodes in order to adapt it to
the given processing tasks. The new functionality enables the instructed movement
of data blocks within the cluster. Data can be shifted either by a user running
the proper HDFS shell command or programmatically by another module, such as an
appropriate scheduler.
To develop this functionality, a detailed analysis of the Apache Hadoop
source code and its components (specifically HDFS) was conducted. This research
resulted in a deep understanding of the internal architecture, which made it possible
to compare the possible approaches to achieving the desired solution and to develop
the chosen one.
A horizontally-scalable multiprocessing platform based on Node.js
This paper presents a scalable web-based platform called Node Scala, which
allows requests to be split and handled on a parallel distributed system according
to pre-defined use cases. We applied this platform to a client application that
visualizes climate data stored in the NoSQL database MongoDB. The design of Node
Scala leads to efficient usage of available computing resources in addition to
allowing the system to scale simply by adding new workers. Performance
evaluation of Node Scala demonstrated a gain of up to 74 % compared to the
state-of-the-art techniques.
Comment: 8 pages, 7 figures. Accepted for publication as a conference paper
for the 13th IEEE International Symposium on Parallel and Distributed
Processing with Applications (IEEE ISPA-15)
Database independent Migration of Objects into an Object-Relational Database
This paper reports on the CERN-based WISDOM project, which is studying the
serialisation and deserialisation of data to/from an object database
(Objectivity) and ORACLE 9i.
Comment: 26 pages, 18 figures; CMS CERN Conference Report cr02_01