25,012 research outputs found
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art in Grid
workflow systems, but also identifies the areas that need further research.Comment: 29 pages, 15 figure
FĂschlĂĄr-DiamondTouch: collaborative video searching on a table
In this paper we present the system we have developed for our participation in the annual TRECVid benchmarking activity, specically the system we have developed, FĂschlĂĄr-DT, for participation in the interactive search
task of TRECVid 2005. Our back-end search engine uses a combination of a text search which operates over the automatic speech recognised text, and an image search which uses low-level image features matched against video keyframes. The two novel aspects of our work are the fact that we are evaluating collaborative, team-based search among groups of users working together, and that we are using a novel touch-sensitive tabletop interface and interaction device known as the DiamondTouch to support this collaborative search. The paper summarises the backend search systems as well as presenting the interface we have developed, in detail
Grid-Brick Event Processing Framework in GEPS
Experiments like ATLAS at LHC involve a scale of computing and data
management that greatly exceeds the capability of existing systems, making it
necessary to resort to Grid-based Parallel Event Processing Systems (GEPS).
Traditional Grid systems concentrate the data in central data servers which
have to be accessed by many nodes each time an analysis or processing job
starts. These systems require very powerful central data servers and make
little use of the distributed disk space that is available in commodity
computers. The Grid-Brick system, which is described in this paper, follows a
different approach. The data storage is split among all grid nodes having each
one a piece of the whole information. Users submit queries and the system will
distribute the tasks through all the nodes and retrieve the result, merging
them together in the Job Submit Server. The main advantage of using this system
is the huge scalability it provides, while its biggest disadvantage appears in
the case of failure of one of the nodes. A workaround for this problem involves
data replication or backup.Comment: 6 pages; document for CHEP'03 conferenc
Experiments on domain adaptation for English-Hindi SMT
Statistical Machine Translation (SMT) systems are usually trained on large amounts of bilingual text and monolingual target language text. If a significant amount of out-of-domain data is added to the training data, the quality of translation can drop. On the other hand, training an SMT system on a small amount of training material for given indomain data leads to narrow lexical coverage which again results in a low translation quality. In this paper, (i) we explore domain-adaptation techniques to combine large out-of-domain training data with small-scale in-domain training data for EnglishâHindi statistical machine translation and (ii) we cluster large out-of-domain training data to extract sentences similar to in-domain sentences and apply adaptation techniques to combine clustered sub-corpora
with in-domain training data into a unified framework, achieving a 0.44 absolute corresponding to a 4.03% relative improvement in terms of BLEU over the baseline
Towards using web-crawled data for domain adaptation in statistical machine translation
This paper reports on the ongoing work focused on domain adaptation of statistical machine translation using domain-speciïŹc data obtained by domain-focused web crawling. We present a strategy for crawling monolingual and parallel data and their exploitation for testing, language modelling, and system tuning in a phrase--based machine translation framework. The proposed approach is evaluated on the domains of Natural Environment and Labour Legislation and two language
pairs: EnglishâFrench and EnglishâGreek
The Digital Puglia Project: An Active Digital Library of Remote Sensing Data
The growing need of software infrastructure able to create, maintain and ease the evolution of scientific data, promotes the development of digital libraries in order to provide the user with fast and reliable access to data. In a world that is rapidly changing, the standard view of a digital library as a data repository specialized to a community of users and provided with some search tools is no longer tenable. To be effective, a digital library should be an active digital library, meaning that users can process available data not just to retrieve a particular piece of information, but to infer new knowledge about the data at hand. Digital Puglia is a new project, conceived to emphasize not only retrieval of data to the client's workstation, but also customized processing of the data. Such processing tasks may include data mining, filtering and knowledge discovery in huge databases, compute-intensive image processing (such as principal component analysis, supervised classification, or pattern matching) and on demand computing sessions. We describe the issues, the requirements and the underlying technologies of the Digital Puglia Project, whose final goal is to build a high performance distributed and active digital library of remote sensing data
- âŠ