GATECloud.net: a Platform for Large-Scale, Open-Source Text Processing on the Cloud
Cloud computing is increasingly being regarded as a key enabler of the ‘democratization of science’, because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost–benefit analysis and usage evaluation.
Distributed web-scale infrastructure for crawling, indexing and search with semantic support
In this paper, we describe our work in progress on web-scale information extraction and information retrieval using distributed computing. We present a distributed architecture built on top of the MapReduce paradigm for information retrieval, information processing and intelligent search supported by spatial capabilities. The proposed architecture focuses on crawling documents in several different formats, information extraction, lightweight semantic annotation of the extracted information, indexing of the extracted information and, finally, indexing of documents based on the geo-spatial information found in them. We demonstrate the architecture on two use cases: search in job offers retrieved from the LinkedIn portal, and search in BBC news feeds. We also discuss several problems we faced during the implementation. Finally, we discuss spatial search applications for both cases, because both LinkedIn job offer pages and BBC news feeds contain a lot of spatial information to extract and process.
Annotations: A Way to Capture Experience
This paper describes a cooperative tool used by a mechanical engineering team carrying out asynchronous, distributed work. The tool supports communication phases as well as document and product creation phases by means of annotations. Memorizing all the annotations and creating documents from them enables the team members to capture their experience. Annotation is defined here as a continuum fulfilling several purposes, from communication and argumentation to indexing.
Supporting Dynamic, People-Driven Processes through Self-learning of Message Flows
Flexibility and automatic learning are key to supporting users in dynamic business environments, such as value chains across SMEs or the organization of a large event. Process-centric information systems need to adapt to changing environmental constraints, as reflected in the user’s behavior, in order to provide suitable activity recommendations. This paper addresses the problem of automatically detecting and managing message flows in evolving people-driven processes. We introduce a probabilistic process model and a message state model to learn message-activity dependencies, predict message occurrence, and keep the process model in line with real-world user behavior. Our probabilistic process engine demonstrates rapid learning of message flow evolution while maintaining the quality of activity recommendations.