14 research outputs found

    GATECloud.net: a Platform for Large-Scale, Open-Source Text Processing on the Cloud

    Cloud computing is increasingly regarded as a key enabler of the ‘democratization of science’, because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. The platform handles important infrastructural issues completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost–benefit analysis and usage evaluation.

    Distributed web-scale infrastructure for crawling, indexing and search with semantic support

    In this paper, we describe our work in progress on web-scale information extraction and information retrieval using distributed computing. We present a distributed architecture, built on top of the MapReduce paradigm, for information retrieval, information processing and intelligent search supported by spatial capabilities. The proposed architecture focuses on crawling documents in several different formats, information extraction, lightweight semantic annotation of the extracted information, indexing of the extracted information and, finally, indexing of documents based on the geo-spatial information found in each document. We demonstrate the architecture on two use cases: search in job offers retrieved from the LinkedIn portal, and search in BBC news feeds. We also discuss several problems we had to face during the implementation, as well as spatial search applications for both cases, because both LinkedIn job offer pages and BBC news feeds contain a lot of spatial information to extract and process.

    Ontology Alignment for Contract Based Virtual Organizations Negotiation and Operation

    Annotations: A Way to Capture Experience

    This paper describes a cooperative tool used by a mechanical engineering team carrying out asynchronous, distributed work. The tool supports communication phases as well as document and product creation phases by means of annotations. Memorizing all the annotations and creating documents from them enables the team members to capture their experience. Annotation is defined here as a continuum fulfilling several purposes, from communication and argumentation to indexing.

    Supporting Dynamic, People-Driven Processes through Self-learning of Message Flows

    Flexibility and automatic learning are key to supporting users in dynamic business environments, such as value chains across SMEs or the organization of a large event. Process-centric information systems need to adapt to changing environmental constraints, as reflected in the user’s behavior, in order to provide suitable activity recommendations. This paper addresses the problem of automatically detecting and managing message flows in evolving people-driven processes. We introduce a probabilistic process model and a message state model to learn message-activity dependencies, predict message occurrence, and keep the process model in line with real-world user behavior. Our probabilistic process engine demonstrates rapid learning of message flow evolution while maintaining the quality of activity recommendations.