106 research outputs found

    GIRNet: Interleaved Multi-Task Recurrent State Sequence Models

    Full text link
    In several natural language tasks, labeled sequences are available in separate domains (say, languages), but the goal is to label sequences with mixed domain (such as code-switched text). Or, we may have available models for labeling whole passages (say, with sentiments), which we would like to exploit toward better position-specific label inference (say, target-dependent sentiment annotation). A key characteristic shared across such tasks is that different positions in a primary instance can benefit from different `experts' trained from auxiliary data, but labeled primary instances are scarce, and labeling the best expert for each position entails unacceptable cognitive burden. We propose GITNet, a unified position-sensitive multi-task recurrent neural network (RNN) architecture for such applications. Auxiliary and primary tasks need not share training instances. Auxiliary RNNs are trained over auxiliary instances. A primary instance is also submitted to each auxiliary RNN, but their state sequences are gated and merged into a novel composite state sequence tailored to the primary inference task. Our approach is in sharp contrast to recent multi-task networks like the cross-stitch and sluice network, which do not control state transfer at such fine granularity. We demonstrate the superiority of GIRNet using three applications: sentiment classification of code-switched passages, part-of-speech tagging of code-switched text, and target position-sensitive annotation of sentiment in monolingual passages. In all cases, we establish new state-of-the-art performance beyond recent competitive baselines.Comment: Accepted at AAAI 201

    Ontologies and Information Extraction

    Full text link
    This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of texts are interpreted with respect to a predefined partial domain model. This report shows that depending on the nature and the depth of the interpretation to be done for extracting the information, more or less knowledge must be involved. This report is mainly illustrated in biology, a domain in which there are critical needs for content-based exploration of the scientific literature and which becomes a major application domain for IE

    Vers une approche interactive pour l'annotation sémantique

    No full text
    International audienceNous présentons une méthodologie permettant la constitution d'une ressource destinée à l'annotation sémantique de corpus. Notre démarche s'inscrit dans le cadre des plateformes d'annotation linguistique. Elle permet de créer un étage d'annotation sémantique constitué de règles d'annotation qui tirent profit dans leur expression des différents niveaux inférieurs d'annotation linguistique de la plateforme. La particularité de l'approche présentée est d'assister l'utilisateur à travers un processus interactif et itératif où il est possible de travailler de manière duale sur les règles d'annotation ainsi que sur des exemples d'annotation

    Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval

    Get PDF
    Where previous reviews on content-based image retrieval emphasize on what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise of three closely linked problems, i.e., image tag assignment, refinement, and tag-based image retrieval is presented. While existing works vary in terms of their targeted tasks and methodology, they rely on the key functionality of tag relevance, i.e. estimating the relevance of a specific tag with respect to the visual content of a given image and its social context. By analyzing what information a specific method exploits to construct its tag relevance function and how such information is exploited, this paper introduces a taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and difference, and recognize their merits and limitations. For a head-to-head comparison between the state-of-the-art, a new experimental protocol is presented, with training sets containing 10k, 100k and 1m images and an evaluation on three test sets, contributed by various research groups. Eleven representative works are implemented and evaluated. Putting all this together, the survey aims to provide an overview of the past and foster progress for the near future.Comment: to appear in ACM Computing Survey
    • …
    corecore