    Emerging Artificial Societies Through Learning

    The NewTies project is implementing a simulation in which societies of agents are expected to develop autonomously as a result of individual, population and social learning. These societies are expected to be able to solve environmental challenges by acting collectively. The challenges are intended to be analogous to those faced by early, simple, small-scale human societies. This report on work in progress outlines the major features of the system as it is currently conceived within the project, including the design of the agents, the environment, the mechanism for the evolution of language and the peer-to-peer infrastructure on which the simulation runs.
    Keywords: Artificial Societies, Evolution of Language, Decision Trees, Peer-To-Peer Networks, Social Learning

    Adversarial Sampling and Training for Semi-Supervised Information Retrieval

    Ad-hoc retrieval models trained with implicit feedback often suffer from problems such as class imbalance in the data set. Too few clicked documents may hurt the generalization ability of the models, whereas too many non-clicked documents may harm both the effectiveness of the models and the efficiency of training. In addition, recent neural network-based models are vulnerable to adversarial examples because of their linear nature. To address these problems simultaneously, we propose an adversarial sampling and training framework for learning ad-hoc retrieval models with implicit feedback. Our key idea is (i) to augment clicked examples through adversarial training for better generalization and (ii) to obtain highly informative non-clicked examples through adversarial sampling and training. Experiments are performed on benchmark data sets for common ad-hoc retrieval tasks such as Web search, item recommendation, and question answering. Experimental results indicate that the proposed approaches significantly outperform strong baselines, especially for high-ranked documents, and that they outperform IRGAN in NDCG@5 using only 5% of the labeled data for the Web search task.
    Comment: Published in WWW 201
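    The abstract reports results in NDCG@5. As a rough illustration of what that metric measures, the Python sketch below computes NDCG@k under one common formulation (gain 2^rel - 1, position discount log2(p + 1)). The abstract does not say which variant the authors used, and the function names are hypothetical. For click data, relevance is simply 1 (clicked) or 0 (not clicked), and the ideal ordering is taken from the same list, assuming it holds all judged documents for the query.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the top-k results, in ranked order."""
    # Position p (1-based) is discounted by log2(p + 1); enumerate gives a
    # 0-based index i, so p + 1 = i + 2.
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalised by the DCG of the ideal (descending) ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Binary click labels for one ranked list: clicked documents at ranks 1 and 3.
print(round(ndcg_at_k([1, 0, 1, 0, 0], k=5), 4))  # → 0.9197
```

    Because the discount shrinks with rank, moving a clicked document from rank 3 to rank 2 raises the score, which is why NDCG@5 emphasises the high-ranked documents the abstract mentions.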

    Acquiring Word-Meaning Mappings for Natural Language Interfaces

    This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The learned lexicon consists of phrases paired with meaning representations. WOLFIE is part of an integrated system that learns to transform sentences into representations such as logical database queries. Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The usefulness of the lexicons learned by WOLFIE is compared to that of lexicons acquired by a similar system, with results favorable to WOLFIE. A second set of experiments demonstrates WOLFIE's ability to scale to larger and more difficult, albeit artificially generated, corpora. In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and are therefore potentially very useful in natural language applications. However, most results to date for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to semantic lexicons. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance.
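    Active learning, as described above, repeatedly picks the examples the current learner is least sure about and sends only those for annotation. The abstract does not specify WOLFIE's selection criterion, so the Python sketch below uses generic least-confidence sampling as a stand-in. `train`, `annotate` and `confidence_of` are hypothetical callbacks representing the learner, the human annotator and the model's certainty score.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")  # an unannotated example (e.g. a sentence)
Y = TypeVar("Y")  # its annotation (e.g. a semantic representation)
M = TypeVar("M")  # whatever the learner produces


def active_learning(
    pool: Sequence[X],
    train: Callable[[List[Tuple[X, Y]]], M],
    annotate: Callable[[X], Y],
    confidence_of: Callable[[M, X], float],
    batch_size: int,
    rounds: int,
) -> Tuple[M, List[Tuple[X, Y]]]:
    """Least-confidence selective sampling (a generic sketch, not WOLFIE's exact method)."""
    labeled: List[Tuple[X, Y]] = []
    unlabeled = list(pool)
    model = train(labeled)  # whatever the learner does with no annotated data
    for _ in range(rounds):
        if not unlabeled:
            break
        # Rank remaining examples from least to most confident under the current model.
        order = sorted(range(len(unlabeled)), key=lambda i: confidence_of(model, unlabeled[i]))
        picked = order[:batch_size]
        # Only the selected examples are sent for (costly) annotation.
        labeled.extend((unlabeled[i], annotate(unlabeled[i])) for i in picked)
        picked_set = set(picked)
        unlabeled = [x for i, x in enumerate(unlabeled) if i not in picked_set]
        model = train(labeled)  # retrain on the enlarged annotated set
    return model, labeled
```

    The annotation budget is bounded by `batch_size * rounds`, so the learner reaches a given level of performance with fewer annotated examples only if the confidence score actually tracks informativeness.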

    A Formal Framework for Linguistic Annotation

    'Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions (audio, video and/or physiological recordings) or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, 'named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.
    Comment: 49 pages
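    In the annotation graph model, an annotation is a labelled directed acyclic graph whose nodes may be anchored to time offsets in the underlying signal and whose arcs carry a type (the annotation tier) and a label. The minimal Python sketch below, with hypothetical class and method names, shows how such a structure supports a simple time-based search over one tier. It is an illustration only, not the paper's own formalism or tooling, and it does not enforce acyclicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Arc:
    source: str  # start node id
    target: str  # end node id
    kind: str    # arc type, i.e. the annotation tier (e.g. "word", "phone")
    label: str   # annotation content


@dataclass
class AnnotationGraph:
    # node id -> time offset in seconds, or None for an unanchored node
    offsets: Dict[str, Optional[float]] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, offset: Optional[float] = None) -> None:
        self.offsets[node_id] = offset

    def add_arc(self, source: str, target: str, kind: str, label: str) -> None:
        if source not in self.offsets or target not in self.offsets:
            raise KeyError("both endpoints must be added as nodes first")
        self.arcs.append(Arc(source, target, kind, label))

    def arcs_within(self, kind: str, start: float, end: float) -> List[Arc]:
        """Arcs of the given type whose anchored span lies inside [start, end]."""
        result = []
        for arc in self.arcs:
            if arc.kind != kind:
                continue
            s, t = self.offsets[arc.source], self.offsets[arc.target]
            # Arcs touching an unanchored node have no known span, so skip them.
            if s is not None and t is not None and start <= s and t <= end:
                result.append(arc)
        return result


g = AnnotationGraph()
for node, t in [("n0", 0.0), ("n1", 0.35), ("n2", 0.80)]:
    g.add_node(node, t)
g.add_arc("n0", "n1", "word", "hello")
g.add_arc("n1", "n2", "word", "world")
print([a.label for a in g.arcs_within("word", 0.0, 0.5)])  # ['hello']
```

    Keeping time offsets on nodes rather than on arcs lets several tiers share the same anchors, which is one reason a single graph can represent many different annotation formats.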

    Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

    Information extraction, data integration, and uncertain data management are distinct research areas that have received considerable attention over the last two decades. Many researchers have tackled these areas individually. However, information extraction systems should be integrated with data integration methods in order to make use of the extracted information. Handling uncertainty in the extraction and integration processes is important for improving the quality of the data in such integrated systems. This article presents the state of the art in these research areas, shows their common ground, and describes how to integrate information extraction and data integration under the umbrella of uncertainty management.

    Incremental Unit Networks for Distributed, Symbolic Multimodal Processing and Representation

    Incremental dialogue processing has been an important topic in spoken dialogue systems research, but the broader research community that makes use of language interaction (e.g., chatbots, conversational AI, spoken interaction with robots) has not adopted incremental processing, despite research showing that humans perceive incremental dialogue as more natural. In this paper, we extend prior work that identifies the requirements for making spoken interaction with a system natural, with the goal that our framework will be generalizable to many domains where speech is the primary method of communication. The Incremental Unit framework offers a model of incremental processing that has been extended to be multimodal and temporally aligned, enables real-time information updates, and creates a complex network of information as a fine-grained information state. One challenge is that multimodal dialogue systems often have computationally expensive modules, requiring computation to be distributed. Most importantly, when speech is the means of communication, it brings the added expectation not only that systems understand what humans say, but also that they understand and respond without delay. In this paper, we build on top of the Incremental Unit framework and make it amenable to a distributed architecture made up of a robot and spoken dialogue system modules. To enable fast communication between the modules and to maintain module state histories, we compare two different implementations of a distributed Incremental Unit architecture. We evaluate both implementations systematically and then with real human users, and show that the implementation that uses an external attribute-value database is preferred, but that there is some flexibility in which variant to use depending on the circumstances.
    This work offers the Incremental Unit framework as an architecture for building powerful, complete, and natural dialogue systems, specifically applicable to researchers working on robots and multimodal systems.

    “A Good Algorithm Does Not Steal – It Imitates” : The Originality Report as a Means of Measuring When a Music Generation Algorithm Copies Too Much

    Research on automatic music generation lacks consideration of the originality of musical outputs, creating risks of plagiarism and/or copyright infringement. We present the originality report, a set of analyses for measuring the extent to which an algorithm copies from the input music on which it is trained. First, a baseline is constructed, determining the extent to which human composers borrow from themselves and each other in some existing music corpus. Second, we apply a similar analysis to musical outputs of runs of MAIA Markov and Music Transformer generation algorithms, and compare the results to the baseline. Third, we investigate how originality varies as a function of Transformer’s training epoch. Results from the second analysis indicate that the originality of Transformer’s output is below the 95%-confidence interval of the baseline. Musicological interpretation of the analyses shows that the Transformer model obtained via the conventional stopping criteria produces single-note repetition patterns, resulting in outputs of low quality and originality, while in later training epochs, the model tends to overfit, producing copies of excerpts of input pieces. We recommend the originality report as a new means of evaluating algorithm training processes and outputs in future, and question the reported success of language-based deep learning models for music generation. Supporting materials (code, dataset) will be made available via https://osf.io/96emr/

    Music information retrieval: conceptual framework, annotation and user behaviour

    Understanding music is a process both based on and influenced by the knowledge and experience of the listener. Although content-based music retrieval has been given increasing attention in recent years, much of the research still focuses on bottom-up retrieval techniques. In order to make a music information retrieval system appealing and useful to the user, more effort should be spent on constructing systems that both operate directly on the encoding of the physical energy of music and are flexible with respect to users’ experiences. This thesis is based on a user-centred approach, taking into account the mutual relationship between music as an acoustic phenomenon and as an expressive phenomenon. The issues it addresses are: the lack of a conceptual framework, the shortage of annotated musical audio databases, the lack of understanding of the behaviour of system users, and the shortage of user-dependent knowledge with respect to high-level features of music. In the theoretical part of this thesis, a conceptual framework for content-based music information retrieval is defined. The proposed conceptual framework, the first of its kind, is conceived as a coordinating structure between the automatic description of low-level music content and the description of high-level content by the system users. A general framework for the manual annotation of musical audio is outlined as well. A new methodology for the manual annotation of musical audio is introduced and tested in case studies. The results from these studies show that manually annotated music files can be of great help in the development of accurate analysis tools for music information retrieval. Empirical investigation is the foundation on which the aforementioned theoretical framework is built. Two elaborate studies involving different experimental issues are presented. In the first study, elements of signification related to spontaneous user behaviour are clarified.
    In the second study, a global profile of music information retrieval system users is given and their description of high-level content is discussed. This study has uncovered relationships between the users’ demographic background and their perception of expressive and structural features of music. Such a multi-level approach is exceptional in that it included a large sample of the population of real users of interactive music systems. Tests have shown that the findings of this study are representative of the targeted population. Finally, the multi-purpose material provided by the theoretical background and the results from empirical investigations are put into practice in three music information retrieval applications: a prototype of a user interface based on a taxonomy, an annotated database of experimental findings and a prototype semantic user recommender system. Results are presented and discussed for all methods used. They show that, if reliably generated, knowledge about users can significantly improve the quality of music content analysis. This thesis demonstrates that an informed knowledge of human approaches to music information retrieval provides valuable insights, which may be of particular assistance in the development of user-friendly, content-based access to digital music collections.