62 research outputs found

    An integration framework for managing rich organisational process knowledge

    The problem we have addressed in this dissertation is that of designing a pragmatic framework for integrating the synthesis and management of organisational process knowledge, based on domain-independent AI planning and plan representations. Our solution has focused on a set of framework components which provide methods, tools and representations to accomplish this task.

    In the framework we address a lifecycle of this knowledge which begins with a methodological approach to acquiring information about the process domain. We show that this initial domain specification can be translated into a common constraint-based model of activity (based on the work of Tate, 1996c and 1996d) which can then be operationalised for use in an AI planner. This model of activity is ontologically underpinned and may be expressed in a flexible and extensible language based on a sorted first-order logic. The model combines perspectives covering both the space of behaviour and the space of decisions. Synthesised or modified processes/plans can be translated to and from the common representation in order to support knowledge sharing, visualisation and mixed-initiative interaction.

    This work united past and present Edinburgh research on planning and infused it with perspectives from design rationale, requirements engineering, and process knowledge sharing. The implementation has been applied to a portfolio of scenarios which include process examples from business, manufacturing, construction and military operations. An archive of this work is available at: http://www.aiai.ed.ac.uk/~oplan/cpf
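    To make the common representation concrete, here is a minimal sketch of a constraint-based model of activity in the spirit of Tate's work, assuming simplified Python dataclasses; the names and fields are illustrative, not the framework's actual schema.

```python
# Hypothetical sketch of a constraint-based model of activity: a plan held as
# activity nodes plus ordering constraints and outstanding issues. Illustrative only.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ActivityNode:
    name: str
    preconditions: List[str] = field(default_factory=list)  # required world-state facts
    effects: List[str] = field(default_factory=list)        # facts made true on completion

@dataclass
class Plan:
    issues: List[str] = field(default_factory=list)             # outstanding decisions
    nodes: List[ActivityNode] = field(default_factory=list)     # activities to perform
    orderings: List[Tuple[str, str]] = field(default_factory=list)  # (before, after) pairs

plan = Plan(
    issues=["expand: assemble-subframe"],
    nodes=[ActivityNode("procure-parts", effects=["parts-available"]),
           ActivityNode("assemble-subframe", preconditions=["parts-available"])],
    orderings=[("procure-parts", "assemble-subframe")],
)
```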

    Joint Perceptual Learning and Natural Language Acquisition for Autonomous Robots

    Understanding how children learn the components of their mother tongue and the meanings of each word has long fascinated linguists and cognitive scientists. Robots face a similar challenge in understanding language and perception to allow for natural and effortless human-robot interaction. Acquiring such knowledge is a challenging task unless the knowledge is preprogrammed, which is no easy task either, and preprogramming neither accommodates differences in language between individuals nor allows the meaning of new words to be learned. In this thesis, the problem of bootstrapping knowledge in language and vision for autonomous robots is addressed through novel techniques in grammar induction and word grounding to the perceptual world. The learning is achieved in a cognitively plausible, loosely-supervised manner from raw linguistic and visual data. The visual data is collected using different robotic platforms deployed in real-world and simulated environments and equipped with different sensing modalities, while the linguistic data is collected using online crowdsourcing tools and volunteers. The presented framework does not rely on any particular robot or any specific sensors; rather, it is flexible with respect to the modalities the robot can support.

    The learning framework is divided into three processes. First, the perceptual raw data is clustered into a number of Gaussian components to learn the ‘visual concepts’. Second, frequent co-occurrence of words and visual concepts is used to learn the language grounding. Finally, the learned language grounding and visual concepts are used to induce probabilistic grammar rules to model the language structure. In this thesis, the visual concepts refer to: (i) people’s faces and the appearance of their garments; (ii) objects and their perceptual properties; (iii) pairwise spatial relations; (iv) robot actions; and (v) human activities. The visual concepts are learned by first processing the raw visual data to find people and objects in the scene using state-of-the-art techniques in human pose estimation, object segmentation and tracking, and activity analysis. Once found, the concepts are learned incrementally using a combination of techniques: Incremental Gaussian Mixture Models with a Bayesian Information Criterion to learn simple visual concepts such as object colours and shapes, and spatio-temporal graphs and topic models to learn more complex visual concepts such as human activities and robot actions. Language grounding is enabled by seeking frequent co-occurrence between words and learned visual concepts; finding the correct grounding is formulated as an integer programming problem to find the best many-to-many matches between words and concepts. Grammar induction refers to the process of learning a formal grammar (usually as a collection of re-write rules or productions) from a set of observations. In this thesis, Probabilistic Context Free Grammar rules are generated to model the language by mapping natural language sentences to learned visual concepts, as opposed to traditional supervised grammar induction techniques where learning is only made possible by using manually annotated training examples on large datasets.

    The learning framework attains its cognitive plausibility from a number of sources. First, the learning is achieved by providing the robot with pairs of raw linguistic and visual inputs in a “show-and-tell” procedure akin to how human children learn about their environment. Second, no prior knowledge is assumed about the meaning of words or the structure of the language, except that there are different classes of words (corresponding to observable actions, spatial relations, and objects and their observable properties). Third, the knowledge in both language and vision is obtained incrementally, so the gained knowledge can evolve to adapt to new observations without the need to revisit previously seen ones. Fourth, the robot learns about the visual world first and then learns how it maps to language, which aligns with the findings of cognitive studies on language acquisition in human infants suggesting that children develop considerable cognitive understanding of their environment in the pre-linguistic period of their lives. It should be noted that this work does not claim to model how humans learn about objects in their environments; rather, it is inspired by it.

    For validation, four different datasets are used, containing temporally aligned video clips of people or robots performing activities together with sentences describing these clips. The video clips are collected using four robotic platforms: three robot arms in simple block-world scenarios and a mobile robot deployed in a challenging real-world office environment, observing different people performing complex activities. The linguistic descriptions for these datasets are obtained using Amazon Mechanical Turk and volunteers. The analysis performed on these datasets suggests that the learning framework is suitable for learning from complex real-world scenarios. The experimental results show that the learning framework enables: (i) acquiring correct visual concepts from visual data; (ii) learning the word grounding for each of the extracted visual concepts; (iii) inducing correct grammar rules to model the language structure; (iv) using the gained knowledge to understand previously unseen linguistic commands; and (v) using the gained knowledge to generate well-formed natural language descriptions of novel scenes.
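    The word-grounding step described above is cast as an integer program over word-concept co-occurrence. The following toy sketch, assuming the PuLP solver and invented co-occurrence counts, shows one way such a many-to-many matching could be posed; the objective and constraint bounds are illustrative assumptions, not the thesis's exact formulation.

```python
# Hypothetical sketch: many-to-many word grounding as an integer program.
# Counts and constraint bounds are toy assumptions.
import pulp

words = ["red", "cube", "left"]
concepts = ["colour_red", "shape_cube", "rel_left_of"]
cooc = {("red", "colour_red"): 9, ("red", "shape_cube"): 2,
        ("cube", "shape_cube"): 8, ("cube", "colour_red"): 1,
        ("left", "rel_left_of"): 7}  # observed co-occurrence counts (toy data)

prob = pulp.LpProblem("word_grounding", pulp.LpMaximize)
x = {(w, c): pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
     for i, w in enumerate(words) for j, c in enumerate(concepts)}

# maximise the total co-occurrence mass of the selected word-concept pairs
prob += pulp.lpSum(cooc.get((w, c), 0) * x[w, c] for w in words for c in concepts)

# many-to-many but bounded: a word grounds to at most two concepts,
# and every concept must be named by at least one word
for w in words:
    prob += pulp.lpSum(x[w, c] for c in concepts) <= 2
for c in concepts:
    prob += pulp.lpSum(x[w, c] for w in words) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([(w, c) for (w, c), var in x.items() if var.value() == 1])
```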

    Inquiry: The University of Arkansas Undergraduate Research Journal - Volume 7 - Fall 2006


    Language Learning in Interactive Environments

    Natural language communication has long been considered a defining characteristic of human intelligence. I am motivated by the question of how learning agents can understand and generate contextually relevant natural language in service of achieving a goal. In pursuit of this objective, I have been studying Interactive Narratives, or text-adventures: simulations in which an agent interacts with the world purely through natural language, “seeing” and “acting upon” the world using textual descriptions and commands. These games are usually structured as puzzles or quests in which a player must complete a sequence of actions to succeed. My work studies two closely related aspects of Interactive Narratives: operating in these environments and creating them, in addition to their intersection, each presenting its own set of unique challenges. Operating in these environments presents three challenges: (1) knowledge representation: an agent must maintain a persistent memory of what it has learned through its experiences with a partially observable world; (2) commonsense reasoning, to endow the agent with priors on how to interact with the world around it; and (3) scaling, to effectively explore sparse-reward, combinatorially-sized natural language state-action spaces. Creating these environments, on the other hand, can be split into two complementary considerations: (1) world generation, the problem of creating a world that defines the limits of the actions an agent can perform; and (2) quest generation, i.e. defining actionable objectives grounded in a given world. I will present my work thus far, showcasing how structured, interpretable data representations in the form of knowledge graphs aid in each of these tasks, and propose how these two aspects of Interactive Narratives can be combined to improve language learning and generalization across this board of challenges.
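    As a concrete illustration of the knowledge-graph representation mentioned above, here is a small sketch of a graph memory for a text-adventure agent, assuming networkx and an invented triple schema; it is not the thesis's actual extraction pipeline.

```python
# Hypothetical sketch: a knowledge graph as persistent memory for an agent
# acting in a partially observable text world. Triple schema is assumed.
import networkx as nx

kg = nx.DiGraph()

def update_graph(kg, triples):
    """Fold newly observed (subject, relation, object) triples into memory."""
    for subj, rel, obj in triples:
        kg.add_edge(subj, obj, relation=rel)

# after reading: "You are in the kitchen. A rusty key lies on the table."
update_graph(kg, [("player", "at", "kitchen"),
                  ("key", "on", "table"),
                  ("table", "in", "kitchen")])

# the agent queries its memory to constrain the action space
on_table = [u for u, _, d in kg.in_edges("table", data=True) if d["relation"] == "on"]
print(on_table)  # ['key'] -> candidate command: "take key"
```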

    Approaching algorithmic power

    Contemporary power manifests in the algorithmic. Emerging quite recently as an object of study within media and communications, cultural research, gender and race studies, and urban geography, the algorithm often seems ungraspable. Framed as code, it becomes proprietary property, black-boxed and inaccessible. Framed as a totality, it becomes overwhelmingly complex, incomprehensible in its operations. Framed as a procedure, it becomes a technique to be optimised, bracketing out the political. In struggling to adequately grasp the algorithmic as an object of study, to unravel its mechanisms and materialities, these framings offer limited insight into how algorithmic power is initiated and maintained. This thesis instead argues for an alternative approach: first, that the algorithmic is coordinated by a coherent internal logic, a knowledge-structure that understands the world in particular ways; second, that the algorithmic is enacted through control, a material and therefore observable performance which purposively influences people and things towards a predetermined outcome; and third, that this complex totality of architectures and operations can be productively analysed as strategic sociotechnical clusters of machines.

    This method of inquiry is developed with and tested against four contemporary examples: Uber, Airbnb, Amazon Alexa, and Palantir Gotham. Highly profitable, widely adopted and globally operational, they exemplify the algorithmic shift from whiteboard to world. But if this world is productive, it is also precarious, consisting of frictional spaces and antagonistic subjects. Force cannot be assumed to be unilinear; it is incessantly negotiated, with operations of parsing data and processing tasks forming broader operations that strive to establish subjectivities and shape relations. These negotiations can fail, destabilised by inadequate logics and weak control.

    A more generic understanding of logic and control enables a historiography of the algorithmic. The ability to index information, to structure the flow of labour, to exert force over subjects and spaces: these did not emerge with the microchip and the mainframe, but are part of a longer lineage of calculation. Two moments from this lineage are examined: house-numbering in the Habsburg Empire and punch-card machines in the Third Reich. Rather than revolutionary, this genealogy suggests an evolutionary process, albeit an uneven one, linking the computation of past and present.

    The thesis makes a methodological contribution to the nascent field of algorithmic studies. But more importantly, it renders algorithmic power more intelligible as a material force. Structured and implemented in particular ways, the design of logic and control constructs different versions, or modalities, of algorithmic power. This power is political: it calibrates subjectivities towards certain ends, prioritises space in specific ways, and privileges particular practices whilst suppressing others. In apprehending operational logics, the practice of method thus foregrounds the sociopolitical dimensions of algorithmic power. As the algorithmic increasingly infiltrates into and governs the everyday, the ability to understand, critique, and intervene in this new field of power becomes more urgent.

    Towards collaborative dialogue in Minecraft

    This dissertation describes our work in building interactive agents that can communicate with humans to collaboratively solve tasks in grounded scenarios. To investigate the challenges of building such agents, we define a novel instantiation of a situated, Minecraft-based Collaborative Building Task in which one player (A, the Architect) is shown a target structure, denoted Target, and needs to instruct the other player (B, the Builder) to build a copy of this structure, denoted Built, in a predefined build region. While both players can interact asynchronously via a chat interface, we define the roles to be asymmetric: A can observe B and Target, but is invisible and cannot place blocks; meanwhile, B can freely place and remove blocks, but has no explicit knowledge of the target structure. Each agent requires a different set of abilities to be successful at this task: specifically, A's main challenges arise in generating situated instructions by comparing Built and Target, while B's responsibilities focus mainly on comprehending A's situated instructions using both dialogue and world context. Both agents must be able to interact asynchronously in an evolving dialogue context and a dynamic world state within which they are embodied.

    In this work, we specifically examine how well end-to-end neural models can learn to be instruction givers (i.e., Architects) from a limited amount of real human-human data. In order to examine how humans complete the Collaborative Building Task, as well as to use human-human data as a gold standard for training and evaluating models, we present the Minecraft Dialogue Corpus, a collection of 509 conversations and game logs. We then introduce baseline models for the challenging subtask of Architect utterance generation and evaluate them offline using both automated metrics and human evaluation. We show that while conditioning our model on a simple representation of the world gives it an improved ability to generate correct instructions, there are still many obvious shortcomings, and it is difficult for these models to learn the large variety of abilities needed to be successful Architects in an entirely end-to-end manner. To combat this, we show that providing meaningful, structured information about the world and discourse state as additional inputs (specifically, adding oracle information about the Builder's next actions, and enriching our linguistic representation with Architect dialogue acts) improves the performance of our utterance generation models. We also augment the data with shape information by pretraining 3D shape localization models on synthetically generated block configurations. Finally, we integrate Architect utterance generation models into actual Minecraft agents and evaluate them in a fully interactive setting.
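    The Architect's core comparison of Built against Target can be pictured with a small sketch; the block encoding and instruction templates below are assumptions for illustration, not the corpus format or the thesis's models.

```python
# Hypothetical sketch: diff the built structure against the target to decide
# what the Architect should instruct next. Coordinates/colours are assumed.
def architect_diff(target: set, built: set):
    to_place = target - built    # blocks still missing
    to_remove = built - target   # misplaced blocks
    return to_place, to_remove

target = {(0, 0, 0, "red"), (0, 1, 0, "red"), (1, 0, 0, "blue")}
built = {(0, 0, 0, "red"), (1, 0, 0, "green")}

to_place, to_remove = architect_diff(target, built)
for x, y, z, colour in sorted(to_remove):
    print(f"Remove the {colour} block at ({x}, {y}, {z}).")
for x, y, z, colour in sorted(to_place):
    print(f"Place a {colour} block at ({x}, {y}, {z}).")
```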

    Management and Visualisation of Non-linear History of Polygonal 3D Models

    The research presented in this thesis concerns the problems of maintenance and revision control of large-scale three-dimensional (3D) models over the Internet. As the models grow in size and the authoring tools grow in complexity, standard approaches to collaborative asset development become impractical. The prevalent paradigm of sharing files on a file system poses serious risks with regard to, among other concerns, ensuring the consistency and concurrency of multi-user 3D editing. Although modifications might be tracked manually using naming conventions, or automatically in a version control system (VCS), understanding the provenance of a large 3D dataset is hard because revision metadata is not associated with the underlying scene structures. Some tools and protocols enable seamless synchronisation of file and directory changes in remote locations. However, the existing web-based technologies are not yet fully exploiting the modern design patterns for access to and management of alternative shared resources online.

    Therefore, four distinct but highly interconnected conceptual tools are explored. The first is the organisation of 3D assets within recent document-oriented No Structured Query Language (NoSQL) databases. These "schemaless" databases, unlike their relational counterparts, do not represent data in rigid table structures. Instead, they rely on polymorphic documents composed of key-value pairs that are much better suited to the diverse nature of 3D assets. Hence, a domain-specific non-linear revision control system, 3D Repo, is built around a NoSQL database to enable asynchronous editing similar to traditional VCSs. The second concept is that of visual 3D differencing and merging. The accompanying 3D Diff tool supports interactive conflict resolution at the level of scene graph nodes, which are de facto the delta changes stored in the repository. The third is the utilisation of HyperText Transfer Protocol (HTTP) for the purposes of 3D data management. The XML3DRepo daemon application exposes the contents of the repository and the version control logic in a Representational State Transfer (REST) style of architecture. At the same time, it manifests the effects of various 3D encoding strategies on file sizes and download times in modern web browsers. The fourth and final concept is the reverse-engineering of an editing history. Even if the models are being version controlled, the extracted provenance is limited to additions, deletions and modifications. The 3D Timeline tool, therefore, infers a plausible history of common modelling operations such as duplications, transformations, etc. Given a collection of 3D models, it estimates a part-based correspondence and visualises it in a temporal flow.

    The prototype tools developed as part of the research were evaluated in pilot user studies which suggest that they are usable by end users and well suited to their respective tasks. Together, the results constitute a novel framework that demonstrates the feasibility of domain-specific 3D version control.
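    To illustrate the first concept, here is a sketch of a scene-graph node stored as a revision-tracked document in a document database, assuming MongoDB via pymongo; the database, collection, and field names are invented for illustration and are not 3D Repo's actual schema.

```python
# Hypothetical sketch: a polymorphic scene-graph node as a key-value document,
# with per-node revision metadata. Schema is assumed, not 3D Repo's.
from uuid import uuid4
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
nodes = client["repo3d"]["scene_nodes"]

nodes.insert_one({
    "_id": str(uuid4()),        # unique id of this node revision
    "shared_id": "mesh-42",     # stable id tying together revisions of one node
    "type": "mesh",
    "vertices": [[0, 0, 0], [1, 0, 0], [0, 1, 0]],
    "faces": [[0, 1, 2]],
    "author": "alice",
    "revision": 7,
})

# fetch the latest revision of the node, e.g. for a visual diff/merge
latest = nodes.find({"shared_id": "mesh-42"}).sort("revision", -1).limit(1)
for doc in latest:
    print(doc["revision"], doc["type"])
```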

    Grounding linguistic analysis in control applications

    Thesis (Ph. D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Vita. Includes bibliographical references (p. 175-182).

    This thesis addresses the problem of grounding linguistic analysis in control applications, such as automated maintenance of computers and game playing. We assume access to natural language documents that describe the desired behavior of a control algorithm, either via explicit step-by-step instructions, via high-level strategy advice, or by specifying the dynamics of the control domain. Our goal is to develop techniques for automatically interpreting such documents and leveraging the textual information to effectively guide control actions. We show that in this setting, language analysis can be learnt effectively via feedback signals inherent to the control application, obviating the need for manual annotations. Moreover, we demonstrate how information automatically acquired from text can be used to improve the performance of the target control application. We apply our ideas to three applications of increasing linguistic and control complexity: interpreting step-by-step instructions into commands in a graphical user interface; interpreting high-level strategic advice to play a complex strategy game; and leveraging text descriptions of world dynamics to guide high-level planning. In all cases, our methods produce text analyses that agree with human notions of correctness, while yielding significant improvements over strong text-unaware methods in the target control application.

    by Satchuthananthavale Rasiah Kuhan Branavan.
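    The key idea above (learning language analysis from feedback signals inherent to the control task, with no manual annotation) can be caricatured with a tiny reward-driven learner; the environment, action set, and reward function below are invented stand-ins, not the thesis's models.

```python
# Hypothetical sketch: learn word-to-action mappings purely from task reward.
import random
from collections import defaultdict

actions = ["click_ok", "open_menu"]
weights = defaultdict(float)  # (word, action) -> score learned from feedback

def choose_action(instruction, epsilon=0.1):
    words = instruction.lower().split()
    if random.random() < epsilon:  # occasionally explore
        return random.choice(actions)
    return max(actions, key=lambda a: sum(weights[w, a] for w in words))

def update(instruction, action, reward, lr=0.5):
    for w in instruction.lower().split():
        weights[w, action] += lr * reward  # reinforce mappings that paid off

def env_reward(instruction, action):
    # stub environment: "ok" instructions succeed only with click_ok
    gold = "click_ok" if "ok" in instruction else "open_menu"
    return 1.0 if action == gold else -1.0

for _ in range(200):
    instr = random.choice(["press the ok button", "open the file menu"])
    act = choose_action(instr)
    update(instr, act, env_reward(instr, act))

print(choose_action("press the ok button", epsilon=0.0))  # -> "click_ok"
```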