674 research outputs found

    MORSE: Semantic-ally Drive-n MORpheme SEgment-er

    Full text link
    We present in this paper a novel framework for morpheme segmentation which uses the morpho-syntactic regularities preserved by word representations, in addition to orthographic features, to segment words into morphemes. This framework is the first to consider vocabulary-wide syntactico-semantic information for this task. We also analyze the deficiencies of available benchmarking datasets and introduce our own dataset that was created on the basis of compositionality. We validate our algorithm across datasets and present state-of-the-art results

    The Minimum Description Length Principle for Pattern Mining: A Survey

    Full text link
    This is about the Minimum Description Length (MDL) principle applied to pattern mining. The length of this description is kept to the minimum. Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim to obtain compact high-quality sets of patterns. After giving an outline of relevant concepts from information theory and coding, as well as of work on the theory behind the MDL and similar principles, we review MDL-based methods for mining various types of data and patterns. Finally, we open a discussion on some issues regarding these methods, and highlight currently active related data analysis problems

    Mining time-series data using discriminative subsequences

    Get PDF
    Time-series data is abundant, and must be analysed to extract usable knowledge. Local-shape-based methods offer improved performance for many problems, and a comprehensible method of understanding both data and models. For time-series classification, we transform the data into a local-shape space using a shapelet transform. A shapelet is a time-series subsequence that is discriminative of the class of the original series. We use a heterogeneous ensemble classifier on the transformed data. The accuracy of our method is significantly better than the time-series classification benchmark (1-nearest-neighbour with dynamic time-warping distance), and significantly better than the previous best shapelet-based classifiers. We use two methods to increase interpretability: First, we cluster the shapelets using a novel, parameterless clustering method based on Minimum Description Length, reducing dimensionality and removing duplicate shapelets. Second, we transform the shapelet data into binary data reflecting the presence or absence of particular shapelets, a representation that is straightforward to interpret and understand. We supplement the ensemble classifier with partial classifocation. We generate rule sets on the binary-shapelet data, improving performance on certain classes, and revealing the relationship between the shapelets and the class label. To aid interpretability, we use a novel algorithm, BruteSuppression, that can substantially reduce the size of a rule set without negatively affecting performance, leading to a more compact, comprehensible model. Finally, we propose three novel algorithms for unsupervised mining of approximately repeated patterns in time-series data, testing their performance in terms of speed and accuracy on synthetic data, and on a real-world electricity-consumption device-disambiguation problem. We show that individual devices can be found automatically and in an unsupervised manner using a local-shape-based approach

    Parsing Motion and Composing Behavior for Semi-Autonomous Manipulation

    Get PDF
    Robots are becoming an ever bigger part of our day to day life. They take up simple tasks in households, like vacuum cleaning and lawn mowing. They ensure a steady and reliable process at many work places in large scale manufacturing, like the automotive and electronics industry. Furthermore, robots are becoming more and more socially accepted, for instance as autonomous drivers. They even start to engage in special and elderly care, aiming to fill a void created by a rapidly aging population. Additionally, the increasing complexity and capability of robotic systems allows to solve ever more complicated tasks in increasingly difficult scenarios and environments. Soon, encountering and interacting with robots will be considered as natural as interacting with other humans. However, when it comes to defining and understanding the behavior of robots, experts are still necessary. Robots usually follow predefined routines which are programmed and tuned by people with years of experience. Unintended behavior is traced back to a certain part of the source code which can be modified using a specific programming language. Most of the people that will interact with robotic servants or coworkers in the future, will not have the necessary skill set to instruct robots in such detail. This need for an expert represents a significant bottleneck to the deployment of robots as our everyday companion in households and at work. This thesis presents several novel approaches aiming at facilitating the interaction between non-expert humans and robots in terms of intuitive instruction and simple understanding of the robot capabilities with respect to a given task. Chapter 3 introduces a novel method that segments unlabeled demonstrations into sequence of movement primitives while simultaneously learning a movement primitive library. This method allows the non-expert to teach an entire task rather than every single primitive. Movement primitives represent a simple, atomic and commonly parameterized motion. The presented method segments each demonstration by identifying similar patterns across all demonstrations and treating them as samples drawn from a learned probabilistic representation of a movement primitive. The method is formulated as an expectation-maximization approach and was evaluated in several tasks,including a chair assembly and segmenting table tennis demonstrations. In Chapter 4 the previously segmented demonstrations and the learned primitive library are used to induce a formal grammar for movements. Formal grammars are a well established concept in formal language theory and have been applied in several fields, reaching from linguistics, over compiler architecture to robotics. The simplest class of grammars, regular grammars, correspond in their probabilistic form to Hidden Markov Models. However, the intuitive, hierarchical representation of transitions as a set of rules makes it easier for non-experts to comprehend the possible behaviors the grammar implies. A sequence of movements can now be considered a sentence produced by the learned grammar. The production of each sentence can be illustrated by a tree structure, allowing an easy understanding of the involved rules. Probabilistic context-free grammars are a superset of regular grammars and, hence, are more expressive and exceed the capabilities of Hidden Markov Models. While the induction of probabilistic context-free grammars is considered a difficult, unsolved problem for natural languages, the observed sequences of movement primitives show much simpler structures, making the induction more feasible. The method was successfully evaluated on several tasks, such as a pick-and-place task in a tic-tac-toe setting or a handover task in a collaborative tool box assembly. Chapter 5 introduces the concept of reinforcement learning into the domain of formal grammars. Given an objective, we apply a natural policy gradient approach in order to learn the grammar parameters that produces sequences of primitives that solve that objective. This allows the autonomous improvement of robot behavior. For instance, a cleaning up task can be optimized for efficiency while avoiding self collisions. The parameters of the grammar are the probabilities of each production. Therefore, probability constraints have to be maintained while learning the parameters. The applied natural policy gradient method ensures reasonably small parameter updates, such that the grammar probabilities change gradually. We derive the natural policy gradient method for formal grammars and evaluate the method on several tasks. Together, the individual contributions presented in this thesis form an imitation learning pipeline that facilitates the instruction, interaction and collaboration with robots. Starting from unlabeled demonstrations, an underlying movement primitive library is learned while simultaneously segmenting the given demonstrations into sequences of primitives. These sequences are than used to induce a formal grammar. The structure of the grammar and the produced parse trees form a comprehensible representation of the robot capabilities with respect to the demonstrated task. Finally, a reinforcement learning approach allows the autonomous optimization of the grammar given an objective

    Continuous Modeling of 3D Building Rooftops From Airborne LIDAR and Imagery

    Get PDF
    In recent years, a number of mega-cities have provided 3D photorealistic virtual models to support the decisions making process for maintaining the cities' infrastructure and environment more effectively. 3D virtual city models are static snap-shots of the environment and represent the status quo at the time of their data acquisition. However, cities are dynamic system that continuously change over time. Accordingly, their virtual representation need to be regularly updated in a timely manner to allow for accurate analysis and simulated results that decisions are based upon. The concept of "continuous city modeling" is to progressively reconstruct city models by accommodating their changes recognized in spatio-temporal domain, while preserving unchanged structures. However, developing a universal intelligent machine enabling continuous modeling still remains a challenging task. Therefore, this thesis proposes a novel research framework for continuously reconstructing 3D building rooftops using multi-sensor data. For achieving this goal, we first proposes a 3D building rooftop modeling method using airborne LiDAR data. The main focus is on the implementation of an implicit regularization method which impose a data-driven building regularity to noisy boundaries of roof planes for reconstructing 3D building rooftop models. The implicit regularization process is implemented in the framework of Minimum Description Length (MDL) combined with Hypothesize and Test (HAT). Secondly, we propose a context-based geometric hashing method to align newly acquired image data with existing building models. The novelty is the use of context features to achieve robust and accurate matching results. Thirdly, the existing building models are refined by newly proposed sequential fusion method. The main advantage of the proposed method is its ability to progressively refine modeling errors frequently observed in LiDAR-driven building models. The refinement process is conducted in the framework of MDL combined with HAT. Markov Chain Monte Carlo (MDMC) coupled with Simulated Annealing (SA) is employed to perform a global optimization. The results demonstrates that the proposed continuous rooftop modeling methods show a promising aspects to support various critical decisions by not only reconstructing 3D rooftop models accurately, but also by updating the models using multi-sensor data
    • …
    corecore