11 research outputs found

    End-to-End Feature-Aware Label Space Encoding for Multilabel Classification With Many Classes

    Get PDF
    To make multilabel classification with many classes more tractable, recent years have seen efforts devoted to label space dimension reduction (LSDR). Specifically, LSDR encodes high-dimensional label vectors into low-dimensional code vectors lying in a latent space, so as to train predictive models at much lower cost. At prediction time, classification of an unseen instance is performed by recovering a label vector from its predicted code vector via a decoding process. In this paper, we propose a novel method, namely End-to-End Feature-aware label space Encoding (E²FE), to perform LSDR. Instead of requiring an encoding function like most previous works, E²FE directly learns a code matrix formed by the code vectors of the training instances in an end-to-end manner. Another distinct property of E²FE is its feature awareness, attributable to the fact that the code matrix is learned by jointly maximizing the recoverability of the label space and the predictability of the latent space. Based on the learned code matrix, E²FE further trains predictive models to map instance features into code vectors, and also learns a linear decoding matrix for efficiently recovering the label vector of any unseen instance from its predicted code vector. Theoretical analyses show that both the code matrix and the linear decoding matrix in E²FE can be efficiently learned. Moreover, like previous works, E²FE can be specialized to learn an encoding function, and it can be extended with kernel tricks to handle nonlinear correlations between the feature space and the latent space. Comprehensive experiments conducted on diverse benchmark data sets with many classes show consistent performance gains of E²FE over state-of-the-art methods.
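    The generic LSDR pipeline the abstract describes (encode labels into short codes, train predictors on codes, decode predicted codes back to labels) can be illustrated with a deliberately simplified sketch. This is not E²FE itself: the learned code matrix is replaced here by a fixed label-grouping "encoder" and the learned linear decoding matrix by a group-expansion "decoder", both hypothetical stand-ins chosen only to show the encode/decode round trip.

```python
# Toy LSDR round trip (illustrative stand-in, not E2FE's learned mappings).
# A binary label vector of length L is compressed into one code bit per
# label group, and decoding expands each active code bit back to its group.

def encode(labels, groups):
    # labels: 0/1 list of length L; groups: disjoint lists of label indices
    return [max(labels[i] for i in g) for g in groups]

def decode(code, groups, L):
    y = [0] * L
    for bit, g in zip(code, groups):
        if bit:
            for i in g:
                y[i] = 1
    return y

groups = [[0, 1], [2, 3], [4, 5]]
y = [1, 0, 0, 0, 1, 1]          # 6-dimensional label vector
z = encode(y, groups)           # 3-dimensional code vector: [1, 0, 1]
y_hat = decode(z, groups, 6)    # lossy recovery: [1, 1, 0, 0, 1, 1]
```

    The round trip is lossy (an active group re-activates all its labels); methods like E²FE learn both mappings precisely to maximize how well such a decoding recovers the original label space.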

    Hybrid ASP-based Approach to Pattern Mining

    Full text link
    Detecting small sets of relevant patterns from a given dataset is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like Answer Set Programming (ASP) seem well suited for specifying such criteria in the form of constraints. Although progress has been made, on the one hand, on solving individual mining problems and, on the other hand, on developing generic mining systems, existing methods focus either on scalability or on generality. In this paper we take steps towards combining local (frequency, size, cost) and global (various condensed representations like maximal, closed, skyline) constraints in a generic and efficient way. We present a hybrid approach for itemset, sequence and graph mining which exploits dedicated, highly optimized mining systems to detect frequent patterns and then filters the results using declarative ASP. To further demonstrate the generic nature of our hybrid framework, we apply it to the problem of approximately tiling a database. Experiments on real-world datasets show the effectiveness of the proposed method and computational gains for itemset, sequence and graph mining, as well as approximate tiling. Under consideration in Theory and Practice of Logic Programming (TPLP). Comment: 29 pages, 7 figures, 5 tables.
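    A minimal sketch of the hybrid idea for the itemset case, with an exhaustive level-wise miner standing in for the dedicated optimized mining systems and a plain Python predicate standing in for the declarative ASP filter. Both stand-ins are illustrative assumptions, not the paper's implementation: the first stage enforces the local constraint (frequency), the second the global one (maximality).

```python
# Stage 1 (local constraint): enumerate all frequent itemsets of a
# transaction database. Stage 2 (global constraint): keep only maximal
# frequent itemsets, i.e. those with no frequent proper superset.
from itertools import combinations

def frequent_itemsets(db, min_support):
    items = sorted({i for t in db for i in t})
    freq = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            support = sum(1 for t in db if set(cand) <= t)
            if support >= min_support:
                freq[frozenset(cand)] = support
                found = True
        if not found:          # no frequent k-sets => none of size k+1 either
            break
    return freq

def maximal(freq):
    # the "declarative" post-filter: no frequent proper superset exists
    return {p for p in freq if not any(p < q for q in freq)}

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
freq = frequent_itemsets(db, min_support=2)
max_sets = maximal(freq)       # {a,b}, {a,c}, {b,c}
```

    In the paper's framework the second stage is expressed as ASP constraints over the miner's output, which keeps the condensed-representation logic declarative while the heavy enumeration stays in the optimized miner.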

    Integrating Distributional, Compositional, and Relational Approaches to Neural Word Representations

    Get PDF
    When the field of natural language processing (NLP) entered the era of deep neural networks, the task of representing basic units of language, an inherently sparse and symbolic medium, using low-dimensional dense real-valued vectors, or embeddings, became crucial. The dominant technique to perform this task has for years been to segment input text sequences into space-delimited words, for which embeddings are trained over a large corpus by leveraging distributional information: a word is reducible to the set of contexts it appears in. This approach is powerful but imperfect; words not seen during the embedding learning phase, known as out-of-vocabulary words (OOVs), emerge in any plausible application where embeddings are used. One approach to combating this and other shortcomings is the incorporation of compositional information obtained from the surface form of words, enabling the representation of morphological regularities and increasing robustness to typographical errors. Another approach leverages word-sense information and relations curated in large semantic graph resources, offering a supervised signal for embedding space structure and improving representations for domain-specific rare words. In this dissertation, I offer several analyses and remedies for the OOV problem based on the utilization of character-level compositional information in multiple languages and the structure of semantic knowledge in English. In addition, I provide two novel datasets for the continued exploration of vocabulary expansion in English: one with a taxonomic emphasis on novel word formation, and the other generated by a real-world data-driven use case in the entity graph domain. Finally, recognizing the recent shift in NLP towards contextualized representations of subword tokens, I describe the form in which the OOV problem still appears in these methods, and apply an integrative compositional model to address it.
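    The compositional remedy for OOVs mentioned above can be sketched in the general style of subword-based embedding models: a word's vector is composed by averaging the vectors of its character n-grams, so an unseen word still receives a representation from the n-grams it shares with seen words. The tiny n-gram table and 2-dimensional vectors below are toy assumptions for illustration, not the dissertation's model.

```python
# Compose word vectors from character trigram vectors (toy sketch).
# Words are padded with boundary markers so prefixes/suffixes get
# their own n-grams, as in subword embedding models.

def char_ngrams(word, n=3):
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def embed(word, ngram_vecs, dim):
    # average the vectors of the word's known n-grams;
    # n-grams absent from the table contribute nothing
    grams = [g for g in char_ngrams(word) if g in ngram_vecs]
    if not grams:
        return [0.0] * dim
    vec = [0.0] * dim
    for g in grams:
        for j, v in enumerate(ngram_vecs[g]):
            vec[j] += v
    return [v / len(grams) for v in vec]

ngram_vecs = {"<ca": [1.0, 0.0], "cat": [0.0, 1.0], "at>": [1.0, 1.0]}
seen = embed("cat", ngram_vecs, 2)    # all three trigrams known
oov = embed("catz", ngram_vecs, 2)    # OOV, but shares "<ca" and "cat"
```

    Because the OOV word overlaps the seen word in surface form, its composed vector lands near the seen word's, which is the robustness to unseen and misspelled forms the abstract refers to.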

    Acta Scientiarum Mathematicarum : Tomus 52. Fasc. 3-4.

    Get PDF

    Habitat and diet selection by the African elephant at the landscape level : a functional integration of multi-scale foraging process

    Get PDF
    Abstract Understanding the role of elephant in the structure and functioning of African savanna ecosystems requires a mechanistic understanding of their habitat and diet selection at the landscape scale. To this end a functional approach based on optimal foraging theory was devised. Given a short foraging time per unit nutrient, a low level of cell wall digestion, and a limited ability to recycle microbial protein, it was hypothesised that elephants take advantage of a high passage rate of ingesta and process a large amount of food with cell solubles rich in energy and nutrients per unit time to meet their nutritional demands. Accordingly, elephants were predicted to select habitats and diets that maximize their rate of intake of digestible energy and nutrients relative to what is available in the landscape. However, because safety, distance from surface water and shade also potentially influence foraging decisions by elephants, habitat and diet selection were predicted to be the result of a trade-off between the rate of intake of energy and nutrients and these non-dietary factors. To test this prediction, a mechanistic ingestion model that was developed specifically for elephants was used to estimate the spatio-temporal pattern of the rate of protein intake achieved by elephants inhabiting a medium-sized reserve in a semi-arid savanna environment. The rate of protein intake was assumed to be a proxy measure of the rate of intake of digestible energy and nutrients. The response of elephants to the estimated pattern of intake was as per prediction, with both habitats and food types being selected in accordance with the rate maximizing premise. The mechanistic approach to foraging used in the study provided possible functional explanations for several well known characteristics of the feeding behaviour of elephants. 
Differences in diet selection between bulls and cows were potentially explained in terms of sex related differences in the rate of protein intake across food types that were largely due to adult bulls harvesting heavier trunkloads than members of family units. Sexual segregation in habitat selection was potentially explained in terms of (1) sex related differences in the rate of protein intake across food types, and (2) the tendency of the short-term rate of protein intake to be a more important explanatory variable than cost distance from water for the spatial distribution of family units, with the converse being true for the spatial distribution of bulls. Seasonal change in the diet of elephants was well explained by temporal variation in the rate of protein intake across food types. Distance from water was shown to have a strong influence on the distribution of elephants even in a medium-sized reserve that was relatively well supplied with surface water. The influence of surface water differed between the sexes and was strongly dependent on the spatio-temporal pattern of the rate of protein intake. The study showed elephants to be primarily grazers, only switching to browse from woody plants when herbaceous forage of adequate quantity and quality is unavailable. This finding was used to construct an alternative hypothesis for the “elephant problem” that explains elephant-induced woodland loss in terms of (1) a man-induced shift in the diet of elephants from a historic diet of grass to a modern diet primarily composed of browse, and (2) a breakdown of the natural controls of elephant populations. Implications for the management of elephant-vegetation systems are discussed, with proposed foci for management being the restoration of the historic diet and distribution of elephants by altering the boundaries of protected areas to incorporate key grass resources and restricting surface water to historic sites.

    2008-2009 Louisiana Tech University Catalog

    Get PDF
    The Louisiana Tech University Catalog includes announcements and course descriptions for courses offered at Louisiana Tech University for the academic year 2008-2009.

    BMaD : a boolean matrix decomposition framework

    No full text
    Boolean matrix decomposition is a method to obtain a compressed representation of a matrix with Boolean entries. We present a modular framework that unifies several Boolean matrix decomposition algorithms, and provide methods to evaluate their performance. The main advantages of the framework are its modular approach and hence the flexible combination of the steps of a Boolean matrix decomposition and the capability of handling missing values. The framework is licensed under the GPLv3 and can be downloaded freely at http://projects.informatik.uni-mainz.de/bma
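    A minimal sketch of the core operation such a framework builds on: the Boolean matrix product used to reconstruct the input from its two binary factors, plus an error count that skips missing entries (mirroring, in spirit, the framework's missing-value handling). The function names and the decision to encode missing entries as None are our illustrative assumptions, not BMaD's API.

```python
# Boolean matrix product: (U o V)[i][j] = OR over k of (U[i][k] AND V[k][j]).
# A Boolean decomposition seeks binary factors U (n x k) and V (k x m)
# whose Boolean product approximates the binary input X.

def bool_product(U, V):
    k = len(V)
    return [[int(any(U[i][t] and V[t][j] for t in range(k)))
             for j in range(len(V[0]))]
            for i in range(len(U))]

def reconstruction_error(X, R):
    # count mismatched cells, ignoring missing entries (None)
    return sum(1
               for i, row in enumerate(X)
               for j, x in enumerate(row)
               if x is not None and x != R[i][j])

U = [[1, 0], [1, 1], [0, 1]]           # 3 x 2 factor
V = [[1, 1, 0], [0, 1, 1]]             # 2 x 3 factor
X = [[1, 1, 0], [1, None, 1], [0, 1, 1]]  # input with one missing cell
R = bool_product(U, V)                 # reconstruction
err = reconstruction_error(X, R)       # 0: all observed cells match
```

    Note that the Boolean product differs from the ordinary integer product only in that overlapping 1s saturate rather than add, which is what makes low-rank Boolean factors a compressed cover of the 1-cells of X.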