2,026 research outputs found

    The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: Extended Survey

    Graph processing is becoming increasingly prevalent across many application domains. In spite of this prevalence, there is little research about how graphs are actually used in practice. We performed an extensive study that consisted of an online survey of 89 users, a review of the mailing lists, source repositories, and whitepapers of a large suite of graph software products, and in-person interviews with 6 users and 2 developers of these products. Our online survey aimed at understanding: (i) the types of graphs users have; (ii) the graph computations users run; (iii) the types of graph software users use; and (iv) the major challenges users face when processing their graphs. We describe the participants' responses to our questions, highlighting common patterns and challenges. Based on our interviews and survey of the rest of our sources, we were able to answer some new questions that were raised by participants' responses to our online survey and understand the specific applications that use graph data and software. Our study revealed surprising facts about graph processing in practice. In particular, real-world graphs represent a very diverse range of entities and are often very large; scalability and visualization are undeniably the most pressing challenges faced by participants; and data integration, recommendations, and fraud detection are very popular applications supported by existing graph software. We hope these findings can guide future research.

    Named Entity Recognition in Electronic Health Records Using Transfer Learning Bootstrapped Neural Networks

    Neural networks (NNs) have become the state of the art in many machine learning applications, especially in image and sound processing [1]. The same, although to a lesser extent [2,3], could be said of natural language processing (NLP) tasks, such as named entity recognition. However, the success of NNs remains dependent on the availability of large labelled datasets, which is a significant hurdle in many important applications. One such case is electronic health records (EHRs), which are arguably the largest source of medical data, most of which lies hidden in natural text [4,5]. Data access is difficult due to data privacy concerns, and therefore annotated datasets are scarce. With scarce data, NNs will likely not be able to extract this hidden information with practical accuracy. In our study, we develop an approach that solves these problems for named entity recognition, obtaining a 94.6 F1 score in the I2B2 2009 Medical Extraction Challenge [6], 4.3 above the architecture that won the competition. Beyond the official I2B2 challenge, we further achieve 82.4 F1 on extracting relationships between medical terms. To reach this state-of-the-art accuracy, our approach applies transfer learning to leverage datasets annotated for other I2B2 tasks, and designs and trains embeddings that specially benefit from such transfer. Comment: 11 pages, 4 figures, 8 tables

    Visualizing and Understanding Sum-Product Networks

    Sum-Product Networks (SPNs) are recently introduced deep tractable probabilistic models by which several kinds of inference queries can be answered exactly and in tractable time. Up to now, they have been largely used as black box density estimators, assessed only by comparing their likelihood scores. In this paper we explore and exploit the inner representations learned by SPNs. We do this with a threefold aim: first, we want to get a better understanding of the inner workings of SPNs; secondly, we seek additional ways to evaluate one SPN model and compare it against other probabilistic models, providing diagnostic tools to practitioners; lastly, we want to empirically evaluate how good and meaningful the extracted representations are, as in a classic Representation Learning framework. In order to do so, we revise their interpretation as deep neural networks and we propose to exploit several visualization techniques on their node activations and network outputs under different types of inference queries. To investigate these models as feature extractors, we plug some SPNs, learned in a greedy unsupervised fashion on image datasets, into supervised classification learning tasks. We extract several embedding types from node activations by filtering nodes by their type, by their associated feature abstraction level and by their scope. In a thorough empirical comparison we show them to be competitive against those generated from popular feature extractors such as Restricted Boltzmann Machines. Finally, we investigate embeddings generated from random probabilistic marginal queries as a means to compare other tractable probabilistic models on a common ground, extending our experiments to Mixtures of Trees. Comment: Machine Learning Journal paper (First Online), 24 pages

    Manifold learning for coherent design interpolation based on geometrical and topological descriptors

    In the context of intellectual property in the manufacturing industry, know-how refers to practical knowledge of how to accomplish a specific task. This know-how is often difficult to synthesise into a set of rules or steps, as it resides in the intuition and expertise of engineers, designers, and other professionals. Today, a new line of research on this issue has emerged thanks to the explosion of Artificial Intelligence and Machine Learning algorithms and their alliance with Computational Mechanics and Optimisation tools. However, a key issue in industrial design is the scarcity of available data, which makes it problematic to rely on deep-learning approaches. Assuming that the existing designs live in a manifold, in this paper we propose a synergistic use of existing Machine Learning tools to infer a reduced manifold from the existing limited set of designs and then to use it to interpolate between the individuals, working as a generator basis, to create new and coherent designs. For this, a key aspect is being able to interpolate properly in the reduced manifold, which requires a proper clustering of the individuals. From our experience, due to the scarcity of data, adding topological descriptors to geometrical ones considerably improves the quality of the clustering. Thus, a distance mixing topology and geometry is proposed. This distance is used both for the clustering and for the interpolation. For the interpolation, relying on optimal transport appears to be mandatory. Examples of growing complexity are presented to illustrate the quality of the method.
    (c) 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
    The authors gratefully acknowledge the financial support of Ministerio de Educacion, Spain (FPU16/07121), Generalitat Valenciana, Spain (Prometeo/2021/046 and CIAICO/2021/226), Ministerio de Economia, Industria y Competitividad, Spain (DPI2017-89816-R) and FEDER. O. Allix would like to thank the French National University Council and ENS Paris-Saclay for supporting his sabbatical at UPV, which made it possible to closely interact with the colleagues from I2MB-UPV. Funding for open access charge: CRUE-Universitat Politecnica de Valencia.
    Muñoz-Pellicer, D.; Allix, O.; Chinesta Soria, FJ.; Ródenas, JJ.; Nadal, E. (2023). Manifold learning for coherent design interpolation based on geometrical and topological descriptors. Computer Methods in Applied Mechanics and Engineering. 405. https://doi.org/10.1016/j.cma.2022.115859

    Advances in Learning and Understanding with Graphs through Machine Learning

    Graphs have increasingly become a crucial way of representing large, complex and disparate datasets from a range of domains, including many scientific disciplines. Graphs are particularly useful at capturing complex relationships or interdependencies within or even between datasets, and enable unique insights which are not possible with other data formats. Over recent years, significant improvements in the ability of machine learning approaches to automatically learn from and identify patterns in datasets have been made. However, due to the unique nature of graphs, and the data they are used to represent, employing machine learning with graphs has thus far proved challenging. A review of relevant literature has revealed that key challenges include issues arising with macro-scale graph learning, interpretability of machine learned representations and a failure to incorporate the temporal dimension present in many datasets. Thus, the work and contributions presented in this thesis primarily investigate how modern machine learning techniques can be adapted to tackle key graph mining tasks, with a particular focus on optimal macro-level representation, interpretability and incorporating temporal dynamics into the learning process. The majority of methods employed are novel approaches centered around attempting to use artificial neural networks in order to learn from graph datasets. Firstly, by devising a novel graph fingerprint technique, it is demonstrated that this can successfully be applied to two different tasks, namely graph comparison and classification, whilst out-performing established baselines. Secondly, it is shown that a mapping can be found between certain topological features and graph embeddings. This, for perhaps the first time, suggests that it is possible that machines are learning something analogous to human knowledge acquisition, thus bringing interpretability to the graph embedding process. Thirdly, in exploring two new models for incorporating temporal information into the graph learning process, it is found that including such information is crucial to predictive performance in certain key tasks, such as link prediction, where state-of-the-art baselines are out-performed. The overall contribution of this work is to provide greater insight into and explanation of the ways in which machine learning with respect to graphs is emerging as a crucial set of techniques for understanding complex datasets. This is important as these techniques can potentially be applied to a broad range of scientific disciplines. The thesis concludes with an assessment of limitations and recommendations for future research.