The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: Extended Survey
Graph processing is becoming increasingly prevalent across many application
domains. In spite of this prevalence, there is little research about how graphs
are actually used in practice. We performed an extensive study that consisted
of an online survey of 89 users, a review of the mailing lists, source
repositories, and whitepapers of a large suite of graph software products, and
in-person interviews with 6 users and 2 developers of these products. Our
online survey aimed at understanding: (i) the types of graphs users have; (ii)
the graph computations users run; (iii) the types of graph software users use;
and (iv) the major challenges users face when processing their graphs. We
describe the participants' responses to our questions highlighting common
patterns and challenges. Based on our interviews and our review of the remaining
sources, we were able to answer some new questions raised by
participants' responses to our online survey and to understand the specific
applications that use graph data and software. Our study revealed surprising
facts about graph processing in practice. In particular, real-world graphs
represent a very diverse range of entities and are often very large;
scalability and visualization are undeniably the most pressing challenges faced
by participants; and data integration, recommendations, and fraud detection are
very popular applications supported by existing graph software. We hope these
findings can guide future research.
Named Entity Recognition in Electronic Health Records Using Transfer Learning Bootstrapped Neural Networks
Neural networks (NNs) have become the state of the art in many machine
learning applications, especially in image and sound processing [1]. The same,
although to a lesser extent [2,3], could be said in natural language processing
(NLP) tasks, such as named entity recognition. However, the success of NNs
remains dependent on the availability of large labelled datasets, which is a
significant hurdle in many important applications. One such case is electronic
health records (EHRs), which are arguably the largest source of medical data,
most of which lies hidden in natural text [4,5]. Data access is difficult due
to data privacy concerns, and therefore annotated datasets are scarce. With
scarce data, NNs will likely not be able to extract this hidden information
with practical accuracy. In our study, we develop an approach that solves these
problems for named entity recognition, obtaining an F1 score of 94.6 on the I2B2 2009
Medical Extraction Challenge [6], 4.3 points above the architecture that won the
competition. Beyond the official I2B2 challenge, we further achieve 82.4 F1 on
extracting relationships between medical terms. To reach this state-of-the-art
accuracy, our approach applies transfer learning to leverage datasets
annotated for other I2B2 tasks, and designs and trains embeddings that
specifically benefit from such transfer.
Comment: 11 pages, 4 figures, 8 tables
Visualizing and Understanding Sum-Product Networks
Sum-Product Networks (SPNs) are recently introduced deep tractable
probabilistic models with which several kinds of inference queries can be
answered exactly and in tractable time. Up to now, they have largely been
used as black-box density estimators, assessed only by comparing their
likelihood scores. In this paper we explore and exploit the inner
representations learned by SPNs. We do this with a threefold aim: first we want
to get a better understanding of the inner workings of SPNs; secondly, we seek
additional ways to evaluate one SPN model and compare it against other
probabilistic models, providing diagnostic tools to practitioners; lastly, we
want to empirically evaluate how good and meaningful the extracted
representations are, as in a classic Representation Learning framework. In
order to do so, we revisit their interpretation as deep neural networks and we
propose to exploit several visualization techniques on their node activations
and network outputs under different types of inference queries. To investigate
these models as feature extractors, we plug some SPNs, learned in a greedy
unsupervised fashion on image datasets, into supervised classification learning
tasks. We extract several embedding types from node activations by filtering
nodes by their type, by their associated feature abstraction level and by their
scope. In a thorough empirical comparison we show them to be competitive
against those generated from popular feature extractors such as Restricted Boltzmann
Machines. Finally, we investigate embeddings generated from random
probabilistic marginal queries as a means to compare other tractable
probabilistic models on a common ground, extending our experiments to Mixtures
of Trees.
Comment: Machine Learning Journal paper (First Online), 24 pages
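The abstract leans on two properties of SPNs: exact inference in a single bottom-up pass, and internal node activations that can be read off as features. The toy network below is a minimal sketch assuming a hand-built structure over two binary variables with made-up weights; the names `leaf`, `spn`, `P_a` and `P_b` are hypothetical, and this is not one of the paper's learned models. Because the sum node is complete and the product nodes are decomposable, marginalizing a variable amounts to setting its leaves to 1, so joint and marginal queries cost the same single pass.

```python
# Toy Sum-Product Network over two binary variables X1, X2.
# Hypothetical structure and weights, chosen only for illustration.

def leaf(p_true, value):
    """Bernoulli leaf: returns P(X = value); value=None marginalizes X (returns 1)."""
    if value is None:
        return 1.0
    return p_true if value == 1 else 1.0 - p_true


def spn(x1, x2):
    """Evaluate the SPN bottom-up; returns the root value and all node activations."""
    acts = {}
    # Leaves: one Bernoulli per variable for each of the two mixture components.
    acts["L1a"] = leaf(0.8, x1)
    acts["L2a"] = leaf(0.3, x2)
    acts["L1b"] = leaf(0.1, x1)
    acts["L2b"] = leaf(0.6, x2)
    # Product nodes are decomposable: their children have disjoint scopes {X1} and {X2}.
    acts["P_a"] = acts["L1a"] * acts["L2a"]
    acts["P_b"] = acts["L1b"] * acts["L2b"]
    # The sum node is complete (children share scope {X1, X2}); weights sum to 1.
    acts["root"] = 0.4 * acts["P_a"] + 0.6 * acts["P_b"]
    return acts["root"], acts


if __name__ == "__main__":
    joint, _ = spn(1, 1)           # P(X1=1, X2=1)
    marginal, acts = spn(1, None)  # P(X1=1), with X2 summed out in the same pass
    embedding = [acts["P_a"], acts["P_b"]]  # inner activations used as a feature vector
    print(joint, marginal, embedding)
```

Here `spn(1, 1)[0]` evaluates 0.4·0.8·0.3 + 0.6·0.1·0.6 = 0.132, and `spn(1, None)[0]` gives P(X1=1) = 0.38 without enumerating X2. Collecting the product-node activations, as in `embedding`, is one simple way to turn inner nodes into features; the paper's own filtering by node type, abstraction level and scope is richer than this.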
Manifold learning for coherent design interpolation based on geometrical and topological descriptors
In the context of intellectual property in the manufacturing industry, know-how refers to practical knowledge of how to accomplish a specific task. This know-how is often difficult to synthesise into a set of rules or steps, as it resides in the intuition and expertise of engineers, designers, and other professionals. Today, a new line of research on this topic has emerged thanks to the explosion of Artificial Intelligence and Machine Learning algorithms and their alliance with Computational Mechanics and Optimisation tools. However, a key issue in industrial design is the scarcity of available data, which makes it problematic to rely on deep-learning approaches. Assuming that the existing designs live on a manifold, in this paper we propose a synergistic use of existing Machine Learning tools to infer a reduced manifold from the limited set of existing designs and then to use it to interpolate between the individuals, working as a generator basis, to create new and coherent designs. For this, a key aspect is being able to interpolate properly in the reduced manifold, which requires a proper clustering of the individuals. In our experience, due to the scarcity of data, adding topological descriptors to geometrical ones considerably improves the quality of the clustering. Thus, a distance mixing topology and geometry is proposed. This distance is used both for the clustering and for the interpolation. For the interpolation, relying on optimal transport appears to be mandatory. Examples of growing complexity are presented to illustrate the effectiveness of the method.
(c) 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
The authors gratefully acknowledge the financial support of Ministerio de Educacion, Spain (FPU16/07121), Generalitat Valenciana, Spain (Prometeo/2021/046 and CIAICO/2021/226), Ministerio de Economia, Industria y Competitividad, Spain (DPI2017-89816-R) and FEDER. O. Allix would like to thank the French National University Council and ENS Paris-Saclay for supporting his sabbatical at UPV, which made it possible to closely interact with the colleagues from I2MB-UPV. Funding for open access charge: CRUE-Universitat Politecnica de Valencia.
Muñoz-Pellicer, D.; Allix, O.; Chinesta Soria, FJ.; Ródenas, JJ.; Nadal, E. (2023). Manifold learning for coherent design interpolation based on geometrical and topological descriptors. Computer Methods in Applied Mechanics and Engineering. 405. https://doi.org/10.1016/j.cma.2022.11585940
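Why optimal transport matters for interpolation can be seen in the simplest possible setting. The sketch below is a generic illustration, not the authors' method: it assumes each design is summarised by a 1D cloud of equally weighted points, whereas the paper works in a reduced manifold with its own topology-plus-geometry distance. The function name `displacement_interpolate` is hypothetical. In one dimension, with a convex cost, the optimal transport plan between two equal-size point clouds pairs points in sorted order, and displacement (McCann) interpolation moves each point along a straight line towards its partner.

```python
def displacement_interpolate(xs, ys, t):
    """Displacement (McCann) interpolation between two equal-size 1D point clouds.

    Assumes uniform weights. In 1D the optimal transport map for a convex cost
    matches points in sorted order, so each sorted point of xs moves linearly
    towards the sorted point of ys with the same rank.
    """
    if len(xs) != len(ys):
        raise ValueError("point clouds must have the same number of points")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    a, b = sorted(xs), sorted(ys)
    return [(1.0 - t) * u + t * v for u, v in zip(a, b)]


if __name__ == "__main__":
    shape_a = [2, 0, 1]      # a "design" spread over [0, 2]
    shape_b = [12, 10, 11]   # the same shape translated to [10, 12]
    for t in (0.0, 0.5, 1.0):
        print(t, displacement_interpolate(shape_a, shape_b, t))
```

At t = 0.5 the two clouds above interpolate to [5.0, 6.0, 7.0]: the shape is transported rather than duplicated. A linear blend of the two distributions would instead place half the mass at each original location, superimposing both designs, which is the kind of incoherent intermediate that transport-based interpolation avoids.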
Advances in Learning and Understanding with Graphs through Machine Learning
Graphs have increasingly become a crucial way of representing large, complex and disparate datasets from a range of domains, including many scientific disciplines. Graphs are particularly useful at capturing complex relationships or interdependencies within or even between datasets, and enable unique insights which are not possible with other data formats. Over recent years, significant improvements in the ability of machine learning approaches to automatically learn from and identify patterns in datasets have been made.
However, due to the unique nature of graphs and the data they are used to represent, employing machine learning with graphs has thus far proved challenging. A review of relevant literature has revealed that key challenges include issues arising with macro-scale graph learning, interpretability of machine-learned representations and a failure to incorporate the temporal dimension present in many datasets. Thus, the work and contributions presented in this thesis primarily investigate how modern machine learning techniques can be adapted to tackle key graph mining tasks, with a particular focus on optimal macro-level representation, interpretability and incorporating temporal dynamics into the learning process. The majority of methods employed are novel approaches centered on the use of artificial neural networks to learn from graph datasets.
Firstly, a novel graph fingerprint technique is devised and shown to be successfully applicable to two different tasks, namely graph comparison and classification, while outperforming established baselines. Secondly, it is shown that a mapping can be found between certain topological features and graph embeddings. This, perhaps for the first time, suggests that machines may be learning something analogous to human knowledge acquisition, thus bringing interpretability to the graph embedding process. Thirdly, in exploring two new models for incorporating temporal information into the graph learning process, it is found that including such information is crucial to predictive performance in certain key tasks, such as link prediction, where state-of-the-art baselines are outperformed.
The overall contribution of this work is to provide greater insight into, and explanation of, the ways in which machine learning with respect to graphs is emerging as a crucial set of techniques for understanding complex datasets. This is important as these techniques can potentially be applied to a broad range of scientific disciplines. The thesis concludes with an assessment of limitations and recommendations for future research.