1,280 research outputs found

    Extending local features with contextual information in graph kernels

    Graph kernels are usually defined in terms of simpler kernels over local substructures of the original graphs. Different kernels consider different types of substructures. However, in some cases they have similar predictive performance, probably because the substructures can be interpreted as approximations of the subgraphs they induce. In this paper, we propose to associate with each feature a piece of information about the context in which the feature appears in the graph. A substructure appearing in two different graphs will match only if it appears with the same context in both graphs. We propose a kernel based on this idea that considers trees as substructures, and where the contexts are features too. The kernel is inspired by the framework in [6], although it is not part of it. We give an efficient algorithm for computing the kernel and show promising results on real-world graph classification datasets. Comment: To appear in ICONIP 201
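
    The idea that a substructure matches only when its context also matches can be illustrated with a deliberately simplified sketch. This is not the tree-based kernel of the abstract: it assumes, purely for illustration, that a feature is a single node label and that its context is the sorted multiset of neighbouring labels. The names `contextual_features` and `contextual_kernel` are hypothetical.

```python
from collections import Counter

def contextual_features(labels, adjacency):
    """Count (feature, context) pairs in one graph.

    Assumption for illustration: the feature is the node label and the
    context is the sorted tuple of the labels of its neighbours.
    """
    features = Counter()
    for node, label in labels.items():
        context = tuple(sorted(labels[n] for n in adjacency.get(node, ())))
        features[(label, context)] += 1
    return features

def contextual_kernel(g1, g2):
    """Sum, over shared (feature, context) pairs, the product of their counts."""
    f1 = contextual_features(*g1)
    f2 = contextual_features(*g2)
    return sum(count * f2[key] for key, count in f1.items())

# Path graph A-B-C and single edge A-B, each given as (labels, adjacency).
path = ({0: "A", 1: "B", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]})
edge = ({0: "A", 1: "B"}, {0: [1], 1: [0]})
print(contextual_kernel(path, edge))
```

    In this toy example both graphs contain a node labelled B, but its neighbourhood differs (A and C versus A alone), so only the A node contributes to the kernel value.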

    Approximate Minimum Diameter

    We study the minimum diameter problem for a set of inexact points. By inexact, we mean that the precise location of the points is not known. Instead, the location of each point is restricted to a continuous region (imprecise model) or a finite set of points (indecisive model). Given a set of inexact points in either the imprecise or the indecisive model, we wish to provide a lower bound on the diameter of the real points. In the first part of the paper, we focus on the indecisive model. We present an $O(2^{\frac{1}{\epsilon^d}} \cdot \epsilon^{-2d} \cdot n^3)$ time approximation algorithm of factor $(1+\epsilon)$ for finding the minimum diameter of a set of points in $d$ dimensions. This substantially improves on the previously proposed algorithms for this problem. Next, we consider the problem in the imprecise model. In $d$-dimensional space, we propose a polynomial-time $\sqrt{d}$-approximation algorithm. In addition, for $d=2$, we define the notion of $\alpha$-separability and use our algorithm for the indecisive model to obtain a $(1+\epsilon)$-approximation algorithm for a set of $\alpha$-separable regions in time $O\left(2^{\frac{1}{\epsilon^2}} \cdot \frac{n^3}{\epsilon^{10} \cdot \sin(\alpha/2)^3}\right)$
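
    To make the problem statement concrete for the finite-candidate case, the sketch below computes the minimum diameter exactly by exhaustive search: it tries every way of picking one candidate location per inexact point and keeps the selection with the smallest diameter. This is a brute-force baseline that is exponential in the number of points, not the approximation algorithm described in the abstract, and the function names are hypothetical.

```python
from itertools import product
from math import dist

def diameter(points):
    """Largest pairwise Euclidean distance in a sequence of points."""
    return max(
        (dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]),
        default=0.0,
    )

def min_diameter_bruteforce(candidate_sets):
    """Pick one candidate per inexact point so that the diameter is minimal.

    Returns (minimum diameter, chosen points). The value is a lower bound
    on the diameter of the true, unknown point locations.
    """
    best_value, best_choice = float("inf"), None
    for choice in product(*candidate_sets):
        value = diameter(choice)
        if value < best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice

# Three inexact points in the plane, each with two candidate locations.
candidates = [
    [(0, 0), (10, 0)],
    [(1, 0), (9, 0)],
    [(0, 1), (20, 20)],
]
value, choice = min_diameter_bruteforce(candidates)
print(value, choice)
```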

    The Early Bird Catches The Term: Combining Twitter and News Data For Event Detection and Situational Awareness

    Twitter updates now represent an enormous stream of information originating from a wide variety of formal and informal sources, much of which is relevant to real-world events. In this paper we adapt existing bio-surveillance algorithms to detect localised spikes in Twitter activity corresponding to real events with a high level of confidence. We then develop a methodology to automatically summarise these events, both by providing the tweets which fully describe the event and by linking to highly relevant news articles. We apply our methods to outbreaks of illness and events strongly affecting sentiment. In both case studies we are able to detect events verifiable by third-party sources and produce high-quality summaries.
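
    The abstract does not say which bio-surveillance algorithms are adapted, so the following is only a generic sketch of baseline-versus-current spike detection on a series of activity counts. The function name, the 7-step window, the threshold of 3 standard deviations, and the floor on the standard deviation are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def detect_spikes(counts, window=7, threshold=3.0):
    """Return the indices whose count exceeds baseline mean + threshold * std.

    The baseline for index t is the `window` counts immediately before t.
    """
    spikes = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        mu, sigma = mean(baseline), stdev(baseline)
        # Floor sigma at 1.0 so a nearly flat baseline does not flag tiny changes.
        if counts[t] > mu + threshold * max(sigma, 1.0):
            spikes.append(t)
    return spikes

# Hypothetical daily tweet counts for one location and keyword.
daily_counts = [10, 12, 11, 9, 10, 11, 10, 50, 11, 10]
print(detect_spikes(daily_counts))
```

    Note that a detected spike inflates the baseline of the following steps, which is why the days after the burst are not flagged here.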

    Mining Uncertain Sequential Patterns in Iterative MapReduce

    This paper proposes a sequential pattern mining (SPM) algorithm for large-scale uncertain databases. Uncertain sequence databases are widely used to model inaccurate or imprecise timestamped data in many real applications, where traditional SPM algorithms are inapplicable because of data uncertainty and scalability. In this paper, we develop an efficient approach to manage data uncertainty in SPM and design an iterative MapReduce framework to execute the uncertain SPM algorithm in parallel. We conduct extensive experiments on both synthetic and real uncertain datasets. The experimental results show that our algorithm is efficient and scalable.

    Conformative Filtering for Implicit Feedback Data

    Implicit feedback is the simplest form of user feedback that can be used for item recommendation. It is easy to collect and is domain independent. However, there is a lack of negative examples. Previous work tackles this problem by assuming that users are not interested, or are less interested, in the unconsumed items. Those assumptions are often severely violated since non-consumption can be due to factors like unawareness or lack of resources. Therefore, non-consumption by a user does not always mean disinterest or irrelevance. In this paper, we propose a novel method called Conformative Filtering (CoF) to address the issue. The motivating observation is that if there is a large group of users who share the same taste and none of them have consumed an item before, then it is likely that the item is not of interest to the group. We perform multidimensional clustering on implicit feedback data using hierarchical latent tree analysis (HLTA) to identify user 'taste' groups and make recommendations for a user based on her memberships in the groups and on the past behavior of the groups. Experiments on two real-world datasets from different domains show that CoF has superior performance compared to several common baselines.

    Towards Efficient Sequential Pattern Mining in Temporal Uncertain Databases

    Uncertain sequence databases are widely used to model data with inaccurate or imprecise timestamps in many real-world applications. In this paper, we use uniform distributions to model uncertain timestamps and adopt possible world semantics to interpret temporal uncertain databases. We design an incremental approach to manage temporal uncertainty efficiently, which is integrated into the classic pattern-growth SPM algorithm to mine uncertain sequential patterns. Extensive experiments show that our algorithm performs well in both efficiency and scalability.
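
    Possible world semantics with uniformly distributed timestamps can be illustrated by direct enumeration. The sketch below assumes, for simplicity, that each event's timestamp is an integer drawn uniformly and independently from a closed interval, and it computes the probability that the events occur in strictly increasing time order by counting the possible worlds in which they do. This naive enumeration is exponential in the number of events and is not the incremental approach of the abstract; `order_probability` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

def order_probability(intervals):
    """Probability that t1 < t2 < ... < tk under independent uniform timestamps.

    `intervals` is a list of (lo, hi) pairs; event i takes each integer
    timestamp in [lo, hi] with equal probability, so every possible world
    (one timestamp per event) is equally likely.
    """
    ranges = [range(lo, hi + 1) for lo, hi in intervals]
    total_worlds = 1
    for r in ranges:
        total_worlds *= len(r)
    favourable = sum(
        all(a < b for a, b in zip(world, world[1:]))
        for world in product(*ranges)
    )
    return Fraction(favourable, total_worlds)

# Event a happens somewhere in [1, 3], event b somewhere in [2, 4].
print(order_probability([(1, 3), (2, 4)]))  # 2/3
```

    Of the 9 equally likely worlds in this example, 6 place a strictly before b, which gives the probability 2/3 that the pattern "a then b" is supported by this sequence.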

    Measuring Relations Between Concepts In Conceptual Spaces

    The highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points in a high-dimensional space and concepts are represented by regions in this space. Our recent mathematical formalization of this framework is capable of representing correlations between different domains in a geometric way. In this paper, we extend our formalization by providing quantitative mathematical definitions for the notions of concept size, subsethood, implication, similarity, and betweenness. This considerably increases the representational power of our formalization by introducing measurable ways of describing relations between concepts. Comment: Accepted at SGAI 2017 (http://www.bcs-sgai.org/ai2017/). The final publication is available at Springer via https://doi.org/10.1007/978-3-319-71078-5_7. arXiv admin note: substantial text overlap with arXiv:1707.05165, arXiv:1706.0636

    On Coupling FCA and MDL in Pattern Mining

    Pattern mining is a well-studied field in data mining and machine learning. Modern methods are based on dynamically updating models, among which MDL-based ones ensure high-quality pattern sets. Formal concepts also characterize patterns in a condensed form. In this paper we study an MDL-based algorithm called Krimp in the FCA setting and propose a modified version that benefits from FCA and relies on the probabilistic assumptions that underlie MDL. We provide experimental evidence that the proposed approach improves the quality of the pattern sets generated by Krimp.

    A Categorical Clustering of Publishers for Mobile Performance Marketing

    Mobile marketing is an expanding industry due to the growth of mobile devices (e.g., tablets, smartphones). In this paper, we explore a categorical approach to cluster publishers of a mobile performance market, in which payouts are only issued when there is a conversion (e.g., a sale). As a case study, we analyze recent real-world data from a global mobile marketing company. Several experiments were conducted. In a first, internal evaluation stage, the methods were compared on training data using clustering quality metrics and computational effort. In the second stage, the best method, the COBWEB algorithm, was analyzed using an external evaluation based on business metrics computed over test data, which allowed the identification of interesting clusters. This article is a result of the project NORTE-01-0247-FEDER-017497, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF). This work was also supported by COMPETE: POCI-01-0145-FEDER-007043 and FCT Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2013

    Object classification for robotic platforms

    Computer vision has been revolutionised in recent years by increased research in convolutional neural networks (CNNs); however, many challenges remain to be addressed in order to ensure fast and accurate image processing when applying these techniques to robotics. These challenges consist of handling extreme changes in scale, illumination, noise, and viewing angles of a moving object. The project's main contribution is to provide insight into how to properly train a CNN, a specific type of deep neural network (DNN), for object tracking in the context of industrial robotics. The proposed solution aims to use a combination of documented approaches to replicate a pick-and-place task with an industrial robot using computer vision feeding a YOLOv3 CNN. Experimental tests, designed to investigate the requirements of training the CNN in this context, were performed in a controlled environment using a variety of objects that differed in shape and size. The general focus was to detect the objects based on their shape so that a suitable and secure grasp could be selected by the robot. The findings in this article reflect the challenges of training the CNN through brute force. They also highlight the different methods of annotating images and the results obtained after training the neural network.