6 research outputs found

    When is a Network a Network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks

    We introduce a framework for modeling sequential data that captures pathways of varying lengths observed in a network. Such data are important, e.g., when studying click streams in information networks, travel patterns in transportation systems, information cascades in social networks, biological pathways or time-stamped social interactions. While it is common to apply graph analytics and network analysis to such data, recent works have shown that temporal correlations can invalidate the results of such methods. This raises a fundamental question: when is a network abstraction of sequential data justified? Addressing this open question, we propose a framework which combines Markov chains of multiple, higher orders into a multi-layer graphical model that captures temporal correlations in pathways at multiple length scales simultaneously. We develop a model selection technique to infer the optimal number of layers of such a model and show that it outperforms previously used Markov order detection techniques. An application to eight real-world data sets on pathways and temporal networks shows that it allows one to infer graphical models which capture both topological and temporal characteristics of such data. Our work highlights fallacies of network abstractions and provides a principled answer to the open question of when they are justified. Generalizing network representations to multi-order graphical models, it opens perspectives for new data mining and knowledge discovery algorithms. Comment: 10 pages, 4 figures, 1 table, companion python package pathpy available on GitHub.
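The idea that temporal correlations can make a plain network view misleading can be illustrated with a small sketch. It is not the multi-order model or the model selection technique from the paper, and it does not use pathpy; it only compares the maximum-likelihood fit of a first-order and a second-order Markov chain on toy paths. The function name `log_likelihood`, its `skip` parameter, and the example paths are assumptions made for illustration.

```python
from collections import Counter
from math import log

def log_likelihood(paths, order, skip):
    """Maximum-likelihood log-likelihood of the steps at positions >= skip
    under a Markov chain that conditions on the previous `order` nodes.
    `skip` must be >= `order` so every scored step has a full context."""
    context_counts = Counter()
    transition_counts = Counter()
    for path in paths:
        for i in range(skip, len(path)):
            context = tuple(path[i - order:i])
            transition_counts[(context, path[i])] += 1
            context_counts[context] += 1
    # Each observed transition contributes count * log(relative frequency).
    return sum(n * log(n / context_counts[ctx])
               for (ctx, _), n in transition_counts.items())

# Toy paths (hypothetical): whoever arrives at c from a continues to d,
# whoever arrives from b continues to e.
paths = [("a", "c", "d")] * 5 + [("b", "c", "e")] * 5

# Score the same steps (position 2 onward) under both orders.
ll_first = log_likelihood(paths, order=1, skip=2)
ll_second = log_likelihood(paths, order=2, skip=2)
print(ll_first, ll_second)
```

A first-order model sees c followed by d or e with equal probability, while the second-order model predicts the next node exactly. Because a higher-order chain always fits at least as well on the same steps, a real selection procedure has to weigh the likelihood gain against the extra parameters, for example with a likelihood-ratio test.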

    RSC: mining and modeling temporal activity in social media

    Can we identify patterns of temporal activities caused by human communications in social media? Is it possible to model these patterns and tell if a user is a human or a bot based only on the timing of their postings? Social media services allow users to make postings, generating large datasets of human activity time-stamps. In this paper we analyze time-stamp data from social media services and find that the distribution of postings' inter-arrival times (IAT) is characterized by four patterns: (i) positive correlation between consecutive IATs, (ii) heavy tails, (iii) periodic spikes and (iv) bimodal distribution. Based on our findings, we propose Rest-Sleep-and-Comment (RSC), a generative model that is able to match all four discovered patterns. We demonstrate the utility of RSC by showing that it can accurately fit real time-stamp data from Reddit and Twitter. We also show that RSC can be used to spot outliers and detect users with non-human behavior, such as bots. We validate RSC using real data consisting of over 35 million postings from Twitter and Reddit. RSC consistently provides a better fit to real data and clearly outperforms existing models for human dynamics. RSC was also able to detect bots with a precision higher than 94%. Funding: FAPESP; CNPq; CAPES; STIC-AmSud; RESCUER project funded by the European Commission (Grant: 614154) and by the CNPq/MCTI (Grant: 490084/2013-3); JSPS KAKENHI, Grant-in-Aid for JSPS Fellows #242322; National Science Foundation under Grant No. CNS-1314632, IIS-1408924; ARO/DARPA under Contract Number W911NF-11-C-0088; Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-005
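The quantities named in this abstract, inter-arrival times and the correlation between consecutive IATs, can be computed from raw time-stamps as in the minimal sketch below. It is not the RSC model itself; the function names and the toy time-stamps are assumptions made for illustration.

```python
from statistics import mean

def inter_arrival_times(timestamps):
    """Return the gaps between consecutive postings, after sorting the time-stamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def lag1_correlation(iats):
    """Pearson correlation between each IAT and the IAT that follows it."""
    x, y = iats[:-1], iats[1:]
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant series; report 0
    return cov / (var_x * var_y) ** 0.5

# Hypothetical posting times in seconds for a single user.
posts = [0, 10, 25, 45, 70, 100]
iats = inter_arrival_times(posts)
print(iats, lag1_correlation(iats))
```

On real data, a clearly positive value would correspond to pattern (i); the remaining patterns (heavy tails, periodic spikes, bimodality) would need a look at the full IAT distribution, for example a log-binned histogram.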

    Medical image supported by shape features

    Medical image databases represent a valuable source of data from which potential knowledge can be extracted. Image mining can be applied to discover knowledge from these data in order to support CAD (Computer Aided Diagnosis) systems. The typical set-up of a CAD system consists of extracting relevant visual features in the form of image feature vectors that are used as input to a classifier. Due to the semantic gap problem, which corresponds to the difference between the medical specialist's perception of an image and the features automatically extracted from it, a challenging aspect of CAD is to obtain a set of features that is able to succinctly and efficiently represent the visual contents of medical images. To deal with this problem, this work developed a new feature extraction method entitled Fast Fractal Stack (FFS). FFS extracts shape features from objects and structures; shape is a visual attribute that approximates the semantics expected by humans. Additionally, this work developed the Concept classification method, which employs association rule mining to predict the class of an image. The innovative aspect of Concept is its image representation algorithm, termed MFS-Map (Multi Feature Space Map). MFS-Map employs clustering in different feature spaces to make better use of the extracted features in the classification process. Experiments performed on computed tomography images of the lung and on mammography images indicate that both FFS and the image representation approach adopted by Concept can contribute to the improvement of CAD systems.

    Mineração de Dados de Atividade de Usuários em Serviços de Mídia Social

    Social media services have a growing impact on our society. Individuals often rely on social media to get their news, decide which products to buy or communicate with their friends. As a consequence of the widespread adoption of social media, a large volume of data on how users behave is created every day and stored in large databases. Learning how to analyze and extract useful knowledge from these data has a number of potential applications. For instance, a deeper understanding of how legitimate users interact with social media services could be exploited to design more accurate spam and fraud detection methods. This PhD research is based on the following hypothesis: data generated by social media users present patterns that can be exploited to improve the effectiveness of tasks such as prediction, forecasting and modeling in the domain of social media. To validate our hypothesis, we focus on designing data mining methods tailored to social media data. The main contributions of this PhD can be divided into three parts. First, we propose Act-M, a mathematical model that describes the timing of users' actions. We also show that Act-M can be used to automatically detect bots among social media users based only on the timing (i.e. time-stamp) data. Our second contribution is VnC (Vote-and-Comment), a model that explains how the volume of different types of user interactions evolves over time when a piece of content is submitted to a social media service. In addition to accurately matching real data, VnC is useful, as it can be employed to forecast the number of interactions received by social media content. Finally, our third contribution is the MFS-Map method. MFS-Map automatically provides textual annotations to social media images by efficiently combining visual and metadata features. Our contributions were validated using real data from several social media services. Our experiments show that the Act-M and VnC models provided a more accurate fit to the data than existing models for communication dynamics and information diffusion, respectively. MFS-Map obtained both superior precision and faster speed when compared to other widely employed image annotation methods.

    MFS-Map: efficient context and content combination to annotate images

    Automatic image annotation provides textual descriptions of images based on content and context information. Since images may present large variability, image annotation methods often employ multiple extractors to represent visual content, considering local and global features under different visual aspects. As a result, an important aspect of image annotation is the combination of context and content representations. This paper proposes MFS-Map (Multi-Feature Space Map), a novel image annotation method that manages the problem of combining multiple content and context representations when annotating images. The advantage of MFS-Map is that it does not represent visual and textual features by a single large feature vector. Rather, MFS-Map divides the problem into feature subspaces. This approach allows MFS-Map to improve its accuracy by identifying the features relevant for each annotation. We evaluated MFS-Map using two publicly available datasets: MIR Flickr and Image CLEF 2011. MFS-Map obtained both superior precision and faster speed when compared to other widely employed annotation methods. Funding: FAPESP (São Paulo State Research Foundation); CNPq (Brazilian National Research Council); CAPES (Brazilian Coordination for Improvement of Higher Level Personnel)

    Unveiling smoke in social images with the SmokeBlock approach

    Can we use information from social media and crowdsourced images to detect smoke and assist rescue forces? While there are computer vision methods for detecting smoke, they require movement information extracted from video data. In this paper we propose SmokeBlock: a method that is able to segment and detect smoke in still images. SmokeBlock uses superpixel segmentation and extracts local color and texture features from images to spot smoke. We used real data from Flickr and compared SmokeBlock against state-of-the-art methods for feature extraction. Our method achieved performance superior to that of the competitors for the task of smoke detection. Our findings shall support further investigations in the field of image analysis, in particular concerning images captured with mobile devices. Funding: FAPESP; CAPES; RESCUER Project (grants EU 614154 and CNPq/MCTI 490084/2013-3)