4 research outputs found

    Identification of Fake Profiles in Twitter Social Network

    Online social networks are used intensively by millions of users, and Twitter, one of the most popular, is a powerful source of information that influences opinion and decision making. However, on Twitter, as on other online social networks, not all users are legitimate, and accounts that correspond to fake profiles are not easy to detect. In this work-in-progress paper, we propose a method to help practitioners identify fake Twitter accounts by calculating a “fake probability” from a weighted set of parameters collected from public Twitter accounts. The preliminary results, obtained with a subset of an existing annotated dataset of Twitter accounts, are promising and give confidence in using this method as a decision support system to help practitioners identify fake profiles.

    Identificação de perfis falsos nas redes sociais (Identification of Fake Profiles on Social Networks)

    With the constant growth of social networks and the role they play in society, both socially and in business, it has become important to understand some of the problems that have been identified in these networks. For their users, everyday life has become interlinked with social networks. The popularity of social networks brings some problems, such as the possible exposure of users' personal information, the spread of spam, exposure to potential extortion scenarios, and other activities related to cybercrime. One way such cybercrime is enabled is the creation of fake profiles and their use in activities that are illicit and potentially harmful to social network users. This project studied how social networks work and the main crimes committed using fake profiles, as well as the main existing mechanisms for identifying such profiles. Its main objective was to develop a methodology to assist in identifying fake profiles on the Twitter social network. To that end, several parameters related to user accounts on that network were identified and, for each parameter, a weighted value was calculated from the public information of the profile under assessment. The result of analysing each profile indicates the probability that the profile is fake. A web application was also developed that implements the defined methodology and calculates a profile's probability of “fakeness”. The wide availability of the application allows the “fakeness” level of a given account to be checked quickly and in real time. The tests were carried out with two known, previously published datasets, corresponding to sets of fake and genuine Twitter profiles. Besides validating the feasibility of the solution and the application of the methodology to Twitter, the tests gave promising results, with high accuracy and an average hit rate of 94.5%.
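    The weighted-parameter idea described in the abstract above can be illustrated with a minimal sketch. The abstract does not list the actual parameters, weights, or combination rule, so everything named here (`Parameter`, `fake_probability`, the example indicators and their weights) is a hypothetical assumption, and a normalised weighted sum is only one plausible way to combine the indicators.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Parameter:
    """One profile indicator (hypothetical; not taken from the paper)."""
    name: str
    weight: float      # assumed relative importance of the indicator
    suspicious: bool   # True if the public profile data triggers it


def fake_probability(params: Iterable[Parameter]) -> float:
    """Return the share of total weight carried by triggered indicators.

    This is an assumed scoring rule: a normalised weighted sum in [0, 1].
    """
    params = list(params)
    total = sum(p.weight for p in params)
    if total <= 0:
        raise ValueError("at least one parameter with positive weight is required")
    triggered = sum(p.weight for p in params if p.suspicious)
    return triggered / total


# Illustrative profile with made-up indicators and weights.
profile = [
    Parameter("default_profile_image", weight=2.0, suspicious=True),
    Parameter("no_followers", weight=1.0, suspicious=False),
    Parameter("account_created_recently", weight=1.0, suspicious=False),
]
score = fake_probability(profile)
```

    A threshold on the resulting score would turn it into a fake/genuine decision; where that threshold should sit is not stated in the abstract.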

    Coping with new Challenges in Clustering and Biomedical Imaging

    Recent years have seen a tremendous increase in data acquisition in scientific fields such as molecular biology, bioinformatics, and biomedicine. Novel methods are therefore needed for the automatic processing and analysis of this large amount of data. Data mining is the process of applying methods such as clustering or classification to large databases in order to uncover hidden patterns. Clustering is the task of partitioning the points of a data set into distinct groups so as to maximize intra-cluster similarity and minimize inter-cluster similarity. In contrast to unsupervised learning such as clustering, classification is a supervised learning problem that aims to predict the group membership of data objects on the basis of rules learned from a training set in which the group membership is known. Specialized methods have been proposed for hierarchical and partitioning clustering. However, these methods suffer from several drawbacks. In the first part of this work, new clustering methods are proposed that cope with the problems of conventional clustering algorithms. ITCH (Information-Theoretic Cluster Hierarchies) is a hierarchical clustering method based on a hierarchical variant of the Minimum Description Length (MDL) principle, which finds hierarchies of clusters without requiring input parameters. As ITCH may converge only to a local optimum, we propose GACH (Genetic Algorithm for Finding Cluster Hierarchies), which combines the benefits of genetic algorithms with information theory. In this way the search space is explored more effectively. Furthermore, we propose INTEGRATE, a novel clustering method for data with mixed numerical and categorical attributes. Supported by the MDL principle, our method integrates the information provided by heterogeneous numerical and categorical attributes and thus naturally balances the influence of both sources of information. A competitive evaluation illustrates that INTEGRATE is more effective than existing clustering methods for mixed-type data. Besides clustering methods for single data objects, we provide a solution for clustering different data sets that are represented by their skylines. The skyline operator is a well-established database primitive for finding database objects that minimize two or more attributes with an unknown weighting between these attributes. In this thesis, we define a similarity measure, called SkyDist, for comparing skylines of different data sets, which can be integrated directly into data mining tasks such as clustering or classification. The experiments show that SkyDist, in combination with different clustering algorithms, can give useful insights in many applications. In the second part, we focus on the analysis of high-resolution magnetic resonance images (MRI) that are clinically relevant and may allow for early detection and diagnosis of several diseases. In particular, we propose a framework for the classification of Alzheimer's disease in MR images that combines the data mining steps of feature selection, clustering, and classification. As a result, a set of highly selective features discriminating between patients with Alzheimer's disease and healthy people has been identified. However, the analysis of high-dimensional MR images is extremely time-consuming. We therefore developed JGrid, a scalable distributed computing solution designed to allow for large-scale analysis of MRI and thus an optimized prediction of diagnosis. In another study, we apply efficient algorithms for motif discovery to task-fMRI scans in order to identify patterns in the brain that are characteristic of patients with somatoform pain disorder. We find groups of brain compartments that occur frequently within the brain networks and discriminate well between healthy and diseased people.
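    The skyline operator mentioned in the abstract above can be illustrated with a minimal sketch. It uses the standard dominance definition (a point dominates another if it is no worse in every attribute and strictly better in at least one, with smaller values being better) and a simple quadratic-time scan. The function names and the sample data are assumptions for illustration; this is not the thesis's implementation and does not cover SkyDist itself.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is <= b in every attribute and < b in at least one."""
    return (
        all(x <= y for x, y in zip(a, b))
        and any(x < y for x, y in zip(a, b))
    )


def skyline(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point (O(n^2) scan)."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points)
    ]


# Made-up example: (price, distance) pairs where lower is better in both.
hotels = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
result = skyline(hotels)
```

    In this example, `(3, 3)` and `(4, 4)` are both dominated by `(2, 2)`, so the skyline keeps `(1, 5)`, `(2, 2)`, and `(5, 1)`: each is the best choice under some weighting of the two attributes.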