51 research outputs found

    An Arabic Corpus of Fake News: Collection, Analysis and Classification

    Over the last years, with the explosive growth of social media, huge amounts of rumors have spread rapidly on the internet. The proliferation of malicious misinformation and nasty rumors on social media can have harmful effects on individuals and society. In this paper, we investigate the content of fake news in the Arab world through information posted on YouTube. Our contribution is threefold. First, we introduce a novel Arabic corpus for the task of fake news analysis, covering the topics most affected by rumors, and we describe the corpus and the data collection process in detail. Second, we present several exploratory analyses of the harvested data in order to extract useful knowledge about how rumors spread for the studied topics. Third, we test whether rumor and non-rumor comments can be discriminated using three machine learning classifiers, namely Support Vector Machine (SVM), Decision Tree (DT) and Multinomial Naïve Bayes (MNB).
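For readers unfamiliar with the third classifier, the following is a minimal from-scratch sketch of Multinomial Naïve Bayes with add-one (Laplace) smoothing, assuming simple whitespace tokenisation. It is not the authors' code: the function names `train_mnb` and `predict_mnb`, the labels `rumor` and `no_rumor`, and the English toy comments are hypothetical stand-ins for the Arabic YouTube comments in the corpus.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenised documents.

    alpha is the additive (Laplace) smoothing constant.
    """
    vocab = set()
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)  # class label -> token frequencies
    for doc, label in zip(docs, labels):
        tokens = doc.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"vocab": vocab, "alpha": alpha, "classes": {}}
    for label, n_class_docs in docs_per_class.items():
        model["classes"][label] = {
            "log_prior": math.log(n_class_docs / len(docs)),
            "counts": word_counts[label],
            "total": sum(word_counts[label].values()),
        }
    return model


def predict_mnb(model, doc):
    """Return the class with the highest log posterior score for doc."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, stats in model["classes"].items():
        score = stats["log_prior"]
        denom = stats["total"] + alpha * vocab_size
        for tok in doc.split():
            if tok not in model["vocab"]:
                continue  # tokens never seen in training are ignored
            score += math.log((stats["counts"][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data standing in for labelled comments.
docs = [
    "share before they delete it",
    "secret cure doctors hide",
    "official statement from the ministry",
    "the ministry published the report",
]
labels = ["rumor", "rumor", "no_rumor", "no_rumor"]
model = train_mnb(docs, labels)
print(predict_mnb(model, "doctors hide the secret"))
```

Each class score is the log prior plus the sum of smoothed log word likelihoods; working in log space avoids numerical underflow on long comments, and the smoothing keeps a single unseen class/word pair from zeroing out a class.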

    On morphological hierarchical representations for image processing and spatial data clustering

    Hierarchical data representations in the context of classification and data clustering were put forward during the fifties. Recently, hierarchical image representations have gained renewed interest for segmentation purposes. In this paper, we briefly survey fundamental results on hierarchical clustering and then detail recent paradigms developed for the hierarchical representation of images in the framework of mathematical morphology: constrained connectivity and ultrametric watersheds. Constrained connectivity can be viewed as a way to constrain an initial hierarchy so that a set of desired constraints is satisfied. The framework of ultrametric watersheds provides a generic scheme for computing any hierarchical connected clustering, in particular when such a hierarchy is constrained. The suitability of this framework for solving practical problems is illustrated with applications in remote sensing.
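Constrained connectivity builds on α-connectivity: two pixels belong to the same α-connected component when some path joins them along which neighbouring grey levels differ by at most α. Raising α can only merge components, so the resulting partitions are nested and form a hierarchy; on the 4-adjacency graph weighted by grey-level differences this is the same as single-linkage clustering cut at threshold α. The sketch below computes only this unconstrained α-partition with union-find on a small hypothetical image; the function name `alpha_partition` and the test image are illustrative assumptions, and neither the additional constraints of constrained connectivity nor ultrametric watersheds are implemented.

```python
def alpha_partition(image, alpha):
    """Label the alpha-connected components of a 2-D grey-level image.

    Two 4-adjacent pixels are linked when their grey levels differ by at
    most alpha; components are the connected sets of linked pixels.
    """
    rows, cols = len(image), len(image[0])
    parent = list(range(rows * cols))  # union-find forest over pixel indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols and abs(image[r][c] - image[r][c + 1]) <= alpha:
                union(i, i + 1)  # right neighbour
            if r + 1 < rows and abs(image[r][c] - image[r + 1][c]) <= alpha:
                union(i, i + cols)  # lower neighbour

    # Relabel component roots as consecutive integers in raster order.
    ids, labels = {}, []
    for r in range(rows):
        row = []
        for c in range(cols):
            root = find(r * cols + c)
            row.append(ids.setdefault(root, len(ids)))
        labels.append(row)
    return labels


IMAGE = [  # hypothetical 3x4 grey-level image
    [1, 1, 5, 9],
    [1, 2, 6, 9],
    [1, 2, 6, 8],
]
for a in (0, 1, 2, 4):
    n_components = len({v for row in alpha_partition(IMAGE, a) for v in row})
    print(a, n_components)
```

Printing the number of components for increasing α shows the hierarchy coarsening from many small flat zones to a single region.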

    Characterizing Eve: Analysing cybercrime actors in a large underground forum

    Underground forums contain many thousands of active users, but the vast majority will be involved, at most, in minor levels of deviance. The number who engage in serious criminal activity is small. That said, underground forums have played a significant role in several recent high-profile cybercrime activities. In this work, we apply data science approaches to understand criminal pathways and characterize key actors related to illegal activity in one of the largest and longest-running underground forums. We combine the results of a logistic regression model with k-means clustering and social network analysis, and verify the findings using topic analysis. We identify variables relating to forum activity that predict the likelihood that a user will become an actor of interest to law enforcement, who would therefore benefit the most from intervention. This work provides the first step towards identifying ways to deter young people from a career in cybercrime.
    Alan Turing Institute

    A multi-classifier approach to dialogue act classification using function words

    This paper extends a novel technique for classifying sentences as Dialogue Acts based on the structural information contained in function words. Initial experiments on classifying questions in the presence of a mix of straightforward and “difficult” non-questions yielded promising results, with classification accuracy approaching 90%. However, this initial dataset does not fully represent the various permutations of natural language in which sentences may occur, and higher classification accuracy is desirable for real-world applications. Following an analysis of sentence categorisation, we present a series of experiments that show improved performance over the initial experiment and promising performance for categorising more complex combinations in the future.
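As a rough illustration of classifying sentences from function words alone, here is a sketch that discards content words, keeps a count for each function word plus the identity of the sentence's opening word, and assigns the label of the nearest class centroid. The tiny word list, the toy sentences, the labels `question` and `non-question`, and the nearest-centroid classifier are all assumptions chosen for illustration; the paper's own feature encoding and its multi-classifier approach are not reproduced here.

```python
from collections import Counter

# Hypothetical toy function-word list; a realistic list is far larger.
FUNCTION_WORDS = ["do", "is", "can", "where", "you", "the", "it", "please"]


def function_word_vector(sentence):
    """Represent a sentence only through its function words.

    Features: one count per function word, followed by a one-hot flag for
    which function word (if any) opens the sentence. Content words and
    trailing punctuation are dropped, so "?" is not used as a cue.
    """
    tokens = sentence.lower().rstrip("?.!").split()
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    first = tokens[0] if tokens else ""
    return [counts[w] for w in FUNCTION_WORDS] + [int(first == w) for w in FUNCTION_WORDS]


def train_centroids(sentences, labels):
    """Average the feature vectors of each class (nearest-centroid training)."""
    groups = {}
    for sentence, label in zip(sentences, labels):
        groups.setdefault(label, []).append(function_word_vector(sentence))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def classify(centroids, sentence):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    v = function_word_vector(sentence)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(v, centroids[label])),
    )


train_sentences = [
    "do you like it", "can you open the door", "where is the station",
    "you like it", "open the door please", "the station is closed",
]
train_labels = ["question"] * 3 + ["non-question"] * 3
centroids = train_centroids(train_sentences, train_labels)
print(classify(centroids, "where is the door"))
```

Because the opening word is encoded separately from the counts, "the door is open" and a question built from the same words can still end up in different classes, which is the kind of structural signal the abstract attributes to function words.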

    Diversity of hard-bottom fauna relative to environmental gradients in Kongsfjorden, Svalbard

    A baseline study of hard-bottom zoobenthos in relation to environmental gradients in Kongsfjorden, a glacial fjord in Svalbard, is presented, based on collections from 1996 to 1998. The total species richness in 62 samples from 0 to 30 m depth along five transects was 403 species. Because 32 taxa could not be identified to species level and 11 species are probably new to science, the total number of identified species was 360. Of these, 47 species are new for Svalbard waters. Bryozoa was the most diverse group. The biogeographic composition revealed both Arctic and sub-Arctic features in the fauna. Species richness, frequency of species occurrence, mean abundance and biomass generally decreased towards the tidal glaciers in inner Kongsfjorden. Among eight environmental factors, depth was the most important in explaining variance in the composition of the zoobenthos. Diversity was consistently low at shallow depths, whereas the non-linear patterns of species composition in deeper samples indicated a transitional zone between surface and deeper water masses at 15–20 m depth. Groups of “colonial” and “non-colonial” species differed in diversity, biogeographic composition and distribution by location and depth, as well as in relation to other environmental factors. “Non-colonial” species made a greater contribution than “colonial” species to total species richness, total occurrence and biomass in samples, and were more influenced by the depth gradient. Biogeographic composition was sensitive to variation in zoobenthic characteristics over the studied depth range. A list of recorded species and a description of sampling sites are presented.

    The Conditions of Sex-change in the Oyster (Ostrea edulis)
