8,333 research outputs found
Fame for sale: efficient detection of fake Twitter followers
are those Twitter accounts specifically created to
inflate the number of followers of a target account. Fake followers are
dangerous for the social platform and beyond, since they may alter concepts
like popularity and influence in the Twittersphere - hence impacting on
economy, politics, and society. In this paper, we contribute along different
dimensions. First, we review some of the most relevant existing features and
rules (proposed by Academia and Media) for anomalous Twitter accounts
detection. Second, we create a baseline dataset of verified human and fake
follower accounts. Such baseline dataset is publicly available to the
scientific community. Then, we exploit the baseline dataset to train a set of
machine-learning classifiers built over the reviewed rules and features. Our
results show that most of the rules proposed by Media provide unsatisfactory
performance in revealing fake followers, while features proposed in the past by
Academia for spam detection provide good results. Building on the most
promising features, we revise the classifiers both in terms of reduction of
overfitting and cost for gathering the data needed to compute the features. The
final result is a novel classifier, general enough to thwart
overfitting, lightweight thanks to the usage of the less costly features, and
still able to correctly classify more than 95% of the accounts of the original
training set. We ultimately perform an information fusion-based sensitivity
analysis, to assess the global sensitivity of each of the features employed by
the classifier. The findings reported in this paper, other than being supported
by a thorough experimental methodology and interesting on their own, also pave
the way for further investigation on the novel issue of fake Twitter followers
Deep Architectures and Ensembles for Semantic Video Classification
This work addresses the problem of accurate semantic labelling of short
videos. To this end, a multitude of different deep nets, ranging from
traditional recurrent neural networks (LSTM, GRU), temporal agnostic networks
(FV,VLAD,BoW), fully connected neural networks mid-stage AV fusion and others.
Additionally, we also propose a residual architecture-based DNN for video
classification, with state-of-the art classification performance at
significantly reduced complexity. Furthermore, we propose four new approaches
to diversity-driven multi-net ensembling, one based on fast correlation measure
and three incorporating a DNN-based combiner. We show that significant
performance gains can be achieved by ensembling diverse nets and we investigate
factors contributing to high diversity. Based on the extensive YouTube8M
dataset, we provide an in-depth evaluation and analysis of their behaviour. We
show that the performance of the ensemble is state-of-the-art achieving the
highest accuracy on the YouTube-8M Kaggle test data. The performance of the
ensemble of classifiers was also evaluated on the HMDB51 and UCF101 datasets,
and show that the resulting method achieves comparable accuracy with
state-of-the-art methods using similar input features
Predictive Modelling of Bone Age through Classification and Regression of Bone Shapes
Bone age assessment is a task performed daily in hospitals worldwide. This involves a clinician estimating the age of a patient from a radiograph of the non-dominant hand. Our approach to automated bone age assessment is to modularise the algorithm into the following three stages: segment and verify hand outline; segment and verify bones; use the bone outlines to construct models of age. In this paper we address the final question: given outlines of bones, can we learn how to predict the bone age of the patient? We examine two alternative approaches. Firstly, we attempt to train classifiers on individual bones to predict the bone stage categories commonly used in bone ageing. Secondly, we construct regression models to directly predict patient age. We demonstrate that models built on summary features of the bone outline perform better than those built using the one dimensional representation of the outline, and also do at least as well as other automated systems. We show that models constructed on just three bones are as accurate at predicting age as expert human assessors using the standard technique. We also demonstrate the utility of the model by quantifying the importance of ethnicity and sex on age development. Our conclusion is that the feature based system of separating the image processing from the age modelling is the best approach for automated bone ageing, since it offers flexibility and transparency and produces accurate estimate
Predictive Capacity of Meteorological Data - Will it rain tomorrow
With the availability of high precision digital sensors and cheap storage
medium, it is not uncommon to find large amounts of data collected on almost
all measurable attributes, both in nature and man-made habitats. Weather in
particular has been an area of keen interest for researchers to develop more
accurate and reliable prediction models. This paper presents a set of
experiments which involve the use of prevalent machine learning techniques to
build models to predict the day of the week given the weather data for that
particular day i.e. temperature, wind, rain etc., and test their reliability
across four cities in Australia {Brisbane, Adelaide, Perth, Hobart}. The
results provide a comparison of accuracy of these machine learning techniques
and their reliability to predict the day of the week by analysing the weather
data. We then apply the models to predict weather conditions based on the
available data.Comment: 7 pages, 2 Result Set
A bagging SVM to learn from positive and unlabeled examples
We consider the problem of learning a binary classifier from a training set
of positive and unlabeled examples, both in the inductive and in the
transductive setting. This problem, often referred to as \emph{PU learning},
differs from the standard supervised classification problem by the lack of
negative examples in the training set. It corresponds to an ubiquitous
situation in many applications such as information retrieval or gene ranking,
when we have identified a set of data of interest sharing a particular
property, and we wish to automatically retrieve additional data sharing the
same property among a large and easily available pool of unlabeled data. We
propose a conceptually simple method, akin to bagging, to approach both
inductive and transductive PU learning problems, by converting them into series
of supervised binary classification problems discriminating the known positive
examples from random subsamples of the unlabeled set. We empirically
demonstrate the relevance of the method on simulated and real data, where it
performs at least as well as existing methods while being faster
Analysis of group evolution prediction in complex networks
In the world, in which acceptance and the identification with social
communities are highly desired, the ability to predict evolution of groups over
time appears to be a vital but very complex research problem. Therefore, we
propose a new, adaptable, generic and mutli-stage method for Group Evolution
Prediction (GEP) in complex networks, that facilitates reasoning about the
future states of the recently discovered groups. The precise GEP modularity
enabled us to carry out extensive and versatile empirical studies on many
real-world complex / social networks to analyze the impact of numerous setups
and parameters like time window type and size, group detection method,
evolution chain length, prediction models, etc. Additionally, many new
predictive features reflecting the group state at a given time have been
identified and tested. Some other research problems like enriching learning
evolution chains with external data have been analyzed as well
- …