14,277 research outputs found
Towards the cloudification of the social networks analytics
In recent years, with the increase in available data from social networks and the rise of big data technologies, social data has emerged as one of the most profitable markets for companies seeking to increase their profits. In addition, social computation scientists see such data as a vast ocean of information for studying modern human societies. Nowadays, enterprises and researchers either develop their own mining tools in house or outsource their social media mining needs to specialised companies, with the consequent economic cost. In this paper, we present the first cloud computing service that facilitates the deployment of social media analytics applications, allowing data practitioners to use social mining tools as a service. The main advantage of this service is the possibility of running different queries at the same time and combining their results in real time. We also introduce twearch, a prototype for developing Twitter mining algorithms as services in the cloud.
Peer Reviewed. Postprint (author’s final draft)
Neural nets - their use and abuse for small data sets
Neural nets can be used for non-linear classification and regression models. They have a big advantage
over conventional statistical tools in that it is not necessary to assume any mathematical form for the
functional relationship between the variables. However, they also have a few associated problems, chief of
which are probably the risk of over-parametrization in the absence of P-values, the lack of appropriate
diagnostic tools and the difficulties associated with model interpretation. The first of these problems is
particularly important in the case of small data sets. These problems are investigated in the context of real
market research data involving non-linear regression and discriminant analysis. In all cases we compare
the results of the non-linear neural net models with those of conventional linear statistical methods. Our
conclusion is that the theory and software for neural networks have some way to go before the above
problems are solved.
Data mining as a tool for environmental scientists
Over recent years, a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques, such as Artificial Neural Networks, Clustering, Case-Based Reasoning and, more recently, Bayesian Decision Networks, have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.
The Royal Birth of 2013: Analysing and Visualising Public Sentiment in the UK Using Twitter
Analysis of information retrieved from microblogging services such as Twitter
can provide valuable insight into public sentiment in a geographic region. This
insight can be enriched by visualising information in its geographic context.
Two underlying approaches for sentiment analysis are dictionary-based and
machine learning. The former is popular for public sentiment analysis, and the
latter has found limited use for aggregating public sentiment from Twitter
data. The research presented in this paper aims to extend the machine learning
approach for aggregating public sentiment. To this end, a framework for
analysing and visualising public sentiment from a Twitter corpus is developed.
A dictionary-based approach and a machine learning approach are implemented
within the framework and compared using one UK case study, namely the royal
birth of 2013. The case study validates the feasibility of the framework for
analysis and rapid visualisation. One observation is that there is good
correlation between the results produced by the popular dictionary-based
approach and the machine learning approach when large volumes of tweets are
analysed. However, for rapid analysis to be possible, faster methods need to be
developed using big data techniques and parallel methods.
Comment: http://www.blessonv.com/research/publicsentiment/ 9 pages. Submitted
to IEEE BigData 2013: Workshop on Big Humanities, October 201