74,326 research outputs found
Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election
Social media has become an emerging alternative to opinion polls for public
opinion collection, while it is still posing many challenges as a passive data
source, such as structurelessness, quantifiability, and representativeness.
Social media data with geotags provide new opportunities to unveil the
geographic locations of users expressing their opinions. This paper aims to
answer two questions: 1) whether quantifiable measurement of public opinion can
be obtained from social media and 2) whether it can produce better or
complementary measures compared to opinion polls. This research proposes a
novel approach to measure the relative opinion of Twitter users towards public
issues in order to accommodate more complex opinion structures and take
advantage of the geography pertaining to the public issues. To ensure that this
new measure is technically feasible, a modeling framework is developed
including building a training dataset by adopting a state-of-the-art approach
and devising a new deep learning method called Opinion-Oriented Word Embedding.
With a case study of the tweets selected for the 2016 U.S. presidential
election, we demonstrate the predictive superiority of our relative opinion
approach and we show how it can aid visual analytics and support opinion
predictions. Although the relative opinion measure is proved to be more robust
compared to polling, our study also suggests that the former can advantageously
complement the later in opinion prediction
Similarity-based virtual screening using 2D fingerprints
This paper summarises recent work at the University of Sheffield on virtual screening methods that use 2D fingerprint measures of structural similarity. A detailed comparison of a large number of similarity coefficients demonstrates that the well-known Tanimoto coefficient remains the method of choice for the computation of fingerprint-based similarity, despite possessing some inherent biases related to the sizes of the molecules that are being sought. Group fusion involves combining the results of similarity searches based on multiple reference structures and a single similarity measure. We demonstrate the effectiveness of this approach to screening, and also describe an approximate form of group fusion, turbo similarity searching, that can be used when just a single reference structure is available
XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as the data representation language, collections of the XML data are exploded in numbers. The methods are required to manage and discover the useful information from them for the improved document handling. We present a schema clustering process by organising the heterogeneous XML schemas into various groups. The methodology considers not only the linguistic and the context of the elements but also the hierarchical structural similarity. We support our findings with experiments and analysis
Technical note: Bias and the quantification of stability
Research on bias in machine learning algorithms has generally been concerned with the
impact of bias on predictive accuracy. We believe that there are other factors that should
also play a role in the evaluation of bias. One such factor is the stability of the algorithm;
in other words, the repeatability of the results. If we obtain two sets of data from the same
phenomenon, with the same underlying probability distribution, then we would like our
learning algorithm to induce approximately the same concepts from both sets of data. This
paper introduces a method for quantifying stability, based on a measure of the agreement
between concepts. We also discuss the relationships among stability, predictive accuracy,
and bias
Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining
The notion of meta-mining has appeared recently and extends the traditional
meta-learning in two ways. First it does not learn meta-models that provide
support only for the learning algorithm selection task but ones that support
the whole data-mining process. In addition it abandons the so called black-box
approach to algorithm description followed in meta-learning. Now in addition to
the datasets, algorithms also have descriptors, workflows as well. For the
latter two these descriptions are semantic, describing properties of the
algorithms. With the availability of descriptors both for datasets and data
mining workflows the traditional modelling techniques followed in
meta-learning, typically based on classification and regression algorithms, are
no longer appropriate. Instead we are faced with a problem the nature of which
is much more similar to the problems that appear in recommendation systems. The
most important meta-mining requirements are that suggestions should use only
datasets and workflows descriptors and the cold-start problem, e.g. providing
workflow suggestions for new datasets.
In this paper we take a different view on the meta-mining modelling problem
and treat it as a recommender problem. In order to account for the meta-mining
specificities we derive a novel metric-based-learning recommender approach. Our
method learns two homogeneous metrics, one in the dataset and one in the
workflow space, and a heterogeneous one in the dataset-workflow space. All
learned metrics reflect similarities established from the dataset-workflow
preference matrix. We demonstrate our method on meta-mining over biological
(microarray datasets) problems. The application of our method is not limited to
the meta-mining problem, its formulations is general enough so that it can be
applied on problems with similar requirements
- …