Search Bias Quantification: Investigating Political Bias in Social Media and Web Search
Users frequently use search systems on the Web as well as online social media to learn about ongoing events and public opinion on personalities. Prior studies have shown that the top-ranked results returned by these search engines can shape user opinion about the topic (e.g., event or person) being searched. In case of polarizing topics like politics, where multiple competing perspectives exist, the political bias in the top search results can play a significant role in shaping public opinion towards (or away from) certain perspectives. Given the considerable impact that search bias can have on the user, we propose a generalizable search bias quantification framework that not only measures the political bias in ranked list output by the search system but also decouples the bias introduced by the different sources: input data and ranking system. We apply our framework to study the political bias in searches related to 2016 US Presidential primaries in Twitter social media search and find that both input data and ranking system matter in determining the final search output bias seen by the users. And finally, we use the framework to compare the relative bias for two popular search systems, Twitter social media search and Google web search, for queries related to politicians and political events. We end by discussing some potential solutions to signal the bias in the search results to make the users more aware of them.
TCMI: a non-parametric mutual-dependence estimator for multivariate continuous distributions
The identification of relevant features, i.e., the driving variables that determine a process or the property of a system, is an essential part of the analysis of data sets whose entries are described by a large number of variables. The preferred measure for quantifying the relevance of nonlinear statistical dependencies is mutual information, which requires as input probability distributions. Probability distributions cannot be reliably sampled and estimated from limited data, especially for real-valued data samples such as lengths or energies. Here, we introduce total cumulative mutual information (TCMI), a measure of the relevance of mutual dependencies based on cumulative probability distributions. TCMI can be estimated directly from sample data and is a non-parametric, robust and deterministic measure that facilitates comparisons and rankings between feature sets with different cardinality. The ranking induced by TCMI allows for feature selection, i.e., the identification of the set of relevant features that are statistically related to the process or the property of a system, while taking into account the number of data samples as well as the cardinality of the feature subsets. We evaluate the performance of our measure with simulated data, compare its performance with similar multivariate dependence measures, and demonstrate the effectiveness of our feature selection method on a set of standard data sets and a typical scenario in materials science.
Explaining Recurrent Neural Network Predictions in Sentiment Analysis
Recently, a technique called Layer-wise Relevance Propagation (LRP) was shown
to deliver insightful explanations in the form of input space relevances for
understanding feed-forward neural network classification decisions. In the
present work, we extend the usage of LRP to recurrent neural networks. We
propose a specific propagation rule applicable to multiplicative connections as
they arise in recurrent network architectures such as LSTMs and GRUs. We apply
our technique to a word-based bi-directional LSTM model on a five-class
sentiment prediction task, and evaluate the resulting LRP relevances both
qualitatively and quantitatively, obtaining better results than a
gradient-based related method which was used in previous work.
Comment: 9 pages, 4 figures, accepted for EMNLP'17 Workshop on Computational
Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA)
DeepSig: Deep learning improves signal peptide detection in proteins
Motivation:
The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization.
Results:
Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification.
Availability and implementation:
DeepSig is available as both a standalone program and a web server at https://deepsig.biocomp.unibo.it. All datasets used in this study can be obtained from the same website.
A combined measure for quantifying and qualifying the topology preservation of growing self-organizing maps
The Self-Organizing Map (SOM) is a neural network model that performs an ordered projection of a high dimensional input space in a low-dimensional topological structure. The process in which such mapping is formed is defined by the SOM algorithm, which is a competitive, unsupervised and nonparametric method, since it does not make any assumption about the input data distribution. The feature maps provided by this algorithm have been successfully applied for vector quantization, clustering and high dimensional data visualization processes. However, the initialization of the network topology and the selection of the SOM training parameters are two difficult tasks caused by the unknown distribution of the input signals. A misconfiguration of these parameters can generate a feature map of low quality, so it is necessary to have some measure of the degree of adaptation of the SOM network to the input data model. The topology preservation is the most common concept used to implement this measure. Several qualitative and quantitative methods have been proposed for measuring the degree of SOM topology preservation, particularly using Kohonen's model. In this work, two methods for measuring the topology preservation of the Growing Cell Structures (GCSs) model are proposed: the topographic function and the topology preserving ma