Adversarial Removal of Demographic Attributes from Text Data
Recent advances in Representation Learning and Adversarial Training seem to
succeed in removing unwanted features from the learned representation. We show
that demographic information of authors is encoded in -- and can be recovered
from -- the intermediate representations learned by text-based neural
classifiers. The implication is that decisions of classifiers trained on
textual data are not agnostic to -- and likely condition on -- demographic
attributes. When attempting to remove such demographic information using
adversarial training, we find that while the adversarial component achieves
chance-level development-set accuracy during training, a post-hoc classifier,
trained on the encoded sentences from the first part, still manages to reach
substantially higher classification accuracies on the same data. This behavior
is consistent across several tasks, demographic properties and datasets. We
explore several techniques to improve the effectiveness of the adversarial
component. Our main conclusion is a cautionary one: do not rely on
adversarial training to achieve representations that are invariant to sensitive features.
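The adversarial setup referred to in the abstract above is commonly written as a min-max objective. The following is a generic sketch, not necessarily the exact loss used by the authors; the symbols $h$ (encoder), $c$ (task classifier), $a$ (adversary), $y$ (task label), $z$ (demographic attribute) and $\lambda$ (trade-off weight) are assumptions introduced here for illustration.

```latex
\min_{h,\,c}\;\max_{a}\;
\mathbb{E}_{(x,y,z)}\Big[
  \mathcal{L}_{\text{task}}\big(c(h(x)),\,y\big)
  \;-\;\lambda\,\mathcal{L}_{\text{adv}}\big(a(h(x)),\,z\big)
\Big]
```

Under this reading, the abstract's finding is that a chance-level adversary $a$ during training does not imply that a freshly trained post-hoc classifier on $h(x)$ also performs at chance.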
A Probabilistic Linear Genetic Programming with Stochastic Context-Free Grammar for solving Symbolic Regression problems
Traditional Linear Genetic Programming (LGP) algorithms are based only on the
selection mechanism to guide the search. Genetic operators combine or mutate
random portions of the individuals, without knowing if the result will lead to
a fitter individual. Probabilistic Model Building Genetic Programming (PMB-GP)
methods were proposed to overcome this issue through a probability model that
captures the structure of the fit individuals and is used to sample new
individuals. This work proposes the use of LGP with a Stochastic Context-Free
Grammar (SCFG) whose probability distribution is updated according
to the selected individuals. We propose a method for adapting the grammar into the
linear representation of LGP. Tests performed with the proposed probabilistic
method, and with two hybrid approaches, on several symbolic regression
benchmark problems show that the results are statistically better than those
obtained by the traditional LGP.
Comment: Genetic and Evolutionary Computation Conference (GECCO) 2017, Berlin,
Germany
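To make the idea of a grammar whose production probabilities follow the selected individuals more concrete, here is a minimal Python sketch. It is not the authors' method: the toy grammar, the tree-style sampling (rather than the paper's linear LGP representation), the frequency-based update rule, and the `learning_rate` parameter are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy grammar for arithmetic expressions (not the paper's grammar).
GRAMMAR = {
    "EXPR": [("EXPR", "OP", "EXPR"), ("x",), ("1",)],
    "OP": [("+",), ("*",)],
}


def uniform_probs(grammar):
    """Start from a uniform distribution over each nonterminal's productions."""
    return {nt: [1.0 / len(rules)] * len(rules) for nt, rules in grammar.items()}


def sample(grammar, probs, rng, symbol="EXPR", depth=0, max_depth=4):
    """Sample one derivation; return (tokens, [(nonterminal, rule_index), ...])."""
    if symbol not in grammar:  # terminal symbol
        return [symbol], []
    rules = grammar[symbol]
    choices = list(range(len(rules)))
    if depth >= max_depth:
        # Force termination: keep only rules made entirely of terminals.
        choices = [i for i in choices if all(s not in grammar for s in rules[i])]
    idx = rng.choices(choices, weights=[probs[symbol][i] for i in choices])[0]
    tokens, usage = [], [(symbol, idx)]
    for child in rules[idx]:
        child_tokens, child_usage = sample(grammar, probs, rng, child, depth + 1, max_depth)
        tokens += child_tokens
        usage += child_usage
    return tokens, usage


def update(probs, selected_usages, learning_rate=0.2):
    """Move each distribution toward the rule frequencies of the selected individuals."""
    new_probs = {}
    for nt, old in probs.items():
        counts = [0] * len(old)
        for usage in selected_usages:
            for used_nt, idx in usage:
                if used_nt == nt:
                    counts[idx] += 1
        total = sum(counts)
        if total == 0:  # nonterminal unused by the selected set: keep as is
            new_probs[nt] = list(old)
        else:
            new_probs[nt] = [
                (1 - learning_rate) * p + learning_rate * c / total
                for p, c in zip(old, counts)
            ]
    return new_probs


if __name__ == "__main__":
    rng = random.Random(0)
    probs = uniform_probs(GRAMMAR)
    population = [sample(GRAMMAR, probs, rng) for _ in range(20)]
    # Stand-in for fitness-based selection: keep the 5 shortest expressions.
    selected = sorted(population, key=lambda ind: len(ind[0]))[:5]
    probs = update(probs, [usage for _, usage in selected])
    print(probs)
```

Because the update is a convex combination of the old distribution and the observed frequencies, each nonterminal's probabilities still sum to one after every update.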
Scene Text Eraser
The character information in natural scene images contains various personal
information, such as telephone numbers and home addresses. There is a high risk
of leaking this information if the images are published. In this paper, we propose a
scene text erasing method that hides the information via an inpainting
convolutional neural network (CNN) model. The input is a scene text image, and
the expected output is a text-erased image in which all character regions are
filled with the colors of the surrounding background pixels. This is
accomplished by a CNN model that goes from convolution to deconvolution with
interconnections. The training samples and the corresponding inpainted
images are used as teaching signals for training. To evaluate the text
erasing performance, a novel scene text detection method is applied to the
output images. Subsequently, the same text detection measurement is
used to test the images in the ICDAR2013 benchmark dataset. Compared with
direct text detection, the scene text erasing process yields a
drastic decrease in precision, recall and f-score, which demonstrates the
effectiveness of the proposed method for erasing text in natural scene images.
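The evaluation described above relies on detection precision, recall and f-score, where lower scores after erasing indicate better erasure. A minimal Python sketch of these metrics follows; it assumes the matching of detected boxes to ground-truth boxes has already been done (the ICDAR2013 matching protocol itself is not reproduced here), and the function name and the counts in the usage example are hypothetical.

```python
def detection_scores(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f_score) computed from detection counts."""
    detected = true_positives + false_positives
    ground_truth = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / ground_truth if ground_truth else 0.0
    denom = precision + recall
    # The f-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    # Made-up counts, for illustration only (not results from the paper).
    before = detection_scores(80, 20, 20)   # detector run on original images
    after = detection_scores(5, 15, 95)     # detector run on text-erased images
    print("original:", before)
    print("erased:  ", after)
```

In this setting the detector acts as an adversary of the eraser: the more text it can no longer find, the lower all three scores fall.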
Validation and refinement of allometric equations for roots of northern hardwoods
The allometric equations developed by Whittaker et al. (1974. Ecol. Monogr. 44: 233–252), at the Hubbard Brook Experimental Forest have been used to estimate biomass and productivity in northern hardwood forest systems for over three decades. Few other species-specific allometric estimates of belowground biomass are available because of the difficulty in collecting the data, and such equations are rarely validated. Using previously unpublished data from Whittaker’s sampling effort, we extended the equations to predict the root crown and lateral root components for the three dominant species of the northern hardwood forest: American beech (Fagus grandifolia Ehrh.), yellow birch (Betula alleghaniensis Britt), and sugar maple (Acer saccharum Marsh.). We also refined the allometric models by eliminating the use of very small trees for which the original data were unreliable. We validated these new models of the relationship of tree diameter to the mass of root crowns and lateral roots using root mass data collected from 12 northern hardwood stands of varying age in central New Hampshire. These models provide accurate estimates of lateral roots (diameter) in northern hardwood stands >20 years old (mean error 24%–32%). For the younger stands that we studied, allometric equations substantially underestimated observed root biomass (mean error >60%), presumably due to remnant mature root systems from harvested trees supporting young root-sprouted trees.
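The abstract above does not state the form of the allometric equations. As an illustration only, the sketch below assumes the common log-log form log10(mass) = a + b * log10(diameter) and fits it by ordinary least squares in Python. The function names, the synthetic data, and the particular definition of mean percent error are assumptions, not taken from the study.

```python
import math


def fit_allometric(diameters, masses):
    """Least-squares fit of log10(mass) = a + b * log10(diameter)."""
    xs = [math.log10(d) for d in diameters]
    ys = [math.log10(m) for m in masses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_mass(diameter, a, b):
    """Back-transform the fitted line to the original mass scale."""
    return 10 ** (a + b * math.log10(diameter))


def mean_percent_error(observed, predicted):
    """Mean absolute error as a percentage of observed values (one possible definition)."""
    return 100 * sum(abs(p - o) / o for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Synthetic data following an exact power law, for illustration only.
    diameters = [2, 5, 10, 20, 40]
    masses = [2 * d ** 2.5 for d in diameters]
    a, b = fit_allometric(diameters, masses)
    predictions = [predict_mass(d, a, b) for d in diameters]
    print(a, b, mean_percent_error(masses, predictions))
```

Validating such a model, as the study does, means comparing its predictions against root masses from stands that were not used to fit the coefficients.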
Privacy and Fairness in Recommender Systems via Adversarial Training of User Representations
Latent factor models for recommender systems represent users and items as
low-dimensional vectors. Privacy risks of such systems have previously been studied
mostly in the context of recovery of personal information in the form of usage
records from the training data. However, the user representations themselves
may be used together with external data to recover private user information
such as gender and age. In this paper we show that user vectors calculated by a
common recommender system can be exploited in this way. We propose the
privacy-adversarial framework to eliminate such leakage of private information,
and study the trade-off between recommender performance and leakage both
theoretically and empirically using a benchmark dataset. An advantage of the
proposed method is that it also helps guarantee fairness of results, since all
implicit knowledge of a set of attributes is scrubbed from the representations
used by the model, and thus can't enter into the decision making. We discuss
further applications of this method towards the generation of deeper and more
insightful recommendations.
Comment: International Conference on Pattern Recognition and Method