Adversarial Removal of Demographic Attributes from Text Data

Authors: Elazar Yanai; Goldberg Yoav
Publication date: 01/01/2018

Recent advances in Representation Learning and Adversarial Training seem to succeed in removing unwanted features from the learned representation. We show that demographic information of authors is encoded in -- and can be recovered from -- the intermediate representations learned by text-based neural classifiers. The implication is that decisions of classifiers trained on textual data are not agnostic to -- and likely condition on -- demographic attributes. When attempting to remove such demographic information using adversarial training, we find that while the adversarial component achieves chance-level development-set accuracy during training, a post-hoc classifier, trained on the encoded sentences from the first part, still manages to reach substantially higher classification accuracies on the same data. This behavior is consistent across several tasks, demographic properties and datasets. We explore several techniques to improve the effectiveness of the adversarial component. Our main conclusion is a cautionary one: do not rely on adversarial training to achieve representations that are invariant to sensitive features.
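The gap described in this abstract, between an adversary at chance level and a post-hoc classifier that still recovers the attribute, can be illustrated with a small sketch. The abstract does not state how leakage is quantified, so the `leakage` function below (probe accuracy minus the majority-class baseline) is an assumption, and the labels and predictions are made-up toy data rather than results from the paper.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def leakage(predictions, labels):
    """Accuracy above the majority baseline (assumed leakage measure)."""
    return accuracy(predictions, labels) - majority_baseline(labels)


# Hypothetical toy data: a balanced binary demographic attribute.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
adversary_preds = [0, 0, 0, 0, 0, 0, 0, 0]  # adversary stuck at chance level
probe_preds = [0, 1, 0, 1, 0, 1, 1, 1]      # post-hoc probe on the same encodings

print("adversary leakage:", leakage(adversary_preds, labels))
print("post-hoc probe leakage:", leakage(probe_preds, labels))
```

In this toy setting the adversary shows no leakage while the probe does, which mirrors the cautionary point: a chance-level adversary alone does not show that the attribute has been removed.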

arXiv.org e-Print Archive

Crossref

A Probabilistic Linear Genetic Programming with Stochastic Context-Free Grammar for solving Symbolic Regression problems

Authors: Bosman P. A. N.; Poli R.; Shan Y.; Wong P. K.; Yanai K.
Publication date: 03/04/2017

Traditional Linear Genetic Programming (LGP) algorithms rely only on the selection mechanism to guide the search. Genetic operators combine or mutate random portions of the individuals, without knowing whether the result will lead to a fitter individual. Probabilistic Model Building Genetic Programming (PMB-GP) methods were proposed to overcome this issue through a probability model that captures the structure of the fit individuals and uses it to sample new individuals. This work proposes the use of LGP with a Stochastic Context-Free Grammar (SCFG) whose probability distribution is updated according to the selected individuals. We propose a method for adapting the grammar to the linear representation of LGP. Tests performed with the proposed probabilistic method, and with two hybrid approaches, on several symbolic regression benchmark problems show that the results are statistically better than those obtained by traditional LGP.

Comment: Genetic and Evolutionary Computation Conference (GECCO) 2017, Berlin, German
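The idea of updating a grammar's production probabilities from selected individuals can be sketched as follows. The abstract does not give the update rule, so this example assumes a PBIL-style rule that moves each probability toward the production's relative frequency among the selected individuals. The grammar, the `learning_rate` value, and the derivations are hypothetical.

```python
from collections import Counter


def update_probabilities(probs, selected_derivations, learning_rate=0.2):
    """Shift production probabilities toward their usage in selected individuals.

    probs: {nonterminal: {production: probability}}
    selected_derivations: one list of (nonterminal, production) pairs per
        selected individual, recording the productions used to build it.
    """
    new_probs = {}
    for nonterminal, dist in probs.items():
        counts = Counter(
            production
            for derivation in selected_derivations
            for used_nonterminal, production in derivation
            if used_nonterminal == nonterminal
        )
        total = sum(counts.values())
        if total == 0:
            # Nonterminal unused by the selected individuals: keep it unchanged.
            new_probs[nonterminal] = dict(dist)
            continue
        new_probs[nonterminal] = {
            production: (1 - learning_rate) * p
            + learning_rate * counts[production] / total
            for production, p in dist.items()
        }
    return new_probs


# Hypothetical grammar with a single nonterminal and uniform initial probabilities.
probs = {"expr": {"expr + expr": 0.25, "expr * expr": 0.25, "x": 0.25, "const": 0.25}}
selected = [
    [("expr", "expr * expr"), ("expr", "x"), ("expr", "x")],
    [("expr", "x")],
]
updated = update_probabilities(probs, selected)
print(updated["expr"])
```

Because the update is a convex combination of two distributions over the same productions, each nonterminal's probabilities still sum to one after the update.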

arXiv.org e-Print Archive

Crossref

Scene Text Eraser

Authors: Nakamura Toshiki; Uchida Seiichi; Yanai Keiji; Zhu Anna
Publication date: 08/05/2017

The character information in natural scene images contains various personal information, such as telephone numbers and home addresses. Publishing such images carries a high risk of leaking this information. In this paper, we propose a scene text erasing method that hides the information using an inpainting convolutional neural network (CNN) model. The input is a scene text image, and the expected output is a text-erased image in which all character regions are filled with the colors of the surrounding background pixels. The CNN model accomplishes this through a convolution-to-deconvolution process with interconnections. The training samples and the corresponding inpainted images serve as teaching signals for training. To evaluate text erasing performance, a novel scene text detection method is run on the output images. The same text detection measurement is then used to test the images in the ICDAR2013 benchmark dataset. Compared with direct text detection, the scene text erasing process yields a drastic decrease in precision, recall and f-score, which demonstrates the effectiveness of the proposed method for erasing text in natural scene images.
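The evaluation described here compares detection scores before and after erasing. A minimal sketch of the three scores, computed from detection counts, is shown below. The counts are hypothetical, and the rule for matching detections to ground-truth text regions (defined by the benchmark protocol) is not reproduced.

```python
def detection_scores(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f_score) for a text detector."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_score


# Hypothetical counts: the detector run on original images vs. erased images.
before = detection_scores(true_positives=80, false_positives=20, false_negatives=20)
after = detection_scores(true_positives=10, false_positives=15, false_negatives=90)

print("original images:", before)
print("erased images:  ", after)
```

Under this evaluation, lower scores on the erased images indicate better erasing, since the detector can no longer find the text.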

arXiv.org e-Print Archive

Crossref

Validation and refinement of allometric equations for roots of northern hardwoods

Authors: Hamburg Steven P.; Vadeboncoeur Matthew A.; Yanai Ruth D.
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 01/09/2007

The allometric equations developed by Whittaker et al. (1974. Ecol. Monogr. 44: 233–252) at the Hubbard Brook Experimental Forest have been used to estimate biomass and productivity in northern hardwood forest systems for over three decades. Few other species-specific allometric estimates of belowground biomass are available because of the difficulty in collecting the data, and such equations are rarely validated. Using previously unpublished data from Whittaker’s sampling effort, we extended the equations to predict the root crown and lateral root components for the three dominant species of the northern hardwood forest: American beech (Fagus grandifolia Ehrh.), yellow birch (Betula alleghaniensis Britt), and sugar maple (Acer saccharum Marsh.). We also refined the allometric models by eliminating the use of very small trees for which the original data were unreliable. We validated these new models of the relationship of tree diameter to the mass of root crowns and lateral roots using root mass data collected from 12 northern hardwood stands of varying age in central New Hampshire. These models provide accurate estimates of lateral roots (diameter) in northern hardwood stands >20 years old (mean error 24%–32%). For the younger stands that we studied, allometric equations substantially underestimated observed root biomass (mean error >60%), presumably due to remnant mature root systems from harvested trees supporting young root-sprouted trees.
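To make the idea of an allometric model relating tree diameter to root mass concrete, the sketch below fits a log-log linear form, log10(mass) = a + b * log10(diameter), by ordinary least squares. The abstract does not state the functional form or how "mean error" is defined, so both are assumptions, and the diameters and masses are synthetic values, not data from the study.

```python
import math


def fit_allometric(diameters_cm, masses_kg):
    """Least-squares fit of log10(mass) = a + b * log10(diameter)."""
    xs = [math.log10(d) for d in diameters_cm]
    ys = [math.log10(m) for m in masses_kg]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_mass(a, b, diameter_cm):
    """Back-transform the log-log model to a mass prediction."""
    return 10 ** (a + b * math.log10(diameter_cm))


def mean_percent_error(predicted, observed):
    """Mean absolute error as a percentage of the observed values (assumed definition)."""
    return 100 * sum(abs(p - o) / o for p, o in zip(predicted, observed)) / len(observed)


# Synthetic trees following mass = 0.02 * diameter ** 2.5 exactly.
diameters = [5.0, 10.0, 20.0, 40.0]
masses = [0.02 * d ** 2.5 for d in diameters]

a, b = fit_allometric(diameters, masses)
predictions = [predict_mass(a, b, d) for d in diameters]
print("intercept a:", a, "slope b:", b)
print("mean percent error:", mean_percent_error(predictions, masses))
```

Validating against an independent set of stands, as the abstract describes, would mean computing `mean_percent_error` on trees that were not used in the fit.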

UNH Scholars' Repository

Privacy and Fairness in Recommender Systems via Adversarial Training of User Representations

Authors: Elazar Yanai; Resheff Yehezkel S.; Shahar Moni; Shalom Oren Sar
Publication date: 18/12/2018

Latent factor models for recommender systems represent users and items as low-dimensional vectors. Privacy risks of such systems have previously been studied mostly in the context of recovery of personal information in the form of usage records from the training data. However, the user representations themselves may be used together with external data to recover private user information such as gender and age. In this paper we show that user vectors calculated by a common recommender system can be exploited in this way. We propose the privacy-adversarial framework to eliminate such leakage of private information, and study the trade-off between recommender performance and leakage both theoretically and empirically using a benchmark dataset. An advantage of the proposed method is that it also helps guarantee fairness of results, since all implicit knowledge of a set of attributes is scrubbed from the representations used by the model, and thus can't enter into the decision making. We discuss further applications of this method towards the generation of deeper and more insightful recommendations.

Comment: International Conference on Pattern Recognition and Method

arXiv.org e-Print Archive

Crossref