Measuring Societal Biases in Text Corpora via First-Order Co-occurrence
Text corpora are used to study societal biases, typically through statistical
models such as word embeddings. The bias of a word towards a concept is
usually estimated using vector similarity, which measures whether the word and
the concept words share other words in their contexts. We argue that this
second-order relationship introduces unrelated concepts into the measure, which
makes the bias measurement imprecise. We instead propose to measure bias
using the direct normalized co-occurrence associations between the word and the
representative concept words, a first-order measure, obtained by reconstructing the
co-occurrence estimates inherent in word embedding models. To evaluate this
novel corpus bias measurement method, we calculate the correlation between the
gender bias values estimated from the text and the actual gender bias statistics
of the U.S. job market, provided by two recent collections. The results show a
consistently higher correlation when using the proposed first-order measure
with a variety of word embedding models, as well as a more severe degree of
bias, especially towards women in a few specific occupations.
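As an illustration of what a first-order measure looks like, the sketch below scores a word by its mean association with female words minus its mean association with male words, reading the associations directly from a co-occurrence matrix rather than comparing embedding vectors. It is a minimal sketch under stated assumptions, not the authors' method: positive PMI (PPMI) is assumed as the normalized co-occurrence association, the association is computed from raw counts instead of being reconstructed from a trained embedding model, and the vocabulary, counts, and "female share" statistics are made-up toy values. The helper names `ppmi` and `first_order_bias` are hypothetical.

```python
import numpy as np


def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive pointwise mutual information for a word-by-word count matrix.

    Assumption: PPMI stands in for the paper's normalized co-occurrence association.
    """
    p_wc = counts / counts.sum()
    p_w = p_wc.sum(axis=1, keepdims=True)  # marginal probability of the target word
    p_c = p_wc.sum(axis=0, keepdims=True)  # marginal probability of the context word
    expected = p_w * p_c  # co-occurrence probability if the words were independent
    pmi = np.zeros_like(p_wc)
    seen = counts > 0  # unseen pairs keep PMI 0 instead of log(0) = -inf
    pmi[seen] = np.log(p_wc[seen] / expected[seen])
    return np.maximum(pmi, 0.0)


def first_order_bias(assoc, vocab, word, female, male):
    """Hypothetical helper: mean association with female words minus male words."""
    idx = {w: i for i, w in enumerate(vocab)}
    row = assoc[idx[word]]
    f = np.mean([row[idx[w]] for w in female])
    m = np.mean([row[idx[w]] for w in male])
    return float(f - m)


# Toy symmetric co-occurrence counts (made-up values, for illustration only).
vocab = ["nurse", "engineer", "she", "he"]
counts = np.array(
    [
        [0, 1, 8, 2],
        [1, 0, 2, 8],
        [8, 2, 0, 5],
        [2, 8, 5, 0],
    ],
    dtype=float,
)

assoc = ppmi(counts)
biases = [first_order_bias(assoc, vocab, occ, ["she"], ["he"]) for occ in ("nurse", "engineer")]

# Hypothetical share of women in each occupation, standing in for job-market statistics.
female_share = [0.9, 0.1]
r = np.corrcoef(biases, female_share)[0, 1]  # Pearson correlation between text bias and statistics
print(biases, r)
```

A positive score means the word co-occurs more strongly with the female concept words. The final line mirrors the evaluation described above, correlating text-derived bias with occupation statistics; with only two toy occupations the correlation is trivially 1, so a real evaluation would need many occupations.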
Morphological Segmentation on Learned Boundaries
Colour information is usually not enough to segment natural complex scenes. Texture contains relevant information that segmentation approaches should consider. Martin et al. [Learning to detect natural image boundaries using local brightness, color, and texture cues, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (5) (2004) 530-549] proposed a particularly interesting colour-texture gradient. This gradient is not suitable for Watershed-based approaches because it contains gaps. In this paper, we propose a method based on the distance function to fill these gaps. Then, two hierarchical Watershed-based approaches, the Watershed using volume extinction values and the Waterfall, are used to segment natural complex scenes. The resulting segmentations are thoroughly evaluated and compared to segmentations produced by the Normalised Cuts algorithm using the Berkeley segmentation dataset and benchmark. Evaluations based on both area overlap and boundary agreement with manual segmentations are performed.