Exploring knowledge bases for engineering a user interests hierarchy for social network applications
Master of Science, Department of Computing and Information Sciences. Doina Caragea, Gurdip Singh.
In recent years, social networks have become an integral part of our lives. Their growth has created opportunities for interesting data mining problems, such as interest or friendship recommendations. A global ontology over the interests specified by the users of a social network is essential for accurate recommendations. The focus of this work is on engineering such an interest ontology. In particular, given that the resulting ontology is meant to be used for data mining applications to social network problems, we explore only hierarchical ontologies. We propose, evaluate and compare three approaches to engineer an interest hierarchy. The proposed approaches make use of two popular knowledge bases, Wikipedia and Directory Mozilla, to extract interest definitions and/or relationships between interests. More precisely, the first approach uses Wikipedia to find interest definitions, the latent semantic analysis technique to measure the similarity between interests based on their definitions, and an agglomerative clustering algorithm to group similar interests into higher-level concepts. The second approach uses the Wikipedia Category Graph to extract relationships between interests. Similarly, the third approach uses Directory Mozilla to extract relationships between interests. Our results indicate that the third approach, although the simplest, is the most effective for building an ontology over user interests. We use the ontology produced by the third approach to construct interest-based features. These features are further used to learn classifiers for the friendship prediction task. The results show that the ontology is useful compared with the results obtained in the absence of the ontology.
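The first approach in the abstract above (definitions, then similarity, then agglomerative clustering) can be illustrated with a minimal sketch. It is not the thesis implementation: plain bag-of-words cosine similarity stands in for latent semantic analysis, the average-linkage rule and the `min_similarity` stopping threshold are assumptions, and the four one-line "definitions" are made up for illustration rather than taken from Wikipedia.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(definitions, min_similarity=0.2):
    """Average-linkage agglomerative clustering of interests.

    definitions: dict mapping an interest name to its definition text.
    min_similarity: hypothetical stopping threshold (an assumption).
    Returns a list of clusters, each a list of interest names.
    """
    # Bag-of-words vectors; the thesis uses LSA here instead.
    vectors = {name: Counter(text.lower().split())
               for name, text in definitions.items()}
    clusters = [[name] for name in definitions]

    def linkage(c1, c2):
        sims = [cosine(vectors[a], vectors[b]) for a in c1 for b in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best, (i, j) = max(
            ((linkage(clusters[i], clusters[j]), (i, j))
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda item: item[0],
        )
        if best < min_similarity:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up definitions, for illustration only.
toy_definitions = {
    "guitar": "string instrument played in music",
    "piano": "keyboard instrument played in music",
    "soccer": "team sport played with a ball",
    "tennis": "racket sport played with a ball",
}
print(agglomerate(toy_definitions))
```

Recording the order of merges, rather than only the final groups, would yield the tree structure that a hierarchy needs.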
PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature
Scientific information extraction (SciIE), which aims to automatically
extract information from scientific literature, is becoming more important than
ever. However, there are no existing SciIE datasets for polymer materials,
which is an important class of materials used ubiquitously in our daily lives.
To bridge this gap, we introduce POLYIE, a new SciIE dataset for polymer
materials. POLYIE is curated from 146 full-length polymer scholarly articles,
which are annotated with different named entities (i.e., materials, properties,
values, conditions) as well as their N-ary relations by domain experts. POLYIE
presents several unique challenges due to diverse lexical formats of entities,
ambiguity between entities, and variable-length relations. We evaluate
state-of-the-art named entity extraction and relation extraction models on
POLYIE, analyze their strengths and weaknesses, and highlight some difficult
cases for these models. To the best of our knowledge, POLYIE is the first SciIE
benchmark for polymer materials, and we hope it will lead to more research
efforts from the community on this challenging task. Our code and data are
available at: https://github.com/jerry3027/PolyIE
Comment: Work in progress
Recurrent Pixel Embedding for Instance Grouping
We introduce a differentiable, end-to-end trainable framework, consisting of
two novel components, for solving pixel-level grouping problems such as
instance segmentation. First, we regress pixels into a hyper-spherical embedding
space so that pixels from the same group have high cosine similarity while
those from different groups have similarity below a specified margin. We
analyze the choice of embedding dimension and margin, relating them to
theoretical results on the problem of distributing points uniformly on the
sphere. Second, to group instances, we utilize a variant of mean-shift
clustering, implemented as a recurrent neural network parameterized by kernel
bandwidth. This recurrent grouping module is differentiable, enjoys convergent
dynamics and probabilistic interpretability. Backpropagating the group-weighted
loss through this module allows learning to focus on only correcting embedding
errors that won't be resolved during subsequent clustering. Our framework,
while conceptually simple and theoretically abundant, is also practically
effective and computationally efficient. We demonstrate substantial
improvements over state-of-the-art instance segmentation for object proposal
generation, and show the benefits of the grouping loss for
classification tasks such as boundary detection and semantic segmentation.
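The grouping step described in this abstract, mean-shift clustering of unit-length embeddings, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' recurrent module: the kernel `exp(bandwidth * cosine)` is a common choice for points on a sphere and is assumed here, and the function names, the fixed number of steps and the `threshold` used to read off groups are all hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_shift_sphere(points, bandwidth=10.0, steps=10):
    """Run a fixed number of mean-shift updates on the unit sphere.

    Each update replaces a point with the kernel-weighted mean of all
    points, projected back onto the sphere. Unrolling a fixed number of
    updates is what makes a recurrent formulation possible.
    """
    current = [normalize(p) for p in points]
    for _ in range(steps):
        updated = []
        for p in current:
            # Assumed kernel: larger weight for higher cosine similarity.
            weights = [math.exp(bandwidth * dot(p, q)) for q in current]
            mean = [sum(w * q[i] for w, q in zip(weights, current))
                    for i in range(len(p))]
            updated.append(normalize(mean))
        current = updated
    return current


def group_by_mode(points, threshold=0.99):
    """Assign a label per point; points with cosine > threshold share one."""
    modes, labels = [], []
    for p in points:
        for index, mode in enumerate(modes):
            if dot(p, mode) > threshold:
                labels.append(index)
                break
        else:
            modes.append(p)
            labels.append(len(modes) - 1)
    return labels


# Toy 2-D embeddings: three near (1, 0) and three near (0, 1).
embeddings = [[1.0, 0.0], [0.98, 0.2], [0.99, -0.1],
              [0.0, 1.0], [0.1, 0.99], [-0.15, 0.98]]
shifted = mean_shift_sphere(embeddings)
print(group_by_mode(shifted))
```

Every operation in the update is differentiable with respect to the input embeddings, so a loss computed on the shifted points could in principle be backpropagated through the iterations, in line with the abstract's description.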
Self-Supervised Transformer with Domain Adaptive Reconstruction for General Face Forgery Video Detection
Face forgery videos have caused serious public concern, and various
detectors have been proposed recently. However, most of them are trained in a
supervised manner and generalize poorly when detecting videos from
different forgery methods or real source videos. To tackle this issue, we
seek to take full advantage of the difference between real and forged
videos by exploring only the common representation of real face videos. In this
paper, a Self-supervised Transformer cooperating with Contrastive and
Reconstruction learning (CoReST) is proposed, which is first pre-trained only
on real face videos in a self-supervised manner, and then fine-tuned with a linear
head on specific face forgery video datasets. Two auxiliary tasks
incorporating contrastive and reconstruction learning are designed to enhance
the representation learning. Furthermore, a Domain Adaptive Reconstruction
(DAR) module is introduced to bridge the gap between different forgery domains
by reconstructing on unlabeled target videos when fine-tuning. Extensive
experiments on public datasets demonstrate that our proposed method performs
even better than the state-of-the-art supervised competitors with impressive
generalization.
A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects
Recently, Minimum Cost Multicut Formulations have been proposed and proven to
be successful in both motion trajectory segmentation and multi-target tracking
scenarios. Both tasks benefit from decomposing a graphical model into an
optimal number of connected components based on attractive and repulsive
pairwise terms. The two tasks are formulated on different levels of granularity
and, accordingly, leverage mostly local information for motion segmentation and
mostly high-level information for multi-target tracking. In this paper we argue
that point trajectories and their local relationships can contribute to the
high-level task of multi-target tracking and also argue that high-level cues
from object detection and tracking are helpful to solve motion segmentation. We
propose a joint graphical model for point trajectories and object detections
whose Multicuts are solutions to motion segmentation and multi-target
tracking problems at once. Results on the FBMS59 motion segmentation benchmark
as well as on pedestrian tracking sequences from the 2D MOT 2015 benchmark
demonstrate the promise of this joint approach.
User modeling for exploratory search on the Social Web. Exploiting social bookmarking systems for user model extraction, evaluation and integration
Exploratory search is an information seeking strategy that extends beyond the query-and-response paradigm of traditional Information Retrieval models. Users browse through information to discover novel content and to learn more about the newly discovered things. Social bookmarking systems integrate well with exploratory search, because they allow one to search, browse, and filter social bookmarks.
Our contribution is an exploratory tag search engine that merges social bookmarking with exploratory search. For this purpose, we have applied collaborative filtering to recommend tags to users. User models are an important prerequisite for recommender systems. We have produced a method to algorithmically extract user models from folksonomies, and an evaluation method to measure the viability of these user models for exploratory search. According to our evaluation, web-scale user modeling, which integrates user models from various services across the Social Web, can improve exploratory search. Within this thesis we also provide a method for user model integration.
Our exploratory tag search engine implements the findings of our user model extraction, evaluation, and integration methods. It facilitates exploratory search on social bookmarks from Delicious and Connotea and publishes extracted user models as Linked Data.
Automated Georeferencing of Antarctic Species
Many text documents in the biological domain contain references to the toponym of specific phenomena (e.g. species sightings) in natural language form, for example: "In Garwood Valley summer activity was 0.2% for Umbilicaria aprina and 1.7% for Caloplaca sp. ..."
While methods have been developed to extract place names from documents, and attention has been given to the interpretation of spatial prepositions, the ability to connect toponym mentions in text with the phenomena to which they refer (in this case species) has been given limited attention, but would be of considerable benefit for the task of mapping specific phenomena mentioned in text documents.
As part of work to create a pipeline to automate georeferencing of species within legacy documents, this paper proposes a method to: (1) recognise species and toponyms within text and (2) match each species mention to the relevant toponym mention. Our results show significant promise for a bespoke rules- and dictionary-based approach to recognising species within text (F1 scores up to 0.87 including partial matches) but less success, as yet, in recognising toponyms using multiple gazetteers combined with an off-the-shelf natural language processing tool (F1 up to 0.62).
Most importantly, we offer a contribution to the relatively nascent area of matching toponym references to the object they locate (in our case species), including cases in which the toponym and species are in different sentences. We use tree-based models to achieve precision as high as 0.88 or an F1 score up to 0.68 depending on the downsampling rate. Initial results outperform previous research on detecting entity relationships that may cross sentence boundaries within biomedical text, and our work differs from previous work in specifically addressing species mapping.