17,113 research outputs found
TopExNet: Entity-Centric Network Topic Exploration in News Streams
The recent introduction of entity-centric implicit network representations of
unstructured text offers novel ways for exploring entity relations in document
collections and streams efficiently and interactively. Here, we present
TopExNet as a tool for exploring entity-centric network topics in streams of
news articles. The application is available as a web service at
https://topexnet.ifi.uni-heidelberg.de/ .Comment: Published in Proceedings of the Twelfth ACM International Conference
on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February
11-15, 201
The HyperBagGraph DataEdron: An Enriched Browsing Experience of Multimedia Datasets
Traditional verbatim browsers give back information in a linear way according
to a ranking performed by a search engine that may not be optimal for the
surfer. The latter may need to assess the pertinence of the information
retrieved, particularly when she wants to explore other facets of a
multi-facetted information space. For instance, in a multimedia dataset
different facets such as keywords, authors, publication category, organisations
and figures can be of interest. The facet simultaneous visualisation can help
to gain insights on the information retrieved and call for further searches.
Facets are co-occurence networks, modeled by HyperBag-Graphs -- families of
multisets -- and are in fact linked not only to the publication itself, but to
any chosen reference. These references allow to navigate inside the dataset and
perform visual queries. We explore here the case of scientific publications
based on Arxiv searches.Comment: Extension of the hypergraph framework shortly presented in
arXiv:1809.00164 (possible small overlaps); use the theoretical framework of
hb-graphs presented in arXiv:1809.0019
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular stand out success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition for utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex,
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
A User-Centered Concept Mining System for Query and Document Understanding at Tencent
Concepts embody the knowledge of the world and facilitate the cognitive
processes of human beings. Mining concepts from web documents and constructing
the corresponding taxonomy are core research problems in text understanding and
support many downstream tasks such as query analysis, knowledge base
construction, recommendation, and search. However, we argue that most prior
studies extract formal and overly general concepts from Wikipedia or static web
pages, which are not representing the user perspective. In this paper, we
describe our experience of implementing and deploying ConcepT in Tencent QQ
Browser. It discovers user-centered concepts at the right granularity
conforming to user interests, by mining a large amount of user queries and
interactive search click logs. The extracted concepts have the proper
granularity, are consistent with user language styles and are dynamically
updated. We further present our techniques to tag documents with user-centered
concepts and to construct a topic-concept-instance taxonomy, which has helped
to improve search as well as news feeds recommendation in Tencent QQ Browser.
We performed extensive offline evaluation to demonstrate that our approach
could extract concepts of higher quality compared to several other existing
methods. Our system has been deployed in Tencent QQ Browser. Results from
online A/B testing involving a large number of real users suggest that the
Impression Efficiency of feeds users increased by 6.01% after incorporating the
user-centered concepts into the recommendation framework of Tencent QQ Browser.Comment: Accepted by KDD 201
Weaving Entities into Relations: From Page Retrieval to Relation Mining on the Web
With its sheer amount of information, the Web is clearly an important frontier for data mining. While Web mining must start with content on the Web, there is no effective ``search-based'' mechanism to help sifting through the information on the Web. Our goal is to provide a such online search-based facility for supporting query primitives, upon which Web mining applications can be built. As a first step, this paper aims at entity-relation discovery, or E-R discovery, as a useful function-- to weave scattered entities on the Web into coherent relations. To begin with, as our proposal, we formalize the concept of E-R discovery. Further, to realize E-R discovery, as our main thesis, we abstract tuple ranking-- the essential challenge of E-R discovery-- as pattern-based cooccurrence analysis. Finally, as our key insight, we observe that such relation mining shares the same core functions as traditional page-retrieval systems, which enables us to build the new E-R discovery upon today's search engines, almost for free. We report our system prototype and testbed, WISDM-ER, with real Web corpus. Our case studies have demonstrated a high promise, achieving 83%-91% accuracy for real benchmark queries-- and thus the real possibilities of enabling ad-hoc Web mining tasks with online E-R discovery
Ontology-assisted database integration to support natural language processing and biomedical data-mining
Successful biomedical data mining and information extraction require a complete picture of biological phenomena such as genes, biological processes, and diseases; as these exist on different levels of granularity. To realize this goal, several freely available heterogeneous databases as well as proprietary structured datasets have to be integrated into a single global customizable scheme. We will present a tool to integrate different biological data sources by mapping them to a proprietary biomedical ontology that has been developed for the purposes of making computers understand medical natural language
- …