The state-of-the-art in personalized recommender systems for social networking
With the explosion of Web 2.0 applications such as blogs, social and professional networks, and various other types of social media, rich online information and new sources of knowledge flood users and pose a great challenge in terms of information overload. It is critical to use intelligent agent software systems to assist users in finding the right information from an abundance of Web data. Recommender systems can help users deal with the information overload problem efficiently by suggesting items (e.g., information and products) that match users' personal interests, and the technology has been successfully employed in many applications such as recommending films, music, and books. The purpose of this report is to give an overview of existing technologies for building personalized recommender systems in social networking environments, and to propose a research direction for addressing the user profiling and cold start problems by exploiting user-generated content newly available in Web 2.0.
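As a concrete illustration of the core idea, suggesting items that match a user's personal interests, here is a minimal content-based recommendation sketch. The feature vectors, the toy catalog, and the `cosine`/`recommend` helpers are all illustrative assumptions, not anything described in the report itself.

```python
# A minimal sketch of content-based recommendation, assuming items are
# described by feature vectors (e.g., genre weights or TF-IDF terms) and a
# user profile is the mean of the vectors of items the user liked.
# All data and names here are illustrative, not from the report.
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def recommend(profile, item_vectors, seen, top_k=2):
    """Rank items the user has not seen by similarity to the profile."""
    scored = [(name, cosine(profile, vec))
              for name, vec in item_vectors.items() if name not in seen]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy catalog over three features, e.g., [drama, sci-fi, romance].
items = {
    "film_a": np.array([1.0, 0.0, 0.8]),
    "film_b": np.array([0.0, 1.0, 0.1]),
    "film_c": np.array([0.9, 0.1, 0.7]),
    "film_d": np.array([0.1, 0.9, 0.0]),
}
liked = ["film_a"]
profile = np.mean([items[n] for n in liked], axis=0)
print(recommend(profile, items, seen=set(liked)))  # film_c ranks first
```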
Open-World Knowledge Graph Completion
Knowledge Graphs (KGs) have been applied to many tasks including Web search,
link prediction, recommendation, natural language processing, and entity
linking. However, most KGs are far from complete and are growing at a rapid
pace. To address these problems, Knowledge Graph Completion (KGC) has been
proposed to improve KGs by filling in their missing connections. Unlike existing methods, which hold a closed-world assumption (i.e., KGs are fixed and new entities cannot easily be added), in the present work we relax this assumption and propose a new open-world KGC task. As a first attempt to solve this task, we
introduce an open-world KGC model called ConMask. This model learns embeddings
of the entity's name and parts of its text-description to connect unseen
entities to the KG. To mitigate the presence of noisy text descriptions,
ConMask uses relationship-dependent content masking to extract relevant
snippets and then trains a fully convolutional neural network to fuse the
extracted snippets with entities in the KG. Experiments on large data sets,
both old and new, show that ConMask performs well in the open-world KGC task
and even outperforms existing KGC models on the standard closed-world KGC task.
Comment: 8 pages, accepted to AAAI 2018
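To make the masking idea concrete, below is a toy sketch of relationship-dependent content masking: keep only the description tokens near words from the relationship name and mask the rest. The exact-match tokenization, window size, and example strings are simplified assumptions; ConMask scores relevance with learned similarity and then fuses the extracted snippets with a fully convolutional network, which this sketch omits.

```python
# A toy sketch of relationship-dependent content masking: keep description
# tokens within a small window of any word from the relationship name and
# mask everything else. Exact-match tokenization, the window size, and the
# example strings are simplified assumptions; the paper's model uses learned
# similarity rather than string matching.
def mask_description(description, relation, window=2, mask_token="_"):
    tokens = description.lower().split()
    rel_words = set(relation.lower().replace("_", " ").split())
    hits = [i for i, tok in enumerate(tokens) if tok in rel_words]
    keep = set()
    for i in hits:  # widen each hit into a window of surrounding context
        keep.update(range(max(0, i - window), min(len(tokens), i + window + 1)))
    return " ".join(tok if i in keep else mask_token
                    for i, tok in enumerate(tokens))

print(mask_description(
    "a biography gives the place of birth as syracuse new york",
    "place_of_birth",
))
# -> "_ _ gives the place of birth as syracuse _ _"
```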
From Data Fusion to Knowledge Fusion
The task of data fusion is to identify the true values of data items (e.g., the true date of birth of Tom Cruise) among multiple observed values drawn from different sources (e.g., Web sites) of varying (and unknown) reliability. A recent survey [LDL+12] has provided a detailed comparison of
various fusion methods on Deep Web data. In this paper, we study the
applicability and limitations of different fusion techniques on a more
challenging problem: knowledge fusion. Knowledge fusion identifies true
subject-predicate-object triples extracted by multiple information extractors
from multiple information sources. These extractors perform the tasks of entity
linkage and schema alignment, thus introducing an additional source of noise
that is quite different from that traditionally considered in the data fusion
literature, which only focuses on factual errors in the original sources. We
adapt state-of-the-art data fusion techniques and apply them to a knowledge
base with 1.6B unique knowledge triples extracted by 12 extractors from over 1B
Web pages, which is three orders of magnitude larger than the data sets used in
previous data fusion papers. We show that the data fusion approaches hold great promise for solving the knowledge fusion problem, and we suggest interesting research directions through a detailed error analysis of the methods.
Comment: VLDB'2014
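The flavor of the fusion techniques studied can be seen in a minimal truth-discovery loop, where source trust and value confidence are estimated jointly. The observations, initialization, and update rules below are simplified assumptions in the spirit of weighted voting, not the paper's exact adapted methods.

```python
# A minimal sketch of iterative truth discovery in the spirit of the
# weighted-voting fusion methods the paper studies. The observations,
# initialization, and update rules are simplified assumptions.
from collections import defaultdict

# (data item, claimed value, source) observations from conflicting sources.
observations = [
    ("tom_cruise.birth_date", "1962-07-03", "siteA"),
    ("tom_cruise.birth_date", "1962-07-03", "siteB"),
    ("tom_cruise.birth_date", "1963-07-03", "siteC"),
    ("nyc.population", "8.4M", "siteA"),
    ("nyc.population", "8.1M", "siteC"),
]

trust = {src: 0.5 for _, _, src in observations}  # start with uniform trust
for _ in range(10):  # a few rounds are enough on toy data
    # Value confidence: total trust of the sources claiming that value.
    confidence = defaultdict(float)
    for item, value, src in observations:
        confidence[(item, value)] += trust[src]
    # Consensus: the highest-confidence value for each data item.
    best = {}
    for (item, value), c in confidence.items():
        if item not in best or c > confidence[(item, best[item])]:
            best[item] = value
    # Source trust: fraction of a source's claims matching the consensus.
    tally = defaultdict(lambda: [0, 0])
    for item, value, src in observations:
        tally[src][0] += value == best[item]
        tally[src][1] += 1
    trust = {src: hits / total for src, (hits, total) in tally.items()}

print(best)   # consensus values per item
print(trust)  # estimated source reliabilities
```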
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.
Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 2012