Ranking Archived Documents for Structured Queries on Semantic Layers
Archived collections of documents (like newspaper and web archives) serve as
important information sources in a variety of disciplines, including Digital
Humanities, Historical Science, and Journalism. However, the absence of
efficient and meaningful exploration methods remains a major hurdle to
turning them into usable sources of information. A semantic layer is
an RDF graph that describes metadata and semantic information about a
collection of archived documents, which in turn can be queried through a
semantic query language (SPARQL). This allows running advanced queries by
combining metadata of the documents (like publication date) and content-based
semantic information (like entities mentioned in the documents). However, the
results returned by such structured queries can be numerous and moreover they
all equally match the query. In this paper, we deal with this problem and
formalize the task of "ranking archived documents for structured queries on
semantic layers". Then, we propose two ranking models for the problem at hand
which jointly consider: i) the relativeness of documents to entities, ii) the
timeliness of documents, and iii) the temporal relations among the entities.
The experimental results on a new evaluation dataset show the effectiveness of
the proposed models and allow us to understand their limitations.
Toward a Relation Hierarchy for Information Retrieval
Natural language text can be seen as a symbolic representation of a cognitive state of the creator that comprises concepts and the relations among those concepts. Much work has been done in Information Science, especially within Information Retrieval (IR), concerning the handling of concepts, most notably in the form of keywords. Much less effort has been spent on understanding and handling the semantic relations that contextually bind concepts together. While it has been shown (Wang et al., 1985) that the use of these semantic relations for query enhancement can increase retrieval effectiveness, the proper handling of semantic relations has a much wider application than just query enhancement. Once relations inherent in text are identified and captured, they can be used to provide contextual information to the concepts in the representations of the text, which otherwise would be treated as if they were independent and separate.
Learning Analogies and Semantic Relations
We present an algorithm for learning from unlabeled text, based on the
Vector Space Model (VSM) of information retrieval, that can solve verbal
analogy questions of the kind found in the Scholastic Aptitude Test (SAT).
A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D";
for example, mason:stone::carpenter:wood. SAT analogy questions provide
a word pair, A:B, and the problem is to select the most analogous word
pair, C:D, from a set of five choices. The VSM algorithm correctly
answers 47% of a collection of 374 college-level analogy questions
(random guessing would yield 20% correct). We motivate this research by
relating it to work in cognitive science and linguistics, and by applying
it to a difficult problem in natural language processing, determining
semantic relations in noun-modifier pairs. The problem is to classify a
noun-modifier pair, such as "laser printer", according to the semantic
relation between the noun (printer) and the modifier (laser). We use a
supervised nearest-neighbour algorithm that assigns a class to a given
noun-modifier pair by finding the most analogous noun-modifier pair in
the training data. With 30 classes of semantic relations, on a collection
of 600 labeled noun-modifier pairs, the learning algorithm attains an F
value of 26.5% (random guessing: 3.3%). With 5 classes of semantic
relations, the F value is 43.2% (random: 20%). The performance is
state-of-the-art for these challenging problems.
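The analogy-selection step described in the abstract above can be illustrated with a minimal Python sketch. It assumes each word pair has already been turned into a numeric vector; the feature dimensions and counts below are invented for illustration and are not taken from the paper. The function names `cosine`, `most_analogous`, and `nearest_neighbour_label` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_analogous(stem_vector, choice_vectors):
    """Return the choice pair whose vector is most similar to the stem pair."""
    return max(choice_vectors, key=lambda k: cosine(stem_vector, choice_vectors[k]))


def nearest_neighbour_label(query_vector, training):
    """Assign the class of the most similar labeled training pair.

    `training` is a list of (vector, label) tuples.
    """
    _, label = max(training, key=lambda item: cosine(query_vector, item[0]))
    return label


# Invented feature counts for illustration only (assumption, not real data).
stem = [8.0, 5.0, 0.0]  # stands in for mason:stone
choices = {
    "carpenter:wood": [7.0, 6.0, 0.0],
    "teacher:chalk": [0.0, 4.0, 0.0],
    "bird:seed": [0.0, 1.0, 9.0],
}
best = most_analogous(stem, choices)

# The same similarity drives a nearest-neighbour classifier over labeled pairs.
# The class names here are hypothetical placeholders.
training = [([9.0, 1.0, 0.0], "class_a"), ([0.0, 2.0, 8.0], "class_b")]
label = nearest_neighbour_label([7.0, 2.0, 1.0], training)
```

With these made-up vectors, the stem pair is closest to `carpenter:wood`. The nearest-neighbour function reuses the same similarity measure, which mirrors how the abstract frames relation classification as finding the most analogous training pair.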
Topology Analysis of International Networks Based on Debates in the United Nations
In complex, high dimensional and unstructured data it is often difficult to
extract meaningful patterns. This is especially the case when dealing with
textual data. Recent studies in machine learning, information theory and
network science have developed several novel instruments to extract the
semantics of unstructured data, and harness it to build a network of relations.
Such approaches serve as an efficient tool for dimensionality reduction and
pattern detection. This paper applies semantic network science to extract
ideological proximity in the international arena, by focusing on the data from
General Debates in the UN General Assembly on topics of high salience to the
international community. The UN General Debate corpus (UNGDC) contains all high-level
debates in the UN General Assembly from 1970 to 2014, covering all UN member
states. The research proceeds in three main steps. First, Latent Dirichlet
Allocation (LDA) is used to extract the topics of the UN speeches, and
therefore semantic information. Each country is then assigned a vector
specifying the exposure to each of the topics identified. This intermediate
output is then used to construct a network of countries based on
information-theoretic metrics, where the links capture similar vectorial patterns in the
topic distributions. The topology of the networks is then analyzed through network
properties like density, path length and clustering. Finally, we identify
specific topological features of our networks using the map equation framework
to detect communities in our networks of countries.
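The network-construction step described in the abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the abstract does not name the information-theoretic metric, so Jensen-Shannon divergence is used here as one plausible choice; the thresholding rule, the country labels, and the topic vectors are all invented; and the function names are hypothetical.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def build_network(topic_vectors, threshold):
    """Link two countries when their topic distributions diverge by at most `threshold`."""
    edges = []
    for a, b in combinations(sorted(topic_vectors), 2):
        d = js_divergence(topic_vectors[a], topic_vectors[b])
        if d <= threshold:
            edges.append((a, b, d))
    return edges


def density(n_nodes, n_edges):
    """Fraction of possible undirected links that are present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible if possible else 0.0


# Made-up topic exposure vectors for three placeholder countries (assumption).
topics = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.6, 0.3, 0.1],
    "C": [0.1, 0.1, 0.8],
}
edges = build_network(topics, threshold=0.05)
net_density = density(len(topics), len(edges))
```

In this toy example only the two countries with similar topic exposure are linked. In practice the topic vectors would come from an LDA model, and community detection would be run on the resulting graph.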
Tag-Aware Recommender Systems: A State-of-the-art Survey
In the past decade, Social Tagging Systems have attracted increasing
attention from both physical and computer science communities. Besides the
underlying structure and dynamics of tagging systems, many efforts have been
devoted to unifying tagging information to reveal user behaviors and
preferences, extract the latent semantic relations among items, make
recommendations, and so on. Specifically, this article summarizes recent
progress on tag-aware recommender systems, emphasizing the contributions
from three mainstream perspectives and approaches: network-based methods,
tensor-based methods, and topic-based methods. Finally, we outline some
other tag-related works and future challenges of tag-aware recommendation
algorithms. Comment: 19 pages, 3 figures
Construction of Geo-Ontology Knowledge Base about Spatial Relations
The Chinese Academy of Science (CAS); National Natural Science Foundation of China (NSFC); Inst. Geogr. Sci. Nat. Resour. Res. Chin. Acad. Sci. (CAS); Fuzhou University; University of Calabria
The spatial relation analysis, query and reasoning in current geographic information systems usually generalize geographic objects into geometric points, lines and polygons. However, in the real world and in human cognition, geographic objects are not simply geometric objects but spatially distributed objects with geographic semantics. If the geographic entities belong to different types, we may use different words to describe their spatial relationship although their shapes and geometric relationships are exactly the same. To address this phenomenon, this paper analyzes what kinds of semantic information are involved in spatial relationship descriptions and queries. Based on the semantic analysis of geographic relations, an ontological knowledge base is established to store the knowledge of spatial relations between geographic objects. The knowledge base is implemented with Protégé and OWL, and is finally connected to the spatial relation query system.
Managing corporate memory on the semantic web
Corporate memory (CM) is the total body of data, information and knowledge required to deliver the strategic aims and objectives of an organization. In the current market, the rapidly increasing volume of unstructured documents in enterprises has brought the challenge of building an autonomic framework to acquire, represent, learn and maintain CM, and to reason from it efficiently to aid knowledge discovery and reuse. The concept of the semantic web is being introduced in enterprises to structure information in a machine-readable way and make disparate information easier to understand. Given the continuing popularity of the semantic web, this paper develops a framework for CM management on the semantic web. The proposed approach gleans information from the documents, converts it into a semantic web resource using the Resource Description Framework (RDF) and RDF Schema, and then identifies relations among the resources using the latent semantic analysis technique. The efficacy of the proposed approach is demonstrated through empirical experiments conducted on two case studies. © 2014 Springer Science+Business Media New York