    Ranking Archived Documents for Structured Queries on Semantic Layers

    Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into usable sources of information. A semantic layer is an RDF graph that describes metadata and semantic information about a collection of archived documents, which in turn can be queried through a semantic query language (SPARQL). This allows running advanced queries by combining metadata of the documents (like publication date) and content-based semantic information (like entities mentioned in the documents). However, the results returned by such structured queries can be numerous and moreover they all equally match the query. In this paper, we deal with this problem and formalize the task of "ranking archived documents for structured queries on semantic layers". Then, we propose two ranking models for the problem at hand which jointly consider: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the temporal relations among the entities. The experimental results on a new evaluation dataset show the effectiveness of the proposed models and allow us to understand their limitation

    Toward a Relation Hierarchy for Information Retrieval

    Natural language text can be seen as a symbolic representation of a cognitive state of the creator that comprises concepts and the relations among those concepts. Much work has been done in Information Science, especially within Information Retrieval (IR), concerning the handling of concepts, most notably in the form of keywords. Much less effort has been spent toward the understanding and handling of the semantic relations that contextually bind concepts together. While it has been shown (Wang, et al., 1985) that the use of these semantic relations for query enhancement can increase retrieval effectiveness, the proper handling of semantic relations has a much wider application than just query enhancement. Once relations inherent in text are identified and captured, they can be used to provide contextual information to the concepts in the representations of the text, which otherwise would be treated as if they were independent and separate

    Learning Analogies and Semantic Relations

    We present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the Scholastic Aptitude Test (SAT). A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D"; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly answers 47% of a collection of 374 college-level analogy questions (random guessing would yield 20% correct). We motivate this research by relating it to work in cognitive science and linguistics, and by applying it to a difficult problem in natural language processing, determining semantic relations in noun-modifier pairs. The problem is to classify a noun-modifier pair, such as "laser printer", according to the semantic relation between the noun (printer) and the modifier (laser). We use a supervised nearest-neighbour algorithm that assigns a class to a given noun-modifier pair by finding the most analogous noun-modifier pair in the training data. With 30 classes of semantic relations, on a collection of 600 labeled noun-modifier pairs, the learning algorithm attains an F value of 26.5% (random guessing: 3.3%). With 5 classes of semantic relations, the F value is 43.2% (random: 20%). The performance is state-of-the-art for these challenging problems
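
A minimal sketch of the selection step described in this abstract, assuming each word pair has already been turned into a numeric feature vector. The abstract does not say how those vectors are built, so the numbers below are made up for illustration, and the helper names `cosine` and `most_analogous` are hypothetical. The sketch only shows the vector-space idea of picking the choice pair whose vector is closest to the stem pair by cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_analogous(stem_vector, choice_vectors):
    """Return the choice pair whose vector is most similar to the stem pair."""
    return max(
        choice_vectors,
        key=lambda pair: cosine(stem_vector, choice_vectors[pair]),
    )


# Hypothetical feature vectors; real vectors would be derived from a text corpus.
stem = [4.0, 1.0, 0.0]  # stands in for mason:stone
choices = {
    "carpenter:wood": [3.0, 1.0, 0.0],
    "teacher:chalk": [1.0, 3.0, 1.0],
    "soldier:gun": [0.0, 1.0, 4.0],
}

best = most_analogous(stem, choices)
print(best)
```

With these illustrative vectors the stem pair is closest to `carpenter:wood`, matching the example analogy quoted in the abstract.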

    Topology Analysis of International Networks Based on Debates in the United Nations

    In complex, high dimensional and unstructured data it is often difficult to extract meaningful patterns. This is especially the case when dealing with textual data. Recent studies in machine learning, information theory and network science have developed several novel instruments to extract the semantics of unstructured data, and harness it to build a network of relations. Such approaches serve as an efficient tool for dimensionality reduction and pattern detection. This paper applies semantic network science to extract ideological proximity in the international arena, by focusing on the data from General Debates in the UN General Assembly on the topics of high salience to the international community. The UN General Debate corpus (UNGDC) covers all high-level debates in the UN General Assembly from 1970 to 2014, covering all UN member states. The research proceeds in three main steps. First, Latent Dirichlet Allocation (LDA) is used to extract the topics of the UN speeches, and therefore semantic information. Each country is then assigned a vector specifying the exposure to each of the topics identified. This intermediate output is then used to construct a network of countries based on information theoretical metrics where the links capture similar vectorial patterns in the topic distributions. The topology of the networks is then analyzed through network properties like density, path length and clustering. Finally, we identify specific topological features of our networks using the map equation framework to detect communities in our networks of countries
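
The network-construction step in this abstract can be sketched roughly as follows. The abstract only says "information theoretical metrics", so the use of Jensen-Shannon divergence here is an assumption, as are the threshold, the placeholder country labels, the topic vectors, and the helper names `jsd`, `build_network` and `density`. The LDA step is not reproduced; the topic distributions are simply given.

```python
import math
from itertools import combinations


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def build_network(topic_vectors, threshold):
    """Link two countries when their topic distributions diverge by at most `threshold`."""
    edges = set()
    for a, b in combinations(sorted(topic_vectors), 2):
        if jsd(topic_vectors[a], topic_vectors[b]) <= threshold:
            edges.add((a, b))
    return edges


def density(n_nodes, edges):
    """Fraction of possible undirected links that are present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0


# Hypothetical per-country topic exposures (each row sums to 1).
vectors = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.6, 0.3, 0.1],
    "C": [0.1, 0.2, 0.7],
}

edges = build_network(vectors, threshold=0.1)
print(edges, density(len(vectors), edges))
```

In this toy input only countries A and B have similar topic distributions, so the network has a single link. Path length, clustering and community detection with the map equation would then be computed on the resulting graph, which this sketch leaves out.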

    Tag-Aware Recommender Systems: A State-of-the-art Survey

    In the past decade, Social Tagging Systems have attracted increasing attention from both physical and computer science communities. Besides the underlying structure and dynamics of tagging systems, many efforts have been devoted to unifying tagging information to reveal user behaviors and preferences, extract the latent semantic relations among items, make recommendations, and so on. Specifically, this article summarizes recent progress on tag-aware recommender systems, emphasizing the contributions from three mainstream perspectives and approaches: network-based methods, tensor-based methods, and topic-based methods. Finally, we outline some other tag-related works and future challenges of tag-aware recommendation algorithms. Comment: 19 pages, 3 figure

    Construction of Geo-Ontology Knowledge Base about Spatial Relations

    The Chinese Academy of Science (CAS); National Natural Science Foundation of China (NSFC); Inst. Geogr. Sci. Nat. Resour. Res. Chin. Acad. Sci. (CAS); Fuzhou University; University of Calabria. The spatial relation analysis, query and reasoning in current geographic information systems usually generalize geographic objects into geometric points, lines and polygons. However, in the real world and in human cognition, geographic objects are not simply geometric objects but spatially distributed objects with geographic semantics. If the geographic entities belong to different types, we may use different words to describe their spatial relationship although their shapes and geometric relationships are exactly the same. To address this phenomenon, this paper analyzes what kinds of semantic information are involved in spatial relationship descriptions and queries. Based on the semantic analysis of geographic relations, an ontological knowledge base is established to store the knowledge of spatial relations between geographic objects. The knowledge base is implemented with Protégé and OWL, and finally is connected to the spatial relation query system.

    Managing corporate memory on the semantic web

    Corporate memory (CM) is the total body of data, information and knowledge required to deliver the strategic aims and objectives of an organization. In the current market, the rapidly increasing volume of unstructured documents in enterprises has brought the challenge of building an autonomic framework to acquire, represent, learn and maintain CM, and efficiently reason from it to aid in knowledge discovery and reuse. The concept of the semantic web is being introduced in enterprises to structure information in a machine-readable way and enhance the understandability of disparate information. Due to the continual popularity of the semantic web, this paper develops a framework for CM management on the semantic web. The proposed approach gleans information from the documents, converts it into a semantic web resource using the Resource Description Framework (RDF) and RDF Schema, and then identifies relations among them using the latent semantic analysis technique. The efficacy of the proposed approach is demonstrated through empirical experiments conducted on two case studies. © 2014 Springer Science+Business Media New York