    An open database of productivity in Vietnam's social sciences and humanities for public use

    This study describes an open database of the scientific output of Vietnamese researchers in the social sciences and humanities, built to correct shortcomings of existing research publication databases such as duplicated records, slow updates, and the substantial cost of doing science. Using scientists' self-reports and open online sources, cross-checked against the Scopus database, we introduce a manual system, and a semi-automated version of it, that profiles 657 Vietnamese social sciences and humanities researchers who published in Scopus-indexed journals from 2008 to 2018. The final system also records 973 foreign co-authors, 1,289 papers, and 789 affiliations. The data collection method is readily applicable to other sources and could be replicated in other developing countries, while the content can be used in cross-sectional, multivariate, and network analyses. The open database is expected to help Vietnam revamp its research capacity and meet the public demand for greater transparency in science management.
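    The abstract highlights correcting data duplication via cross-checks against the Scopus database. As a rough illustration of that kind of cleaning step (the authors' actual pipeline is not detailed in the abstract), the sketch below deduplicates publication records by DOI, falling back to a normalized title; the record layout and the field names "doi" and "title" are hypothetical.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for fuzzy matching."""
    stripped = re.sub(r"[^\w\s]", "", title.lower())
    return re.sub(r"\s+", " ", stripped).strip()

def deduplicate(records):
    """Keep one record per publication, keyed by DOI when present,
    otherwise by normalized title."""
    seen = set()
    unique = []
    for rec in records:
        key = rec.get("doi") or normalize_title(rec.get("title", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"doi": None, "title": "Open Science in Vietnam"},
    {"doi": None, "title": "Open  science in Vietnam!"},  # same paper, noisy metadata
]
print(len(deduplicate(records)))  # -> 1
```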

    Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon

    This paper proposes to advance the state of the art in automatic Language Resource (LR) building by bringing together three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available through the collaborative paradigm that emerged with the Web 2.0, and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish, and Parole-Simple-Clips for Italian) is extended with Named Entities (NEs) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses interoperability, a problem that currently affects the whole field of Computational Linguistics, by using the ISO LMF standard to encode the lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification, and postprocessing) are explained and evaluated in detail. The resulting resource contains 974,567, 137,583, and 125,806 NEs for English, Spanish, and Italian, respectively. Finally, to check the usefulness of the constructed resource, we integrate it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system's accuracy by 28.1%. Compared with previous approaches to building NE repositories, the current proposal represents a step forward in terms of automation, language independence, number of NEs acquired, and richness of the information represented.
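    Since the lexicon is encoded with the ISO LMF standard, a minimal sketch of one entry may help. The snippet below emits an LMF-style LexicalEntry in the feat-based notation used by Wordnet-LMF, linking a named entity's written form to an external WordNet synset; the exact element layout and the synset identifier are illustrative assumptions, not the paper's actual encoding.

```python
import xml.etree.ElementTree as ET

def lmf_entry(written_form: str, synset_ref: str) -> ET.Element:
    """Build a minimal LMF-style LexicalEntry recording a named entity's
    written form and a pointer to an external synset."""
    entry = ET.Element("LexicalEntry")
    lemma = ET.SubElement(entry, "Lemma")
    ET.SubElement(lemma, "feat", att="writtenForm", val=written_form)
    sense = ET.SubElement(entry, "Sense")
    ET.SubElement(sense, "MonolingualExternalRef",
                  externalSystem="WordNet", externalReference=synset_ref)
    return entry

lexicon = ET.Element("Lexicon")
ET.SubElement(lexicon, "feat", att="language", val="en")
lexicon.append(lmf_entry("Rome", "eng-30-08775439-n"))  # synset id is illustrative
print(ET.tostring(lexicon, encoding="unicode"))
```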

    Aligning archive maps and extracting footprints for analysis of historic urban environments.

    Archive cartography and archaeologists' sketches are invaluable resources when analysing a historic town or city. A virtual reconstruction of a city lets the user navigate and explore an environment which no longer exists, providing better insight into its design and purpose. However, reconstructing the city from maps depicting features such as building footprints and roads can be labour intensive. In this paper we present techniques to aid the semi-automatic extraction of building footprints from digital images of archive maps and sketches. Archive maps often suffer from inaccuracies and inconsistencies in scale, which can lead to incorrect reconstructions; aligning archive maps to accurate modern vector data reduces these problems. Furthermore, the efficiency of the footprint extraction methods may be improved by aligning either modern vector data or previously extracted footprints, since common elements can be identified between maps of differing time periods and only the differences between the two need to be extracted. An evaluation of two alignment approaches is presented: one using a single linear affine transformation and one using a set of piecewise linear affine transformations.
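    For the first alignment approach, a single 2-D affine transformation can be estimated from matched control points by least squares. A minimal NumPy sketch, with invented control-point coordinates:

```python
import numpy as np

def fit_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares fit of a 2-D affine transform mapping src to dst.

    src, dst: (N, 2) arrays of matched control points, N >= 3.
    Returns a 2x3 matrix A such that dst ~= [x, y, 1] @ A.T.
    """
    ones = np.ones((src.shape[0], 1))
    X = np.hstack([src, ones])              # homogeneous source coordinates
    B, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return B.T

# Invented control points: archive-map pixels -> modern map coordinates.
src = np.array([[10.0, 12.0], [52.0, 14.0], [30.0, 44.0], [48.0, 40.0]])
dst = np.array([[112.2, 63.9], [158.6, 64.2], [137.4, 101.3], [156.8, 95.6]])
A = fit_affine(src, dst)
aligned = np.hstack([src, np.ones((4, 1))]) @ A.T
print(np.abs(aligned - dst).max())          # ~0: these points are exactly affine
```

    The piecewise variant would triangulate the control points and fit a separate affine transform per triangle, so local distortions in the archive map can be corrected independently.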

    A User-Centered Concept Mining System for Query and Document Understanding at Tencent

    Concepts embody the knowledge of the world and facilitate the cognitive processes of human beings. Mining concepts from web documents and constructing the corresponding taxonomy are core research problems in text understanding and support many downstream tasks such as query analysis, knowledge base construction, recommendation, and search. However, we argue that most prior studies extract formal and overly general concepts from Wikipedia or static web pages, which do not represent the user perspective. In this paper, we describe our experience of implementing and deploying ConcepT in Tencent QQ Browser. It discovers user-centered concepts at the right granularity, conforming to user interests, by mining a large volume of user queries and interactive search click logs. The extracted concepts have the proper granularity, are consistent with user language styles, and are dynamically updated. We further present our techniques for tagging documents with user-centered concepts and for constructing a topic-concept-instance taxonomy, which has helped to improve both search and news feed recommendation in Tencent QQ Browser. Extensive offline evaluation demonstrates that our approach extracts concepts of higher quality than several existing methods, and results from online A/B testing with a large number of real users suggest that the Impression Efficiency of feeds users increased by 6.01% after incorporating the user-centered concepts into the recommendation framework.
    Comment: Accepted by KDD 2019
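    The abstract describes the document-tagging stage only at a high level, so the sketch below is a simple stand-in rather than ConcepT's actual matcher: it tags a document with any concept whose associated key terms occur often enough. The concept vocabulary, terms, and threshold are all invented for illustration.

```python
def tag_document(text: str, concepts: dict[str, set[str]], min_hits: int = 2) -> list[str]:
    """Tag a document with every concept whose key terms appear
    at least min_hits times in total (a crude relevance score)."""
    lowered = text.lower()
    tags = []
    for concept, terms in concepts.items():
        hits = sum(lowered.count(term) for term in terms)
        if hits >= min_hits:
            tags.append(concept)
    return tags

# Hypothetical concept vocabulary: concept -> key instance terms.
CONCEPTS = {
    "budget smartphones": {"redmi", "budget phone", "cheap android"},
    "electric vehicles": {"tesla", "charging station", "range anxiety"},
}

doc = "The new Redmi is the best budget phone for most buyers this year."
print(tag_document(doc, CONCEPTS))  # -> ['budget smartphones']
```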