PRIME: A System for Multi-lingual Patent Retrieval
Given the growing number of patents filed in multiple countries, users are
interested in retrieving patents across languages. We propose a multi-lingual
patent retrieval system, which translates a user query into the target
language, searches a multilingual database for patents relevant to the query,
and improves the browsing efficiency by way of machine translation and
clustering. Our system also extracts new translations from patent families
consisting of comparable patents, to enhance the translation dictionary.
Mapping Science Based on Research Content Similarity
Maps of science representing the structure of science help us understand science and technology development. Thus, research in scientometrics has developed techniques for analyzing research activities and for measuring their relationships; however, navigating the recent scientific landscape is still challenging, since conventional inter-citation and co-citation analysis is difficult to apply to recently published articles and ongoing projects. Therefore, to characterize what is being attempted in the current scientific landscape, this article proposes a content-based method of locating research articles/projects in a multi-dimensional space using word/paragraph embedding. Specifically, for addressing an unclustered problem, we introduced cluster vectors based on the information entropies of technical concepts. The experimental results showed that our method formed a clustered map from approx. 300 k IEEE articles and NSF projects from 2012 to 2016. Finally, we confirmed that formation of specific research areas can be captured as changes in the network structure.
A survey on thesauri application in automatic natural language processing
This paper investigates the efficiency of thesaurus use in popular natural language processing (NLP) fields: information retrieval and the analysis of texts and subject areas. A thesaurus is a natural language resource that models a subject area and can reflect a human expert's knowledge in many NLP tasks. The main goal of this survey is to determine how much thesauri affect processing quality and where they can provide better performance. We describe studies that use different types of thesauri, discuss the contribution of the thesaurus to the achieved results, and propose directions for future research in the thesaurus field.
A Survey of Multilingual Text Retrieval
This report reviews the present state of the art
in selection of texts in one language based on queries in another, a
problem we refer to as "multilingual" text retrieval. Present
applications of multilingual text retrieval systems are limited by the
cost and complexity of developing and using the multilingual thesauri
on which they are based and by the level of user training that is
required to achieve satisfactory search effectiveness. A general
model for multilingual text retrieval is used to review the
development of the field and to describe modern production and
experimental systems. The report concludes with some observations on
the present state of the art and an extensive bibliography of the
technical literature on multilingual text retrieval. The research
reported herein was supported, in part, by Army Research
Office contract DAAL03-91-C-0034 through Battelle Corporation, NSF NYI
IRI-9357731, Alfred P. Sloan Research Fellow Award BR3336, and a
General Research Board Semester Award.
(Also cross-referenced as UMIACS-TR-96-19)
Theory and Applications for Advanced Text Mining
Owing to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and it is reasonable to believe that these data contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract this knowledge. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to techniques for under-resourced or less-resourced languages. I believe that this book will bring new knowledge to the text mining field and help many readers open up new research fields.