1,187 research outputs found

    Representing aggregate works in the digital library

    This paper studies the challenge of representing aggregate works such as encyclopedias, collected poems and journals in heterogeneous digital library collections. Reflecting on the materials used by humanities academics, we demonstrate the varied range of aggregate types and the problems of faithfully representing them in the DL interface. Aggregates are complex and pervasive; they challenge common assumptions and blur boundaries within organisational structures. Existing DL systems can provide only an imperfect representation of aggregates, and alterations to document encoding are insufficient to create a faithful reproduction of the physical library. The challenge is illustrated through concrete examples, and solutions are demonstrated in a well-known DL system and related to standard DL architecture.

    Information Extraction from Heterogeneous WWW Resources

    The information available on the WWW is growing very fast. However, a fundamental problem with this information is its lack of structure, which makes its exploitation very difficult. As a result, the desired information is becoming more difficult to retrieve and extract. To overcome this problem, many tools and techniques are being developed for locating the web pages of interest and extracting the desired information from those pages. In this paper we present the first prototype of an Information Extraction (IE) system that attempts to extract information on different Computer Science related courses offered by British universities.
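The abstract above does not describe the system's internals, but early web IE tools of this kind typically relied on hand-written wrapper rules. A minimal sketch of that technique, with an entirely made-up page fragment and CSS class name:

```python
import re

# Hypothetical example page fragment (not from the paper): a course listing
# with a consistent markup pattern that a wrapper rule can exploit.
page = """
<ul>
  <li class="course">BSc Computer Science</li>
  <li class="course">MSc Artificial Intelligence</li>
</ul>
"""

# A hand-written extraction rule: capture the text of each course item.
courses = re.findall(r'<li class="course">([^<]+)</li>', page)
print(courses)  # ['BSc Computer Science', 'MSc Artificial Intelligence']
```

Such rules are brittle: they break whenever the page layout changes, which is one reason IE research moved toward learned extractors.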

    Optimization of the search engine ElasticSearch

    This thesis presents the work done in the Search on Demand team at Orange: the optimization of the Elasticsearch search engine, the ways to ingest data into it by means of an ETL, and how relevance can be tuned using Lucene's inverted indices.
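The thesis itself is not quoted here, but the kind of relevance tuning it mentions is commonly expressed through Elasticsearch's query DSL. A hedged sketch, with illustrative field names ("title", "body") that are not taken from the thesis:

```python
# A multi_match query body that boosts matches in the title field over the
# body field; "^3" is Elasticsearch's field-boost syntax. This dict would be
# sent as the JSON body of a search request.
query = {
    "query": {
        "multi_match": {
            "query": "search engine optimization",
            "fields": ["title^3", "body"],  # title matches score 3x higher
        }
    }
}
```

Boosting at query time like this changes ranking without reindexing, which is why it is a typical first step when tuning relevance.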

    Visualizing and Interacting with Concept Hierarchies

    Concept Hierarchies and Formal Concept Analysis are theoretically well-grounded and extensively tested methods. They rely on line diagrams called Galois lattices for visualizing and analysing object-attribute sets. Galois lattices are visually appealing and conceptually rich for experts. However, they have important drawbacks due to their concept-oriented overall structure: analysing what they show is difficult for non-experts, navigation is cumbersome, interaction is poor, and scalability is a deep bottleneck for visual interpretation even for experts. In this paper we introduce semantic probes as a means to overcome many of these problems and extend the usability and application possibilities of traditional FCA visualization methods. Semantic probes are visual, user-centred objects which extract and organize reduced Galois sub-hierarchies. They are simpler and clearer, and they provide better navigation support through a rich set of interaction possibilities. Since probe-driven sub-hierarchies are limited to the user's focus, scalability is under control and interpretation is facilitated. After some successful experiments, several applications are being developed; the remaining problem is finding a compromise between simplicity and conceptual expressivity.
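To make the underlying FCA machinery concrete (this is standard FCA, not the paper's probe mechanism), a brute-force enumeration of the formal concepts of a tiny, made-up object-attribute context:

```python
from itertools import combinations

# A toy object-attribute context (entirely illustrative).
context = {
    "duck":  {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}
attributes = set().union(*context.values())

def extent(attrs):
    """Objects that have every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def intent(objs):
    """Attributes shared by every object in objs."""
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

# Each formal concept is a pair (extent, intent) closed under derivation;
# these pairs, ordered by extent inclusion, form the Galois lattice.
concepts = set()
for r in range(len(attributes) + 1):
    for attrs in combinations(sorted(attributes), r):
        objs = extent(set(attrs))
        concepts.add((frozenset(objs), frozenset(intent(objs))))
print(len(concepts))  # 8 concepts in this toy lattice
```

Even this three-object context yields eight concepts, which illustrates the scalability problem the paper addresses: lattice size grows quickly with the context, so visualizations restricted to a user's focus (sub-hierarchies) become necessary.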

    Ontology lexicalization: the relationship between content and meaning in the context of Information Retrieval

    This proposal seeks to represent natural language in a form suitable for ontologies and vice versa. To that end, it proposes the semi-automatic creation of a lexical database in Brazilian Portuguese containing morphological, syntactic, and semantic information that can be read by machines, allowing structured and unstructured data to be linked and integrated into an information retrieval model to improve precision. The results obtained demonstrate that the methodology can be used in the financial risk domain in Portuguese for the construction of an ontology, a lexical-semantic database, and a proposed semantic information retrieval model. To evaluate the performance of the proposed model, documents containing the main definitions of the financial risk domain were selected and indexed both with and without semantic annotation. To enable a comparison between the approaches, two indices were created: the first represents the traditional search, and the second was built from the texts with semantic annotations to represent the semantic search. The evaluation of the proposal is based on recall and precision. The queries submitted to the model show that the semantic search outperforms the traditional search and validate the methodology used. Although more complex to build, the procedure can be reproduced in any other domain.
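The recall/precision comparison described above can be sketched in a few lines. The document IDs and result sets below are invented purely to show the metric computation, not taken from the study:

```python
# Hypothetical relevance judgements and two retrieval runs.
relevant = {"d1", "d2", "d3", "d4"}          # documents judged relevant
traditional = {"d1", "d5", "d6"}             # returned by the plain index
semantic = {"d1", "d2", "d3", "d7"}          # returned by the annotated index

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

print(precision_recall(traditional, relevant))  # (0.3333333333333333, 0.25)
print(precision_recall(semantic, relevant))     # (0.75, 0.75)
```

In this toy setup the semantic run dominates on both metrics, mirroring the kind of outcome the abstract reports.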

    Development of a web-based platform for Biomedical Text Mining

    Master's dissertation in Informatics Engineering. Biomedical Text Mining (BTM) seeks to derive high-quality information from literature in the biomedical domain by creating tools and methodologies that can automate time-consuming tasks when searching for new information. This encompasses both Information Retrieval, the discovery and recovery of relevant documents, and Information Extraction, the capability to extract knowledge from text. In recent years, SilicoLife, with the collaboration of the University of Minho, has been developing @Note2, an open-source Java-based multiplatform BTM workbench that includes libraries to perform the main BTM tasks and provides user-friendly interfaces through a stand-alone application. This work addressed the development of a web-based software platform able to address some of the main tasks within BTM, supported by the existing core libraries from the @Note project. This included improving the available RESTful server, providing some new methods and APIs and improving others, while also developing a web-based application that communicates with the server through calls to its API and provides a functional, user-friendly web-based interface. The work focused on tasks related to Information Retrieval, addressing the efficient search of relevant documents through an integrated interface. At this stage, the aim was also to have interfaces to visualize and explore the main entities involved in BTM: queries, documents, corpora, annotation-process entities, and resources.
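The abstract does not document the @Note2 API itself, so the host, path, and parameter names below are hypothetical placeholders showing only the general shape of the RESTful document-search call a web client like the one described would build:

```python
from urllib.parse import urlencode

# Purely illustrative endpoint and parameters (not the real @Note2 API).
base = "https://example.org/atnote/api/corpora/search"
params = {"query": "glucose metabolism", "source": "PubMed", "limit": 20}
url = base + "?" + urlencode(params)
print(url)
```

A browser front end would issue such requests asynchronously and render the returned document list, keeping all BTM logic on the server side.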

    A comprehensive analysis of acknowledgement texts in Web of Science: a case study on four scientific domains

    The analysis of acknowledgements is particularly interesting because acknowledgements may give information not only about funding: they can also reveal hidden contributions to authorship, researchers' collaboration patterns, the context in which the research was conducted, and specific aspects of the academic work. The focus of the present research is the analysis of a large sample of acknowledgement texts indexed in the Web of Science (WoS) Core Collection. Records of type "article" and "review" from four scientific domains, namely social sciences, economics, oceanography and computer science, published from 2014 to 2019 in English-language scientific journals, were considered. Six types of acknowledged entities, i.e. funding agency, grant number, individuals, university, corporation and miscellaneous, were extracted from the acknowledgement texts using a named entity recognition tagger and subsequently examined. A general analysis of the acknowledgement texts showed that the indexing of funding information in WoS is incomplete. The analysis of the automatically extracted entities revealed differences and distinct patterns in the distribution of acknowledged entity types between scientific domains. A strong association was found between acknowledged entity and scientific domain, and between acknowledged entity and entity type. Only a negligible correlation was found between the number of citations and the number of acknowledged entities. Generally, the number of words in an acknowledgement text correlates positively with the number of acknowledged funding organizations, universities, individuals and miscellaneous entities. At the same time, acknowledgement texts with a larger number of sentences have more acknowledged individuals and miscellaneous entities.
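The study used a trained NER tagger; as a simplified, rule-based stand-in, the sketch below classifies two of the six entity types by surface patterns. The sentence, patterns, and grant-number format are all invented for illustration:

```python
import re

# A made-up acknowledgement sentence.
text = ("This work was supported by the Deutsche Forschungsgemeinschaft "
        "under grant DFG-12345.")

# Toy surface patterns standing in for a real NER model.
patterns = {
    "GRANT_NUMBER": r"\b[A-Z]{2,}-\d+\b",
    "FUNDING_AGENCY": r"Deutsche Forschungsgemeinschaft",
}
entities = [(label, m)
            for label, pat in patterns.items()
            for m in re.findall(pat, text)]
print(entities)
```

A real tagger generalizes far beyond such patterns (individuals, universities, corporations, miscellaneous), but the output shape, a list of (type, surface form) pairs, is the same kind of data the study aggregates per domain.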