6 research outputs found

    DBFIRE: recuperação de documentos relacionados a consultas a banco de dados.

    Get PDF
    Bancos de dados e documentos são comumente mantidos em separado nas organizações, controlados por Sistemas Gerenciadores de Bancos de Dados (SGBDs) e Sistemas de Recuperação de Informação (SRIs), respectivamente. Essa separação tem ligação com a natureza dos dados manipulados: estruturados, no primeiro caso; não estruturados, no segundo. Enquanto os SGBDs processam consultas exatas a bancos de dados, os SRIs recuperam documentos com base em buscas por palavras-chave, que são inerentemente imprecisas. Apesar disso, a integração desses sistemas pode resultar em grandes ganhos ao usuário, uma vez que, numa mesma organização, bancos de dados e documentos frequentemente se referem a entidades comuns. Uma das possibilidades de integração é a recuperação de documentos associados a uma dada consulta a banco de dados. Por exemplo, considerando a consulta "Quais os clientes com contratos acima de X reais?", como recuperar documentos que possam estar associados a esta consulta, como os próprios contratos desses clientes, propostas de novas vendas em aberto, entre outros documentos? A solução proposta nesta tese baseia-se numa abordagem especial de expansão de busca para a recuperação de documentos: um conjunto inicial de palavras-chave é expandido com termos potencialmente úteis contidos no resultado de uma consulta a banco de dados; o conjunto de palavras-chave resultante é então enviado a um SRI para a recuperação dos documentos de interesse para a consulta. Propõe-se ainda uma nova forma de ordenação dos termos para expansão: partindo do pressuposto de que uma consulta a banco de dados representa com exatidão a necessidade de informação do usuário, a seleção dos termos é medida por sua difusão ao longo do resultado da consulta. Essa medida é usada não apenas para selecionar os melhores termos, mas também para estabelecer seus pesos relativos na expansão. Para validar o método proposto, foram realizados experimentos em dois domínios distintos, com resultados evidenciando melhorias significativas em termos da recuperação de documentos relacionados às consultas na comparação com outros modelos destacados na literatura

    Personal Knowledge Models with Semantic Technologies

    Get PDF
    Conceptual Data Structures (CDS) is a unified meta-model for representing knowledge cues in varying degrees of granularity, structuredness, and formality. CDS consists of: (1) A simple, expressive data-model; (2) A relation ontology which unifies the relations found in cognitive models of personal knowledge management tools, e. g., documents, mind-maps, hypertext, or semantic wikis. (3) An interchange format for structured text. Implemented prototypes have been evaluated

    Question Answering With Semex At Trec 2005

    No full text
    We describe the SEMEX question-answering system and report its performance in the TREC 2005 Question Answering track. Since this was SEMEX\u27s first year participating in the TREC evaluations, implementation teething pains were expected and indeed encountered. Nevertheless, performance against difficult factoid and list questions was supportive of the question answering approach that was implemented

    Question Answering with SEMEX at TREC 2005

    No full text
    We describe the SEMEX question-answering system and report its performance in the TREC 2005 Question Answering track. Since this was SEMEX 's first year participating in the TREC evaluations, implementation teething pains were expected and indeed encountered. Nevertheless, performance against difficult factoid and list questions was supportive of the question answering approach that was implemented

    Extensible metadata management framework for personal data lake

    Get PDF
    Common Internet users today are inundated with a deluge of diverse data being generated and siloed in a variety of digital services, applications, and a growing body of personal computing devices as we enter the era of the Internet of Things. Alongside potential privacy compromises, users are facing increasing difficulties in managing their data and are losing control over it. There appears to be a de facto agreement in business and scientific fields that there is critical new value and interesting insight that can be attained by users from analysing their own data, if only it can be freed from its silos and combined with other data in meaningful ways. This thesis takes the point of view that users should have an easy-to-use modern personal data management solution that enables them to centralise and efficiently manage their data by themselves, under their full control, for their best interests, with minimum time and efforts. In that direction, we describe the basic architecture of a management solution that is designed based on solid theoretical foundations and state of the art big data technologies. This solution (called Personal Data Lake - PDL) collects the data of a user from a plurality of heterogeneous personal data sources and stores it into a highly-scalable schema-less storage repository. To simplify the user-experience of PDL, we propose a novel extensible metadata management framework (MMF) that: (i) annotates heterogeneous data with rich lineage and semantic metadata, (ii) exploits the garnered metadata for automating data management workflows in PDL – with extensive focus on data integration, and (iii) facilitates the use and reuse of the stored data for various purposes by querying it on the metadata level either directly by the user or through third party personal analytics services. We first show how the proposed MMF is positioned in PDL architecture, and then describe its principal components. Specifically, we introduce a simple yet effective lineage manager for tracking the provenance of personal data in PDL. We then introduce an ontology-based data integration component called SemLinker which comprises two new algorithms; the first concerns generating graph-based representations to express the native schemas of (semi) structured personal data, and the second algorithm metamodels the extracted representations to a common extensible ontology. SemLinker outputs are utilised by MMF to generate user-tailored unified views that are optimised for querying heterogeneous personal data through low-level SPARQL or high-level SQL-like queries. Next, we introduce an unsupervised automatic keyphrase extraction algorithm called SemCluster that specialises in extracting thematically important keyphrases from unstructured data, and associating each keyphrase with ontological information drawn from an extensible WordNet-based ontology. SemCluster outputs serve as semantic metadata and are utilised by MMF to annotate unstructured contents in PDL, thus enabling various management functionalities such as relationship discovery and semantic search. Finally, we describe how MMF can be utilised to perform holistic integration of personal data and jointly querying it in native representations
    corecore