23 research outputs found
Using ontology to mine and classify Li-Fraumeni Syndrome patients
The Li-Fraumeni Syndrome (LFS) is a syndrome that causes multiple primary tumors in children and young adults. The main motivation of this work is to create a single integrated system that allows doctors and researchers from the A.C. Camargo Cancer Center to relate family histories and clinical and molecular data held in different databases, through an innovative data integration methodology, in order to improve the existing LFS diagnostic criteria or even to propose a new set of clinical criteria.
(Paragraph extracted from the text as a summary.) Sociedad Argentina de Informática e Investigación Operativa (SADIO)
Adapting Searchy to extract data using evolved wrappers
This is the author’s version of a work that was accepted for publication in Expert Systems with Applications: An International Journal. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Expert Systems with Applications: An International Journal, 39, 3 (2012), DOI: 10.1016/j.eswa.2011.08.168.
Organizations need diverse information systems to deal with increasing requirements for information storage and processing, which creates information islands and therefore an intrinsic difficulty in obtaining a global view. Providing a unified view of the (likely heterogeneous) information available in an organization is a goal that adds value to the information systems and has been the subject of intense research. In this paper we present an extension of a solution named Searchy, an agent-based mediator system specialized in data extraction and integration. Through the use of a set of wrappers, it integrates information from arbitrary sources and semantically translates it according to a mediated scheme. Searchy is actually a domain-independent wrapper container that eases wrapper development, providing, for example, semantic mapping. The extension of Searchy proposed in this paper introduces an evolutionary wrapper that is able to evolve wrappers using regular expressions. To achieve this, a Genetic Algorithm (GA) is used to learn a regex able to extract a set of positive samples while rejecting a set of negative samples.
The authors gratefully acknowledge Martín Knoblauch for his useful suggestions and valuable comments. This work has been partially supported by the Spanish Ministry of Science and Innovation under the projects ABANT (TIN 2010-19872) and COMPUBIODIVE (TIN2007-65989), and by Castilla-La Mancha project PEII09-0266-6640.
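The idea of learning a regex from positive and negative samples with a GA can be illustrated with a minimal sketch. This is not Searchy's implementation: the token set `TOKENS`, the fitness function (matched positives minus matched negatives), the truncation selection, and every parameter value below are assumptions chosen only to keep the example short.

```python
import random
import re

# Hypothetical building blocks for candidate regexes (an assumption, not Searchy's alphabet).
TOKENS = [r"\d", "[a-z]", "-", "+", "*", "?"]


def fitness(pattern, positives, negatives):
    """Matched positives minus matched negatives; invalid regexes score lowest."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return -len(negatives) - 1
    hits = sum(1 for sample in positives if compiled.fullmatch(sample))
    false_hits = sum(1 for sample in negatives if compiled.fullmatch(sample))
    return hits - false_hits


def evolve(positives, negatives, pop_size=40, length=5, generations=60, seed=0):
    """Evolve fixed-length token sequences and return the best regex found."""
    rng = random.Random(seed)

    def score(genome):
        return fitness("".join(genome), positives, negatives)

    population = [[rng.choice(TOKENS) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)           # one-point crossover
            child = mother[:cut] + father[cut:]
            if rng.random() < 0.3:                   # point mutation
                child[rng.randrange(length)] = rng.choice(TOKENS)
            children.append(child)
        population = parents + children
    return "".join(max(population, key=score))


if __name__ == "__main__":
    positives = ["12-34", "5-6"]
    negatives = ["ab-cd", "12"]
    best = evolve(positives, negatives)
    print(best, fitness(best, positives, negatives))
```

Using `fullmatch` makes a candidate count only when it covers the whole sample, which keeps overly permissive patterns from scoring well on partial matches.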
Cooperative Approach for Composite Ontology Mapping
This paper proposes a cooperative approach for composite ontology mapping. We first present an extended classification of automated ontology matching and propose an automatic composite solution for the matching problem based on cooperation. In our proposal, agents apply individual mapping algorithms and cooperate in order to change their individual results. We assume that the approaches are complementary to each other and that their combination produces better results than the individual ones. Next, we compare our model with three state-of-the-art matching systems. The results are promising, especially with regard to precision and recall. Finally, we propose an argumentation formalism as an extension of our initial model. We compare our argumentation model with the matching systems, showing improvements in the results.
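A rough sketch of what combining several matchers can look like follows. The abstract does not specify the cooperation protocol, so this example swaps in plain score averaging as a stand-in; the two matchers, the underscore-based tokenization, and the `threshold` value are all hypothetical.

```python
from difflib import SequenceMatcher


def edit_matcher(a, b):
    """String similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_matcher(a, b):
    """Jaccard overlap of underscore-separated tokens (assumed naming style)."""
    tokens_a = set(a.lower().split("_"))
    tokens_b = set(b.lower().split("_"))
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def composite_mapping(source_terms, target_terms, matchers, threshold=0.6):
    """Map each source term to its best target by averaged matcher score."""
    mapping = {}
    for source in source_terms:
        scored = [
            (sum(m(source, target) for m in matchers) / len(matchers), target)
            for target in target_terms
        ]
        best_score, best_target = max(scored)
        if best_score >= threshold:  # keep only sufficiently confident matches
            mapping[source] = best_target
    return mapping


if __name__ == "__main__":
    matchers = [edit_matcher, token_matcher]
    print(composite_mapping(["paper_title"], ["paper_title", "year"], matchers))
```

Averaging treats every matcher as equally reliable; a negotiation or argumentation step, as proposed in the paper, would replace that fixed rule.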
Structured digital tables on the Semantic Web: toward a structured digital literature
In parallel to the growth in bioscience databases, biomedical publications have increased exponentially in the past decade. However, the extraction of high-quality information from the corpus of scientific literature has been hampered by the lack of machine-interpretable content, despite text-mining advances. To address this, we propose creating a structured digital table as part of an overall effort in developing machine-readable, structured digital literature. In particular, we envision transforming publication tables into standardized triples using Semantic Web approaches. We identify three canonical types of tables (conveying information about properties, networks, and concept hierarchies) and show how more complex tables can be built from these basic types. We envision that authors would create tables initially using the structured triples for canonical types and then have them visually rendered for publication, and we present examples for converting representative tables into triples. Finally, we discuss how ‘stub’ versions of structured digital tables could be a useful bridge for connecting the literature with databases, allowing the former to more precisely document the latter.
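To make the table-to-triples idea concrete, here is a minimal sketch for the simplest canonical type, a property table. It assumes the first column names the subject and each remaining column names a property; the function name and the placeholder data are hypothetical, and a real pipeline would emit URIs from a shared vocabulary rather than bare strings.

```python
def property_table_to_triples(header, rows):
    """Turn a property table into (subject, predicate, object) triples.

    Assumes the first column holds the subject and every other column
    holds the value of the property named in the header.
    """
    _subject_column, *property_columns = header
    triples = []
    for row in rows:
        subject, *values = row
        for prop, value in zip(property_columns, values):
            if value is not None:  # skip empty cells
                triples.append((subject, prop, value))
    return triples


if __name__ == "__main__":
    header = ["gene", "organism", "chromosome"]
    rows = [["geneA", "organismX", "1"], ["geneB", "organismX", None]]
    for triple in property_table_to_triples(header, rows):
        print(triple)
```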
Architecture for integrating heterogeneous biological data repositories using ontologies
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 86-89). High-throughput experiments generate vast quantities of biological information that are stored in autonomous data repositories distributed across the World Wide Web. There exists a need to integrate information from multiple data repositories for the purposes of data mining; however, current methods of integration require a significant amount of manual work that is often tedious and time consuming. The thesis proposes a flexible architecture that facilitates the automation of data integration from multiple heterogeneous biological data repositories using ontologies. The design uses ontologies to resolve the semantic conflicts that usually hinder schema integration and searching for information. The architecture implemented successfully demonstrates how ontologies facilitate the automation of data integration from multiple data repositories. Nevertheless, many optimizations to increase the performance of the system were realized during the implementation of various components in the architecture and are described in the thesis. By Howard H. Chou. M.Eng
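One way to picture how an ontology resolves naming conflicts between repositories is the sketch below. It is not the thesis architecture: the repository names, field names, and the flat `FIELD_TO_TERM` mapping are hypothetical stand-ins for a real ontology, which would also carry relationships between terms.

```python
# Hypothetical mapping from each repository's local field names to shared ontology terms.
FIELD_TO_TERM = {
    "repo_a": {"gene_symbol": "gene", "org": "organism"},
    "repo_b": {"symbol": "gene", "species": "organism", "fn": "function"},
}


def normalize(repo, record):
    """Rename a record's local fields to shared terms, dropping unmapped fields."""
    field_map = FIELD_TO_TERM[repo]
    return {field_map[k]: v for k, v in record.items() if k in field_map}


def integrate(records_by_repo, key="gene"):
    """Merge normalized records from all repositories on a shared key term."""
    merged = {}
    for repo, records in records_by_repo.items():
        for record in records:
            normalized = normalize(repo, record)
            merged.setdefault(normalized[key], {}).update(normalized)
    return merged


if __name__ == "__main__":
    data = {
        "repo_a": [{"gene_symbol": "geneA", "org": "organismX"}],
        "repo_b": [{"symbol": "geneA", "fn": "kinase", "internal_id": 7}],
    }
    print(integrate(data))
```

Once every source is expressed in the shared terms, a single query over `merged` replaces one hand-written query per repository.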
A Cooperative Approach for Composite Ontology Matching
Ontologies have proven to be an essential element in a range of applications in which knowledge plays a key role. Resolving the semantic heterogeneity problem is crucial to allow interoperability between ontology-based systems. This makes automatic ontology matching, as an anticipated solution to semantic heterogeneity, an important research issue. Many different approaches to the matching problem have emerged from the literature. An important issue in ontology matching is to find effective ways of choosing among many techniques and their variations, and then combining their results. An innovative and promising option is to formalize the combination of matching techniques using agent-based approaches, such as cooperative negotiation and argumentation. In this thesis, the formalization of the ontology matching problem following an agent-based approach is proposed. This proposal is evaluated using state-of-the-art data sets. The results show that the consensus obtained by negotiation and argumentation represents intermediary values which are closer to the best matcher. As the best matcher may vary depending on specific differences between multiple data sets, cooperative approaches are an advantage.
*** ABSTRACT -
Ontologies are essential elements in knowledge-based systems. Solving the semantic heterogeneity problem is fundamental to enabling interoperability between ontology-based systems. Automatic ontology mapping can be seen as a solution to this problem. Different and complementary approaches to the problem have been proposed in the literature. An important aspect of mapping is selecting the appropriate set of approaches and their variations, and then combining their results. A promising option is to formalize the combination of mapping techniques using approaches based on cooperative agents, such as negotiation and argumentation. In this thesis, the formalization of the problem of combining mapping techniques using such approaches is proposed and evaluated. The evaluation, which involves test sets suggested by the scientific community, leads to the conclusion that the consensus obtained by negotiation and argumentation is not exactly an improvement on all individual results, but represents intermediate values that are close to the best technique. Since the best technique may vary depending on specific differences between multiple data sets, cooperative approaches are an advantage.
Garantia de privacidade na exploração de bases de dados distribuídas (Privacy assurance in the exploration of distributed databases)
Anonymisation is currently one of the biggest challenges when sharing sensitive personal information. Its importance depends largely on the application domain, but when dealing with health information, this becomes a more serious issue. A simple approach to avoid this disclosure is to ensure that all data that can be associated directly with an individual is removed from the original dataset. However, some studies have shown that simple anonymisation procedures can sometimes be reversed using specific patients’ characteristics, namely when the anonymisation is based on hidden key attributes.
In this work, we propose a secure architecture to share information from distributed databases without compromising the subjects’ privacy. The work was initially focused on identifying techniques to link information between multiple data sources, in order to reverse the anonymisation procedures. In a second phase, we proposed a methodology to perform queries over distributed databases. The architecture was validated using a standard data schema that is widely adopted in observational research studies.
Guaranteeing data anonymisation is currently one of the biggest challenges when sensitive personal information needs to be shared. Although the problem cuts across many application domains, it becomes more critical when anonymisation involves clinical data. In these cases, the most common approach to avoid disclosing data that can be directly associated with an individual is to remove identifying attributes. However, according to the literature, this approach does not offer a full guarantee of anonymity, which can be broken through specific attacks that allow subjects to be re-identified.
This work proposes an architecture for sharing data stored in distributed repositories securely and without compromising privacy. In the first phase of this work, techniques that can reverse anonymisation procedures were analysed. In the following phase, a methodology was proposed for querying distributed databases without breaking anonymity. The architecture was validated on a relational database schema that is widely used in observational clinical studies.
Master's in Cybersecurity
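The re-identification risk mentioned in the abstract can be illustrated with a small record-linkage sketch. This is not the method studied in the thesis: it simply joins a dataset stripped of direct identifiers with an identified one on shared quasi-identifiers, and all field names and records are made up for illustration.

```python
def link_records(deidentified, identified, quasi_identifiers):
    """Re-identify records whose quasi-identifier combination is unique.

    `deidentified` rows carry a `record_id`; `identified` rows carry a `name`.
    Both field names are assumptions for this sketch.
    """
    index = {}
    for person in identified:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person["name"])

    reidentified = {}
    for record in deidentified:
        key = tuple(record[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # an unambiguous match breaks anonymity
            reidentified[record["record_id"]] = candidates[0]
    return reidentified


if __name__ == "__main__":
    clinical = [
        {"record_id": 1, "birth_year": 1980, "postcode": "1000", "sex": "F"},
        {"record_id": 2, "birth_year": 1975, "postcode": "2000", "sex": "M"},
    ]
    register = [
        {"name": "Person A", "birth_year": 1980, "postcode": "1000", "sex": "F"},
        {"name": "Person B", "birth_year": 1975, "postcode": "2000", "sex": "M"},
        {"name": "Person C", "birth_year": 1975, "postcode": "2000", "sex": "M"},
    ]
    print(link_records(clinical, register, ["birth_year", "postcode", "sex"]))
```

Record 2 stays anonymous here only because two people share its quasi-identifiers, which is why removing names alone does not guarantee privacy.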