Case-based reasoning and system design: An integrated approach based on ontology and preference modeling
This paper addresses the fulfillment of requirements related to case-based reasoning (CBR) processes for system design. Since CBR processes are well suited to problem solving, the proposed method defines an integrated CBR process in line with system engineering principles. After defining the requirements that the approach must fulfill, an ontology is defined to capture design knowledge as concepts. Based on the ontology, models are provided for representing requirements and solutions. Next, a recursive CBR process suitable for system design is provided. Uncertainty and designer preferences, as well as ontological guidelines, are considered during the requirements definition, compatible-case retrieval, and solution definition steps. The approach is designed to give flexibility within the CBR process and to provide guidelines to the designer. The following questions are treated jointly: how to guide the designer so that the requirements are correctly defined and suitable for the retrieval step, how to retrieve cases when no similarity measures are available, and how to enlarge the search scope during the retrieval step to obtain a sufficient panel of solutions. Finally, an example of system engineering in the aeronautics domain illustrates the proposed method. A testbed was developed to evaluate the performance of the retrieval algorithm, and a software prototype was built to test the approach. The outcome of this work is a recursive CBR process suitable for engineering design and compatible with standards. Requirements are modeled as flexible constraints, where designer preferences express the flexibility. Similar solutions can be retrieved even when similarity measures between features are not available. At the same time, ontological guidelines guide the process and help the designer express his or her preferences.
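The retrieval principle described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the trapezoidal preference function, the "mass" feature, and the threshold value are assumptions made for the example. Each requirement maps a case's feature value to a satisfaction degree in [0, 1], and a case's overall compatibility is the worst degree across requirements.

```python
def trapezoid(a, b, c, d):
    """Return a preference function that is 1 on [b, c] and ramps down to 0 at a and d."""
    def mu(x):
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu

def retrieve(cases, requirements, threshold):
    """Keep cases whose worst requirement-satisfaction degree meets the threshold.

    Lowering the threshold enlarges the search scope when too few
    compatible cases are retrieved.
    """
    results = []
    for case in cases:
        degree = min(mu(case[feat]) for feat, mu in requirements.items())
        if degree >= threshold:
            results.append((case, degree))
    return sorted(results, key=lambda r: -r[1])

# Hypothetical feature "mass": the designer prefers 12-15 kg, tolerates 10-20 kg.
cases = [{"mass": 12.0}, {"mass": 16.5}, {"mass": 25.0}]
reqs = {"mass": trapezoid(10, 12, 15, 20)}
print(retrieve(cases, reqs, threshold=0.5))  # the 25 kg case falls outside the scope
```

No inter-feature similarity measure is needed here: each feature is judged only against its own flexible constraint, which is the point the abstract makes about retrieval without similarity measures.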
Semantic expansion of queries for web search (MSEC)
The Internet has become the largest repository of human knowledge, and the amount of stored information increases day by day. This growth affects the precision of the results that Web search engines return to users. One strategy for addressing this problem is personalized resource retrieval. Several projects currently offer semantic methods for improving the relevance of search results through the use of ontologies, natural language processing, knowledge-based systems, query specification languages, and user profiles, among others. The results are generally better than those of web search engines that do not use these techniques. However, these improvements in precision come at a high cost: the search requires more complex algorithms that consume more computational resources. This article describes a semantic query expansion model called MSEC, based mainly on the concept of semantic similarity derived from domain ontologies and on the use of a user profile to personalize searches and thereby improve their precision. To evaluate the proposed model, a software prototype was created. Preliminary experimental results show an improvement over the traditional web search approach. Finally, the model was compared against GoPubMed, the best semantic search engine in the state of the art, on the MEDLINE collection.
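Ontology-based query expansion of the kind MSEC relies on can be sketched as follows. The toy taxonomy, the Wu-Palmer similarity measure, and the threshold are illustrative choices of mine, not the paper's exact model: a query term is expanded with ontology concepts whose semantic similarity to it exceeds a threshold.

```python
TAXONOMY = {  # child -> parent; a toy medical fragment (assumed for illustration)
    "disease": None, "infection": "disease", "viral_infection": "infection",
    "bacterial_infection": "infection", "influenza": "viral_infection",
}

def path_to_root(c):
    """Path from a concept up to the taxonomy root, concept included."""
    path = [c]
    while TAXONOMY[c] is not None:
        c = TAXONOMY[c]
        path.append(c)
    return path

def wu_palmer(c1, c2):
    """Wu-Palmer similarity: 2*depth(lcs) / (depth(c1) + depth(c2)), root depth = 1."""
    p1, p2 = path_to_root(c1), path_to_root(c2)
    lcs = next(c for c in p1 if c in p2)  # lowest common subsumer
    return 2 * len(path_to_root(lcs)) / (len(p1) + len(p2))

def expand(term, threshold=0.6):
    """Expansion terms: concepts semantically close enough to the query term."""
    return sorted(c for c in TAXONOMY
                  if c != term and wu_palmer(term, c) >= threshold)

print(expand("influenza"))  # close ancestors qualify, distant siblings do not
```

The expanded term set is then submitted to the retrieval engine in place of the original single term, which is how expansion trades a small amount of precision per term for broader, semantically grounded recall.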
A procedure for constructing semantic indexes based on domain-specific ontologies
Current online search systems are still far from providing users with contextualized and accurate answers, since users must make additional efforts to filter and evaluate the information supplied to them. One way to improve the results is to create semantic indexes, which incorporate knowledge and intelligent processing of resources. When it comes to implementing semantic indexes, however, there is a wide range of research studies, each with its own procedures and lengthy conceptualization, implementation, and refinement processes. It therefore becomes important to define an instrument that allows these kinds of structures to be created in a more structured and efficient manner. This work proposes a procedure for creating semantic indexes based on domain-specific ontologies. The methodology entailed surveying the state of the art of the various existing proposals and abstracting a general procedure that incorporates best practices for creating semantic indexes. A semantic index was then created for the domain of plants and their components. The results show that the defined procedure is a good instrument for guiding the implementation of these kinds of structures with a high degree of customization. Nevertheless, they also show that the process depends on other variables when building and working with the index, so the design needs to be re-examined until the desired results are obtained.
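The core data structure such a procedure produces can be illustrated briefly. This is my own minimal sketch, not the paper's procedure: each document is indexed under the ontology concepts found in it and, by subsumption, under all of their ancestors, so a query for a general concept also retrieves documents about its specializations. The plant-domain fragment mirrors the paper's chosen domain but its concepts are assumed.

```python
from collections import defaultdict

ONTOLOGY = {"plant": None, "leaf": "plant", "root": "plant", "taproot": "root"}

def ancestors(concept):
    """The concept itself plus every ancestor up to the ontology root."""
    out = []
    while concept is not None:
        out.append(concept)
        concept = ONTOLOGY[concept]
    return out

def build_index(docs):
    """Semantic inverted index: concept -> set of documents annotated with it
    or with any of its specializations."""
    index = defaultdict(set)
    for doc_id, concepts in docs.items():
        for c in concepts:
            for a in ancestors(c):  # propagate the annotation up the hierarchy
                index[a].add(doc_id)
    return index

docs = {"d1": ["leaf"], "d2": ["taproot"], "d3": ["root"]}
index = build_index(docs)
print(sorted(index["root"]))  # d2 qualifies because taproot is subsumed by root
```

Propagating annotations at indexing time makes query evaluation a plain lookup; the alternative design, expanding the query downward at search time, keeps the index smaller at the cost of slower queries.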
Evaluation Methodologies for Visual Information Retrieval and Annotation
Performance assessment plays a major role in the research on Information
Retrieval (IR) systems. Starting with the Cranfield experiments in the
early 60ies, methodologies for the system-based performance assessment
emerged and established themselves, resulting in an active research field
with a number of successful benchmarking activities. With the rise of the
digital age, procedures of text retrieval evaluation were often transferred
to multimedia retrieval evaluation without questioning their direct
applicability. This thesis investigates the problem of system-based
performance assessment of annotation approaches in generic image
collections. It addresses three important parts of annotation evaluation,
namely user requirements for the retrieval of annotated visual media,
performance measures for multi-label evaluation, and visual test
collections. Using the example of multi-label image annotation evaluation,
I discuss which concepts to employ for indexing, how to obtain a reliable ground truth at moderate cost, and which evaluation measures are appropriate. This is accompanied by a thorough analysis of related work on
system-based performance assessment in Visual Information Retrieval (VIR).
Traditional performance measures are classified into four dimensions and
investigated according to their appropriateness for visual annotation
evaluation. One of the main ideas in this thesis challenges the common assumption of a binary score prediction dimension in annotation evaluation: the predicted concepts and the set of true indexed concepts interrelate with each other. This work shows how to
utilise these semantic relationships for a fine-grained evaluation
scenario. Outcomes of this thesis result in a user model for concept-based
image retrieval, a fully assessed image annotation test collection, and a
number of novel performance measures for image annotation evaluation.
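The idea of exploiting semantic relationships between concepts during evaluation can be sketched as follows. This is my illustration, not the thesis's exact measure, and the toy similarity values are assumed: instead of binary hit/miss scoring, a predicted concept earns partial credit equal to its best similarity to any true concept, and symmetrically for recall.

```python
SIM = {  # symmetric toy similarities between image concepts (assumed values)
    ("car", "car"): 1.0, ("car", "vehicle"): 0.8, ("car", "tree"): 0.1,
    ("vehicle", "vehicle"): 1.0, ("vehicle", "tree"): 0.1, ("tree", "tree"): 1.0,
}

def sim(a, b):
    """Look up concept similarity in either order, defaulting to 0."""
    return SIM.get((a, b), SIM.get((b, a), 0.0))

def soft_precision_recall(predicted, truth):
    """Semantically smoothed precision and recall for one annotated image."""
    p = sum(max(sim(p_, t) for t in truth) for p_ in predicted) / len(predicted)
    r = sum(max(sim(t, p_) for p_ in predicted) for t in truth) / len(truth)
    return p, r

# Under binary matching, predicting "vehicle" for a "car" image is a plain
# error; the smoothed measure rewards its semantic closeness to the truth.
print(soft_precision_recall(predicted=["vehicle", "tree"], truth=["car", "tree"]))
```

With binary costs this prediction would score precision 0.5; the smoothed variant credits "vehicle" with 0.8 for the true label "car", which is the fine-grained behavior the thesis argues image concepts call for.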
Use of LSH functions for conceptual search based on ontologies
Advisor: Maurício Ferreira Magalhães. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
Abstract: The volume of data available on the WWW increases every day. With the emergence of the Semantic Web, data gained a representation of its meaning: it can be classified under a concept of a knowledge domain, a domain generally defined by an ontology. This representation, supported by the tools created for the Semantic Web, enables conceptual search. In this type of search, the goal is not to retrieve a specific piece of data but several data items, of several types, classified under a concept of an ontology. Using a similarity level, data belonging to other concepts of the same domain can also be retrieved, broadening the search. Distributed indexing of all these data may make conceptual similarity search costly. Several distributed indexing structures exist, such as P2P networks, which are used to distribute and share huge volumes of data. This thesis shows how LSH functions can be used to generate identifiers for the concepts of a domain, defined by an ontology, while preserving their similarity. In this way, similar concepts are stored near one another, with proximity measured in some metric, making conceptual similarity search easier.
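The similarity-preserving identifiers can be sketched with one concrete LSH family, random-hyperplane hashing; the thesis's actual hash construction, vector representation, and dimensions may differ, so everything below is an assumed illustration. Concepts are represented as feature vectors and hashed so that similar vectors tend to receive identical or nearby bit-string keys, which can then serve as identifiers in a distributed index such as a P2P overlay.

```python
import random

random.seed(7)          # fixed seed so every node derives the same hyperplanes
DIM, BITS = 8, 6        # assumed vector dimensionality and key length
PLANES = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def lsh_key(vector):
    """Concatenate the sign of the projection onto each random hyperplane.

    Vectors with a small angle between them flip few signs, so similar
    concepts tend to share key bits (and hence index locality).
    """
    bits = ["1" if sum(p * v for p, v in zip(plane, vector)) >= 0 else "0"
            for plane in PLANES]
    return "".join(bits)

concept_a = [1.0, 0.9, 0.0, 0.2, 0.0, 0.1, 0.0, 0.0]
concept_b = [0.9, 1.0, 0.1, 0.2, 0.0, 0.0, 0.1, 0.0]   # similar to concept_a
concept_c = [0.0, 0.1, 1.0, 0.0, 0.9, 0.0, 0.0, 1.0]   # dissimilar

print(lsh_key(concept_a), lsh_key(concept_b), lsh_key(concept_c))
```

In a P2P setting the key would determine which node stores the concept's data, so nearby keys place similar concepts on the same or neighboring nodes and a similarity query only needs to visit a small key neighborhood.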
Electronic Health Record Summarization over Heterogeneous and Irregularly Sampled Clinical Data
The increasing adoption of electronic health records (EHRs) has led to an unprecedented amount of patient health information stored in an electronic format. The ability to comb through this information is imperative, both for patient care and computational modeling. Creating a system to minimize unnecessary EHR data, automatically distill longitudinal patient information, and highlight salient parts of a patient’s record is currently an unmet need. However, summarization of EHR data is not a trivial task, as there exist many challenges with reasoning over this data. EHR data elements are most often obtained at irregular intervals as patients are more likely to receive medical care when they are ill, than when they are healthy. The presence of narrative documentation adds another layer of complexity as the notes are riddled with over-sampled text, often caused by the frequent copy-and-pasting during the documentation process.
This dissertation synthesizes a set of challenges for automated EHR summarization identified in the literature and presents an array of methods for dealing with some of these challenges. We used hybrid data-driven and knowledge-based approaches to examine abundant redundancy in clinical narrative text, a data-driven approach to identify and mitigate biases in laboratory testing patterns with implications for using clinical data for research, and a probabilistic modeling approach to automatically summarize patient records and learn computational models of disease with heterogeneous data types. The dissertation also demonstrates two applications of the developed methods to important clinical questions: laboratory test overutilization and cohort selection from EHR data.
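One simple way to quantify the copy-paste redundancy mentioned above is word-shingle overlap between consecutive notes. This is a small sketch of the general idea only; the dissertation's hybrid data-driven and knowledge-based methods are more involved, and the shingle length and example notes are assumptions.

```python
def shingles(text, k=3):
    """All k-word shingles of a note, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def redundancy(note, prev_note, k=3):
    """Fraction of the new note's k-word shingles already present in the
    prior note; values near 1 flag likely copy-and-pasted text."""
    s_new, s_old = shingles(note, k), shingles(prev_note, k)
    return len(s_new & s_old) / len(s_new) if s_new else 0.0

day1 = "patient admitted with chest pain and shortness of breath"
day2 = ("patient admitted with chest pain and shortness of breath "
        "now improving on oxygen")
print(round(redundancy(day2, day1), 2))  # most of day2 repeats day1 verbatim
```

Scoring each note against its predecessor in this way lets a summarizer down-weight repeated passages and surface only the newly documented information.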
Formalizing and exploiting knowledge and experience for decision support in systems engineering processes
This habilitation manuscript (habilitation à diriger des recherches) summarizes my professional activity in teaching and research since my appointment as maître de conférences in 2001. After completing my doctorate, prepared at the Laboratoire Génie de Production (LGP) between 1997 and 2000 under the supervision of Bernard Grabot, I obtained a maître de conférences position at the Université de Bretagne Sud in Lorient (UBS). During three years at this university and at the Laboratoire d'Electronique des Systèmes Temps Réels (LESTER, later LAB-STICC), I developed research activities in the design and reconfiguration of automated handling systems (Systèmes Transitiques). Following my transfer to the Ecole Nationale d'Ingénieurs de Tarbes in 2004, I continued my research at the Laboratoire Génie de Production (LGP) on the development of decision-support tools for systems engineering processes based on the exploitation of knowledge and experience. In teaching, since 2001, my activities have been shared between industrial engineering and computer science.
This document is structured in two parts:
1. The first part presents, in my detailed Curriculum Vitae, a review of my activities as a teacher-researcher. My professional career, my teaching activities, and a summary of my research activities are presented concisely. First, the courses for which I was responsible (design and/or delivery), the teaching materials produced, and the teaching hours are described. Then, the supervision of researchers (doctoral students, master's students, and post-docs), the institutional projects (FUI and ANR) in which I took on responsibilities, partnerships with companies under CIFRE contracts, and my research-coordination activities at the national and international levels are covered. This section ends with the exhaustive list of my publications and communications (section 3.5) produced since the start of my research activity in 1997.
2. The second part summarizes my research activities since 2001. It is organized into six chapters. Chapter 1 presents the overall problem addressed by my research work. It is structured around a three-level model (Processes, Tools, Experience/Knowledge) and supported by a first level of literature review. The chosen level of detail makes it possible to understand the problem as a whole. The targeted processes, the tools developed, and the knowledge exploited are presented with respect to the literature in the different domains. Chapters 2 to 5 provide a finer level of detail. They present the refined problem statements, the developments carried out, and the main scientific contributions. The objective is to provide elements useful for understanding my research activity and for facilitating its later exploitation. Finally, Chapter 6 concludes by taking the necessary perspective on the work carried out and proposes my research project for the years to come.