929 research outputs found
CREATING A BIOMEDICAL ONTOLOGY INDEXED SEARCH ENGINE TO IMPROVE THE SEMANTIC RELEVANCE OF RETRIEVED MEDICAL TEXT
Medical Subject Headings (MeSH) is a controlled vocabulary used by the National Library of Medicine to index medical articles, abstracts, and journals contained within the MEDLINE database. Although MeSH imposes uniformity and consistency in the indexing process, it has been proven that using MeSH indices results in only a small increase in precision over free-text indexing. Moreover, studies have shown that the use of controlled vocabularies in the indexing process is not an effective method to increase semantic relevance in information retrieval. To address the need for semantic relevance, we present an ontology-based information retrieval system for the MEDLINE collection that results in a 37.5% increase in precision when compared to free-text indexing systems. The presented system focuses on using the ontology to provide an alternative to text representation for medical articles, to find relationships among co-occurring terms in abstracts, and to index terms that appear in text as well as discovered relationships. The presented system is then compared to existing MeSH and free-text information retrieval systems. This dissertation provides a proof of concept for an online retrieval system capable of providing increased semantic relevance when searching through medical abstracts in MEDLINE.
Conceptual graph-based knowledge representation for supporting reasoning in African traditional medicine
Although African patients use both conventional (modern) and traditional healthcare simultaneously, it has been proven that 80% of people rely on African traditional medicine (ATM). ATM includes medical activities stemming from practices, customs and traditions which were integral to the distinctive African cultures. It is based mainly on the oral transfer of knowledge, with the risk of losing critical knowledge. Moreover, practices differ according to the regions and the availability of medicinal plants. Therefore, it is necessary to compile tacit, disseminated and complex knowledge from various Tradi-Practitioners (TP) in order to determine interesting patterns for treating a given disease. Knowledge engineering methods for traditional medicine are useful to suitably model complex information needs, formalize the knowledge of domain experts and highlight effective practices for their integration into conventional medicine. The work described in this paper presents an approach which addresses two issues. First, it aims at proposing a formal representation model of ATM knowledge and practices to facilitate their sharing and reuse. Second, it aims at providing a visual reasoning mechanism for selecting the best available procedures and medicinal plants to treat diseases. The approach is based on the use of the Delphi method for capturing knowledge from various experts, which necessitates reaching a consensus. Conceptual graph formalism is used to model ATM knowledge with visual reasoning capabilities and processes. Nested conceptual graphs are used to visually express the semantic meaning of Computation Tree Logic (CTL) constructs that are useful for the formal specification of temporal properties of ATM domain knowledge. Our approach has the advantage of mitigating knowledge loss, with conceptual development assistance to improve not only the quality of ATM care (medical diagnosis and therapeutics) but also patient safety (drug monitoring).
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus. Comment: 29 pages, 5 figures, research proposal
Spatial description-based approach towards integration of biomedical atlases
Biomedical imaging has become ubiquitous in both basic research and the clinical
sciences. As technology advances the resulting multitude of imaging modalities has
led to a sharp rise in the quantity and quality of such images. Whether for
epidemiological studies, educational uses, clinical monitoring, or translational science
purposes, the ability to integrate and compare such image-based data has become
increasingly critical in the life sciences and eHealth domain. Ontology-based solutions
often lack spatial precision. Image processing-based solutions may have difficulties
when the underlying morphologies are too different. This thesis proposes a
compromise solution which captures location in biomedical images via spatial descriptions.
Three approaches of spatial descriptions have been explored. These include: (1)
spatial descriptions based on spatial relationships between segmented regions; (2)
spatial descriptions based on fiducial points and a set of spatial relations; and (3)
spatial descriptions based on fiducial points and a set of spatial relations, integrated
with spatial relations between segmented regions. Evaluation, particularly in the
context of mouse gene expression data, a good representative of spatio-temporal
biological data, suggests that the spatial description-based solution can provide good
spatial precision. This dissertation discusses the need for biomedical image data
integration, the shortcomings of existing solutions and proposes new algorithms based
on spatial descriptions of anatomical details in the image. Evaluation studies,
particularly in the context of gene expression data analysis, were carried out to study
the performance of the new algorithms.
A framework for analyzing changes in health care lexicons and nomenclatures
Ontologies play a crucial role in current web-based biomedical applications for capturing contextual knowledge in the domain of life sciences. Many of the so-called bio-ontologies and controlled vocabularies are known to be seriously defective from both terminological and ontological perspectives, and do not sufficiently comply with the standards to be considered formal ontologies. Therefore, they are continuously evolving in order to fix the problems and provide valid knowledge. Moreover, many problems in ontology evolution often originate from incomplete knowledge about the given domain. As our knowledge improves, the related definitions in the ontologies will be altered. This problem is inadequately addressed by available tools and algorithms, mostly due to the lack of suitable knowledge representation formalisms to deal with temporal abstract notations, and the overreliance on human factors. Also, most of the current approaches have focused on changes within the internal structure of ontologies, and interactions with other existing ontologies have been widely neglected. In this research, after revealing and classifying some of the common alterations in a number of popular biomedical ontologies, we present a novel agent-based framework, RLR (Represent, Legitimate, and Reproduce), to semi-automatically manage the evolution of bio-ontologies, with emphasis on the FungalWeb Ontology, with minimal human intervention. RLR assists and guides ontology engineers through the change management process in general, and aids in tracking and representing the changes, particularly through the use of category theory. Category theory has been used as a mathematical vehicle for modeling changes in ontologies and representing agents' interactions, independent of any specific choice of ontology language or particular implementation.
We have also employed rule-based hierarchical graph transformation techniques to propose a more specific semantics for analyzing ontological changes and transformations between different versions of an ontology, as well as tracking the effects of a change at different levels of abstraction. Thus, the RLR framework enables one to manage changes in ontologies, not as standalone artifacts in isolation, but in contact with other ontologies in an openly distributed semantic web environment. The emphasis upon generality and abstractness makes RLR more feasible in the multi-disciplinary domain of biomedical ontology change management.
Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress
Objective: To perform a review of recent research in clinical data reuse or secondary use, and envision future advances in this field. Methods: The review is based on a large literature search in MEDLINE (through PubMed), conference proceedings, and the ACM Digital Library, focusing only on research published between 2005 and early 2016. Each selected publication was reviewed by the authors, and a structured analysis and summarization of its content was developed. Results: The initial search produced 359 publications, reduced after a manual examination of abstracts and full publications. The following aspects of clinical data reuse are discussed: motivations and challenges, privacy and ethical concerns, data integration and interoperability, data models and terminologies, unstructured data reuse, structured data mining, clinical practice and research integration, and examples of clinical data reuse (quality measurement and learning healthcare systems). Conclusion: Reuse of clinical data is a fast-growing field recognized as essential to realize the potential for high quality healthcare, improved healthcare management, reduced healthcare costs, population health management, and effective clinical research.
A Colour Wheel to Rule them All: Analysing Colour & Geometry in Medical Microscopy
Personalized medicine is a rapidly growing field in healthcare that aims to customize
medical treatments and preventive measures based on each patient’s unique characteristics,
such as their genes, environment, and lifestyle factors. This approach
acknowledges that people with the same medical condition may respond differently
to therapies and seeks to optimize patient outcomes while minimizing the risk
of adverse effects.
To achieve these goals, personalized medicine relies on advanced technologies,
such as genomics, proteomics, metabolomics, and medical imaging. Digital
histopathology, a crucial aspect of medical imaging, provides clinicians with valuable
insights into tissue structure and function at the cellular and molecular levels. By
analyzing small tissue samples obtained through minimally invasive techniques, such
as biopsy or aspirate, doctors can gather extensive data to evaluate potential diagnoses
and clinical decisions. However, digital analysis of histology images presents
unique challenges, including the loss of 3D information and stain variability, which
is further complicated by sample variability. Limited access to data exacerbates
these challenges, making it difficult to develop accurate computational models for
research and clinical use in digital histology.
Deep learning (DL) algorithms have shown significant potential for improving the
accuracy of Computer-Aided Diagnosis (CAD) and personalized treatment models,
particularly in medical microscopy. However, factors such as limited generalizability,
lack of interpretability, and bias sometimes hinder their clinical impact. Furthermore,
the inherent variability of histology images complicates the development of robust DL
methods. Thus, this thesis focuses on developing new tools to address these issues.
Our essential objective is to create transparent, accessible, and efficient methods
based on classical principles from various disciplines, including histology, medical
imaging, mathematics, and art, to tackle microscopy image registration and colour
analysis successfully. These methods can contribute significantly to the advancement
of personalized medicine, particularly in studying the tumour microenvironment
for diagnosis and therapy research.
First, we introduce a novel automatic method for colour analysis and non-rigid
histology registration, enabling the study of heterogeneity morphology in tumour
biopsies. This method achieves accurate tissue cut registration, drastically reducing
landmark distance and achieving excellent border overlap.
Second, we introduce ABANICCO, a novel colour analysis method that combines
geometric analysis, colour theory, fuzzy colour spaces, and multi-label systems
for automatically classifying pixels into a set of conventional colour categories.
ABANICCO outperforms benchmark methods in accuracy and simplicity. It is
computationally straightforward, making it useful in scenarios involving changing
objects, limited data, unclear boundaries, or when users lack prior knowledge of
the image or colour theory. Moreover, results can be modified to match each
particular task.
Third, we apply the acquired knowledge to create a novel pipeline of rigid
histology registration and ABANICCO colour analysis for the in-depth study of
triple-negative breast cancer biopsies. The resulting heterogeneity map and tumour
score provide valuable insights into the composition and behaviour of the tumour,
informing clinical decision-making and guiding treatment strategies.
Finally, we consolidate the developed ideas into an efficient pipeline for tissue
reconstruction and multi-modality data integration on Tuberculosis infection data.
This enables accurate element distribution analysis to better understand interactions
between bacteria, host cells, and the immune system during the course of infection.
The methods proposed in this thesis represent a transparent approach to computational
pathology, addressing the needs of medical microscopy registration and
colour analysis while bridging the gap between clinical practice and computational
research. Moreover, our contributions can help develop and train better, more
robust DL methods.
At a time when personalized medicine is revolutionizing healthcare, it is
increasingly important to tailor treatments and preventive measures to each
patient's genetic makeup, environment, and lifestyle. By employing advanced
technologies such as genomics, proteomics, metabolomics, and medical imaging,
personalized medicine strives to streamline treatment in order to improve outcomes
and reduce side effects.
Medical microscopy, a crucial aspect of personalized medicine, allows clinicians
to collect and analyze large amounts of data from small tissue samples. This is
especially relevant in oncology, where cancer therapies can be optimized according
to the specific tissue appearance of each tumour. Computational pathology, a
subfield of computer vision, seeks to create algorithms for the digital analysis of
biopsies. However, before a computer can analyze medical microscopy images,
several steps are needed to obtain images of the samples.
The first stage consists of collecting and preparing a tissue sample from the
patient. So that it can be easily observed under the microscope, it is cut into
ultra-thin sections. However, this delicate procedure is not without difficulties.
The fragile tissues can become distorted, torn, or perforated, compromising the
overall integrity of the sample.
Once the tissue is properly prepared, it is usually treated with stains of
characteristic colours. These stains highlight different cell and tissue types with
specific colours, making it easier for medical professionals to identify particular
features. However, this improvement in visualization comes at a high cost. Stains
can sometimes hinder computer analysis of the images by mixing improperly,
bleeding into the background, or altering the contrast between the different elements.
The last step of the process is digitizing the sample. High-resolution images
of the tissue are taken at different magnifications, enabling computer analysis.
This stage also has its obstacles. Factors such as incorrect camera calibration or
inadequate lighting conditions can distort or blur the images. In addition, the
whole-slide images obtained are of considerable size, further complicating the analysis.
Overall, although the preparation, staining, and digitization of medical
microscopy samples are fundamental to digital analysis, each of these steps can
introduce additional challenges that must be addressed to ensure accurate analysis.
Moreover, converting a complete tissue volume into a few stained sections
drastically reduces the available 3D information and introduces great uncertainty.
Deep learning (DL) solutions are very promising in the field of personalized
medicine, but their clinical impact is sometimes hindered by factors such as limited
generalizability, overfitting, opacity, and lack of interpretability, in addition to
ethical concerns and, in some cases, private incentives. Furthermore, the variability
of histological images complicates the development of robust DL methods. To
overcome these challenges, this thesis presents a series of highly robust and
interpretable methods based on classical principles from histology, medical imaging,
mathematics, and art, for aligning microscopy sections and analyzing their colours.
Our first contribution is ABANICCO, an innovative colour analysis method that
provides objective, unsupervised colour segmentation and allows subsequent
refinement through easy-to-use tools. The accuracy and effectiveness of ABANICCO
have been shown to be superior to those of existing colour classification and
segmentation methods, and it even excels at detecting and segmenting whole
objects. ABANICCO can be applied to microscopy images to detect stained areas
for biopsy quantification, a crucial aspect of cancer research.
The second contribution is an automatic, unsupervised tissue segmentation
method that identifies and removes the background and artefacts from microscopy
images, thereby improving the performance of more sophisticated image analysis
techniques. This method is robust across diverse images, stains, and acquisition
protocols, and requires no training.
The third contribution is the development of novel methods for registering
histopathological images effectively, striking the right balance between accurate
registration and the preservation of local morphology, depending on the intended
application.
As a fourth contribution, the three aforementioned methods are combined to
create efficient procedures for the complete integration of volumetric data,
producing highly interpretable visualizations of all the information present in
consecutive tissue biopsy sections. This data integration can have a major impact
on the diagnosis and treatment of various diseases, particularly breast cancer, by
enabling early detection, accurate clinical testing, effective treatment selection,
and improved communication and engagement with patients.
Finally, we apply our findings to multimodal data integration and tissue
reconstruction for the accurate analysis of the distribution of chemical elements
in tuberculosis, shedding light on the complex interactions between bacteria, host
cells, and the immune system during tuberculous infection. This method also
addresses problems such as acquisition damage, typical of many imaging modalities.
In summary, this thesis demonstrates the application of classical computer
vision methods to medical microscopy registration and colour analysis in order to
address the unique challenges of this field, emphasizing effective and easy
visualization of complex data. We aim to keep refining our work with extensive
technical validation and improved data analysis. The methods presented in this
thesis are characterized by their clarity, accessibility, effective data visualization,
objectivity, and transparency. These characteristics make them ideal for building
robust bridges between artificial intelligence researchers and clinicians, and thus
for advancing computational pathology in medical practice and research.
Doctoral Programme in Biomedical Science and Technology, Universidad Carlos III de Madrid. Chair: María Jesús Ledesma Carbayo. Secretary: Gonzalo Ricardo Ríos Muñoz. Member: Estíbaliz Gómez de Marisca
Knowledge representation and text mining in biomedical, healthcare, and political domains
Knowledge representation and text mining can be employed to discover new knowledge and develop services by using the massive amounts of text gathered by modern information systems. The applied methods should take into account the domain-specific nature of knowledge. This thesis explores knowledge representation and text mining in three application domains.
Biomolecular events can be described very precisely and concisely with appropriate representation schemes. Protein–protein interactions are commonly modelled in biological databases as binary relationships, whereas the complex relationships used in text mining are rich in information. The experimental results of this thesis show that complex relationships can be reduced to binary relationships and that it is possible to reconstruct complex relationships from mixtures of linguistically similar relationships. This encourages the extraction of complex relationships from the scientific literature even if binary relationships are required by the application at hand. The experimental results on cross-validation schemes for pair-input data help to understand how existing knowledge regarding dependent instances (such as those concerning protein–protein pairs) can be leveraged to improve the generalisation performance estimates of learned models.
Healthcare documents and news articles contain knowledge that is more difficult to model than biomolecular events and tend to have larger vocabularies than biomedical scientific articles. This thesis describes an ontology that models patient education documents and their content in order to improve the availability and quality of such documents. The experimental results of this thesis also show that the Recall-Oriented Understudy for Gisting Evaluation measures are a viable option for the automatic evaluation of textual patient record summarisation methods and that the area under the receiver operating characteristic curve can be used in a large-scale sentiment analysis. The sentiment analysis of Reuters news corpora suggests that the Western mainstream media portrays China negatively in politics-related articles but not in general, which provides new evidence to consider in the debate over the image of China in the Western media
Theory and Applications for Advanced Text Mining
Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and it is reasonable to expect that these data contain useful knowledge. Text mining techniques for extracting that knowledge have been studied intensively since the late 1990s. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to techniques for under-resourced or less-resourced languages. I believe that this book will contribute new knowledge to the text mining field and help many readers open up new research fields.
Analysis of Atrial Electrograms
This work provides methods to measure and analyze features of atrial electrograms - especially complex fractionated atrial electrograms (CFAEs) - mathematically. Automated classification of CFAEs into clinically meaningful classes is applied and the newly gained electrogram information is visualized on patient-specific 3D models of the atria. Clinical applications of the presented methods showed that quantitative measures of CFAEs reveal beneficial information about the underlying arrhythmia.