Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus
The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistent punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
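Shallow chunk annotations of the kind described above are commonly encoded for statistical chunkers as per-token BIO tags (Begin, Inside, Outside). The sketch below assumes that encoding and uses hypothetical chunk labels (`NP`, `VP`) rather than the actual Harvey annotation scheme; the function name `bio_to_chunks` is illustrative. It shows how a predicted tag sequence maps back to chunk spans.

```python
from typing import List, Optional, Tuple

def bio_to_chunks(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode BIO tags into (label, start, end) spans; `end` is exclusive.

    Assumes the common BIO convention; an I- tag that does not continue the
    current chunk is leniently treated as the start of a new chunk.
    """
    chunks: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            # Close the chunk that is currently open, if any.
            if label is not None:
                chunks.append((label, start, i))
                label = None
            # B- (or an orphan I-) opens a new chunk; "O" opens nothing.
            if tag.startswith(("B-", "I-")):
                label, start = tag[2:], i
    if label is not None:
        chunks.append((label, start, len(tags)))
    return chunks

# Hypothetical telegraphic note "pt c/o sore throat" with illustrative labels.
tokens = ["pt", "c/o", "sore", "throat"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP"]
print(bio_to_chunks(tags))  # [('NP', 0, 1), ('VP', 1, 2), ('NP', 2, 4)]
```

Span-level decoding like this is what chunk-level precision and recall are usually computed on, so token accuracy and chunk accuracy can differ.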
Group Analysis of Self-organizing Maps based on Functional MRI using Restricted Frechet Means
Studies of functional MRI data are increasingly concerned with the estimation
of differences in spatio-temporal networks across groups of subjects or
experimental conditions. Unsupervised clustering and independent component
analysis (ICA) have been used to identify such spatio-temporal networks. While
these approaches have been useful for estimating these networks at the
subject-level, comparisons over groups or experimental conditions require
further methodological development. In this paper, we tackle this problem by
showing how self-organizing maps (SOMs) can be compared within a Frechean
inferential framework. Here, we summarize the mean SOM in each group as a
Frechet mean with respect to a metric on the space of SOMs. We consider the use
of different metrics, and introduce two extensions of the classical sum of
minimum distance (SMD) between two SOMs, which take into account the
spatio-temporal pattern of the fMRI data. The validity of these methods is
illustrated on synthetic data. Through these simulations, we show that the
three metrics of interest behave as expected, in the sense that the ones
capturing temporal, spatial and spatio-temporal aspects of the SOMs are more
likely to reach significance under simulated scenarios characterized by
temporal, spatial and spatio-temporal differences, respectively. In addition, a
re-analysis of a classical experiment on visually-triggered emotions
demonstrates the usefulness of this methodology. In this study, the
multivariate functional patterns typical of the subjects exposed to pleasant
and unpleasant stimuli are found to be more similar than the ones of the
subjects exposed to emotionally neutral stimuli. Taken together, these results
indicate that our proposed methods can cast new light on existing data by
adopting a global analytical perspective on functional MRI paradigms.
Comment: 23 pages, 5 figures, 4 tables. Submitted to Neuroimag
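The classical SMD and a "restricted" Fréchet mean can be sketched as follows. This is a minimal sketch under several assumptions: each SOM is summarised only by its prototype (codebook) vectors, distances are Euclidean, SMD uses one common symmetric normalisation, and "restricted" is read as searching for the minimiser only among the observed maps. The names `smd` and `restricted_frechet_mean` are hypothetical, and the paper's spatial and spatio-temporal SMD extensions are not implemented here.

```python
import math
from typing import List, Sequence

# A SOM is represented here only by its list of codebook (prototype) vectors.
Som = List[Sequence[float]]

def smd(a: Som, b: Som) -> float:
    """Symmetric sum of minimum distances (SMD) between two SOMs.

    Each prototype is matched to its nearest prototype in the other map; the two
    directed averages are then averaged. This normalisation is an assumption.
    """
    a_to_b = sum(min(math.dist(u, v) for v in b) for u in a) / len(a)
    b_to_a = sum(min(math.dist(v, u) for u in a) for v in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)

def restricted_frechet_mean(maps: List[Som]) -> int:
    """Return the index of the sample SOM minimising the sum of squared SMDs."""
    costs = [sum(smd(m, other) ** 2 for other in maps) for m in maps]
    return min(range(len(maps)), key=lambda i: costs[i])

# Toy 2-D prototypes (hypothetical, not fMRI data).
m1 = [(0.0, 0.0), (1.0, 0.0)]
m2 = [(0.0, 0.0), (1.0, 1.0)]
m3 = [(0.0, 0.0), (2.0, 0.0)]
print(restricted_frechet_mean([m1, m2, m3]))  # 0
```

Restricting the search to observed maps avoids optimising over the whole space of SOMs, at the cost of the mean always being one of the subjects' maps.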
Precise Proximal Femur Fracture Classification for Interactive Training and Surgical Planning
We demonstrate the feasibility of a fully automatic computer-aided diagnosis
(CAD) tool, based on deep learning, that localizes and classifies proximal
femur fractures on X-ray images according to the AO classification. The
proposed framework aims to improve patient treatment planning and provide
support for the training of trauma surgeon residents. A database of 1347
clinical radiographic studies was collected. Radiologists and trauma surgeons
annotated all fractures with bounding boxes, and provided a classification
according to the AO standard. The proposed CAD tool for the classification of
radiographs into types "A", "B" and "not-fractured" reaches an F1-score of 87%
and an AUC of 0.95; when classifying fractured versus not-fractured cases, these
improve to 94% and 0.98, respectively. Prior localization of the fracture
improves performance with respect to full-image classification. All (100%) of
the predicted centers of the region of interest are contained in the manually
provided bounding boxes. The system retrieves on average 9 relevant images (from
the same class) out of 10 cases. Our CAD scheme localizes, detects and further
classifies proximal femur fractures, achieving results comparable to
expert-level and state-of-the-art performance. Our auxiliary localization model
was highly accurate at predicting the region of interest in the radiograph. We
further investigated several verification strategies for its adoption into the
daily clinical routine. A sensitivity analysis of the ROI size and image
retrieval as a clinical use case are also presented.
Comment: Accepted at IPCAI 2020 and IJCAR
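For reference, the F1-score reported above is the harmonic mean of precision and recall. The abstract does not state how the three-class score is averaged, so the sketch below assumes unweighted (macro) averaging over classes; the function names and toy labels are hypothetical and are not study data.

```python
from typing import Hashable, Sequence

def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 positive: Hashable) -> float:
    """One-vs-rest F1 for a single class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # also avoids dividing by zero when nothing was predicted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)

# Toy predictions over the three classes named in the abstract.
y_true = ["A", "A", "B", "not-fractured"]
y_pred = ["A", "B", "B", "not-fractured"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Micro or support-weighted averaging would give different numbers on imbalanced data, which is why the averaging scheme matters when comparing reported F1-scores.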
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and makes it possible to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas is described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight into how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research.
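The bag/label setup described above is easiest to see under the so-called standard MI assumption, in which a bag is positive if and only if at least one of its instances is positive. The sketch assumes an instance-level scoring function is already available; the `instance_score` callable, the 0.5 threshold, and the function names are illustrative. Other assumptions exist and may suit other problem types better.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def bag_score(bag: Sequence[T], instance_score: Callable[[T], float]) -> float:
    """Max-pool instance scores: the bag is as positive as its most positive instance."""
    if not bag:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_score(x) for x in bag)

def bag_label(bag: Sequence[T], instance_score: Callable[[T], float],
              threshold: float = 0.5) -> int:
    """Standard MI assumption: positive iff some instance crosses the threshold."""
    return int(bag_score(bag, instance_score) >= threshold)

# Toy bags whose instances are already probabilities (hypothetical data).
bags = [[0.1, 0.2, 0.3], [0.1, 0.9], [0.4]]
print([bag_label(b, lambda p: p) for b in bags])  # [0, 1, 0]
```

Only the bag label is needed for training under this view, which is what lets weakly labeled data be used; the instance labels remain ambiguous.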
Characterizing Health-Related Information Needs of Domain Experts (regular paper)
International audience. In the information retrieval literature, understanding the intents behind users' queries is critically important for gaining insight into how to select relevant results. While many studies have investigated how users in general carry out exploratory health searches in digital environments, few have focused on how queries are formulated, specifically by domain expert users. This study aims to fill this gap by studying 173 health expert queries issued from 3 medical information retrieval tasks within 2 different evaluation campaigns. A statistical analysis was carried out to study both the variation and the correlation of health-query attributes, such as length, clarity and specificity, of clinical and non-clinical queries. The knowledge gained from the study has an immediate impact on the design of future health information seeking systems.
Analysis of biomedical and health queries: Lessons learned from TREC and CLEF evaluation benchmarks
International audience. BACKGROUND: Inherited ichthyoses represent a group of rare skin disorders characterized by scaling, hyperkeratosis and inconstant erythema, involving most of the tegument. Their epidemiology remains poorly described. This study aims to evaluate the prevalence of inherited ichthyosis (excluding very mild forms) and of its different clinical forms in France. METHODS: The capture-recapture method was used for this study. In accordance with statistical requirements, 3 different lists (reference/competence centres, the French association of patients with ichthyosis, and an internet network) were used to record such patients. The study was conducted in 5 areas during a closed period. RESULTS: The prevalence was estimated at 13.3 per million people (/M) (95% CI [10.9-17.6]). For autosomal recessive congenital ichthyosis, the prevalence was estimated at 7/M (95% CI [5.7-9.2]), with prevalences of lamellar ichthyosis and congenital ichthyosiform erythroderma of 4.5/M (95% CI [3.7-5.9]) and 1.9/M (95% CI [1.6-2.6]), respectively. The prevalence of keratinopathic forms was estimated at 1.1/M (95% CI [0.9-1.5]). The prevalence of syndromic forms (all clinical forms together) was estimated at 1.9/M (95% CI [1.6-2.6]). CONCLUSIONS: Our results constitute a crucial basis for properly sizing the health measures required to improve patient care and for designing further clinical studies.
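To illustrate the capture-recapture idea, the sketch below uses the simplest two-source case with Chapman's bias-corrected estimator. This is a deliberate simplification: the study used three lists, for which log-linear models allowing dependence between sources are typically used, and this code does not reproduce the authors' analysis. The function names, counts, and population size are hypothetical.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected two-source capture-recapture estimate.

    n1, n2: cases recorded by source 1 and source 2; m: cases found in both.
    Assumes independent sources and a closed population.
    """
    if not 0 <= m <= min(n1, n2):
        raise ValueError("overlap m must lie between 0 and min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

def per_million(cases: float, population: int) -> float:
    """Express a case count as a prevalence per million people."""
    return cases / population * 1_000_000

# Hypothetical counts, not taken from the study.
n_hat = chapman_estimate(n1=60, n2=50, m=20)
print(round(n_hat, 1))  # 147.1
print(round(per_million(n_hat, 10_000_000), 2))  # 14.71
```

The smaller the overlap between lists relative to their sizes, the more unrecorded cases the estimator infers, which is why the "closed period" and source-independence assumptions matter.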
A Survey on Deep Learning in Medical Image Analysis
Deep learning algorithms, in particular convolutional networks, have rapidly
become a methodology of choice for analyzing medical images. This paper reviews
the major deep learning concepts pertinent to medical image analysis and
summarizes over 300 contributions to the field, most of which appeared in the
last year. We survey the use of deep learning for image classification, object
detection, segmentation, registration, and other tasks and provide concise
overviews of studies per application area. Open challenges and directions for
future research are discussed.
Comment: Revised survey includes expanded discussion section and reworked
introductory section on common deep architectures. Added missed papers from
before Feb 1st 201
Quantifying the Impact and Extent of Undocumented Biomedical Synonymy
Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases, and we evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through “crowd-sourcing.” Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for “next-generation,” high-coverage lexical terminologies.