Spanish named entity recognition in the biomedical domain
Named Entity Recognition in the clinical domain, and in languages other than English, is hampered by incomplete dictionaries, informal texts, polysemous terms, disagreement over entity boundaries, and the scarcity of corpora and other resources. We present a Named Entity Recognition method for poorly resourced languages. The method was tested on Spanish radiology reports and compared with a conditional random fields system.
Peer Reviewed. Postprint (author's final draft)
Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?
After being collected for patient care, Observational Health Data (OHD) can
further benefit patient well-being by sustaining the development of health
informatics and medical research. Vast potential is unexploited because of the
fiercely private nature of patient-related data and regulations to protect it.
Generative Adversarial Networks (GANs) have recently emerged as a
groundbreaking way to learn generative models that produce realistic synthetic
data. They have revolutionized practices in multiple domains such as
self-driving cars, fraud detection, digital twin simulations in industrial
sectors, and medical imaging.
The digital twin concept could readily apply to modelling and quantifying
disease progression. In addition, GANs possess many capabilities relevant to
common problems in healthcare: lack of data, class imbalance, rare diseases,
and preserving privacy. Unlocking open access to privacy-preserving OHD could
be transformative for scientific research. In the midst of COVID-19, the
healthcare system is facing unprecedented challenges, many of which are
data-related for the reasons stated above.
Considering these facts, publications concerning GANs applied to OHD seemed to
be severely lacking. To uncover the reasons for this slow adoption, we broadly
reviewed the published literature on the subject. Our findings show that the
properties of OHD were initially challenging for the existing GAN algorithms
(unlike medical imaging, for which state-of-the-art models were directly
transferable) and the evaluation of synthetic data lacked clear metrics.
We find more publications on the subject than expected, starting slowly in
2017, and since then at an increasing rate. The difficulties of OHD remain, and
we discuss issues relating to evaluation, consistency, benchmarking, data
modelling, and reproducibility.
Comment: 31 pages (10 in previous version), not including references and
glossary, 51 in total. Inclusion of a large number of recent publications and
expansion of the discussion accordingly
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular standout success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition of the value of utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex,
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.
Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770