Search CORE

Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective

Author: Luo Yuan
Szolovits Peter
Publication date: 14/11/2018

This paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard, customized and in-house-developed NLP tools. Our system facilitates portability across different institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize necessary data elements. It utilizes UMLS to perform domain adaptation when integrating generic-domain NLP tools. It also features stand-off annotations that are specified by positional reference to the original document. We built an interval-tree-based search engine to efficiently query and retrieve the stand-off annotations by specifying positional requirements. We also developed a utility to convert an inline annotation format to stand-off annotations, enabling the reuse of clinical text datasets with inline annotations. We experimented with our system on several NLP-facilitated tasks, including computational phenotyping for lymphoma patients and semantic relation extraction for clinical notes. These experiments showcased the broader applicability and utility of LAPNLP. Comment: 6 pages, accepted by IEEE BIBM 2018 as a regular paper

arXiv.org e-Print Archive

DSpace@MIT

Crossref
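LAPNLP retrieves stand-off annotations by position using an interval tree. The Python sketch below illustrates that general idea with a centered interval tree that answers "which annotations overlap this character span?" queries. The `Annotation` and `IntervalTree` names, the half-open `[start, end)` offset convention and the overlap semantics are assumptions for illustration; this is not the paper's Lisp implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-off annotation: a label on the half-open span [start, end)."""
    start: int
    end: int
    label: str


class IntervalTree:
    """Centered interval tree over annotations (assumes start < end for each one)."""

    def __init__(self, annotations: List[Annotation]):
        if not annotations:
            raise ValueError("need at least one annotation")
        # Lower median of all endpoints: guarantees progress at every level
        # (either some annotation spans the center or both sides shrink).
        points = sorted(p for a in annotations for p in (a.start, a.end))
        self.center = points[len(points) // 2 - 1]
        here = [a for a in annotations if a.start <= self.center < a.end]
        left = [a for a in annotations if a.end <= self.center]
        right = [a for a in annotations if a.start > self.center]
        # Annotations spanning the center, sorted two ways for early-exit scans.
        self.by_start = sorted(here, key=lambda a: a.start)
        self.by_end_desc = sorted(here, key=lambda a: a.end, reverse=True)
        self.left: Optional["IntervalTree"] = IntervalTree(left) if left else None
        self.right: Optional["IntervalTree"] = IntervalTree(right) if right else None

    def overlapping(self, qs: int, qe: int) -> List[Annotation]:
        """Return all annotations overlapping the half-open query span [qs, qe)."""
        hits: List[Annotation] = []
        self._collect(qs, qe, hits)
        return sorted(hits, key=lambda a: (a.start, a.end))

    def _collect(self, qs: int, qe: int, hits: List[Annotation]) -> None:
        if qe <= self.center:
            # Query lies left of the center: a centered annotation overlaps iff it starts before qe.
            for a in self.by_start:
                if a.start >= qe:
                    break
                hits.append(a)
            if self.left is not None:
                self.left._collect(qs, qe, hits)
        elif qs > self.center:
            # Query lies right of the center: a centered annotation overlaps iff it ends after qs.
            for a in self.by_end_desc:
                if a.end <= qs:
                    break
                hits.append(a)
            if self.right is not None:
                self.right._collect(qs, qe, hits)
        else:
            # Query contains the center, so every centered annotation overlaps it.
            hits.extend(self.by_start)
            if self.left is not None:
                self.left._collect(qs, qe, hits)
            if self.right is not None:
                self.right._collect(qs, qe, hits)
```

For example, with annotations on spans `[0, 7)`, `[12, 20)` and `[15, 20)`, the query `overlapping(14, 16)` returns the last two. Storing the centered annotations sorted both by start and by end lets each node stop scanning as soon as no further annotation can overlap the query.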

WikiLinkGraphs: A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks

Author: Consonni Cristian
Laniado David
Montresor Alberto
Publication date: 04/04/2019

Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the 9 largest language editions. The dataset contains yearly snapshots of the network and spans 17 years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph, which also includes links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and representing all and only the links intentionally added by editors. We describe in detail how the Wikipedia dumps have been processed and the challenges we have encountered, including the need to handle special pages such as redirects, i.e., alternative article titles. We present descriptive statistics of several snapshots of this network. Finally, we propose several research opportunities that can be explored using this new dataset. Comment: 10 pages, 3 figures, 7 tables, LaTeX. Final camera-ready version accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM 2019) - Munich (Germany), 11-14 June 2019

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
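The abstract describes extracting wikilinks from article text and handling redirects. A minimal Python sketch of those two steps on raw wikitext follows. The regular expression, the simplified title normalization, the `SKIPPED_NAMESPACES` set and the `redirects` mapping are all assumptions for illustration and do not reproduce the WikiLinkGraphs authors' parser. Templates are not expanded, so links that only template expansion would generate are not produced, although links written literally inside template parameters would still be matched.

```python
import re
from typing import Dict, List

# Matches [[Target]], [[Target|anchor text]] and [[Target#Section|...]];
# group 1 captures the target title only.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Hypothetical, non-exhaustive list of namespaces whose links are not article links.
SKIPPED_NAMESPACES = {"File", "Image", "Category"}


def normalize_title(raw: str) -> str:
    """Simplified normalization: underscores become spaces, runs of
    whitespace collapse, and the first character is capitalized."""
    title = " ".join(raw.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def extract_links(wikitext: str) -> List[str]:
    """Return normalized link targets in order of appearance."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        target = normalize_title(match.group(1))
        prefix = target.split(":", 1)[0] if ":" in target else ""
        if target and prefix not in SKIPPED_NAMESPACES:
            links.append(target)
    return links


def resolve_redirect(title: str, redirects: Dict[str, str]) -> str:
    """Follow a chain of redirects to its final title, stopping on cycles."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title
```

For instance, `extract_links("See [[Alan Turing|Turing]] and [[computer_science]].")` yields `["Alan Turing", "Computer science"]`. Checking only known namespace prefixes, rather than rejecting every title containing a colon, keeps article titles such as "Star Wars: Episode IV".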

Automatic Population of Structured Reports from Narrative Pathology Reports

Author: Ou Ying
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2015

There are a number of advantages to the use of structured pathology reports: they can ensure the accuracy and completeness of pathology reporting, and they make it easier for referring doctors to glean pertinent information. The goal of this thesis is to extract pertinent information from free-text pathology reports, automatically populate structured reports for cancer diseases, and identify the commonalities and differences in processing principles needed to obtain maximum accuracy. Three pathology corpora were annotated in this study with entities and the relationships between them: the melanoma corpus, the colorectal cancer corpus and the lymphoma corpus. A supervised machine-learning-based approach, utilising conditional random field learners, was developed to recognise medical entities in the corpora. Through feature engineering, the best feature configurations were attained, which significantly boosted the F-scores, with gains of 4.2% to 6.8% on the training sets. Without proper negation and uncertainty detection, the quality of the structured reports would be diminished, so negation and uncertainty detection modules were built to handle this problem. The modules obtained overall F-scores ranging from 76.6% to 91.0% on the test sets. A relation extraction system was developed to extract four relations from the lymphoma corpus. The system achieved very good performance on the training set, with a 100% F-score obtained by the rule-based module and a 97.2% F-score attained by the support vector machine classifier. Rule-based approaches were used to generate the structured outputs and populate them into predefined templates. The rule-based system attained F-scores above 97% on the training sets. A pipeline system was implemented by assembling all the components described above. It achieved promising results in the end-to-end evaluations, with F-scores of 86.5%, 84.2% and 78.9% on the melanoma, colorectal cancer and lymphoma test sets, respectively.

Sydney eScholarship
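The results above are reported as F-scores, the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). The short Python sketch below computes an entity-level F-score under exact span matching. The `(start, end, type)` span representation and the exact-match criterion are assumptions, since the abstract does not say how predicted entities were matched against the gold annotations.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, entity type).
Span = Tuple[int, int, str]


def prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 for one set of predictions."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With three gold entities and two predictions of which one is correct, P = 1/2, R = 1/3 and F1 = 2(1/2)(1/3) / (5/6) = 0.4.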

A proposal for a coordinated effort for the determination of brainwide neuroanatomical connectivity in model organisms at a mesoscopic scale

Author: A MacKenzie-Graham
A Reiner
A Vercelli
A Visel
Allan Jones
AM Hattox
Arthur W. Toga
AW Toga
AY Hardan
B Egaas
B Horwitz
BL Davidson
Brett D. Mensh
Bruce W. Stillman
C Gustafson
C Kobbert
Caizhi Wu
CL Veenman
Claus C. Hilgetag
Clifford B. Saper
CR Gerfen
D Atasoy
DA Benson
Daniel G. Herrera
David C. Van Essen
David Kleinfeld
DC Van Essen
DL Sparks
E Miyashita
ED Jarvis
Edward G. Jones
EM Callaway
ES Lein
ET Bullmore
F Castelli
F Crick
G Aston-Jones
H Markram
Hans C. Breiter
Harvey J. Karten
HC Breiter
Helen Barbas
Hemant Bokil
Henry A. Lester
Hollis T. Cline
IR Wickersham
J DeFalco
J Dejerine
J Panksepp
Jaak Panksepp
James D. Watson
Jason W. Bohland
JD Schmahmann
Jeremy D. Schmahmann
JF Démonet
JG Bjaalie
JG White
JL Lanciego
JM Lin
John C. Doyle
John M. Lin
Joseph L. Price
Joseph Safdieh
K Oishi
K Wernicke
Karel Svoboda
KE Stephan
L Ng
L Stein
Larry W. Swanson
LM Coolen
M Bota
M Murias
MA Just
MD Johnson
MI Ekstrand
Michael Hawrylycz
Mihail Bota
MJ Swift
N Geschwind
Nicholas D. Schiff
O Sporns
Olaf Sporns
Partha P. Mitra
Peter J. Freed
PH Luppi
PJ Broser
R Kotter
Ralph J. Greenspan
RH Güting
RM Kelly
Rolf Kötter
RW Baughman
S Folstein
S Lillehaug
S Mikula
Shawn Mikula
Suzanne N. Haber
U Burgel
U Frith
V Grinevich
Z. Josh Huang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2009

In this era of complete genomes, our knowledge of neuroanatomical circuitry remains surprisingly sparse. Such knowledge is, however, critical for both basic and clinical research into brain function. Here we advocate for a concerted effort to fill this gap, through systematic, experimental mapping of neural circuits at a mesoscopic scale of resolution suitable for comprehensive, brain-wide coverage, using injections of tracers or viral vectors. We detail the scientific and medical rationale and briefly review existing knowledge and experimental techniques. We define a set of desiderata, including brain-wide coverage; validated and extensible experimental techniques suitable for standardization and automation; a centralized, open-access data repository; compatibility with existing resources; and tractability with current informatics technology. We discuss a hypothetical but tractable plan for mouse, additional efforts for the macaque, and technique development for human. We estimate that the mouse connectivity project could be completed within five years with a comparatively modest budget. Comment: 41 pages

Cold Spring Harbor Laboratory Institutional Repository

Boston University Institutional Repository (OpenBU)

Directory of Open Access Journals

Caltech Authors

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Harvard University - DASH

PubMed Central

Proceedings, MSVSCC 2018

Author: Old Dominion University Department of Modeling, Simulation & Visualization Engineering
Old Dominion University Virginia Modeling, Analysis & Simulation Center
Publication venue: ODU Digital Commons
Publication date: 19/04/2018

Proceedings of the 12th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 19, 2018 at VMASC in Suffolk, Virginia. 155 pp

Old Dominion University

Extraction of information from unstructured text

Publication venue: Office of Scientific and Technical Information (OSTI)

Crossref

Theory and Applications for Advanced Text Mining

Publication venue: IntechOpen
Publication date: 20/04/2021

Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and it is reasonable to believe that these data contain useful knowledge. Text mining techniques for extracting that knowledge have been studied intensively since the late 1990s. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to methods for under- or less-resourced languages. I believe that this book will provide new knowledge to the text mining field and help many readers open up new research fields.

Directory of Open Access Books (DOAB)

Integrating Multiple Sketch Recognition Methods to Improve Accuracy and Speed

Author: Copesetty Siddhartha Karthik
Publication date: 27/02/2020

Sketch recognition is the computer understanding of hand-drawn diagrams. Recognizing sketches instantaneously is necessary to build beautiful interfaces with real-time feedback. There are various techniques to quickly classify sketches into ten or twenty classes. However, for much larger datasets of sketches from a large number of classes, these existing techniques can take an extended period of time to accurately classify an incoming sketch and require significant computational overhead. Thus, to make classification of large datasets feasible, we propose using multiple stages of recognition. In the initial stage, gesture-based feature values are calculated and the trained model is used to classify the incoming sketch. Sketches whose accuracy falls below a threshold value go through a second stage of geometric recognition techniques. In this second, geometric stage, the sketch is segmented and sent to shape-specific recognizers. The sketches are matched against predefined shape descriptions, and confidence values are calculated. The system outputs a list of classes that the sketch could be classified as, along with the accuracy and precision for each sketch. This process both significantly reduces the time taken to classify such huge datasets of sketches and increases the accuracy and precision of the recognition.

Texas A&M Repository
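The two-stage control flow described above can be sketched as a confidence-gated cascade. In this Python sketch both recognizers are placeholders passed in as functions, and using the first stage's top confidence as the gating value is an assumption, since the abstract refers to it only as an accuracy below a threshold; the gesture-feature model and the geometric shape recognizers themselves are not shown.

```python
from typing import Callable, List, Tuple

Ranking = List[Tuple[str, float]]       # (class label, confidence) pairs
Recognizer = Callable[[object], Ranking]  # hypothetical recognizer interface


def cascade_classify(sketch: object, gesture_stage: Recognizer,
                     geometric_stage: Recognizer,
                     threshold: float) -> Tuple[str, Ranking]:
    """Run the cheap gesture-based stage first; fall back to the slower
    geometric stage only when the best first-stage confidence is too low."""
    ranking = sorted(gesture_stage(sketch), key=lambda r: r[1], reverse=True)
    if ranking and ranking[0][1] >= threshold:
        return "gesture", ranking
    # Low confidence: segment and match against shape descriptions instead.
    fallback = sorted(geometric_stage(sketch), key=lambda r: r[1], reverse=True)
    return "geometric", fallback
```

The speed benefit comes from the gate: confidently recognized sketches never reach the more expensive geometric stage, so its cost is paid only for the ambiguous cases.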

Strengths and limitations of the draft classification of public health interventions in the World Health Organization’s International Classification of Health Interventions: A developmental appraisal

Author: Fortune Elizabeth Nicola
Publication venue: University of Sarajevo Faculty of Health Sciences
Publication date: 31/08/2018

Statistical classifications provide a basis for collecting and analysing data, for building knowledge and for communicating information. A classification of public health interventions is being developed as part of the World Health Organization’s International Classification of Health Interventions (ICHI). This is a pioneering development, as there have been no previous efforts to produce a standard classification of public health interventions. A comprehensive developmental appraisal of the draft classification of public health interventions was undertaken to gain an understanding of its strengths and limitations, and to identify what should be done to improve its utility. The classification was used to code three data sets of public health interventions, to identify problems encountered and to assess inter-coder reliability. Views of potential users were elicited through key-informant interviews. An analytical structure was developed, comprising a set of criteria concerning the desired features of a statistical classification and a model representing the main elements that make up a statistical classification. ICHI was found to have some utility for representing data on public health interventions. Limitations identified included coverage gaps, overlap between categories, lack of clarity concerning how the classification axes are operationalised for public health interventions, and difficulty splitting complex interventions into their constituent components for coding. This study makes a significant and timely contribution to the development of the draft classification by providing specific proposals for improvements to ICHI, explicating some fundamental conceptual issues that should be addressed, and indicating a path forward for the further development and use of ICHI in the field of public health. The analytical structure developed through the conduct of this research represents a novel methodological contribution to the field of classification development.

Sydney eScholarship
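The appraisal assessed inter-coder reliability, but the abstract does not name the statistic used. Cohen's kappa is one common choice for two coders: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from each coder's code frequencies. The Python sketch below is illustrative only and assumes each coder assigns exactly one code per intervention.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who each assigned one code per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement if each coder drew codes independently from their
    # own marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

For example, if two coders agree on 3 of 4 items (p_o = 0.75) and their code frequencies give p_e = 0.5, then kappa = 0.25 / 0.5 = 0.5.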