Neural networks versus Logistic regression for 30 days all-cause readmission prediction
Heart failure (HF) is one of the leading causes of hospital admissions in the
US. Readmission within 30 days after an HF hospitalization is both a recognized
indicator of disease progression and a source of considerable financial burden
to the healthcare system. Consequently, identifying patients at risk of
readmission is a key step in improving disease management and patient
outcomes. In this work, we used a large administrative claims dataset to
(1) explore the systematic application of neural network-based models versus
logistic regression for predicting 30-day all-cause readmission after
discharge from an HF admission, and (2) examine the additive value of
patients' hospitalization timelines on prediction performance. Based on data
from 272,778 patients (49% female) with a mean (SD) age of 73 (14) years and
343,328 HF admissions (67% of total admissions), we trained and tested our
readmission prediction models following a stratified 5-fold cross-validation
scheme. Among the deep learning approaches, a model combining a recurrent
neural network (RNN) with conditional random fields (CRF) (RNNCRF) achieved the
best readmission prediction performance, with an AUC of 0.642 (95% CI, 0.640-0.645).
Other models, such as those based on RNNs, convolutional neural networks, or
CRFs alone, had lower performance, with a non-timeline-based model (MLP)
performing worst. A competitive model based on logistic regression with LASSO
achieved an AUC of 0.643 (95% CI, 0.640-0.646). We conclude that data from
patient timelines improve 30-day readmission prediction for neural
network-based models, that logistic regression with LASSO performs on par with
the best neural network model, and that administrative data yield performance
competitive with published approaches based on richer clinical datasets.
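As an illustration of the evaluation protocol described above, the sketch below runs a stratified 5-fold cross-validation of an L1-regularized (LASSO) logistic regression and reports the mean AUC across folds. The synthetic feature matrix, labels, and hyperparameters are placeholders, not the claims features or settings used in the study.

```python
# Hedged sketch: stratified 5-fold CV of a LASSO logistic regression scored by AUC.
# The random features and labels below stand in for the administrative claims data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 200))        # placeholder patient features
y = rng.binomial(1, 0.2, size=5000)     # placeholder 30-day readmission labels

aucs = []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True,
                                           random_state=0).split(X, y):
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1,
                               max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    aucs.append(roc_auc_score(y[test_idx],
                              model.predict_proba(X[test_idx])[:, 1]))

print(f"mean AUC over folds: {np.mean(aucs):.3f}")
```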
HIERARCHICAL REPRESENTATION OF OPTICALLY SCANNED DOCUMENTS
The objective of the research to be pursued is to develop a schema for representing raster-digitized (scanned) documents. The representation should retain the spatial structure of a printed document, facilitate automatic labeling of components such as text, figures, subtitles, and figure captions, and allow the extraction of important relationships (such as reading order) among them. Intended applications include (1) data compression for document transmission and archival, and (2) document entry, without rekeying, into editing, formatting, and information retrieval systems.
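A minimal sketch of what such a hierarchical representation could look like, assuming a simple tree of labeled rectangular regions; the label vocabulary and the reading-order field are illustrative, not the schema developed in the research.

```python
# Hedged sketch: a scanned page as a tree of labeled regions with optional reading order.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Region:
    label: str                          # e.g. "page", "text", "figure", "caption"
    bbox: Tuple[int, int, int, int]     # (x0, y0, x1, y1) in pixel coordinates
    reading_order: Optional[int] = None
    children: List["Region"] = field(default_factory=list)

    def add(self, child: "Region") -> "Region":
        self.children.append(child)
        return child

# Example: a page containing a text column (read first) and a figure with its caption.
page = Region("page", (0, 0, 2550, 3300))
page.add(Region("text", (150, 300, 1200, 3000), reading_order=1))
figure = page.add(Region("figure", (1300, 300, 2400, 1100), reading_order=2))
figure.add(Region("caption", (1300, 1120, 2400, 1200)))
```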
TWENTY QUESTIONS FOR DOCUMENT CLASSIFICATION
Documents – manuscripts, books, magazines, newspapers, sheet music, circuit diagrams, checks, web pages, email attachments, music CDs, videos, and cuneiform – mirror the culture of their time and serve as a primary source of the historical record. Although it seems natural to classify documents according to format before examining their content, form and function are often intertwined. The design of a document interpretation system must take both into consideration.
What are the essential parameters of a document interpretation system? What needs to be known before undertaking the design or purchase of such a system? What is the interrelationship of the client, the document, and the desired information? In other words, what is the range of issues of possible interest to our research community? In order to highlight the tacit assumptions in the document analysis literature, we will start with a tabula rasa and invite the workshop participants to join us in a game of Twenty Questions.
Segmenting Tables via Indexing of Value Cells by Table Headers
Correct segmentation of a web table into its component regions is the essential first step to understanding tabular data. Our algorithmic solution to the segmentation problem relies on the property that strings defining row and column header paths uniquely index each data cell in the table. We segment the table using only “logical layout analysis” without resorting to any appearance features or natural language understanding. We start with a CSV table that preserves the 2-dimensional structure and contents of the original source table (e.g., an HTML table) but not font size, font weight, and color. The indexing property of table headers implies a four-quadrant partitioning of the table about a minimum index point. The algorithm finds the index point through an efficient guided search. Experimental results on a 200-table benchmark demonstrate the generality of the algorithm in handling a variety of table styles and forms.
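A rough sketch of the indexing test at the core of this idea, assuming the table is already loaded from CSV into a rectangular list of rows; the exhaustive scan below stands in for the paper's guided search, and the file name and function names are illustrative.

```python
# Hedged sketch: find a candidate index point (r0, c0) such that the header cells
# to the left of column c0 and above row r0 uniquely index every data row/column.
import csv

def header_paths(cells, r0, c0):
    row_paths = [tuple(cells[r][:c0]) for r in range(r0, len(cells))]
    col_paths = [tuple(row[c] for row in cells[:r0]) for c in range(c0, len(cells[0]))]
    return row_paths, col_paths

def indexes_uniquely(paths):
    # Unique indexing means no duplicate header paths and no completely empty path.
    return len(paths) == len(set(paths)) and all(any(p) for p in paths)

def find_index_point(cells):
    for r0 in range(1, len(cells)):          # at least one column-header row above
        for c0 in range(1, len(cells[0])):   # at least one row-header column to the left
            row_paths, col_paths = header_paths(cells, r0, c0)
            if indexes_uniquely(row_paths) and indexes_uniquely(col_paths):
                return r0, c0                # first hit in scan order is a minimal point
    return None

with open("table.csv", newline="") as f:     # hypothetical input file
    cells = list(csv.reader(f))
print(find_index_point(cells))
```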
Is Android or iPhone the Platform for Innovation in Imaging Informatics?
It is clear that ubiquitous mobile computing platforms will be a disruptive technology in the delivery of healthcare in the near future. While radiologists are fairly sedentary, their customers, the referring physicians, and the patients are not. The need for closer collaboration and interaction with referring physicians is seen as a key to maintaining relationships and integrating tightly with the patient management team. While today patients have to settle for receiving their images on a CD, before long they will be taking them home on their cell phones. As PACS vendors are moving ever outward in the enterprise, they are already actively developing clients on mobile platforms. Two major contenders are Apple's iPhone and the Android platform developed by Google. These two designs represent two entirely different architectures and business models.
Decoding Substitution Ciphers by Means of Word Matching with Application to OCR
A substitution cipher consists of a block of natural language text where each letter of the alphabet has been replaced by a distinct symbol. As a problem in cryptography, the substitution cipher is of limited interest, but it has an important application in optical character recognition. Recent advances render it quite feasible to scan documents with a fairly complex layout and to classify (cluster) the printed characters into distinct groups according to their shape. However, given the immense variety of type styles and forms in current use, it is not possible to assign alphabetical identities to characters of arbitrary size and typeface. This gap can be bridged by solving the equivalent of a substitution cipher problem, thereby opening up the possibility of automatic translation of a scanned document into a standard character code, such as ASCII. Earlier methods relying on letter n-gram frequencies require a substantial amount of ciphertext for accurate n-gram estimates. A dictionary-based approach solves the problem using relatively small ciphertext samples and a dictionary of fewer than 500 words. Our heuristic backtrack algorithm typically visits only a few hundred of the 26! possible nodes on sample texts ranging from 100 to 600 words.
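The sketch below illustrates the general idea of dictionary-driven backtracking over a candidate symbol-to-letter mapping; it is a simplified stand-in for the authors' heuristic algorithm, with illustrative names and a toy dictionary.

```python
# Hedged sketch: assign dictionary words to cipher words, backtracking whenever
# the implied symbol-to-letter mapping stops being consistent and one-to-one.
def solve(cipher_words, dictionary, mapping=None, idx=0):
    mapping = {} if mapping is None else mapping
    if idx == len(cipher_words):
        return mapping                       # every cipher word explained
    word = cipher_words[idx]
    for candidate in dictionary:
        if len(candidate) != len(word):
            continue
        trial, used = dict(mapping), set(mapping.values())
        consistent = True
        for sym, letter in zip(word, candidate):
            if sym in trial:
                if trial[sym] != letter:     # contradicts an earlier assignment
                    consistent = False
                    break
            elif letter in used:             # letter already taken by another symbol
                consistent = False
                break
            else:
                trial[sym] = letter
                used.add(letter)
        if consistent:
            result = solve(cipher_words, dictionary, trial, idx + 1)
            if result is not None:
                return result
    return None                              # backtrack

# Toy usage: symbols are arbitrary integers produced by shape clustering.
print(solve([(1, 2, 3), (4, 5, 1)], {"the", "cat", "sat"}))
```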
Multi-Character Field Recognition for Arabic and Chinese Handwriting
Two methods, Symbolic Indirect Correlation (SIC) and Style Constrained Classification (SCC), are proposed for recognizing handwritten Arabic and Chinese words and phrases. SIC reassembles variable-length segments of an unknown query that match similar segments of labeled reference words. Recognition is based on the correspondence between the order of the feature vectors and of the lexical transcript in both the query and the references. SIC implicitly incorporates language context in the form of letter n-grams. SCC is based on the notion that the style (distortion or noise) of a character is a good predictor of the distortions arising in other characters, even of a different class, from the same source. It is adaptive in the sense that with a long-enough field, its accuracy converges to that of a style-specific classifier trained on the writer of the unknown query. Neither SIC nor SCC requires the query words to appear among the references.