Table Detection in the Wild: A Novel Diverse Table Detection Dataset and Method

Authors: Dash Siddhant Swaroop; Fande Nikhil; G Sanjay; Haloi Mrinal; Shekhar Shashank
Publication date: 31/08/2022

Recent deep learning approaches to table detection have achieved outstanding performance and proved effective in identifying document layouts. Currently available table detection benchmarks have many limitations, including limited sample diversity, simple table structures, a shortage of training cases, and sample quality issues. In this paper, we introduce a diverse large-scale dataset for table detection with more than seven thousand samples containing a wide variety of table structures collected from many diverse sources. We also present baseline results using a convolutional neural network-based method to detect table structure in documents. Experimental results show the superiority of convolutional deep learning methods over classical computer vision-based methods. This diverse table detection dataset will enable the community to develop high-throughput deep learning methods for understanding document layout and processing tabular data.

Comment: Open-source table detection dataset and baseline results

arXiv.org e-Print Archive

Deep learning networks find unique mammographic differences in previous negative mammograms between interval and screen-detected cancers: a case-case study.

Authors: Fan Bo; Greenwood Heather; Hinton Benjamin; Joe Bonnie; Kerlikowske Karla; Lee Vivian; Ma Lin; Mahmoudzadeh Amir Pasha; Malkov Serghei; Shepherd John
Publication venue: eScholarship, University of California
Publication date: 01/06/2019

Background: To determine if mammographic features from deep learning networks can be applied in breast cancer to identify groups at interval invasive cancer risk due to masking beyond using traditional breast density measures.

Methods: Full-field digital screening mammograms acquired in our clinics between 2006 and 2015 were reviewed. Transfer learning of a deep learning network with weights initialized from ImageNet was performed to classify mammograms that were followed by an invasive interval or screen-detected cancer within 12 months of the mammogram. Hyperparameter optimization was performed and the network was visualized through saliency maps. Prediction loss and accuracy were calculated using this deep learning network. Receiver operating characteristic (ROC) curves and area under the curve (AUC) values were generated with the outcome of interval cancer using the deep learning network and compared to predictions from conditional logistic regression, with errors quantified through contingency tables.

Results: Pre-cancer mammograms of 182 interval and 173 screen-detected cancers were split into training/test cases at an 80/20 ratio. Using Breast Imaging-Reporting and Data System (BI-RADS) density alone, the ability to correctly classify interval cancers was moderate (AUC = 0.65). The optimized deep learning model achieved an AUC of 0.82. Contingency table analysis showed that the network correctly classified 75.2% of the mammograms and that incorrect classifications were slightly more common for the interval cancer mammograms. Saliency maps of each cancer case showed that local information drove classification more strongly than global image information.

Conclusions: Pre-cancerous mammograms contain imaging information beyond breast density that can be identified with deep learning networks to predict the probability of breast cancer detection.
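For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of how an AUC value can be computed from classifier scores. It uses the rank-based (Mann-Whitney) formulation in plain Python; it is not the authors' pipeline. The function name `roc_auc`, the label coding (1 for interval cancer, 0 for screen-detected) and the toy scores are assumptions made purely for illustration.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case,
    with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study):
# 1 = interval cancer, 0 = screen-detected cancer.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.65 and 0.82 should be read.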

eScholarship - University of California

Oversampling Log Messages Using a Sequence Generative Adversarial Network for Anomaly Detection and Classification

Authors: Farzad Amir; Gulliver T. Aaron
Publication venue: Academy and Industry Research Collaboration Center (AIRCC)
Publication date: 04/01/2020

Dealing with imbalanced data is one of the main challenges in machine/deep learning algorithms for classification. This issue is more important with log message data, as it is typically very imbalanced and negative logs are rare. In this paper, a model is proposed to generate text log messages using a SeqGAN network. Features are then extracted using an autoencoder, and anomaly detection is done using a GRU network. The proposed model is evaluated with two imbalanced log data sets, namely BGL and OpenStack. Results are presented which show that oversampling and balancing data increases the accuracy of anomaly detection and classification.

Comment: 14 pages, 4 figures, 2 tables
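To make the idea of balancing an imbalanced log data set concrete, here is a minimal sketch using plain random oversampling with replacement. This is a much simpler stand-in for the SeqGAN-based generation the abstract describes: it duplicates existing minority-class messages rather than generating new ones. The function name `random_oversample`, the class names and the example log lines are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy log messages (hypothetical): four normal lines, one anomaly.
logs = ["job started", "job finished", "heartbeat ok", "cache warm", "kernel panic"]
labels = ["normal"] * 4 + ["anomaly"]
bal_logs, bal_labels = random_oversample(logs, labels)
counts = Counter(bal_labels)
print(counts)
```

Duplicating messages adds no new information, which is one motivation for a generative approach such as the one the paper proposes. In any such setup, oversampling would normally be applied to the training split only, so that evaluation data keeps its original class balance.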

arXiv.org e-Print Archive

Crossref