Summary of Annotation pipeline and Prediction pipeline.
With the annotation pipeline, we applied compound figure classification, subfigure separation, and bar chart classification to obtain bar charts from this sample, and then asked annotators to annotate graphical integrity issues on these bar charts. With the prediction pipeline, we applied our whole graphical integrity issue detector to this sample. Both sets are similar, as demonstrated by the analysis in Fig 2. (XLSX)
The likelihood of having graphical integrity issues across each country.
The top three countries are the Netherlands, Spain, and France.
Text Localization (figures on the left) and Text Role Classification (figures on the right).
We first used a convolutional neural network (YOLO v4, pre-trained on the MS COCO dataset) to localize text in figures. We then used text role classification to predict the role of each text element for feature engineering. (EPS)
Examples of graphs with graphical integrity issues.
If the y-axis does not start from zero (as in the upper two graphs) or is partially hidden (as in the lower two graphs), the bar chart is labeled “inappropriate”. (EPS)
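The labeling rule in this caption can be sketched as a small predicate. This is only an illustration of the rule as stated here, not the authors' implementation: the `BarChartAxis` type, its field names, and the "appropriate" label for the complementary case are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class BarChartAxis:
    """Hypothetical summary of a bar chart's y-axis (names are assumptions)."""
    y_min: float            # value at which the visible y-axis starts
    partially_hidden: bool  # True if part of the y-axis is hidden


def label_bar_chart(axis: BarChartAxis) -> str:
    """Apply the caption's rule: a y-axis that does not start from zero,
    or that is partially hidden, makes the chart "inappropriate"."""
    if axis.y_min != 0 or axis.partially_hidden:
        return "inappropriate"
    # The label for the complementary case is an assumption.
    return "appropriate"
```

For instance, `label_bar_chart(BarChartAxis(y_min=20.0, partially_hidden=False))` falls under the first condition of the rule.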
An example process for predicting violations of the proportional ink principle (see Materials and Methods for details; our code is at https://github.com/sciosci/graph_check).
A. Input image representing a scientific figure. The PubMed Open Access subset provides figures already extracted from the publications. B. Subplot extraction using the YOLO deep learning architecture [51] trained on the hand-annotated dataset (see Materials and Methods). C. Each subplot is extracted from the input image. D. Subfigure plot classification, where only bar charts are extracted (E). For each bar chart, we detect a set of low-level features (F), which are later used for predicting whether the bar chart violates the proportional ink principle (H, yes) or not (I, no).
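The order of the stages in this caption can be sketched as a single function that chains them together. This is a minimal sketch, not the code in the linked repository: the function name and all four stage callables are hypothetical placeholders, passed in as arguments so that no particular model or library API is assumed.

```python
from typing import Any, Callable, Iterable, List


def detect_violations(
    figure: Any,
    extract_subplots: Callable[[Any], Iterable[Any]],  # stages B-C (hypothetical)
    is_bar_chart: Callable[[Any], bool],               # stages D-E (hypothetical)
    extract_features: Callable[[Any], Any],            # stage F (hypothetical)
    predict_violation: Callable[[Any], bool],          # final yes/no prediction
) -> List[bool]:
    """Return one prediction per bar chart found in the figure."""
    predictions = []
    for subplot in extract_subplots(figure):
        # Only bar charts continue to feature extraction and prediction.
        if not is_bar_chart(subplot):
            continue
        features = extract_features(subplot)
        predictions.append(predict_violation(features))
    return predictions
```

Passing the stages as callables keeps the sketch independent of how each stage is actually implemented; in practice each callable would wrap a trained model.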
Flowchart of our data source and process.
The Predictions and Human Annotations data sets were randomly selected from PubMed Open Access images. The authors annotated 8,001 bar charts from the human-annotated set, and 4,834 bar charts could be processed by the method pipeline.
The likelihood of having graphical integrity issues across each year.
Examples of false positive cases, in which our method predicted graphical integrity issues that the graphs do not actually have.
(EPS)
