20 research outputs found

    Summary of Annotation pipeline and Prediction pipeline.

    In the annotation pipeline, we applied compound figure classification, subfigure separation, and bar chart classification to obtain bar charts from this sample, and then asked annotators to annotate graphical integrity issues on these bar charts. In the prediction pipeline, we applied our full graphical integrity issue detector to this sample. Both sets are similar, as demonstrated by the analysis in Fig 2. (XLSX)

    The likelihood of having graphical integrity issues across each country.

    The top three countries are the Netherlands, Spain, and France.
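A per-country likelihood of this kind can be illustrated with a short Python sketch. It assumes that "likelihood" means the simple fraction of analyzed bar charts flagged with an issue within each country; the listing does not state how the authors estimated it, and the function name `issue_rate_by_group` and the input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def issue_rate_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of records flagged with an issue, per group (e.g. per country).

    Each record is a (group, has_issue) pair. This is an assumed, simplified
    estimator, not necessarily the one used in the paper.
    """
    totals = defaultdict(int)   # number of charts seen per group
    flagged = defaultdict(int)  # number of flagged charts per group
    for group, has_issue in records:
        totals[group] += 1
        flagged[group] += int(has_issue)
    return {group: flagged[group] / totals[group] for group in totals}
```

The same grouping works for any other key, such as publication year.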

    Text Localization (figures on the left) and Text Role Classification (figures on the right).

    We first used a convolutional neural network (YOLO v4, pre-trained on the MS COCO dataset) to localize text on figures. We then used text role classification to predict the role of each text element for feature engineering. (EPS)

    Example of graphs with graphical integrity issue.

    If the y-axis does not start from zero (as in the upper two graphs) or is partially hidden (as in the lower two graphs), the bar chart is labeled as “inappropriate”. (EPS)
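The labeling rule in this caption can be sketched in a few lines of Python. This is an illustration only, not the authors' detector: the function name `label_bar_chart`, its inputs (numeric y-axis tick values and a flag for a hidden axis segment), and the label "appropriate" for the complementary case are all assumptions.

```python
from typing import Sequence


def label_bar_chart(y_ticks: Sequence[float], axis_partially_hidden: bool) -> str:
    """Hypothetical rule: flag a chart whose y-axis omits zero or is partially hidden."""
    if not y_ticks:
        raise ValueError("at least one y-axis tick value is required")
    # Simplifying assumption: the axis "starts from zero" when zero lies
    # within the range of the detected tick values.
    includes_zero = min(y_ticks) <= 0 <= max(y_ticks)
    if not includes_zero or axis_partially_hidden:
        return "inappropriate"
    return "appropriate"
```

For example, ticks of 5, 10, and 15 with no hidden segment would be labeled "inappropriate" under this rule, because the axis range excludes zero.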

    An example process for predicting violations of the proportional ink principle (see Materials and Methods for details; our code is available at https://github.com/sciosci/graph_check).

    A. Input image representing a scientific figure. The PubMed Open Access subset provides figures already extracted from the publications. B. Subplot extraction using the YOLO deep learning architecture [51] trained on the hand-annotated dataset (see Materials and Methods). C. Each subplot is extracted from the input image. D. Subfigure plot classification, where only bar charts are extracted (E). For each bar chart, we detect a set of low-level features (F), which are later used to predict whether the bar chart violates the proportional ink principle (H, yes) or not (I, no).
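The flow of these steps can be sketched as a chain of stages in Python. This is a structural sketch under stated assumptions, not the code in the linked repository: the `Stages` container, `check_figure`, and the toy stand-ins at the bottom are hypothetical, and each callable takes the place of a trained model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Stages:
    """Hypothetical stand-ins for the trained model behind each step."""
    extract_subplots: Callable[[object], List[object]]     # steps B-C
    is_bar_chart: Callable[[object], bool]                  # steps D-E
    extract_features: Callable[[object], Sequence[float]]  # step F
    predicts_violation: Callable[[Sequence[float]], bool]  # final yes/no


def check_figure(image: object, stages: Stages) -> List[bool]:
    """Return one violation prediction per bar chart found in the figure."""
    results = []
    for subplot in stages.extract_subplots(image):
        if not stages.is_bar_chart(subplot):
            continue  # only bar charts are analyzed further
        features = stages.extract_features(subplot)
        results.append(stages.predicts_violation(features))
    return results


# Toy stand-ins (assumptions for illustration): a "figure" is a list of dicts.
toy = Stages(
    extract_subplots=lambda img: list(img),
    is_bar_chart=lambda sp: sp["kind"] == "bar",
    extract_features=lambda sp: [sp["y_min"]],
    predicts_violation=lambda feats: feats[0] != 0,
)
figure = [
    {"kind": "bar", "y_min": 0},
    {"kind": "line", "y_min": 3},
    {"kind": "bar", "y_min": 5},
]
print(check_figure(figure, toy))  # [False, True]
```

Passing the stages in as callables keeps the control flow visible without committing to any particular detector or classifier.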

    Flowchart of our data source and process.

    The Predictions and Human Annotations data sets were randomly selected from PubMed Open Access images. The authors annotated 8,001 bar charts from the human-annotated set, and 4,834 bar charts could be processed by the method pipeline.

    The likelihood of having graphical integrity issues across each year.
