18 research outputs found
MCIndoor20000: A fully-labeled image dataset to advance indoor objects detection
A fully-labeled image dataset provides a unique resource for reproducible research inquiries and data analyses in several computational fields, such as computer vision, machine learning and deep learning machine intelligence. With the present contribution, a large-scale fully-labeled image dataset is provided and made publicly and freely available to the research community. The dataset, entitled MCIndoor20000, includes more than 20,000 digital images from three indoor object categories: doors, stairs, and hospital signs. To make a comprehensive dataset that addresses current challenges in indoor object modeling, we cover multiple sets of variations in images, such as rotation, intra-class variation, and various noise models. The dataset is freely and publicly available at https://github.com/bircatmcri/MCIndoor20000. Keywords: Image dataset, Large-scale dataset, Image classification, Supervised learning, Indoor objects, Deep learning
3DSEM: A Dataset for 3D SEM Surface Reconstruction
The Scanning Electron Microscope (SEM) as a 2D imaging instrument has been widely used in biological, mechanical, and materials sciences to determine the surface attributes (e.g., compositions or geometries) of microscopic specimens. An SEM offers an excellent capability to overcome the limitations of the human eye by achieving increased magnification, contrast, and resolution better than 1 nanometer. However, SEM micrographs remain two-dimensional (2D). Having truly three-dimensional (3D) shapes from SEM micrographs would provide anatomic surfaces allowing for quantitative measurements and informative visualization of the objects being investigated. In biology, for example, 3D SEM surface reconstructions would enable researchers to investigate surface characteristics and recognize roughness, flatness, and waviness of a biological structure. There are also various applications in materials and mechanical engineering in which 3D representations of material properties would allow us to accurately measure fractal dimension and surface roughness, and to design a micro article that needs to fit into a tiny appliance.
3D SEM surface reconstruction employs several computational technologies, such as multi-view geometry, computer vision, optimization strategies, and machine learning, to tackle the inverse problem of going from 2D to 3D. In this contribution, we attempt to make a 3D microscopy dataset, along with the underlying algorithms, publicly and freely available at http://selibcv.org/3dsem/ for the research community.
3DSEM: A 3D microscopy dataset
The Scanning Electron Microscope (SEM) as a 2D imaging instrument has been widely used in many scientific disciplines, including biological, mechanical, and materials sciences, to determine the surface attributes of microscopic objects. However, SEM micrographs remain 2D images. To effectively measure and visualize the surface properties, we need to restore the true 3D shape model from 2D SEM images. Having 3D surfaces would provide the anatomic shape of micro-samples, which allows for quantitative measurements and informative visualization of the specimens being investigated. 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. Keywords: 3D microscopy dataset, 3D microscopy vision, 3D SEM surface reconstruction, Scanning Electron Microscope (SEM)
SparkText: Biomedical Text Mining on Big Data Framework
Background: Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results: In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions: This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
Comparing the time efficiency results, SparkText outperformed other available text mining tools with speeds up to 132 times faster on the larger dataset that included 29,437 full-text articles.
The ROC curves for the dataset “Full-text Articles II”: the area under the curve for the SVM classifier is better than that of the Naïve Bayes and Logistic Regression algorithms.
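As an illustration of how the area under a ROC curve can be compared between classifiers, the following is a minimal sketch and not the SparkText implementation. It uses the rank-based definition of AUC (the probability that a randomly chosen positive example is scored above a randomly chosen negative one). The function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two classifiers scoring the same four documents.
labels = [1, 1, 0, 0]
scores_a = [0.9, 0.8, 0.3, 0.1]  # ranks every positive above every negative
scores_b = [0.9, 0.2, 0.3, 0.1]  # ranks one positive below one negative

auc_a = roc_auc(labels, scores_a)
auc_b = roc_auc(labels, scores_b)
```

Under this definition a larger AUC means the classifier orders positives ahead of negatives more consistently, which is the sense in which one curve is "better" than another.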
An example of a bag-of-words representation.
The terms “biology”, “biopsy”, “biolab”, “biotin”, and “almost” are unigrams, whereas “cancer-surviv” and “cancer-stage” are bigrams. Using TF/IDF weighting scores, the feature value of the term “almost” equals zero.
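The zero weight for “almost” can be reproduced with a small worked example. This is a sketch under an assumption: the source does not state which TF/IDF variant was used, so the code below takes the unsmoothed form tf × log(N / df), under which any term that occurs in every document gets weight zero. The function name `tf_idf` and the three-document toy corpus are hypothetical; only the terms themselves come from the caption.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = (count / doc length) * log(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical corpus in which "almost" appears in every document.
docs = [
    ["almost", "biopsy", "cancer-stage"],
    ["almost", "biology", "biotin"],
    ["almost", "biolab", "cancer-surviv"],
]
weights = tf_idf(docs)
```

Here `weights[0]["almost"]` is 0.0 because log(3 / 3) = 0, while a term confined to one document, such as “biopsy”, keeps a positive weight. Smoothed TF/IDF variants would give a small nonzero value instead.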
SparkText: Biomedical Text Mining on Big Data Framework - Fig 5
Quantitative comparisons of the prediction models on text mining: (A) the accuracy, precision, and recall obtained from 19,681 abstracts; (B) the accuracy, precision, and recall on 12,902 full-text articles; and (C) the accuracy, precision, and recall on 29,437 full-text articles. Table 2 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0162721#pone.0162721.t002) provides the details on these three datasets. Five-fold cross validation was used in all analyses.
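The evaluation protocol named in this caption, five-fold cross validation reporting accuracy, precision, and recall, can be sketched as follows. This is an illustration only, with several simplifying assumptions that are not from the source: the task is reduced to binary labels (the study classifies several cancer types), folds are contiguous and unshuffled, and the model is a toy one-dimensional threshold classifier. All function names and the data are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def train_threshold(train_x, train_y):
    """Toy stand-in model: cut halfway between the two class means."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0


def cross_validate(xs, ys, train_fn, k=5):
    """Train on k-1 folds, score the held-out fold, average over folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        predict = train_fn(train_x, train_y)
        preds = [predict(xs[i]) for i in held_out]
        scores.append(binary_metrics([ys[i] for i in held_out], preds))
    return tuple(sum(s[j] for s in scores) / k for j in range(3))


# Hypothetical, cleanly separable one-dimensional data.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.15, 0.85, 0.25, 0.75]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
accuracy, precision, recall = cross_validate(xs, ys, train_threshold, k=5)
```

Each sample is held out exactly once, so every reported score comes from data the model did not see during training. In practice the samples would usually be shuffled (or stratified by class) before the folds are formed.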