19,556 research outputs found
Damage Vision Mining Opportunity for Imbalanced Anomaly Detection
In past decade, previous balanced datasets have been used to advance
algorithms for classification, object detection, semantic segmentation, and
anomaly detection in industrial applications. Specifically, for condition-based
maintenance, automating visual inspection is crucial to ensure high quality.
Deterioration prognostic attempts to optimize the fine decision process for
predictive maintenance and proactive repair. In civil infrastructure and living
environment, damage data mining cannot avoid the imbalanced data issue because
of rare unseen events and high quality status by improved operations. For
visual inspection, deteriorated class acquired from the surface of concrete and
steel components are occasionally imbalanced. From numerous related surveys, we
summarize that imbalanced data problems can be categorized into four types; 1)
missing range of target and label valuables, 2) majority-minority class
imbalance, 3) foreground-background of spatial imbalance, 4) long-tailed class
of pixel-wise imbalance. Since 2015, there has been many imbalanced studies
using deep learning approaches that includes regression, image classification,
object detection, semantic segmentation. However, anomaly detection for
imbalanced data is not yet well known. In the study, we highlight one-class
anomaly detection application whether anomalous class or not, and demonstrate
clear examples on imbalanced vision datasets: wooden, concrete deterioration,
and disaster damage. We provide key results on damage vision mining advantage,
hypothesizing that the more effective range of positive ratio, the higher
accuracy gain of anomaly detection application. Finally, the applicability of
the damage learning methods, limitations, and future works are mentioned.Comment: 12 pages, 14 figures, 8 table
MCNN-LSTM: Combining CNN and LSTM to classify multi-class text in imbalanced news data
Searching, retrieving, and arranging text in ever-larger document collections necessitate more efficient information processing algorithms. Document categorization is a crucial component of various information processing systems for supervised learning. As the quantity of documents grows, the performance of classic supervised classifiers has deteriorated because of the number of document categories. Assigning documents to a predetermined set of classes is called text classification. It is utilized extensively in a wide range of data-intensive applications. However, the fact that real-world implementations of these models are plagued with shortcomings begs for more investigation. Imbalanced datasets hinder the most prevalent high-performance algorithms. In this paper, we propose an approach name multi-class Convolutional Neural Network (MCNN)-Long Short-Time Memory (LSTM), which combines two deep learning techniques, Convolutional Neural Network (CNN) and Long Short-Time Memory, for text classification in news data. CNN's are used as feature extractors for the LSTMs on text input data and have the spatial structure of words in a sentence, paragraph, or document. The dataset is also imbalanced, and we use the Tomek-Link algorithm to balance the dataset and then apply our model, which shows better performance in terms of F1-score (98%) and Accuracy (99.71%) than the existing works. The combination of deep learning techniques used in our approach is ideal for the classification of imbalanced datasets with underrepresented categories. Hence, our method outperformed other machine learning algorithms in text classification by a large margin. We also compare our results with traditional machine learning algorithms in terms of imbalanced and balanced datasets
Where Have the Litigants Gone?
The recognition of coral species based on underwater texture images pose a
significant difficulty for machine learning algorithms, due to the three
following challenges embedded in the nature of this data: 1) datasets do not
include information about the global structure of the coral; 2) several species
of coral have very similar characteristics; and 3) defining the spatial borders
between classes is difficult as many corals tend to appear together in groups.
For this reason, the classification of coral species has always required an aid
from a domain expert. The objective of this paper is to develop an accurate
classification model for coral texture images. Current datasets contain a large
number of imbalanced classes, while the images are subject to inter-class
variation. We have analyzed 1) several Convolutional Neural Network (CNN)
architectures, 2) data augmentation techniques and 3) transfer learning. We
have achieved the state-of-the art accuracies using different variations of
ResNet on the two current coral texture datasets, EILAT and RSMAS.Comment: 22 pages, 10 figure
High-Resolution Road Vehicle Collision Prediction for the City of Montreal
Road accidents are an important issue of our modern societies, responsible
for millions of deaths and injuries every year in the world. In Quebec only, in
2018, road accidents are responsible for 359 deaths and 33 thousands of
injuries. In this paper, we show how one can leverage open datasets of a city
like Montreal, Canada, to create high-resolution accident prediction models,
using big data analytics. Compared to other studies in road accident
prediction, we have a much higher prediction resolution, i.e., our models
predict the occurrence of an accident within an hour, on road segments defined
by intersections. Such models could be used in the context of road accident
prevention, but also to identify key factors that can lead to a road accident,
and consequently, help elaborate new policies.
We tested various machine learning methods to deal with the severe class
imbalance inherent to accident prediction problems. In particular, we
implemented the Balanced Random Forest algorithm, a variant of the Random
Forest machine learning algorithm in Apache Spark. Interestingly, we found that
in our case, Balanced Random Forest does not perform significantly better than
Random Forest.
Experimental results show that 85% of road vehicle collisions are detected by
our model with a false positive rate of 13%. The examples identified as
positive are likely to correspond to high-risk situations. In addition, we
identify the most important predictors of vehicle collisions for the area of
Montreal: the count of accidents on the same road segment during previous
years, the temperature, the day of the year, the hour and the visibility
Focusing on the Big Picture: Insights into a Systems Approach to Deep Learning for Satellite Imagery
Deep learning tasks are often complicated and require a variety of components
working together efficiently to perform well. Due to the often large scale of
these tasks, there is a necessity to iterate quickly in order to attempt a
variety of methods and to find and fix bugs. While participating in IARPA's
Functional Map of the World challenge, we identified challenges along the
entire deep learning pipeline and found various solutions to these challenges.
In this paper, we present the performance, engineering, and deep learning
considerations with processing and modeling data, as well as underlying
infrastructure considerations that support large-scale deep learning tasks. We
also discuss insights and observations with regard to satellite imagery and
deep learning for image classification.Comment: Accepted to IEEE Big Data 201
- …