12 research outputs found
Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective
Data-centric AI is at the center of a fundamental shift in software
engineering where machine learning becomes the new software, powered by big
data and computing infrastructure. Here, software engineering needs to be
rethought so that data becomes a first-class citizen on par with code. One
striking observation is that a significant portion of the machine learning
process is spent on data preparation. Without good data, even the best machine
learning algorithms cannot perform well. As a result, data-centric AI practices
are now becoming mainstream. Unfortunately, many datasets in the real world are
small, dirty, biased, and even poisoned. In this survey, we study the research
landscape for data collection and data quality primarily for deep learning
applications. Data collection is important because there is less need for
feature engineering for recent deep learning approaches, but instead more need
for large amounts of data. For data quality, we study data validation,
cleaning, and integration techniques. Even if the data cannot be fully cleaned,
we can still cope with imperfect data during model training using robust model
training techniques. In addition, while bias and fairness have been less
studied in traditional data management research, these issues become essential
topics in modern machine learning applications. We thus study fairness measures
and unfairness mitigation techniques that can be applied before, during, or
after model training. We believe that the data management community is well
poised to solve these problems.
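To make the idea of a fairness measure concrete, the sketch below computes one widely used measure, the demographic parity difference: the gap in positive-prediction rate between groups. The choice of this particular measure, the function name, and the toy data are assumptions made for illustration; the abstract does not say which measures the survey covers.

```python
def demographic_parity_difference(predictions, groups):
    """Return (gap, rates): the largest difference in positive-prediction
    rate between any two groups, and the per-group rates themselves."""
    counts = {}  # group -> (total examples, positive predictions)
    for pred, group in zip(predictions, groups):
        total, positives = counts.get(group, (0, 0))
        counts[group] = (total + 1, positives + (1 if pred == 1 else 0))
    rates = {g: positives / total for g, (total, positives) in counts.items()}
    return max(rates.values()) - min(rates.values()), rates


# Toy data (hypothetical): binary predictions and a sensitive attribute.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_difference(preds, groups)
```

A gap of zero means every group receives positive predictions at the same rate. A measure like this can be checked before training (on labels), during training (as a constraint or penalty), or after training (on model outputs).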
Inspector Gadget: A Data Programming-based Labeling System for Industrial Images
As machine learning for images becomes democratized in the Software 2.0 era,
one of the serious bottlenecks is securing enough labeled data for training.
This problem is especially critical in a manufacturing setting where smart
factories rely on machine learning for product quality control by analyzing
industrial images. Such images are typically large and may only need to be
partially analyzed where only a small portion is problematic (e.g., identifying
defects on a surface). Since manually labeling these images is expensive, weak
supervision is an attractive alternative where the idea is to generate weak
labels that are not perfect, but can be produced at scale. Data programming is
a recent paradigm in this category that uses human knowledge in the form of
labeling functions and combines them into a generative model. Data programming
has been successful in applications based on text or structured data and can
usually also be applied to images if one can find a way to convert them into
structured data. In this work, we expand the horizon of data programming by
directly applying it to images without this conversion, which is a common
scenario for industrial applications. We propose Inspector Gadget, an image
labeling system that combines crowdsourcing, data augmentation, and data
programming to produce weak labels at scale for image classification. We
perform experiments on real industrial image datasets and show that Inspector
Gadget obtains better performance than other weak-labeling techniques: Snuba,
GOGGLES, and self-learning baselines using convolutional neural networks (CNNs)
without pre-training. Comment: 10 pages, 11 figures
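The general idea of labeling functions producing weak labels can be sketched as follows. This is a simplified illustration, not Inspector Gadget's method: the three labeling functions, their thresholds, and the images-as-nested-lists representation are all hypothetical, and a plain majority vote stands in for the generative model that data programming actually uses to combine votes.

```python
from collections import Counter

ABSTAIN, OK, DEFECT = -1, 0, 1


def lf_dark_spot(image):
    # Hypothetical rule: any very dark pixel suggests a defect.
    return DEFECT if min(min(row) for row in image) < 30 else ABSTAIN


def lf_uniform(image):
    # Hypothetical rule: a narrow intensity range suggests a clean surface.
    pixels = [p for row in image for p in row]
    return OK if max(pixels) - min(pixels) < 20 else ABSTAIN


def lf_low_mean(image):
    # Hypothetical rule: a low mean intensity suggests a defect, else clean.
    pixels = [p for row in image for p in row]
    return DEFECT if sum(pixels) / len(pixels) < 100 else OK


def weak_label(image, labeling_functions):
    """Combine labeling-function votes by majority; abstain on no votes or ties.
    (Majority vote is a stand-in for a learned generative model.)"""
    votes = [lf(image) for lf in labeling_functions]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


LFS = [lf_dark_spot, lf_uniform, lf_low_mean]
clean = [[200, 205], [210, 202]]      # toy 2x2 grayscale image
scratched = [[120, 10], [15, 130]]    # toy image with dark pixels
labels = [weak_label(clean, LFS), weak_label(scratched, LFS)]
```

Each function may be wrong or may abstain; the value comes from applying many such cheap rules at scale and resolving their disagreements.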
DeepHealthNet: Adolescent Obesity Prediction System Based on a Deep Learning Framework
Childhood and adolescent obesity rates are a global concern because obesity
is associated with chronic diseases and long-term health risks. Artificial
intelligence technology has emerged as a promising solution to accurately
predict obesity rates and provide personalized feedback to adolescents. This
study emphasizes the importance of early identification and prevention of
obesity-related health issues. Factors such as height, weight, waist
circumference, calorie intake, physical activity levels, and other relevant
health information need to be considered for developing robust algorithms for
obesity rate prediction and delivering personalized feedback. Hence, by
collecting health datasets from 321 adolescents, we proposed an adolescent
obesity prediction system that provides personalized predictions and assists
individuals in making informed health decisions. Our proposed deep learning
framework, DeepHealthNet, effectively trains the model using data augmentation
techniques, even when daily health data are limited, resulting in improved
prediction accuracy (acc: 0.8842). Additionally, the study revealed variations
in the prediction of the obesity rate between boys (acc: 0.9320) and girls
(acc: 0.9163), allowing the identification of disparities and the determination
of the optimal time to provide feedback. The proposed system shows significant
potential in effectively addressing childhood and adolescent obesity.
Self-Adaptive Framework Based on MAPE Loop for Internet of Things
The Internet of Things (IoT) connects a wide range of objects, and the types of environments in which IoT can be deployed change dynamically. Therefore, these environments can be modified dynamically at runtime as other requirements emerge. Self-adaptive software alters its behavior to satisfy the requirements in a dynamic environment. In this context, the concept of self-adaptive software is suitable for some dynamic IoT environments (e.g., smart greenhouses, smart homes, and reality applications). In this study, we propose a self-adaptive framework for decision-making in an IoT environment at runtime. The framework comprises a finite-state machine model design and a game theoretic decision-making method for extracting efficient strategies. The framework was implemented as a prototype and experiments were conducted to evaluate its runtime performance. The results demonstrate that the proposed framework can be applied to IoT environments at runtime. In addition, a smart greenhouse-based use case is included to illustrate the usability of the proposed framework.
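A Monitor-Analyze-Plan-Execute (MAPE) loop for a greenhouse can be sketched as below. This is a minimal sketch under stated assumptions, not the paper's framework: the temperature thresholds, the three states (standing in for a finite-state machine model), the payoff table, and the maximin rule used as the game-theoretic decision step are all hypothetical choices made for illustration.

```python
# Hypothetical payoffs: PAYOFFS[action][weather] = utility of the action
# if the environment turns out to behave that way.
PAYOFFS = {
    "open_vent": {"hot": 3, "mild": 1, "cold": -2},
    "heat_on": {"hot": -3, "mild": 0, "cold": 3},
    "idle": {"hot": -1, "mild": 2, "cold": -1},
}
# Hypothetical: which environment moves are plausible in each state.
PLAUSIBLE = {
    "too_hot": ["hot", "mild"],
    "normal": ["hot", "mild", "cold"],
    "too_cold": ["mild", "cold"],
}
# Hypothetical effect of each action on temperature (degrees Celsius).
EFFECT = {"open_vent": -4, "heat_on": 4, "idle": 0}


def monitor(sensor):
    # Monitor: read the current temperature from the sensor record.
    return sensor["temperature"]


def analyze(temperature):
    # Analyze: map the reading to one of three states.
    if temperature > 28:
        return "too_hot"
    if temperature < 18:
        return "too_cold"
    return "normal"


def plan(state):
    # Plan: maximin rule, pick the action whose worst-case payoff is best.
    envs = PLAUSIBLE[state]
    return max(PAYOFFS, key=lambda a: min(PAYOFFS[a][e] for e in envs))


def execute(sensor, action):
    # Execute: apply the action's effect to the simulated environment.
    sensor["temperature"] += EFFECT[action]


def run_mape(sensor, steps):
    """Run the loop for a fixed number of iterations; return chosen actions."""
    trace = []
    for _ in range(steps):
        action = plan(analyze(monitor(sensor)))
        execute(sensor, action)
        trace.append(action)
    return trace


greenhouse = {"temperature": 33}
actions = run_mape(greenhouse, 3)
```

In this toy run the loop vents until the temperature returns to the normal band and then idles; a real deployment would read live sensors and would likely use a richer strategy model.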
A deep learning-based framework for predicting pork preference
Meat consumption per capita in South Korea has steadily increased over the last several years and is predicted to continue increasing. Up to 69.5% of Koreans eat pork at least once a week. Considering pork-related products produced and imported in Korea, Korean consumers have a high preference for high-fat parts, such as pork belly. Managing the high-fat portions of domestically produced and imported meat according to consumer needs has become a competitive factor. Therefore, this study presents a deep learning-based framework for predicting customers’ flavor and appearance preference scores from the characteristic information of pork. The characteristic information is collected using ultrasound equipment (AutoFom III). Subsequently, consumers’ preferences for flavor and appearance were directly surveyed over a long period and predicted from the measured information using a deep learning methodology. For the first time, we have applied a deep neural network-based ensemble technique to predict consumer preference scores from measurements of pork carcasses. To demonstrate the efficiency of the proposed framework, an empirical evaluation was conducted using a survey and data on pork belly preference. Experimental results indicate a strong relationship between the predicted preference scores and characteristics of pork belly.
A Comprehensive Survey on Security and Privacy for Electronic Health Data
Recently, the integration of state-of-the-art technologies, such as modern sensors, networks, and cloud computing, has revolutionized the conventional healthcare system. However, security concerns have increasingly emerged as a result of this integration. Therefore, the security and privacy issues associated with e-health data must be properly explored. In this paper, to investigate the security and privacy of e-health systems, we identified the major components of modern e-health systems (i.e., e-health data, medical devices, medical networks and edge/fog/cloud). Then, we reviewed recent security and privacy studies that focus on each component of the e-health systems. Based on the review, we derived a research taxonomy, security concerns, requirements, solutions, research trends, and open challenges for the components, along with the strengths and weaknesses of the analyzed studies. In particular, edge and fog computing studies for e-health security and privacy were reviewed, since these studies had mostly not been analyzed in other survey papers.