12 research outputs found

Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective

Author: Lee Jae-Gil
Roh Yuji
Song Hwanjun
Whang Steven Euijong
Publication date: 04/08/2022

Data-centric AI is at the center of a fundamental shift in software engineering in which machine learning becomes the new software, powered by big data and computing infrastructure. Here, software engineering needs to be rethought so that data becomes a first-class citizen on par with code. One striking observation is that a significant portion of the machine learning process is spent on data preparation. Without good data, even the best machine learning algorithms cannot perform well. As a result, data-centric AI practices are now becoming mainstream. Unfortunately, many datasets in the real world are small, dirty, biased, and even poisoned. In this survey, we study the research landscape for data collection and data quality, primarily for deep learning applications. Data collection is important because recent deep learning approaches have less need for feature engineering but more need for large amounts of data. For data quality, we study data validation, cleaning, and integration techniques. Even if the data cannot be fully cleaned, we can still cope with imperfect data during model training using robust model training techniques. In addition, while bias and fairness have been less studied in traditional data management research, these issues have become essential topics in modern machine learning applications. We thus study fairness measures and unfairness mitigation techniques that can be applied before, during, or after model training. We believe that the data management community is well poised to solve these problems.

arXiv.org e-Print Archive

Inspector Gadget: A Data Programming-based Labeling System for Industrial Images

Author: Heo Geon
Hwang Seonghyeon
Lee Dayun
Roh Yuji
Whang Steven Euijong
Publication date: 21/08/2020

As machine learning for images becomes democratized in the Software 2.0 era, one of the serious bottlenecks is securing enough labeled data for training. This problem is especially critical in a manufacturing setting where smart factories rely on machine learning for product quality control by analyzing industrial images. Such images are typically large and may need only partial analysis, since only a small portion is problematic (e.g., identifying defects on a surface). Since manually labeling these images is expensive, weak supervision is an attractive alternative where the idea is to generate weak labels that are not perfect but can be produced at scale. Data programming is a recent paradigm in this category that uses human knowledge in the form of labeling functions and combines them into a generative model. Data programming has been successful in applications based on text or structured data and can also be applied to images, usually if one can find a way to convert them into structured data. In this work, we expand the horizon of data programming by directly applying it to images without this conversion, which is a common scenario for industrial applications. We propose Inspector Gadget, an image labeling system that combines crowdsourcing, data augmentation, and data programming to produce weak labels at scale for image classification. We perform experiments on real industrial image datasets and show that Inspector Gadget obtains better performance than other weak-labeling techniques: Snuba, GOGGLES, and self-learning baselines using convolutional neural networks (CNNs) without pre-training.

Comment: 10 pages, 11 figures

arXiv.org e-Print Archive

DeepHealthNet: Adolescent Obesity Prediction System Based on a Deep Learning Framework

Author: Jeong Ji-Hoon
Kam Tae-Eui
Kim Sung-Kyung
Lee Euijong
Lee In-Gyu
Lee Seong-Whan
Publication date: 30/08/2023

Childhood and adolescent obesity rates are a global concern because obesity is associated with chronic diseases and long-term health risks. Artificial intelligence technology has emerged as a promising solution to accurately predict obesity rates and provide personalized feedback to adolescents. This study emphasizes the importance of early identification and prevention of obesity-related health issues. Factors such as height, weight, waist circumference, calorie intake, physical activity levels, and other relevant health information need to be considered for developing robust algorithms for obesity rate prediction and delivering personalized feedback. Hence, by collecting health datasets from 321 adolescents, we proposed an adolescent obesity prediction system that provides personalized predictions and assists individuals in making informed health decisions. Our proposed deep learning framework, DeepHealthNet, effectively trains the model using data augmentation techniques, even when daily health data are limited, resulting in improved prediction accuracy (acc: 0.8842). Additionally, the study revealed variations in the prediction of the obesity rate between boys (acc: 0.9320) and girls (acc: 0.9163), allowing the identification of disparities and the determination of the optimal time to provide feedback. The proposed system shows significant potential in effectively addressing childhood and adolescent obesity.

arXiv.org e-Print Archive

Self-Adaptive Framework Based on MAPE Loop for Internet of Things

Author: Euijong Lee
Young-Duk Seo
Young-Gab Kim
Publication venue: 'MDPI AG'
Publication date: 07/07/2019

The Internet of Things (IoT) connects a wide range of objects, and the types of environments in which IoT can be deployed change dynamically. Therefore, these environments can be modified dynamically at runtime considering the emergence of other requirements. Self-adaptive software alters its behavior to satisfy the requirements in a dynamic environment. In this context, the concept of self-adaptive software is suitable for some dynamic IoT environments (e.g., smart greenhouses, smart homes, and reality applications). In this study, we propose a self-adaptive framework for decision-making in an IoT environment at runtime. The framework comprises a finite-state machine model design and a game theoretic decision-making method for extracting efficient strategies. The framework was implemented as a prototype, and experiments were conducted to evaluate its runtime performance. The results demonstrate that the proposed framework can be applied to IoT environments at runtime. In addition, a smart greenhouse-based use case is included to illustrate the usability of the proposed framework.

Multidisciplinary Digital Publishing Institute

A deep learning-based framework for predicting pork preference

Author: Euijong Lee
Eunyoung Ko
Hongseok Oh
Jungseok Choi
Kyungchang Jeong
Yunhwan Park
Publication venue: 'Elsevier BV'
Publication date: 01/01/2023

Meat consumption per capita in South Korea has steadily increased over the last several years and is predicted to continue increasing. Up to 69.5% of Koreans eat pork at least once a week. Considering pork-related products produced and imported in Korea, Korean consumers have a high preference for high-fat parts, such as pork belly. Managing the high-fat portions of domestically produced and imported meat according to consumer needs has become a competitive factor. Therefore, this study presents a deep learning-based framework for predicting customers' flavor and appearance preference scores based on the characteristic information of pork. The characteristic information is collected using ultrasound equipment (AutoFom III). Subsequently, according to the measured information, consumers’ preferences for flavor and appearance were directly investigated over a long period and predicted using a deep learning methodology. For the first time, we have applied a deep neural network-based ensemble technique to predict consumer preference scores according to the measured pork carcasses. To demonstrate the efficiency of the proposed framework, an empirical evaluation was conducted using a survey and data on pork belly preference. Experimental results indicate a strong relationship between the predicted preference scores and characteristics of pork belly.

Directory of Open Access Journals

A Comprehensive Survey on Security and Privacy for Electronic Health Data

Author: Euijong Lee
Se-Ra Oh
Young-Duk Seo
Young-Gab Kim
Publication venue: 'MDPI AG'
Publication date: 01/09/2021

Recently, the integration of state-of-the-art technologies, such as modern sensors, networks, and cloud computing, has revolutionized the conventional healthcare system. However, security concerns have increasingly been emerging due to the integration of these technologies. Therefore, the security and privacy issues associated with e-health data must be properly explored. In this paper, to investigate the security and privacy of e-health systems, we identified major components of modern e-health systems (i.e., e-health data, medical devices, medical networks, and edge/fog/cloud). Then, we reviewed recent security and privacy studies that focus on each component of the e-health systems. Based on the review, we obtained a research taxonomy, security concerns, requirements, solutions, research trends, and open challenges for the components, along with the strengths and weaknesses of the analyzed studies. In particular, edge and fog computing studies for e-health security and privacy were reviewed, since these studies had mostly not been analyzed in other survey papers.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

Self-Adaptive Framework Based on MAPE Loop for Internet of Things

Author: Euijong Lee
Young-Duk Seo
Young-Gab Kim
Publication venue: 'MDPI AG'

Crossref