12 research outputs found

    Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective

    Full text link
    Data-centric AI is at the center of a fundamental shift in software engineering where machine learning becomes the new software, powered by big data and computing infrastructure. Here software engineering needs to be re-thought where data becomes a first-class citizen on par with code. One striking observation is that a significant portion of the machine learning process is spent on data preparation. Without good data, even the best machine learning algorithms cannot perform well. As a result, data-centric AI practices are now becoming mainstream. Unfortunately, many datasets in the real world are small, dirty, biased, and even poisoned. In this survey, we study the research landscape for data collection and data quality primarily for deep learning applications. Data collection is important because there is lesser need for feature engineering for recent deep learning approaches, but instead more need for large amounts of data. For data quality, we study data validation, cleaning, and integration techniques. Even if the data cannot be fully cleaned, we can still cope with imperfect data during model training using robust model training techniques. In addition, while bias and fairness have been less studied in traditional data management research, these issues become essential topics in modern machine learning applications. We thus study fairness measures and unfairness mitigation techniques that can be applied before, during, or after model training. We believe that the data management community is well poised to solve these problems

    Inspector Gadget: A Data Programming-based Labeling System for Industrial Images

    Full text link
    As machine learning for images becomes democratized in the Software 2.0 era, one of the serious bottlenecks is securing enough labeled data for training. This problem is especially critical in a manufacturing setting where smart factories rely on machine learning for product quality control by analyzing industrial images. Such images are typically large and may only need to be partially analyzed where only a small portion is problematic (e.g., identifying defects on a surface). Since manual labeling these images is expensive, weak supervision is an attractive alternative where the idea is to generate weak labels that are not perfect, but can be produced at scale. Data programming is a recent paradigm in this category where it uses human knowledge in the form of labeling functions and combines them into a generative model. Data programming has been successful in applications based on text or structured data and can also be applied to images usually if one can find a way to convert them into structured data. In this work, we expand the horizon of data programming by directly applying it to images without this conversion, which is a common scenario for industrial applications. We propose Inspector Gadget, an image labeling system that combines crowdsourcing, data augmentation, and data programming to produce weak labels at scale for image classification. We perform experiments on real industrial image datasets and show that Inspector Gadget obtains better performance than other weak-labeling techniques: Snuba, GOGGLES, and self-learning baselines using convolutional neural networks (CNNs) without pre-training.Comment: 10 pages, 11 figure

    DeepHealthNet: Adolescent Obesity Prediction System Based on a Deep Learning Framework

    Full text link
    Childhood and adolescent obesity rates are a global concern because obesity is associated with chronic diseases and long-term health risks. Artificial intelligence technology has emerged as a promising solution to accurately predict obesity rates and provide personalized feedback to adolescents. This study emphasizes the importance of early identification and prevention of obesity-related health issues. Factors such as height, weight, waist circumference, calorie intake, physical activity levels, and other relevant health information need to be considered for developing robust algorithms for obesity rate prediction and delivering personalized feedback. Hence, by collecting health datasets from 321 adolescents, we proposed an adolescent obesity prediction system that provides personalized predictions and assists individuals in making informed health decisions. Our proposed deep learning framework, DeepHealthNet, effectively trains the model using data augmentation techniques, even when daily health data are limited, resulting in improved prediction accuracy (acc: 0.8842). Additionally, the study revealed variations in the prediction of the obesity rate between boys (acc: 0.9320) and girls (acc: 0.9163), allowing the identification of disparities and the determination of the optimal time to provide feedback. The proposed system shows significant potential in effectively addressing childhood and adolescent obesity

    Self-Adaptive Framework Based on MAPE Loop for Internet of Things

    No full text
    The Internet of Things (IoT) connects a wide range of objects and the types of environments in which IoT can be deployed dynamically change. Therefore, these environments can be modified dynamically at runtime considering the emergence of other requirements. Self-adaptive software alters its behavior to satisfy the requirements in a dynamic environment. In this context, the concept of self-adaptive software is suitable for some dynamic IoT environments (e.g., smart greenhouses, smart homes, and reality applications). In this study, we propose a self-adaptive framework for decision-making in an IoT environment at runtime. The framework comprises a finite-state machine model design and a game theoretic decision-making method for extracting efficient strategies. The framework was implemented as a prototype and experiments were conducted to evaluate its runtime performance. The results demonstrate that the proposed framework can be applied to IoT environments at runtime. In addition, a smart greenhouse-based use case is included to illustrate the usability of the proposed framework

    A deep learning-based framework for predicting pork preference

    No full text
    Meat consumption per capita in South Korea has steadily increased over the last several years and is predicted to continue increasing. Up to 69.5% of Koreans eat pork at least once a week. Considering pork-related products produced and imported in Korea, Korean consumers have a high preference for high-fat parts, such as pork belly. Managing the high-fat portions of domestically produced and imported meat according to consumer needs has become a competitive factor. Therefore, this study presents a deep learning-based framework for predicting the flavor and appearance preference scores of the customers based on the characteristic information of pork using ultrasound equipment. The characteristic information is collected using ultrasound equipment (AutoFom III). Subsequently, according to the measured information, consumers’ preferences for flavor and appearance were directly investigated for a long period and predicted using a deep learning methodology. For the first time, we have applied a deep neural network-based ensemble technique to predict consumer preference scores according to the measured pork carcasses. To demonstrate the efficiency of the proposed framework, an empirical evaluation was conducted using a survey and data on pork belly preference. Experimental results indicate a strong relationship between the predicted preference scores and characteristics of pork belly

    A Comprehensive Survey on Security and Privacy for Electronic Health Data

    No full text
    Recently, the integration of state-of-the-art technologies, such as modern sensors, networks, and cloud computing, has revolutionized the conventional healthcare system. However, security concerns have increasingly been emerging due to the integration of technologies. Therefore, the security and privacy issues associated with e-health data must be properly explored. In this paper, to investigate the security and privacy of e-health systems, we identified major components of the modern e-health systems (i.e., e-health data, medical devices, medical networks and edge/fog/cloud). Then, we reviewed recent security and privacy studies that focus on each component of the e-health systems. Based on the review, we obtained research taxonomy, security concerns, requirements, solutions, research trends, and open challenges for the components with strengths and weaknesses of the analyzed studies. In particular, edge and fog computing studies for e-health security and privacy were reviewed since the studies had mostly not been analyzed in other survey papers
    corecore