
    Quality of Information in Mobile Crowdsensing: Survey and Research Challenges

    Smartphones have become the most pervasive devices in people's lives, and are clearly transforming the way we live and perceive technology. Today's smartphones benefit from almost ubiquitous Internet connectivity and come equipped with a plethora of inexpensive yet powerful embedded sensors, such as accelerometer, gyroscope, microphone, and camera. This unique combination has enabled revolutionary applications based on the mobile crowdsensing paradigm, such as real-time road traffic monitoring, air and noise pollution monitoring, crime control, and wildlife monitoring, just to name a few. Unlike prior sensing paradigms, mobile crowdsensing makes humans the primary actors of the sensing process, since they are fundamental to retrieving reliable and up-to-date information about the event being monitored. As humans may behave unreliably or maliciously, assessing and guaranteeing Quality of Information (QoI) becomes more important than ever. In this paper, we provide a new framework for defining and enforcing QoI in mobile crowdsensing, and analyze in depth the current state of the art on the topic. We also outline novel research challenges, along with possible directions of future work. Comment: To appear in ACM Transactions on Sensor Networks (TOSN).

    DataPerf: Benchmarks for Data-Centric AI Development

    Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating ML datasets and data-centric algorithms. We aim to foster innovation in data-centric AI through competition, comparability, and reproducibility. We enable the ML community to iterate on datasets, instead of just architectures, and we provide an open, online platform with multiple rounds of challenges to support this iterative development. The first iteration of DataPerf contains five benchmarks covering a wide spectrum of data-centric techniques, tasks, and modalities in vision, speech, acquisition, debugging, and diffusion prompting, and we support hosting new contributed benchmarks from the community. The benchmarks, online evaluation platform, and baseline implementations are open source, and the MLCommons Association will maintain DataPerf to ensure long-term benefits to academia and industry. Comment: NeurIPS 2023 Datasets and Benchmarks Track

    Label-Efficient Learning for Object Recognition (객체 인식의 레이블 효율적 학습)

    Doctoral dissertation (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2023. Advisor: 윤성로 (Sungroh Yoon). Advances in deep neural network approaches have produced tremendous progress in object recognition tasks, but this progress has come at the cost of annotating a huge number of training images with explicit localization cues. Using object recognition in real-life applications requires a large variety of object classes and a great deal of labeled data for each class. However, producing pixel-level annotations for each object class is laborious and hampers the expansion to new object classes. Weakly supervised learning sidesteps the need for such expensive annotations: a DNN is trained on images with some form of abbreviated annotation that is cheaper to obtain than explicit localization cues.
In this dissertation, we study methods that use various forms of weak supervision, namely image-level class labels, out-of-distribution data, and bounding box labels. We first study image-level class labels for weakly supervised semantic segmentation. Most weakly supervised methods based on image-level class labels depend on attribution maps from a trained classifier, but these maps tend to focus on a small discriminative region of the target object. We theoretically discuss the root cause of this problem and propose three novel techniques to address it. However, localization maps produced from class labels alone are known to confuse foreground and background cues, i.e., to suffer from spurious correlation. We address the spurious correlation problem by utilizing out-of-distribution data. Finally, methods based on class labels cannot distinguish different instances of the same class, which is essential for instance segmentation. Therefore, we utilize bounding box labels for weakly supervised instance segmentation, as boxes provide information about individual objects and their locations. Experimental results show that the annotation cost for learning semantic segmentation and instance segmentation can be significantly reduced: on the challenging Pascal VOC dataset, we achieved 89% of the performance of the fully supervised equivalent using only class labels, which reduces the label cost by 91%. In addition, we achieved 96% of the performance of the fully supervised equivalent using bounding box labels, which reduces the label cost by 83%.
We expect that the methods introduced in this dissertation will help apply deep-learning-based object recognition in a variety of domains and scenarios.
    Contents:
    1 Introduction
    2 Background
      2.1 Object Recognition
      2.2 Weak Supervision
      2.3 Preliminary Algorithms
        2.3.1 Attribution Methods for Image Classifier
        2.3.2 Refinement Techniques of Localization Maps
    3 Learning with Image-Level Class Labels
      3.1 Introduction
      3.2 Related Work
        3.2.1 FickleNet: Stochastic Inference Approach
        3.2.2 Other Recent Approaches
      3.3 Anti-Adversarially Manipulated Attribution
        3.3.1 Adversarial Attack
        3.3.2 Proposed Method
        3.3.3 Experiments
        3.3.4 Discussion
        3.3.5 Analysis of Results by Class
      3.4 Reducing Information Bottleneck
        3.4.1 Information Bottleneck
        3.4.2 Motivation
        3.4.3 Proposed Method
        3.4.4 Experiments
      3.5 Summary
    4 Learning with Auxiliary Data
      4.1 Introduction
      4.2 Related Work
      4.3 Methods
        4.3.1 Collecting the Hard Out-of-Distribution Data
        4.3.2 Learning with the Hard Out-of-Distribution Data
        4.3.3 Training Segmentation Networks
      4.4 Experiments
        4.4.1 Experimental Setup
        4.4.2 Experimental Results
        4.4.3 Analysis and Discussion
      4.5 Analysis of OoD Collection Process
      4.6 Integrating Proposed Methods
      4.7 Summary
    5 Learning with Bounding Box Labels
      5.1 Introduction
      5.2 Related Work
      5.3 Methods
        5.3.1 Revisiting Object Detectors
        5.3.2 Bounding Box Attribution Map
        5.3.3 Training the Segmentation Network
      5.4 Experiments
        5.4.1 Experimental Setup
        5.4.2 Weakly Supervised Instance Segmentation
        5.4.3 Weakly Supervised Semantic Segmentation
        5.4.4 Ablation Study
      5.5 Detailed Analysis of the BBAM
      5.6 Summary
    6 Conclusion
      6.1 Dissertation Summary
      6.2 Limitations and Future Direction
    Abstract (In Korean)
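    The abstract above builds on attribution maps obtained from a trained image classifier. As a point of reference only, the sketch below computes a plain class activation map (CAM), the standard attribution method such work starts from; it is not the dissertation's anti-adversarial, information-bottleneck, or bounding-box attribution method. It assumes a classifier whose last convolutional features are globally average-pooled and fed to a linear layer; the function names, array shapes, and the 0.3 threshold are illustrative assumptions.

```python
import numpy as np


def class_activation_map(features: np.ndarray, fc_weights: np.ndarray, class_idx: int) -> np.ndarray:
    """Compute a CAM for one class (illustrative sketch).

    features:   (K, H, W) feature maps from the last conv layer (assumed layout).
    fc_weights: (C, K) weights of the linear classifier applied after global average pooling.
    Returns an (H, W) map scaled to [0, 1].
    """
    # Weighted sum over the K channels: cam[y, x] = sum_k w[c, k] * F[k, y, x]
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Keep only positive evidence for the class
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    # Avoid division by zero when the class has no positive evidence anywhere
    return cam / peak if peak > 0 else cam


def localization_mask(cam: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """Binarize a normalized CAM into a foreground mask (threshold is a hypothetical choice)."""
    return cam >= threshold


# Tiny usage example with two channels on a 3x3 grid
features = np.zeros((2, 3, 3))
features[0, 1, 1] = 2.0   # channel 0 fires at the center
features[1, 0, 0] = 1.0   # channel 1 fires at the top-left corner
fc_weights = np.array([[1.0, -1.0],   # class 0 likes channel 0, dislikes channel 1
                       [0.0, 1.0]])   # class 1 relies on channel 1 only
cam = class_activation_map(features, fc_weights, class_idx=0)
mask = localization_mask(cam)
```

    Because the map is normalized by its peak, thresholding tends to keep only the most strongly activated region, which illustrates why such maps often cover just the discriminative part of an object, as the abstract notes.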

    Learning image‐text associations


    Data integration for the analysis of uncharacterized proteins in Mycobacterium tuberculosis

    Includes abstract. Includes bibliographical references (leaves 126-150). Mycobacterium tuberculosis is a bacterial pathogen that causes tuberculosis, a leading cause of human death from infectious diseases worldwide, especially in Africa. Despite enormous advances in controlling the disease in recent years, tuberculosis remains a public health challenge. The contribution of existing drugs is of immense value, but the deadly synergy of the disease with Human Immunodeficiency Virus (HIV) or Acquired Immunodeficiency Syndrome (AIDS) and the emergence of drug-resistant strains threaten to compromise gains in tuberculosis control. The development of active tuberculosis is the outcome of a delicate balance between bacterial virulence and host resistance, which constitute two distinct and independent components. Significant progress has been made in understanding the evolution of the bacterial pathogen and its interaction with the host. The end point of these efforts is the identification of virulence factors and drug targets within the bacterium in order to develop new drugs and vaccines for the eradication of the disease.

    Securing Federated Sensitive Topic Classification against Poisoning Attacks

    We present a Federated Learning (FL) based solution for building a distributed classifier capable of detecting URLs containing GDPR-sensitive content related to categories such as health, sexual preference, political beliefs, etc. Although such a classifier addresses the limitations of previous offline/centralised classifiers, it is still vulnerable to poisoning attacks from malicious users that may attempt to reduce the accuracy for benign users by disseminating faulty model updates. To guard against this, we develop a robust aggregation scheme based on subjective logic and residual-based attack detection. Employing a combination of theoretical analysis, trace-driven simulation, and experimental validation with a prototype and real users, we show that our classifier can detect sensitive content with high accuracy, learn new labels fast, and remain robust against poisoning attacks from malicious users, as well as imperfect input from non-malicious ones.
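    The abstract names two ingredients of the aggregation scheme, subjective logic and residual-based attack detection, without giving details. The sketch below is a generic, hypothetical combination of the two and is not the paper's scheme: each client update is compared with the coordinate-wise median, updates whose residual exceeds a multiple of the median residual are flagged, and the remaining updates are averaged with weights taken from the expected value of a binomial subjective-logic opinion built from each client's history of accepted and flagged rounds. The function names, the `factor` cut-off, the base rate of 0.5, and the prior weight of 2 are all assumptions.

```python
import numpy as np


def subjective_trust(accepted: int, flagged: int, base_rate: float = 0.5, prior_weight: float = 2.0) -> float:
    """Expected value of a binomial subjective-logic opinion from evidence counts.

    belief = r / (r + s + W), uncertainty = W / (r + s + W),
    expectation = belief + base_rate * uncertainty.
    """
    total = accepted + flagged + prior_weight
    belief = accepted / total
    uncertainty = prior_weight / total
    return belief + base_rate * uncertainty


def robust_aggregate(updates, history, factor=3.0):
    """Hypothetical residual filter plus trust-weighted averaging.

    updates: (N, D) array, one flattened model update per client.
    history: dict client index -> (accepted_rounds, flagged_rounds); updated in place.
    factor:  cut-off in multiples of the median residual (should be >= 1 so that
             at least half of the clients are always accepted).
    Returns (aggregated update, list of flagged client indices).
    """
    updates = np.asarray(updates, dtype=float)
    # The coordinate-wise median is a reference point a minority of attackers cannot drag far.
    reference = np.median(updates, axis=0)
    residuals = np.linalg.norm(updates - reference, axis=1)
    threshold = factor * np.median(residuals) + 1e-12
    flagged = [i for i, r in enumerate(residuals) if r > threshold]

    weights = np.zeros(len(updates))
    for i in range(len(updates)):
        accepted_count, flagged_count = history.get(i, (0, 0))
        if i in flagged:
            # Flagged updates are excluded this round and count as negative evidence.
            history[i] = (accepted_count, flagged_count + 1)
        else:
            history[i] = (accepted_count + 1, flagged_count)
            # Accepted updates contribute in proportion to the client's trust so far.
            weights[i] = subjective_trust(*history[i])

    aggregated = weights @ updates / weights.sum()
    return aggregated, flagged


# Usage: three honest clients and one client sending a poisoned update
history = {}
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [10.0, -10.0]]
aggregated, flagged = robust_aggregate(updates, history)
```

    Using the median as the reference keeps the detector from being pulled toward the attacker, and tracking evidence across rounds lets a client's weight recover or decay gradually instead of depending on a single round.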

    Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

    While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern is their propensity to hallucinate: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research. Comment: work in progress; 32 pages