26 research outputs found

    Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach

    Given restrictions on the availability of data, active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label. Although selecting the most useful points for training is an optimization problem, the scale of deep learning data sets forces most selection strategies to employ efficient heuristics. Instead, we propose a new integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool. We demonstrate that this problem can be tractably solved with a Generalized Benders Decomposition algorithm. Our strategy requires high-quality latent features which we obtain by unsupervised learning on the unlabeled pool. Numerical results on several data sets show that our optimization approach is competitive with baselines and particularly outperforms them in the low budget regime where less than one percent of the data set is labeled.

    Optimizing Data Collection for Machine Learning

    Modern deep learning systems require huge data sets to achieve impressive performance, but there is little guidance on how much or what kind of data to collect. Over-collecting data incurs unnecessary present costs, while under-collecting may incur future costs and delay workflows. We propose a new paradigm for modeling the data collection workflow as a formal optimal data collection problem that allows designers to specify performance targets, collection costs, a time horizon, and penalties for failing to meet the targets. Additionally, this formulation generalizes to tasks requiring multiple data sources, such as labeled and unlabeled data used in semi-supervised learning. To solve our problem, we develop Learn-Optimize-Collect (LOC), which minimizes expected future collection costs. Finally, we numerically compare our framework to the conventional baseline of estimating data requirements by extrapolating from neural scaling laws. We significantly reduce the risks of failing to meet desired performance targets on several classification, segmentation, and detection tasks, while maintaining low total collection costs. Comment: Accepted to NeurIPS 202

    Serum Myelin Oligodendrocyte Glycoprotein and Myelin Protein Zero as Diagnostic Biomarkers in Diabetic Neuropathy

    Background: Diabetic neuropathy can affect any peripheral nerve, including sensory neurons, motor neurons, and the autonomic nervous system. Therefore, diabetic neuropathy has the potential to affect essentially any organ and can affect parts of the nervous system like the optic nerve, spinal cord, and brain. In addition, chronic hyperglycemia affects Schwann cells, and more severe patterns of diabetic neuropathy in humans involve demyelination. Schwann cell destruction might cause a number of changes in the axon. This study aims to evaluate serum myelin protein levels as a predictive marker in the diagnosis of diabetic neuropathy and to prevent early neuropathy complications of type 2 diabetes. Subjects and methods: This study involved 120 individuals divided into three groups. The first group included 40 healthy individuals; the second group included 40 type 2 diabetic patients with a diabetes duration of more than 5 years; and the last group included 40 type 2 diabetic patients with a diabetes duration of less than or equal to 5 years. An enzyme-linked immunosorbent assay (ELISA) was used to detect serum MOG and MPZ. Results: Both groups of type 2 diabetes patients had significant (p ≤ 0.05) increases in serum myelin protein zero (MPZ, also called P0) and myelin oligodendrocyte glycoprotein (MOG). Conclusion: According to the results, myelin protein can be used to diagnose patients with diabetic neuropathy at an early stage. But it did not rise to the level of a biomarker due to a lack of sensitivity

    Diving into the vertical dimension of elasmobranch movement ecology

    Knowledge of the three-dimensional movement patterns of elasmobranchs is vital to understand their ecological roles and exposure to anthropogenic pressures. To date, comparative studies among species at global scales have mostly focused on horizontal movements. Our study addresses the knowledge gap of vertical movements by compiling the first global synthesis of vertical habitat use by elasmobranchs from data obtained by deployment of 989 biotelemetry tags on 38 elasmobranch species. Elasmobranchs displayed high intra- and interspecific variability in vertical movement patterns. Substantial vertical overlap was observed for many epipelagic elasmobranchs, indicating an increased likelihood to display spatial overlap, biologically interact, and share similar risk to anthropogenic threats that vary on a vertical gradient. We highlight the critical next steps toward incorporating vertical movement into global management and monitoring strategies for elasmobranchs, emphasizing the need to address geographic and taxonomic biases in deployments and to concurrently consider both horizontal and vertical movements.