UniverSeg: Universal Medical Image Segmentation
While deep learning models have become the predominant method for medical
image segmentation, they are typically not capable of generalizing to unseen
segmentation tasks involving new anatomies, image modalities, or labels. Given
a new segmentation task, researchers generally have to train or fine-tune
models, which is time-consuming and poses a substantial barrier for clinical
researchers, who often lack the resources and expertise to train neural
networks. We present UniverSeg, a method for solving unseen medical
segmentation tasks without additional training. Given a query image and an example
set of image-label pairs that define a new segmentation task, UniverSeg employs
a new Cross-Block mechanism to produce accurate segmentation maps without the
need for additional training. To achieve generalization to new tasks, we have
gathered and standardized a collection of 53 open-access medical segmentation
datasets with over 22,000 scans, which we refer to as MegaMedical. We used this
collection to train UniverSeg on a diverse set of anatomies and imaging
modalities. We demonstrate that UniverSeg substantially outperforms several
related methods on unseen tasks, and thoroughly analyze and draw insights about
important aspects of the proposed system. The UniverSeg source code and model
weights are freely available at https://universeg.csail.mit.edu.
Comment: Victor and Jose Javier contributed equally to this work.
Neural Architecture Search: Insights from 1000 Papers
In the past decade, advances in deep learning have resulted in breakthroughs
in a variety of areas, including computer vision, natural language
understanding, speech recognition, and reinforcement learning. Specialized,
high-performing neural architectures are crucial to the success of deep
learning in these areas. Neural architecture search (NAS), the process of
automating the design of neural architectures for a given task, is an
inevitable next step in automating machine learning and has already outpaced
the best human-designed architectures on many tasks. In the past few years,
research in NAS has been progressing rapidly, with over 1000 papers released
since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized
and comprehensive guide to neural architecture search. We give a taxonomy of
search spaces, algorithms, and speedup techniques, and we discuss resources
such as benchmarks, best practices, other surveys, and open-source libraries.
Qluster: An easy-to-implement generic workflow for robust clustering of health data
The exploration of health data with clustering algorithms makes it possible to better describe populations of interest by identifying the sub-profiles that compose them. This in turn reinforces medical knowledge, whether about a disease or about a targeted population in real life. Nevertheless, in contrast to conventional biostatistical methods, for which numerous guidelines exist, the standardization of data science approaches in clinical research remains a little-discussed subject. This results in significant variability in the execution of data science projects, whether in terms of the algorithms used or the reliability and credibility of the designed approach. Taking the path of parsimonious and judicious choice of both algorithms and implementations at each stage, this article proposes Qluster, a practical workflow for performing clustering tasks. This workflow strikes a compromise between (1) genericity of applications (e.g. usable on small or big data, on continuous, categorical or mixed variables, on databases of high dimensionality or not), (2) ease of implementation (need for few packages, few algorithms, few parameters, ...), and (3) robustness (e.g. use of proven algorithms and robust packages, evaluation of the stability of clusters, management of noise and multicollinearity). This workflow can be easily automated and/or routinely applied on a wide range of clustering projects. It can be useful both for data scientists with little experience in the field, to make data clustering easier and more robust, and for more experienced data scientists who are looking for a straightforward and reliable solution to routinely perform preliminary data mining. A synthesis of the literature on data clustering as well as the scientific rationale supporting the proposed workflow is also provided. Finally, a detailed application of the workflow on a concrete use case is provided, along with a practical discussion for data scientists.
An implementation on the Dataiku platform is available upon request to the authors.
A Reinforcement Learning-assisted Genetic Programming Algorithm for Team Formation Problem Considering Person-Job Matching
An efficient team is essential for the company to successfully complete new
projects. To solve the team formation problem considering person-job matching
(TFP-PJM), a 0-1 integer programming model is constructed, which considers both
person-job matching and team members' willingness to communicate on team
efficiency, with the person-job matching score calculated using intuitionistic
fuzzy numbers. Then, a reinforcement learning-assisted genetic programming
algorithm (RL-GP) is proposed to enhance the quality of solutions. The RL-GP
adopts ensemble population strategies. Before the population evolution at
each generation, the agent selects one from four population search modes
according to the information obtained, thus realizing a sound balance of
exploration and exploitation. In addition, surrogate models are used in the
algorithm to evaluate the formation plans generated by individuals, which
speeds up the algorithm learning process. Afterward, a series of comparison
experiments are conducted to verify the overall performance of RL-GP and the
effectiveness of the improved strategies within the algorithm. The
hyper-heuristic rules obtained through efficient learning can be utilized as
decision-making aids when forming project teams. This study reveals the
advantages of reinforcement learning methods, ensemble strategies, and the
surrogate model applied to the GP framework. The diversity and intelligent
selection of search patterns, along with fast adaptation evaluation, are
distinct features that enable RL-GP to be deployed in real-world enterprise
environments.
Comment: 16 pages
Approaches to Improving the Accuracy of Machine Learning Models in Requirements Elicitation Techniques Selection
Selecting techniques is a crucial element of the business analysis approach
planning in IT projects. Particular attention is paid to the choice of
techniques for requirements elicitation. One of the promising methods for
selecting techniques is using machine learning algorithms trained on the
practitioners' experience considering different projects' contexts. The
effectiveness of ML models is significantly affected by the balance of the
training dataset, which is violated in the case of popular techniques. The
paper analyzes the efficiency of using the Synthetic Minority Over-sampling
Technique (SMOTE) in machine learning models for elicitation technique
selection when the training dataset is imbalanced, as well as possible
approaches to positive feature importance selection. The computational
experiment results confirmed
the effectiveness of using the proposed approaches to improve the accuracy of
machine learning models for selecting requirements elicitation techniques.
Proposed approaches can be used to build Machine Learning models for business
analysis activities planning in IT projects.
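As a rough illustration of the oversampling idea named in this abstract, the sketch below generates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours, which is the core step of SMOTE. It is not the paper's code: the function name `smote`, its parameters, and the toy data are assumptions made for illustration only.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length numeric tuples (the under-represented class).
    k: number of nearest minority neighbours to choose from (hypothetical default).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment x -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Toy usage: four minority samples at the corners of the unit square.
minority_samples = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
augmented = minority_samples + smote(minority_samples, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new samples stay inside the region already occupied by the minority class rather than duplicating points exactly.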
MuRAL: Multi-Scale Region-based Active Learning for Object Detection
Obtaining a large-scale labeled object detection dataset can be costly and
time-consuming, as it involves annotating images with bounding boxes and class
labels. Thus, some specialized active learning methods have been proposed to
reduce the cost by selecting either coarse-grained samples or fine-grained
instances from unlabeled data for labeling. However, the former approaches
suffer from redundant labeling, while the latter methods generally lead to
training instability and sampling bias. To address these challenges, we propose
a novel approach called Multi-scale Region-based Active Learning (MuRAL) for
object detection. MuRAL identifies informative regions of various scales to
reduce annotation costs for well-learned objects and improve training
performance. The informative region score is designed to consider both the
predicted confidence of instances and the distribution of each object category,
enabling our method to focus more on difficult-to-detect classes. Moreover,
MuRAL employs a scale-aware selection strategy that ensures diverse regions are
selected from different scales for labeling and downstream finetuning, which
enhances training stability. Our proposed method surpasses all existing
coarse-grained and fine-grained baselines on Cityscapes and MS COCO datasets,
and demonstrates significant improvement in difficult category performance.
Model Diagnostics meets Forecast Evaluation: Goodness-of-Fit, Calibration, and Related Topics
Principled forecast evaluation and model diagnostics are vital in fitting probabilistic models and forecasting outcomes of interest. A common principle is that fitted or predicted distributions ought to be calibrated, ideally in the sense that the outcome is indistinguishable from a random draw from the posited distribution. Much of this thesis is centered on calibration properties of various types of forecasts.
In the first part of the thesis, a simple algorithm for exact multinomial goodness-of-fit tests is proposed. The algorithm computes exact p-values based on various test statistics, such as the log-likelihood ratio and Pearson's chi-square. A thorough analysis shows improvement on extant methods. However, the runtime of the algorithm grows exponentially in the number of categories and hence its use is limited.
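To make the notion of an exact multinomial p-value concrete, here is a minimal brute-force sketch: it enumerates every possible count vector, and sums the null probabilities of those whose test statistic is at least as large as the observed one. This is not the thesis's algorithm, only the textbook definition written out; all function names are assumptions, and the null probabilities are assumed to be strictly positive. The full enumeration also shows why the cost explodes as the number of categories grows.

```python
from math import factorial, log


def compositions(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_pmf(counts, probs):
    """Probability of the count vector under the null probabilities."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p


def pearson_chi2(counts, probs):
    n = sum(counts)
    return sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, probs))


def log_likelihood_ratio(counts, probs):
    n = sum(counts)
    return 2.0 * sum(c * log(c / (n * q)) for c, q in zip(counts, probs) if c > 0)


def exact_p_value(observed, probs, statistic=pearson_chi2):
    """Sum the null probability of all outcomes at least as extreme as observed."""
    n = sum(observed)
    t_obs = statistic(observed, probs)
    total = 0.0
    for counts in compositions(n, len(probs)):
        # Small tolerance so that ties with the observed statistic are included.
        if statistic(counts, probs) >= t_obs - 1e-12:
            total += multinomial_pmf(counts, probs)
    return total


# Five trials, all in the first of two equally likely categories.
p = exact_p_value((5, 0), (0.5, 0.5))  # 0.0625: only (5, 0) and (0, 5) are as extreme
```

Passing `statistic=log_likelihood_ratio` gives the corresponding exact test based on the log-likelihood ratio.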
In the second part, a framework rooted in probability theory is developed, which gives rise to hierarchies of calibration, and applies to both predictive distributions and stand-alone point forecasts. Based on a general notion of conditional T-calibration, the thesis introduces population versions of T-reliability diagrams and revisits a score decomposition into measures of miscalibration, discrimination, and uncertainty. Stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, a universal coefficient of determination is introduced that nests and reinterprets the classical R² in least squares regression.
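The pool-adjacent-violators (PAV) algorithm mentioned above can be sketched in a few lines. The version below computes the weighted least-squares non-decreasing fit to a sequence, which is the isotonic regression step; it is a generic illustration rather than the thesis's estimator, and the function name `pav` and its interface are assumptions.

```python
def pav(y, w=None):
    """Non-decreasing least-squares fit to y via pool-adjacent-violators.

    y: outcomes ordered by the forecast (or covariate) value.
    w: optional positive weights, one per outcome.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


fitted = pav([1, 3, 2, 4])  # [1.0, 2.5, 2.5, 4.0]
```

Applied to binary outcomes sorted by their forecast probabilities, the fitted values are recalibrated event frequencies, which is how an isotonic fit can serve as the curve in a reliability diagram.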
In the third part, probabilistic top lists are proposed as a novel type of prediction in classification, which bridges the gap between single-class predictions and predictive distributions. The probabilistic top list functional is elicited by strictly consistent evaluation metrics, based on symmetric proper scoring rules, which admit comparison of various types of predictions.
Evolutionary Computation in Action: Feature Selection for Deep Embedding Spaces of Gigapixel Pathology Images
One of the main obstacles of adopting digital pathology is the challenge of
efficient processing of hyperdimensional digitized biopsy samples, called whole
slide images (WSIs). Exploiting deep learning and introducing compact WSI
representations are urgently needed to accelerate image analysis and facilitate
the visualization and interpretability of pathology results in a postpandemic
world. In this paper, we introduce a new evolutionary approach for WSI
representation based on large-scale multi-objective optimization (LSMOP) of
deep embeddings. We start with patch-based sampling to feed KimiaNet, a
histopathology-specialized deep network, and to extract a multitude of feature
vectors. Coarse multi-objective feature selection uses the reduced search space
strategy guided by the classification accuracy and the number of features. In
the second stage, the frequent features histogram (FFH), a novel WSI
representation, is constructed by multiple runs of coarse LSMOP. Fine
evolutionary feature selection is then applied to find a compact (short-length)
feature vector based on the FFH and contributes to a more robust deep-learning
approach to digital pathology supported by the stochastic power of evolutionary
algorithms. We validate the proposed schemes using The Cancer Genome Atlas
(TCGA) images in terms of WSI representation, classification accuracy, and
feature quality. Furthermore, a novel decision space for multicriteria decision
making in the LSMOP field is introduced. Finally, a patch-level visualization
approach is proposed to increase the interpretability of deep features. The
proposed evolutionary algorithm finds a very compact feature vector to
represent a WSI (almost 14,000 times smaller than the original feature vectors)
with 8% higher accuracy compared to the codes provided by the state-of-the-art
methods.
Assessing performance of artificial neural networks and re-sampling techniques for healthcare datasets.
Re-sampling methods for class imbalance problems have been shown to improve classification accuracy by mitigating the bias introduced by differences in class size. However, a model that uses a specific re-sampling technique prior to artificial neural network (ANN) training may not be suitable for classifying varied datasets from the healthcare industry. Five healthcare-related datasets were used across three re-sampling conditions: under-sampling, over-sampling and combi-sampling. Within each condition, different algorithmic approaches were applied to the dataset and the results were statistically analysed for a significant difference in ANN performance. In the combi-sampling condition, four out of the five datasets did not show significant consistency for the optimal re-sampling technique between the F1-score and Area Under the Receiver Operating Characteristic Curve performance evaluation methods. In contrast, in the over-sampling and under-sampling conditions, all five datasets put forward the same optimal algorithmic approach across performance evaluation methods. Furthermore, the optimal combi-sampling technique (under-sampling, over-sampling and convergence point) was found to be consistent across evaluation measures in only two of the five datasets. This study exemplifies how discrete ANN performances on datasets from the same industry can occur in two ways: the same re-sampling technique can generate varying ANN performance on different datasets, and different re-sampling techniques can generate varying ANN performance on the same dataset.
CBASH: Combined Backbone and Advanced Selection Heads with Object Semantic Proposals for Weakly Supervised Object Detection
10.13039/501100001809-National Natural Science Foundation of China (Grant Number: 61971079 and U21A20447); Basic Research and Frontier Exploration Project of Chongqing (Grant Number: cstc2019jcyj-msxmX0666); National Key Research and Development Program of China (Grant Number: 2019YFC1511300);
Regional Creative Cooperation Program of Sichuan (Grant Number: 2020YFQ0025); Innovative Group Project of the National Natural Science Foundation of Chongqing (Grant Number: cstc2020jcyj-cxttX0002)