    M4oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts

    Medical imaging data is inherently heterogeneous across modalities and clinical centers, posing unique challenges for developing generalizable foundation models. Conventional approaches entail training distinct models per dataset or using a shared encoder with modality-specific decoders; however, these approaches incur heavy computational overhead and scale poorly. To address these limitations, we propose the Medical Multimodal Mixture of Experts (M4oE) framework, built on the SwinUNet architecture. Specifically, M4oE comprises modality-specific experts, each separately initialized to learn features encoding domain knowledge. A gating network is then integrated during fine-tuning to dynamically modulate each expert's contribution to the collective predictions. This enhances model interpretability and generalization while retaining expert specialization. The M4oE architecture also amplifies the model's parallel-processing capabilities and eases adaptation to new modalities. Experiments across three modalities show that M4oE achieves improvements of 3.45% over STU-Net-L, 5.11% over MED3D, and 11.93% over SAM-Med2D on the MICCAI FLARE22, AMOS2022, and ATLAS2023 datasets. Moreover, M4oE reduces training time by 7 hours while using only 30% of the parameters of the compared methods. The code is available at https://github.com/JefferyJiang-YF/M4oE.
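
    The gating idea at the heart of a mixture-of-experts model is compact enough to sketch. The following Python snippet is a minimal illustration, not the actual M4oE/SwinUNet implementation; the expert functions and gating weights are hypothetical placeholders. It shows how a softmax gate blends the outputs of modality-specific experts:

    import numpy as np

    def softmax(z):
        z = z - z.max(-1, keepdims=True)   # shift for numerical stability
        e = np.exp(z)
        return e / e.sum(-1, keepdims=True)

    def moe_forward(x, experts, gate_w):
        # Each expert maps the input feature vector to an output vector.
        expert_out = np.stack([f(x) for f in experts])   # shape (E, m)
        # The gating network scores every expert from the same input;
        # softmax turns the scores into mixture weights.
        gate = softmax(x @ gate_w)                       # shape (E,)
        # Blend: weighted sum of expert outputs.
        return np.tensordot(gate, expert_out, axes=1)    # shape (m,)

    # Toy usage: three "modality experts" as random linear maps.
    rng = np.random.default_rng(0)
    d, m, E = 8, 4, 3
    experts = [(lambda W: (lambda x: W @ x))(rng.normal(size=(m, d)))
               for _ in range(E)]
    gate_w = rng.normal(size=(d, E))
    y = moe_forward(rng.normal(size=d), experts, gate_w)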

    Flexible Locally Weighted Penalized Regression With Applications on Prediction of Alzheimer's Disease Neuroimaging Initiative's Clinical Scores

    In recent years, we have witnessed an explosion of large-scale data in various fields. Classical statistical methodologies, such as linear regression or generalized linear regression, often perform inadequately on heterogeneous data because the key homogeneity assumption fails. In this paper, we present a flexible framework for handling heterogeneous populations that can be naturally grouped into several ordered subtypes, based on a local model technique that utilizes ordinal class labels during the training stage. We define a new "progression score" that captures the progression through the ordinal classes, and use a truncated Gaussian kernel to construct the weight function in a local regression framework. Furthermore, given the weights, we apply sparse shrinkage to the local fitting to handle high dimensionality. In this way, our local model is able to conduct variable selection at each query point. Numerical studies show the superiority of our proposed method over several existing ones. Our method is also applied to the Alzheimer's Disease Neuroimaging Initiative data to predict longitudinal clinical scores based on different modalities of baseline brain image features.
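
    To make the recipe concrete, here is a minimal Python sketch of the two ingredients described above: truncated-Gaussian kernel weights over a progression score, followed by a weighted sparse fit at a query point. The progression scores, bandwidth, and sparsity level are hypothetical inputs, and scikit-learn's Lasso stands in for the paper's sparse shrinkage:

    import numpy as np
    from sklearn.linear_model import Lasso

    def truncated_gaussian_weights(scores, query_score, bandwidth=1.0, cutoff=2.0):
        # Gaussian kernel in the progression score, truncated so that
        # points more than `cutoff` bandwidths away get weight zero.
        u = (scores - query_score) / bandwidth
        w = np.exp(-0.5 * u ** 2)
        w[np.abs(u) > cutoff] = 0.0
        return w

    def local_sparse_fit(X, y, w, alpha=0.1):
        # Weighted Lasso via rescaling: multiplying rows of X and y by
        # sqrt(w) turns the weighted squared loss into an ordinary one.
        s = np.sqrt(w)
        model = Lasso(alpha=alpha, fit_intercept=False)  # assumes centered data
        model.fit(X * s[:, None], y * s)
        return model.coef_   # zeros mark variables dropped at this query point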

    Graph-guided joint prediction of class label and clinical scores for the Alzheimer’s disease

    Accurate diagnosis of Alzheimer's disease and its prodromal stage, i.e., mild cognitive impairment, is very important for early treatment. Over the last decade, various machine learning methods have been proposed to predict disease status and clinical scores from brain images. It is worth noting that many features extracted from brain images are significantly correlated. In this case, feature selection combined with the correlation information among features can effectively improve classification/regression performance. Typically, the correlation information among features can be modeled by the connectivity of an undirected graph, where each node represents one feature and each edge indicates that the two involved features are significantly correlated. In this paper, we propose a new graph-guided multi-task learning method that incorporates this undirected graph information to predict multiple response variables (i.e., class label and clinical scores) jointly. Specifically, based on the sparse undirected feature graph, we utilize a new latent group Lasso penalty to encourage correlated features to be selected together; the same penalty also encourages intrinsically correlated tasks to share a common feature subset. To validate our method, we have performed extensive numerical studies using simulated datasets and the Alzheimer's Disease Neuroimaging Initiative dataset. Compared with other methods, our proposed method shows very promising performance.
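
    One standard way to realize such a penalty is the latent (overlapping) group Lasso with one group per edge of the feature graph, solved by duplicating each group's columns and applying group-wise soft-thresholding. The sketch below is a simplification of the paper's method (single response, plain proximal gradient, hypothetical regularization level), meant only to illustrate the mechanics:

    import numpy as np

    def latent_group_lasso(X, y, groups, lam=0.1, n_iter=500):
        # Latent group Lasso: give every group its own copy of its columns,
        # run proximal gradient with a group-wise soft-threshold, and sum
        # the latent copies back into one coefficient vector.
        n, p = X.shape
        Xe = X[:, [j for g in groups for j in g]]        # duplicated design
        slices, start = [], 0
        for g in groups:
            slices.append(slice(start, start + len(g)))
            start += len(g)
        v = np.zeros(Xe.shape[1])
        step = n / np.linalg.norm(Xe, 2) ** 2            # 1/L for the smooth part
        for _ in range(n_iter):
            v -= step * (Xe.T @ (Xe @ v - y)) / n        # gradient step
            for s in slices:                             # prox: group soft-threshold
                nrm = np.linalg.norm(v[s])
                v[s] = 0.0 if nrm == 0 else max(0.0, 1 - step * lam / nrm) * v[s]
        beta = np.zeros(p)
        for s, g in zip(slices, groups):
            beta[list(g)] += v[s]                        # aggregate latent copies
        return beta

    # Edge groups from a correlation graph: one group per correlated pair, e.g.
    # groups = [(i, j) for i in range(p) for j in range(i + 1, p)
    #           if abs(corr[i, j]) > 0.5]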

    Optimal Sparse Linear Prediction for Block-missing Multi-modality Data Without Imputation

    In modern scientific research, data are often collected from multiple modalities. Since different modalities can provide complementary information, statistical prediction methods using multi-modality data can deliver better prediction performance than those using a single modality. However, one special challenge of using multi-modality data is block-missing data: in practice, due to dropouts or the high cost of measurements, the observations of a certain modality can be missing completely for some subjects. In this paper, we propose a new DIrect Sparse regression procedure using COvariance from Multi-modality data (DISCOM). Our proposed DISCOM method uses two steps to find the optimal linear prediction of a continuous response variable from block-missing multi-modality predictors. In the first step, rather than deleting or imputing missing data, we make use of all available information to estimate the covariance matrix of the predictors and the cross-covariance vector between the predictors and the response variable. The proposed estimate of the covariance matrix is a linear combination of the identity matrix, the estimates of the intra-modality covariance matrices, and the cross-modality covariance matrices. Flexible estimates for both the sub-Gaussian and heavy-tailed cases are considered. In the second step, based on the estimated covariance matrix and cross-covariance vector, an extended Lasso-type estimator delivers a sparse estimate of the coefficients in the optimal linear prediction. The number of samples effectively used by DISCOM is the minimum number of samples with available observations from two modalities, which can be much larger than the number of samples with complete observations from all modalities. The effectiveness of the proposed method is demonstrated by theoretical studies, simulated examples, and a real application from the Alzheimer's Disease Neuroimaging Initiative. The comparison between DISCOM and existing methods also indicates the advantages of our proposed method.
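
    The two steps lend themselves to a short sketch. Below, the covariance matrix is estimated entry-wise from all samples where both features are observed (with shrinkage toward the identity, since the pairwise estimate need not be positive semi-definite), and the coefficients come from a Lasso-type problem posed directly in terms of that covariance, solved by coordinate descent. This illustrates the idea rather than the paper's exact DISCOM estimator; the mask, shrinkage level, and penalty are hypothetical inputs:

    import numpy as np

    def pairwise_covariance(X, mask, shrink=0.1):
        # mask[i, j] is True when feature j is observed for sample i.
        counts = mask.sum(0)
        mu = np.where(mask, X, 0.0).sum(0) / counts      # per-feature means
        Xc = np.where(mask, X - mu, 0.0)                 # centered; zeros where missing
        pair_n = mask.T.astype(float) @ mask.astype(float)
        S = (Xc.T @ Xc) / np.maximum(pair_n - 1, 1.0)    # pairwise-complete covariance
        return (1 - shrink) * S + shrink * np.eye(X.shape[1])

    def covariance_lasso(S, c, lam=0.1, n_iter=200):
        # Minimize 0.5 * b'Sb - c'b + lam * ||b||_1 by coordinate descent,
        # using only the covariance S and cross-covariance c -- no raw
        # samples (and hence no imputation) required.
        b = np.zeros(len(c))
        for _ in range(n_iter):
            for j in range(len(c)):
                r = c[j] - S[j] @ b + S[j, j] * b[j]     # partial residual
                b[j] = np.sign(r) * max(abs(r) - lam, 0.0) / S[j, j]
        return b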

    GEB-1.3B: Open Lightweight Large Language Model

    Recently developed large language models (LLMs) such as ChatGPT, Claude, and Llama have demonstrated impressive abilities, even surpassing human-level performance in several tasks. Despite their success, the resource-intensive demands of these models, requiring significant computational power for both training and inference, restrict their deployment to high-performance servers. Their heavy computation also increases response latency. With the growing need for LLMs to operate efficiently on CPUs, research on lightweight models optimized for CPU inference has emerged. In this work, we introduce GEB-1.3B, a lightweight LLM trained on 550 billion tokens of Chinese and English text. We employ training techniques including RoPE, grouped-query attention, and FlashAttention-2 to accelerate training while maintaining model performance. Additionally, we fine-tune the model on 10 million instruction samples to enhance alignment. GEB-1.3B exhibits outstanding performance on general benchmarks such as MMLU, C-Eval, and CMMLU, outperforming comparable models such as MindLLM-1.3B and TinyLLaMA-1.1B. Notably, the FP32 version of GEB-1.3B achieves commendable inference times on CPUs, with ongoing efforts to further improve speed through advanced quantization. The release of GEB-1.3B as an open-source model marks a significant contribution to the development of lightweight LLMs, promising to foster further research and innovation in the field.
    Comment: GEB-1.3B technical report
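
    Of the techniques named above, rotary position embeddings (RoPE) are easy to illustrate. The sketch below is a generic NumPy implementation, not GEB-1.3B's actual code; the base frequency follows the common 10000 convention. It rotates pairs of channels by position-dependent angles, so that the dot product between a rotated query and key depends only on their relative offset:

    import numpy as np

    def rope(x, base=10000.0):
        # x: (seq_len, head_dim) queries or keys for one attention head.
        seq_len, dim = x.shape
        half = dim // 2
        freqs = base ** (-np.arange(half) / half)        # per-pair rotation speeds
        angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:, :half], x[:, half:]                # "rotate-half" pairing
        return np.concatenate([x1 * cos - x2 * sin,
                               x1 * sin + x2 * cos], axis=-1)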

    Gold on graphene as a substrate for surface enhanced Raman scattering study

    In this paper, we report our study of gold (Au) films of different thicknesses deposited on single-layer graphene (SLG) as surface-enhanced Raman scattering (SERS) substrates for the characterization of rhodamine 6G (R6G) molecules. We find that an Au film with a thickness of ~7 nm deposited on SLG is an ideal SERS substrate, giving the strongest Raman signals from the molecules and the weakest photoluminescence (PL) background. While Au films effectively enhance both the Raman and PL signals of the molecules, SLG effectively quenches the PL signals from both the Au film and the molecules. The former effect arises from the electromagnetic enhancement mechanism, while the latter is due to strong resonance energy transfer from Au to SLG. Hence, the combination of Au films and SLG can be widely used in the characterization of low-concentration molecules with relatively weak Raman signals.
    Comment: 11 pages, 4 figures

    Consistency of Lloyd's Algorithm Under Perturbations

    In the context of unsupervised learning, Lloyd's algorithm is one of the most widely used clustering algorithms. It has inspired a plethora of work investigating the correctness of the algorithm under various settings with ground-truth clusters. In particular, in 2016, Lu and Zhou showed that the mis-clustering rate of Lloyd's algorithm on n independent samples from a sub-Gaussian mixture is exponentially bounded after O(log(n)) iterations, assuming proper initialization of the algorithm. However, in many applications, the true samples are unobserved and need to be learned from the data via pre-processing pipelines such as spectral methods on appropriate data matrices. We show that the mis-clustering rate of Lloyd's algorithm on perturbed samples from a sub-Gaussian mixture is also exponentially bounded after O(log(n)) iterations, under the assumptions of proper initialization and a perturbation that is small relative to the sub-Gaussian noise. In canonical settings with ground-truth clusters, we derive bounds for algorithms such as k-means++ to find good initializations, thus establishing the correctness of clustering via the main result. We show the implications of these results for pipelines that measure the statistical significance of clusters derived from data, such as SigClust. We use these general results to provide theoretical guarantees on the mis-clustering rate for Lloyd's algorithm in a host of applications, including high-dimensional time series, multi-dimensional scaling, and community detection for sparse networks via spectral clustering.
    Comment: Preprint version
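
    For reference, the algorithm under study is short enough to state in full. The sketch below is a generic NumPy implementation of Lloyd's algorithm with k-means++ seeding (the "proper initialization" the guarantees assume), not the paper's analysis code:

    import numpy as np

    def kmeans_pp_init(X, k, rng):
        # k-means++ seeding: pick each new center with probability
        # proportional to squared distance from the nearest chosen center.
        centers = [X[rng.integers(len(X))]]
        for _ in range(k - 1):
            d2 = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        return np.array(centers)

    def lloyd(X, k, n_iter=50, seed=0):
        # Lloyd's algorithm: alternate assignment and centroid updates.
        rng = np.random.default_rng(seed)
        centers = kmeans_pp_init(X, k, rng)
        for _ in range(n_iter):
            # Assign each point to its nearest center.
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d.argmin(1)
            # Recompute centers; keep the old center if a cluster empties.
            for j in range(k):
                if (labels == j).any():
                    centers[j] = X[labels == j].mean(0)
        return labels, centers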