
    How standardization of the pre-analytical phase of both research and diagnostic biomaterials can increase reproducibility of biomedical research and diagnostics.

    Comparison of published biomedical studies shows that a large proportion are irreproducible, causing severe damage to society and creating an image of wasted investment. These observations are damaging to the biomedical research field, which otherwise holds great promise. Precision medicine and disease prevention are advancing, but progress is slowed by irreproducible study results. Although standardization is often mentioned as a possible solution, it is not always clear how it could decrease or prevent irreproducible results in biomedical studies. This article provides more insight into what quality management, norms, standardization, certification, accreditation, and an optimized infrastructure can accomplish to reveal causes of irreproducibility and to increase reproducibility when collecting biomaterials. CEN and ISO standards for the sample pre-analytical phase are currently being developed with the support of the SPIDIA4P project, and their role in increasing reproducibility in both biomedical research and diagnostics is demonstrated. In particular, it is described how standardized methods and quality assurance documentation can be exploited as tools for: 1) recognition and rejection of 'not fit for purpose' samples on the basis of detailed sample metadata, and 2) identification of methods that contribute to irreproducibility, which can then be adapted or replaced.
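    The first tool the abstract names, rejecting 'not fit for purpose' samples from their metadata, can be sketched as a simple acceptance check. This is a hypothetical illustration: the field names (`time_to_centrifugation_min`, `tube_type`) and the limits are invented for the example, not taken from any CEN or ISO standard.

    ```python
    def fit_for_purpose(sample, max_delay_min=120, allowed_tubes=("EDTA",)):
        """Screen a sample's pre-analytical metadata against acceptance
        criteria; return (ok, reasons) so rejections are documented."""
        reasons = []
        # Long pre-centrifugation delays degrade many analytes.
        if sample["time_to_centrifugation_min"] > max_delay_min:
            reasons.append("pre-centrifugation delay too long")
        # The collection tube must match what the assay was validated for.
        if sample["tube_type"] not in allowed_tubes:
            reasons.append("unsupported collection tube")
        return (not reasons), reasons
    ```

    Recording the rejection reasons alongside the sample, rather than silently dropping it, is what makes the screening step auditable.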

    eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

    Since large language models (LLMs) have demonstrated high-quality performance on many complex language tasks, there is great interest in bringing them to mobile devices for faster responses and better privacy protection. However, the size of LLMs (billions of parameters) requires highly effective compression to fit into storage-limited devices. Among many compression techniques, weight clustering, a form of non-linear quantization, is one of the leading candidates for LLM compression, and it is supported by modern smartphones. Yet its training overhead is prohibitive for LLM fine-tuning. In particular, Differentiable KMeans Clustering (DKM) has shown a state-of-the-art trade-off between compression ratio and accuracy regression, but its large memory complexity makes it nearly impossible to apply to train-time LLM compression. In this paper, we propose a memory-efficient DKM implementation, eDKM, powered by novel techniques that reduce the memory footprint of DKM by orders of magnitude. For each tensor to be saved on the CPU for the backward pass of DKM, we compress the tensor by applying uniquification and sharding, after checking that no duplicate of it has already been copied to the CPU. Our experimental results demonstrate that eDKM can fine-tune and compress a pretrained LLaMA-7B model from 12.6 GB to 2.5 GB (3 bit/weight) on the Alpaca dataset, reducing the train-time memory footprint of a decoder layer by 130×, while delivering good accuracy on broader LLM benchmarks (e.g., 77.7% for PIQA and 66.1% for WinoGrande).
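    The duplicate check plus uniquification described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (eDKM operates on PyTorch autograd tensors and also shards across devices); the class and function names here are invented for the example. The idea shown is only that (a) an identical tensor is never offloaded twice, and (b) a stored tensor is kept as its unique values plus compact indices, which is small when weights are clustered to few distinct values.

    ```python
    import numpy as np

    def uniquify(tensor):
        """Store a tensor as (unique values, int32 indices) instead of the
        full float array; cheap when there are few distinct values."""
        values, inverse = np.unique(tensor, return_inverse=True)
        return values, inverse.astype(np.int32).reshape(tensor.shape)

    def reconstruct(values, indices):
        """Rebuild the original tensor from its uniquified form."""
        return values[indices]

    class CpuOffloadCache:
        """Tensors saved for the backward pass, offloaded in compressed
        form; duplicates of already-offloaded tensors are skipped."""
        def __init__(self):
            self._store = {}

        def offload(self, tensor):
            key = hash(tensor.tobytes())
            if key not in self._store:      # skip duplicate copies
                self._store[key] = uniquify(tensor)
            return key

        def fetch(self, key):
            values, indices = self._store[key]
            return reconstruct(values, indices)
    ```

    For a weight matrix clustered to, say, 8 centroids, the uniquified form needs only 8 floats plus one small integer index per element, rather than a full-precision copy.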

    Impact of the pre-examination phase on multicenter metabolomic studies

    The development of metabolomics in clinical applications has been limited by the lack of validation in large multicenter studies. Large population cohorts and their biobanks are a valuable resource for acquiring insights into molecular disease mechanisms. Nevertheless, most of their collections are not tailored for metabolomics and were created without specific attention to the pre-analytical requirements for high-quality metabolome assessment. Thus, comparing samples obtained by different pre-analytical procedures remains a major challenge. Here, ¹H NMR-based analyses are used to demonstrate how differences between human serum and plasma samples collected under different operating procedures within several large European cohort studies from the Biobanking and Biomolecular Resources Infrastructure - Large Prospective Cohorts (BBMRI-LPC) consortium can easily be revealed by supervised multivariate statistical analyses at the initial stages of the process, to avoid biases in the downstream analysis. The inter-biobank differences are discussed in terms of deviations from the validated CEN/TS 16945:2016 / ISO 23118:2021 norms. It clearly emerges that biobanks must adhere to these evidence-based guidelines to support wider-scale application of metabolomics in biomedicine, and that NMR spectroscopy is informative for comparing the quality of different sample sources in multi-cohort/multi-center studies.
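    The kind of check described above, revealing a pre-analytical batch effect in spectral data before downstream analysis, can be illustrated with a small NumPy sketch. The abstract uses supervised multivariate analyses; as a simpler stand-in this sketch uses unsupervised PCA via SVD, on synthetic data where one "biobank" carries a hypothetical systematic offset in a few spectral variables. All data and magnitudes here are invented for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic example: 10 spectra per "biobank", 50 spectral bins; the
    # second biobank's handling shifts the first 5 bins systematically.
    n, p = 10, 50
    site_a = rng.normal(0.0, 1.0, size=(n, p))
    site_b = rng.normal(0.0, 1.0, size=(n, p))
    site_b[:, :5] += 3.0            # simulated pre-analytical offset

    X = np.vstack([site_a, site_b])
    X = X - X.mean(axis=0)          # mean-center before PCA

    # PCA via SVD; rows of scores are samples projected onto the PCs.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U * S

    # Samples from the two collection procedures separate along PC1,
    # flagging the batch effect before any biological analysis.
    pc1_a, pc1_b = scores[:n, 0], scores[n:, 0]
    ```

    In practice one would plot the first few score components colored by biobank; a separation that tracks collection procedure rather than biology is the signal to exclude or correct those samples.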