Enhancing clinical potential of liquid biopsy through a multi-omic approach: A systematic review
In recent years, liquid biopsy has gained increasing clinical relevance for detecting and monitoring several cancer types, as it is minimally invasive, highly informative, and repeatable over time. This approach can complement and may, in the future, replace tissue biopsy, which is still considered the gold standard for cancer diagnosis. “Classical” tissue biopsy is invasive, often cannot provide sufficient bioptic material for advanced screening, and offers only isolated information about disease evolution and heterogeneity. Recent literature has highlighted that liquid biopsy is informative of proteomic, genomic, epigenetic, and metabolic alterations. These biomarkers can be detected and investigated using single-omic approaches and, more recently, in combination through multi-omic approaches. This review provides an overview of the most suitable techniques to thoroughly characterize tumor biomarkers and their potential clinical applications, highlighting the importance of an integrated multi-omic, multi-analyte approach. Personalized medical investigations may soon allow patients to receive accurate prognostic evaluations, early disease diagnosis, and subsequent ad hoc treatments.
Data- and expert-driven feature selection for predictive models in healthcare: towards increased interpretability in underdetermined machine learning problems
Modern data acquisition techniques in healthcare generate large collections of data from multiple sources, such as novel diagnosis and treatment methodologies. Some concrete examples are electronic healthcare record systems, genomics, and medical images. This leads to situations with often unstructured, high-dimensional heterogeneous patient cohort data where classical statistical methods may not be sufficient for optimal utilization of the data and informed decision-making. Instead, investigating such data structures with modern machine learning techniques promises to improve the understanding of patient health issues and may provide a better platform for informed decision-making by clinicians. Key requirements for this purpose include (a) sufficiently accurate predictions and (b) model interpretability. Achieving both aspects in parallel is difficult, particularly for datasets with few patients, which are common in the healthcare domain. In such cases, machine learning models encounter mathematically underdetermined systems and may overfit easily on the training data. An important approach to overcome this issue is feature selection, i.e., determining a subset of informative features from the original set of features with respect to the target variable. While potentially raising the predictive performance, feature selection fosters model interpretability by identifying a low number of relevant model parameters to better understand the underlying biological processes that lead to health issues.
Interpretability requires that feature selection is stable, i.e., small changes in the dataset do not lead to changes in the selected feature set. A concept to address instability is ensemble feature selection, i.e., the process of repeating feature selection multiple times on subsets of samples of the original dataset and aggregating the results in a meta-model. This thesis presents two approaches for ensemble feature selection, which are tailored towards high-dimensional data in healthcare: the Repeated Elastic Net Technique for feature selection (RENT) and the User-Guided Bayesian Framework for feature selection (UBayFS). While RENT is purely data-driven and builds upon elastic net regularized models, UBayFS is a general framework for ensembles with the capability to include expert knowledge in the feature selection process via prior weights and side constraints. A case study modeling the overall survival of cancer patients compares these novel feature selectors and demonstrates their potential in clinical practice.
Beyond the selection of single features, UBayFS also allows for selecting whole feature groups (feature blocks) acquired from multiple data sources, such as those mentioned above. Importance quantification of such feature blocks plays a key role in tracing information about the target variable back to the acquisition modalities. Such information on feature block importance can save human, technical, and financial resources if systematically integrated into the planning of patient treatment, by excluding the acquisition of non-informative features. Since generalizing feature importance measures to block importance is not trivial, this thesis also investigates and compares approaches for feature block importance rankings.
This thesis demonstrates that high-dimensional datasets from multiple data sources in the medical domain can be successfully tackled by the presented approaches for feature selection. Experimental evaluations demonstrate favorable predictive performance, stability, and interpretability of results, which carries high potential for better data-driven decision support in clinical practice.
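The ensemble idea described above can be illustrated with a deliberately simplified, dependency-free sketch. Note that RENT itself scores features with elastic-net regularized models; here a univariate |Pearson correlation| score stands in for the base selector, and all function names, thresholds, and data are illustrative rather than taken from the thesis.

```python
import random
from collections import Counter

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def ensemble_select(X, y, k=2, n_runs=50, subsample=0.8, tau=0.6, seed=0):
    """Ensemble feature selection: repeat a simple top-k selection
    (by absolute correlation with the target) on random subsamples,
    then keep features chosen in at least a fraction tau of the runs."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = Counter()
    for _ in range(n_runs):
        idx = rng.sample(range(n), int(subsample * n))
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        scores = [abs(pearson([row[j] for row in Xs], ys)) for j in range(p)]
        counts.update(sorted(range(p), key=lambda j: -scores[j])[:k])
    return sorted(j for j in range(p) if counts[j] / n_runs >= tau)

# illustrative synthetic data: features 0 and 1 drive the target, 2 and 3 are noise
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [row[0] + row[1] for row in X]
selected = ensemble_select(X, y, k=2)
```

Aggregating selection frequencies over subsamples is what stabilizes the result: a feature that survives most perturbations of the data is far less likely to be an artifact of one particular sample.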
Advances and Applications of DSmT for Information Fusion. Collected Works, Volume 5
This fifth volume on Advances and Applications of DSmT for Information Fusion collects theoretical and applied contributions of researchers working in different fields of application and in mathematics, and is available in open access. The contributions collected in this volume have either been published or presented in international conferences, seminars, workshops, and journals after the dissemination of the fourth volume in 2015, or they are new. The contributions in each part of this volume are ordered chronologically.
The first part of this book presents some theoretical advances on DSmT, dealing mainly with modified Proportional Conflict Redistribution (PCR) rules of combination with degree of intersection, coarsening techniques, interval calculus for PCR thanks to set inversion via interval analysis (SIVIA), rough set classifiers, canonical decomposition of dichotomous belief functions, fast PCR fusion, fast inter-criteria analysis with PCR, and improved PCR5 and PCR6 rules preserving the (quasi-)neutrality of the (quasi-)vacuous belief assignment in the fusion of sources of evidence, with their Matlab codes.
Because more applications of DSmT have emerged since the appearance of the fourth DSmT book in 2015, the second part of this volume covers selected applications of DSmT, mainly in building change detection, object recognition, quality of data association in tracking, perception in robotics, risk assessment for torrent protection and multi-criteria decision-making, multi-modal image fusion, coarsening techniques, recommender systems, levee characterization and assessment, human heading perception, trust assessment, robotics, biometrics, failure detection, GPS systems, inter-criteria analysis, group decision-making, human activity recognition, storm prediction, data association for autonomous vehicles, identification of maritime vessels, fusion of support vector machines (SVM), the Silx-Furtif RUST code library for information fusion including PCR rules, and networks for ship classification.
Finally, the third part presents interesting contributions related to belief functions in general, published or presented over the years since 2015. These contributions concern decision-making under uncertainty, belief approximations, probability transformations, new distances between belief functions, non-classical multi-criteria decision-making problems with belief functions, generalization of Bayes' theorem, image processing, data association, entropy and cross-entropy measures, fuzzy evidence numbers, the negator of a belief mass, human activity recognition, information fusion for breast cancer therapy, imbalanced data classification, and hybrid techniques mixing deep learning with belief functions.
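As a toy illustration of the PCR5 rule mentioned above: for two sources on a frame {A, B}, the conjunctive consensus is computed first, and each piece of conflicting mass m1(X)·m2(Y) with X∩Y = ∅ is redistributed back to X and Y in proportion to the masses that generated it. This is a minimal sketch; the example masses are illustrative, not taken from the book, and all masses are assumed positive so the denominators are nonzero.

```python
from itertools import product

def pcr5(m1, m2):
    """Combine two basic belief assignments (dicts mapping frozensets
    to masses) with the two-source PCR5 rule: conjunctive consensus,
    then proportional redistribution of each conflicting product."""
    out = {}
    for (X, a), (Y, b) in product(m1.items(), m2.items()):
        Z = X & Y
        if Z:  # non-empty intersection: conjunctive consensus term
            out[Z] = out.get(Z, 0.0) + a * b
        else:  # total conflict: send a*b back to X and Y proportionally
            out[X] = out.get(X, 0.0) + a * a * b / (a + b)
            out[Y] = out.get(Y, 0.0) + b * b * a / (a + b)
    return out

A, B = frozenset("A"), frozenset("B")
m1 = {A: 0.6, B: 0.3, A | B: 0.1}  # illustrative masses
m2 = {A: 0.2, B: 0.7, A | B: 0.1}
fused = pcr5(m1, m2)
```

Unlike Dempster's rule, no conflicting mass is discarded by normalization: the combined masses still sum to one, with the conflict reassigned only to the hypotheses that produced it.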
Use of omic tools for environmental risk assessment of emerging contaminants in marine species of commercial interest
This doctoral thesis aims to evaluate the possible toxicological effects that 'emerging' contaminants (ECs) may have on the marine organisms exposed to them. Specifically, two UV filters, an insect repellent, and a biocide were evaluated, all of them present in personal care products (PCPs). These substances reach marine systems through wastewater discharges, as well as through direct inputs from recreational activities such as swimming and bathing. Exposure to these contaminants occurs chronically and at very low concentrations; evaluating their possible effects at the molecular level is therefore a key step in determining their toxicity. "Omic" technologies allow the global study of the different biological levels (transcriptome, proteome, metabolome, etc.) from a holistic and integrated point of view. The integration of omic data, together with bioconcentration and toxicokinetic data, is an extremely useful method for elucidating the mode of action (MoA) of contaminants. Accordingly, this doctoral thesis has implemented the use and integration of omic tools (metabolomics and transcriptomics) to evaluate the toxicological effect of two UV filters (4-methylbenzylidene camphor (4-MBC) and sulisobenzone (BP-4)), an insect repellent (N,N-diethyl-meta-toluamide (DEET)), and a biocide (triclosan (TCS)) in two marine species of commercial interest, the gilthead seabream (Sparus aurata) and the Manila clam (Ruditapes philippinarum). Additionally, bioconcentration and toxicokinetic studies were carried out.
First, seven exposure experiments were carried out with the aforementioned contaminants and organisms, using a flow-through system to reproduce environmental exposure scenarios under controlled laboratory conditions. After applying high-throughput analytical techniques, bioaccumulation of the contaminants was observed to be higher in the clam than in the fish. This shows the importance of taking organisms from different trophic levels into account when evaluating the bioaccumulation potential of environmental contaminants. Moreover, the UV filter 4-MBC presented the highest bioconcentration factor (BCF; 368,565 L kg⁻¹) and the lowest elimination rate (61.65%), which, together with the persistence of this compound in the environment, probably indicates a potential risk to the marine aquatic environment that may be magnified through the food web. This compound was also observed to undergo several biotransformations (reduction, oxidation, hydroxylation, etc.) in order to be excreted, affecting drug and xenobiotic metabolism as well as glutathione metabolism, and ultimately producing oxidative stress. On the other hand, although TCS and BP-4 had high BCFs in the clam (1309 and 850 L kg⁻¹, respectively), their elimination rates were also high (97.12% and 99.99%). However, exposure to these contaminants produced, among other effects, alterations in lipid metabolism, which may be due to their capacity to act as endocrine disruptors. DEET presented the lowest BCF in the clam (9.9 L kg⁻¹) and a high elimination rate (98.85%). The main impact of exposure to this compound at the transcriptomic level was observed in xenobiotic biodegradation metabolism and in carbohydrate metabolism, suggesting that a large amount of energy was being consumed to excrete the compound.
In seabream muscle, both DEET and BP-4 presented low BCFs (2.6 and 0.7 L kg⁻¹, respectively), whereas TCS presented a BCF of 113 L kg⁻¹. However, transcriptomic analyses in seabream liver revealed that 250 and 371 genes were differentially expressed after exposure to DEET and BP-4, respectively, while no differentially expressed genes were found after exposure to TCS. After integration with the metabolomic data obtained from the same samples, DEET was determined to cause energy depletion in the seabream through alteration of carbohydrate and amino acid metabolism, as well as oxidative stress leading to DNA damage, lipid peroxidation, cell membrane damage, and apoptosis. Activation of xenobiotic metabolism and an immune-inflammatory reaction were also observed. Finally, multi-omic data integration revealed that BP-4 impacted the seabream's energy metabolism and lipid oxidation, as well as steroid and thyroid hormone biosynthesis and nucleotide metabolism. In conclusion, this thesis shows the usefulness of integrating data at different biological levels and demonstrates that the multi-omic approach developed and applied here provides a great advantage for elucidating modes of action of the studied compounds that could have been overlooked with other approaches, such as the use of a single omic tool.
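Bioconcentration factors and elimination rates like those reported above are conventionally defined through a one-compartment, first-order toxicokinetic model with an uptake rate constant k_u and an elimination rate constant k_e, where the kinetic BCF is k_u/k_e. The sketch below shows those standard definitions; the rate constants and water concentration in the example are purely hypothetical and are not values from the thesis.

```python
import math

def uptake_curve(k_u, k_e, c_water, t):
    """Tissue concentration under a one-compartment first-order model:
    C(t) = (k_u / k_e) * C_w * (1 - exp(-k_e * t))."""
    return (k_u / k_e) * c_water * (1.0 - math.exp(-k_e * t))

def kinetic_bcf(k_u, k_e):
    """Kinetic bioconcentration factor: BCF = k_u / k_e (L/kg)."""
    return k_u / k_e

def eliminated_fraction(k_e, t):
    """Fraction of the body burden depurated after time t in clean water."""
    return 1.0 - math.exp(-k_e * t)

# hypothetical rate constants (per day) and water concentration (mg/L)
bcf = kinetic_bcf(50.0, 0.5)                   # 100 L/kg at steady state
c_28d = uptake_curve(50.0, 0.5, 0.001, 28.0)   # tissue conc. after 28 days
frac_14d = eliminated_fraction(0.5, 14.0)      # depurated fraction after 14 days
```

Under this model a high BCF combined with a low elimination rate, as observed for 4-MBC, means the steady-state body burden is large and is shed slowly once exposure stops, which is why such compounds raise food-web magnification concerns.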
A primer on correlation-based dimension reduction methods for multi-omics analysis
The continuing advances of omic technologies mean that it is now more
feasible to measure the numerous features that collectively reflect the molecular
properties of a sample. When multiple omic methods are used, statistical and
computational approaches can exploit these large, connected profiles.
Multi-omics is the integration of different omic data sources from the same
biological sample. In this review, we focus on correlation-based dimension
reduction approaches for single omic datasets, followed by methods for pairs of
omics datasets, before detailing further techniques for three or more omic
datasets. We also briefly detail network methods for settings where three or more
omic datasets are available, which complement correlation-oriented tools. To aid
readers new to this area, these are all linked to relevant R packages that can
implement these procedures. Finally, we discuss scenarios of experimental
design and present road maps that simplify the selection of appropriate
analysis methods. This review will guide researchers in navigating the emerging
methods for multi-omics and help them integrate diverse omic datasets
appropriately and embrace the opportunity of population multi-omics.
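As a concrete illustration of the single-omic starting point, the simplest correlation-based dimension reduction is the leading principal component, which can be obtained by power iteration on the sample covariance matrix (or, after standardizing features, the correlation matrix). This is a dependency-free sketch, not code from the review or its associated R packages; the function name and toy data are illustrative.

```python
import random

def leading_pc(X, n_iter=200, seed=0):
    """Leading principal component of a small data matrix (list of rows),
    computed by power iteration on the sample covariance matrix."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # sample covariance matrix (divide by n - 1)
    C = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / (n - 1)
          for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(p)]  # random start vector
    for _ in range(n_iter):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]            # renormalize each step
    return v

# toy data: the two columns are perfectly correlated
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
v = leading_pc(X)  # ≈ ±[0.707, 0.707]
```

Methods for pairs of omic datasets, such as canonical correlation analysis, generalize this idea by seeking directions in two feature spaces whose projections are maximally correlated.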
Polygenic Risk Score for Cardiovascular Diseases in Artificial Intelligence Paradigm
Cardiovascular disease (CVD) related mortality and morbidity heavily strain society. The relationship between external risk factors and our genetics has not been well established. It is widely acknowledged that environmental influences and individual behaviours play a significant role in CVD vulnerability, leading to the development of polygenic risk scores (PRS). We employed the PRISMA search method to locate pertinent research and literature in order to extensively review artificial intelligence (AI)-based PRS models for CVD risk prediction. Furthermore, we analyzed and compared conventional vs. AI-based solutions for PRS. We summarized the recent advances in our understanding of the use of AI-based PRS for risk prediction of CVD. Our study proposes three hypotheses: i) multiple genetic variations and risk factors can be incorporated into AI-based PRS to improve the accuracy of CVD risk prediction; ii) AI-based PRS for CVD circumvents the drawbacks of conventional PRS calculators by incorporating a larger variety of genetic and non-genetic components, allowing for more precise and individualized risk estimations; iii) using AI approaches, it is possible to significantly reduce the dimensionality of huge genomic datasets, resulting in more accurate and effective disease risk prediction models. Our study highlighted that AI-based PRS models outperformed traditional PRS calculators in predicting CVD risk. Furthermore, using AI-based methods to calculate PRS may increase the precision of risk predictions for CVD and have significant ramifications for individualized prevention and treatment plans.
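For contrast with the AI-based variants discussed, a conventional PRS is simply a weighted sum of risk-allele dosages, with per-variant weights usually taken as effect sizes (e.g. log odds ratios) from genome-wide association studies. A minimal sketch; the variant weights and genotype dosages below are hypothetical, not drawn from any real study.

```python
def polygenic_risk_score(dosages, weights):
    """Classical PRS: weighted sum of risk-allele dosages (0, 1, or 2
    copies per variant) with per-variant effect sizes, e.g. log odds
    ratios from a genome-wide association study."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))

# hypothetical effect sizes for three variants, and one person's genotype
weights = [0.1, 0.3, 0.5]
dosages = [2, 1, 0]
score = polygenic_risk_score(dosages, weights)
```

AI-based approaches replace this fixed linear sum with learned models that can also ingest non-genetic risk factors, which is precisely the flexibility the review's hypotheses attribute to them.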
Diagnostic classification of childhood cancer using multiscale transcriptomics
The causes of pediatric cancers’ distinctiveness compared to adult-onset tumors of the same type are not completely clear and are not fully explained by their genomes. In this study, we used an optimized multilevel RNA clustering approach to derive molecular definitions for most childhood cancers. Applying this method to 13,313 transcriptomes, we constructed a pediatric cancer atlas to explore age-associated changes. Tumor entities were sometimes unexpectedly grouped because of common lineages, drivers, or stemness profiles. Some established entities were divided into subgroups that predicted outcome better than current diagnostic approaches. These definitions account for inter-tumoral and intra-tumoral heterogeneity and have the potential to enable reproducible, quantifiable diagnostics. As a whole, childhood tumors had more transcriptional diversity than adult tumors, maintaining greater expression flexibility. To apply these insights, we designed an ensemble convolutional neural network classifier. We show that this tool was able to match or clarify the diagnosis for 85% of childhood tumors in a prospective cohort. If further validated, this framework could be extended to derive molecular definitions for all cancer types.
Novel Analytical Methods in Food Analysis
This reprint provides information on novel analytical methods used to address challenges occurring at the academic, regulatory, and commercial levels. All topics covered include information on the basic principles, procedures, advantages, limitations, and applications. The integration of biological reagents, (nano)materials, technologies, and physical principles (spectroscopy and spectrometry) is discussed. This reprint is ideal for professionals in the food industry and regulatory bodies, as well as researchers.