2,083 research outputs found
DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences
Identification of drug-target interactions (DTIs) plays a key role in drug
discovery. The high cost and labor-intensive nature of in vitro and in vivo
experiments have highlighted the importance of in silico-based DTI prediction
approaches. In several computational models, conventional protein descriptors
are shown to be not informative enough to predict accurate DTIs. Thus, in this
study, we employ a convolutional neural network (CNN) on raw protein sequences
to capture local residue patterns participating in DTIs. With CNN on protein
sequences, our model performs better than previous protein descriptor-based
models. In addition, our model performs better than the previous deep learning
model for massive prediction of DTIs. By examining the pooled convolution
results, we found that our model can detect binding sites of proteins for DTIs.
In conclusion, our prediction model for detecting local residue patterns of
target proteins successfully enriches the protein features of a raw protein
sequence, yielding better prediction results than previous approaches.
Comment: 26 pages, 7 figures
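As a rough illustration of the idea in the abstract above (convolution over a raw protein sequence followed by pooling, where the pooled position hints at the residues that drove the activation), here is a minimal pure-Python sketch. It is not the DeepConv-DTI architecture: the hand-set filter, the helper names, and the example sequence are all assumptions made for illustration, whereas a real model would learn many filters of several window sizes from data.

```python
# Illustrative sketch only: the filter, sequence, and helper names are hypothetical.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Encode each residue as a 20-dimensional one-hot vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        vector = [0.0] * len(AMINO_ACIDS)
        vector[index[residue]] = 1.0
        encoded.append(vector)
    return encoded


def conv1d(encoded, kernel):
    """Slide a (window x 20) kernel along the sequence (stride 1, no padding)."""
    window = len(kernel)
    activations = []
    for start in range(len(encoded) - window + 1):
        total = 0.0
        for offset in range(window):
            row = encoded[start + offset]
            total += sum(x * w for x, w in zip(row, kernel[offset]))
        activations.append(total)
    return activations


def global_max_pool(activations):
    """Return the strongest activation and the window start that produced it."""
    best = max(range(len(activations)), key=activations.__getitem__)
    return activations[best], best


# A hand-set filter that responds most strongly to the made-up motif "GHW".
kernel = one_hot("GHW")
sequence = "MKTAYIAKQRGHWLLV"
score, position = global_max_pool(conv1d(one_hot(sequence), kernel))
print(score, position)
```

Keeping the arg-max position alongside the pooled value is what makes it possible to map a strong activation back to a stretch of residues, which is the kind of inspection the abstract describes for binding sites.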
Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.
The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. Here we critically discuss structure elucidation approaches and software designed to help with the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways, and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases, such as the coalition of MassBank repositories, and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (Critical Assessment of Small Molecule Identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included.
INTEGRATION OF INTERNET OF THINGS AND HEALTH RECOMMENDER SYSTEMS
The Internet of Things (IoT) has become a part of our lives and has provided many enhancements to day-to-day living. In this project, IoT in healthcare is reviewed. IoT-based healthcare is used in remote health monitoring, observation of chronic diseases, individual fitness programs, assistance for the elderly, and many other healthcare fields. There are three main architectures of smart IoT healthcare: Three-Layer Architecture, Service-Oriented Architecture (SoA), and Middleware-Based IoT Architecture. Different IoT architectures are used depending on the required services. In addition, IoT healthcare services, IoT healthcare service enablers, IoT healthcare applications, and IoT healthcare services focusing on smartwatches are presented in this research. Along with IoT in smart healthcare, the integration of Health Recommender Systems with IoT is important. The main types of Recommender Systems, including content-based filtering, collaborative filtering, knowledge-based filtering, and hybrid filtering with machine learning algorithms, are described for Health Recommender Systems. In this study, a framework is presented for IoT-based Health Recommender Systems. A case is also investigated to show how different algorithms can be used for Recommender Systems, and their accuracy levels are presented. Such a framework can help with health issues such as the risk of visiting a doctor during a pandemic, taking quick action in health emergencies, the affordability of healthcare services, and enhancing personal lifestyle through recommendations in non-critical conditions. The proposed framework can motivate further development of IoT-based Health Recommender Systems so that people can mitigate their medical emergencies and live a healthy life.
Applying machine learning for healthcare: A case study on cervical pain assessment with motion capture
Given the exponential availability of data in health centers and the massive sensorization that is expected, there is an increasing need to manage and analyze these data in an effective way. For this purpose, data mining (DM) and machine learning (ML) techniques would be helpful. However, due to the specific characteristics of the field of healthcare, a suitable DM and ML methodology adapted to these particularities is required. The applied methodology must structure the different stages needed for data-driven healthcare, from the acquisition of raw data to decision-making by clinicians, considering the specific requirements of this field. In this paper, we focus on a case study of cervical assessment, where the goal is to predict the potential presence of cervical pain in patients affected with whiplash diseases, which is important, for example, in insurance-related investigations. By analyzing this case study in detail in a real scenario, we show how taking care of those particularities enables the generation of reliable predictive models in the field of healthcare. Using a database of 302 samples, we have generated several predictive models, including logistic regression, support vector machines, k-nearest neighbors, gradient boosting, decision trees, random forest, and neural network algorithms. The results show that it is possible to reliably predict the presence of cervical pain (accuracy, precision, and recall above 90%). We expect that the procedure proposed to apply ML techniques in the field of healthcare will help technologists, researchers, and clinicians to create more objective systems that provide support to objectify the diagnosis, improve test treatment efficacy, and save resources.
Development, validation and application of in-silico methods to predict the macromolecular targets of small organic compounds
Computational methods to predict the macromolecular targets of small organic drugs and drug-like compounds play a key role in early drug discovery and drug repurposing efforts. These methods are developed by building predictive models that aim to learn the relationships between compounds and their targets in order to predict the bioactivity of the compounds.
In this thesis, we analyzed the strategies used to validate target prediction approaches and how current strategies leave crucial questions about performance unanswered. Namely, how does an approach perform on a compound of interest, with its structural specificities, as opposed to the average query compound in the test data? We constructed and present new guidelines on validation strategies to address these shortcomings. We then present the development and validation of two ligand-based target prediction approaches: a similarity-based approach and a binary relevance random forest (machine learning) based approach, which have a wide coverage of the target space. Importantly, we applied a new validation protocol to benchmark the performance of these approaches. The approaches were tested under three scenarios: a standard testing scenario with external data, a standard time-split scenario, and a close-to-real-world test scenario. We disaggregated the performance based on the distance of the testing data to the reference knowledge base, giving a more nuanced view of the performance of the approaches. We showed that, surprisingly, the similarity-based approach generally performed better than the machine learning-based approach under all testing scenarios, while also having a target coverage which was twice as large.
After validating two target prediction approaches, we present our work on a large-scale application of computational target prediction to curate optimized compound libraries. While screening large collections of compounds against biological targets is key to identifying new bioactivities, it is resource intensive and challenging. Small to medium-sized libraries that have been optimized to have a higher chance of producing a true hit on an arbitrary target of interest are therefore valuable. We curated libraries of readily purchasable compounds by: i. utilizing property filters to ensure that the compounds have key physicochemical properties and are not overly reactive, ii. applying a similarity-based target prediction method, with a wide target scope, to predict the bioactivities of compounds, and iii. employing a genetic algorithm to select compounds for the library to maximize the biological diversity in the predicted bioactivities. These enriched small to medium-sized compound libraries provide valuable tool compounds to support early drug development and target identification efforts, and have been made available to the community.
The distinctive contributions of this thesis include the development and benchmarking of two ligand-based target prediction approaches under novel validation scenarios, and the application of target prediction to enrich screening libraries with biologically diverse bioactive compounds. We hope that the insights presented in this thesis will help push data-driven drug discovery forward.
Doctoral thesis
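To make the similarity-based target prediction idea in the thesis abstract above more concrete, the following is a minimal sketch, not the thesis's method. It assumes compounds are represented as sets of "on" fingerprint bits, scores each target by its most similar known ligand using the Tanimoto coefficient, and ranks targets by that score. The function names, the tiny knowledge base, and the target labels are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_targets(query, knowledge_base, top_k=3):
    """Rank targets by the similarity of their most similar known ligand to the query."""
    best = {}
    for fingerprint, target in knowledge_base:
        score = tanimoto(query, fingerprint)
        if score > best.get(target, -1.0):
            best[target] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical reference ligands: (fingerprint bits, annotated target).
knowledge_base = [
    (frozenset({1, 2, 3, 4}), "target_A"),
    (frozenset({1, 2, 7, 8}), "target_B"),
    (frozenset({9, 10, 11}), "target_C"),
    (frozenset({1, 2, 3, 5}), "target_A"),
]
query = frozenset({1, 2, 3, 4, 5})
ranking = predict_targets(query, knowledge_base)
print(ranking)
```

The top score also doubles as a crude measure of how close the query lies to the knowledge base, which is the kind of distance one could use to break down performance by similarity to the reference data.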
INTEGRATIVE ANALYSIS OF OMICS DATA IN ADULT GLIOMA AND OTHER TCGA CANCERS TO GUIDE PRECISION MEDICINE
Transcriptomic profiling and gene expression signatures have been widely applied as effective approaches for enhancing the molecular classification, diagnosis, prognosis or prediction of therapeutic response towards personalized therapy for cancer patients. Thanks to modern genome-wide profiling technology, scientists are able to build engines leveraging massive genomic variations and integrating with clinical data to identify “at risk” individuals for the sake of prevention, diagnosis and therapeutic interventions. In my graduate work for my Ph.D. thesis, I have investigated genomic sequencing data mining to comprehensively characterise molecular classifications and aberrant genomic events associated with clinical prognosis and treatment response, through applying high-dimensional omics genomic data to promote the understanding of gene signatures and somatic molecular alterations contributing to cancer progression and clinical outcomes. Following this motivation, my dissertation has been focused on the following three topics in translational genomics.
1) Characterization of transcriptomic plasticity and its association with the tumor microenvironment in glioblastoma (GBM). I have integrated transcriptomic, genomic, protein and clinical data to increase the accuracy of GBM classification, and identified the association between the GBM mesenchymal subtype and reduced tumor purity, accompanied by an increased presence of tumor-associated microglia. Then I have tackled the sole source of microglia as intrinsic tumor bulk but not their corresponding neurosphere cells through both transcriptional and protein level analysis using a panel of sphere-forming glioma cultures and their parent GBM samples. Furthermore, I have demonstrated my hypothesis through longitudinal analysis of paired primary and recurrent GBM samples that the phenotypic alterations of GBM subtypes are not due to intrinsic proneural-to-mesenchymal transition in tumor cells; rather, they are intertwined with an increased level of microglia upon disease recurrence. Collectively, I have elucidated the critical role of the tumor microenvironment (microglia and macrophages from the central nervous system) in contributing to the intra-tumor heterogeneity and accurate classification of GBM patients based on transcriptomic profiling, which will not only have a significant clinical impact but also pave the way for preclinical cancer research.
2) Identification of prognostic gene signatures that stratify adult diffuse glioma patients harboring 1p/19q co-deletions. I have compared multiple statistical methods and derived a gene signature significantly associated with survival by applying a machine learning algorithm. Then I have identified inflammatory response and acetylation activity that are associated with malignant progression of 1p/19q co-deleted glioma. In addition, I showed this signature translates to other types of adult diffuse glioma, suggesting its universality in the pathobiology of other subset gliomas. My efforts on integrative data analysis of this highly curated data set using optimized statistical models will reflect the pending update to the WHO classification system of tumors in the central nervous system (CNS).
3) Comprehensive characterization of somatic fusion transcripts in Pan-Cancers. I have identified a panel of novel fusion transcripts across all of TCGA cancer types through transcriptomic profiling. Then I have predicted fusion proteins with kinase activity and hub function of pathway network based on the annotation of genetically mobile domains and functional domain architectures. I have evaluated a panel of in-frame gene fusions as potential driver mutations based on the network fusion centrality hypothesis. I have also characterised the emerging complexity of genetic architecture in fusion transcripts through integrating genomic structure and somatic variants and delineating the distinct genomic patterns of fusion events across different cancer types. Overall, my exploration of the pathogenetic impact and clinical relevance of candidate gene fusions has provided fundamental insights into the management of a subset of cancer patients by predicting the oncogenic signalling and specific drug targets encoded by these fusion genes.
Taken together, the translational genomic research I have conducted during my Ph.D. study will shed new light on precision medicine and contribute to the cancer research community. The novel classification concept, gene signature and fusion transcripts I have identified will address several hotly debated issues in translational genomics, such as complex interactions between tumor bulks and their adjacent microenvironments, prognostic markers for clinical diagnostics and personalized therapy, and distinct patterns of genomic structure alterations and oncogenic events in different cancer types, thereby facilitating our understanding of genomic alterations and moving us towards the development of precision medicine.
Diagnosis and Prognosis of Occupational Disorders based on Machine Learning Techniques applied to Occupational Profiles
Work-related disorders have a global influence on people’s well-being and quality of life
and are a financial burden for organizations because they reduce productivity, increase
absenteeism, and promote early retirement. Work-related musculoskeletal disorders, in
particular, represent a significant fraction of the total in all occupational contexts. In
automotive and industrial settings where workers are exposed to work-related
musculoskeletal disorder risk factors, occupational physicians are responsible for
monitoring workers’ health protection profiles. Occupational technicians consult the
Occupational Health Protection Profiles database to understand which exposure to
work-related musculoskeletal disorder risk factors should be ensured for a given worker.
Occupational Health Protection Profiles databases describe what the occupational
physician states and which exposure the physician considers necessary to ensure the
worker’s health protection in terms of their functional work ability. The application of
Human-Centered explainable artificial intelligence can support decision making, going
from a worker’s Functional Work Ability to explanations by integrating explainability into
medical (restriction) and supporting two decision contexts: prognosis and diagnosis of
individual, work-related, and organizational risk conditions. Although previous machine
learning approaches provided good predictions, their application in an actual occupational
setting is limited because their predictions are difficult to interpret and hence not
actionable. This thesis targets injured body parts for which the ability changed in a
worker’s functional work ability status. On the one hand, artificial intelligence
algorithms can help technical teams, occupational physicians, and ergonomists determine a
worker’s workplace risk via the diagnosis and prognosis of body part injuries; on the
other hand, these approaches can help prevent work-related musculoskeletal disorders by
identifying which processes are lacking in working condition improvement and which
workplaces have a better match between the remaining functional work abilities. This
thesis used a database of Occupational Health Protection Profiles based on Functional Work
Ability textual reports in the Portuguese language from an automotive industry factory
(Auto Europa). A sample of 2025 files was used for the prognosis part (from 2019 to 2020)
and a sample of 7857 files was used for the diagnosis part. Machine learning-based Natural
Language Processing methods were implemented to extract standardized information. The
prognosis and diagnosis of Occupational Health Protection Profiles factors were developed
in a reliable, trustworthy Human-Centered explainable artificial intelligence system
(entitled Industrial microErgo application). The most suitable regression models to predict
the next medical appointment for the injured body regions were the models based on
CatBoost regression, with an R square and an RMSLE of 0.84 and 1.23 weeks, respectively.
In parallel, for the prediction of the next injured body parts, CatBoost was the best
regression model for most body parts based on these two error measures. This information
can help technical industrial teams understand potential risk factors for Occupational
Health Protection Profiles and identify warning signs of the early stages of
musculoskeletal disorders.
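For readers unfamiliar with the two regression metrics quoted in the abstract above (R square and RMSLE), here is a small self-contained sketch of how they are commonly defined. It does not use CatBoost and does not reproduce the thesis's results; the function names and the example values in weeks are made up for illustration.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, computed on log(1 + value)."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical "weeks until next appointment" values, not data from the thesis.
weeks_true = [2.0, 4.0, 6.0, 8.0]
weeks_pred = [2.5, 3.5, 6.5, 7.5]
print(r_squared(weeks_true, weeks_pred), rmsle(weeks_true, weeks_pred))
```

Because RMSLE compares logarithms, it penalizes relative rather than absolute errors, so the same half-week miss counts for more on a short interval than on a long one.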
- …