Modeling aspects of the language of life through transfer-learning protein sequences
Background
Predicting protein function and structure from sequence is an important challenge for computational biology. For 26 years, most state-of-the-art approaches have combined machine learning and evolutionary information. However, for some applications retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome. Both of these problems are addressed by the new methodology introduced here.
Results
We introduced a novel way to represent protein sequences as continuous vectors (embeddings) by using the language model ELMo taken from natural language processing. By modeling protein sequences, ELMo effectively captured the biophysical properties of the language of life from unlabeled big data (UniRef50). We refer to these new embeddings as SeqVec (Sequence-to-Vector) and demonstrate their effectiveness by training simple neural networks for two different tasks. At the per-residue level, secondary structure (Q3 = 79% ± 1, Q8 = 68% ± 1) and regions with intrinsic disorder (MCC = 0.59 ± 0.03) were predicted significantly better than through one-hot encoding or through Word2vec-like approaches. At the per-protein level, subcellular localization was predicted in ten classes (Q10 = 68% ± 1) and membrane-bound proteins were distinguished from water-soluble proteins (Q2 = 87% ± 1). Although SeqVec embeddings generated the best predictions from single sequences, no solution improved over the best existing method using evolutionary information. Nevertheless, our approach improved over some popular methods using evolutionary information and, for some proteins, even beat the best. Thus, the embeddings prove to condense the underlying principles of protein sequences. Overall, the important novelty is speed: where the lightning-fast HHblits needed on average about two minutes to generate the evolutionary information for a target protein, SeqVec created embeddings on average in 0.03 s. As this speed-up is independent of the size of growing sequence databases, SeqVec provides a highly scalable approach for the analysis of big data in proteomics, e.g. microbiome or metaproteome analysis.
Conclusion
Transfer learning succeeded in extracting information from unlabeled sequence databases that is relevant for various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences, better than any features suggested by textbooks and prediction methods. The exception is evolutionary information; however, that information is not available on the level of a single sequence.
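The per-residue scores quoted in this abstract (Q3 and MCC) can be illustrated with a minimal sketch. It assumes the usual definitions: Q3 is the fraction of residues whose three-state label is predicted correctly, and MCC is the Matthews correlation coefficient over binary labels. The function names and the toy labels are hypothetical and are not taken from the SeqVec work.

```python
import math

def q3_accuracy(true_states, predicted_states):
    """Fraction of residues whose 3-state label (e.g. H, E, C) matches."""
    if len(true_states) != len(predicted_states) or not true_states:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_states, predicted_states))
    return matches / len(true_states)

def mcc(true_labels, predicted_labels):
    """Matthews correlation coefficient for binary labels (1 = positive)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when a marginal count is empty.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0

# Toy labels, invented purely for illustration.
true_ss = "HHHHEEEECC"
pred_ss = "HHHEEEEECC"
print(q3_accuracy(true_ss, pred_ss))   # 9 of 10 residues agree

true_disorder = [1, 1, 1, 0, 0, 0, 0, 1]
pred_disorder = [1, 1, 0, 0, 0, 0, 1, 1]
print(mcc(true_disorder, pred_disorder))
```

Q8 and Q10 follow the same accuracy pattern with eight and ten classes, respectively.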
KRASG12C/TP53 co-mutations identify long-term responders to first line palliative treatment with pembrolizumab monotherapy in PD-L1 high (≥50%) lung adenocarcinoma
Background: Pembrolizumab is a standard of care as first line palliative therapy in PD-L1 overexpressing (≥50%) non-small cell lung cancer (NSCLC). This study aimed at the identification of KRAS and TP53-defined mutational subgroups in the PD-L1 high population to distinguish long-term responders from those with limited benefit.
Methods: In this retrospective, observational study, patients from 4 certified lung cancer centers in Berlin, Germany, having received pembrolizumab monotherapy as first line palliative treatment for lung adenocarcinoma (LuAD) from 2017 to 2018, with PD-L1 expression status and targeted NGS data available, were evaluated.
Results: A total of 119 patients were included. Rates for KRAS, TP53 and combined mutations were 52.1%, 47.1% and 21.9%, respectively, with no association between KRAS and TP53 mutations (P=0.24). PD-L1 expression tended to be higher in KRAS-positive patients (75% vs. 65%, P=0.13). Objective response rate (ORR), median progression-free survival (PFS) and overall survival (OS) in the KRASG12C group (n=32, 51.6%) were 63.3%, 19.8 months (mo.) and not estimable (NE), respectively. Results in KRASother and wild type patients were similar to each other and considerably lower (42.7%, P=0.06; 6.2 mo., P<0.001; 23.4 mo., P=0.08). TP53 mutations alone had no impact on response and survival. However, KRASG12C/TP53 co-mutations (n=12) defined a subset of long-term responders (ORR 100.0%, PFS 33.3 mo., OS NE). In contrast, patients with KRASother/TP53 mutations showed a dismal prognosis (ORR 27.3%, P=0.002; PFS 3.9 mo., P=0.001; OS 9.7 mo., P=0.02).
Conclusions: A comprehensive assessment of KRAS subtypes and TP53 mutations allows a highly relevant prognostic differentiation of patients with metastatic, PD-L1 high LuAD treated upfront with pembrolizumab.
QuaDoSta - a freely configurable system which facilitates multi-centric data collection for healthcare and medical research
This article describes QuaDoSta (quality assurance, documentation and statistics), a flexible documentation system as well as a data collection and networking platform for medical facilities. The user can freely define the required documentation masks, which are easily expandable and can be adapted to individual requirements without the need for additional programming. To avoid duplication, data transfer interfaces can be configured flexibly to external sources such as patient management systems used in surgeries or hospital information systems. The projects EvaMed (Evaluation Anthroposophical Medicine) and the Network Oncology are two scientific research projects which have been successfully established as nationally active networks on the basis of QuaDoSta. The EvaMed-Network serves as a modern pharmacovigilance project for the documentation of adverse drug events. All prescription data are electronically recorded to assess the relative risk of drugs. The Network Oncology was set up as a documentation system in four hospitals and seven specialist oncology practices, where a complete record of all oncological therapies is being kept to uniform standards on the basis of the ‘basic documentation for tumour patients’ (BDT) developed by the German Cancer Society. The QuaDoSta system made it possible to cater for the specific requirements of the presented projects. The following features of the system proved to be highly advantageous: flexible setup of catalogues, user-friendly customisation and extensions, complete dissociation of system setup and documentation content, multi-centre networkability, and configurable data transfer interfaces.
Overall survival of stage IV non-small cell lung cancer patients treated with Viscum album L. in addition to chemotherapy, a real-world observational multicenter analysis
Background: Stage IV non-small cell lung cancer (NSCLC) is associated with a five-year survival rate of around 1%. Treatment with Viscum album L. (VA, European mistletoe) extracts has been shown to reduce chemotherapy (CTx)-related adverse events, decrease CTx dose reductions and improve quality of life in a number of cancers. Recent data suggest a beneficial effect of add-on treatment with VA on survival in cancer patients. The objective of this study was to evaluate the effect of VA in addition to chemotherapy on survival in stage IV NSCLC patients.
Methods: The observational study was conducted using data from the Network Oncology clinical registry, which is an accredited conjoint clinical registry of German oncological hospitals, practitioners and out-patient centers. Patients were included if they had stage IV NSCLC at diagnosis, lived for at least four weeks post-diagnosis and received chemotherapeutic treatment. Patients with EGFR mutations as well as patients receiving tyrosine kinase inhibitors or immune checkpoint inhibitors were not included. Overall survival and hazard were compared between patients receiving CTx only and patients receiving CTx plus VA. To identify factors associated with survival and to address potential sources of bias, a multivariate analysis using a Cox proportional hazards model was performed.
Results: The median age of the population was 64.1 years, with 55.7% male patients. The highest proportion of patients had adenocarcinoma (72.2%) and most of the patients were current or past smokers (70.9%). Of 158 stage IV NSCLC patients, 108 received CTx only and 50 received additional VA. Median survival was 17.0 months in the CTx plus VA group (95%CI: 11.0–40.0) and 8.0 months (95%CI: 7.0–11.0) in the CTx only group (χ² = 7.2, p = .007). Overall survival was significantly prolonged in the VA group (HR 0.44, 95%CI: 0.26–0.74, p = .002). One-year and three-year overall survival rates were greater with CTx plus VA compared to CTx alone (1y: 60.2% vs. 35.5%; 3y: 25.7% vs. 14.2%).
Conclusion: Our findings suggest that concomitant VA is positively associated with survival in stage IV NSCLC patients treated with standard CTx. These findings complement pre-existing knowledge of add-on VA’s clinical impact; however, results should be interpreted with caution in light of the study’s observational character.