    ProtiGeno: a prokaryotic short gene finder using protein language models

    Prokaryotic gene prediction plays an important role in understanding the biology of organisms and their function, with applications in medicine and biotechnology. Although current gene finders are highly sensitive in finding long genes, their sensitivity decreases noticeably for shorter genes (<180 nts). The culprit is insufficient annotated gene data to identify distinguishing features in short open reading frames (ORFs). We develop a deep learning-based method called ProtiGeno, specifically targeting short prokaryotic genes using a protein language model trained on millions of evolved proteins. In systematic large-scale experiments on 4,288 prokaryotic genomes, we demonstrate that ProtiGeno predicts short coding and noncoding genes with higher accuracy and recall than the current state-of-the-art gene finders. We discuss the predictive features of ProtiGeno and possible limitations by visualizing the three-dimensional structure of the predicted short genes. Data, code, and models are available at https://github.com/tonytu16/protigeno. Comment: Accepted at the 2023 ICML Workshop on Computational Biolog

    IT Governance and Portfolio Management: An Exploration of the Superior IT Project Investment Portfolios

    In this study, we explore the characteristics of IT project investments that improve IT portfolio superiority. Our methodology is based on a computational modeling approach. The preliminary findings suggest that a firm could improve the selectivity, heterogeneity, and scalability of its IT project investments for portfolio selection.

    Metabolic labelling of cholesteryl glucosides in Helicobacter pylori reveals how the uptake of human lipids enhances bacterial virulence.

    Helicobacter pylori infects approximately half of the human population and is the main cause of various gastric diseases. This pathogen is auxotrophic for cholesterol, which it converts upon uptake to various cholesteryl α-glucoside derivatives, including cholesteryl 6'-acyl and 6'-phosphatidyl α-glucosides (CAGs and CPGs). Owing to a lack of sensitive analytical methods, it is not known whether CAGs and CPGs play distinct physiological roles or how the acyl chain component affects function. Herein we established a metabolite-labelling method for characterising these derivatives qualitatively and quantitatively with a femtomolar detection limit. The development generated an MS/MS database of CGds, allowing for profiling of all the cholesterol-derived metabolites. The subsequent analysis yielded the previously unreported finding that these bacteria acquire phospholipids from the membrane of epithelial cells for CAG biosynthesis. The resulting increase in longer and/or unsaturated CAG acyl chains helps to promote lipid raft formation and thus delivery of the virulence factor CagA into the host cell, supporting the idea that the host/pathogen interplay enhances bacterial virulence. These findings demonstrate an important connection between the chain length of CAGs and bacterial pathogenicity.

    A comparison of mantle versus involved-field radiotherapy for Hodgkin's lymphoma: reduction in normal tissue dose and second cancer risk

    BACKGROUND: Hodgkin's lymphoma (HL) survivors who undergo radiotherapy experience increased risks of second cancers (SC) and cardiac sequelae. To reduce such risks, extended-field radiotherapy (RT) for HL has largely been replaced by involved-field radiotherapy (IFRT). While it has generally been assumed that IFRT will reduce SC risks, there are few data that quantify the reduction in dose to normal tissues associated with modern RT practice for patients with mediastinal HL, and no estimates of the expected reduction in SC risk. METHODS: Organ-specific dose-volume histograms (DVH) were generated for 41 patients receiving 35 Gy mantle RT, 35 Gy IFRT, or 20 Gy IFRT, and integrated organ mean doses were compared for the three protocols. Organ-specific SC risk estimates were derived using a dosimetric risk-modeling approach, analyzing DVH data with quantitative, mechanistic models of radiation-induced cancer. RESULTS: Dose reductions resulted in corresponding reductions in predicted excess relative risks (ERR) for SC induction. Moving from 35 Gy mantle RT to 35 Gy IFRT reduces predicted ERR for female breast and lung cancer by approximately 65%, and for male lung cancer by approximately 35%; moving from 35 Gy IFRT to 20 Gy IFRT reduces predicted ERRs by approximately 40% more. The median reduction in integral dose to the whole heart with the transition to 35 Gy IFRT was 35%, with a smaller (2%) reduction in dose to proximal coronary arteries. There was no significant reduction in thyroid dose. CONCLUSION: The significant decreases estimated for radiation-induced SC risks associated with modern IFRT provide strong support for the use of IFRT to reduce the late effects of treatment. The approach employed here can provide new insight into the risks associated with contemporary IFRT for HL, and may facilitate the counseling of patients regarding the risks associated with this treatment.
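
    As a rough illustration of the dosimetric step described in METHODS, the sketch below computes an organ mean dose from a differential DVH and the percent reduction between two protocols. The function names and all numbers in the usage example are invented for illustration; the abstract does not give the underlying DVH data, and the mechanistic risk models it mentions are not reproduced here.

```python
def mean_dose(dose_bins_gy, volume_fractions):
    """Volume-weighted mean dose (Gy) from a differential DVH.

    dose_bins_gy: representative dose of each bin, in Gy.
    volume_fractions: fraction (or absolute volume) of the organ in each bin.
    """
    if len(dose_bins_gy) != len(volume_fractions):
        raise ValueError("dose bins and volume fractions must have equal length")
    total = sum(volume_fractions)
    if total <= 0:
        raise ValueError("total volume must be positive")
    weighted = sum(d * v for d, v in zip(dose_bins_gy, volume_fractions))
    return weighted / total


def percent_reduction(before, after):
    """Percent reduction when moving from `before` to `after`."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Illustrative, made-up DVHs for one organ under two protocols.
    bins = [0.0, 10.0, 20.0]
    protocol_a = mean_dose(bins, [0.2, 0.3, 0.5])
    protocol_b = mean_dose(bins, [0.5, 0.3, 0.2])
    print(protocol_a, protocol_b, percent_reduction(protocol_a, protocol_b))
```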

    International population-based health surveys linked to outcome data: A new resource for public health and epidemiology

    Background: National health surveys linked to vital statistics and health care information provide a growing source of individual-level population health data. Pooling linked surveys across jurisdictions would create comprehensive datasets that are larger than most existing cohort studies, and that have a unique international and population perspective. This paper’s objectives are to examine the feasibility of pooling linked population health surveys from three countries, facilitate the examination of health behaviours, and present useful information to assist in the planning of international population health surveillance and research studies. Methods: The design, methodologies and content of the Canadian Community Health Survey (2003 to 2008), the United States National Health Interview Survey (2000, 2005) and the Scottish Health Survey (SHeS) (2003, 2008 to 2010) were examined for comparability and consistency. The feasibility of creating common variables for measuring smoking, alcohol consumption, physical activity and diet was assessed. Sample size and estimated mortality events were collected. Results: The surveys have comparable purposes, designs, sampling and administration methodologies, target populations, exclusions, and content. Similar health behaviour questions allow for comparable variables to be created across the surveys. However, the SHeS uses a more detailed risk factor evaluation for alcohol consumption and diet data. Therefore, comparisons of alcohol consumption and diet data between the SHeS and the other two surveys should be performed with caution. Pooling these linked surveys would create a dataset with over 350,000 participants, 28,424 deaths and over 2.4 million person-years of follow-up. Conclusions: Pooling linked national population health surveys could improve population health research and surveillance. Innovative methodologies must be used to account for survey dissimilarities, and further discussion is needed on how to best access and analyze data across jurisdictions.

    SuRVoS: Super-Region Volume Segmentation workbench

    Segmentation of biological volumes is a crucial step needed to fully analyse their scientific content. Not having access to convenient tools with which to segment or annotate the data means many biological volumes remain under-utilised. Automatic segmentation of biological volumes is still a very challenging research field, and current methods usually require a large amount of manually produced training data to deliver a high-quality segmentation. However, the complex appearance of cellular features and the high variance from one sample to another, along with the time-consuming work of manually labelling complete volumes, make the required training data very scarce or non-existent. Thus, fully automatic approaches are often infeasible for many practical applications. With the aim of unifying the segmentation power of automatic approaches with the user expertise and ability to manually annotate biological samples, we present a new workbench named SuRVoS (Super-Region Volume Segmentation). Within this software, a volume to be segmented is first partitioned into hierarchical segmentation layers (named Super-Regions) and is then interactively segmented with the user's knowledge input in the form of training annotations. SuRVoS first learns from and then extends user inputs to the rest of the volume, while using Super-Regions for quicker and easier segmentation than when using a voxel grid. These benefits are especially noticeable on noisy, low-dose, biological datasets.

    Validation of Case-Finding Algorithms Derived from Administrative Data for Identifying Adults Living with Human Immunodeficiency Virus Infection

    OBJECTIVE: We sought to validate a case-finding algorithm for human immunodeficiency virus (HIV) infection using administrative health databases in Ontario, Canada. METHODS: We constructed 48 case-finding algorithms using combinations of physician billing claims, hospital and emergency room separations and prescription drug claims. We determined the test characteristics of each algorithm over various time frames for identifying HIV infection, using data abstracted from the charts of 2,040 randomly selected patients receiving care at two medical practices in Toronto, Ontario as the reference standard. RESULTS: With the exception of algorithms using only a single physician claim, the specificity of all algorithms exceeded 99%. An algorithm consisting of three physician claims over a three-year period had a sensitivity and specificity of 96.2% (95% CI 95.2%-97.9%) and 99.6% (95% CI 99.1%-99.8%), respectively. Application of the algorithm to the province of Ontario identified 12,179 HIV-infected patients in care for the period spanning April 1, 2007 to March 31, 2009. CONCLUSIONS: Case-finding algorithms generated from administrative data can accurately identify adults living with HIV. A relatively simple "3 claims in 3 years" definition can be used for assembling a population-based cohort and facilitating future research examining trends in health service use and outcomes among HIV-infected adults in Ontario.
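
    The "3 claims in 3 years" definition and its test characteristics can be sketched as follows. This is an illustrative reading of the rule, not the study's code: the function names, the use of 3 × 365 days as the window, and the toy inputs are assumptions, and the abstract does not say how the window boundaries were handled.

```python
from datetime import date


def meets_rule(claim_dates, n_claims=3, window_days=3 * 365):
    """True if any n_claims claims fall within window_days of each other.

    Assumption: the window is measured between the first and last of the
    n_claims claims, inclusive of the boundary.
    """
    dates = sorted(claim_dates)
    for i in range(len(dates) - n_claims + 1):
        if (dates[i + n_claims - 1] - dates[i]).days <= window_days:
            return True
    return False


def sensitivity_specificity(predicted, reference):
    """Sensitivity and specificity of predictions against a reference standard."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both positives and negatives")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy claim histories, invented for illustration.
    patient_a = [date(2007, 4, 1), date(2008, 1, 1), date(2009, 3, 31)]
    patient_b = [date(2000, 1, 1), date(2002, 1, 1), date(2005, 1, 1)]
    print(meets_rule(patient_a), meets_rule(patient_b))
```

    In a validation setting, `predicted` would hold the rule's output per patient and `reference` the chart-abstracted diagnosis.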