24 research outputs found

ART: A machine learning Automated Recommendation Tool for synthetic biology

Author: A Espah Borujeni
A Esteva
AJ Jervis
AL Meadows
B Alipanahi
CE Hodgman
CG Begley
CJ Paddon
CJ Petzold
CM Denby
D Wolpert
DE Cameron
E Begoli
EC Hayden
F Pedregosa
F Prinz
G Renouard-Vallet
G Stephanopoulos
HM Salis
HR Beller
I Shaked
J Alonso-Gutierrez
J Heinemann
J Nielsen
JA Doudna
JA Hoeting
JD Keasling
JM Granda
JV Kurian
K Kyrou
K Le
K Magnuson
L Breiman
M Baker
M HamediRad
M Kosinski
MD McKay
MM Noack
MT Bonde
NI Tracy
P Carbonell
P Opgenorth
PC Gach
PK Ajikumar
S Ma
S Unthan
S Van Dien
T Fuhrer
TS Batth
TS Gardner
V Chubukov
VG Yadav
W Duetz
WC Morrell
Y Chen
Y Yao
Z Costello
ZD Stephens
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Biology has changed radically in the last two decades, transitioning from a descriptive science into a design science. Synthetic biology allows us to bioengineer cells to synthesize novel valuable molecules such as renewable biofuels or anticancer drugs. However, traditional synthetic biology approaches involve ad-hoc engineering practices, which lead to long development times. Here, we present the Automated Recommendation Tool (ART), a tool that leverages machine learning and probabilistic modeling techniques to guide synthetic biology in a systematic fashion, without the need for a full mechanistic understanding of the biological system. Using sampling-based optimization, ART provides a set of recommended strains to be built in the next engineering cycle, alongside probabilistic predictions of their production levels. We demonstrate the capabilities of ART on simulated data sets, as well as experimental data from real metabolic engineering projects producing renewable biofuels, hoppy flavored beer without hops, and fatty acids. Finally, we discuss the limitations of this approach, and the practical consequences of the underlying assumptions failing.

arXiv.org e-Print Archive

Crossref

BCAM's Institutional Repository Data

eScholarship - University of California

Diversity and scale: Genetic architecture of 2068 traits in the VA Million Veteran Program

Author: Assimes Themistocles L
Begoli Edmon
Bick Alexander G
Brunette Charles A
Cai Tianxi
Carroll Robert J
Casas Juan P
Cho Kelly
Clifford Royce
Cohen Jeremy
Conery Mitchell
Costa Lauren
Damrauer Scott
Davies Laura
Deak Joseph D
Devineni Poornima
Dochtermann Daniel R
Duvall Scott
Garcon Helene
Gaziano J Michael
Gelernter Joel
Goethert Ian
Grant Struan FA
Guare Lindsay
Heise David A
Ho Yuk-Lam
Honerlaw Jacqueline
Huffman Jennifer E
Hung Adriana
Iyengar Sudha K
Joseph Jacob
Justice Amy
Kember Rachel
Kim Youngdae
Kranzler Henry
Kripke Colleen M
Levey Daniel
Liao Katherine P
Linares Franciel
Liu Molei
Luoh Shiuh-Wen
Madduri Ravi K
Merritt Victoria C
Moser Jennifer
Muralidhar Sumitra
Murray Michael
Nandi Tarak Nath
O'Donnell Christopher J
Overstreet Cassie
Panickan Vidul Ayakulangara
Polimanti Renato
Posner Daniel C
Pyarajan Saiju
Ramoni Rachel
Rodriguez Alex
Roussos Panos
Sangar Rahul
Shakt Gabrielle
Shi Yunling
Sun Yan V
Tipton Ryan
Tourassi Georgia
Tsao Noah
Tsao Philip
Venkatesh Sanan
Verma Anurag
Voight Benjamin F
Voloudakis Georgios
Wang Xuan
Whitbourne Stacey
Zhou Wei
Publication venue: eScholarship, University of California
Publication date: 19/07/2024

One of the justifiable criticisms of human genetic studies is the underrepresentation of participants from diverse populations. Lack of inclusion must be addressed at-scale to identify causal disease factors and understand the genetic causes of health disparities. We present genome-wide associations for 2068 traits from 635,969 participants in the Department of Veterans Affairs Million Veteran Program, a longitudinal study of diverse United States Veterans. Systematic analysis revealed 13,672 genomic risk loci; 1608 were only significant after including non-European populations. Fine-mapping identified causal variants at 6318 signals across 613 traits. One-third (n = 2069) were identified in participants from non-European populations. This reveals a broadly similar genetic architecture across populations, highlights genetic insights gained from underrepresented groups, and presents an extensive atlas of genetic associations.

eScholarship - University of California

Unstructured clinical notes within the 24 hours since admission predict short, mid & long-term mortality in adult ICU patients

Author: Alina Peluso
Edmon Begoli
Gregory D. Peterson
Ioana Danciu
Maria Mahbub
Sudarshan Srinivasan
Suzanne Tamang
Publication venue: Public Library of Science (PLoS)
Publication date: 06/01/2022

Mortality prediction for intensive care unit (ICU) patients is crucial for improving outcomes and efficient utilization of resources. Accessibility of electronic health records (EHR) has enabled data-driven predictive modeling using machine learning. However, very few studies rely solely on unstructured clinical notes from the EHR for mortality prediction. In this work, we propose a framework to predict short, mid, and long-term mortality in adult ICU patients using unstructured clinical notes from the MIMIC III database, natural language processing (NLP), and machine learning (ML) models. Depending on the statistical description of the patients’ length of stay, we define the short-term as 48-hour and 4-day period, the mid-term as 7-day and 10-day period, and the long-term as 15-day and 30-day period after admission. We found that by only using clinical notes within the 24 hours of admission, our framework can achieve a high area under the receiver operating characteristics (AU-ROC) score for short, mid and long-term mortality prediction tasks. The test AU-ROC scores are 0.87, 0.83, 0.83, 0.82, 0.82, and 0.82 for 48-hour, 4-day, 7-day, 10-day, 15-day, and 30-day period mortality prediction, respectively. We also provide a comparative study among three types of feature extraction techniques from NLP: frequency-based technique, fixed embedding-based technique, and dynamic embedding-based technique. Lastly, we provide an interpretation of the NLP-based predictive models using feature-importance scores.

Crossref

Unstructured clinical notes within the 24 hours since admission predict short, mid & long-term mortality in adult ICU patients.

Author: Alina Peluso
Edmon Begoli
Gregory D Peterson
Ioana Danciu
Maria Mahbub
Sudarshan Srinivasan
Suzanne Tamang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2022

Mortality prediction for intensive care unit (ICU) patients is crucial for improving outcomes and efficient utilization of resources. Accessibility of electronic health records (EHR) has enabled data-driven predictive modeling using machine learning. However, very few studies rely solely on unstructured clinical notes from the EHR for mortality prediction. In this work, we propose a framework to predict short, mid, and long-term mortality in adult ICU patients using unstructured clinical notes from the MIMIC III database, natural language processing (NLP), and machine learning (ML) models. Depending on the statistical description of the patients' length of stay, we define the short-term as 48-hour and 4-day period, the mid-term as 7-day and 10-day period, and the long-term as 15-day and 30-day period after admission. We found that by only using clinical notes within the 24 hours of admission, our framework can achieve a high area under the receiver operating characteristics (AU-ROC) score for short, mid and long-term mortality prediction tasks. The test AU-ROC scores are 0.87, 0.83, 0.83, 0.82, 0.82, and 0.82 for 48-hour, 4-day, 7-day, 10-day, 15-day, and 30-day period mortality prediction, respectively. We also provide a comparative study among three types of feature extraction techniques from NLP: frequency-based technique, fixed embedding-based technique, and dynamic embedding-based technique. Lastly, we provide an interpretation of the NLP-based predictive models using feature-importance scores.

Directory of Open Access Journals

PubMed Central

Game Theoretic Approach for Understanding and Modeling Clinical Pathways (Stable Ischemic Heart Disease)

Author: Aileen Boone
Aneel Advani
Edmon Begoli
Hilda B. Klasky
Mark G. Pleszkoch
Stephan D. Fihn
Publication venue: Office of Scientific and Technical Information (OSTI)
Publication date: 01/10/2018

Crossref

Test AU-ROC scores for four models trained with features extracted using PubMedBERT.

Author: Alina Peluso (9638408)
Edmon Begoli (11914244)
Gregory D. Peterson (8093489)
Ioana Danciu (11914241)
Maria Mahbub (11914235)
Sudarshan Srinivasan (11914238)
Suzanne Tamang (11914247)
Publication date: 06/01/2022

Test AU-ROC scores for four models trained with features extracted using PubMedBERT.

The Francis Crick Institute

Hyperparameter optimization for classification models.

Author: Alina Peluso (9638408)
Edmon Begoli (11914244)
Gregory D. Peterson (8093489)
Ioana Danciu (11914241)
Maria Mahbub (11914235)
Sudarshan Srinivasan (11914238)
Suzanne Tamang (11914247)
Publication date: 06/01/2022

Hyperparameter optimization for classification models.

The Francis Crick Institute

Test ROC curve, AU-ROC score, sensitivity and specificity scores for logistic regression model with TF-IDF, FastText, and PubMedBERT.

Author: Alina Peluso (9638408)
Edmon Begoli (11914244)
Gregory D. Peterson (8093489)
Ioana Danciu (11914241)
Maria Mahbub (11914235)
Sudarshan Srinivasan (11914238)
Suzanne Tamang (11914247)
Publication date: 06/01/2022

The optimum threshold value for sensitivity and specificity scores has been calculated using Youden’s Index. The x-axis represents the prediction window. The grey boxes show the number of deceased and alive patients with respect to the prediction windows.
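
For readers unfamiliar with Youden's Index, the following is a minimal sketch of how such a threshold can be chosen: it picks the score cut-off that maximizes J = sensitivity + specificity - 1. The function name `youden_threshold` and the toy labels and scores are hypothetical illustrations, not the authors' code or data.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    y_true: iterable of 0/1 labels (1 = positive class, e.g. deceased).
    y_score: iterable of predicted scores; a sample is called positive
             when its score is >= the threshold.
    """
    y_true = list(y_true)
    y_score = list(y_score)
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_threshold, best_j = None, -1.0
    # Every distinct score is a candidate cut-off.
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1.0
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Toy data (hypothetical): two negatives, two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
threshold, j = youden_threshold(labels, scores)
print(threshold, j)
```

On this toy input two cut-offs tie for the best J; the sketch keeps the lower one, which is an arbitrary design choice.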

The Francis Crick Institute

Test AU-ROC scores by four models trained with features extracted using TF-IDF.

Author: Alina Peluso (9638408)
Edmon Begoli (11914244)
Gregory D. Peterson (8093489)
Ioana Danciu (11914241)
Maria Mahbub (11914235)
Sudarshan Srinivasan (11914238)
Suzanne Tamang (11914247)
Publication date: 06/01/2022

Test AU-ROC scores by four models trained with features extracted using TF-IDF.

The Francis Crick Institute

Top 10 most important features that are predictive of the mortality prediction outcome—Alive (green bars), deceased (red bars), by logistic regression model with TF-IDF.

Author: Alina Peluso (9638408)
Edmon Begoli (11914244)
Gregory D. Peterson (8093489)
Ioana Danciu (11914241)
Maria Mahbub (11914235)
Sudarshan Srinivasan (11914238)
Suzanne Tamang (11914247)
Publication date: 06/01/2022

Top 10 most important features that are predictive of the mortality prediction outcome: alive (green bars) and deceased (red bars), by logistic regression model with TF-IDF.

The Francis Crick Institute