TabText: A Flexible and Contextual Approach to Tabular Data Representation
Tabular data is essential for applying machine learning across various industries. However, traditional data processing methods do not fully utilize all the information available in tables, and they ignore important contextual information such as column header descriptions. In addition, pre-processing data into a tabular format can remain a labor-intensive bottleneck in model development. This work introduces TabText, a processing and feature extraction framework that extracts contextual information from tabular data structures. TabText addresses processing difficulties by converting the content into language and utilizing pre-trained large language models (LLMs). We evaluate our framework on nine healthcare prediction tasks, including patient discharge, ICU admission, and mortality. We show that 1) applying our TabText framework enables the generation of high-performing and simple machine learning baseline models with minimal data pre-processing, and 2) augmenting pre-processed tabular data with TabText representations improves the average and worst-case AUC performance of standard machine learning models by as much as 6%.
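The core idea of converting table content into language can be illustrated with a minimal sketch. It assumes a simple "description is value" template and uses a hypothetical helper, `row_to_text`; it is not the TabText implementation. Each cell is labeled with its column description (falling back to the raw column name) and missing values are skipped. The resulting string is the kind of text that could then be passed to a pre-trained language model to obtain an embedding, a step omitted here.

```python
from typing import Any, Dict, Optional


def row_to_text(row: Dict[str, Optional[Any]], column_descriptions: Dict[str, str]) -> str:
    """Serialize one table row into a sentence-like string (hypothetical helper).

    Each non-missing cell becomes "<description> is <value>", where the
    description comes from column_descriptions or falls back to the column name.
    Cells whose value is None are skipped.
    """
    parts = []
    for column, value in row.items():
        if value is None:
            continue  # skip missing values instead of imputing them
        label = column_descriptions.get(column, column)
        parts.append(f"{label} is {value}")
    return ". ".join(parts) + "." if parts else ""


# Illustrative row and descriptions (made-up example, not data from the study).
row = {"age": 67, "hr": 102, "dx": "pneumonia", "bmi": None}
descriptions = {
    "age": "Patient age in years",
    "hr": "Heart rate (beats per minute)",
    "dx": "Admission diagnosis",
}
print(row_to_text(row, descriptions))
# Patient age in years is 67. Heart rate (beats per minute) is 102. Admission diagnosis is pneumonia.
```

Keeping the column description next to each value is what lets a language model use header context that a purely numeric feature matrix discards.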
Interconnected Microphysiological Systems for Quantitative Biology and Pharmacology Studies
Microphysiological systems (MPSs) are in vitro models that capture facets of in vivo organ function through the use of specialized culture microenvironments, including 3D matrices and microperfusion. Here, we report an approach to co-culture multiple different MPSs, linked together physiologically, on reusable, open-system microfluidic platforms that are compatible with the quantitative study of a range of compounds, including lipophilic drugs. We describe three different platform designs, "4-way", "7-way", and "10-way", each accommodating a mixing chamber and up to 4, 7, or 10 MPSs. The platforms accommodate multiple different MPS flow configurations, each with internal recirculation to enhance molecular exchange, and feature on-board pneumatically driven pumps with independently programmable flow rates to provide precise control over both intra- and inter-MPS flow partitioning and drug distribution. We first developed a 4-MPS system, showing accurate prediction of secreted liver protein distribution and 2-week maintenance of phenotypic markers. We then developed 7-MPS and 10-MPS platforms, demonstrating reliable, robust operation and maintenance of MPS phenotypic function for 3 weeks (7-way) and 4 weeks (10-way) of continuous interaction, as well as pharmacokinetic (PK) analysis of diclofenac metabolism. This study illustrates several generalizable design and operational principles for implementing multi-MPS "physiome-on-a-chip" approaches in drug discovery.
Funding: United States. Army Research Office (Grant W911NF-12-2-0039)
A deep learning approach to programmable RNA switches
Engineered RNA elements are programmable tools capable of detecting small molecules, proteins, and nucleic acids. Predicting the behavior of these synthetic biology components remains a challenge, one that could be addressed through the enhanced pattern recognition offered by deep learning. Here, we investigate deep neural networks (DNNs) for predicting toehold switch function, using the toehold switch as a canonical riboswitch model in synthetic biology. To facilitate DNN training, we synthesize and characterize in vivo a dataset of 91,534 toehold switches spanning 23 viral genomes and 906 human transcription factors. DNNs trained on nucleotide sequences outperform (R = 0.43–0.70) previous state-of-the-art thermodynamic and kinetic models (R = 0.04–0.15) and allow for human-understandable attention visualizations (VIS4Map) that identify success and failure modes. This work shows that deep learning approaches can be used for functionality predictions and insight generation in RNA synthetic biology.
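Training a DNN "on nucleotide sequences" requires turning each sequence into a numeric array. The abstract does not specify the encoding. A common choice, assumed here for illustration only, is one-hot encoding, where each position becomes a length-4 vector with a single 1 marking the base. The helper `one_hot_encode` below is hypothetical and accepts DNA input by mapping T to U.

```python
from typing import List

# Column order of the one-hot vectors (an assumption; any fixed order works).
NUCLEOTIDES = "ACGU"


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Return one length-4 one-hot row per position, in A, C, G, U order.

    Input is upper-cased and T is mapped to U so DNA strings are accepted.
    Any other character (e.g. the ambiguity code N) raises ValueError.
    """
    sequence = sequence.upper().replace("T", "U")
    encoded = []
    for position, base in enumerate(sequence):
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide {base!r} at position {position}")
        row = [0] * len(NUCLEOTIDES)
        row[NUCLEOTIDES.index(base)] = 1
        encoded.append(row)
    return encoded


print(one_hot_encode("AUG"))
# [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
```

A fixed-length set of such matrices (one per switch) can be stacked into a batch for a convolutional or recurrent network. Rejecting unknown characters, rather than silently encoding them as all zeros, is a deliberate choice that surfaces malformed input early.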