7 research outputs found

    TabText: A Flexible and Contextual Approach to Tabular Data Representation

    Tabular data is essential for applying machine learning across various industries. However, traditional data processing methods do not fully utilize all the information available in the tables, ignoring important contextual information such as column header descriptions. In addition, pre-processing data into a tabular format can remain a labor-intensive bottleneck in model development. This work introduces TabText, a processing and feature extraction framework that extracts contextual information from tabular data structures. TabText addresses processing difficulties by converting the content into language and utilizing pre-trained large language models (LLMs). We evaluate our framework on nine healthcare prediction tasks, including patient discharge, ICU admission, and mortality. We show that 1) applying our TabText framework enables the generation of high-performing and simple machine learning baseline models with minimal data pre-processing, and 2) augmenting pre-processed tabular data with TabText representations improves the average and worst-case AUC performance of standard machine learning models by as much as 6%.
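
    As a rough illustration of "converting the content into language", the sketch below serializes one table row into a sentence, substituting column header descriptions where they are available. The function name `row_to_text`, the "<label> is <value>" template, and the example columns are all assumptions made for illustration; the abstract does not specify TabText's actual templates. Embedding the resulting text with a pre-trained LLM is not shown.

```python
from typing import Mapping, Optional


def row_to_text(row: Mapping[str, object],
                descriptions: Optional[Mapping[str, str]] = None) -> str:
    """Serialize one table row into a sentence-like string.

    Hypothetical helper: the "<label> is <value>" template is an assumption,
    not the template used by TabText.
    """
    descriptions = descriptions or {}
    parts = []
    for column, value in row.items():
        # Skip missing values instead of imputing them.
        if value is None:
            continue
        # Prefer the column's description over its raw header name.
        label = descriptions.get(column, column)
        parts.append(f"{label} is {value}")
    if not parts:
        return ""
    return "; ".join(parts) + "."


if __name__ == "__main__":
    # Invented example row; not data from the paper.
    example_row = {"age": 67, "hr": 88, "icu": None}
    example_descriptions = {
        "age": "patient age in years",
        "hr": "heart rate in beats per minute",
    }
    print(row_to_text(example_row, example_descriptions))
```

    Skipping missing values rather than imputing them is one possible design choice here; it keeps the text faithful to what the table actually records.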

    Interconnected Microphysiological Systems for Quantitative Biology and Pharmacology Studies

    Microphysiological systems (MPSs) are in vitro models that capture facets of in vivo organ function through use of specialized culture microenvironments, including 3D matrices and microperfusion. Here, we report an approach to co-culture multiple different MPSs linked together physiologically on reusable, open-system microfluidic platforms that are compatible with the quantitative study of a range of compounds, including lipophilic drugs. We describe three different platform designs - "4-way", "7-way", and "10-way" - each accommodating a mixing chamber and up to 4, 7, or 10 MPSs. Platforms accommodate multiple different MPS flow configurations, each with internal re-circulation to enhance molecular exchange, and feature on-board pneumatically-driven pumps with independently programmable flow rates to provide precise control over both intra- and inter-MPS flow partitioning and drug distribution. We first developed a 4-MPS system, showing accurate prediction of secreted liver protein distribution and 2-week maintenance of phenotypic markers. We then developed 7-MPS and 10-MPS platforms, demonstrating reliable, robust operation and maintenance of MPS phenotypic function for 3 weeks (7-way) and 4 weeks (10-way) of continuous interaction, as well as PK analysis of diclofenac metabolism. This study illustrates several generalizable design and operational principles for implementing multi-MPS "physiome-on-a-chip" approaches in drug discovery. Funding: United States Army Research Office (Grant W911NF-12-2-0039).

    A deep learning approach to programmable RNA switches

    Engineered RNA elements are programmable tools capable of detecting small molecules, proteins, and nucleic acids. Predicting the behavior of these synthetic biology components remains a challenge, a situation that could be addressed through enhanced pattern recognition from deep learning. Here, we investigate deep neural networks (DNNs) to predict toehold switch function as a canonical riboswitch model in synthetic biology. To facilitate DNN training, we synthesize and characterize in vivo a dataset of 91,534 toehold switches spanning 23 viral genomes and 906 human transcription factors. DNNs trained on nucleotide sequences outperform (R = 0.43–0.70) previous state-of-the-art thermodynamic and kinetic models (R = 0.04–0.15) and allow for human-understandable attention-visualizations (VIS4Map) to identify success and failure modes. This work shows that deep learning approaches can be used for functionality predictions and insight generation in RNA synthetic biology.
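
    For readers unfamiliar with the setup, the sketch below shows two generic building blocks that such a comparison typically involves: one-hot encoding a nucleotide sequence as model input, and scoring predictions against measurements with a correlation coefficient. It assumes the reported R is a Pearson correlation, which the abstract does not state; the function names are hypothetical, and this is not the authors' pipeline or network architecture.

```python
from math import sqrt
from typing import List, Sequence

NUCLEOTIDES = "ACGU"  # column order of the one-hot encoding (an assumption)


def one_hot(sequence: str) -> List[List[int]]:
    """Encode an RNA (or DNA) sequence as one 4-element indicator row per base."""
    rows = []
    # Treat DNA-style T as U so both alphabets are accepted.
    for base in sequence.upper().replace("T", "U"):
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([int(base == n) for n in NUCLEOTIDES])
    return rows


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured values."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_m)


if __name__ == "__main__":
    print(one_hot("ACGU"))
    # Toy numbers for illustration only; not data from the paper.
    print(pearson_r([0.1, 0.4, 0.8], [0.2, 0.5, 0.7]))
```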