56 research outputs found
Inscribing a discipline: tensions in the field of bioinformatics
Bioinformatics, the application of computer science to biological problems, is a central feature of post-genomic science and grew rapidly during the 1990s and 2000s. Post-genomic science is often high-throughput, involving the mass production of inscriptions [Latour and Woolgar (1986), Laboratory Life: the Construction of Scientific Facts. Princeton, NJ: Princeton University Press]. In order to render these mass inscriptions comprehensible, bioinformatic techniques are employed, with bioinformaticians producing what we call secondary inscriptions. However, despite bioinformaticians being highly skilled and credentialed scientists, the field struggles to develop disciplinary coherence. This paper describes two tensions militating against disciplinary coherence. The first arises from the fact that bioinformaticians, as producers of secondary inscriptions, are often institutionally dependent on, even subordinate to, biologists. With bioinformatics positioned as a service, it cannot determine its own boundaries but has them imposed from the outside. The second tension is a result of the interdisciplinary origin of bioinformatics – computer science and biology are disciplines with very different cultures, values and products. The paper uses interview data from two different UK projects to describe and examine these tensions by commenting on Calvert's [(2010) “Systems Biology, Interdisciplinarity and Disciplinary Identity.” In Collaboration in the New Life Sciences, edited by J. N. Parker, N. Vermeulen and B. Penders, 201–219. Farnham: Ashgate] notion of individual and collaborative interdisciplinarity and McNally's [(2008) “Sociomics: CESAGen Multidisciplinary Workshop on the Transformation of Knowledge Production in the Biosciences, and its Consequences.” Proteomics 8: 222–224] distinction between “black box optimists” and “black box pessimists.”
Bidirectional Shaping and Spaces of Convergence: Interactions between Biology and Computing from the First DNA Sequencers to Global Genome Databases
Model-Based Deconvolution of Cell Cycle Time-Series Data Reveals Gene Expression Details at High Resolution
In both prokaryotic and eukaryotic cells, gene expression is regulated across the cell cycle to ensure “just-in-time” assembly of select cellular structures and molecular machines. However, present in all time-series gene expression measurements is variability that arises from both systematic error in the cell synchrony process and variance in the timing of cell division at the level of the single cell. Thus, gene or protein expression data collected from a population of synchronized cells is an inaccurate measure of what occurs in the average single cell across a cell cycle. Here, we present a general computational method to extract “single-cell”-like information from population-level time-series expression data. This method removes the effects of (1) variance in growth rate and (2) variance in the physiological and developmental state of the cell. Moreover, this method represents an advance in the deconvolution of molecular expression data in its flexibility, minimal assumptions, and the use of a cross-validation analysis to determine the appropriate level of regularization. Applying our deconvolution algorithm to cell cycle gene expression data from the dimorphic bacterium Caulobacter crescentus, we recovered critical features of cell cycle regulation in essential genes, including ctrA and ftsZ, that were obscured in population-based measurements. In doing so, we highlight the problem with using population data alone to decipher cellular regulatory mechanisms and demonstrate how our deconvolution algorithm can be applied to produce a more realistic picture of temporal regulation in a cell.
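The abstract above describes regularized deconvolution of population-level time series with a cross-validated choice of regularization strength. The following is a minimal sketch of that general idea, not the authors' published algorithm: the Gaussian phase-spread kernel, the Tikhonov penalty, and all parameter values are illustrative assumptions.

```python
import numpy as np

def convolution_matrix(n, sigma):
    """Circulant matrix that blurs a single-cell profile into a population profile
    using an assumed Gaussian spread of cell-cycle phases (illustrative only)."""
    phases = np.arange(n)
    K = np.empty((n, n))
    for i in range(n):
        d = np.minimum(np.abs(phases - i), n - np.abs(phases - i))  # circular distance
        K[i] = np.exp(-0.5 * (d / sigma) ** 2)
        K[i] /= K[i].sum()
    return K

def deconvolve(y, K, lam):
    """Tikhonov-regularized least squares: argmin ||K x - y||^2 + lam ||x||^2."""
    n = K.shape[1]
    return np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ y)

def cross_validate(y, K, lambdas, n_folds=5, seed=0):
    """Choose lambda by holding out time points and scoring reconstruction error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    errors = []
    for lam in lambdas:
        err = 0.0
        for held_out in folds:
            keep = np.setdiff1d(idx, held_out)
            x = deconvolve(y[keep], K[keep], lam)          # fit on the kept rows
            err += np.sum((K[held_out] @ x - y[held_out]) ** 2)
        errors.append(err)
    return lambdas[int(np.argmin(errors))]

# Synthetic example: a sharp single-cell pulse blurred by loss of synchrony.
t = np.linspace(0, 1, 60, endpoint=False)
true_profile = np.exp(-0.5 * ((t - 0.5) / 0.05) ** 2)
K = convolution_matrix(len(t), sigma=6.0)
observed = K @ true_profile + 0.01 * np.random.default_rng(1).standard_normal(len(t))
lam = cross_validate(observed, K, lambdas=np.logspace(-4, 1, 20))
recovered = deconvolve(observed, K, lam)
```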
Overexpression of SPARC gene in human gastric carcinoma and its clinicopathologic significance
Temporal Controls of the Asymmetric Cell Division Cycle in Caulobacter crescentus
The asymmetric cell division cycle of Caulobacter crescentus is orchestrated by an elaborate gene-protein regulatory network, centered on three major control proteins, DnaA, GcrA and CtrA. The regulatory network is cast into a quantitative computational model to investigate in a systematic fashion how these three proteins control the relevant genetic, biochemical and physiological properties of proliferating bacteria. Different controls for both swarmer and stalked cell cycles are represented in the mathematical scheme. The model is validated against observed phenotypes of wild-type cells and relevant mutants, and it predicts the phenotypes of novel mutants and of known mutants under novel experimental conditions. Because the cell cycle control proteins of Caulobacter are conserved across many species of alpha-proteobacteria, the model we are proposing here may be applicable to other genera of importance to agriculture and medicine (e.g., Rhizobium, Brucella).
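The abstract above casts a three-protein control circuit as a quantitative model. Below is a minimal sketch of how such a circuit can be written as ODEs and integrated numerically; the wiring (DnaA activates GcrA, GcrA activates CtrA, CtrA represses DnaA) and all rate constants are illustrative assumptions, not the published model's equations or parameters.

```python
import numpy as np
from scipy.integrate import solve_ivp

def cascade(t, y, k_syn=1.0, k_deg=0.5, K=0.3, n=4):
    """Toy ODEs for a DnaA -> GcrA -> CtrA -| DnaA loop with Hill-type regulation."""
    dnaA, gcrA, ctrA = y
    d_dnaA = k_syn * K**n / (K**n + ctrA**n) - k_deg * dnaA   # repressed by CtrA
    d_gcrA = k_syn * dnaA**n / (K**n + dnaA**n) - k_deg * gcrA  # activated by DnaA
    d_ctrA = k_syn * gcrA**n / (K**n + gcrA**n) - k_deg * ctrA  # activated by GcrA
    return [d_dnaA, d_gcrA, d_ctrA]

sol = solve_ivp(cascade, t_span=(0, 100), y0=[1.0, 0.1, 0.1],
                t_eval=np.linspace(0, 100, 500))
# sol.y holds the simulated DnaA, GcrA and CtrA trajectories over time, which can then
# be compared against wild-type and mutant phenotypes as the abstract describes.
```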
Humanity's Last Exam
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 3,000 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai
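The abstract above reports two headline metrics, accuracy and calibration. The following is a minimal sketch of how such metrics are commonly computed (accuracy plus a binned expected calibration error); the binning scheme and the toy records are illustrative assumptions and do not reproduce HLE's official grading pipeline.

```python
import numpy as np

def accuracy(correct):
    """Fraction of questions graded correct."""
    return float(np.mean(correct))

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average gap between stated confidence and observed accuracy across bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its share of questions
    return float(ece)

# Toy example: a model that answers with high confidence but is often wrong
# scores poorly on both metrics.
conf = [0.95, 0.90, 0.85, 0.99, 0.60]
hits = [0, 1, 0, 0, 1]
print(accuracy(hits), expected_calibration_error(conf, hits))
```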
- …
