186 research outputs found

    BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge

    Pre-trained language models such as ChatGPT have significantly improved code generation. As these models scale up, their output is increasingly expected to handle more intricate tasks. Moreover, generating functional programs in bioinformatics poses additional notable challenges due to the amount of domain knowledge required, the need for complicated data operations, and the intricate functional dependencies between those operations. Here, we present BioCoder, a benchmark developed to evaluate existing pre-trained models on generating bioinformatics code. For function-level code generation, BioCoder covers potential package dependencies, class declarations, and global variables. It incorporates 1026 functions and 1243 methods in Python and Java from GitHub, along with 253 examples from the Rosalind Project. BioCoder includes a fuzz-testing framework for evaluation, which we have applied to many models, including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, and ChatGPT. Our detailed analysis of these models emphasizes the importance of domain knowledge, pragmatic code generation, and contextual understanding. Our dataset, benchmark, Docker images, and the scripts required for testing are all available at https://github.com/gersteinlab/biocoder.
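The fuzz-testing evaluation described above can be pictured as running a model-generated function against many randomized inputs and comparing it with a reference implementation. The sketch below is purely illustrative: the function names and the GC-content task are assumptions, not drawn from the benchmark itself.

```python
import random

def reference_gc_content(seq: str) -> float:
    """Ground-truth implementation: fraction of G/C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0

def generated_gc_content(seq: str) -> float:
    """Stand-in for model-generated code under test (hypothetical)."""
    gc = sum(1 for base in seq if base in ("G", "C"))
    return gc / len(seq) if seq else 0.0

def fuzz_test(candidate, reference, trials=1000, seed=0):
    """Compare candidate against reference on random DNA strings."""
    rng = random.Random(seed)
    for _ in range(trials):
        seq = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 50)))
        if abs(candidate(seq) - reference(seq)) > 1e-9:
            return False  # behavioural mismatch found
    return True

print(fuzz_test(generated_gc_content, reference_gc_content))  # True
```

In the actual benchmark the candidate would be code emitted by a model, executed inside the provided Docker images rather than in-process.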

    LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition

    Low-rank adaptation (LoRA) is often employed to fine-tune large language models (LLMs) for new tasks. This paper investigates LoRA composability for cross-task generalization and introduces LoraHub, a strategic framework for assembling LoRA modules trained on diverse tasks, with the objective of achieving adaptable performance on unseen tasks. With just a few examples from a novel task, LoraHub fluidly combines multiple LoRA modules without requiring human expertise. Notably, the composition requires neither additional model parameters nor gradients. Our empirical results on the Big-Bench Hard (BBH) benchmark suggest that LoraHub can effectively match the performance of in-context learning in few-shot scenarios, while removing the need to supply in-context examples with each inference input. A significant contribution of our research is fostering a community for LoRA, where users can share their trained LoRA modules, thereby facilitating their application to new tasks. We anticipate this resource will widen access to, and spur advances in, general intelligence as well as LLMs in production. Code will be available at https://github.com/sail-sg/lorahub.
    Comment: Work in progress. The first three authors contributed equally to this work.
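The claim that composition needs neither extra parameters nor gradients can be illustrated by how LoRA updates merge: each module contributes a low-rank update B_i @ A_i, and a composed module is just a weighted sum of these. The sketch below is not the LoraHub implementation; the shapes, names, and two-module setup are assumptions, and the weights (which LoraHub tunes with a gradient-free optimizer against few-shot loss) are hard-coded here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 8, 2  # base weight matrix is d x k; LoRA rank is r

# Two pre-trained LoRA modules for different tasks: A_i is r x k, B_i is d x r.
modules = [(rng.normal(size=(r, k)), rng.normal(size=(d, r))) for _ in range(2)]

def compose(weights):
    """Merged low-rank update: sum_i w_i * (B_i @ A_i). No new parameters."""
    return sum(w * (B @ A) for w, (A, B) in zip(weights, modules))

# In LoraHub the weights would be found by black-box search (no gradients)
# using a few examples from the novel task; here they are fixed for display.
delta = compose([0.7, 0.3])
print(delta.shape)  # (8, 8)
```

The composed delta has the same shape as a single module's update, so it can be added to the frozen base weights exactly as one LoRA module would be.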

    Re-using Chip Level DFT at Board Level

    As chips grow increasingly complex, it is no surprise to find more and more built-in DFT. This built-in DFT is obviously beneficial for chip/silicon DFX engineers; however, board/system-level DFX engineers often have limited access to the built-in DFT features. There is currently an increasing demand from board/system-level DFX engineers to reuse chip/silicon DFT at the board/system level. This special session will discuss: What chip access is needed at board level for test and diagnosis? How can that access be accomplished? Will IEEE P1687 and IEEE 1149.1 solve these problems?

    Coastal and Inland Aquatic Data Products for the Hyperspectral Infrared Imager (HyspIRI)

    The HyspIRI Aquatic Studies Group (HASG) has developed a conceptual list of data products for the HyspIRI mission to support aquatic remote sensing of coastal and inland waters. These data products are based on mission capabilities, characteristics, and expected performance. The topic of coastal and inland water remote sensing is very broad, so this report focuses on aquatic data products to keep its scope manageable. The HyspIRI mission requirements already include the global production of surface reflectance and temperature. Atmospheric correction and surface temperature algorithms, which are critical to aquatic remote sensing, are covered in other mission documents; hence, those algorithms and their products were not evaluated in this report. In addition, terrestrial products (e.g., land use/land cover, dune vegetation, and beach replenishment) were not considered. It is recognized that coastal studies are inherently interdisciplinary across aquatic and terrestrial disciplines; however, products supporting the latter are expected to be evaluated by other components of the mission. The coastal and inland water data products identified by the HASG cover six major environmental and ecological areas for scientific research and applications: wetlands, shoreline processes, the water surface, the water column, bathymetry, and benthic cover types. Accordingly, each candidate product was evaluated for feasibility based on the HyspIRI mission characteristics and on whether it was unique and relevant to the HyspIRI science objectives.

    The Long-Baseline Neutrino Experiment: Exploring Fundamental Symmetries of the Universe

    The preponderance of matter over antimatter in the early Universe, the dynamics of the supernova bursts that produced the heavy elements necessary for life, and whether protons eventually decay --- these mysteries at the forefront of particle physics and astrophysics are key to understanding the early evolution of our Universe, its current state, and its eventual fate. The Long-Baseline Neutrino Experiment (LBNE) represents an extensively developed plan for a world-class experiment dedicated to addressing these questions. LBNE is conceived around three central components: (1) a new, high-intensity neutrino source generated from a megawatt-class proton accelerator at Fermi National Accelerator Laboratory, (2) a near neutrino detector just downstream of the source, and (3) a massive liquid argon time-projection chamber deployed as a far detector deep underground at the Sanford Underground Research Facility. This facility, located at the site of the former Homestake Mine in Lead, South Dakota, is approximately 1,300 km from the neutrino source at Fermilab -- a distance (baseline) that delivers optimal sensitivity to neutrino charge-parity symmetry violation and mass ordering effects. This ambitious yet cost-effective design incorporates scalability and flexibility and can accommodate a variety of upgrades and contributions. With its exceptional combination of experimental configuration, technical capabilities, and potential for transformative discoveries, LBNE promises to be a vital facility for the field of particle physics worldwide, providing physicists from around the globe with opportunities to collaborate in a twenty- to thirty-year program of exciting science. In this document we provide a comprehensive overview of LBNE's scientific objectives, its place in the landscape of neutrino physics worldwide, the technologies it will incorporate, and the capabilities it will possess.
    Comment: Major update of previous version. This is the reference document for the LBNE science program and current status. Chapters 1, 3, and 9 provide a comprehensive overview of LBNE's scientific objectives, its place in the landscape of neutrino physics worldwide, the technologies it will incorporate, and the capabilities it will possess. 288 pages, 116 figures.

    Benchmarking spike-based visual recognition: a dataset and evaluation

    Today, increasing attention is being paid to research into spike-based neural computation, both to gain a better understanding of the brain and to explore biologically-inspired computation. Within this field, the primate visual pathway and its hierarchical organisation have been extensively studied. Spiking Neural Networks (SNNs), inspired by the understanding of observed biological structure and function, have been successfully applied to visual recognition and classification tasks. In addition, implementations on neuromorphic hardware have enabled large-scale networks to run in (or even faster than) real time, making spike-based neural vision processing accessible on mobile robots. Neuromorphic sensors such as silicon retinas are able to feed such mobile systems with real-time visual stimuli. A new set of vision benchmarks for spike-based neural processing is now needed to measure progress quantitatively within this rapidly advancing field. We propose that a large dataset of spike-based visual stimuli is needed to provide meaningful comparisons between different systems, and that a corresponding evaluation methodology is also required to measure the performance of SNN models and their hardware implementations. In this paper we first propose an initial NE (Neuromorphic Engineering) dataset based on standard computer vision benchmarks, using digits from the MNIST database. This dataset is compatible with the current state of research on spike-based image recognition. The corresponding spike trains are produced using a range of techniques: rate-based Poisson spike generation, rank order encoding, and recorded output from a silicon retina with both flashing and oscillating input stimuli. In addition, a complementary evaluation methodology is presented to assess both model-level and hardware-level performance.
    Finally, we demonstrate the use of the dataset and the evaluation methodology with two SNN models, validating the performance of the models and their hardware implementations. With this dataset we hope to (1) promote meaningful comparison between algorithms in the field of neural computation, (2) allow comparison with conventional image recognition methods, (3) provide an assessment of the state of the art in spike-based visual recognition, and (4) help researchers identify future directions and advance the field.
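Of the encoding techniques listed above, rate-based Poisson spike generation is the simplest to sketch: each pixel's intensity sets the firing rate of an independent Poisson process, approximated here as a Bernoulli draw per 1 ms bin. The parameter values and function name below are illustrative assumptions, not taken from the dataset's actual encoding pipeline.

```python
import random

def poisson_spike_train(intensity, duration_ms=100, max_rate_hz=100.0, seed=0):
    """Spike times (ms) for one pixel; rate scales with intensity in [0, 1]."""
    rng = random.Random(seed)
    rate = intensity * max_rate_hz   # spikes per second for this pixel
    p_spike = rate / 1000.0          # probability of a spike in each 1 ms bin
    return [t for t in range(duration_ms) if rng.random() < p_spike]

# With the same random seed, a bright pixel spikes at least as often as a
# dim one, since every draw that fires the dim pixel also fires the bright one.
bright = poisson_spike_train(1.0, seed=42)
dim = poisson_spike_train(0.1, seed=42)
print(len(bright), len(dim))
```

An MNIST image would yield one such train per pixel, giving a spatio-temporal spike pattern that an SNN can consume directly.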

    Advanced Technology Large-Aperture Space Telescope (ATLAST): A Technology Roadmap for the Next Decade

    The Advanced Technology Large-Aperture Space Telescope (ATLAST) is a set of mission concepts for the next generation of UVOIR space observatory, with a primary aperture diameter in the 8-m to 16-m range, that will allow us to perform some of the most challenging observations to answer some of our most compelling questions, including "Is there life elsewhere in the Galaxy?" We have identified two different telescope architectures, with similar optical designs, that span the range of viable technologies. The architectures are a telescope with a monolithic primary mirror and two variations of a telescope with a large segmented primary mirror. This approach provides us with several pathways to realizing the mission, which will be narrowed to one as our technology development progresses. The concepts draw heritage from the HST and JWST designs, but also take significant departures from those designs to minimize complexity, mass, or both. Our report provides details on the mission concepts, shows the extraordinary scientific progress they would enable, and describes the most important technology development items. These are the mirrors, the detectors, and the high-contrast imaging technologies, whether internal to the observatory or using an external occulter. Experience with JWST has shown that determined competitors, motivated by the development contracts and flight opportunities of the new observatory, are capable of achieving huge advances in technical and operational performance while keeping construction costs on the same scale as prior great observatories.
    Comment: 22 pages, RFI submitted to the Astro2010 Decadal Committee.