BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
Pre-trained language models like ChatGPT have significantly improved code
generation. As these models scale up, there is an increasing need for the
output to handle more intricate tasks. Moreover, in bioinformatics, generating
functional programs poses additional notable challenges due to the amount of
domain knowledge, the need for complicated data operations, and intricate
functional dependencies between the operations. Here, we present BioCoder, a
benchmark developed to evaluate existing pre-trained models in generating
bioinformatics code. For function-level code generation, BioCoder covers
potential package dependencies, class declarations, and global variables. It
incorporates 1026 functions and 1243 methods in Python and Java from GitHub and
253 examples from the Rosalind Project. BioCoder incorporates a fuzz-testing
framework for evaluation, and we have applied it to evaluate many models
including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+,
InstructCodeT5+, and ChatGPT. Our detailed analysis of these models emphasizes
the importance of domain knowledge, pragmatic code generation, and contextual
understanding. Our dataset, benchmark, Docker images, and scripts required for
testing are all available at https://github.com/gersteinlab/biocoder
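The abstract above describes evaluating generated code with a fuzz-testing framework. The following is a minimal sketch of that idea, not the actual BioCoder harness: a model-generated candidate function is compared against a reference implementation on many random inputs. All function names and the GC-content task are illustrative assumptions.

```python
import random

def reference_gc_content(seq: str) -> float:
    """Ground-truth GC content of a DNA sequence (illustrative reference)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def generated_gc_content(seq: str) -> float:
    """Stand-in for a model-generated candidate implementation."""
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)

def fuzz_test(candidate, reference, trials=1000, seed=0):
    """Compare candidate against reference on random DNA inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Random DNA string of random length, including the empty string.
        seq = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 50)))
        if abs(candidate(seq) - reference(seq)) > 1e-9:
            return False  # behavioural mismatch found
    return True

print(fuzz_test(generated_gc_content, reference_gc_content))
```

In a real harness the candidate would be model output executed in a sandbox (BioCoder ships Docker images for this purpose); here both functions live in the same file purely for illustration.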
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
Low-rank adaptations (LoRA) are often employed to fine-tune large language
models (LLMs) for new tasks. This paper investigates LoRA composability for
cross-task generalization and introduces LoraHub, a strategic framework devised
for the purposive assembly of LoRA modules trained on diverse given tasks, with
the objective of achieving adaptable performance on unseen tasks. With just a
few examples from a novel task, LoraHub enables the fluid combination of
multiple LoRA modules, eradicating the need for human expertise. Notably, the
composition requires neither additional model parameters nor gradients. Our
empirical results, derived from the Big-Bench Hard (BBH) benchmark, suggest
that LoraHub can effectively mimic the performance of in-context learning in
few-shot scenarios, excluding the necessity of in-context examples alongside
each inference input. A significant contribution of our research is the
fostering of a community for LoRA, where users can share their trained LoRA
modules, thereby facilitating their application to new tasks. We anticipate
this resource will widen access to and spur advancements in general
intelligence as well as LLMs in production. Code will be available at
https://github.com/sail-sg/lorahub.
Comment: Work in progress. The first three authors contributed equally to this work.
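The abstract above describes composing LoRA modules without extra parameters or gradients. A minimal sketch of one way such a composition can work, assuming the common weighted-merge formulation: the low-rank factors of several task-specific modules are combined with scalar coefficients (which a gradient-free search could tune on few-shot examples). This is an illustration, not the official LoraHub code, and all names are assumptions.

```python
import numpy as np

def compose_lora(modules, weights):
    """Merge LoRA modules by weighted-summing their (A, B) factors.

    modules: list of (A, B) pairs, A of shape (r, d_in), B of shape (d_out, r)
    weights: per-module scalar coefficients, e.g. found by a
             gradient-free search on a few examples from the new task
    """
    A = sum(w * a for w, (a, _) in zip(weights, modules))
    B = sum(w * b for w, (_, b) in zip(weights, modules))
    return A, B

def apply_delta(W, A, B, alpha=1.0):
    """Apply the composed low-rank update: W + alpha * (B @ A)."""
    return W + alpha * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2
# Three stand-ins for LoRA modules trained on three different tasks.
modules = [(rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r)))
           for _ in range(3)]
A, B = compose_lora(modules, weights=[0.5, 0.3, 0.2])
W_new = apply_delta(np.eye(d_out), A, B)
print(W_new.shape)
```

Because the merge is a weighted sum of existing factors, the composed adapter adds no new trainable parameters, consistent with the gradient-free property the abstract highlights.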
Re-using Chip Level DFT at Board Level
As chips grow increasingly complex, it is no surprise to find more and more built-in DFT. This built-in DFT is clearly beneficial for chip/silicon DFT engineers; however, board/system-level DFT engineers often have limited access to the built-in DFT features. There is currently an increasing demand from board/system-level DFT engineers to reuse chip/silicon DFT at board/system level. This special session will discuss: What chip access is needed at board level for test and diagnosis? How can that access be accomplished? Will IEEE P1687 and IEEE 1149.1 solve these problems?
Coastal and Inland Aquatic Data Products for the Hyperspectral Infrared Imager (HyspIRI)
The HyspIRI Aquatic Studies Group (HASG) has developed a conceptual list of data products for the HyspIRI mission to support aquatic remote sensing of coastal and inland waters. These data products are based on mission capabilities, characteristics, and expected performance. The topic of coastal and inland water remote sensing is very broad, so this report focuses on aquatic data products to keep its scope manageable. The HyspIRI mission requirements already include the global production of surface reflectance and temperature. Atmospheric correction and surface temperature algorithms, which are critical to aquatic remote sensing, are covered in other mission documents; hence, these algorithms and their products were not evaluated in this report. In addition, terrestrial products (e.g., land use/land cover, dune vegetation, and beach replenishment) were not considered. It is recognized that coastal studies are inherently interdisciplinary across aquatic and terrestrial disciplines; however, terrestrial products are expected to be evaluated by other components of the mission. The coastal and inland water data products identified by the HASG cover six major environmental and ecological areas for scientific research and applications: wetlands, shoreline processes, the water surface, the water column, bathymetry, and benthic cover types. Accordingly, each candidate product was evaluated for feasibility based on the HyspIRI mission characteristics and on whether it was unique and relevant to the HyspIRI science objectives.
The Long-Baseline Neutrino Experiment: Exploring Fundamental Symmetries of the Universe
The preponderance of matter over antimatter in the early Universe, the
dynamics of the supernova bursts that produced the heavy elements necessary for
life and whether protons eventually decay --- these mysteries at the forefront
of particle physics and astrophysics are key to understanding the early
evolution of our Universe, its current state and its eventual fate. The
Long-Baseline Neutrino Experiment (LBNE) represents an extensively developed
plan for a world-class experiment dedicated to addressing these questions. LBNE
is conceived around three central components: (1) a new, high-intensity
neutrino source generated from a megawatt-class proton accelerator at Fermi
National Accelerator Laboratory, (2) a near neutrino detector just downstream
of the source, and (3) a massive liquid argon time-projection chamber deployed
as a far detector deep underground at the Sanford Underground Research
Facility. This facility, located at the site of the former Homestake Mine in
Lead, South Dakota, is approximately 1,300 km from the neutrino source at
Fermilab -- a distance (baseline) that delivers optimal sensitivity to neutrino
charge-parity symmetry violation and mass ordering effects. This ambitious yet
cost-effective design incorporates scalability and flexibility and can
accommodate a variety of upgrades and contributions. With its exceptional
combination of experimental configuration, technical capabilities, and
potential for transformative discoveries, LBNE promises to be a vital facility
for the field of particle physics worldwide, providing physicists from around
the globe with opportunities to collaborate in a twenty to thirty year program
of exciting science. In this document we provide a comprehensive overview of
LBNE's scientific objectives, its place in the landscape of neutrino physics
worldwide, the technologies it will incorporate and the capabilities it will
possess.
Comment: Major update of previous version. This is the reference document for the LBNE science program and current status. Chapters 1, 3, and 9 provide a comprehensive overview of LBNE's scientific objectives, its place in the landscape of neutrino physics worldwide, the technologies it will incorporate, and the capabilities it will possess. 288 pages, 116 figures.
Benchmarking spike-based visual recognition: a dataset and evaluation
Today, increasing attention is being paid to research into spike-based neural computation, both to gain a better understanding of the brain and to explore biologically inspired computation. Within this field, the primate visual pathway and its hierarchical organisation have been extensively studied. Spiking Neural Networks (SNNs), inspired by the understanding of observed biological structure and function, have been successfully applied to visual recognition and classification tasks. In addition, implementations on neuromorphic hardware have enabled large-scale networks to run in (or even faster than) real time, making spike-based neural vision processing accessible on mobile robots. Neuromorphic sensors such as silicon retinas are able to feed such mobile systems with real-time visual stimuli. A new set of vision benchmarks for spike-based neural processing is now needed to measure progress quantitatively within this rapidly advancing field. We propose that a large dataset of spike-based visual stimuli is needed to provide meaningful comparisons between different systems, and that a corresponding evaluation methodology is also required to measure the performance of SNN models and their hardware implementations. In this paper we first propose an initial NE (Neuromorphic Engineering) dataset based on standard computer vision benchmarks, using digits from the MNIST database. This dataset is compatible with the current state of research on spike-based image recognition. The corresponding spike trains are produced using a range of techniques: rate-based Poisson spike generation, rank order encoding, and recorded output from a silicon retina with both flashing and oscillating input stimuli. In addition, a complementary evaluation methodology is presented to assess both model-level and hardware-level performance.
Finally, we demonstrate the use of the dataset and the evaluation methodology using two SNN models to validate the performance of the models and their hardware implementations. With this dataset we hope to (1) promote meaningful comparison between algorithms in the field of neural computation, (2) allow comparison with conventional image recognition methods, (3) provide an assessment of the state of the art in spike-based visual recognition, and (4) help researchers identify future directions and advance the field.
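Of the encoding techniques the abstract lists, rate-based Poisson spike generation is the simplest to illustrate: each pixel's intensity sets the firing rate of an independent Poisson spike train. A minimal sketch, with parameter names and values chosen purely for illustration:

```python
import random

def poisson_spike_train(intensity, duration_ms=100, max_rate_hz=100, seed=0):
    """Return spike times (ms) for a pixel intensity in [0, 1].

    Each 1 ms time bin independently emits a spike with probability
    proportional to the pixel's intensity (a discrete-time Poisson
    process approximation).
    """
    rng = random.Random(seed)
    rate = intensity * max_rate_hz   # firing rate in Hz
    p_spike = rate / 1000.0          # spike probability per 1 ms bin
    return [t for t in range(duration_ms) if rng.random() < p_spike]

# Brighter pixels produce denser spike trains; with a shared seed the
# dim pixel's spikes are a subset of the bright pixel's.
dark = poisson_spike_train(0.1, seed=1)
bright = poisson_spike_train(0.9, seed=1)
print(len(dark) <= len(bright))
```

Applied to a 28x28 MNIST image, this yields 784 independent spike trains whose aggregate rate encodes the digit's shape, which is the rate-coding scheme such benchmarks typically assume.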
Advanced Technology Large-Aperture Space Telescope (ATLAST): A Technology Roadmap for the Next Decade
The Advanced Technology Large-Aperture Space Telescope (ATLAST) is a set of
mission concepts for the next generation of UVOIR space observatory with a
primary aperture diameter in the 8-m to 16-m range that will allow us to
perform some of the most challenging observations to answer some of our most
compelling questions, including "Is there life elsewhere in the Galaxy?" We
have identified two different telescope architectures, but with similar optical
designs, that span the range in viable technologies. The architectures are a
telescope with a monolithic primary mirror and two variations of a telescope
with a large segmented primary mirror. This approach provides us with several
pathways to realizing the mission, which will be narrowed to one as our
technology development progresses. The concepts invoke heritage from HST and
JWST design, but also take significant departures from these designs to
minimize complexity, mass, or both.
Our report provides details on the mission concepts, shows the extraordinary
scientific progress they would enable, and describes the most important
technology development items. These are the mirrors, the detectors, and the
high-contrast imaging technologies, whether internal to the observatory, or
using an external occulter. Experience with JWST has shown that determined
competitors, motivated by the development contracts and flight opportunities of
the new observatory, are capable of achieving huge advances in technical and
operational performance while keeping construction costs on the same scale as
prior great observatories.
Comment: 22 pages, RFI submitted to the Astro2010 Decadal Committee.