43 research outputs found
Digits micro-model for accurate and secure transactions
Automatic Speech Recognition (ASR) systems are used in the financial domain
to enhance the caller experience by enabling natural language understanding and
facilitating efficient and intuitive interactions. Increasing use of ASR
systems requires that such systems exhibit very low error rates. The
predominant ASR models to collect numeric data are large, general-purpose
commercial models -- Google Speech-to-text (STT), or Amazon Transcribe -- or
open source (OpenAI's Whisper). Such ASR models are trained on hundreds of
thousands of hours of audio data and require considerable resources to run.
Despite recent progress in large speech recognition models, we highlight the
potential of smaller, specialized "micro" models. Such light models can be
trained to perform well on number-recognition tasks, competing with
general models like Whisper or Google STT while using less than 80 minutes of
training time and occupying at least an order of magnitude less memory. Also,
unlike larger speech recognition models, micro-models are trained on carefully
selected and curated datasets, which makes them highly accurate, agile, and
easy to retrain, while using low compute resources. We present our work on
creating micro models for multi-digit number recognition that handle diverse
speaking styles reflecting real-world pronunciation patterns. Our work
contributes to domain-specific ASR models, improving digit recognition
accuracy, and privacy of data. As an added advantage, their low resource
consumption allows them to be hosted on-premise, keeping private data local
instead of uploading it to an external cloud. Our results indicate that our
micro-model makes fewer errors than the best-of-breed commercial or open-source
ASRs in recognizing digits (1.8% error rate of our best micro-model versus 5.8%
error rate of Whisper), and has a low memory footprint (0.66 GB VRAM for our
model versus 11 GB VRAM for Whisper).
Comment: 7 pages, 1 figure, 5 tables
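The headline figures in this abstract can be put on a common footing with a little arithmetic. The sketch below is not from the paper; it only recomputes ratios from the numbers quoted above, and the constant and function names are illustrative assumptions.

```python
# Figures copied from the abstract; all names here are illustrative,
# not taken from the paper or its code.
MICRO_ERROR_PCT = 1.8     # error rate of the best micro-model (%)
WHISPER_ERROR_PCT = 5.8   # error rate of Whisper (%)
MICRO_VRAM_GB = 0.66      # VRAM used by the micro-model
WHISPER_VRAM_GB = 11.0    # VRAM used by Whisper


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


error_reduction = relative_reduction(WHISPER_ERROR_PCT, MICRO_ERROR_PCT)
vram_ratio = WHISPER_VRAM_GB / MICRO_VRAM_GB

print(f"Relative error reduction: {error_reduction:.0%}")
print(f"VRAM ratio (Whisper / micro-model): {vram_ratio:.1f}x")
```

Under these numbers the micro-model makes roughly 69% fewer errors than Whisper and uses about one seventeenth of the VRAM, which is consistent with the "order of magnitude" memory claim.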
The state of peer-to-peer network simulators
Networking research often relies on simulation in order to test and evaluate new ideas. An important requirement of this process is that results must be reproducible so that other researchers can replicate, validate and extend existing work. We look at the landscape of simulators for research in peer-to-peer (P2P) networks by conducting a survey of a combined total of over 280 papers from before and after 2007 (the year of the last survey in this area), and comment on the large quantity of research using bespoke, closed-source simulators. We propose a set of criteria that P2P simulators should meet, and poll the P2P research community for their agreement. We aim to drive the community towards performing their experiments on simulators that allow for others to validate their results.
Sloan Digital Sky Survey Imaging of Low Galactic Latitude Fields: Technical Summary and Data Release
The Sloan Digital Sky Survey (SDSS) mosaic camera and telescope have obtained
five-band optical-wavelength imaging near the Galactic plane outside of the
nominal survey boundaries. These additional data were obtained during
commissioning and subsequent testing of the SDSS observing system, and they
provide unique wide-area imaging data in regions of high obscuration and star
formation, including numerous young stellar objects, Herbig-Haro objects and
young star clusters. Because these data are outside the Survey regions in the
Galactic caps, they are not part of the standard SDSS data releases. This paper
presents imaging data for 832 square degrees of sky (including repeats), in the
star-forming regions of Orion, Taurus, and Cygnus. About 470 square degrees are
now released to the public, with the remainder to follow at the time of SDSS
Data Release 4. The public data in Orion include the star-forming region NGC
2068/NGC 2071/HH24 and a large part of Barnard's loop.
Comment: 31 pages, 9 figures (3 missing to save space), accepted by AJ, in
press, see http://photo.astro.princeton.edu/oriondatarelease for data and
paper with all figures
The Sloan Digital Sky Survey: Technical Summary
The Sloan Digital Sky Survey (SDSS) will provide the data to support detailed
investigations of the distribution of luminous and non-luminous matter in the
Universe: a photometrically and astrometrically calibrated digital imaging
survey of pi steradians above about Galactic latitude 30 degrees in five broad
optical bands to a depth of g' about 23 magnitudes, and a spectroscopic survey
of the approximately one million brightest galaxies and 10^5 brightest quasars
found in the photometric object catalog produced by the imaging survey. This
paper summarizes the observational parameters and data products of the SDSS,
and serves as an introduction to extensive technical on-line documentation.
Comment: 9 pages, 7 figures, AAS Latex. To appear in AJ, Sept 200
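For scale, the imaging footprint of pi steradians can be converted to square degrees, the unit usually quoted for survey areas. This is a standard unit conversion rather than a figure stated in the abstract, and the function name below is an illustrative assumption.

```python
import math


def steradians_to_square_degrees(sr: float) -> float:
    """Convert a solid angle from steradians to square degrees."""
    # One radian is 180/pi degrees, so one steradian is (180/pi)^2 deg^2.
    return sr * (180.0 / math.pi) ** 2


survey_area = steradians_to_square_degrees(math.pi)   # planned imaging area
full_sky = steradians_to_square_degrees(4 * math.pi)  # whole celestial sphere

print(f"pi sr = {survey_area:.0f} deg^2")
print(f"fraction of full sky = {survey_area / full_sky:.2f}")
```

The conversion gives about 10,313 square degrees, or one quarter of the full sky.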
The Second Data Release of the Sloan Digital Sky Survey
The Sloan Digital Sky Survey (SDSS) has validated and made publicly available its Second Data Release. This data release consists of 3324 deg^2 of five-band (ugriz) imaging data with photometry for over 88 million unique objects, 367,360 spectra of galaxies, quasars, stars, and calibrating blank sky patches selected over 2627 deg^2 of this area, and tables of measured parameters from these data. The imaging data reach a depth of r ≈ 22.2 (95% completeness limit for point sources) and are photometrically and astrometrically calibrated to 2% rms and 100 mas rms per coordinate, respectively. The imaging data have all been processed through a new version of the SDSS imaging pipeline, in which the most important improvement since the last data release is fixing an error in the model fits to each object. The result is that model magnitudes are now a good proxy for point-spread function magnitudes for point sources, and Petrosian magnitudes for extended sources. The spectroscopy extends from 3800 to 9200 Å at a resolution of 2000. The spectroscopic software now repairs a systematic error in the radial velocities of certain types of stars and has substantially improved spectrophotometry. All data included in the SDSS Early Data Release and First Data Release are reprocessed with the improved pipelines and included in the Second Data Release. Further characteristics of the data are described, as are the data products themselves and the tools for accessing them.
The Third Data Release of the Sloan Digital Sky Survey
This paper describes the Third Data Release of the Sloan Digital Sky Survey
(SDSS). This release, containing data taken up through June 2003, includes
imaging data in five bands over 5282 deg^2, photometric and astrometric
catalogs of the 141 million objects detected in these imaging data, and spectra
of 528,640 objects selected over 4188 deg^2. The pipelines analyzing both
images and spectroscopy are unchanged from those used in our Second Data
Release.
Comment: 14 pages, including 2 postscript figures. Submitted to AJ. Data
available at http://www.sdss.org/dr
Sloan Digital Sky Survey Imaging of Low Galactic Latitude Fields: Technical Summary and Data Release
The Sloan Digital Sky Survey (SDSS) mosaic camera and telescope have obtained five-band optical-wavelength imaging near the Galactic plane outside of the nominal survey boundaries. These additional data were obtained during commissioning and subsequent testing of the SDSS observing system, and they provide unique wide-area imaging data in regions of high obscuration and star formation, including numerous young stellar objects, Herbig-Haro objects, and young star clusters. Because these data are outside the survey regions in the Galactic caps, they are not part of the standard SDSS data releases. This paper presents imaging data for 832 square degrees of sky (including repeats), in the star-forming regions of Orion, Taurus, and Cygnus. About 470 deg^2 are now released to the public, with the remainder to follow at the time of SDSS Data Release 4. The public data in Orion include the star-forming region NGC 2068/NGC 2071/HH 24 and a large part of Barnard's loop.
A Case Study of a Corporate Open Source Development Model
Open source practices and tools have proven to be highly effective for overcoming the many problems of geographically distributed software development. We know relatively little, however, about the range of settings in which they work. In particular, can corporations use the open source development model effectively for software projects inside the corporate domain? Or are these tools and practices incompatible with development environments, management practices, and market-driven schedule and feature decisions typical of a commercial software house? We present a case study of open source software development methodology adopted by a significant commercial software project in the telecommunications domain. We extract a number of lessons learned from the experience, and identify open research questions.