43 research outputs found

    Digits micro-model for accurate and secure transactions

    Automatic Speech Recognition (ASR) systems are used in the financial domain to enhance the caller experience by enabling natural language understanding and facilitating efficient and intuitive interactions. Increasing use of ASR systems requires that such systems exhibit very low error rates. The predominant ASR models used to collect numeric data are large, general-purpose models, either commercial, such as Google Speech-to-Text (STT) or Amazon Transcribe, or open source, such as OpenAI's Whisper. Such ASR models are trained on hundreds of thousands of hours of audio data and require considerable resources to run. Despite recent progress in large speech recognition models, we highlight the potential of smaller, specialized "micro" models. Such light models can be trained to perform well on tasks specific to number recognition, competing with general models like Whisper or Google STT while using less than 80 minutes of training time and occupying at least an order of magnitude less memory. Also, unlike larger speech recognition models, micro-models are trained on carefully selected and curated datasets, which makes them highly accurate, agile, and easy to retrain, while using low compute resources. We present our work on creating micro-models for multi-digit number recognition that handle diverse speaking styles reflecting real-world pronunciation patterns. Our work contributes to domain-specific ASR models, improving digit recognition accuracy and data privacy. As an added advantage, their low resource consumption allows them to be hosted on-premise, keeping private data local instead of uploading it to an external cloud. Our results indicate that our micro-model makes fewer errors than the best-of-breed commercial or open-source ASRs in recognizing digits (1.8% error rate for our best micro-model versus 5.8% for Whisper), and has a low memory footprint (0.66 GB VRAM for our model versus 11 GB VRAM for Whisper). Comment: 7 pages, 1 figure, 5 tables
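
The figures quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below is only an illustration and not part of the paper: the helper name `relative_reduction` is hypothetical, and it assumes the two error rates are measured on the same test set, which the abstract does not state explicitly.

```python
# Figures as quoted in the abstract (not independently verified).
micro_error, whisper_error = 0.018, 0.058      # digit error rates
micro_vram_gb, whisper_vram_gb = 0.66, 11.0    # VRAM footprints in GB


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline` (hypothetical helper)."""
    return (baseline - improved) / baseline


error_reduction = relative_reduction(whisper_error, micro_error)
memory_ratio = whisper_vram_gb / micro_vram_gb

print(f"relative error reduction: {error_reduction:.1%}")
print(f"memory ratio: {memory_ratio:.1f}x")
```

Under these assumptions the micro-model makes roughly 69% fewer errors than Whisper, and Whisper needs about 16.7 times as much VRAM, which is consistent with the claim of at least an order of magnitude less memory.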

    The state of peer-to-peer network simulators

    Networking research often relies on simulation to test and evaluate new ideas. An important requirement of this process is that results must be reproducible so that other researchers can replicate, validate and extend existing work. We look at the landscape of simulators for research in peer-to-peer (P2P) networks by conducting a survey of a combined total of over 280 papers from before and after 2007 (the year of the last survey in this area), and comment on the large quantity of research using bespoke, closed-source simulators. We propose a set of criteria that P2P simulators should meet, and poll the P2P research community for their agreement. We aim to drive the community towards performing their experiments on simulators that allow others to validate their results.

    Sloan Digital Sky Survey Imaging of Low Galactic Latitude Fields: Technical Summary and Data Release

    The Sloan Digital Sky Survey (SDSS) mosaic camera and telescope have obtained five-band optical-wavelength imaging near the Galactic plane outside of the nominal survey boundaries. These additional data were obtained during commissioning and subsequent testing of the SDSS observing system, and they provide unique wide-area imaging data in regions of high obscuration and star formation, including numerous young stellar objects, Herbig-Haro objects and young star clusters. Because these data are outside the Survey regions in the Galactic caps, they are not part of the standard SDSS data releases. This paper presents imaging data for 832 square degrees of sky (including repeats), in the star-forming regions of Orion, Taurus, and Cygnus. About 470 square degrees are now released to the public, with the remainder to follow at the time of SDSS Data Release 4. The public data in Orion include the star-forming region NGC 2068/NGC 2071/HH24 and a large part of Barnard's Loop. Comment: 31 pages, 9 figures (3 missing to save space), accepted by AJ, in press, see http://photo.astro.princeton.edu/oriondatarelease for data and paper with all figures

    The Sloan Digital Sky Survey: Technical Summary

    The Sloan Digital Sky Survey (SDSS) will provide the data to support detailed investigations of the distribution of luminous and non-luminous matter in the Universe: a photometrically and astrometrically calibrated digital imaging survey of pi steradians above about Galactic latitude 30 degrees in five broad optical bands to a depth of g' about 23 magnitudes, and a spectroscopic survey of the approximately one million brightest galaxies and 10^5 brightest quasars found in the photometric object catalog produced by the imaging survey. This paper summarizes the observational parameters and data products of the SDSS, and serves as an introduction to extensive technical on-line documentation. Comment: 9 pages, 7 figures, AAS Latex. To appear in AJ, Sept 200

    The Second Data Release of the Sloan Digital Sky Survey

    The Sloan Digital Sky Survey (SDSS) has validated and made publicly available its Second Data Release. This data release consists of 3324 deg^2 of five-band (ugriz) imaging data with photometry for over 88 million unique objects, 367,360 spectra of galaxies, quasars, stars, and calibrating blank sky patches selected over 2627 deg^2 of this area, and tables of measured parameters from these data. The imaging data reach a depth of r ≈ 22.2 (95% completeness limit for point sources) and are photometrically and astrometrically calibrated to 2% rms and 100 mas rms per coordinate, respectively. The imaging data have all been processed through a new version of the SDSS imaging pipeline, in which the most important improvement since the last data release is fixing an error in the model fits to each object. The result is that model magnitudes are now a good proxy for point-spread function magnitudes for point sources, and Petrosian magnitudes for extended sources. The spectroscopy extends from 3800 to 9200 Å at a resolution of 2000. The spectroscopic software now repairs a systematic error in the radial velocities of certain types of stars and has substantially improved spectrophotometry. All data included in the SDSS Early Data Release and First Data Release are reprocessed with the improved pipelines and included in the Second Data Release. Further characteristics of the data are described, as are the data products themselves and the tools for accessing them.

    The Third Data Release of the Sloan Digital Sky Survey

    This paper describes the Third Data Release of the Sloan Digital Sky Survey (SDSS). This release, containing data taken up through June 2003, includes imaging data in five bands over 5282 deg^2, photometric and astrometric catalogs of the 141 million objects detected in these imaging data, and spectra of 528,640 objects selected over 4188 deg^2. The pipelines analyzing both images and spectroscopy are unchanged from those used in our Second Data Release. Comment: 14 pages, including 2 postscript figures. Submitted to AJ. Data available at http://www.sdss.org/dr

    A Case Study of a Corporate Open Source Development Model

    Open source practices and tools have proven to be highly effective for overcoming the many problems of geographically distributed software development. We know relatively little, however, about the range of settings in which they work. In particular, can corporations use the open source development model effectively for software projects inside the corporate domain? Or are these tools and practices incompatible with the development environments, management practices, and market-driven schedule and feature decisions typical of a commercial software house? We present a case study of an open source software development methodology adopted by a significant commercial software project in the telecommunications domain. We extract a number of lessons learned from the experience, and identify open research questions.