15,593 research outputs found

    Query by Example of Speaker Audio Signals using Power Spectrum and MFCCs

    Search engine is the popular term for an information retrieval (IR) system. Typically, a search engine is based on full-text indexing. Moving from text data to multimedia data types makes information retrieval more complex, for example when retrieving images or sounds from large databases. This paper introduces the use of language- and text-independent speech as input queries to a large sound database, using a speaker identification algorithm. The method consists of two main processing steps: first, vocal and non-vocal segments are separated; the vocal segments are then used for speaker identification, so that audio can be queried by speaker voice. For speaker identification and query by example, we estimate the similarity between the example signal and the samples in the queried database by calculating the Euclidean distance between their Mel-frequency cepstral coefficients (MFCCs) and energy-spectrum acoustic features. Simulations show good performance at a sustainable computational cost, with an average accuracy rate above 90%.
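    The matching step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes MFCCs with a naive DFT, a triangular mel filterbank (using the common 2595·log10(1 + f/700) mel formula), a log, and a DCT-II, averages them over non-overlapping frames without windowing, and ranks database entries by Euclidean distance to the query. All function names, the frame length, and the filter and coefficient counts are hypothetical; the vocal/non-vocal separation step and the energy-spectrum features are omitted (energy values could be concatenated onto the same vector).

```python
import cmath
import math

def power_spectrum(frame):
    """Power spectrum |X[k]|^2 for k = 0..N//2 via a naive O(N^2) DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale over DFT bins 0..n_fft//2."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [lo + i * (hi - lo) / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    filters = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        w = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):   # rising edge of the triangle
            w[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            w[k] = (right - k) / (right - center)
        filters.append(w)
    return filters

def frame_mfcc(frame, filters, num_coeffs=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    log_e = [math.log(sum(w * p for w, p in zip(f, spec)) + 1e-10) for f in filters]
    m = len(log_e)
    return [sum(e * math.cos(math.pi * c * (i + 0.5) / m) for i, e in enumerate(log_e))
            for c in range(num_coeffs)]

def utterance_features(signal, sample_rate, frame_len=256, num_filters=20, num_coeffs=13):
    """Hypothetical feature vector: MFCCs averaged over non-overlapping, unwindowed frames."""
    filters = mel_filterbank(num_filters, frame_len, sample_rate)
    frames = [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    vecs = [frame_mfcc(f, filters, num_coeffs) for f in frames]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query_by_example(query_features, database):
    """Rank database entries (id -> feature vector) by distance to the query, nearest first."""
    scored = sorted((euclidean(query_features, feats), item_id)
                    for item_id, feats in database.items())
    return [(item_id, dist) for dist, item_id in scored]
```

Because each entry is reduced to a single fixed-length vector, a query costs one distance computation per database item; a practical system would use an FFT library and windowed, overlapping frames instead of the O(N²) loop shown here.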

    The use of voice interface systems to augment selling and buying on university campuses

    Applied project submitted to the Department of Computer Science, Ashesi University, in partial fulfillment of the Bachelor of Science degree in Computer Science, April 2019. Our everyday shopping lives have been significantly augmented by rapid advances in commerce technologies. Buying and selling items, as well as payments, are now mostly done online using advanced technologies, which have sped up shopping and made life more comfortable for consumers. Despite the rise of e-commerce, e-business, internet communication, and electronic payment systems, physical cash is still widely used for buying and selling items on university campuses in Ghana. This is not a problem in itself; problems arise, however, when the seller has to give change to the buyer. The inconvenience of getting change, especially when the amount is small, such as 20 pesewas (GHC 0.20), is becoming a serious nuisance. This project therefore seeks to reduce the inconvenience associated with collecting change during buying and selling by allowing students and staff to accumulate their change electronically through voice interface systems. This paper presents a comprehensive implementation of the OkNsesa system, which comprises speaker recognition and speech recognition components that allow users to update their electronic accounts using voice commands. The key advantage of using voice interfaces is the ability to log users into the system automatically by recognizing who the user is from their voice.
    Ashesi Universit
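    The flow the abstract describes (recognize who the speaker is, then update that user's balance) might look like the following hypothetical sketch. It assumes that a separate speaker recognition model already turns an utterance into a fixed-length embedding and that speech recognition has already parsed the spoken amount; the `ChangeWallet` class, cosine-similarity matching, and the 0.8 acceptance threshold are illustrative assumptions, not details of OkNsesa.

```python
import math
from dataclasses import dataclass, field

def cosine_similarity(a, b):
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

@dataclass
class ChangeWallet:
    """Hypothetical store of enrolled voiceprints and change balances."""
    voiceprints: dict = field(default_factory=dict)  # user_id -> reference embedding
    balances: dict = field(default_factory=dict)     # user_id -> balance in integer pesewas
    threshold: float = 0.8                           # assumed acceptance threshold

    def enroll(self, user_id, embedding):
        self.voiceprints[user_id] = list(embedding)
        self.balances.setdefault(user_id, 0)

    def identify(self, embedding):
        """Return the best-matching enrolled user, or None if no match clears the threshold."""
        best_id, best_score = None, -1.0
        for user_id, ref in self.voiceprints.items():
            score = cosine_similarity(embedding, ref)
            if score > best_score:
                best_id, best_score = user_id, score
        return best_id if best_score >= self.threshold else None

    def credit_change(self, embedding, amount_pesewas):
        """Identify the speaker from their voice, then add the change to their balance."""
        user_id = self.identify(embedding)
        if user_id is None:
            raise PermissionError("speaker not recognised")
        if amount_pesewas <= 0:
            raise ValueError("amount must be positive")
        self.balances[user_id] += amount_pesewas
        return user_id, self.balances[user_id]
```

Balances are kept as integer pesewas rather than floating-point cedis so that many small credits do not accumulate rounding error; the threshold trades false acceptances against false rejections and would need tuning on real enrollment data.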

    Learnable PINs: Cross-Modal Embeddings for Person Identity

    We propose and investigate an identity-sensitive joint embedding of face and voice. Such an embedding enables cross-modal retrieval from voice to face and from face to voice. We make the following four contributions: first, we show that the embedding can be learnt from videos of talking faces, without requiring any identity labels, using a form of cross-modal self-supervision; second, we develop a curriculum learning schedule for hard negative mining targeted to this task, which is essential for learning to proceed successfully; third, we demonstrate and evaluate cross-modal retrieval for identities unseen and unheard during training over a number of scenarios and establish a benchmark for this novel task; finally, we show an application of the joint embedding to automatically retrieving and labelling characters in TV dramas.
    Comment: To appear in ECCV 201
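    Once faces and voices live in a shared embedding space, cross-modal retrieval reduces to nearest-neighbour search, and curriculum hard-negative mining can be expressed as gradually narrowing the pool of negatives toward the most similar ones. The sketch below illustrates both ideas with plain Python lists; the linear hardness schedule, the pool-fraction rule in `curriculum_negative`, and the triplet-style margin loss are assumptions chosen for illustration and are not claimed to match the paper's exact schedule or loss.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def retrieve(query_emb, gallery):
    """Rank gallery identities (id -> embedding in the shared space), most similar first."""
    return sorted(gallery, key=lambda k: cosine(query_emb, gallery[k]), reverse=True)

def hardness_at(epoch, warmup_epochs):
    """Hypothetical linear curriculum: hardness rises from 0 to 1 over warmup_epochs."""
    return min(1.0, epoch / warmup_epochs) if warmup_epochs > 0 else 1.0

def curriculum_negative(anchor, candidates, hardness, rng=random):
    """Sort candidates hardest-first and sample uniformly from the top (1 - hardness) fraction.

    hardness=0 samples from all candidates; hardness=1 always returns the most similar one.
    """
    ranked = sorted(candidates, key=lambda c: cosine(anchor, c), reverse=True)
    pool = max(1, round(len(ranked) * (1.0 - hardness)))
    return rng.choice(ranked[:pool])

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic margin loss on cosine similarity: max(0, margin - s(a, p) + s(a, n))."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))
```

Because retrieval only compares embeddings, it applies unchanged to identities never seen during training, which is the setting the abstract evaluates.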

    Automatic Speaker Recognition by Speech Signal
