
    Development of a Field-Deployable Voice-Controlled Ultrasound Scanner System

    Modern ultrasound scanners are portable and have become very useful for clinical diagnosis. However, they have limitations for field use, primarily because they occupy both hands of the physician performing the scan. The goal of this thesis is to develop a wearable, voice-controlled ultrasound scanner system that enables the physician to provide a fast and efficient diagnosis, which is expected to be particularly useful for emergency and trauma applications. A commercially available ultrasound scanner system, the Terason 2000, was chosen as the basis for development. This system consists of a laptop, a hardware unit containing the RF beamforming and signal-processing chips, and the ultrasound transducer. In its commercial version, the ultrasound system is controlled via a Graphical User Interface with a Windows-application look and feel. In the system we developed, a command-and-control speech recognition engine and a noise-canceling microphone are used to control the scanner with voice commands. A mini-joystick attached to the top of the ultrasound transducer supports distance and area measurements and zooming of the ultrasound images. An eye-wear viewer connected to the laptop enables the user to view the ultrasound images directly. Power management features are incorporated into the ultrasound system to conserve battery power. A wireless connection to a remote laptop enables real-time wireless transmission of images. The result is a truly untethered, voice-controlled ultrasound system enclosed in a backpack and monitored through the eye-wear viewer. (In the second generation of this system, the laptop is replaced by an embedded PC incorporated into a photographer's vest.)

    The voice-controlled system has to remain reliable under various forms of background noise. Three command-and-control speech recognition systems were selected and their recognition performance was determined under different types and levels of ambient noise. The variation in recognition rates was also analyzed across six different speakers. Detailed testing was then conducted to identify the ideal combination of microphone and speech recognition engine for the ultrasound scanner system. Six different microphones, each with its own method of implementing noise cancellation, were chosen as candidates for this analysis. The testing was conducted by making recordings inside a highly reverberant, noisy acoustic chamber and feeding the recordings to the automatic speech recognition engines offline for performance evaluation. The speech recognition engine and microphone selected as a result of this extensive testing were then incorporated into the wearable ultrasound scanner system. This thesis also discusses the implementation of the human-speech interface, which plays a major role in the effectiveness of the voice-controlled ultrasound scanner system.
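
    As a rough illustration of the offline evaluation protocol described above, the following Python sketch feeds each recorded command to each candidate speech recognition engine and tallies the recognition rate per microphone/engine pair. The data layout, the recognize callables, and all names are hypothetical stand-ins, not taken from the thesis.

        from collections import defaultdict

        def recognition_rates(recordings, engines):
            """Score each (microphone, engine) pair by command recognition accuracy.

            recordings: iterable of dicts with hypothetical keys
                        "microphone", "audio", and "command" (the spoken reference label).
            engines:    dict mapping an engine name to a callable(audio) -> recognized command.
            """
            correct, total = defaultdict(int), defaultdict(int)
            for rec in recordings:
                for engine_name, recognize in engines.items():
                    key = (rec["microphone"], engine_name)
                    total[key] += 1
                    if recognize(rec["audio"]) == rec["command"]:
                        correct[key] += 1
            return {key: correct[key] / total[key] for key in total}

        # Hypothetical usage: pick the best-performing pairing for the scanner.
        # rates = recognition_rates(recordings, engines)
        # best_microphone, best_engine = max(rates, key=rates.get)

    The thesis additionally breaks results down by noise type and level and by speaker; those dimensions are omitted here for brevity.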

    Evaluation of room acoustic qualities and defects by use of auralization

    We can hear you with Wi-Fi!

    Machine Learning and Signal Processing Design for Edge Acoustic Applications

    State of the art of audio- and video-based solutions for AAL

    Working Group 3: Audio- and Video-based AAL Applications. It is a matter of fact that Europe is facing more and more crucial challenges regarding health and social care due to demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, thus highlighting the need for taking action. Active and Assisted Living (AAL) technologies come as a viable approach to help face these challenges, thanks to their high potential for enabling remote care and support. Broadly speaking, AAL can be referred to as the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply persons in need with smart assistance, responding to their needs for autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairments.

    Despite aiming at diverse goals, AAL systems should share some common characteristics. They are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner. Moreover, they are conceived to be intelligent, to be able to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs. Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them into their daily environments and lives. In this respect, video- and audio-based AAL applications have several advantages in terms of unobtrusiveness and information richness. Indeed, cameras and microphones are far less obtrusive than wearable sensors, which may hinder one's activities. In addition, a single camera placed in a room can record most of the activities performed in that room, thus replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, the movements, and the overall conditions of the assisted individuals, as well as in assessing their vital parameters (e.g., heart rate, respiratory rate). Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they have a large sensing range, do not require physical presence at a particular location and are physically intangible. Moreover, relevant information about individuals' activities and health status can be derived from processing audio signals (e.g., speech recordings). Nevertheless, as the other side of the coin, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals. This is due to the richness of the information these technologies convey and the intimate settings where they may be deployed. Solutions able to ensure privacy preservation by context and by design, as well as to ensure high legal and ethical standards, are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature. A multidisciplinary debate among experts and stakeholders is paving the way towards AAL that ensures ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach.

    This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL. It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time and its main functional and technological underpinnings. In this respect, the report contributes to the field with the outline of a new generation of ethics-aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gather an overview of the main components of an AAL system and how these function and interact with the end-users. The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects. The open challenges are also highlighted. The report ends with an overview of the challenges, the hindrances and the opportunities posed by the uptake of AAL technologies in real-world settings. In this respect, the report illustrates the current procedural and technological approaches to cope with acceptability, usability and trust in AAL technology, by surveying strategies and approaches to co-design, to privacy preservation in video and audio data, to transparency and explainability in data processing, and to data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potential offered by the silver economy is reviewed.

    Enhancing Usability, Security, and Performance in Mobile Computing

    We have witnessed the prevalence of smart devices in every aspect of human life. However, the ever-growing number of smart devices presents significant challenges in terms of usability, security, and performance. First, we need to design new interfaces to improve device usability, which has been neglected during the rapid shift from hand-held mobile devices to wearables. Second, we need to protect smart devices holding abundant private data against unauthorized users. Last, new applications with compute-intensive tasks demand the integration of emerging mobile backend infrastructure. This dissertation focuses on addressing these challenges.

    First, we present GlassGesture, a system that improves the usability of Google Glass through a head-gesture user interface with gesture recognition and authentication. We accelerate recognition by employing a novel similarity search scheme, and improve authentication performance by applying new features of head movements in an ensemble learning method. As a result, GlassGesture achieves 96% gesture recognition accuracy. Furthermore, GlassGesture accepts authorized users in nearly 92% of trials, and rejects attackers in nearly 99% of trials.

    Next, we investigate the authentication between a smartphone and a paired smartwatch. We design and implement WearLock, a system that uses one's smartwatch to unlock one's smartphone via acoustic tones. We build an acoustic modem with sub-channel selection and adaptive modulation, which generates modulated acoustic signals to maximize the unlocking success rate against ambient noise. We leverage the motion similarities of the devices to eliminate unnecessary unlocking. We also offload heavy computation tasks from the smartwatch to the smartphone to shorten response time and save energy. The acoustic modem achieves a low bit error rate (BER) of 8%. Compared to traditional manual personal identification number (PIN) entry, WearLock not only automates the unlocking but also speeds it up by at least 18%.

    Last, we consider low-latency video analytics on mobile devices, leveraging emerging mobile backend infrastructure. We design and implement LAVEA, a system that offloads computation from mobile clients to edge nodes, accomplishing computation-intensive tasks closer to users in a timely manner. We formulate an optimization problem for offloading task selection and prioritize offloading requests received at the edge node to minimize the response time. We design and compare various task placement schemes for inter-edge collaboration to further improve the overall response time. Our results show that the client-edge configuration achieves a speedup ranging from 1.3x to 4x over running solely on the client and 1.2x to 1.7x over the client-cloud configuration.
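
    The client-edge-cloud trade-off that LAVEA targets can be pictured with a back-of-the-envelope response-time model: run a task locally, or pay the network transfer cost to run it on a faster edge or cloud node. The sketch below is only a simplified illustration of that decision; the numbers, names, and the linear latency model are assumptions, not results or code from the dissertation.

        def response_time(input_bytes, bandwidth_bps, rtt_s, compute_s):
            """Estimated end-to-end latency: send the input over the network, then compute remotely.
            For local execution, pass bandwidth_bps=None to skip the network terms."""
            if bandwidth_bps is None:
                return compute_s
            return rtt_s + input_bytes * 8 / bandwidth_bps + compute_s

        def choose_placement(input_bytes, options):
            """Pick the placement ('local', 'edge', 'cloud') with the lowest estimated latency."""
            return min(options, key=lambda name: response_time(input_bytes, *options[name]))

        # Hypothetical numbers for one 500 KB video frame:
        # (bandwidth in bit/s, round-trip time in s, compute time in s)
        options = {
            "local": (None, 0.0, 0.40),
            "edge":  (20e6, 0.010, 0.10),
            "cloud": (5e6, 0.060, 0.05),
        }
        print(choose_placement(500_000, options))  # "edge" wins under these assumptions

    In LAVEA this decision is posed as an optimization over many competing tasks, with request prioritization at the edge node and inter-edge task placement; the single-task estimate above only conveys the underlying latency trade-off.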

    Multimodal silent speech interfaces for European Portuguese based on articulation

    Joint MAPi Doctoral Programme in Informatics. The concept of silent speech, when applied to Human-Computer Interaction (HCI), describes a system that allows for speech communication in the absence of an acoustic signal. By analyzing data gathered during different parts of the human speech production process, Silent Speech Interfaces (SSI) allow users with speech impairments to communicate with a system. SSI can also be used in the presence of environmental noise, and in situations in which privacy, confidentiality, or non-disturbance are important. Nonetheless, despite recent advances, the performance and usability of silent speech systems still have much room for improvement. Better performance would enable their application in relevant areas, such as Ambient Assisted Living. It is therefore necessary to extend our understanding of the capabilities and limitations of silent speech modalities and to enhance their joint exploration. Thus, in this thesis, we have established several goals: (1) expand SSI language support to European Portuguese (EP); (2) overcome identified limitations of current SSI techniques in detecting EP nasality; (3) develop a multimodal HCI approach for SSI based on non-invasive modalities; and (4) explore more direct measures in the multimodal SSI for EP, acquired from more invasive/obtrusive modalities, to be used as ground truth for articulation, enhancing our comprehension of the other modalities.

    In order to achieve these goals and to support our research in this area, we have created a multimodal SSI framework that fosters leveraging modalities and combining information, supporting research in multimodal SSI. The proposed framework goes beyond the data acquisition process itself, including methods for online and offline synchronization, multimodal data processing, feature extraction, feature selection, analysis, classification and prototyping. Examples of applicability are provided for each stage of the framework. These include articulatory studies for HCI, the development of a multimodal SSI based on less invasive modalities, and the use of ground-truth information coming from more invasive/obtrusive modalities to overcome the limitations of other modalities. In the work presented here, we also apply existing SSI methods to EP for the first time, noting that nasal sounds may cause inferior performance in some modalities. In this context, we propose a non-invasive solution for the detection of nasality based on a single surface electromyography (EMG) sensor, which can conceivably be included in a multimodal SSI.
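
    To make the final contribution more concrete, the sketch below shows one generic way a single-channel surface EMG signal could be windowed, reduced to simple features, and fed to a classifier that labels each window as nasal or non-nasal. The windowing, the features (RMS and zero crossings), the random-forest classifier, and the synthetic signals are illustrative assumptions only; they are not the thesis's actual sensor placement, features, or classifier.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def emg_features(signal, fs=1000, win_ms=100):
            """Split a single-channel sEMG signal into fixed windows and compute two
            simple features per window: RMS amplitude and zero-crossing count."""
            win = int(fs * win_ms / 1000)
            n = len(signal) // win
            frames = np.asarray(signal[: n * win], dtype=float).reshape(n, win)
            rms = np.sqrt((frames ** 2).mean(axis=1))
            zero_crossings = (np.diff(np.signbit(frames).astype(np.int8), axis=1) != 0).sum(axis=1)
            return np.column_stack([rms, zero_crossings])

        # Synthetic stand-in recordings (8 s at 1 kHz each), labelled nasal (1) / non-nasal (0).
        rng = np.random.default_rng(0)
        nasal, oral = rng.normal(1.0, 0.3, 8000), rng.normal(0.2, 0.3, 8000)
        X = np.vstack([emg_features(nasal), emg_features(oral)])
        y = np.array([1] * 80 + [0] * 80)

        clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
        print(clf.predict(emg_features(rng.normal(1.0, 0.3, 1000))))  # windows of an unseen "nasal" segment

    A real system would of course train and evaluate on recorded EMG from the speakers described in the thesis rather than on synthetic signals; the point of the sketch is only the window-features-classifier pipeline shape.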