A review of abnormal behavior detection in activities of daily living
Abnormal behavior detection (ABD) systems are built to automatically identify and recognize abnormal behavior from various input data types, such as sensor-based and vision-based input. Despite the attention ABD systems have received, the number of studies on ABD in activities of daily living (ADL) remains limited. Given the rising rate of accidents among the elderly at home, ABD in ADL research deserves equal attention, since such systems can prevent accidents by sending out an alert when abnormal behavior such as a fall is detected. In this study, we compare and contrast the construction of ABD systems for ADL, from input data types (sensor-based and vision-based input) to modeling techniques (conventional and deep learning approaches). We scrutinize the available public datasets and provide solutions for one of the field's significant issues: the lack of datasets for ABD in ADL. This work aims to guide new research toward a better understanding of ABD in ADL and to serve as a reference for future studies of Ambient Assisted Living amid the growing smart-home trend.
IMPACT OF THE COVID-19 PANDEMIC AND MACHINE LEARNING MODELS FOR PREDICTING PRETERM BIRTHS IN THE CAPITALS OF BRAZIL'S NORTHEAST REGION, 2018-2021
Preterm birth is a global problem owing to its implications for morbidity and mortality, and it is one of the main risk factors for neonatal and infant mortality. Preterm delivery is defined as a pregnancy ending between the 20th and 37th weeks of gestation, or between 140 and 257 days after the first day of the last menstrual period. This study used data from the Live Birth Information System (SINASC) for the capitals of Brazil's Northeast region between 2018 and 2021. We examined whether the first two years of the COVID-19 pandemic significantly affected the distributions of the performance metrics, compared with the data used to train and validate the models. Six machine learning algorithms (Logistic Regression, Linear Discriminant Analysis, Multilayer Perceptron, AdaBoost, Decision Tree, and Random Forest) were applied to predict prematurity. The models showed a drop in the Area Under the ROC Curve (AUC) in 2020 and 2021 relative to 2018 and 2019, most notably for the AdaBoost, Random Forest, and Decision Tree models, with drops above 10% confirmed by the Kruskal-Wallis and Nemenyi statistical tests. The drop in model performance was traced to the variables month of prenatal care onset and age, whose distributions shifted relative to the training data. The models showed good predictive performance; however, tree-based models should be used with caution, since they proved more unstable and COVID-19 shifted the distributions of the age and prenatal-onset variables. When training new models, attention should be paid to the input variables and the training period; for solutions already in production, retraining should be considered.
KEYWORDS: Prematurity. Health. Artificial Intelligence. Machine Learning. COVID-19.
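The year-over-year comparison of AUC distributions described above can be sketched with a Kruskal-Wallis test. A minimal illustration with synthetic per-fold AUC values (the means, spreads, and fold counts are invented for illustration, and the Nemenyi post-hoc step is omitted):

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)

# Hypothetical per-fold AUC values for one model, grouped by year:
# pre-pandemic years hover near 0.75, pandemic years drop by roughly 0.08.
auc_by_year = {
    2018: rng.normal(0.75, 0.01, 30),
    2019: rng.normal(0.75, 0.01, 30),
    2020: rng.normal(0.67, 0.01, 30),
    2021: rng.normal(0.66, 0.01, 30),
}

# Kruskal-Wallis tests whether all yearly AUC distributions are drawn
# from the same population (non-parametric, rank-based).
stat, p_value = kruskal(*auc_by_year.values())
print(f"H = {stat:.1f}, p = {p_value:.2e}")
significant = p_value < 0.05
```

A significant result would then normally be followed by a pairwise post-hoc test (such as Nemenyi) to identify which years differ.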
Audio-Visual Automatic Speech Recognition Towards Education for Disabilities
Education is a fundamental right that enriches everyone's life. However, physically challenged people are often excluded from general and advanced education systems. An Audio-Visual Automatic Speech Recognition (AV-ASR) based system can improve the education of physically challenged people by providing hands-free computing: they can communicate with the learning system through AV-ASR. However, tracing the lips correctly for the visual modality is challenging. This paper therefore addresses appearance-based visual features along with a co-occurrence statistical measure for visual speech recognition. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and the Grey-Level Co-occurrence Matrix (GLCM) are proposed for extracting visual speech information. The experimental results show that the proposed system achieves 76.60% accuracy for visual speech recognition and 96.00% accuracy for audio speech recognition.
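The paper's exact feature pipeline is not given here, but the GLCM idea can be sketched generically: count how often pairs of grey levels co-occur at a fixed pixel offset, then derive Haralick-style texture statistics. A minimal sketch on an invented 4-level toy patch:

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=4):
    """Grey-Level Co-occurrence Matrix: counts how often grey level i
    co-occurs with grey level j at offset (dy, dx), normalised to a
    joint probability distribution."""
    m = np.zeros((levels, levels), dtype=float)
    h, w = image.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[image[y, x], image[y + dy, x + dx]] += 1
    return m / m.sum()

# Toy 4-level "lip region" patch (hypothetical data, not from the paper).
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [2, 2, 3, 3],
                  [2, 2, 3, 3]])

p = glcm(patch)
# Two classic texture statistics derived from the GLCM:
i, j = np.indices(p.shape)
contrast = ((i - j) ** 2 * p).sum()  # local intensity variation
energy = (p ** 2).sum()              # textural uniformity
print(f"contrast={contrast:.3f}, energy={energy:.3f}")
```

In practice these statistics, computed over several offsets and angles, form the appearance-based feature vector fed to the recognizer.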
Machine Learning Research Trends in Africa: A 30 Years Overview with Bibliometric Analysis Review
In this paper, a critical bibliometric analysis study is conducted, coupled with an extensive literature survey of recent developments and associated applications in machine learning research from an African perspective. The bibliometric analysis covers 2761 machine learning-related documents, 98% of which were articles with at least 482 citations, published in 903 journals over the past 30 years. The documents were retrieved from the Science Citation Index EXPANDED and comprise research publications from 54 African countries between 1993 and 2021. The study visualizes the current landscape and future trends in machine learning research and its applications, with the aim of facilitating future collaborative research and knowledge exchange among authors from research institutions across the African continent.
Decoding spatial location of attended audio-visual stimulus with EEG and fNIRS
When analyzing complex scenes, humans often focus their attention on an object at a particular spatial location in the presence of background noise and irrelevant visual objects. The ability to decode the attended spatial location would facilitate brain-computer interfaces (BCI) for complex scene analysis. Here, we tested two neuroimaging technologies and investigated their capability to decode audio-visual spatial attention in the presence of competing stimuli from multiple locations. For functional near-infrared spectroscopy (fNIRS), we targeted the dorsal frontoparietal network, including the frontal eye field (FEF) and intra-parietal sulcus (IPS), as well as the superior temporal gyrus/planum temporale (STG/PT); all of these regions were shown in previous functional magnetic resonance imaging (fMRI) studies to be activated by auditory, visual, or audio-visual spatial tasks. We found that fNIRS provides robust decoding of attended spatial locations for most participants and that decoding performance correlates with behavioral performance. Moreover, we found that FEF makes a large contribution to decoding performance. Surprisingly, performance was significantly above chance level 1 s after cue onset, well before the peak of the fNIRS response.
For electroencephalography (EEG), several successful EEG-based algorithms exist, but to date all of them have focused exclusively on the auditory modality, where eye-related artifacts are minimized or controlled. Successful integration into more ecologically typical usage requires careful handling of eye-related artifacts, which are inevitable. We showed that fast and reliable decoding can be achieved with or without an ocular-artifact removal algorithm. Our results show that EEG and fNIRS are promising platforms for compact, wearable technologies that could be applied to decode the attended spatial location and reveal the contributions of specific brain regions during complex scene analysis.
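The decoding analyses above reduce to a cross-validated classifier applied to multichannel trials, with accuracy compared against chance. A minimal sketch with a nearest-centroid decoder on synthetic "trials" (the channel count, trial count, and effect size are all invented; real pipelines would use the study's actual features and a stronger classifier):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical trials: 40 per attended location (left/right), 10 channels.
# Class means differ slightly, mimicking a spatial-attention signature.
n, d = 40, 10
left = rng.normal(0.0, 1.0, (n, d)) + 0.8
right = rng.normal(0.0, 1.0, (n, d)) - 0.8
X = np.vstack([left, right])
y = np.array([0] * n + [1] * n)

def nearest_centroid_cv(X, y, k=5):
    """k-fold cross-validated accuracy of a nearest-centroid decoder."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    correct = 0
    for f in folds:
        train = np.setdiff1d(idx, f)
        c0 = X[train][y[train] == 0].mean(axis=0)
        c1 = X[train][y[train] == 1].mean(axis=0)
        # Predict the class whose training centroid is nearer.
        pred = (np.linalg.norm(X[f] - c1, axis=1)
                < np.linalg.norm(X[f] - c0, axis=1)).astype(int)
        correct += (pred == y[f]).sum()
    return correct / len(y)

acc = nearest_centroid_cv(X, y)
print(f"decoding accuracy = {acc:.2f} (chance = 0.50)")
```

Significance against the 0.50 chance level would then be assessed with a permutation or binomial test.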
Rapid literature mapping on the recent use of machine learning for wildlife imagery
Machine (especially deep) learning algorithms are changing the way wildlife imagery is processed. They dramatically speed up the time to detect, count, and classify animals and their behaviours. Yet we currently have very few systematic literature surveys of their use on wildlife imagery. Through a literature survey (a 'rapid' review) and bibliometric mapping, we explored their use across: 1) species (vertebrates), 2) image types (e.g., camera traps or drones), 3) study locations, 4) alternative machine learning algorithms, 5) outcomes (e.g., recognition, classification, or tracking), 6) reporting quality and openness, 7) author affiliation, and 8) publication journal types. We found that an increasing number of studies used convolutional neural networks (i.e., deep learning). Typically, studies have focused on large, charismatic, or iconic mammalian species. An increasing number of studies have been published in ecology-specific journals, indicating the uptake of deep learning to transform the detection, classification, and tracking of wildlife. Sharing of code was limited, with only 20% of studies providing links to analysis code. Much of the published research came from, or focused on animals in, India, China, Australia, or the USA, and there were relatively few collaborations across countries. Given the power of machine learning, we recommend increasing collaboration and sharing of approaches to utilise growing amounts of wildlife imagery more rapidly and to transform and improve understanding of wildlife behaviour and conservation. Our survey, augmented with bibliometric analyses, provides valuable signposts for future studies to address shortcomings, gaps, and biases.
Modelling uncertainties for measurements of the H → γγ Channel with the ATLAS Detector at the LHC
The Higgs boson to diphoton (H → γγ) branching ratio is only 0.227%, yet this final state has yielded some of the most precise measurements of the particle. As measurements of the Higgs boson become increasingly precise, greater importance is placed on the factors that constitute the uncertainty. Reducing the effects of these uncertainties requires an understanding of their causes. The research presented in this thesis aims to illuminate how uncertainties on simulation modelling are determined and proffers novel techniques for deriving them.
The upgrade of the FastCaloSim tool, used for simulating events in the ATLAS calorimeter at a rate far exceeding that of the nominal detector simulation, Geant4, is described. The integration of a method that allows the tool to emulate the accordion geometry of the liquid argon calorimeters is detailed. This tool allows for the production of larger samples while using significantly fewer computing resources.
A measurement of the total Higgs boson production cross-section multiplied by the diphoton branching ratio (σ × Bγγ) is presented, with the value determined to be (σ × Bγγ)obs = 127 ± 7 (stat.) ± 7 (syst.) fb, in agreement with the Standard Model prediction. The signal and background shape modelling is described; the contribution of the background modelling uncertainty to the total uncertainty ranges from 2.4% to 18%, depending on the Higgs boson production mechanism.
A method for estimating the number of events in a Monte Carlo background sample required to model the shape is detailed. It was found that the nominal sample of γγ background events needed to be enlarged by a factor of 3.60 to adequately model the background at a confidence level of 68%, or by a factor of 7.20 at a confidence level of 95%. Based on this estimate, 0.5 billion additional simulated events were produced, substantially reducing the background modelling uncertainty.
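The thesis' specific estimation procedure is not reproduced here, but the underlying driver is generic: the statistical uncertainty of a binned Monte Carlo template scales roughly as 1/√N, so enlarging the sample by a factor k shrinks the per-bin uncertainty by √k. A toy sketch with an invented falling (exponential) spectrum:

```python
import numpy as np

rng = np.random.default_rng(2)

def template_uncertainty(n_events, n_bins=20, n_trials=200):
    """Average per-bin relative statistical uncertainty (sqrt(N)/N) of a
    binned background template built from n_events simulated events,
    averaged over n_trials independent pseudo-samples."""
    rels = []
    for _ in range(n_trials):
        counts, _ = np.histogram(rng.exponential(1.0, n_events),
                                 bins=n_bins, range=(0, 4))
        counts = np.maximum(counts, 1)  # guard against empty bins
        rels.append(np.mean(np.sqrt(counts) / counts))
    return float(np.mean(rels))

u1 = template_uncertainty(10_000)
u4 = template_uncertainty(40_000)
# Quadrupling the sample should roughly halve the template uncertainty.
print(f"10k events: {u1:.4f}, 40k events: {u4:.4f}, ratio = {u1 / u4:.2f}")
```

This 1/√N behaviour is why a modest target reduction in the modelling uncertainty can demand a multiplicative increase of several times in sample size.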
A technique for emulating the effects of differences between Monte Carlo event generators using multivariate reweighting is detailed. The technique is used to estimate the event-generator uncertainty on the signal modelling of tHqb events, improving the reliability of the estimated tHqb production cross-section. The same multivariate reweighting technique is then used, for the first time, to estimate the generator modelling uncertainties on background V γγ samples. The estimated uncertainties were found to be covered by the currently assumed background modelling uncertainty.
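The thesis' exact reweighting setup is not specified here; one common way to implement multivariate reweighting is classifier-based density-ratio estimation: train a classifier to separate events from the two generators, then weight each nominal event by p/(1-p) so its distribution morphs toward the alternative. A one-dimensional sketch with invented Gaussian "generators" and a hand-rolled logistic fit:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical events from two "generators" differing in one kinematic
# variable (e.g. a transverse-momentum-like quantity).
a = rng.normal(0.0, 1.0, 5000)   # nominal generator
b = rng.normal(0.5, 1.0, 5000)   # alternative generator

# Fit a logistic classifier p(b | x) by gradient descent.
x = np.concatenate([a, b])
t = np.concatenate([np.zeros_like(a), np.ones_like(b)])
w0, w1 = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))
    w0 -= 0.1 * np.mean(p - t)
    w1 -= 0.1 * np.mean((p - t) * x)

# Density-ratio weights p/(1-p) morph sample a toward sample b.
pa = 1.0 / (1.0 + np.exp(-(w0 + w1 * a)))
weights = pa / (1.0 - pa)
reweighted_mean = np.average(a, weights=weights)
print(f"a mean {a.mean():.2f} -> reweighted {reweighted_mean:.2f} "
      f"(target {b.mean():.2f})")
```

In a realistic application the classifier would be multivariate (e.g. a boosted decision tree over several kinematic variables), but the weight definition is the same.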
Learning disentangled speech representations
A variety of informational factors are contained within the speech signal, and a single short recording of speech reveals much more than the spoken words. The best method to extract and represent informational factors from the speech signal ultimately depends on which factors are desired and how they will be used. In addition, some methods capture more than one informational factor at the same time, such as speaker identity, spoken content, and speaker prosody.
The goal of this dissertation is to explore different ways to deconstruct the speech signal into abstract representations that can be learned and later reused in various speech technology tasks. This task of deconstructing, also known as disentanglement, is a form of distributed representation learning. As a general approach to disentanglement, there are some guiding principles that elaborate what a learned representation should contain as well as how it should function. In particular, learned representations should contain all of the requisite information in a more compact manner, be interpretable, remove nuisance factors of irrelevant information, be useful in downstream tasks, and be independent of the task at hand. The learned representations should also be able to answer counterfactual questions.
In some cases, learned speech representations can be re-assembled in different ways according to the requirements of downstream applications. For example, in a voice conversion task, the speech content is retained while the speaker identity is changed. And in a content-privacy task, some targeted content may be concealed without affecting how surrounding words sound. While there is no single-best method to disentangle all types of factors, some end-to-end approaches demonstrate a promising degree of generalization to diverse speech tasks.
This thesis explores a variety of use cases for disentangled representations, including phone recognition, speaker diarization, linguistic code-switching, voice conversion, and content-based privacy masking. Speech representations can also be utilised for automatically assessing the quality and authenticity of speech, such as automatic MOS ratings or detecting deepfakes. The meaning of the term "disentanglement" is not well defined in previous work and has acquired several meanings depending on the domain (e.g. image vs. speech); sometimes the term "disentanglement" is used interchangeably with the term "factorization". This thesis proposes that disentanglement of speech is distinct, and offers a viewpoint of disentanglement that can be considered both theoretically and practically.
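The re-assembly idea behind tasks like voice conversion can be illustrated with a deliberately crude toy model, assuming (purely for illustration, not as the thesis' method) that each frame is a zero-mean content vector plus a constant per-speaker offset; the utterance mean then isolates the speaker factor and the residual isolates content:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy generative assumption: frame = content + constant speaker offset.
content_a = rng.normal(0, 1, (50, 8))   # utterance A's content trajectory
content_b = rng.normal(0, 1, (50, 8))
speaker_a = np.full(8, 2.0)             # speaker A's "voice" offset
speaker_b = np.full(8, -1.0)
utt_a = content_a + speaker_a
utt_b = content_b + speaker_b

# Crude disentanglement: the utterance mean captures the speaker factor
# (content is zero-mean); the residual captures the content factor.
spk_a_hat = utt_a.mean(axis=0)
spk_b_hat = utt_b.mean(axis=0)
cnt_a_hat = utt_a - spk_a_hat

# "Voice conversion": re-assemble A's content with B's speaker factor.
converted = cnt_a_hat + spk_b_hat
print(converted.mean(axis=0).round(1))  # close to speaker B's offset
```

Real systems learn these factors with neural encoders rather than a mean/residual split, but the recombination step is conceptually the same.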