16,050 research outputs found
Bridging Discrete and Backpropagation: Straight-Through and Beyond
Backpropagation, the cornerstone of deep learning, is limited to computing
gradients solely for continuous variables. This limitation hinders various
research on problems involving discrete latent variables. To address this
issue, we propose a novel approach for approximating the gradient of parameters
involved in generating discrete latent variables. First, we examine the widely
used Straight-Through (ST) heuristic and demonstrate that it works as a
first-order approximation of the gradient. Guided by our findings, we propose a
novel method called ReinMax, which integrates Heun's Method, a second-order
numerical method for solving ODEs, to approximate the gradient. Our method
achieves second-order accuracy without requiring Hessian or other second-order
derivatives. We conduct experiments on structured output prediction and
unsupervised generative modeling tasks. Our results show that ReinMax brings
consistent improvements over the state of the art, including ST and
Straight-Through Gumbel-Softmax. Implementations are released at
https://github.com/microsoft/ReinMax.
Comment: Work in progress
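To make the ST heuristic concrete, here is a minimal PyTorch sketch of the estimator the abstract analyzes (an illustration, not the authors' code; ReinMax's second-order correction is in the linked repository):

```python
import torch
import torch.nn.functional as F

def straight_through_sample(logits: torch.Tensor) -> torch.Tensor:
    """Forward pass returns a discrete one-hot sample; the backward pass
    treats the sample as if it were the softmax probabilities, i.e. the
    first-order approximation the abstract refers to."""
    probs = F.softmax(logits, dim=-1)
    index = torch.multinomial(probs, num_samples=1).squeeze(-1)
    one_hot = F.one_hot(index, num_classes=logits.shape[-1]).float()
    # Detach trick: the value is one_hot, but gradients flow through probs.
    return one_hot + probs - probs.detach()
```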
PIKS: A Technique to Identify Actionable Trends for Policy-Makers Through Open Healthcare Data
With calls for increasing transparency, governments are releasing greater
amounts of data in multiple domains including finance, education and
healthcare. The efficient exploratory analysis of healthcare data constitutes a
significant challenge. Key concerns in public health include the quick
identification and analysis of trends, and the detection of outliers. This
allows policies to be rapidly adapted to changing circumstances. We present an
efficient outlier detection technique, termed PIKS (Pruned iterative k-means
searchlight), which combines an iterative k-means algorithm with a pruned
searchlight-based scan. We apply this technique to identify outliers in two
publicly available healthcare datasets from the New York Statewide Planning and
Research Cooperative System, and California's Office of Statewide Health
Planning and Development. We provide a comparison of our technique with three
other existing outlier detection techniques, consisting of auto-encoders,
isolation forests and feature bagging. We identified outliers in conditions
including suicide rates, immunity disorders, social admissions,
cardiomyopathies, and pregnancy in the third trimester. We demonstrate that the
PIKS technique produces results consistent with other techniques such as the
auto-encoder. However, the auto-encoder needs to be trained, which requires
several parameters to be tuned. In comparison, the PIKS technique has far fewer
parameters to tune. This makes it advantageous for fast, "out-of-the-box" data
exploration. The PIKS technique is scalable and can readily ingest new
datasets. Hence, it can provide valuable, up-to-date insights to citizens,
patients and policy-makers. We have made our code open source, and with the
availability of open data, other researchers can easily reproduce and extend
our work. This will help promote a deeper understanding of healthcare policies
and public health issues.
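As a rough illustration of the k-means component alone (the pruned searchlight scan is specific to PIKS and not reproduced here), distance-to-centroid outlier scoring might look like the following sketch; the function name is ours:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_outlier_scores(X: np.ndarray, k: int = 8, seed: int = 0) -> np.ndarray:
    """Score each record by its distance to the nearest k-means centroid;
    the largest distances flag candidate outliers for analyst review."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    return np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
```

Few parameters (essentially k and a score threshold) need tuning, which matches the abstract's "out-of-the-box" argument against trained detectors such as auto-encoders.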
Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans
We propose an unsupervised method for parsing large 3D scans of real-world
scenes into interpretable parts. Our goal is to provide a practical tool for
analyzing 3D scenes with unique characteristics in the context of aerial
surveying and mapping, without relying on application-specific user
annotations. Our approach is based on a probabilistic reconstruction model that
decomposes an input 3D point cloud into a small set of learned prototypical
shapes. Our model provides an interpretable reconstruction of complex scenes
and leads to relevant instance and semantic segmentations. To demonstrate the
usefulness of our results, we introduce a novel dataset of seven diverse aerial
LiDAR scans. We show that our method outperforms state-of-the-art unsupervised
methods in terms of decomposition accuracy while remaining visually
interpretable. Our method offers a significant advantage over existing
approaches, as it does not require any manual annotations, making it a
practical and efficient tool for 3D scene analysis. Our code and dataset are
available at https://imagine.enpc.fr/~loiseaur/learnable-earth-parse
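The reconstruction objective behind such prototype decompositions is commonly a Chamfer-style distance between the input cloud and the assembled prototypes; below is a minimal sketch of that distance (our illustration, not the paper's probabilistic model):

```python
import torch

def chamfer(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).
    In a prototype-based decomposition, b would be the union of the learned
    prototype shapes transformed and placed in the scene."""
    d = torch.cdist(a, b)  # pairwise Euclidean distances, shape (N, M)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```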
Colour technologies for content production and distribution of broadcast content
The requirement of colour reproduction has long been a priority driving the development of new colour imaging systems that maximise human perceptual plausibility. This thesis explores machine learning algorithms for colour processing to assist both content production and distribution.

First, this research studies colourisation technologies with practical use cases in the restoration and processing of archived content. The research targets practical, deployable solutions, developing a cost-effective pipeline that integrates the activity of the producer into the processing workflow. In particular, a fully automatic image colourisation paradigm using Conditional GANs is proposed to improve the content generalisation and colourfulness of existing baselines. Moreover, a more conservative solution is considered by providing references to guide the system towards more accurate colour predictions. A fast end-to-end architecture is proposed to improve on existing exemplar-based image colourisation methods while decreasing their complexity and runtime. Finally, the proposed image-based methods are integrated into a video colourisation pipeline. A general framework is proposed to reduce temporal flickering and the propagation of errors when such methods are applied frame by frame. The proposed model is jointly trained to stabilise the input video and to cluster its frames, with the aim of learning scene-specific modes.

Second, this research explores colour processing technologies for content distribution, with the aim of effectively delivering the processed content to a broad audience. In particular, video compression is tackled by introducing a novel methodology for chroma intra prediction based on attention models. Although the proposed architecture helped to gain control over the reference samples and to better understand the prediction process, the complexity of the underlying neural network significantly increased the encoding and decoding time. Therefore, aiming at efficient deployment within the latest video coding standards, this work also focused on the simplification of the proposed architecture to obtain a more compact and explainable model.
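As a toy illustration of the colourisation setting described above (not the thesis architecture), a generator can map the L luminance channel of a Lab image to its two ab chroma channels, with a cGAN discriminator (not shown) judging the realism of the result:

```python
import torch.nn as nn

class ToyColouriser(nn.Module):
    """Toy generator: predicts the two ab chroma channels from the single
    L luminance channel of a Lab image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1), nn.Tanh(),  # ab scaled to [-1, 1]
        )

    def forward(self, L):
        return self.net(L)
```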
Security and Privacy Problems in Voice Assistant Applications: A Survey
Voice assistant applications have become ubiquitous nowadays. Two kinds of
models provide the most important functions for real-life applications (e.g.,
Google Home, Amazon Alexa, Siri): Automatic Speech Recognition (ASR)
models and Speaker Identification (SI) models. According to recent studies,
security and privacy threats have also emerged with the rapid development of
the Internet of Things (IoT). The security issues researched include attack
techniques toward machine learning models and other hardware components widely
used in voice assistant applications. The privacy issues include technical-wise
information stealing and policy-wise privacy breaches. Voice assistant
applications take a steadily growing market share every year, but their privacy
and security issues continue to cause huge economic losses and to endanger
users' sensitive personal information. Thus, it is important to have a
comprehensive survey to outline the categorization of the current research
regarding the security and privacy problems of voice assistant applications.
This paper summarizes and assesses five kinds of security attacks and three
types of privacy threats in the papers published in the top-tier conferences of
cyber security and voice domains.
Comment: 5 figures
RAFEN -- Regularized Alignment Framework for Embeddings of Nodes
Learning node representations has been a crucial area of graph machine
learning research. A well-defined node embedding model should
reflect both node features and the graph structure in the final embedding. In
the case of dynamic graphs, this problem becomes even more complex as both
features and structure may change over time. The embeddings of particular nodes
should remain comparable as the graph evolves, which can be achieved by
applying an alignment procedure. In existing works, this step was often applied
after the node embeddings had already been computed. In this paper, we
introduce a framework -- RAFEN -- that enriches any existing node embedding
method with the aforementioned alignment term, learning aligned node embeddings
during training. We propose several variants of our
framework and demonstrate its performance on six real-world datasets. RAFEN
achieves on-par or better performance than existing approaches without
requiring additional processing steps.
Comment: ICCS 202
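As we read the abstract, the core idea is an alignment term added to the embedding loss during training; here is a minimal sketch under that reading (names, anchor selection, and weighting are our assumptions, not RAFEN's exact formulation):

```python
import torch

def aligned_embedding_loss(task_loss: torch.Tensor,
                           emb_now: torch.Tensor,
                           emb_prev: torch.Tensor,
                           anchor_idx: torch.Tensor,
                           lam: float = 1.0) -> torch.Tensor:
    """Regularize the current snapshot's embeddings to stay close to the
    previous snapshot's embeddings on a set of anchor nodes, so embeddings
    remain comparable as the graph evolves."""
    align = (emb_now[anchor_idx] - emb_prev[anchor_idx]).pow(2).sum(dim=1).mean()
    return task_loss + lam * align
```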
Machine Learning Applications in Studying Mental Health Among Immigrants and Racial and Ethnic Minorities: A Systematic Review
Background: The use of machine learning (ML) in mental health (MH) research
is increasing, especially as new, more complex data types become available to
analyze. By systematically examining the published literature, this review aims
to uncover potential gaps in the current use of ML to study MH in vulnerable
populations of immigrants, refugees, migrants, and racial and ethnic
minorities.
Methods: In this systematic review, we queried Google Scholar for ML-related
terms, MH-related terms, and a population-of-focus search term, strung
together with Boolean operators. Backward reference searching was also
conducted. Included peer-reviewed studies reported using a method or
application of ML in an MH context and focused on the populations of interest.
We did not have date cutoffs. Publications were excluded if they were narrative
or did not exclusively focus on a minority population from the respective
country. Data including study context, the focus of mental healthcare, sample,
data type, type of ML algorithm used, and algorithm performance were extracted
from each study.
Results: Our search strategies resulted in 67,410 listed articles from Google
Scholar. Ultimately, 12 were included. All the articles were published within
the last 6 years, and half of them studied populations within the US. Most
reviewed studies used supervised learning to explain or predict MH outcomes.
Some publications used up to 16 models to determine the best predictive power.
Almost half of the included publications did not discuss their cross-validation
method.
Conclusions: The included studies provide proof-of-concept for the potential
use of ML algorithms to address MH concerns in these special populations, few
as they may be. Our systematic review finds that the clinical application of
these models for classifying and predicting MH disorders is still under
development.
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and directions. Existing surveys on
the Metaverse focus only on a specific aspect and discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive to date, allowing
users, scholars, and entrepreneurs to gain an in-depth understanding of the
Metaverse ecosystem and to identify opportunities for contribution.
One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era
OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is
demonstrated to be one small step for generative AI (GAI), but one giant leap
for artificial general intelligence (AGI). Since its official release in
November 2022, ChatGPT has quickly attracted numerous users with extensive
media coverage. Such unprecedented attention has also motivated numerous
researchers to investigate ChatGPT from various aspects. According to Google
Scholar, there are more than 500 articles with ChatGPT in their titles or
mentioning it in their abstracts. Considering this, a review is urgently
needed, and our work fills this gap. Overall, this work is the first to survey
ChatGPT with a comprehensive review of its underlying technology, applications,
and challenges. Moreover, we present an outlook on how ChatGPT might evolve to
realize general-purpose AIGC (a.k.a. AI-generated content), which will be a
significant milestone for the development of AGI.
Comment: A Survey on ChatGPT and GPT-4, 29 pages. Feedback is appreciated
([email protected])
Kurcuma: a kitchen utensil recognition collection for unsupervised domain adaptation
The use of deep learning makes it possible to achieve extraordinary results in all kinds of tasks related to computer vision. However, this performance is strongly related to the availability of training data and its relationship with the distribution in the eventual application scenario. This question is of vital importance in areas such as robotics, where the targeted environment data are barely available in advance. In this context, domain adaptation (DA) techniques are especially important to building models that deal with new data for which the corresponding label is not available. To promote further research in DA techniques applied to robotics, this work presents Kurcuma (Kitchen Utensil Recognition Collection for Unsupervised doMain Adaptation), an assortment of seven datasets for the classification of kitchen utensils—a task of relevance in home-assistance robotics and a suitable showcase for DA. Along with the data, we provide a broad description of the main characteristics of the dataset, as well as a baseline using the well-known domain-adversarial training of neural networks approach. The results show the challenge posed by DA on these types of tasks, pointing to the need for new approaches in future work.

Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported by the I+D+i project TED2021-132103A-I00 (DOREMI), funded by MCIN/AEI/10.13039/501100011033. Some of the computing resources were provided by the Generalitat Valenciana and the European Union through the FEDER funding program (IDIFEDER/2020/003). The second author is supported by grant APOSTD/2020/256 from “Programa I+D+i de la Generalitat Valenciana”.
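The domain-adversarial baseline mentioned above (DANN) hinges on a gradient reversal layer: identity on the forward pass, negated gradient on the backward pass. A minimal PyTorch sketch:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lam in the
    backward pass, so the feature extractor is pushed towards features the
    domain classifier cannot separate."""

    @staticmethod
    def forward(ctx, x, lam: float):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam: float = 1.0):
    return GradReverse.apply(x, lam)
```

In a DANN setup, the shared features feed a label classifier directly and a domain classifier through grad_reverse, so minimizing both heads' losses yields domain-invariant features.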