12,726 research outputs found
EasyPortrait - Face Parsing and Portrait Segmentation Dataset
Recently, due to COVID-19 and the growing demand for remote work, video
conferencing apps have become especially widespread. The most valuable features
of video chats are real-time background removal and face beautification. While
solving these tasks, computer vision researchers face the problem of having
relevant data for the training stage. There is no large dataset with
high-quality labeled and diverse images of people in front of a laptop or
smartphone camera to train a lightweight model without additional approaches.
To boost the progress in this area, we provide a new image dataset,
EasyPortrait, for portrait segmentation and face parsing tasks. It contains
20,000 primarily indoor photos of 8,377 unique users, and fine-grained
segmentation masks separated into 9 classes. Images are collected and labeled
from crowdsourcing platforms. Unlike most face parsing datasets, in
EasyPortrait, the beard is not considered part of the skin mask, and the inside
area of the mouth is separated from the teeth. These features make EasyPortrait
suitable for skin enhancement and teeth whitening tasks. This paper
describes the pipeline for creating a large-scale and clean image segmentation
dataset using crowdsourcing platforms without additional synthetic data.
Moreover, we trained several models on EasyPortrait and showed experimental
results. The proposed dataset and trained models are publicly available.Comment: portrait segmentation, face parsing, image segmentation dataset
An end-to-end, interactive Deep Learning based Annotation system for cursive and print English handwritten text
With the surging inclination towards carrying out tasks on computational
devices and digital mediums, any method that converts a task that was
previously carried out manually, to a digitized version, is always welcome.
Irrespective of the various documentation tasks that can be done online today,
there are still many applications and domains where handwritten text is
inevitable, which makes the digitization of handwritten documents a very
essential task. Over the past decades, there has been extensive research on
offline handwritten text recognition. In the recent past, most of these
attempts have shifted to Machine learning and Deep learning based approaches.
In order to design more complex and deeper networks, and ensure stellar
performances, it is essential to have larger quantities of annotated data. Most
of the databases present for offline handwritten text recognition today, have
either been manually annotated or semi-automatically annotated with substantial
manual involvement. These processes are very time-consuming and prone to human
errors. To tackle this problem, we present an innovative, complete end-to-end
pipeline, that annotates offline handwritten manuscripts written in both print
and cursive English, using Deep Learning and User Interaction techniques. This
novel method, which involves an architectural combination of a detection system
built upon a state-of-the-art text detection model, and a custom made Deep
Learning model for the recognition system, is combined with an easy-to-use
interactive interface, aiming to improve the accuracy of the detection,
segmentation, serialization and recognition phases, in order to ensure high
quality annotated data with minimal human interaction.Comment: 17 pages, 8 figures, 2 tables
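The serialization phase mentioned above, which orders detected text regions into reading order before recognition, can be illustrated with a minimal sketch. The function name and the line-grouping heuristic below are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of a serialization step: order detected word
# boxes (x, y, w, h) top-to-bottom by text line, then left-to-right
# within each line.

def serialize(boxes, line_tolerance=10):
    """Return boxes in reading order."""
    boxes = sorted(boxes, key=lambda b: b[1])  # rough top-to-bottom
    lines, current = [], [boxes[0]]
    for box in boxes[1:]:
        if abs(box[1] - current[-1][1]) <= line_tolerance:
            current.append(box)        # close enough: same text line
        else:
            lines.append(current)      # start a new text line
            current = [box]
    lines.append(current)
    # left-to-right within each line
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]

words = [(50, 12, 30, 10), (5, 10, 30, 10), (5, 40, 30, 10)]
ordered = serialize(words)
print(ordered)
```

In a real pipeline the tolerance would depend on detected line heights, and the interactive interface would let a user correct grouping mistakes before recognition.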
Point-supervised Single-cell Segmentation via Collaborative Knowledge Sharing
Despite their superior performance, deep-learning methods often suffer from
the disadvantage of needing large-scale well-annotated training data. In
response, recent literature has seen a proliferation of efforts aimed at
reducing the annotation burden. This paper focuses on a weakly-supervised
training setting for single-cell segmentation models, where the only available
training label is the rough locations of individual cells. The specific problem
is of practical interest due to the widely available nuclei counter-stain data
in biomedical literature, from which the cell locations can be derived
programmatically. Of more general interest is a proposed self-learning method
called collaborative knowledge sharing, which is related to but distinct from
the more well-known consistency learning methods. This strategy achieves
self-learning by sharing knowledge between a principal model and a very
light-weight collaborator model. Importantly, the two models are entirely
different in their architectures, capacities, and model outputs: In our case,
the principal model approaches the segmentation problem from an
object-detection perspective, whereas the collaborator model takes a semantic
segmentation perspective. We assessed the effectiveness of this strategy by
conducting experiments on LIVECell, a large single-cell segmentation dataset of
bright-field images, and on the A431 dataset, a fluorescence image dataset in which
the location labels are generated automatically from nuclei counter-stain data.
Implementation code is available at https://github.com/jiyuuchc/lacss_ja
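The core idea of collaborative knowledge sharing, two architecturally different models that supervise each other through exchanged pseudo-labels, can be sketched in miniature. Everything below (the toy models, thresholding, and loop structure) is an illustrative assumption, not the paper's actual implementation.

```python
# Toy sketch of collaborative knowledge sharing: a principal model and
# a lightweight collaborator each produce pseudo-labels that serve as
# training targets for the other model.

class ToyModel:
    """Stand-in for a real network; returns a per-pixel score."""
    def __init__(self, bias):
        self.bias = bias

    def predict(self, pixels):
        return [p + self.bias for p in pixels]

def pseudo_labels(scores, threshold=0.5):
    # binarize scores into foreground / background pseudo-labels
    return [1 if s > threshold else 0 for s in scores]

def knowledge_sharing_step(principal, collaborator, image):
    p_scores = principal.predict(image)
    c_scores = collaborator.predict(image)
    p_targets = pseudo_labels(c_scores)   # collaborator teaches principal
    c_targets = pseudo_labels(p_scores)   # principal teaches collaborator
    # in a real system, these targets would feed each model's loss
    return p_targets, c_targets

image = [0.2, 0.6, 0.9]
p_targets, c_targets = knowledge_sharing_step(ToyModel(0.0), ToyModel(0.1), image)
print(p_targets, c_targets)
```

In the paper's setting the two models differ far more than here (object detection vs. semantic segmentation), which is precisely what distinguishes this strategy from consistency learning between augmented views of one model.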
Multimodal spatio-temporal deep learning framework for 3D object detection in instrumented vehicles
This thesis presents the utilization of multiple modalities, such as image and lidar, to incorporate spatio-temporal information from sequence data into deep learning architectures for 3D object detection in instrumented vehicles. The race to autonomy in instrumented vehicles or self-driving cars has stimulated significant research in developing advanced driver assistance system (ADAS) technologies related explicitly to perception systems. Object detection plays a crucial role in perception systems by providing spatial information to its subsequent modules; hence, accurate detection is a significant task supporting autonomous driving. The advent of deep learning in computer vision applications and the availability of multiple sensing modalities such as 360° imaging, lidar, and radar have led to state-of-the-art 2D and 3D object detection architectures. Most current state-of-the-art 3D object detection frameworks consider only single-frame references. However, these methods do not utilize temporal information associated with the objects or scenes from the sequence data. Thus, the present research hypothesizes that multimodal temporal information can help bridge the gap between 2D and 3D metric space by improving the accuracy of deep learning frameworks for 3D object estimation. The thesis first examines multimodal data representations and hyper-parameter selection using public datasets such as KITTI and nuScenes, with Frustum-ConvNet as a baseline architecture. Secondly, an attention mechanism was employed along with a convolutional LSTM to extract spatio-temporal information from sequence data, improving 3D estimation and helping the architecture focus on salient lidar point cloud features. Finally, various fusion strategies are applied to fuse the modalities and temporal information into the architecture to assess their effect on performance and computational complexity. 
Overall, this thesis has established the importance and utility of multimodal systems for refined 3D object detection and proposed a complex pipeline incorporating spatial, temporal, and attention mechanisms to improve class-specific and general accuracy, as demonstrated on key autonomous driving datasets.
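The attention mechanism described above, weighting frames in a sequence so the network focuses on the most salient features, can be sketched as simple dot-product attention over per-frame feature vectors. This is a generic illustration of the technique, not the thesis's actual architecture; function and variable names are assumptions.

```python
# Sketch of dot-product temporal attention: score each frame's feature
# vector against a query, softmax the scores, and fuse the frames by
# weighted sum.
import math

def temporal_attention(frames, query):
    scores = [sum(q * f for q, f in zip(query, fr)) for fr in frames]
    m = max(scores)                           # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]       # softmax over frames
    fused = [sum(w * fr[i] for w, fr in zip(weights, frames))
             for i in range(len(query))]      # attention-weighted fusion
    return weights, fused

frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, fused = temporal_attention(frames, query=[1.0, 0.0])
```

In a real 3D detection pipeline the frames would be convolutional (or conv-LSTM) feature maps rather than 2-vectors, and the query would itself be learned, but the weighting-and-fusing pattern is the same.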
Security and Privacy Problems in Voice Assistant Applications: A Survey
Voice assistant applications have become ubiquitous nowadays. The two models
that provide the most important functions for real-life applications (e.g.,
Google Home, Amazon Alexa, Siri) are Automatic Speech Recognition (ASR)
models and Speaker Identification (SI) models. According to recent studies,
security and privacy threats have also emerged with the rapid development of
the Internet of Things (IoT). The security issues researched include attack
techniques toward machine learning models and other hardware components widely
used in voice assistant applications. The privacy issues include technical
information stealing and policy-level privacy breaches. Voice assistant
applications take a steadily growing market share every year, yet their privacy
and security issues continue to cause huge economic losses and endanger
users' sensitive personal information. Thus, it is important to have a
comprehensive survey to outline the categorization of the current research
regarding the security and privacy problems of voice assistant applications.
This paper concludes and assesses five kinds of security attacks and three
types of privacy threats in the papers published in the top-tier conferences of
cyber security and the voice domain.Comment: 5 figures
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using the virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and directions. Existing surveys on
the Metaverse focus only on a specific aspect and discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive and allows users,
scholars, and entrepreneurs to get an in-depth understanding of the Metaverse
ecosystem and find their opportunities and potential for contribution.
One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era
OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is
demonstrated to be one small step for generative AI (GAI), but one giant leap
for artificial general intelligence (AGI). Since its official release in
November 2022, ChatGPT has quickly attracted numerous users with extensive
media coverage. Such unprecedented attention has also motivated numerous
researchers to investigate ChatGPT from various aspects. According to Google
Scholar, there are more than 500 articles with ChatGPT in their titles or
mentioning it in their abstracts. Considering this, a review is urgently
needed, and our work fills this gap. Overall, this work is the first to survey
ChatGPT with a comprehensive review of its underlying technology, applications,
and challenges. Moreover, we present an outlook on how ChatGPT might evolve to
realize general-purpose AIGC (a.k.a. AI-generated content), which will be a
significant milestone for the development of AGI.Comment: A Survey on ChatGPT and GPT-4, 29 pages. Feedback is appreciated
([email protected])
Annual report of the officers of the town of Jackson, New Hampshire for the fiscal year ending December 31, 2022.
This is an annual report containing vital statistics for a town/city in the state of New Hampshire.
Bounding Box Annotation with Visible Status
Training deep-learning-based vision systems requires the manual annotation of
a significant amount of data to optimize several parameters of the deep
convolutional neural networks. Such manual annotation is highly time-consuming
and labor-intensive. To reduce this burden, a previous study presented a fully
automated annotation approach that does not require any manual intervention.
The proposed method associates a visual marker with an object and captures it
in the same image. However, because the previous method relied on moving the
object within the capturing range using a fixed-point camera, the collected
image dataset was limited in terms of capturing viewpoints. To overcome this
limitation, this study presents a mobile application-based free-viewpoint
image-capturing method. With the proposed application, users can automatically
collect multi-view image datasets annotated with bounding boxes simply by
moving the camera. However, capturing images through human involvement is
laborious and monotonous. Therefore, we propose gamified application features
to track the progress of the collection status. Our experiments demonstrated
that using the gamified mobile application for bounding box annotation, with
visible collection progress status, can motivate users to collect multi-view
object image datasets with less mental workload and time pressure in an
enjoyable manner, leading to increased engagement.Comment: 10 pages, 16 figures
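The marker-based automatic annotation the abstract builds on, deriving an object's bounding box from a detected visual marker at a known offset, reduces to simple geometry once the marker is located. The sketch below is a hypothetical illustration of that idea; the function name, the fixed offset, and the fixed box size are assumptions, not the paper's method.

```python
# Sketch: given a detected marker position, a known marker-to-object
# offset, and an approximate object size, compute the bounding box
# (x_min, y_min, x_max, y_max) with no manual labeling.

def box_from_marker(marker_xy, offset_xy, box_size):
    mx, my = marker_xy
    ox, oy = offset_xy
    w, h = box_size
    cx, cy = mx + ox, my + oy          # object center inferred from marker
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

print(box_from_marker((100, 50), (0, 30), (40, 60)))
# (80.0, 50.0, 120.0, 110.0)
```

In practice the marker would be detected per frame (e.g., a fiducial tag), and the offset and box size would come from the initial marker-object association, which is what lets a user generate labeled multi-view data just by moving the camera.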
The Psychology of Trust from Relational Messages
A fundamental underpinning of all social relationships is trust. Trust can be established through implicit forms of communication called relational messages. A multidisciplinary, multi-university, cross-cultural investigation addressed how these message themes are expressed and whether they are moderated by culture and veracity. A multi-round decision-making game with 695 international participants assessed the nonverbal and verbal behaviors that express such meanings as affection, dominance, and composure, from which people ultimately determine who can and cannot be trusted. Analysis of subjective judgments showed that trust was predicted most by dominance, then affection, and lastly composure. Behaviorally, several nonverbal and verbal behaviors associated with these message themes were combined to predict trust. Results were similar across cultures but moderated by veracity. Methodologically, automated software extracted facial features, vocal features, and linguistic metrics associated with these message themes. A new attentional computer vision method retrospectively identified specific meaningful segments where relational messages were expressed. The new software tools and attentional model hold promise for identifying nuanced, implicit meanings that together predict trust and that can, in combination, serve as proxies for trust.