15,031 research outputs found
Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification
Generalizable person re-identification (Re-ID) is a very hot research topic
in machine learning and computer vision, which plays a significant role in
realistic scenarios due to its various applications in public security and
video surveillance. However, previous methods mainly focus on the visual
representation learning, while neglect to explore the potential of semantic
features during training, which easily leads to poor generalization capability
when adapted to the new domain. In this paper, we propose a Multi-Modal
Equivalent Transformer called MMET for more robust visual-semantic embedding
learning on visual, textual and visual-textual tasks respectively. To further
enhance the robust feature learning in the context of transformer, a dynamic
masking mechanism called Masked Multimodal Modeling strategy (MMM) is
introduced to mask both the image patches and the text tokens, which can
jointly works on multimodal or unimodal data and significantly boost the
performance of generalizable person Re-ID. Extensive experiments on benchmark
datasets demonstrate the competitive performance of our method over previous
approaches. We hope this method could advance the research towards
visual-semantic representation learning. Our source code is also publicly
available at https://github.com/JeremyXSC/MMET
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using the virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and directions. Existing surveys on
the Metaverse focus only on a specific aspect and discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive and allows users,
scholars, and entrepreneurs to get an in-depth understanding of the Metaverse
ecosystem to find their opportunities and potentials for contribution
Ambiguous Medical Image Segmentation using Diffusion Models
Collective insights from a group of experts have always proven to outperform
an individual's best diagnostic for clinical tasks. For the task of medical
image segmentation, existing research on AI-based alternatives focuses more on
developing models that can imitate the best individual rather than harnessing
the power of expert groups. In this paper, we introduce a single diffusion
model-based approach that produces multiple plausible outputs by learning a
distribution over group insights. Our proposed model generates a distribution
of segmentation masks by leveraging the inherent stochastic sampling process of
diffusion using only minimal additional learning. We demonstrate on three
different medical image modalities- CT, ultrasound, and MRI that our model is
capable of producing several possible variants while capturing the frequencies
of their occurrences. Comprehensive results show that our proposed approach
outperforms existing state-of-the-art ambiguous segmentation networks in terms
of accuracy while preserving naturally occurring variation. We also propose a
new metric to evaluate the diversity as well as the accuracy of segmentation
predictions that aligns with the interest of clinical practice of collective
insights
Bounding Box Annotation with Visible Status
Training deep-learning-based vision systems requires the manual annotation of
a significant amount of data to optimize several parameters of the deep
convolutional neural networks. Such manual annotation is highly time-consuming
and labor-intensive. To reduce this burden, a previous study presented a fully
automated annotation approach that does not require any manual intervention.
The proposed method associates a visual marker with an object and captures it
in the same image. However, because the previous method relied on moving the
object within the capturing range using a fixed-point camera, the collected
image dataset was limited in terms of capturing viewpoints. To overcome this
limitation, this study presents a mobile application-based free-viewpoint
image-capturing method. With the proposed application, users can collect
multi-view image datasets automatically that are annotated with bounding boxes
by moving the camera. However, capturing images through human involvement is
laborious and monotonous. Therefore, we propose gamified application features
to track the progress of the collection status. Our experiments demonstrated
that using the gamified mobile application for bounding box annotation, with
visible collection progress status, can motivate users to collect multi-view
object image datasets with less mental workload and time pressure in an
enjoyable manner, leading to increased engagement.Comment: 10 pages, 16 figure
In-situ crack and keyhole pore detection in laser directed energy deposition through acoustic signal and deep learning
Cracks and keyhole pores are detrimental defects in alloys produced by laser
directed energy deposition (LDED). Laser-material interaction sound may hold
information about underlying complex physical events such as crack propagation
and pores formation. However, due to the noisy environment and intricate signal
content, acoustic-based monitoring in LDED has received little attention. This
paper proposes a novel acoustic-based in-situ defect detection strategy in
LDED. The key contribution of this study is to develop an in-situ acoustic
signal denoising, feature extraction, and sound classification pipeline that
incorporates convolutional neural networks (CNN) for online defect prediction.
Microscope images are used to identify locations of the cracks and keyhole
pores within a part. The defect locations are spatiotemporally registered with
acoustic signal. Various acoustic features corresponding to defect-free
regions, cracks, and keyhole pores are extracted and analysed in time-domain,
frequency-domain, and time-frequency representations. The CNN model is trained
to predict defect occurrences using the Mel-Frequency Cepstral Coefficients
(MFCCs) of the lasermaterial interaction sound. The CNN model is compared to
various classic machine learning models trained on the denoised acoustic
dataset and raw acoustic dataset. The validation results shows that the CNN
model trained on the denoised dataset outperforms others with the highest
overall accuracy (89%), keyhole pore prediction accuracy (93%), and AUC-ROC
score (98%). Furthermore, the trained CNN model can be deployed into an
in-house developed software platform for online quality monitoring. The
proposed strategy is the first study to use acoustic signals with deep learning
for insitu defect detection in LDED process.Comment: 36 Pages, 16 Figures, accepted at journal Additive Manufacturin
CoRe-Sleep: A Multimodal Fusion Framework for Time Series Robust to Imperfect Modalities
Sleep abnormalities can have severe health consequences. Automated sleep
staging, i.e. labelling the sequence of sleep stages from the patient's
physiological recordings, could simplify the diagnostic process. Previous work
on automated sleep staging has achieved great results, mainly relying on the
EEG signal. However, often multiple sources of information are available beyond
EEG. This can be particularly beneficial when the EEG recordings are noisy or
even missing completely. In this paper, we propose CoRe-Sleep, a Coordinated
Representation multimodal fusion network that is particularly focused on
improving the robustness of signal analysis on imperfect data. We demonstrate
how appropriately handling multimodal information can be the key to achieving
such robustness. CoRe-Sleep tolerates noisy or missing modalities segments,
allowing training on incomplete data. Additionally, it shows state-of-the-art
performance when testing on both multimodal and unimodal data using a single
model on SHHS-1, the largest publicly available study that includes sleep stage
labels. The results indicate that training the model on multimodal data does
positively influence performance when tested on unimodal data. This work aims
at bridging the gap between automated analysis tools and their clinical
utility.Comment: 10 pages, 4 figures, 2 tables, journa
Passive Radio Frequency-based 3D Indoor Positioning System via Ensemble Learning
Passive radio frequency (PRF)-based indoor positioning systems (IPS) have
attracted researchers' attention due to their low price, easy and customizable
configuration, and non-invasive design. This paper proposes a PRF-based
three-dimensional (3D) indoor positioning system (PIPS), which is able to use
signals of opportunity (SoOP) for positioning and also capture a scenario
signature. PIPS passively monitors SoOPs containing scenario signatures through
a single receiver. Moreover, PIPS leverages the Dynamic Data Driven
Applications System (DDDAS) framework to devise and customize the sampling
frequency, enabling the system to use the most impacted frequency band as the
rated frequency band. Various regression methods within three ensemble learning
strategies are used to train and predict the receiver position. The PRF
spectrum of 60 positions is collected in the experimental scenario, and three
criteria are applied to evaluate the performance of PIPS. Experimental results
show that the proposed PIPS possesses the advantages of high accuracy,
configurability, and robustness.Comment: DDDAS 202
Technical Dimensions of Programming Systems
Programming requires much more than just writing code in a programming language. It is usually done in the context of a stateful environment, by interacting with a system through a graphical user interface. Yet, this wide space of possibilities lacks a common structure for navigation. Work on programming systems fails to form a coherent body of research, making it hard to improve on past work and advance the state of the art.
In computer science, much has been said and done to allow comparison of programming languages, yet no similar theory exists for programming systems; we believe that programming systems deserve a theory too.
We present a framework of technical dimensions which capture the underlying characteristics of programming systems and provide a means for conceptualizing and comparing them.
We identify technical dimensions by examining past influential programming systems and reviewing their design principles, technical capabilities, and styles of user interaction. Technical dimensions capture characteristics that may be studied, compared and advanced independently. This makes it possible to talk about programming systems in a way that can be shared and constructively debated rather than relying solely on personal impressions.
Our framework is derived using a qualitative analysis of past programming systems. We outline two concrete ways of using our framework. First, we show how it can analyze a recently developed novel programming system. Then, we use it to identify an interesting unexplored point in the design space of programming systems.
Much research effort focuses on building programming systems that are easier to use, accessible to non-experts, moldable and/or powerful, but such efforts are disconnected. They are informal, guided by the personal vision of their authors and thus are only evaluable and comparable on the basis of individual experience using them. By providing foundations for more systematic research, we can help programming systems researchers to stand, at last, on the shoulders of giants
Personalführung im heterogenen Kollegium von Grund- und Mittelschulen in Bayern
Personalführung bei zunehmend heterogenen Lehrkräften wird zusehends wichtiger für Schulleitung. Eine theoretische und praktische Neuorientierung in diesem bisher übersehenen Feld erscheint notwendig. Dazu werden in der Schulleitungsforschung Erkenntnisse zur differenzierenden Personalführung gesucht. In der Betriebswirtschaftslehre sowie in der Praxis von Unternehmen werden Neuerungen im Umgang mit größerer Diversität bei den Mitarbeitern analysiert. Erfahrungen aus der Praxis von Schulleitungen werden mithilfe offener Interviews ausgewertet. Nach kritischer Auswahl der Übertragungsmöglichkeiten wird eine Integration zu einem neuen Ansatz von differenzierender Personalführung versucht. Konkrete Vorschläge zur Reform der Ausbildung der Schulleitungen und zur Umsetzung in der Praxis zeigen mögliche Wege der nötigen Verbesserung auf
- …