7,404 research outputs found
A Political Theory of Engineered Systems and A Study of Engineering and Justice Workshops
Since there are good reasons to think that some engineered systems are socially undesirable—for example, internal combustion engines that cause climate change, algorithms that are racist, and nuclear weapons that can destroy all life—there is a well-established literature that attempts to identify best practices for designing and regulating engineered systems in order to prevent harm and promote justice. Most of this literature, especially the design theory and engineering justice literature meant to help guide engineers, focuses on environmental, physical, social, and mental harms such as ecosystem and bodily poisoning, racial and gender discrimination, and urban alienation. However, the literature that focuses on how engineered systems can produce political harms—harms to how we shape the way we live in community together—is not well established. The first part of this thesis contributes to identifying how particular types of engineered systems can harm a democratic politics. Building on democratic theory, philosophy of collective harms, and design theory, it argues that engineered systems that extend in space and time beyond a certain threshold subvert the knowledge and empowerment necessary for a democratic politics. For example, the systems of global shipping and the internet that fundamentally shape our lives are so large that people cannot attain the knowledge necessary to regulate them well nor the empowerment necessary to shape them.
The second part of this thesis is an empirical study of a workshop designed to encourage engineering undergraduates to understand how engineered systems can subvert a democratic politics, with the ultimate goal of supporting students in incorporating that understanding into their work. 32 Dartmouth undergraduate engineering students participated in the study. Half were assigned to a workshop group and half to a control group. The workshop group participants took a pretest; then participated in a 3-hour, semi-structured workshop with 4 participants per session (as well as a discussion leader and note-taker) over lunch or dinner; and then took a posttest. The control group participants took the same pre- and post-tests, but had no suggested activity in the intervening 3 hours. We find that the students who participated in workshops had a statistically significant test-score improvement compared to the control group (Brunner-Munzel test, p < .001). Using thematic analysis methods, we show that the data are consistent with the hypothesis that the workshops produced a score improvement because of their structure (small size, long duration, discussion-based, over homemade food) and content (theoretically rich, challenging). Thematic analysis also reveals workshop failures and areas for improvement (too much content for the duration, not well enough organized).
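The group comparison reported above uses the Brunner-Munzel test, a rank-based alternative to the t-test that does not assume equal variances or normality. A minimal sketch with SciPy, using made-up score gains (not the study's data):

```python
from scipy.stats import brunnermunzel

# Hypothetical pre-to-post score improvements per participant
# (illustrative values only, not the study's data).
workshop_gains = [12, 15, 9, 14, 11, 16, 13, 10, 15, 12]
control_gains = [5, 8, 3, 6, 10, 4, 7, 11, 5, 6]

# Rank-based two-sample comparison; robust to unequal variances.
stat, p = brunnermunzel(workshop_gains, control_gains)
print(f"statistic={stat:.3f}, p={p:.4f}")
```

With a clear shift between groups like this, the p-value falls well below conventional significance thresholds.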
The thesis concludes with a discussion of limitations and suggestions for future theoretical, empirical, and pedagogical research.
Co-segmentation assisted cross-modality person re-identification
We present a deep learning-based method for Visible-Infrared person Re-Identification (VI-ReID). The major contribution lies in the incorporation of co-segmentation into a multi-task learning framework for VI-ReID, where the co-segmentation concept aids in making the distributions of RGB and IR images the same for the same identity but diverse for different identities. Accordingly, a novel multi-task learning based model, i.e., co-segmentation assisted VI-ReID (CSVI), is proposed in this paper. Specifically, the co-segmentation network first takes as input the modality-shared features extracted from a set of RGB and IR images by the VI-ReID model. Then, it exploits their semantic similarities to predict the person masks of the common identities within the input RGB and IR images, using a cross-modality center based weight generation module and a segmentation decoder. Doing so enables the VI-ReID model to extract additional modality-shared shape features that boost performance. Meanwhile, the co-segmentation network implicitly establishes interactions among the set of RGB and IR images, further bridging the large modality discrepancies. Our model's effectiveness and superiority are verified through experimental comparisons with state-of-the-art algorithms on several benchmark datasets.
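The cross-modality center idea can be illustrated with a toy sketch (all names, shapes, and steps here are hypothetical simplifications, not the paper's implementation): an identity's center is averaged over modality-shared RGB and IR features, and a pixel-wise mask is scored by similarity to that center.

```python
import math

def l2_normalize(v):
    # Normalize a feature vector to unit length (guard against zero vectors).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cross_modality_center(rgb_feats, ir_feats):
    # Average the modality-shared features of one identity over both modalities.
    feats = rgb_feats + ir_feats
    dim = len(feats[0])
    return [sum(f[i] for f in feats) / len(feats) for i in range(dim)]

def similarity_mask(center, pixel_feats):
    # Cosine similarity of each pixel feature to the identity center;
    # high values mark pixels likely belonging to that person.
    c = l2_normalize(center)
    return [sum(a * b for a, b in zip(c, l2_normalize(p))) for p in pixel_feats]

# Toy 2-D descriptors: two RGB and two IR features of the same identity.
rgb = [[1.0, 0.1], [0.9, 0.2]]
ir = [[1.1, 0.0], [1.0, 0.1]]
center = cross_modality_center(rgb, ir)
mask = similarity_mask(center, [[1.0, 0.1], [0.0, 1.0]])
```

A pixel whose feature matches the shared center scores near 1, while an unrelated pixel scores low, which is the intuition behind using centers to generate mask-prediction weights.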
Persuasionsstile in Europa II
Persuasionsstile in Europa ("Persuasion Styles in Europe") is the name of a project running since 2011, on which a group of researchers from different disciplines (German studies, Romance studies, English studies, media research, computational linguistics) from eight European countries is working. Its focus is the analysis of newspaper commentaries.
Backpropagation Beyond the Gradient
Automatic differentiation is a key enabler of deep learning: previously, practitioners were limited to models
for which they could manually compute derivatives. Now, they can create sophisticated models with almost
no restrictions and train them using first-order, i.e., gradient, information. Popular libraries like PyTorch
and TensorFlow compute this gradient efficiently, automatically, and conveniently with a single line of
code. Under the hood, reverse-mode automatic differentiation, or gradient backpropagation, powers the
gradient computation in these libraries. Their entire design centers around gradient backpropagation.
These frameworks are specialized around one specific task—computing the average gradient in a mini-batch.
This specialization often complicates the extraction of other information like higher-order statistical moments
of the gradient, or higher-order derivatives like the Hessian. It limits practitioners and researchers to methods
that rely on the gradient. Arguably, this keeps the field from exploring the potential of higher-order
information, and there is evidence that focusing solely on the gradient has not led to significant recent
advances in deep learning optimization.
To advance algorithmic research and inspire novel ideas, information beyond the batch-averaged gradient
must be made available at the same level of computational efficiency, automation, and convenience.
This thesis presents approaches to simplify experimentation with rich information beyond the gradient
by making it more readily accessible. We present an implementation of these ideas as an extension to the
backpropagation procedure in PyTorch. Using this newly accessible information, we demonstrate possible use
cases by (i) showing how it can inform our understanding of neural network training by building a diagnostic
tool, and (ii) enabling novel methods to efficiently compute and approximate curvature information.
First, we extend gradient backpropagation for sequential feedforward models to Hessian backpropagation
which enables computing approximate per-layer curvature. This perspective unifies recently proposed block-
diagonal curvature approximations. Like gradient backpropagation, the computation of these second-order
derivatives is modular, and therefore simple to automate and extend to new operations.
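In generic notation (a sketch of the idea, not necessarily the thesis's exact formulation), for a module \(z = f(x)\) inside a feedforward network with loss \(\ell\), the Hessian with respect to the module input obeys a chain-rule recursion:

```latex
\nabla_x^2 \ell
  = \left(\mathrm{J}_x z\right)^\top \left(\nabla_z^2 \ell\right) \mathrm{J}_x z
  + \sum_k \left(\nabla_x^2 z_k\right) \frac{\partial \ell}{\partial z_k}
```

This passes curvature backward layer by layer, just as the gradient chain rule does for first derivatives; dropping the second term yields a positive semi-definite generalized Gauss-Newton block for that layer.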
Based on the insight that rich information beyond the gradient can be computed efficiently and at the
same time, we extend the backpropagation in PyTorch with the BackPACK library. It provides efficient and
convenient access to statistical moments of the gradient and approximate curvature information, often at a
small overhead compared to computing just the gradient.
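The kind of quantity BackPACK exposes can be illustrated on a toy problem (a pure-Python sketch of the concept, not the library's API): for a scalar linear model with squared loss, the per-sample gradients and their variance are cheap to form alongside the mean gradient, yet standard backpropagation discards them.

```python
# Toy model: prediction w * x_n, loss 0.5 * (w * x_n - y_n)^2.
# Per-sample gradient w.r.t. w: g_n = (w * x_n - y_n) * x_n.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w = 1.5

per_sample_grads = [(w * x - y) * x for x, y in zip(xs, ys)]

# The batch-averaged gradient is what standard backpropagation returns ...
mean_grad = sum(per_sample_grads) / len(per_sample_grads)
# ... while the variance across samples is extra information that is normally
# lost when gradients are summed over the mini-batch.
var_grad = sum((g - mean_grad) ** 2 for g in per_sample_grads) / len(per_sample_grads)
```

In a deep network the same quantities exist per parameter; the engineering challenge the thesis addresses is extracting them without a separate backward pass per sample.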
Next, we showcase the utility of such information to better understand neural network training. We build
the Cockpit library that visualizes what is happening inside the model during training through various
instruments that rely on BackPACK’s statistics. We show how Cockpit provides a meaningful statistical
summary report to the deep learning engineer to identify bugs in their machine learning pipeline, guide
hyperparameter tuning, and study deep learning phenomena.
Finally, we use BackPACK’s extended automatic differentiation functionality to develop ViViT, an approach
to efficiently compute curvature information, in particular curvature noise. It uses the low-rank structure
of the generalized Gauss-Newton approximation to the Hessian and addresses shortcomings in existing
curvature approximations. Through monitoring curvature noise, we demonstrate how ViViT’s information
helps in understanding the challenges of making second-order optimization methods work in practice.
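The low-rank structure can be sketched in generic notation (assumptions mine): with per-sample Jacobians \(J_n\), a loss Hessian \(\Lambda_n\) with respect to the \(C\) model outputs, and a symmetric factorization \(\Lambda_n = S_n S_n^\top\), the generalized Gauss-Newton matrix is

```latex
G = \frac{1}{N} \sum_{n=1}^{N} J_n^\top \Lambda_n J_n
  = V V^\top,
\qquad
V = \frac{1}{\sqrt{N}}
  \begin{bmatrix} J_1^\top S_1 & \cdots & J_N^\top S_N \end{bmatrix}
```

Since \(G\) has rank at most \(NC\), its nonzero spectrum and per-sample directional curvatures can be obtained from the small Gram matrix \(V^\top V\) without ever forming \(G\).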
This work develops new tools to experiment more easily with higher-order information in complex deep
learning models. These tools have impacted works on Bayesian applications with Laplace approximations,
out-of-distribution generalization, differential privacy, and the design of automatic differentiation
systems. They constitute one important step towards developing and establishing more efficient deep
learning algorithms.
Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations
The local explanation provides heatmaps on images to explain how
Convolutional Neural Networks (CNNs) derive their output. Due to its visual
straightforwardness, the method has been one of the most popular explainable AI
(XAI) methods for diagnosing CNNs. Through our formative study (S1), however,
we found that ML engineers hold an ambivalent view of the local explanation:
it is a valuable and indispensable aid in building CNNs, yet the heuristic
process of detecting vulnerabilities with it exhausts them. Moreover,
steering the CNNs based on the vulnerabilities learned from the diagnosis seemed
highly challenging. To mitigate this gap, we designed DeepFuse, the first
interactive design that realizes the direct feedback loop between a user and
CNNs in diagnosing and revising CNN's vulnerability using local explanations.
DeepFuse helps CNN engineers systematically search for "unreasonable" local
explanations and annotate new boundaries for those identified as
unreasonable in a labor-efficient manner. Next, it steers the model based on
the given annotations so that the model does not repeat similar mistakes. We
conducted a two-day study (S2) with 12 experienced CNN engineers. Using
DeepFuse, participants made a more accurate and "reasonable" model than the
current state-of-the-art. Also, participants found that the way DeepFuse guides
case-based reasoning can practically improve their current practice. We provide
implications for design that explain how future HCI-driven design can move our
practice forward to make XAI-driven insights more actionable.
Comment: 32 pages, 6 figures, 5 tables. Accepted for publication in the Proceedings of the ACM on Human-Computer Interaction (PACM HCI), CSCW 202
FedForgery: Generalized Face Forgery Detection with Residual Federated Learning
With the continuous development of deep learning in the field of image
generation models, a large number of vivid forged faces have been generated and
spread on the Internet. These high-authenticity artifacts could grow into a
threat to societal security. Existing face forgery detection methods directly
utilize publicly shared or centralized data for training, but ignore the
personal privacy and security issues that arise when personal data cannot be
centrally shared in real-world scenarios. Additionally, the different
distributions caused by diverse artifact types further harm the forgery
detection task. To solve these problems, this
paper proposes a novel generalized residual Federated learning for face Forgery
detection (FedForgery). The designed variational autoencoder aims to learn
robust discriminative residual feature maps to detect forgery faces (with
diverse or even unknown artifact types). Furthermore, the general federated
learning strategy is introduced to construct distributed detection model
trained collaboratively with multiple local decentralized devices, which could
further boost the representation generalization. Experiments conducted on
publicly available face forgery detection datasets prove the superior
performance of the proposed FedForgery. The designed novel generalized face
forgery detection protocols and source code will be publicly available.
Comment: The code is available at https://github.com/GANG370/FedForgery. The paper has been accepted in the IEEE Transactions on Information Forensics & Security.
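The federated strategy can be sketched with the standard FedAvg aggregation step (a generic illustration, not FedForgery's exact procedure): each device trains locally, and only model parameters, never raw face data, are averaged on the server, weighted by local dataset size.

```python
def federated_average(client_params, client_sizes):
    """Size-weighted average of per-client parameter dictionaries."""
    total = sum(client_sizes)
    keys = client_params[0].keys()
    return {
        k: sum(p[k] * n for p, n in zip(client_params, client_sizes)) / total
        for k in keys
    }

# Two hypothetical clients holding a single scalar parameter each.
client_a = {"w": 1.0}
client_b = {"w": 3.0}
global_params = federated_average([client_a, client_b], client_sizes=[100, 300])
```

The larger client pulls the global parameter toward its local value, while the raw training images stay on-device, which is the privacy property the paper relies on.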
Local 3D Editing via 3D Distillation of CLIP Knowledge
3D content manipulation is an important computer vision task with many
real-world applications (e.g., product design, cartoon generation, and 3D
Avatar editing). Recently proposed 3D GANs can generate diverse photorealistic
3D-aware contents using Neural Radiance fields (NeRF). However, manipulation of
NeRF remains a challenging problem, since visual quality tends to degrade
after manipulation and suboptimal control handles, such as 2D semantic
maps, are used for manipulation. While text-guided manipulations have shown
potential in 3D editing, such approaches often lack locality. To overcome these
problems, we propose Local Editing NeRF (LENeRF), which only requires text
inputs for fine-grained and localized manipulation. Specifically, we present
three add-on modules of LENeRF, the Latent Residual Mapper, the Attention Field
Network, and the Deformation Network, which are jointly used for local
manipulations of 3D features by estimating a 3D attention field. The 3D
attention field is learned in an unsupervised way, by distilling the zero-shot
mask generation capability of CLIP to the 3D space with multi-view guidance. We
conduct diverse experiments and thorough evaluations both quantitatively and
qualitatively.
Comment: conference: CVPR 202
Knowledge Distillation and Continual Learning for Optimized Deep Neural Networks
Over the past few years, deep learning (DL) has been achieving state-of-the-art performance on various human tasks such as speech generation, language translation, image segmentation, and object detection. While traditional machine learning models require hand-crafted features, deep learning algorithms can automatically extract discriminative features and learn complex knowledge from large datasets. This powerful learning ability makes deep learning models attractive to both academia and big corporations.
Despite their popularity, deep learning methods still have two main limitations: large memory consumption and catastrophic knowledge forgetting. First, DL algorithms use very deep neural networks (DNNs) with billions of parameters, which have a large model size and slow inference speed. This restricts the application of DNNs in resource-constrained devices such as mobile phones and autonomous vehicles. Second, DNNs are known to suffer from catastrophic forgetting: when incrementally learning new tasks, the model's performance on old tasks drops significantly. The ability to accommodate new knowledge while retaining previously learned knowledge is called continual learning. Since the real-world environments in which a model operates are always evolving, a robust neural network needs this continual learning ability to adapt to new changes.
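Knowledge distillation, a standard remedy for the memory limitation, trains a small student to match a large teacher's softened output distribution. A minimal pure-Python sketch of the temperature-scaled distillation loss (the generic formulation, not tied to a specific method in this thesis):

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax at a given temperature.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions, scaled by
    # T^2 so gradient magnitudes stay comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    return temperature ** 2 * kl

loss_same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
loss_diff = distillation_loss([0.1, 1.0, 2.0], [2.0, 1.0, 0.1])
```

A higher temperature exposes the teacher's relative preferences among wrong classes ("dark knowledge"), which is the signal the compact student learns from.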
Seamless Multimodal Biometrics for Continuous Personalised Wellbeing Monitoring
Artificially intelligent perception is increasingly present in the lives of
every one of us. Vehicles are no exception, (...) In the near future, pattern
recognition will have an even stronger role in vehicles, as self-driving cars
will require automated ways to understand what is happening around (and within)
them and act accordingly. (...) This doctoral work focused on advancing
in-vehicle sensing through the research of novel computer vision and pattern
recognition methodologies for both biometrics and wellbeing monitoring. The
main focus has been on electrocardiogram (ECG) biometrics, a trait well-known
for its potential for seamless driver monitoring. Major efforts were devoted to
achieving improved performance in identification and identity verification in
off-the-person scenarios, well-known for increased noise and variability. Here,
end-to-end deep learning ECG biometric solutions were proposed and important
topics were addressed such as cross-database and long-term performance,
waveform relevance through explainability, and interlead conversion. Face
biometrics, a natural complement to the ECG in seamless unconstrained
scenarios, was also studied in this work. The open challenges of masked face
recognition and interpretability in biometrics were tackled in an effort to
evolve towards algorithms that are more transparent, trustworthy, and robust to
significant occlusions. Within the topic of wellbeing monitoring, improved
solutions to multimodal emotion recognition in groups of people and
activity/violence recognition in in-vehicle scenarios were proposed. At last,
we also proposed a novel way to learn template security within end-to-end
models, dismissing additional separate encryption processes, and a
self-supervised learning approach tailored to sequential data, in order to
ensure data security and optimal performance. (...)
Comment: Doctoral thesis presented and approved on the 21st of December 2022 to the University of Porto.