Inefficiency of K-FAC for Large Batch Size Training
In stochastic optimization, using large batch sizes during training can
leverage parallel resources to produce faster wall-clock training times per
training epoch. However, for both training loss and testing error, recent
results analyzing large batch Stochastic Gradient Descent (SGD) have found
sharp diminishing returns, beyond a certain critical batch size. In the hopes
of addressing this, it has been suggested that the Kronecker-Factored
Approximate Curvature (\mbox{K-FAC}) method allows for greater scalability to
large batch sizes, for non-convex machine learning problems such as neural
network optimization, as well as greater robustness to variation in model
hyperparameters. Here, we perform a detailed empirical analysis of large batch
size training for both \mbox{K-FAC} and SGD, evaluating performance in terms of
both wall-clock time and aggregate computational cost. Our main results are
twofold: first, we find that neither \mbox{K-FAC} nor SGD exhibits ideal
scalability behavior beyond a certain batch size, and that \mbox{K-FAC} does
not show improved large-batch scalability relative to SGD; and second, we find
that \mbox{K-FAC}, in addition to requiring more hyperparameters to tune,
suffers from hyperparameter sensitivity similar to that of SGD. We discuss
extensive results using ResNet and AlexNet on \mbox{CIFAR-10} and SVHN,
respectively, as well as more general implications of our findings.
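
To make the method under comparison concrete, here is a minimal sketch of a
layer-wise K-FAC preconditioned update for a single fully connected layer
(NumPy); the learning rate and damping values are illustrative assumptions,
not settings from the paper:

    import numpy as np

    def kfac_update(W, grad_W, acts, grads, lr=0.1, damping=1e-3):
        """One K-FAC step for a fully connected layer.

        acts:  (batch, n_in)  layer inputs a
        grads: (batch, n_out) gradients w.r.t. pre-activations g
        The Fisher block is approximated as the Kronecker product A x G with
        A = E[a a^T] and G = E[g g^T], so F^-1 grad_W ~= G^-1 grad_W A^-1.
        """
        n = acts.shape[0]
        A = acts.T @ acts / n        # input covariance, (n_in, n_in)
        G = grads.T @ grads / n      # pre-activation grad covariance, (n_out, n_out)
        A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
        G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
        return W - lr * (G_inv @ grad_W @ A_inv)   # grad_W is (n_out, n_in)
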
FDLS: A Deep Learning Approach to Production Quality, Controllable, and Retargetable Facial Performances
Visual effects work commonly requires both the creation of realistic synthetic
humans and the retargeting of actors' performances to humanoid characters such
as aliens and monsters. Achieving the expressive performances demanded in
entertainment requires manipulating complex models with hundreds of parameters.
Full creative control requires the freedom to make edits at any stage of the
production, which prohibits the use of a fully automatic ``black box'' solution
with uninterpretable parameters. On the other hand, producing realistic
animation with these sophisticated models is difficult and laborious. This
paper describes FDLS (Facial Deep Learning Solver), which is Weta Digital's
solution to these challenges. FDLS adopts a coarse-to-fine and
human-in-the-loop strategy, allowing a solved performance to be verified and
edited at several stages in the solving process. To train FDLS, we first
transform the raw motion-captured data into robust graph features. Second,
based on the observation that artists typically finalize the jaw pass
animation before proceeding to finer detail, we solve for the jaw motion first
and predict fine expressions with region-based networks conditioned on the jaw
position. Finally, artists can optionally invoke a non-linear finetuning
process on top of the FDLS solution to follow the motion-captured virtual
markers as closely as possible. FDLS supports editing where needed to improve
the results of the deep learning solution, and it can handle small daily changes in
the actor's face shape. FDLS permits reliable and production-quality
performance solving with minimal training and little or no manual effort in
many cases, while also allowing the solve to be guided and edited in unusual
and difficult cases. The system has been under development for several years
and has been used in major movies.
Comment: DigiPro '22: The Digital Production Symposium
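
As a rough illustration of the jaw-first, region-based strategy described
above, here is a hypothetical PyTorch sketch; the module names, layer sizes,
and four-region split are our assumptions, not FDLS's actual architecture:

    import torch
    import torch.nn as nn

    class JawFirstSolver(nn.Module):
        """Coarse-to-fine sketch: solve the jaw pass first, then predict fine
        expression controls with region-based networks conditioned on the jaw
        position (illustrative dimensions only)."""

        def __init__(self, feat_dim=256, jaw_dim=3, n_regions=4, ctrl_dim=32):
            super().__init__()
            self.jaw_net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                         nn.Linear(128, jaw_dim))
            self.region_nets = nn.ModuleList([
                nn.Sequential(nn.Linear(feat_dim + jaw_dim, 128), nn.ReLU(),
                              nn.Linear(128, ctrl_dim))
                for _ in range(n_regions)])

        def forward(self, graph_feats):
            jaw = self.jaw_net(graph_feats)                # coarse jaw pass
            cond = torch.cat([graph_feats, jaw], dim=-1)   # condition on jaw
            fine = [net(cond) for net in self.region_nets] # per-region controls
            return jaw, torch.cat(fine, dim=-1)
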
Directed Diffusion: Direct Control of Object Placement through Attention Guidance
Text-guided diffusion models such as DALLE-2, Imagen, and Stable Diffusion
are able to generate an effectively endless variety of images given only a
short text prompt describing the desired image content. In many cases the
images are of very high quality. However, these models often struggle to
compose scenes containing several key objects such as characters in specified
positional relationships. The missing capability to "direct" the placement of
characters and objects both within and across images is crucial in
storytelling, as recognized in the literature on film and animation theory. In
this work, we take a particularly straightforward approach to providing the
needed direction. Drawing on the observation that the cross-attention maps for
prompt words reflect the spatial layout of objects denoted by those words, we
introduce an optimization objective that produces ``activation'' at desired
positions in these cross-attention maps. The resulting approach is a step
toward generalizing the applicability of text-guided diffusion models beyond
single images to collections of related images, as in storybooks. To the best
of our knowledge, our Directed Diffusion method is the first diffusion
technique that provides positional control over multiple objects, while making
use of an existing pre-trained model and maintaining a coherent blend between
the positioned objects and the background. Moreover, it requires only a few
lines to implement.
Comment: Our project page:
https://hohonu-vicml.github.io/DirectedDiffusion.Pag
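
A minimal sketch of the kind of attention-guidance objective the abstract
describes, assuming a single prompt token and a binary target region; the
exact loss used in Directed Diffusion may differ:

    import torch

    def attention_direction_loss(attn_map, region_mask):
        """Encourage a prompt token's cross-attention to concentrate inside a
        target region. attn_map: one token's (H, W) cross-attention map;
        region_mask: binary (H, W) mask marking the desired position."""
        inside = (attn_map * region_mask).sum()
        total = attn_map.sum() + 1e-8
        return 1.0 - inside / total   # minimal when all attention is in-region

    # During sampling, one could backprop this loss into the latent at early
    # denoising steps: latent = latent - step_size * grad(loss, latent).
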
Derrick's theorem beyond a potential
Scalar field theories with derivative interactions are known to possess
solitonic excitations, but such solitons are generally unsatisfactory because
the effective theory fails precisely where nonlinearities responsible for the
solitons are important. A new class of theories possessing (internal) galilean
invariance can in principle bypass this difficulty. Here, we show that these
galileon theories do not possess stable solitonic solutions. As a by-product,
we show that no stable solitons exist for a different class of derivatively
coupled theories, describing for instance the infrared dynamics of superfluids,
fluids, solids and some k-essence models.
Comment: 4 pages
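
For context, the classical Derrick scaling argument that the title
generalizes: for a static scalar in $D$ spatial dimensions with energy
$E[\phi]=\int d^Dx\,[\tfrac12(\nabla\phi)^2+V(\phi)]$, $V\ge 0$, the rescaled
configuration $\phi_\lambda(x)\equiv\phi(\lambda x)$ gives

\[
E[\phi_\lambda] = \lambda^{2-D} E_{\rm grad} + \lambda^{-D} E_{\rm pot},
\qquad
\left.\frac{dE[\phi_\lambda]}{d\lambda}\right|_{\lambda=1}
= (2-D)\,E_{\rm grad} - D\,E_{\rm pot} = 0,
\]

which has no solution with $E_{\rm grad},E_{\rm pot}>0$ for $D\ge 2$, so
static solitons cannot be supported by a potential alone; this is what
motivates asking whether derivatively coupled (galileon) theories evade the
obstruction.
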
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Transformer-based architectures have become the de facto models used for a range
of Natural Language Processing tasks. In particular, the BERT based models
achieved significant accuracy gain for GLUE tasks, CoNLL-03 and SQuAD. However,
BERT based models have a prohibitive memory footprint and latency. As a result,
deploying BERT based models in resource constrained environments has become a
challenging task. In this work, we perform an extensive analysis of fine-tuned
BERT models using second order Hessian information, and we use our results to
propose a novel method for quantizing BERT models to ultra low precision. In
particular, we propose a new group-wise quantization scheme, and we use a
Hessian-based mixed-precision method to compress the model further. We
extensively test our proposed method on BERT downstream tasks of SST-2, MNLI,
CoNLL-03, and SQuAD. We can achieve comparable performance to baseline with at
most 2.3\% performance degradation, even with ultra-low precision quantization
down to 2 bits, corresponding to up to 13$\times$ compression of the model
parameters, and up to 4$\times$ compression of the embedding table as well as
activations. Among all tasks, we observed the highest performance loss
for BERT fine-tuned on SQuAD. By probing into the Hessian-based analysis as
well as visualization, we show that this is related to the fact that the
current training/fine-tuning strategy of BERT does not converge for SQuAD.
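
To illustrate the two ingredients named above, here is a toy NumPy sketch of
Hessian-aware bit assignment followed by group-wise uniform quantization; the
tier scheme, bit budget, and group size are our assumptions, not Q-BERT's
exact procedure:

    import numpy as np

    def assign_bits(layer_eigs, budget_bits=(2, 4, 8)):
        """Layers whose loss surface is sharper (larger top Hessian
        eigenvalue) get more bits; flat layers tolerate lower precision."""
        order = np.argsort(layer_eigs)               # flattest layers first
        bits = np.empty(len(layer_eigs), dtype=int)
        for rank, idx in enumerate(order):
            tier = rank * len(budget_bits) // len(layer_eigs)
            bits[idx] = budget_bits[tier]            # sharper -> higher tier
        return bits

    def quantize_groupwise(w, bits, group_size=128):
        """Symmetric uniform quantization with a separate scale per group of
        weights (assumes w.size is divisible by group_size)."""
        flat = w.reshape(-1, group_size)
        scale = np.abs(flat).max(axis=1, keepdims=True) / (2**(bits - 1) - 1)
        scale = np.maximum(scale, 1e-12)             # guard all-zero groups
        q = np.round(flat / scale).clip(-(2**(bits - 1)), 2**(bits - 1) - 1)
        return (q * scale).reshape(w.shape)
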
Resident Immune Cells of the Liver in the Tumor Microenvironment
The liver is a central immunomodulator that ensures a homeostatic balance between protection and immunotolerance. A hallmark of hepatocellular carcinoma (HCC) is the deregulation of this tightly controlled immunological network. Immune response in the liver involves a complex interplay between resident innate and adaptive immune cells. The immune response in the liver is modulated by its continuous exposure to toxic molecules and microorganisms, which requires a degree of immune tolerance to protect normal tissue from damage. In HCC pathogenesis, immune cells must balance a dual role that includes the elimination of malignant cells, as well as the repair of damaged liver tissue to maintain homeostasis. The immune response of the innate and adaptive immune systems extends to the cross-talk and interaction involving immune-regulating non-hematopoietic cells, myeloid immune cells, and lymphoid immune cells. In this review, we discuss the different immune responses of resident immune cells in the tumor microenvironment. Current FDA-approved targeted therapies, including immunotherapy options, have produced modest results to date for the treatment of advanced HCC. Although immunotherapy has demonstrated its potential efficacy, immune cell pathways need to be better understood. We summarize the roles of specific resident immune cell subsets and their cross-talk subversion in HCC pathogenesis, with a view to identifying potential new biomarkers and therapy options.
Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs
Using FPGAs to accelerate ConvNets has attracted significant attention in
recent years. However, FPGA accelerator design has not leveraged the latest
progress of ConvNets. As a result, the key application characteristics such as
frames-per-second (FPS) are ignored in favor of simply counting GOPs, and
results on accuracy, which is critical to application success, are often not
even reported. In this work, we adopt an algorithm-hardware co-design approach
to develop a ConvNet accelerator called Synetgy and a novel ConvNet model
called DiracDeltaNet. Both the accelerator and ConvNet are tailored
to FPGA requirements. DiracDeltaNet, as the name suggests, is a ConvNet with
only 1$\times$1 convolutions, while spatial convolutions are replaced by more
efficient shift operations. DiracDeltaNet achieves competitive accuracy on
ImageNet (88.7\% top-5), but with 42$\times$ fewer parameters and 48$\times$
fewer OPs than VGG16. We further quantize DiracDeltaNet's weights and
activations to 4 bits, with less than 1\% accuracy loss. These quantizations
exploit well the nature of FPGA hardware. In short, DiracDeltaNet's small model
size, low computational OP count, low precision and simplified operators allow
us to co-design a highly customized computing unit for an FPGA. We implement
the computing units for DiracDeltaNet on an Ultra96 SoC system through
high-level synthesis. Our accelerator's final top-5 accuracy of 88.1\% on
ImageNet is higher than that of all previously reported embedded FPGA
accelerators. In addition, the accelerator reaches an inference speed of 66.3
FPS on the ImageNet classification task, surpassing prior works with similar
accuracy by at least 11.6$\times$.
Comment: Update to the latest results
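
As an illustration of the shift operations mentioned above, here is a minimal
PyTorch sketch of a parameter-free shift layer; the group layout and shift
directions are illustrative assumptions rather than DiracDeltaNet's exact
kernel:

    import torch

    def shift(x, groups=4):
        """Parameter-free shift of the kind that replaces spatial
        convolutions: each channel group is moved one pixel in a different
        direction, and a following 1x1 convolution (not shown) mixes
        channels. Assumes the channel count is divisible by `groups`."""
        chunks = x.chunk(groups, dim=1)
        dirs = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up
        shifted = [torch.roll(c, shifts=d, dims=(2, 3))
                   for c, d in zip(chunks, dirs)]
        return torch.cat(shifted, dim=1)

    # A "shift + 1x1 conv" block then stands in for a 3x3 convolution:
    # y = conv1x1(shift(x))
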
Learning Evaluation: Blending Quality Improvement and Implementation Research Methods to Study Healthcare Innovations
Background: In healthcare change interventions, on-the-ground learning about the implementation process is often lost because of a primary focus on outcome improvements. This paper describes the Learning Evaluation, a methodological approach that blends quality improvement and implementation research methods to study healthcare innovations.
Methods: Learning Evaluation is an approach to multi-organization assessment. Qualitative and quantitative data are collected to conduct real-time assessment of implementation processes while also assessing changes in context, facilitating quality improvement using run charts and audit and feedback, and generating transportable lessons. Five principles are the foundation of this approach: (1) gather data to describe changes made by healthcare organizations and how changes are implemented; (2) collect process and outcome data relevant to healthcare organizations and to the research team; (3) assess multi-level contextual factors that affect implementation, process, outcome, and transportability; (4) assist healthcare organizations in using data for continuous quality improvement; and (5) operationalize common measurement strategies to generate transportable results.
Results: Learning Evaluation principles are applied across organizations by the following: (1) establishing a detailed understanding of the baseline implementation plan; (2) identifying target populations and tracking relevant process measures; (3) collecting and analyzing real-time quantitative and qualitative data on important contextual factors; (4) synthesizing data and emerging findings and sharing with stakeholders on an ongoing basis; and (5) harmonizing and fostering learning from process and outcome data. Application to a multi-site program focused on primary care and behavioral health integration shows the feasibility and utility of Learning Evaluation for generating real-time insights into evolving implementation processes.
Conclusions: Learning Evaluation generates systematic and rigorous cross-organizational findings about implementing healthcare innovations while also enhancing organizational capacity and accelerating translation of findings by facilitating continuous learning within individual sites. Researchers evaluating change initiatives and healthcare organizations implementing improvement initiatives may benefit from a Learning Evaluation approach.
MeerKLASS: MeerKAT Large Area Synoptic Survey
We discuss the ground-breaking science that will be possible with a wide area
survey, using the MeerKAT telescope, known as MeerKLASS (MeerKAT Large Area
Synoptic Survey). The current specifications of MeerKAT make it a great fit for
science applications that require large survey speeds but not necessarily high
angular resolutions. In particular, for cosmology, a large survey over
4,000\,deg$^2$ for about 4,000 hours will potentially provide the first
ever measurements of the baryon acoustic oscillations using the 21cm intensity
mapping technique, with enough accuracy to impose constraints on the nature of
dark energy. The combination with multi-wavelength data will give unique
additional information, such as exquisite constraints on primordial
non-Gaussianity using the multi-tracer technique, as well as a better handle on
foregrounds and systematics. Such a wide survey with MeerKAT is also a great
match for HI galaxy studies, providing unrivalled statistics in the pre-SKA era
for galaxies resolved in the HI emission line beyond local structures at z >
0.01. It will also produce a large continuum galaxy sample down to a depth of
about 5\,$\mu$Jy in L-band, which is quite unique over such large areas and
will allow studies of the large-scale structure of the Universe out to high
redshifts, complementing the galaxy HI survey to form a transformational
multi-wavelength approach to study galaxy dynamics and evolution. Finally, the
same survey will supply unique information for a range of other science
applications, including a large statistical investigation of galaxy clusters as
well as produce a rotation measure map across a huge swathe of the sky. The
MeerKLASS survey will be a crucial step on the road to using SKA1-MID for
cosmological applications and other commensal surveys, as described in the top
priority SKA key science projects (abridged).
Comment: Larger version of the paper submitted to the Proceedings of Science,
"MeerKAT Science: On the Pathway to the SKA", Stellenbosch, 25-27 May 2016