Inefficiency of K-FAC for Large Batch Size Training
In stochastic optimization, using large batch sizes during training can
leverage parallel resources to produce faster wall-clock training times per
training epoch. However, for both training loss and testing error, recent
results analyzing large batch Stochastic Gradient Descent (SGD) have found
sharp diminishing returns, beyond a certain critical batch size. In the hopes
of addressing this, it has been suggested that the Kronecker-Factored
Approximate Curvature (\mbox{K-FAC}) method allows for greater scalability to
large batch sizes, for non-convex machine learning problems such as neural
network optimization, as well as greater robustness to variation in model
hyperparameters. Here, we perform a detailed empirical analysis of large batch
size training for both \mbox{K-FAC} and SGD, evaluating performance in terms of
both wall-clock time and aggregate computational cost. Our main results are
twofold: first, we find that neither \mbox{K-FAC} nor SGD exhibits ideal
scalability behavior beyond a certain batch size, and that \mbox{K-FAC} does
not show improved large-batch scalability relative to SGD; and second, we find
that \mbox{K-FAC}, in addition to requiring more hyperparameters to tune,
suffers from hyperparameter sensitivity similar to that of SGD. We discuss
extensive results using ResNet and AlexNet on \mbox{CIFAR-10} and SVHN,
respectively, as well as more general implications of our findings.
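
To make the method under comparison concrete, here is a minimal sketch of a
layer-wise K-FAC preconditioned update for a single fully connected layer
(NumPy); the learning rate and damping values are illustrative assumptions,
not settings from the paper:

    import numpy as np

    def kfac_update(W, grad_W, acts, grads, lr=0.1, damping=1e-3):
        """One K-FAC step for a fully connected layer.

        acts:  (batch, n_in)  layer inputs a
        grads: (batch, n_out) gradients w.r.t. pre-activations g
        The Fisher block is approximated as the Kronecker product A x G with
        A = E[a a^T] and G = E[g g^T], so F^-1 grad_W ~= G^-1 grad_W A^-1.
        """
        n = acts.shape[0]
        A = acts.T @ acts / n        # input covariance, (n_in, n_in)
        G = grads.T @ grads / n      # pre-activation grad covariance, (n_out, n_out)
        A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
        G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
        return W - lr * (G_inv @ grad_W @ A_inv)   # grad_W is (n_out, n_in)
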
FDLS: A Deep Learning Approach to Production Quality, Controllable, and Retargetable Facial Performances
Visual effects work commonly requires both the creation of realistic synthetic
humans and the retargeting of actors' performances to humanoid characters such
as aliens and monsters. Achieving the expressive performances demanded in
entertainment requires manipulating complex models with hundreds of parameters.
Full creative control requires the freedom to make edits at any stage of the
production, which prohibits the use of a fully automatic ``black box'' solution
with uninterpretable parameters. On the other hand, producing realistic
animation with these sophisticated models is difficult and laborious. This
paper describes FDLS (Facial Deep Learning Solver), which is Weta Digital's
solution to these challenges. FDLS adopts a coarse-to-fine and
human-in-the-loop strategy, allowing a solved performance to be verified and
edited at several stages in the solving process. To train FDLS, we first
transform the raw motion-captured data into robust graph features. Second,
based on the observation that artists typically finalize the jaw pass
animation before proceeding to finer detail, we solve for the jaw motion first
and predict fine expressions with region-based networks conditioned on the jaw
position. Finally, artists can optionally invoke a non-linear finetuning
process on top of the FDLS solution to follow the motion-captured virtual
markers as closely as possible. FDLS supports editing where needed to improve
the results of the deep learning solution, and it can handle small daily changes in
the actor's face shape. FDLS permits reliable and production-quality
performance solving with minimal training and little or no manual effort in
many cases, while also allowing the solve to be guided and edited in unusual
and difficult cases. The system has been under development for several years
and has been used in major movies.
Comment: DigiPro '22: The Digital Production Symposium
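
As a rough illustration of the jaw-first, region-based strategy described
above, here is a hypothetical PyTorch sketch; the module names, layer sizes,
and four-region split are our assumptions, not FDLS's actual architecture:

    import torch
    import torch.nn as nn

    class JawFirstSolver(nn.Module):
        """Coarse-to-fine sketch: solve the jaw pass first, then predict fine
        expression controls with region-based networks conditioned on the jaw
        position (illustrative dimensions only)."""

        def __init__(self, feat_dim=256, jaw_dim=3, n_regions=4, ctrl_dim=32):
            super().__init__()
            self.jaw_net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                         nn.Linear(128, jaw_dim))
            self.region_nets = nn.ModuleList([
                nn.Sequential(nn.Linear(feat_dim + jaw_dim, 128), nn.ReLU(),
                              nn.Linear(128, ctrl_dim))
                for _ in range(n_regions)])

        def forward(self, graph_feats):
            jaw = self.jaw_net(graph_feats)                # coarse jaw pass
            cond = torch.cat([graph_feats, jaw], dim=-1)   # condition on jaw
            fine = [net(cond) for net in self.region_nets] # per-region controls
            return jaw, torch.cat(fine, dim=-1)
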
Directed Diffusion: Direct Control of Object Placement through Attention Guidance
Text-guided diffusion models such as DALLE-2, Imagen, and Stable Diffusion
are able to generate an effectively endless variety of images given only a
short text prompt describing the desired image content. In many cases the
images are of very high quality. However, these models often struggle to
compose scenes containing several key objects such as characters in specified
positional relationships. The missing capability to "direct" the placement of
characters and objects both within and across images is crucial in
storytelling, as recognized in the literature on film and animation theory. In
this work, we take a particularly straightforward approach to providing the
needed direction. Drawing on the observation that the cross-attention maps for
prompt words reflect the spatial layout of objects denoted by those words, we
introduce an optimization objective that produces ``activation'' at desired
positions in these cross-attention maps. The resulting approach is a step
toward generalizing the applicability of text-guided diffusion models beyond
single images to collections of related images, as in storybooks. To the best
of our knowledge, our Directed Diffusion method is the first diffusion
technique that provides positional control over multiple objects, while making
use of an existing pre-trained model and maintaining a coherent blend between
the positioned objects and the background. Moreover, it requires only a few
lines to implement.
Comment: Our project page:
https://hohonu-vicml.github.io/DirectedDiffusion.Pag
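
A minimal sketch of the kind of attention-guidance objective the abstract
describes, assuming a single prompt token and a binary target region; the
exact loss used in Directed Diffusion may differ:

    import torch

    def attention_direction_loss(attn_map, region_mask):
        """Encourage a prompt token's cross-attention to concentrate inside a
        target region. attn_map: one token's (H, W) cross-attention map;
        region_mask: binary (H, W) mask marking the desired position."""
        inside = (attn_map * region_mask).sum()
        total = attn_map.sum() + 1e-8
        return 1.0 - inside / total   # minimal when all attention is in-region

    # During sampling, one could backprop this loss into the latent at early
    # denoising steps: latent = latent - step_size * grad(loss, latent).
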
Derrick's theorem beyond a potential
Scalar field theories with derivative interactions are known to possess
solitonic excitations, but such solitons are generally unsatisfactory because
the effective theory fails precisely where nonlinearities responsible for the
solitons are important. A new class of theories possessing (internal) galilean
invariance can in principle bypass this difficulty. Here, we show that these
galileon theories do not possess stable solitonic solutions. As a by-product,
we show that no stable solitons exist for a different class of derivatively
coupled theories, describing for instance the infrared dynamics of superfluids,
fluids, solids and some k-essence models.
Comment: 4 pages
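
For context, the classical Derrick scaling argument that the title
generalizes: for a static scalar in $D$ spatial dimensions with energy
$E[\phi]=\int d^Dx\,[\tfrac12(\nabla\phi)^2+V(\phi)]$, $V\ge 0$, the rescaled
configuration $\phi_\lambda(x)\equiv\phi(\lambda x)$ gives

\[
E[\phi_\lambda] = \lambda^{2-D} E_{\rm grad} + \lambda^{-D} E_{\rm pot},
\qquad
\left.\frac{dE[\phi_\lambda]}{d\lambda}\right|_{\lambda=1}
= (2-D)\,E_{\rm grad} - D\,E_{\rm pot} = 0,
\]

which has no solution with $E_{\rm grad},E_{\rm pot}>0$ for $D\ge 2$, so
static solitons cannot be supported by a potential alone; this is what
motivates asking whether derivatively coupled (galileon) theories evade the
obstruction.
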
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Transformer-based architectures have become the de facto models used for a range
of Natural Language Processing tasks. In particular, the BERT based models
achieved significant accuracy gain for GLUE tasks, CoNLL-03 and SQuAD. However,
BERT based models have a prohibitive memory footprint and latency. As a result,
deploying BERT based models in resource constrained environments has become a
challenging task. In this work, we perform an extensive analysis of fine-tuned
BERT models using second order Hessian information, and we use our results to
propose a novel method for quantizing BERT models to ultra low precision. In
particular, we propose a new group-wise quantization scheme, and we use a
Hessian-based mixed-precision method to compress the model further. We
extensively test our proposed method on BERT downstream tasks of SST-2, MNLI,
CoNLL-03, and SQuAD. We can achieve comparable performance to baseline with at
most 2.3\% performance degradation, even with ultra-low precision quantization
down to 2 bits, corresponding to up to 13$\times$ compression of the model
parameters, and up to 4$\times$ compression of the embedding table as well as
activations. Among all tasks, we observed the highest performance loss
for BERT fine-tuned on SQuAD. By probing into the Hessian-based analysis as
well as visualization, we show that this is related to the fact that the
current training/fine-tuning strategy of BERT does not converge for SQuAD.
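
To illustrate the two ingredients named above, here is a toy NumPy sketch of
Hessian-aware bit assignment followed by group-wise uniform quantization; the
tier scheme, bit budget, and group size are our assumptions, not Q-BERT's
exact procedure:

    import numpy as np

    def assign_bits(layer_eigs, budget_bits=(2, 4, 8)):
        """Layers whose loss surface is sharper (larger top Hessian
        eigenvalue) get more bits; flat layers tolerate lower precision."""
        order = np.argsort(layer_eigs)               # flattest layers first
        bits = np.empty(len(layer_eigs), dtype=int)
        for rank, idx in enumerate(order):
            tier = rank * len(budget_bits) // len(layer_eigs)
            bits[idx] = budget_bits[tier]            # sharper -> higher tier
        return bits

    def quantize_groupwise(w, bits, group_size=128):
        """Symmetric uniform quantization with a separate scale per group of
        weights (assumes w.size is divisible by group_size)."""
        flat = w.reshape(-1, group_size)
        scale = np.abs(flat).max(axis=1, keepdims=True) / (2**(bits - 1) - 1)
        scale = np.maximum(scale, 1e-12)             # guard all-zero groups
        q = np.round(flat / scale).clip(-(2**(bits - 1)), 2**(bits - 1) - 1)
        return (q * scale).reshape(w.shape)
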
Resident Immune Cells of the Liver in the Tumor Microenvironment
The liver is a central immunomodulator that ensures a homeostatic balance between protection and immunotolerance. A hallmark of hepatocellular carcinoma (HCC) is the deregulation of this tightly controlled immunological network. Immune response in the liver involves a complex interplay between resident innate and adaptive immune cells. The immune response in the liver is modulated by its continuous exposure to toxic molecules and microorganisms, which requires a degree of immune tolerance to protect normal tissue from damage. In HCC pathogenesis, immune cells must balance a dual role that includes the elimination of malignant cells, as well as the repair of damaged liver tissue to maintain homeostasis. The immune response of the innate and adaptive immune systems extends to the cross-talk and interaction involving immune-regulating non-hematopoietic cells, myeloid immune cells, and lymphoid immune cells. In this review, we discuss the different immune responses of resident immune cells in the tumor microenvironment. Current FDA-approved targeted therapies, including immunotherapy options, have produced modest results to date for the treatment of advanced HCC. Although immunotherapy has demonstrated its potential efficacy, immune cell pathways need to be better understood. We summarize the roles of specific resident immune cell subsets and their cross-talk subversion in HCC pathogenesis, with a view to identifying potential new biomarkers and therapy options.
Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs
Using FPGAs to accelerate ConvNets has attracted significant attention in
recent years. However, FPGA accelerator design has not leveraged the latest
progress of ConvNets. As a result, the key application characteristics such as
frames-per-second (FPS) are ignored in favor of simply counting GOPs, and
results on accuracy, which is critical to application success, are often not
even reported. In this work, we adopt an algorithm-hardware co-design approach
to develop a ConvNet accelerator called Synetgy and a novel ConvNet model
called DiracDeltaNet. Both the accelerator and ConvNet are tailored
to FPGA requirements. DiracDeltaNet, as the name suggests, is a ConvNet with
only 1$\times$1 convolutions, while spatial convolutions are replaced by more
efficient shift operations. DiracDeltaNet achieves competitive accuracy on
ImageNet (88.7\% top-5), but with 42$\times$ fewer parameters and 48$\times$
fewer OPs than VGG16. We further quantize DiracDeltaNet's weights and
activations to 4 bits, with less than 1\% accuracy loss. These quantizations
exploit well the nature of FPGA hardware. In short, DiracDeltaNet's small model
size, low computational OP count, low precision and simplified operators allow
us to co-design a highly customized computing unit for an FPGA. We implement
the computing units for DiracDeltaNet on an Ultra96 SoC system through
high-level synthesis. Our accelerator's final top-5 accuracy of 88.1\% on
ImageNet is higher than that of all previously reported embedded FPGA
accelerators. In addition, the accelerator reaches an inference speed of 66.3
FPS on the ImageNet classification task, surpassing prior works with similar
accuracy by at least 11.6$\times$.
Comment: Update to the latest results
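
As an illustration of the shift operations mentioned above, here is a minimal
PyTorch sketch of a parameter-free shift layer; the group layout and shift
directions are illustrative assumptions rather than DiracDeltaNet's exact
kernel:

    import torch

    def shift(x, groups=4):
        """Parameter-free shift of the kind that replaces spatial
        convolutions: each channel group is moved one pixel in a different
        direction, and a following 1x1 convolution (not shown) mixes
        channels. Assumes the channel count is divisible by `groups`."""
        chunks = x.chunk(groups, dim=1)
        dirs = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up
        shifted = [torch.roll(c, shifts=d, dims=(2, 3))
                   for c, d in zip(chunks, dirs)]
        return torch.cat(shifted, dim=1)

    # A "shift + 1x1 conv" block then stands in for a 3x3 convolution:
    # y = conv1x1(shift(x))
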
Learning Evaluation: Blending Quality Improvement and Implementation Research Methods to Study Healthcare Innovations
Background: In healthcare change interventions, on-the-ground learning about the implementation process is often lost because of a primary focus on outcome improvements. This paper describes the Learning Evaluation, a methodological approach that blends quality improvement and implementation research methods to study healthcare innovations.
Methods: Learning Evaluation is an approach to multi-organization assessment. Qualitative and quantitative data are collected to conduct real-time assessment of implementation processes while also assessing changes in context, facilitating quality improvement using run charts and audit and feedback, and generating transportable lessons. Five principles are the foundation of this approach: (1) gather data to describe changes made by healthcare organizations and how changes are implemented; (2) collect process and outcome data relevant to healthcare organizations and to the research team; (3) assess multi-level contextual factors that affect implementation, process, outcome, and transportability; (4) assist healthcare organizations in using data for continuous quality improvement; and (5) operationalize common measurement strategies to generate transportable results.
Results: Learning Evaluation principles are applied across organizations by the following: (1) establishing a detailed understanding of the baseline implementation plan; (2) identifying target populations and tracking relevant process measures; (3) collecting and analyzing real-time quantitative and qualitative data on important contextual factors; (4) synthesizing data and emerging findings and sharing with stakeholders on an ongoing basis; and (5) harmonizing and fostering learning from process and outcome data. Application to a multi-site program focused on primary care and behavioral health integration shows the feasibility and utility of Learning Evaluation for generating real-time insights into evolving implementation processes.
Conclusions: Learning Evaluation generates systematic and rigorous cross-organizational findings about implementing healthcare innovations while also enhancing organizational capacity and accelerating translation of findings by facilitating continuous learning within individual sites. Researchers evaluating change initiatives and healthcare organizations implementing improvement initiatives may benefit from a Learning Evaluation approach.
MeerKLASS: MeerKAT Large Area Synoptic Survey
We discuss the ground-breaking science that will be possible with a wide area
survey, using the MeerKAT telescope, known as MeerKLASS (MeerKAT Large Area
Synoptic Survey). The current specifications of MeerKAT make it a great fit for
science applications that require large survey speeds but not necessarily high
angular resolutions. In particular, for cosmology, a large survey over
4,000\,deg$^2$ for about 4,000 hours will potentially provide the first
ever measurements of the baryon acoustic oscillations using the 21cm intensity
mapping technique, with enough accuracy to impose constraints on the nature of
dark energy. The combination with multi-wavelength data will give unique
additional information, such as exquisite constraints on primordial
non-Gaussianity using the multi-tracer technique, as well as a better handle on
foregrounds and systematics. Such a wide survey with MeerKAT is also a great
match for HI galaxy studies, providing unrivalled statistics in the pre-SKA era
for galaxies resolved in the HI emission line beyond local structures at z >
0.01. It will also produce a large continuum galaxy sample down to a depth of
about 5\,$\mu$Jy in L-band, which is quite unique over such large areas and
will allow studies of the large-scale structure of the Universe out to high
redshifts, complementing the galaxy HI survey to form a transformational
multi-wavelength approach to study galaxy dynamics and evolution. Finally, the
same survey will supply unique information for a range of other science
applications, including a large statistical investigation of galaxy clusters as
well as produce a rotation measure map across a huge swathe of the sky. The
MeerKLASS survey will be a crucial step on the road to using SKA1-MID for
cosmological applications and other commensal surveys, as described in the top
priority SKA key science projects (abridged).
Comment: Larger version of the paper submitted to the Proceedings of Science,
"MeerKAT Science: On the Pathway to the SKA", Stellenbosch, 25-27 May 2016