256 research outputs found

    Law of Balance and Stationary Distribution of Stochastic Gradient Descent

    The stochastic gradient descent (SGD) algorithm is the algorithm we use to train neural networks. However, it remains poorly understood how SGD navigates the highly nonlinear and degenerate loss landscape of a neural network. In this work, we prove that the minibatch noise of SGD regularizes the solution towards a balanced solution whenever the loss function contains a rescaling symmetry. Because the difference between a simple diffusion process and SGD dynamics is most significant when symmetries are present, our theory implies that the loss function symmetries constitute an essential probe of how SGD works. We then apply this result to derive the stationary distribution of stochastic gradient flow for a diagonal linear network with arbitrary depth and width. The stationary distribution exhibits complicated nonlinear phenomena such as phase transitions, broken ergodicity, and fluctuation inversion. These phenomena are shown to exist uniquely in deep networks, implying a fundamental difference between deep and shallow models. Comment: Preprint

    The Attenuating Effect of Intelligent Agents and Agent Autonomy on Managers’ Ability to Diffuse Responsibility for and Engage in Earnings Management

    Advances in IT suggest that computerized intelligent agents (IAs) may soon occupy many roles that presently employ human agents. A significant concern is the ethical conduct of those who use IAs, including their possible utilization by managers to engage in earnings management. Following economics and moral disengagement theory, we investigate how financial reporting decisions are affected when they are supported by the work of an IA versus a human agent, with varying autonomy. In a 2 x 2 between-participants experiment with experienced managers, we manipulate agent type and autonomy, finding that managers engage in less aggressive financial reporting decisions with IAs than with human agents, and engage in less aggressive reporting decisions with less autonomous agents than with more autonomous agents. Path analysis suggests that managers’ perception of control over their agent and ability to diffuse responsibility for their financial reporting decisions serially mediate the effect of agent type and autonomy on managers’ financial reporting decisions. Our results have implications for regulators and practitioners, where the adoption of computerized intelligent agents can attenuate managers’ earnings management activity by preventing them from diffusing responsibility for their actions to others.

    SGD with a Constant Large Learning Rate Can Converge to Local Maxima

    Previous works on stochastic gradient descent (SGD) often focus on its success. In this work, we construct worst-case optimization problems illustrating that, when not in the regimes that the previous works often assume, SGD can exhibit many strange and potentially undesirable behaviors. Specifically, we construct landscapes and data distributions such that (1) SGD converges to local maxima, (2) SGD escapes saddle points arbitrarily slowly, (3) SGD prefers sharp minima over flat ones, and (4) AMSGrad converges to local maxima. We also realize results in a minimal neural network-like example. Our results highlight the importance of simultaneously analyzing the minibatch sampling, discrete-time update rules, and realistic landscapes to understand the role of SGD in deep learning. Comment: ICLR 2022 Spotlight

    An Essential Farnesylated Kinesin in Trypanosoma brucei

    Kinesins are a family of motor proteins conserved throughout eukaryotes. In our present study we characterize a novel kinesin, KinesinCaaX, orthologs of which are only found in the kinetoplastids and not other eukaryotes. KinesinCaaX has the CVIM amino acids at the C-terminus, and CVIM was previously shown to be an ideal signal for protein farnesylation in T. brucei. In this study we show KinesinCaaX is farnesylated using radiolabeling studies and that farnesylation is dependent on the CVIM motif. Using RNA interference, we show KinesinCaaX is essential for T. brucei proliferation. Additionally, T. brucei depleted of KinesinCaaX by RNAi are 4-fold more sensitive to the protein farnesyltransferase (PFT) inhibitor LN-59, suggesting that KinesinCaaX is a target of PFT inhibitors' action to block proliferation of T. brucei. Using tetracycline-induced exogenous tagged KinesinCaaX and KinesinCVIMdeletion (non-farnesylated Kinesin) expression lines in T. brucei, we demonstrate KinesinCaaX is farnesylated in T. brucei cells and this farnesylation has functional effects. In cells expressing a CaaX-deleted version of Kinesin, the localization is more diffuse, which suggests correct localization depends on farnesylation. Our investigation of the cell cycle, nucleus and kinetoplast quantitation, and immunofluorescence assays suggests an important role for KinesinCaaX in the separation of nuclei and kinetoplasts during and after their replication. Taken together, our work suggests KinesinCaaX is a target of PFT inhibition of T. brucei cell proliferation and KinesinCaaX functions through both the motor and farnesyl groups.

    Editorial: Celebrating Microbial Diversity: The Many Cell Cycles of Eukaryotic Microbes.

    Editorial on the Research Topic Celebrating Microbial Diversity: The Many Cell Cycles of Eukaryotic Microbes. CM: ERC research grant ‘Plasmocycle’. ZL: NIH R01 grant AI101437. MB: Swiss National Science Foundation 31003A_179321.

    Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach

    Text recognition methods are developing rapidly. Some advanced techniques, e.g., powerful modules, language models, and un- and semi-supervised learning schemes, have successively pushed the performance on public benchmarks forward. However, the problem of how to better optimize a text recognition model from the perspective of loss functions is largely overlooked. CTC-based methods, widely used in practice due to their good balance between performance and inference speed, still grapple with accuracy degradation. This is because CTC loss emphasizes the optimization of the entire sequence target while neglecting to learn individual characters. We propose a self-distillation scheme for CTC-based models to address this issue. It incorporates a framewise regularization term in CTC loss to emphasize individual supervision, and leverages the maximum a posteriori estimate of the latent alignment to solve the inconsistency problem that arises in distillation between CTC-based models. We refer to the regularized CTC loss as Distillation Connectionist Temporal Classification (DCTC) loss. DCTC loss is module-free, requiring no extra parameters, longer inference lag, or additional training data or phases. Extensive experiments on public benchmarks demonstrate that DCTC can boost text recognition model accuracy by up to 2.6%, without any of these drawbacks. Comment: Ziyin Zhang and Ning Lu are co-first authors

    Re-channelization of turbidity currents in South China Sea abyssal plain due to seamounts and ridges

    Turbidity currents can be characterized as net-erosive, net-depositional or net-bypassing. Whether a flow is erosive, depositional or bypassing depends on the flow velocity, concentration and size, but these can also be impacted by external controls such as the degree of confinement, slope gradient and substrate type and erodibility. Our understanding of the relative importance of these controls comes from laboratory experiments and numerical modelling, from field data made available by the proliferation of high-resolution 3D seismic and bathymetric data, and from the outcrop and rock record. In this study, based on extensive multibeam and seismic reflection surveys in combination with International Ocean Discovery Program cores from the South China Sea, we document a new mechanism of turbidity current transformation from depositional to erosive, resulting in channel incision. We show how confinement by seamounts and bedrock highs of previously unconfined turbidity currents has resulted in the development of seafloor channels. These channels are inferred to be the result of confinement of flows that have traversed the abyssal plain, leading to flow acceleration that allows them to erode the seafloor substrate. This interpretation is further supported by the coarsening of flow deposits within the area of the seamounts, indicating that confinement has increased flow competency, allowing turbidity currents to carry larger volumes of coarse sediment, which has been deposited in this region. This basin-scale depositional pattern suggests that pre-established basin topography can exert an important control on sedimentation, which can impact characteristics such as potential hydrocarbon storage.