64 research outputs found

    Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression

    There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation, sentiment analysis, and question-answering. The accomplishments of LLMs have led to a substantial increase in research efforts in this domain. One specific two-layer regression problem has been well-studied in prior works, where the first layer is activated by a ReLU unit, and the second layer is activated by a softmax unit. While previous works provide a solid analysis of building a two-layer regression, there is still a gap in the analysis of constructing regression problems with more than two layers. In this paper, we take a crucial step toward addressing this problem: we provide an analysis of a two-layer regression problem. In contrast to previous works, our first layer is activated by a softmax unit. This sets the stage for future analyses of creating more activation functions based on the softmax function. Rearranging the softmax function leads to significantly different analyses. Our main results involve analyzing the convergence properties of an approximate Newton method used to minimize the regularized training loss. We prove that the Hessian of the loss function is positive definite and Lipschitz continuous under certain assumptions. This enables us to establish local convergence guarantees for the proposed training algorithm. Specifically, with an appropriate initialization and after $O(\log(1/\epsilon))$ iterations, our algorithm can find an $\epsilon$-approximate minimizer of the training loss with high probability. Each iteration requires approximately $O(\mathrm{nnz}(C) + d^\omega)$ time, where $d$ is the model size, $C$ is the input matrix, and $\omega < 2.374$ is the matrix multiplication exponent.
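To illustrate why a positive-definite Hessian together with regularization gives fast local convergence, here is a minimal sketch of an exact Newton iteration on a one-dimensional toy loss. It is not the paper's approximate Newton method or its softmax regression loss; the loss `log(1 + exp(-x)) + (lam / 2) * x**2` and the names `grad`, `hess`, `newton`, and `lam` are assumptions chosen only for illustration.

```python
import math


def grad(x, lam):
    # Derivative of log(1 + exp(-x)) + (lam / 2) * x**2.
    return -1.0 / (1.0 + math.exp(x)) + lam * x


def hess(x, lam):
    # Second derivative: sigmoid(x) * (1 - sigmoid(x)) + lam, always > 0 for lam > 0.
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s) + lam


def newton(x0, lam, eps=1e-12, max_iter=50):
    # Iterate x <- x - f'(x) / f''(x) until the gradient magnitude is at most eps.
    x = x0
    history = []  # gradient magnitude observed at each iterate
    for _ in range(max_iter):
        g = grad(x, lam)
        history.append(abs(g))
        if abs(g) <= eps:
            break
        x -= g / hess(x, lam)
    return x, history


if __name__ == "__main__":
    x_star, history = newton(x0=0.0, lam=1.0)
    print(x_star, history)
```

Because the second derivative is bounded below by `lam`, every Newton step is well defined, and near the minimizer the gradient magnitude recorded in `history` shrinks rapidly from one iterate to the next.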

    Gender Bias in Large Language Models across Multiple Languages

    With the growing deployment of large language models (LLMs) across various applications, assessing the influence of gender biases embedded in LLMs becomes crucial. The topic of gender bias within the realm of natural language processing (NLP) has gained considerable focus, particularly in the context of English. Nonetheless, the investigation of gender bias in languages other than English is still relatively under-explored and insufficiently analyzed. In this work, we examine gender bias in LLM-generated outputs for different languages. We use three measurements: 1) gender bias in selecting descriptive words given the gender-related context; 2) gender bias in selecting gender-related pronouns (she/he) given the descriptive words; and 3) gender bias in the topics of LLM-generated dialogues. We investigate the outputs of the GPT series of LLMs in various languages using our three measurement methods. Our findings reveal significant gender biases across all the languages we examined. Comment: 20 pages, 27 tables, 7 figures, submitted to ACL202

    Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator

    3D-aware image synthesis aims at learning a generative model that can render photo-realistic 2D images while capturing decent underlying 3D shapes. A popular solution is to adopt the generative adversarial network (GAN) and replace the generator with a 3D renderer, where volume rendering with neural radiance field (NeRF) is commonly used. Despite the advancement of synthesis quality, existing methods fail to obtain moderate 3D shapes. We argue that, considering the two-player game in the formulation of GANs, only making the generator 3D-aware is not enough. In other words, displacing the generative mechanism only offers the capability, but not the guarantee, of producing 3D-aware images, because the supervision of the generator primarily comes from the discriminator. To address this issue, we propose GeoD through learning a geometry-aware discriminator to improve 3D-aware GANs. Concretely, besides differentiating real and fake samples from the 2D image space, the discriminator is additionally asked to derive the geometry information from the inputs, which is then applied as the guidance of the generator. Such a simple yet effective design facilitates learning substantially more accurate 3D shapes. Extensive experiments on various generator architectures and training datasets verify the superiority of GeoD over state-of-the-art alternatives. Moreover, our approach is registered as a general framework such that a more capable discriminator (i.e., with a third task of novel view synthesis beyond domain classification and geometry extraction) can further assist the generator with a better multi-view consistency. Comment: Accepted by NeurIPS 2022. Project page: https://vivianszf.github.io/geo

    Cross Entropy versus Label Smoothing: A Neural Collapse Perspective

    Label smoothing loss is a widely adopted technique to mitigate overfitting in deep neural networks. This paper studies label smoothing from the perspective of Neural Collapse (NC), a powerful empirical and theoretical framework which characterizes model behavior during the terminal phase of training. We first show empirically that models trained with label smoothing converge faster to neural collapse solutions and attain a stronger level of neural collapse. Additionally, we show that at the same level of NC1, models under label smoothing loss exhibit intensified NC2. These findings provide valuable insights into the performance benefits and enhanced model calibration under label smoothing loss. We then leverage the unconstrained feature model to derive closed-form solutions for the global minimizers for both loss functions and further demonstrate that models under label smoothing have a lower condition number and, therefore, theoretically converge faster. Our study, combining empirical evidence and theoretical results, not only provides nuanced insights into the differences between label smoothing and cross-entropy losses, but also serves as an example of how the powerful neural collapse framework can be used to improve our understanding of DNNs.
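For readers unfamiliar with the two losses being compared, the following is a minimal sketch of cross-entropy with and without label smoothing for a single example. It assumes the common formulation in which the one-hot target is mixed with a uniform distribution using weight `alpha`; the function names and the value of `alpha` are illustrative and are not taken from the paper.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_targets(label, num_classes, alpha):
    # Mix the one-hot target with the uniform distribution:
    # (1 - alpha) * one_hot + alpha / num_classes.
    uniform = alpha / num_classes
    return [(1.0 - alpha) + uniform if k == label else uniform
            for k in range(num_classes)]


def cross_entropy(logits, targets):
    # -sum_k targets[k] * log(softmax(logits)[k]); zero-weight terms are skipped.
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0)


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    hard = cross_entropy(logits, smoothed_targets(0, 3, alpha=0.0))
    smooth = cross_entropy(logits, smoothed_targets(0, 3, alpha=0.1))
    print(hard, smooth)
```

Setting `alpha=0.0` recovers ordinary cross-entropy, so the same function can serve both sides of the comparison.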

    LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis

    This work presents an easy-to-use regularizer for GAN training, which helps explicitly link some axes of the latent space to a set of pixels in the synthesized image. Establishing such a connection facilitates a more convenient local control of GAN generation, where users can alter the image content only within a spatial area simply by partially resampling the latent code. Experimental results confirm four appealing properties of our regularizer, which we call LinkGAN. (1) The latent-pixel linkage is applicable to either a fixed region (i.e., same for all instances) or a particular semantic category (i.e., varying across instances), like the sky. (2) Two or multiple regions can be independently linked to different latent axes, which further supports joint control. (3) Our regularizer can improve the spatial controllability of both 2D and 3D-aware GAN models, barely sacrificing the synthesis performance. (4) The models trained with our regularizer are compatible with GAN inversion techniques and maintain editability on real images.

    PropNet: Propagating 2D Annotation to 3D Segmentation for Gastric Tumors on CT Scans

    **Background:** Accurate 3D CT scan segmentation of gastric tumors is pivotal for diagnosis and treatment. The challenges lie in the irregular shapes, blurred boundaries of tumors, and the inefficiency of existing methods. **Purpose:** We conducted a study to introduce a model, utilizing human-guided knowledge and unique modules, to address the challenges of 3D tumor segmentation. **Methods:** We developed the PropNet framework, propagating radiologists' knowledge from 2D annotations to the entire 3D space. This model consists of a proposing stage for coarse segmentation and a refining stage for improved segmentation, using two-way branches for enhanced performance and an up-down strategy for efficiency. **Results:** With 98 patient scans for training and 30 for validation, our method achieves a significant agreement with manual annotation (Dice of 0.803) and improves efficiency. The performance is comparable in different scenarios and with various radiologists' annotations (Dice between 0.785 and 0.803). Moreover, the model shows improved prognostic prediction performance (C-index of 0.620 vs. 0.576) on an independent validation set of 42 patients with advanced gastric cancer. **Conclusions:** Our model generates accurate tumor segmentation efficiently and stably, improving prognostic performance and reducing high-throughput image reading workload. This model can accelerate the quantitative analysis of gastric tumors and enhance downstream task performance.
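The Dice scores quoted above measure overlap between a predicted mask and a manual annotation. As a minimal sketch, assuming binary masks flattened to equal-length lists of 0/1 voxel labels, the coefficient can be computed as follows. The function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details from the paper.

```python
def dice(pred, truth):
    # Dice coefficient: 2 * |pred AND truth| / (|pred| + |truth|)
    # for two equal-length sequences of 0/1 labels.
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / size_sum


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    annotated = [1, 0, 0, 0]
    print(dice(predicted, annotated))
```

A value of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground voxels.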

    Effects of the interactions between platelets with other cells in tumor growth and progression

    It has been confirmed that platelets play a key role in tumorigenesis. Tumor-activated platelets can recruit blood cells and immune cells to migrate and establish an inflammatory tumor microenvironment at the sites of primary and metastatic tumors. On the other hand, they can also promote the differentiation of mesenchymal cells, which can accelerate the proliferation, genesis and migration of blood vessels. The role of platelets in tumors has been well studied. However, a growing number of studies suggest that interactions between platelets and immune cells (e.g., dendritic cells, natural killer cells, monocytes, and red blood cells) also play an important role in tumorigenesis and tumor development. In this review, we summarize the major cells that are closely associated with platelets and discuss the essential role of the interactions between platelets and these cells in tumorigenesis and tumor development.

    Small-molecule activation of lysosomal TRP channels ameliorates Duchenne muscular dystrophy in mouse models

    Duchenne muscular dystrophy (DMD) is a devastating disease caused by mutations in dystrophin that compromise sarcolemma integrity. Currently, there is no treatment for DMD. Mutations in transient receptor potential mucolipin 1 (ML1), a lysosomal Ca²⁺ channel required for lysosomal exocytosis, produce a DMD-like phenotype. Here, we show that transgenic overexpression or pharmacological activation of ML1 in vivo facilitates sarcolemma repair and alleviates the dystrophic phenotypes in both skeletal and cardiac muscles of mdx mice (a mouse model of DMD). Hallmark dystrophic features of DMD, including myofiber necrosis, central nucleation, fibrosis, elevated serum creatine kinase levels, reduced muscle force, impaired motor ability, and dilated cardiomyopathies, were all ameliorated by increasing ML1 activity. ML1-dependent activation of transcription factor EB (TFEB) corrects lysosomal insufficiency to diminish muscle damage. Hence, targeting lysosomal Ca²⁺ channels may represent a promising approach to treat DMD and related muscle diseases.

    Arm-Constrained Curriculum Learning for Loco-Manipulation of the Wheel-Legged Robot

    Incorporating a robotic manipulator into a wheel-legged robot enhances its agility and expands its potential for practical applications. However, the presence of potential instability and uncertainties presents additional challenges for control objectives. In this paper, we introduce an arm-constrained curriculum learning architecture to tackle the issues introduced by adding the manipulator. First, we develop an arm-constrained reinforcement learning algorithm to ensure safety and stability in control performance. Additionally, to address discrepancies in reward settings between the arm and the base, we propose a reward-aware curriculum learning method. The policy is first trained in Isaac Gym and then transferred to the physical robot to perform dynamic grasping tasks, including the door-opening task, the fan-twitching task, and the relay-baton-picking and following task. The results demonstrate that our proposed approach effectively controls the arm-equipped wheel-legged robot to master dynamic grasping skills, allowing it to chase and catch a moving object while in motion. Please refer to our website (https://acodedog.github.io/wheel-legged-loco-manipulation) for the code and supplemental videos.

    Evolution of LysM-RLK Gene Family in Wild and Cultivated Peanut Species

    In legumes, a LysM-RLK perception of rhizobial lipo-chitooligosaccharides (LCOs) known as Nod factors (NFs), triggers a signaling pathway related to the onset of symbiosis development. On the other hand, activation of LysM-RLKs upon recognition of chitin-derived short-chitooligosaccharides initiates defense responses. In this work, we identified the members of the LysM-RLK family in cultivated (Arachis hypogaea L.) and wild (A. duranensis and A. ipaensis) peanut genomes, and reconstructed the evolutionary history of the family. Phylogenetic analyses allowed the building of a framework to reinterpret the functional data reported on peanut LysM-RLKs. In addition, the potential involvement of two identified proteins in NF perception and immunity was assessed by gene expression analyses. Results indicated that peanut LysM-RLK is a highly diverse family. Digital expression analyses indicated that some A. hypogaea LysM-RLK receptors were upregulated during the early and late stages of symbiosis. In addition, expression profiles of selected LysM-RLK proteins suggest participation in the receptor network mediating NF and/or chitosan perception. The analyses of LysM-RLK in the non-model legume peanut can contribute to gaining insight into the molecular basis of legume–microbe interactions and to the understanding of the evolutionary history of this gene family within the Fabaceae.
    Affiliation: Rodriguez Melo, Johan Stiben. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Agrobiotecnología del Litoral. Universidad Nacional del Litoral. Instituto de Agrobiotecnología del Litoral; Argentina
    Affiliation: Tonelli, Maria Laura. Universidad Nacional de Río Cuarto. Facultad de Ciencias Exactas Fisicoquímicas y Naturales. Instituto de Investigaciones Agrobiotecnológicas. - Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Investigaciones Agrobiotecnológicas; Argentina
    Affiliation: Barbosa, María Carolina. Universidad Nacional de Río Cuarto. Facultad de Ciencias Exactas Fisicoquímicas y Naturales. Instituto de Investigaciones Agrobiotecnológicas. - Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Investigaciones Agrobiotecnológicas; Argentina
    Affiliation: Ariel, Federico. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Agrobiotecnología del Litoral. Universidad Nacional del Litoral. Instituto de Agrobiotecnología del Litoral; Argentina
    Affiliation: Zhao, Zifan. University of Florida; United States
    Affiliation: Wang, Jianping. University of Florida; United States
    Affiliation: Fabra, Adriana Isidora. Universidad Nacional de Río Cuarto. Facultad de Ciencias Exactas Fisicoquímicas y Naturales. Instituto de Investigaciones Agrobiotecnológicas. - Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Investigaciones Agrobiotecnológicas; Argentina
    Affiliation: Ibañez, Fernando Julio. Universidad Nacional de Río Cuarto. Facultad de Ciencias Exactas Fisicoquímicas y Naturales. Instituto de Investigaciones Agrobiotecnológicas. - Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Investigaciones Agrobiotecnológicas; Argentina