213 research outputs found
Failure to Launch in Two-Sided Markets: A Study of the U.S. Video Game Market
In a dynamic two-sided market, overpricing one side not only discourages demand on that side but also discourages participation on the other side; over time, this process can lead to a death spiral. This paper develops a dynamic structural model of the video game market to study launch failures in two-sided markets. It models consumers’ purchase decisions for hardware platforms and affiliated software products, as well as software firms’ entry and pricing decisions. The paper also develops a Bayesian Markov chain Monte Carlo approach to estimate dynamic structural models. Counterfactual simulations show that a failed platform could have survived had it lowered its hardware prices, but that it could not have escaped the death spiral by subsidizing software entry.
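The cross-side feedback loop behind a death spiral can be illustrated with a toy simulation (all parameters here are hypothetical illustrations, not the paper's estimated structural model): consumer adoption depends on software variety, and software entry depends on the installed base, so a high hardware price can stall both sides at once.

```python
# Toy two-sided-market feedback loop (hypothetical parameters, not the
# paper's model): hardware demand rises with software variety and falls
# with price; software entry tracks the installed base.
def simulate_platform(price, periods=20, price_sensitivity=0.05,
                      cross_effect=0.5):
    installed_base, software_titles = 1.0, 1.0
    for _ in range(periods):
        # Hardware demand: more software attracts buyers, price deters them.
        new_buyers = max(0.0, cross_effect * software_titles
                         - price_sensitivity * price)
        installed_base += new_buyers
        # Software entry: developers follow the installed base (cross-side effect).
        software_titles = 0.5 * installed_base
    return installed_base

# A lower hardware price keeps the feedback loop positive; a high price
# stalls adoption on both sides.
low_price_base = simulate_platform(price=5)
high_price_base = simulate_platform(price=30)
```

Under these toy parameters, the high-price platform never attracts new buyers, so its installed base and software library stagnate, mirroring the abstract's "death spiral" mechanism.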
Anoikis-related genes combined with single cell sequencing: Insights into model specification of lung adenocarcinoma and applicability for prognosis and therapy
Background: Anoikis has therapeutic potential against different malignancies, including lung adenocarcinoma. This study used anoikis and bioinformatics to construct a prognostic model for lung adenocarcinoma and to explore new therapeutic strategies. Methods: Several bioinformatic algorithms (co-expression analysis, univariate Cox analysis, multivariate Cox analysis, and cross-validation) were used to screen anoikis-related genes (ARGs) and construct a risk model. Lung adenocarcinoma patients were divided into training and testing groups at a ratio of 1:1. The prognostic model was validated by comparing risk scores between high- and low-risk groups using receiver operating characteristic (ROC) curves, nomograms, independent prognostic analysis, and principal component analysis. In addition, two anoikis-related gene patterns were identified using a consensus clustering method and compared with each other in terms of survival time, immune microenvironment, and pathway regulation. Single-cell sequencing was applied to analyze the anoikis-related genes used to construct the model. Results: This study demonstrated the feasibility of the model based on seven anoikis-related genes and identified axitinib, nibtinib, and sorafenib as potential therapeutic strategies for LUAD. The risk score based on this model could be used as an independent prognostic factor for lung adenocarcinoma (HR > 1; p < 0.001) and had the highest accuracy in predicting survival compared with clinical characteristics. Single-cell sequencing analysis revealed that Keratin 14 (KRT14, one of the seven anoikis-related genes) was mainly expressed in malignant cells across various cancers. Conclusion: We identified seven anoikis-related genes and constructed an accurate risk model based on bioinformatics analysis that can be used for prognostic prediction and for the design of therapeutic strategies in clinical practice.
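A Cox-style prognostic risk score of the kind this abstract describes is typically a weighted sum of gene expression values, with the multivariate Cox coefficients as weights. The sketch below is a minimal illustration of that formula; all coefficients and the second gene name are placeholders, not the paper's fitted values.

```python
# Sketch of a Cox-style risk score: risk = sum_i beta_i * expr_i.
# Patients above the cohort's median score form the high-risk group.
# Coefficients and "GENE_B" are hypothetical; only KRT14 is named in
# the abstract.
def risk_score(expression, coefficients):
    """Weighted sum of gene expression over the model's gene set."""
    return sum(coefficients[g] * expression.get(g, 0.0)
               for g in coefficients)

coefs = {"KRT14": 0.8, "GENE_B": -0.3}   # hypothetical Cox betas
patient = {"KRT14": 2.0, "GENE_B": 1.0}  # log-normalized expression
score = risk_score(patient, coefs)       # roughly 1.3
```

In practice the threshold splitting high- and low-risk groups is the median score of the training cohort, and the resulting groups are compared with Kaplan-Meier and ROC analysis as the abstract outlines.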
Towards Omni-supervised Referring Expression Segmentation
Referring Expression Segmentation (RES) is an emerging task in computer
vision, which segments the target instances in images based on text
descriptions. However, its development is hampered by expensive segmentation
labels. To address this issue, we propose a new learning task for RES called
Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to
make full use of unlabeled, fully labeled and weakly labeled data, e.g.,
referring points or grounding boxes, for efficient RES training. To accomplish
this task, we also propose a novel yet strong baseline method for Omni-RES
based on the recently popular teacher-student learning, where the weak labels
are not directly transformed into supervision signals but used as a yardstick
to select and refine high-quality pseudo-masks for teacher-student learning. To
validate the proposed Omni-RES method, we apply it to a set of state-of-the-art
RES models and conduct extensive experiments on a bunch of RES datasets. The
experimental results demonstrate the clear advantages of Omni-RES over the
fully-supervised and semi-supervised training schemes. For instance, with only
10% fully labeled data, Omni-RES can help the base model achieve 100% fully
supervised performance, and it also outperforms the semi-supervised alternative
by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+,
respectively. More importantly, Omni-RES also enables the use of large-scale
vision-language datasets like Visual Genome to facilitate low-cost RES
training, achieving new SOTA performance for RES, e.g., 80.66 on RefCOCO.
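The "weak label as yardstick" idea can be sketched concretely: rather than training on a grounding box directly, the box is used to accept or reject the teacher's pseudo-mask. The check below (bounding-box IoU against the grounding box, with an assumed threshold) is one plausible instantiation, not necessarily the paper's exact criterion.

```python
# Hedged sketch: keep a teacher's pseudo-mask only if its bounding box
# agrees with the weak grounding-box label. The 0.5 threshold is an
# assumption, not a value from the paper.
def box_iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def mask_bbox(mask):
    """Tight (x1, y1, x2, y2) box around a binary mask (list of rows)."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(r[j] for r in mask)]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)

def select_pseudo_mask(mask, grounding_box, thresh=0.5):
    """Accept the pseudo-mask iff it is consistent with the weak label."""
    return box_iou(mask_bbox(mask), grounding_box) >= thresh
```

Accepted pseudo-masks then serve as supervision for the student, so the weak label filters noise instead of being converted into a (lossy) mask target itself.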
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
With ever-increasing parameters and computation, vision-language pre-trained
(VLP) models exhibit prohibitive expenditure in downstream task adaptation.
Recent endeavors mainly focus on parameter efficient transfer learning (PETL)
for VLP models by only updating a small number of parameters. However,
excessive computational overhead still plagues the application of VLPs. In this
paper, we aim at parameter and computation efficient transfer learning (PCETL)
for VLP models. In particular, PCETL not only needs to limit the number of
trainable parameters in VLP models, but also to reduce the computational
redundancy during inference, thus enabling a more efficient transfer. To
approach this target, we propose a novel dynamic architecture skipping (DAS)
approach towards effective PCETL. Instead of directly optimizing the intrinsic
architectures of VLP models, DAS first observes the significance of their
modules to downstream tasks via a reinforcement learning (RL)-based process,
and then skips the redundant ones with lightweight networks, i.e., adapters,
according to the obtained rewards. In this case, the VLP model can well
maintain the scale of trainable parameters while speeding up its inference on
downstream tasks. To validate DAS, we apply it to two representative VLP
models, namely ViLT and METER, and conduct extensive experiments on a bunch of
VL tasks. The experimental results not only show the great advantages of DAS in
reducing computational complexity, e.g. -11.97% FLOPs of METER on VQA2.0, but
also confirm its competitiveness against existing PETL methods in terms of
parameter scale and performance. Our source code is given in our appendix.
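The skip-planning step of DAS can be sketched as a ranking problem: once each module has a significance estimate (obtained in the paper via an RL process; here simply given as a dict), the least significant modules are the ones replaced by lightweight adapters. Module names and scores below are illustrative only.

```python
# Hedged sketch of dynamic architecture skipping: modules with the lowest
# estimated significance to the downstream task are skipped and replaced
# by adapters. The significance scores stand in for the paper's
# RL-derived rewards; names and values are made up.
def plan_skips(module_significance, num_to_skip):
    """Return the set of modules to replace with lightweight adapters."""
    ranked = sorted(module_significance, key=module_significance.get)
    return set(ranked[:num_to_skip])

scores = {"layer0": 0.9, "layer1": 0.2, "layer2": 0.7, "layer3": 0.1}
to_skip = plan_skips(scores, 2)  # the two least significant modules
```

Because the skipped modules are bypassed at inference, trainable-parameter count stays small while FLOPs drop, matching the paper's stated PCETL goal.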
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Recently, there has been growing interest in extending the multimodal
capability of large language models (LLMs), e.g., vision-language (VL)
learning, which is regarded as the next milestone of artificial general
intelligence. However, existing solutions are prohibitively expensive, which
not only need to optimize excessive parameters, but also require another
large-scale pre-training before VL instruction tuning. In this paper, we
propose a novel and affordable solution for the effective VL adaptation of LLMs,
called Mixture-of-Modality Adaptation (MMA). Instead of using large neural
networks to connect the image encoder and LLM, MMA adopts lightweight modules,
i.e., adapters, to bridge the gap between LLMs and VL tasks, which also enables
the joint optimization of the image and language models. Meanwhile, MMA is also
equipped with a routing algorithm to help LLMs achieve an automatic shift
between single- and multi-modal instructions without compromising their
natural language understanding ability. To validate MMA, we apply it to a
recent LLM called LLaMA and term the resulting large vision-language
instruction-tuned model LaVIN. We conduct extensive experiments under two
setups, namely multimodal science question answering and multimodal dialogue.
The experimental results not only demonstrate the competitive performance and
the superior training efficiency of LaVIN over existing multimodal LLMs, but
also confirm its great potential as a general-purpose chatbot. More
importantly, the actual training cost of LaVIN is extremely low, e.g., only
1.4 training hours with 3.8M trainable parameters, greatly confirming the
effectiveness of MMA. Our project is released at
https://luogen1996.github.io/lavin.
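The routing idea in MMA can be sketched at its simplest: the model shifts between a text-only adapter path and a multimodal adapter path depending on the input's modality. The paper learns this routing; the hard rule below is a deliberately simplified stand-in, and all names are illustrative.

```python
# Hedged sketch of modality routing: pick an adapter path based on
# whether the instruction carries an image. MMA learns this routing;
# this hard rule and the path names are simplifications.
def route(instruction):
    """Return the adapter path for a {'text': ..., 'image': ...} dict."""
    has_image = instruction.get("image") is not None
    return "multimodal_adapter" if has_image else "text_adapter"

path_vl = route({"text": "Describe this.", "image": "img.png"})
path_nl = route({"text": "Summarize the article.", "image": None})
```

Keeping a dedicated text-only path is what lets the LLM retain its natural language ability while the multimodal path handles VL instructions.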
PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation
Pixel synthesis is a promising research paradigm for image generation, which
can well exploit pixel-wise prior knowledge for generation. However, existing
methods still suffer from excessive memory footprint and computation overhead.
In this paper, we propose a progressive pixel synthesis network towards
efficient image generation, coined as PixelFolder. Specifically, PixelFolder
formulates image generation as a progressive pixel regression problem and
synthesizes images by a multi-stage paradigm, which can greatly reduce the
overhead caused by large tensor transformations. In addition, we introduce
novel pixel folding operations to further improve model efficiency while
maintaining pixel-wise prior knowledge for end-to-end regression. With these
innovative designs, we greatly reduce the expenditure of pixel synthesis,
e.g., cutting computation by 90% and parameters by 57% compared to the latest
pixel synthesis method, CIPS. To validate our approach, we conduct extensive
experiments on two benchmark datasets, namely FFHQ and LSUN Church. The
experimental results show that with much less expenditure, PixelFolder obtains
new state-of-the-art (SOTA) performance on both datasets, i.e., 3.77 FID and
2.45 FID on FFHQ and LSUN Church, respectively. Meanwhile, PixelFolder is also
more efficient than SOTA methods like StyleGAN2, reducing computation by about
74% and parameters by 36%. These results greatly validate the effectiveness of
the proposed PixelFolder. Comment: 11 pages, 7 figures
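A pixel "folding" operation can be sketched as a space-to-depth rearrangement: a 2x2 spatial block of pixels is folded into the channel dimension, halving height and width while preserving every value. This is one standard reading of the idea; PixelFolder's exact operator may differ.

```python
# Hedged sketch of pixel folding as space-to-depth: each 2x2 spatial
# block becomes one "deep" cell, shrinking the spatial grid 4x without
# discarding any pixel-wise information.
def fold2x2(img):
    """img: H x W grid (lists) with H, W even.
    Returns an (H/2) x (W/2) grid whose cells hold the 4 folded values."""
    h, w = len(img), len(img[0])
    return [[(img[2 * i][2 * j], img[2 * i][2 * j + 1],
              img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1])
             for j in range(w // 2)]
            for i in range(h // 2)]

folded = fold2x2([[1, 2], [3, 4]])  # [[(1, 2, 3, 4)]]
```

Shrinking the spatial grid this way is what reduces the cost of the large tensor transformations the abstract mentions, since later stages operate on far fewer spatial positions.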
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning
Panoptic Narrative Detection (PND) and Segmentation (PNS) are two challenging
tasks that involve identifying and locating multiple targets in an image
according to a long narrative description. In this paper, we propose a unified
and effective framework called NICE that can jointly learn these two panoptic
narrative recognition tasks. Existing visual grounding methods use a two-branch
paradigm, but applying it directly to PND and PNS can result in prediction
conflicts due to their intrinsic many-to-many alignment property. To address
this, we introduce two cascading modules based on the barycenter of the mask,
which are Coordinate Guided Aggregation (CGA) and Barycenter Driven
Localization (BDL), responsible for segmentation and detection, respectively.
By linking PNS and PND in series with the barycenter of segmentation as the
anchor, our approach naturally aligns the two tasks and allows them to
complement each other for improved performance. Specifically, CGA provides the
barycenter as a reference for detection, reducing BDL's reliance on a large
number of candidate boxes. BDL leverages its excellent properties to
distinguish different instances, which improves the performance of CGA for
segmentation. Extensive experiments demonstrate that NICE surpasses all
existing methods by a large margin, achieving gains of 4.1% for PND and 2.9%
for PNS over the state of the art. These results validate the effectiveness of
our proposed collaborative learning strategy. The project of this work is made
publicly available at https://github.com/Mr-Neko/NICE. Comment: 18 pages, 9 figures, 9 tables
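The anchor that links the two tasks, the mask barycenter, is just the mean coordinate of the mask's foreground pixels; CGA can hand this point to BDL as a localization reference. The computation below is the standard centroid formula, simplified relative to whatever NICE does internally.

```python
# Hedged sketch of the barycenter anchor: the centroid of a binary
# mask's foreground pixels, usable as a reference point for detection.
def mask_barycenter(mask):
    """mask: grid (lists of 0/1). Returns (x_mean, y_mean) of foreground."""
    pts = [(x, y) for y, row in enumerate(mask)
           for x, v in enumerate(row) if v]
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

mask = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
center = mask_barycenter(mask)  # (1.5, 0.5)
```

Because the barycenter pins each instance to a single point, detection boxes can be proposed around it instead of over the whole image, which is how the abstract's "reducing BDL's reliance on a large number of candidate boxes" can be read.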
Towards Efficient Visual Adaption via Structural Re-parameterization
Parameter-efficient transfer learning (PETL) is an emerging research topic
aimed at inexpensively adapting large-scale pre-trained models to downstream
tasks. Recent advances have achieved great success in saving storage costs for
various vision tasks by updating or injecting a small number of parameters
instead of full fine-tuning. However, we notice that most existing PETL methods
still incur non-negligible latency during inference. In this paper, we propose
a parameter-efficient and computationally friendly adapter for giant vision
models, called RepAdapter. Specifically, we prove that the adaptation modules,
even with a complex structure, can be seamlessly integrated into most giant
vision models via structural re-parameterization. This property makes
RepAdapter zero-cost during inference. In addition to computation efficiency,
RepAdapter is more effective and lightweight than existing PETL methods due to
its sparse structure and our careful deployment. To validate RepAdapter, we
conduct extensive experiments on 27 benchmark datasets of three vision tasks,
i.e., image and video classification, and semantic segmentation. Experimental
results show the superior performance and efficiency of RepAdapter over the
state-of-the-art PETL methods. For instance, by updating only 0.6% parameters,
we can improve the performance of ViT from 38.8 to 55.1 on Sun397. Its
generalizability is also well validated by a bunch of vision models, i.e., ViT,
CLIP, Swin-Transformer and ConvNeXt. Our source code is released at
https://github.com/luogen1996/RepAdapter.
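The key property, zero inference cost via structural re-parameterization, can be illustrated for a purely linear adapter: a residual adapter y = x + B(Ax) is itself a linear map (I + BA), so it can be merged into an adjacent weight matrix offline. The tiny matrices and plain-list matmul below are a toy illustration of the principle, not RepAdapter's actual deployment.

```python
# Hedged sketch of structural re-parameterization: fold a linear
# residual adapter y = x + B(Ax) into a neighboring weight W, so
# inference uses a single matrix with no extra adapter cost.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def reparameterize(W, A, B):
    """Return W' = W (I + B A), equivalent to W applied after the adapter.
    A: r x n down-projection, B: n x r up-projection, W: m x n."""
    n = len(A[0])
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    BA = matmul(B, A)
    merged = [[I[i][j] + BA[i][j] for j in range(n)] for i in range(n)]
    return matmul(W, merged)

# Toy check: identity W, rank-1 adapter.
W_prime = reparameterize([[1, 0], [0, 1]], [[1, 1]], [[0.5], [0.5]])
```

Real adapters include nonlinearities, which is why the paper's contribution is proving that its particular adapter structure still admits such a merge; this sketch only shows the linear special case.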