213 research outputs found
Failure to Launch in Two-Sided Markets: A Study of the U.S. Video Game Market
In a dynamic two-sided market, overpricing one side not only discourages demand on that side but also discourages participation on the other side; over time, this process can lead to a death spiral. This paper develops a dynamic structural model of the video game market to study launch failures in two-sided markets. It models consumers’ purchase decisions for hardware platforms and affiliated software products, as well as software firms’ entry and pricing decisions. The paper also develops a Bayesian Markov chain Monte Carlo approach to estimate dynamic structural models. Counterfactual simulations show that a failed platform could have survived had it lowered its hardware prices, but that it could not have escaped the death spiral by subsidizing software entry.
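The cross-side feedback loop behind a death spiral can be illustrated with a toy simulation (all parameters here are hypothetical illustrations, not the paper's estimated structural model): consumer adoption depends on software variety, and software entry depends on the installed base, so a high hardware price can stall both sides at once.

```python
# Toy two-sided-market feedback loop (hypothetical parameters, not the
# paper's model): hardware demand rises with software variety and falls
# with price; software entry tracks the installed base.
def simulate_platform(price, periods=20, price_sensitivity=0.05,
                      cross_effect=0.5):
    installed_base, software_titles = 1.0, 1.0
    for _ in range(periods):
        # Hardware demand: more software attracts buyers, price deters them.
        new_buyers = max(0.0, cross_effect * software_titles
                         - price_sensitivity * price)
        installed_base += new_buyers
        # Software entry: developers follow the installed base (cross-side effect).
        software_titles = 0.5 * installed_base
    return installed_base

# A lower hardware price keeps the feedback loop positive; a high price
# stalls adoption on both sides.
low_price_base = simulate_platform(price=5)
high_price_base = simulate_platform(price=30)
```

Under these toy parameters, the high-price platform never attracts new buyers, so its installed base and software library stagnate, mirroring the abstract's "death spiral" mechanism.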
Anoikis-related genes combined with single cell sequencing: Insights into model specification of lung adenocarcinoma and applicability for prognosis and therapy
Background: Anoikis has therapeutic potential against different malignancies, including lung adenocarcinoma. This study used anoikis and bioinformatics to construct a prognostic model for lung adenocarcinoma and to explore new therapeutic strategies. Methods: Several bioinformatic algorithms (co-expression analysis, univariate Cox analysis, multivariate Cox analysis, and cross-validation) were used to screen anoikis-related genes (ARGs) and construct a risk model. Lung adenocarcinoma patients were divided into training and testing groups at a ratio of 1:1. The prognostic model was validated by comparing risk scores between high- and low-risk groups using receiver operating characteristic (ROC) curves, nomograms, independent prognostic analysis, and principal component analysis. In addition, two anoikis-related gene patterns were identified using a consensus clustering method and compared with each other in terms of survival time, immune microenvironment, and pathway regulation. Single-cell sequencing was applied to analyze the anoikis-related genes used to construct the model. Results: This study demonstrated the feasibility of the model based on seven anoikis-related genes and identified axitinib, nibtinib, and sorafenib as potential therapeutic strategies for LUAD. The risk score based on this model could be used as an independent prognostic factor for lung adenocarcinoma (HR > 1; p < 0.001) and had the highest accuracy in predicting survival compared with clinical characteristics. Single-cell sequencing analysis revealed that Keratin 14 (KRT14, one of the seven anoikis-related genes) was mainly expressed in malignant cells across various cancers. Conclusion: We identified seven anoikis-related genes and constructed an accurate risk model based on bioinformatics analysis that can be used for prognostic prediction and for the design of therapeutic strategies in clinical practice.
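A Cox-style prognostic risk score of the kind this abstract describes is typically a weighted sum of gene expression values, with the multivariate Cox coefficients as weights. The sketch below is a minimal illustration of that formula; all coefficients and the second gene name are placeholders, not the paper's fitted values.

```python
# Sketch of a Cox-style risk score: risk = sum_i beta_i * expr_i.
# Patients above the cohort's median score form the high-risk group.
# Coefficients and "GENE_B" are hypothetical; only KRT14 is named in
# the abstract.
def risk_score(expression, coefficients):
    """Weighted sum of gene expression over the model's gene set."""
    return sum(coefficients[g] * expression.get(g, 0.0)
               for g in coefficients)

coefs = {"KRT14": 0.8, "GENE_B": -0.3}   # hypothetical Cox betas
patient = {"KRT14": 2.0, "GENE_B": 1.0}  # log-normalized expression
score = risk_score(patient, coefs)       # roughly 1.3
```

In practice the threshold splitting high- and low-risk groups is the median score of the training cohort, and the resulting groups are compared with Kaplan-Meier and ROC analysis as the abstract outlines.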
Towards Omni-supervised Referring Expression Segmentation
Referring Expression Segmentation (RES) is an emerging task in computer
vision, which segments the target instances in images based on text
descriptions. However, its development is hampered by expensive segmentation
labels. To address this issue, we propose a new learning task for RES called
Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to
make full use of unlabeled, fully labeled and weakly labeled data, e.g.,
referring points or grounding boxes, for efficient RES training. To accomplish
this task, we also propose a novel yet strong baseline method for Omni-RES
based on the recently popular teacher-student learning, where the weak labels
are not directly transformed into supervision signals but used as a yardstick
to select and refine high-quality pseudo-masks for teacher-student learning. To
validate the proposed Omni-RES method, we apply it to a set of state-of-the-art
RES models and conduct extensive experiments on a bunch of RES datasets. The
experimental results demonstrate the clear advantages of Omni-RES over the
fully-supervised and semi-supervised training schemes. For instance, with only
10% fully labeled data, Omni-RES can help the base model achieve 100% fully
supervised performance, and it also outperforms the semi-supervised alternative
by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+,
respectively. More importantly, Omni-RES also enables the use of large-scale
vision-language datasets like Visual Genome to facilitate low-cost RES
training, achieving new SOTA performance for RES, e.g., 80.66 on RefCOCO.
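The "weak label as yardstick" idea can be sketched concretely: rather than training on a grounding box directly, the box is used to accept or reject the teacher's pseudo-mask. The check below (bounding-box IoU against the grounding box, with an assumed threshold) is one plausible instantiation, not necessarily the paper's exact criterion.

```python
# Hedged sketch: keep a teacher's pseudo-mask only if its bounding box
# agrees with the weak grounding-box label. The 0.5 threshold is an
# assumption, not a value from the paper.
def box_iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def mask_bbox(mask):
    """Tight (x1, y1, x2, y2) box around a binary mask (list of rows)."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(r[j] for r in mask)]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)

def select_pseudo_mask(mask, grounding_box, thresh=0.5):
    """Accept the pseudo-mask iff it is consistent with the weak label."""
    return box_iou(mask_bbox(mask), grounding_box) >= thresh
```

Accepted pseudo-masks then serve as supervision for the student, so the weak label filters noise instead of being converted into a (lossy) mask target itself.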
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
With ever-increasing parameters and computation, vision-language pre-trained
(VLP) models exhibit prohibitive expenditure in downstream task adaptation.
Recent endeavors mainly focus on parameter efficient transfer learning (PETL)
for VLP models by only updating a small number of parameters. However,
excessive computational overhead still plagues the application of VLPs. In this
paper, we aim at parameter and computation efficient transfer learning (PCETL)
for VLP models. In particular, PCETL not only needs to limit the number of
trainable parameters in VLP models, but also to reduce the computational
redundancy during inference, thus enabling a more efficient transfer. To
approach this target, we propose a novel dynamic architecture skipping (DAS)
approach towards effective PCETL. Instead of directly optimizing the intrinsic
architectures of VLP models, DAS first observes the significance of their
modules to downstream tasks via a reinforcement learning (RL)-based process,
and then skips the redundant ones with lightweight networks, i.e., adapters,
according to the obtained rewards. In this case, the VLP model can well
maintain the scale of trainable parameters while speeding up its inference on
downstream tasks. To validate DAS, we apply it to two representative VLP
models, namely ViLT and METER, and conduct extensive experiments on a bunch of
VL tasks. The experimental results not only show the great advantages of DAS in
reducing computational complexity, e.g. -11.97% FLOPs of METER on VQA2.0, but
also confirm its competitiveness against existing PETL methods in terms of
parameter scale and performance. Our source code is given in our appendix.
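The skip-planning step of DAS can be sketched as a ranking problem: once each module has a significance estimate (obtained in the paper via an RL process; here simply given as a dict), the least significant modules are the ones replaced by lightweight adapters. Module names and scores below are illustrative only.

```python
# Hedged sketch of dynamic architecture skipping: modules with the lowest
# estimated significance to the downstream task are skipped and replaced
# by adapters. The significance scores stand in for the paper's
# RL-derived rewards; names and values are made up.
def plan_skips(module_significance, num_to_skip):
    """Return the set of modules to replace with lightweight adapters."""
    ranked = sorted(module_significance, key=module_significance.get)
    return set(ranked[:num_to_skip])

scores = {"layer0": 0.9, "layer1": 0.2, "layer2": 0.7, "layer3": 0.1}
to_skip = plan_skips(scores, 2)  # the two least significant modules
```

Because the skipped modules are bypassed at inference, trainable-parameter count stays small while FLOPs drop, matching the paper's stated PCETL goal.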
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Recently, there has been growing interest in extending the multimodal
capability of large language models (LLMs), e.g., vision-language (VL)
learning, which is regarded as the next milestone of artificial general
intelligence. However, existing solutions are prohibitively expensive, which
not only need to optimize excessive parameters, but also require another
large-scale pre-training before VL instruction tuning. In this paper, we
propose a novel and affordable solution for the effective VL adaptation of LLMs,
called Mixture-of-Modality Adaptation (MMA). Instead of using large neural
networks to connect the image encoder and LLM, MMA adopts lightweight modules,
i.e., adapters, to bridge the gap between LLMs and VL tasks, which also enables
the joint optimization of the image and language models. Meanwhile, MMA is also
equipped with a routing algorithm to help LLMs achieve an automatic shift
between single- and multi-modal instructions without compromising their
natural language understanding ability. To validate MMA, we apply it to a
recent LLM called LLaMA and term the resulting large vision-language
instruction-tuned model LaVIN. We conduct extensive experiments under two
setups, namely multimodal science question answering and multimodal dialogue.
The experimental results not only demonstrate the competitive performance and
the superior training efficiency of LaVIN over existing multimodal LLMs, but
also confirm its great potential as a general-purpose chatbot. More
importantly, the actual training cost of LaVIN is extremely low, e.g., only
1.4 training hours with 3.8M trainable parameters, greatly confirming the
effectiveness of MMA. Our project is released at
https://luogen1996.github.io/lavin.
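The routing idea in MMA can be sketched at its simplest: the model shifts between a text-only adapter path and a multimodal adapter path depending on the input's modality. The paper learns this routing; the hard rule below is a deliberately simplified stand-in, and all names are illustrative.

```python
# Hedged sketch of modality routing: pick an adapter path based on
# whether the instruction carries an image. MMA learns this routing;
# this hard rule and the path names are simplifications.
def route(instruction):
    """Return the adapter path for a {'text': ..., 'image': ...} dict."""
    has_image = instruction.get("image") is not None
    return "multimodal_adapter" if has_image else "text_adapter"

path_vl = route({"text": "Describe this.", "image": "img.png"})
path_nl = route({"text": "Summarize the article.", "image": None})
```

Keeping a dedicated text-only path is what lets the LLM retain its natural language ability while the multimodal path handles VL instructions.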
PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation
Pixel synthesis is a promising research paradigm for image generation, which
can well exploit pixel-wise prior knowledge for generation. However, existing
methods still suffer from excessive memory footprint and computation overhead.
In this paper, we propose a progressive pixel synthesis network towards
efficient image generation, coined as PixelFolder. Specifically, PixelFolder
formulates image generation as a progressive pixel regression problem and
synthesizes images by a multi-stage paradigm, which can greatly reduce the
overhead caused by large tensor transformations. In addition, we introduce
novel pixel folding operations to further improve model efficiency while
maintaining pixel-wise prior knowledge for end-to-end regression. With these
innovative designs, we greatly reduce the expenditure of pixel synthesis,
e.g., cutting computation by 90% and parameters by 57% compared to the latest
pixel synthesis method, CIPS. To validate our approach, we conduct extensive
experiments on two benchmark datasets, namely FFHQ and LSUN Church. The
experimental results show that with much less expenditure, PixelFolder obtains
new state-of-the-art (SOTA) performance on both datasets, i.e., 3.77 FID and
2.45 FID on FFHQ and LSUN Church, respectively. Meanwhile, PixelFolder is also
more efficient than SOTA methods like StyleGAN2, reducing computation by about
74% and parameters by 36%. These results greatly validate the effectiveness of
the proposed PixelFolder. Comment: 11 pages, 7 figures
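A pixel "folding" operation can be sketched as a space-to-depth rearrangement: a 2x2 spatial block of pixels is folded into the channel dimension, halving height and width while preserving every value. This is one standard reading of the idea; PixelFolder's exact operator may differ.

```python
# Hedged sketch of pixel folding as space-to-depth: each 2x2 spatial
# block becomes one "deep" cell, shrinking the spatial grid 4x without
# discarding any pixel-wise information.
def fold2x2(img):
    """img: H x W grid (lists) with H, W even.
    Returns an (H/2) x (W/2) grid whose cells hold the 4 folded values."""
    h, w = len(img), len(img[0])
    return [[(img[2 * i][2 * j], img[2 * i][2 * j + 1],
              img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1])
             for j in range(w // 2)]
            for i in range(h // 2)]

folded = fold2x2([[1, 2], [3, 4]])  # [[(1, 2, 3, 4)]]
```

Shrinking the spatial grid this way is what reduces the cost of the large tensor transformations the abstract mentions, since later stages operate on far fewer spatial positions.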
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning
Panoptic Narrative Detection (PND) and Segmentation (PNS) are two challenging
tasks that involve identifying and locating multiple targets in an image
according to a long narrative description. In this paper, we propose a unified
and effective framework called NICE that can jointly learn these two panoptic
narrative recognition tasks. Existing visual grounding methods use a two-branch
paradigm, but applying it directly to PND and PNS can result in prediction
conflicts due to their intrinsic many-to-many alignment property. To address
this, we introduce two cascading modules based on the barycenter of the mask,
which are Coordinate Guided Aggregation (CGA) and Barycenter Driven
Localization (BDL), responsible for segmentation and detection, respectively.
By linking PNS and PND in series with the barycenter of segmentation as the
anchor, our approach naturally aligns the two tasks and allows them to
complement each other for improved performance. Specifically, CGA provides the
barycenter as a reference for detection, reducing BDL's reliance on a large
number of candidate boxes. BDL leverages its excellent properties to
distinguish different instances, which improves the performance of CGA for
segmentation. Extensive experiments demonstrate that NICE surpasses all
existing methods by a large margin, achieving gains of 4.1% for PND and 2.9%
for PNS over the state of the art. These results validate the effectiveness of
our proposed collaborative learning strategy. The project of this work is made
publicly available at https://github.com/Mr-Neko/NICE. Comment: 18 pages, 9 figures, 9 tables
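The anchor that links the two tasks, the mask barycenter, is just the mean coordinate of the mask's foreground pixels; CGA can hand this point to BDL as a localization reference. The computation below is the standard centroid formula, simplified relative to whatever NICE does internally.

```python
# Hedged sketch of the barycenter anchor: the centroid of a binary
# mask's foreground pixels, usable as a reference point for detection.
def mask_barycenter(mask):
    """mask: grid (lists of 0/1). Returns (x_mean, y_mean) of foreground."""
    pts = [(x, y) for y, row in enumerate(mask)
           for x, v in enumerate(row) if v]
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

mask = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
center = mask_barycenter(mask)  # (1.5, 0.5)
```

Because the barycenter pins each instance to a single point, detection boxes can be proposed around it instead of over the whole image, which is how the abstract's "reducing BDL's reliance on a large number of candidate boxes" can be read.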
Towards Efficient Visual Adaption via Structural Re-parameterization
Parameter-efficient transfer learning (PETL) is an emerging research topic
aimed at inexpensively adapting large-scale pre-trained models to downstream
tasks. Recent advances have achieved great success in saving storage costs for
various vision tasks by updating or injecting a small number of parameters
instead of full fine-tuning. However, we notice that most existing PETL methods
still incur non-negligible latency during inference. In this paper, we propose
a parameter-efficient and computationally friendly adapter for giant vision
models, called RepAdapter. Specifically, we prove that the adaptation modules,
even with a complex structure, can be seamlessly integrated into most giant
vision models via structural re-parameterization. This property makes
RepAdapter zero-cost during inference. In addition to computation efficiency,
RepAdapter is more effective and lightweight than existing PETL methods due to
its sparse structure and our careful deployment. To validate RepAdapter, we
conduct extensive experiments on 27 benchmark datasets of three vision tasks,
i.e., image and video classification, and semantic segmentation. Experimental
results show the superior performance and efficiency of RepAdapter over the
state-of-the-art PETL methods. For instance, by updating only 0.6% parameters,
we can improve the performance of ViT from 38.8 to 55.1 on Sun397. Its
generalizability is also well validated by a bunch of vision models, i.e., ViT,
CLIP, Swin-Transformer and ConvNeXt. Our source code is released at
https://github.com/luogen1996/RepAdapter.
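The key property, zero inference cost via structural re-parameterization, can be illustrated for a purely linear adapter: a residual adapter y = x + B(Ax) is itself a linear map (I + BA), so it can be merged into an adjacent weight matrix offline. The tiny matrices and plain-list matmul below are a toy illustration of the principle, not RepAdapter's actual deployment.

```python
# Hedged sketch of structural re-parameterization: fold a linear
# residual adapter y = x + B(Ax) into a neighboring weight W, so
# inference uses a single matrix with no extra adapter cost.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def reparameterize(W, A, B):
    """Return W' = W (I + B A), equivalent to W applied after the adapter.
    A: r x n down-projection, B: n x r up-projection, W: m x n."""
    n = len(A[0])
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    BA = matmul(B, A)
    merged = [[I[i][j] + BA[i][j] for j in range(n)] for i in range(n)]
    return matmul(W, merged)

# Toy check: identity W, rank-1 adapter.
W_prime = reparameterize([[1, 0], [0, 1]], [[1, 1]], [[0.5], [0.5]])
```

Real adapters include nonlinearities, which is why the paper's contribution is proving that its particular adapter structure still admits such a merge; this sketch only shows the linear special case.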