Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
We introduce Florence-2, a novel vision foundation model with a unified,
prompt-based representation for a variety of computer vision and
vision-language tasks. While existing large vision models excel in transfer
learning, they struggle to perform a diversity of tasks with simple
instructions, a capability that implies handling the complexity of various
spatial hierarchy and semantic granularity. Florence-2 was designed to take
text-prompt as task instructions and generate desirable results in text forms,
whether it be captioning, object detection, grounding or segmentation. This
multi-task learning setup demands large-scale, high-quality annotated data. To
this end, we co-developed FLD-5B that consists of 5.4 billion comprehensive
visual annotations on 126 million images, using an iterative strategy of
automated image annotation and model refinement. We adopted a
sequence-to-sequence structure to train Florence-2 to perform versatile and
comprehensive vision tasks. Extensive evaluations on numerous tasks
demonstrated Florence-2 to be a strong vision foundation model contender with
unprecedented zero-shot and fine-tuning capabilities.
MM-VID: Advancing Video Understanding with GPT-4V(ision)
We present MM-VID, an integrated system that harnesses the capabilities of
GPT-4V, combined with specialized tools in vision, audio, and speech, to
facilitate advanced video understanding. MM-VID is designed to address the
challenges posed by long-form videos and intricate tasks such as reasoning
within hour-long content and grasping storylines spanning multiple episodes.
MM-VID uses a video-to-script generation with GPT-4V to transcribe multimodal
elements into a long textual script. The generated script details character
movements, actions, expressions, and dialogues, paving the way for large
language models (LLMs) to achieve video understanding. This enables advanced
capabilities, including audio description, character identification, and
multimodal high-level comprehension. Experimental results demonstrate the
effectiveness of MM-VID in handling distinct video genres with various video
lengths. Additionally, we showcase its potential when applied to interactive
environments, such as video games and graphical user interfaces.
Comment: Project page at https://multimodal-vid.github.io
The method evaluation of culturing DF-1 to proliferate canine distemper virus in mink with Cephodex microcarrier
Canine distemper (CD) is an acute and highly lethal infectious disease for which there is no specific therapeutic drug. Although
large-scale production of canine distemper virus (CDV) vaccine for mink has improved greatly, many deficiencies remain.
Microcarrier suspension culture, one of the most promising technologies for large-scale vaccine production, also needs
further improvement. In this study, the new Cephodex microcarrier was evaluated in CDV culture
to establish a technical process for high-density growth of DF-1 cells and efficient proliferation of CDV.
Cephodex was used to culture DF-1 cells in suspension for proliferating CDV. In a shake-flask culture system, the
optimal culture conditions were established by optimizing culture temperature, virus inoculation and harvest time. This preliminarily
established CDV microcarrier suspension culture technology lays the groundwork for high-efficiency production of mink CD vaccine. The cell
density reached over 3×10^6 cells/mL after 72 h of culture with Cephodex microcarrier at 37°C. When CDV was proliferated at 35°C, the titer after
72 h was about 10^0.5 TCID50/0.1 mL higher than that at 33°C or 37°C. These results show that the Cephodex microcarrier can be used for
large-scale culture of DF-1 cells and efficient proliferation of CDV.
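Virus titers are usually compared on a log10 scale, so a titer difference stated as a power of ten is easy to misread. The short sketch below assumes the difference reported at 35°C is 0.5 log10 units (that is, 10^0.5 TCID50 per 0.1 mL), which is a reading of the figure in the abstract rather than a confirmed value. The function name is hypothetical.

```python
def fold_change_from_log10(delta_log10: float) -> float:
    """Convert a titer difference in log10 units into a linear fold change."""
    return 10.0 ** delta_log10


# Assumed difference of 0.5 log10 units between 35°C and the other temperatures.
fold = fold_change_from_log10(0.5)
print(f"A 0.5 log10 difference is roughly a {fold:.2f}-fold change in titer")
```

Under that assumption, the titer at 35°C would be roughly three times the titer at 33°C or 37°C.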
Parallel randomized support vector machine
A parallel support vector machine based on a randomized sampling technique is proposed in this paper. We modeled a new LP-type problem so that it works for general linear-nonseparable SVM training problems, unlike the previous work [2]. A unique priority-based sampling mechanism is used so that we can prove an average convergence rate that is, to the best of our knowledge, the fastest bounded convergence rate so far. The numerical results on synthesized data and a real geometric database show that our algorithm has good scalability.
Dynamic traffic controls for Web-server networks
Responsible Editor: G. Pacifici
Distributed Web-server systems have been widely used to provide effective Internet services. The management of these systems requires dynamic controls of the Web traffic. With the development of multimedia Web sites and increasingly diversified services, the existing load balancing approaches can no longer satisfy the requirements of either the service providers or the users. In this paper, a new reward-based control mechanism is proposed that can satisfy the dynamic content-based control requirement while avoiding congestion at the dispatcher. The core of the control algorithm is based on an MDP model. To minimize the system overhead, a centralized dispatching with decentralized admission (CDDA) approach is used to distribute the control related computation to each server pool. This cuts down the dimensions of the problem dramatically. We also propose a state-block scheme to further reduce the state space so that the algorithm becomes computationally feasible for on-line implementation. Simulation results demonstrate that the proposed state-block approach can not only reduce the computation time dramatically but also provide a good approximation of power-tailed request interarrival times common for Internet traffic. Finally, an implementation plan with system design is also proposed.
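To make the idea of reward-based admission control on an MDP concrete, here is a minimal sketch of value iteration for a single server pool. It is a generic toy model, not the paper's CDDA algorithm or its state-block scheme. The discrete time slots, the arrival and service probabilities, the reward per admitted request, the holding cost, the discount factor and all function and parameter names are assumptions made for illustration.

```python
def value_iteration(capacity=10, p_arr=0.5, p_srv=0.4, reward=5.0,
                    hold_cost=1.0, gamma=0.95, tol=1e-8):
    """Toy admission-control MDP for one server pool (all parameters hypothetical).

    State s is the number of requests in the pool (0..capacity). In each slot,
    a request arrives with probability p_arr, a request completes with
    probability p_srv, and otherwise nothing happens (p_arr + p_srv <= 1).
    The action decides whether an arriving request would be admitted.
    """
    n_states = capacity + 1
    values = [0.0] * n_states
    while True:
        new_values, policy = [], []
        for s in range(n_states):
            q = []
            for admit in (False, True):
                can_admit = admit and s < capacity
                # Arrival: collect the reward and move up only if admitted.
                arrive = reward + gamma * values[s + 1] if can_admit else gamma * values[s]
                # Service completion: move down if the pool is not empty.
                depart = gamma * values[s - 1] if s > 0 else gamma * values[s]
                idle = gamma * values[s]
                q.append(-hold_cost * s + p_arr * arrive + p_srv * depart
                         + (1.0 - p_arr - p_srv) * idle)
            new_values.append(max(q))
            policy.append(q[1] > q[0])  # True means "admit an arriving request"
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            return values, policy


values, policy = value_iteration()
print("admit decision per state:", policy)
```

The returned policy gives one admit or reject decision per occupancy level. Because each pool can run a small computation like this on its own state, the sketch also hints at why distributing admission decisions to the server pools keeps the problem size manageable.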