20 research outputs found

    Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

    We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks. While existing large vision models excel in transfer learning, they struggle to perform a diversity of tasks with simple instructions, a capability that implies handling the complexity of various spatial hierarchy and semantic granularity. Florence-2 was designed to take text prompts as task instructions and generate desirable results in text form, whether it be captioning, object detection, grounding or segmentation. This multi-task learning setup demands large-scale, high-quality annotated data. To this end, we co-developed FLD-5B, which consists of 5.4 billion comprehensive visual annotations on 126 million images, using an iterative strategy of automated image annotation and model refinement. We adopted a sequence-to-sequence structure to train Florence-2 to perform versatile and comprehensive vision tasks. Extensive evaluations on numerous tasks demonstrated Florence-2 to be a strong vision foundation model contender with unprecedented zero-shot and fine-tuning capabilities.

    MM-VID: Advancing Video Understanding with GPT-4V(ision)

    We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding. MM-VID is designed to address the challenges posed by long-form videos and intricate tasks such as reasoning within hour-long content and grasping storylines spanning multiple episodes. MM-VID uses a video-to-script generation with GPT-4V to transcribe multimodal elements into a long textual script. The generated script details character movements, actions, expressions, and dialogues, paving the way for large language models (LLMs) to achieve video understanding. This enables advanced capabilities, including audio description, character identification, and multimodal high-level comprehension. Experimental results demonstrate the effectiveness of MM-VID in handling distinct video genres with various video lengths. Additionally, we showcase its potential when applied to interactive environments, such as video games and graphic user interfaces.
    Comment: Project page at https://multimodal-vid.github.io

    The method evaluation of culturing df-1 to proliferate canine distemper virus in mink with cephodex microcarrier

    Canine distemper (CD) is an acute and highly lethal infectious disease for which there is no specific therapeutic drug. Although large-scale production of canine distemper virus (CDV) vaccine for mink has improved greatly, many deficiencies remain. As one of the most promising technologies for large-scale vaccine production, microcarrier suspension culture technology needs to be further improved. In this study, the performance of the new Cephodex microcarrier in CDV culture was evaluated in order to establish a technical process for high-density growth of DF-1 cells and efficient proliferation of CDV. To improve the large-scale CDV production process, Cephodex was used for suspension culture of DF-1 cells to proliferate CDV. In a shake-flask culture system, optimal culture conditions were established by optimizing culture temperature, virus inoculation and harvest time. This preliminarily established CDV microcarrier suspension culture technology lays a foundation for high-efficiency production of mink CD vaccine. The cell density could reach over 3×10^6 cells/mL after 72 h of culture with the Cephodex microcarrier at 37°C. With proliferation at 35°C, the CDV titer after 72 h was about 10^0.5 TCID50/0.1 mL higher than at 33°C or 37°C. These results show that the Cephodex microcarrier could be used for large-scale culture of DF-1 cells and efficient proliferation of CDV.

    Parallel randomized support vector machine

    A parallel support vector machine based on a randomized sampling technique is proposed in this paper. We modeled a new LP-type problem so that it works for general linear-nonseparable SVM training problems, unlike the previous work [2]. A unique priority-based sampling mechanism is used so that we can prove an average convergence rate that is, to the best of our knowledge, the fastest bounded convergence rate so far. The numerical results on synthesized data and a real geometric database show that our algorithm has good scalability.

    Dynamic traffic controls for Web-server networks

    Responsible Editor: G. Pacifici
    Distributed Web-server systems have been widely used to provide effective Internet services. The management of these systems requires dynamic controls of the Web traffic. With the development of multimedia Web sites and increasingly diversified services, the existing load balancing approaches can no longer satisfy the requirements of either the service providers or the users. In this paper, a new reward-based control mechanism is proposed that can satisfy the dynamic content-based control requirement while avoiding congestion at the dispatcher. The core of the control algorithm is based on an MDP model. To minimize the system overhead, a centralized dispatching with decentralized admission (CDDA) approach is used to distribute the control related computation to each server pool. This cuts down the dimensions of the problem dramatically. We also propose a state-block scheme to further reduce the state space so that the algorithm becomes computationally feasible for on-line implementation. Simulation results demonstrate that the proposed state-block approach can not only reduce the computation time dramatically but also provide a good approximation of power-tailed request interarrival times common for Internet traffic. Finally, an implementation plan with system design is also proposed.