
    TGSum: Build Tweet Guided Multi-Document Summarization Dataset

    The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large-scale news-related multi-document summaries by drawing on social media reactions. We utilize two types of social labels in tweets, i.e., hashtags and hyperlinks. Hashtags are used to cluster documents into different topic sets, and a tweet with a hyperlink often highlights certain key points of the corresponding document. We synthesize a linked document cluster to form a reference summary that covers most key points. To this end, we adopt the ROUGE metrics to measure the coverage ratio and develop an Integer Linear Programming (ILP) solution to discover the sentence set reaching the upper bound of ROUGE. Since summary sentences may be selected from both documents and high-quality tweets, the generated reference summaries can be abstractive. Both the informativeness and readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on DUC generic multi-document summarization benchmarks. With the collected data as an extra training resource, the performance of the summarizer improves substantially on all test sets. We release this dataset for further research.
    Comment: 7 pages, 1 figure; in AAAI 2016
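    A minimal sketch of the ILP idea described above, assuming the PuLP library and a concept-coverage formulation as a stand-in for the paper's ROUGE-based objective; all function and variable names are hypothetical.

```python
# Hedged sketch: maximum-coverage ILP for reference-summary construction, in the
# spirit of "the sentence set reaching the upper bound of ROUGE". Concept weights
# stand in for ROUGE n-gram credit; this is not the authors' exact formulation.
import pulp

def select_sentences(sentences, concepts, weights, budget):
    """sentences: list of (length, set_of_concept_ids); concepts: iterable of ids;
    weights: dict concept_id -> coverage credit; budget: max summary length in words."""
    prob = pulp.LpProblem("summary_ilp", pulp.LpMaximize)
    x = {i: pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(len(sentences))}
    c = {j: pulp.LpVariable(f"c_{j}", cat="Binary") for j in concepts}

    # Objective: total weight of covered concepts (proxy for ROUGE coverage).
    prob += pulp.lpSum(weights[j] * c[j] for j in concepts)

    # Length budget on the selected sentences.
    prob += pulp.lpSum(sentences[i][0] * x[i] for i in x) <= budget

    # A concept counts as covered only if some selected sentence contains it.
    for j in concepts:
        prob += c[j] <= pulp.lpSum(x[i] for i in x if j in sentences[i][1])

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [i for i in x if x[i].value() > 0.5]
```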

    LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

    In this work, we present LLaMA-VID, a novel method to tackle the token generation challenge in Vision Language Models (VLMs) for video and image understanding. Current VLMs, while proficient at tasks like image captioning and visual question answering, face computational burdens when processing long videos because of the excessive number of visual tokens. LLaMA-VID addresses this issue by representing each frame with two distinct tokens, a context token and a content token. The context token encodes the overall image context based on the user input, whereas the content token encapsulates the visual cues in each frame. This dual-token strategy significantly reduces the overload of long videos while preserving critical information. LLaMA-VID thereby enables existing frameworks to support hour-long videos and pushes their upper limit with an extra context token. It is shown to surpass previous methods on most video- and image-based benchmarks. Code is available at https://github.com/dvlab-research/LLaMA-VID.
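    A minimal sketch of the dual-token idea, assuming PyTorch: one query-conditioned "context token" and one pooled "content token" per frame. Module structure, dimensions, and names are illustrative assumptions, not the authors' implementation (see the repository above for that).

```python
# Hedged sketch of representing one frame with exactly two tokens.
import torch
import torch.nn as nn

class DualTokenEncoder(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        # Cross-attention lets the user's text query attend over frame patches.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, patch_feats, text_query):
        """patch_feats: (B, P, D) visual patch embeddings for one frame;
        text_query: (B, T, D) embedded user instruction."""
        # Context token: aggregates the frame according to the user's query.
        ctx, _ = self.cross_attn(text_query, patch_feats, patch_feats)
        context_token = self.proj(ctx.mean(dim=1, keepdim=True))          # (B, 1, D)
        # Content token: query-independent summary of the frame's visual cues.
        content_token = self.proj(patch_feats.mean(dim=1, keepdim=True))  # (B, 1, D)
        return torch.cat([context_token, content_token], dim=1)           # (B, 2, D)
```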

    Exploring Evaluation Factors and Framework for the Object of Automated Trading System

    An automated trading system (ATS) is a computer program that combines different trading rules to find optimal trading opportunities. The objects of an ATS, i.e., the financial assets it trades, need to be evaluated because this is of great significance for stakeholders and market order. From the perspectives of dealers, agents, the external environment, and the objects themselves, this study explores the factors involved in evaluating and choosing the object of an ATS. Based on design science research (DSR), we present a preliminary evaluation framework and conducted semi-structured interviews with twelve trading participants engaged in different occupations. By analyzing the collected data, we validated eight factors from the literature and identified four new factors and fifty-four sub-factors. Additionally, this paper develops a relationship model of the factors. The results could be used in future work to explore and validate more evaluation factors using data mining.

    Object-Centric Stereo Matching for 3D Object Detection

    Safe autonomous driving requires reliable 3D object detection: determining the 6 DoF pose and dimensions of objects of interest. Using stereo cameras for this task is a cost-effective alternative to the widely used LiDAR sensor. The current state of the art in stereo 3D object detection takes the existing PSMNet stereo matching network, with no modifications, converts the estimated disparities into a 3D point cloud, and feeds this point cloud into a LiDAR-based 3D object detector. The issue with existing stereo matching networks is that they are designed for disparity estimation, not 3D object detection; the shape and accuracy of object point clouds are not their focus. Stereo matching networks commonly suffer from inaccurate depth estimates at object boundaries, which we refer to as streaking, because background and foreground points are estimated jointly. Existing networks also penalize disparity rather than the estimated positions of object point clouds in their loss functions. To address these two issues, we propose a novel 2D box association and object-centric stereo matching method that estimates the disparities of only the objects of interest. Our method achieves state-of-the-art results on the KITTI 3D and BEV benchmarks.
    Comment: Accepted at ICRA 2020
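    A minimal sketch of the point-position idea discussed above, assuming PyTorch and a pinhole stereo model: the loss is applied to the 3D locations implied by the predicted disparities rather than to the disparity error itself. The function, parameters, and shapes are illustrative assumptions, not the paper's exact loss.

```python
# Hedged sketch of a point-position loss: penalize where the predicted disparity
# puts the object's points in 3D, not the raw disparity error.
import torch

def point_position_loss(disp_pred, disp_gt, u, v, fx, fy, cx, cy, baseline):
    """disp_pred/disp_gt: (N,) disparities at object pixels; u, v: (N,) pixel coords."""
    def backproject(disp):
        z = fx * baseline / disp.clamp(min=1e-6)   # depth from disparity
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return torch.stack([x, y, z], dim=-1)      # (N, 3) camera-frame points
    # Error is measured in metres in 3D, so distant points are not under-weighted
    # the way a plain disparity loss under-weights them.
    return torch.nn.functional.smooth_l1_loss(backproject(disp_pred),
                                              backproject(disp_gt))
```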

    Research hotspots and emerging trends of deep learning applications in orthopedics: A bibliometric and visualized study

    Background: As a research hotspot, deep learning has been continuously combined with various fields of medicine, and there is a growing number of deep learning-based studies in orthopedics. This bibliometric analysis aimed to identify the hotspots of deep learning applications in orthopedics in recent years and to infer future research trends.
    Methods: We screened global publications on deep learning applications in orthopedics by accessing the Web of Science Core Collection. Articles and reviews were collected without language or time restrictions. CiteSpace was applied to conduct the bibliometric analysis of the publications.
    Results: A total of 822 articles and reviews were retrieved. Based on the annual publication counts, the analysis showed that the application of deep learning in orthopedics has great prospects for development. The most prolific country is the USA, followed by China. The University of California San Francisco and Skeletal Radiology are the most prolific institution and journal, respectively. LeCun Y is the most frequently cited author, and Nature has the highest impact factor among the cited journals. The current hot keywords are convolutional neural network, classification, segmentation, diagnosis, image, fracture, and osteoarthritis. The burst keywords are risk factor, identification, localization, and surgery. The timeline viewer showed two recent research directions: bone tumors and osteoporosis.
    Conclusion: Publications on deep learning applications in orthopedics have increased in recent years, with the USA being the most prolific country. Current research mainly focuses on classification, diagnosis, and risk prediction for osteoarthritis and fractures from medical images. Future research may emphasize reducing intraoperative risk, predicting postoperative complications, screening for osteoporosis, and identifying and classifying bone tumors from conventional imaging.

    Sampling and inference of networked dynamics using Log-Koopman nonlinear Graph Fourier Transform

    Monitoring networked dynamics via a subset of nodes is essential for a variety of scientific and operational purposes. When an explicit model and networked signal space are lacking, traditional observability analysis and non-convex methods are insufficient. Current data-driven Koopman linearization, although it derives a linear evolution model for a selected vector-valued observable of the original state space, may result in a large sampling set due to (i) the large number of polynomial-based observables (O(N^2), where N is the number of nodes in the network), and (ii) the failure to account for the nonlinear dependency between observables. In this work, to achieve linear scaling (O(N)) and a small set of sampling nodes, we propose to combine a novel Log-Koopman operator with a nonlinear Graph Fourier Transform (NL-GFT) scheme. First, the Log-Koopman operator reduces the number of observables by transforming multiplicative polynomial observables into logarithmic summations. Second, a nonlinear GFT concept and sampling theory are provided to exploit the nonlinear dependence of observables for observability analysis using the Koopman evolution model. The results demonstrate that the proposed Log-Koopman NL-GFT scheme can (i) linearize unknown nonlinear dynamics using O(N) observables, and (ii) achieve a smaller number of sampling nodes, compared with state-of-the-art polynomial-Koopman-based observability analysis.
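    A minimal sketch of the observable-count reduction mentioned above, assuming NumPy: pairwise-product (quadratic) observables scale as O(N^2), whereas logarithmic observables scale as O(N) because log(x_i * x_j) = log x_i + log x_j lets products be recovered by summation. The functions below are illustrative, not the paper's operator construction.

```python
# Hedged sketch contrasting polynomial and logarithmic observable dictionaries.
import numpy as np

def poly_observables(x):
    """Quadratic dictionary: every pairwise product, O(N^2) observables."""
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

def log_observables(x, eps=1e-12):
    """Log dictionary: O(N) observables; products of states are reproduced as
    sums of logs, so no extra product observables are needed (positivity assumed,
    with a small epsilon for numerical safety)."""
    return np.log(np.abs(x) + eps)

x = np.random.rand(100)
print(poly_observables(x).size, log_observables(x).size)  # 5050 vs 100
```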