
    Multi-level Video Filtering Using Non-textual Contents


    Personalized Video Recommendation Using Rich Contents from Videos

    Video recommendation has become an essential way of helping people explore massive video collections and discover the videos that may interest them. Existing video recommender systems make recommendations based on user-video interactions and a single, specific content feature; when that content feature is unavailable, their performance deteriorates seriously. Inspired by the fact that videos carry rich contents (e.g., text, audio, and motion), in this paper we explore how to use these rich contents to overcome the limitations caused by the unavailability of any specific one. Specifically, we propose a novel general framework, the collaborative embedding regression (CER) model, which incorporates an arbitrary single content feature with user-video interactions to make effective video recommendations in both in-matrix and out-of-matrix scenarios. Our extensive experiments on two real-world large-scale datasets show that CER beats the existing recommender models with any single content feature and is more time-efficient. In addition, we propose a priority-based late fusion (PRI) method to gain the benefit of integrating multiple content features. The corresponding experiment shows that PRI brings real performance improvement over the baseline and outperforms the existing fusion methods.
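
    The in-matrix/out-of-matrix distinction above can be illustrated with a minimal scoring sketch. This is a hypothetical illustration, not the authors' CER implementation: the factor shapes, the projection `W`, and both scoring functions are assumptions. The idea shown is only that a warm ("in-matrix") video uses an interaction-learned factor, while a cold ("out-of-matrix") video gets its factor regressed from a content feature.

```python
import numpy as np

# Hypothetical sketch of CER-style scoring (names and shapes are assumptions,
# not the paper's implementation).  A user factor u and item factors are
# learned from interactions; for a cold-start ("out-of-matrix") video the
# item factor is regressed from its content feature x via a projection W.

rng = np.random.default_rng(0)
d, c = 8, 16                      # latent dim, content-feature dim

u = rng.normal(size=d)            # learned user factor
W = rng.normal(size=(d, c))       # learned content-to-latent regression

def score_in_matrix(u, v):
    """Warm item: use its interaction-learned factor directly."""
    return float(u @ v)

def score_out_of_matrix(u, W, x):
    """Cold item: regress a latent factor from the content feature."""
    return float(u @ (W @ x))

v_warm = rng.normal(size=d)       # factor of a video with interactions
x_cold = rng.normal(size=c)       # content feature of a brand-new video

print(score_in_matrix(u, v_warm))
print(score_out_of_matrix(u, W, x_cold))
```

    Both scenarios reduce to the same dot product once a latent factor exists; the regression step is what lets the model survive when interaction data is missing.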

    Strategies for Searching Video Content with Text Queries or Video Examples

    The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos, leaving them unsearchable by current search engines. Content-based video retrieval (CBVR) tackles this metadata-scarcity problem by directly analyzing the visual and audio streams of each video. CBVR encompasses multiple research topics, including low-level feature design, feature fusion, semantic detector training, and video search/reranking. We present novel strategies in these topics to enhance CBVR in both accuracy and speed under different query inputs, including pure textual queries and query by video examples. Our proposed strategies were incorporated into our submission for the TRECVID 2014 Multimedia Event Detection evaluation, where our system outperformed other submissions on both text queries and video-example queries, demonstrating the effectiveness of our proposed approaches.

    A unified framework with a benchmark dataset for surveillance event detection

    As an important branch of multimedia content analysis, Surveillance Event Detection (SED) remains a challenging task due to high abstraction and complexity, such as occlusions, cluttered backgrounds, and viewpoint changes. To address the problem, we propose a unified SED framework which divides events into two categories: short-term events and long-duration events. The former can be represented as snapshots of static key poses and embody inner-dependencies, while the latter contain complex interactions between pedestrians and show obvious inter-dependencies and temporal context. For short-term events, a novel cascade Convolutional Neural Network (CNN), HsNet, is first constructed to detect pedestrians, and the corresponding events are then classified. For long-duration events, Dense Trajectory (DT) and Improved Dense Trajectory (IDT) are first applied to extract the temporal features of the events; subsequently, Fisher Vector (FV) coding is adopted to encode the raw features, and linear SVM classifiers are learned to predict. Finally, a heuristic fusion scheme is used to obtain the results. In addition, a new large-scale pedestrian dataset, named SED-PD, is built for evaluation. Comprehensive experiments on TRECVID SED test datasets demonstrate the effectiveness of the proposed framework.
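
    The final step above combines per-event confidences from the two detector branches. The abstract only calls its scheme "heuristic", so the sketch below is a generic weighted score-level fusion with an assumed weight, not the paper's actual rule; it merely shows what fusing a CNN branch score with a trajectory-branch score looks like.

```python
# Minimal sketch of score-level late fusion between a short-term (CNN)
# detector and a long-duration (trajectory + FV + SVM) detector.  The
# weighting is an illustrative assumption, not the paper's exact heuristic.

def fuse_scores(cnn_score, traj_score, w_cnn=0.5):
    """Weighted fusion of two per-event confidence scores in [0, 1]."""
    return w_cnn * cnn_score + (1.0 - w_cnn) * traj_score

def detect(cnn_score, traj_score, threshold=0.5):
    """Declare an event when the fused confidence clears the threshold."""
    return fuse_scores(cnn_score, traj_score) >= threshold

print(fuse_scores(0.9, 0.3))
print(detect(0.9, 0.3))
```

    In practice the weight and threshold would be tuned on a validation split rather than fixed as here.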

    Buffer overflow detection for C programs is hard to learn

    Machine learning has been used to detect bugs such as buffer overflows [6, 8]. Models are trained to report bugs at the function or file level, and reviewers of the results have to eyeball the code to determine whether or not there is a bug in that function or file. Contrast this with static code analysers, which report bugs at the statement level along with traces [3, 7], easing the effort required to review the reports. Based on our experience implementing scalable and precise bug finders in the Parfait tool [3], we experiment with machine learning to understand how close these techniques can get to a precise static code analyser. In this paper we summarise our findings in using ML techniques to find buffer overflows in programs written in the C language. We treat bug detection as a classification problem. We use feature extraction and train a model to determine whether a buffer overflow has occurred at the function level. Training is done over labelled data used for regression testing of the Parfait tool. We evaluate the performance of different classifiers using 10-fold cross-validation and the leave-one-out strategy. To understand the generalisability of the trained model, we use it on a collection of unlabelled real-world programs and manually check the reported warnings. Our experiments show that, even though the models give good results over the training data, they do not perform as well when faced with larger, unlabelled data. We conclude with open questions that need addressing before machine learning techniques can be used for buffer overflow detection.
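
    The evaluation protocol above (binary classification over per-function features, scored by 10-fold cross-validation) can be sketched as follows. Everything here is a stand-in: the features, labels, and nearest-centroid classifier are synthetic placeholders, not Parfait's model or training data; only the k-fold mechanics are the point.

```python
import numpy as np

# Sketch of 10-fold cross-validation for a binary "overflow / no overflow"
# classifier over per-function feature vectors.  Data and classifier are
# synthetic placeholders for illustration only.

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 12))            # e.g. counts of risky buffer ops
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic overflow label

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Train a nearest-centroid classifier and score it on held-out data."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    pred = (np.linalg.norm(X_te - c1, axis=1)
            < np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return (pred == y_te).mean()

def k_fold_accuracy(X, y, k=10):
    """Average held-out accuracy over k train/test splits."""
    folds = np.array_split(np.arange(len(X)), k)
    accs = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        accs.append(nearest_centroid_accuracy(X[tr], y[tr], X[te], y[te]))
    return float(np.mean(accs))

print(k_fold_accuracy(X, y))
```

    Leave-one-out, the other strategy mentioned, is the k = len(X) special case of the same loop.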

    Minimal on-road time route scheduling on time-dependent graphs

    On time-dependent graphs, the fastest path query is an important and well-studied problem. It minimizes the total travel time (waiting time + on-road time) but does not allow waiting at any intermediate vertex when the FIFO property applies. In practice, however, waiting at a vertex can reduce the time spent on the road (for example, resuming travel after a traffic jam has cleared). In this paper, we study how to find a path with the minimal on-road time on time-dependent graphs by allowing waiting at some pre-defined parking vertices. Existing works rely on the following fact: the arrival time at a vertex v is determined by the arrival time at its in-neighbor u. This does not hold in our scenario, since we must also consider the waiting time at u if u allows waiting. Thus, determining the waiting time at each parking vertex so as to achieve the minimal on-road time becomes a major challenge, which further breaks the FIFO property. To cope with this challenging problem, we propose two efficient algorithms that use minimum on-road travel cost functions to answer the query. Evaluations on multiple real-world time-dependent graphs show that the proposed algorithms are more accurate and efficient than extensions of existing algorithms. The results further indicate that, if parking facilities are enabled in the route-scheduling algorithms, on-road time is reduced significantly compared with fastest path algorithms.
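
    The problem setting above can be made concrete with a toy search. This is not one of the paper's two algorithms: the three-vertex graph, the cost function, and the brute-force enumeration of discretised departure times at parking vertices are all illustrative assumptions. It only demonstrates why waiting helps, because departing after the congestion clears yields less on-road time.

```python
import heapq

# Toy sketch: minimise on-road time on a time-dependent graph when waiting
# is allowed only at designated parking vertices.  Graph, cost function, and
# discretised wait search are illustrative, not the paper's algorithms.

def travel(u, v, t):
    """Driving time when departing u toward v at integer time t."""
    if (u, v) == ("A", "B"):
        return 5 if t < 3 else 2   # congestion on A->B clears at t = 3
    if (u, v) == ("B", "C"):
        return 4
    raise KeyError((u, v))

EDGES = {"A": ["B"], "B": ["C"], "C": []}
PARKING = {"A"}                    # waiting permitted only at A
HORIZON = 10                       # departure times considered when waiting

def min_on_road_time(src, dst):
    # state: (on-road time so far, vertex, wall-clock time)
    pq = [(0, src, 0)]
    best = {}
    while pq:
        road, u, t = heapq.heappop(pq)
        if u == dst:
            return road
        if best.get((u, t), float("inf")) <= road:
            continue
        best[(u, t)] = road
        departs = range(t, HORIZON + 1) if u in PARKING else [t]
        for dep in departs:
            for v in EDGES[u]:
                d = travel(u, v, dep)
                heapq.heappush(pq, (road + d, v, dep + d))
    return None

print(min_on_road_time("A", "C"))  # waiting at A until t=3 gives 2 + 4 = 6
```

    Without the wait, departing A at t=0 costs 5 + 4 = 9 on the road; the optimal answer here is 6, which is exactly the non-FIFO behaviour the paper's cost functions are designed to handle efficiently.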

    Machine learning-based automatic construction of earthquake catalog for reservoir areas in multiple river basins of Guizhou province, China

    Large reservoirs carry the risk of reservoir-induced seismicity, so accurately detecting and locating microseismic events is crucial when studying reservoir earthquakes. Automatic earthquake monitoring in reservoir areas is one of the effective measures for earthquake disaster prevention and mitigation. In this study, we first applied an automatic location workflow (LOC-FLOW) to process 14 days of continuous waveform data from several reservoir areas in different river basins of Guizhou province. Compared with the manual seismic catalog, the recall rate of seismic event detection using the workflow was 83.9%. Of the detected earthquakes, 88.9% had an onset-time difference below 1 s, 81.8% had an epicenter deviation within 5 km, and 77.8% had a focal-depth difference of less than 5 km, indicating that the workflow generalizes well to reservoir areas. We further applied the workflow to retrospectively process continuous waveform data recorded from 2020 to the first half of 2021 in reservoir areas across multiple river basins of western Guizhou province and identified five times the number of seismic events obtained through manual processing. Compared with the manually processed seismic catalog, the completeness magnitude decreased from 1.3 to 0.8, and a b-value of 1.25 was calculated for seismicity in western Guizhou province, consistent with the b-values obtained for the reservoir areas in previous studies. Our results show that seismicity levels were relatively low around large reservoirs impounded more than 15 years ago, with no significant correlation between seismicity in these areas and reservoir impoundment. Seismicity patterns were notably different around two large reservoirs impounded only about 12 years ago, which may be explained by differences in reservoir storage capacity, geologic and tectonic settings, hydrogeological characteristics, and active faults in the reservoir areas. Prominent seismicity persisted around two large reservoirs that have been impounded for less than 10 years; these events were clustered and had relatively shallow focal depths. The impoundment of the Jiayan Reservoir had not officially begun during the study period, but earthquake location results suggested a high seismicity level in this reservoir area. Therefore, any seismicity there after the official impoundment deserves special attention.
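
    For context on the b-value quoted above: it is the slope parameter of the Gutenberg-Richter magnitude-frequency relation, commonly estimated with Aki's maximum-likelihood formula b = log10(e) / (mean(M) - (Mc - dM/2)), where Mc is the completeness magnitude and dM the magnitude bin width. The sketch below uses that standard estimator on a synthetic catalogue; it is not the authors' computation or data.

```python
import math

# Aki's maximum-likelihood b-value estimator on a synthetic catalogue.
# mc = completeness magnitude, dm = magnitude bin width.

def b_value(mags, mc, dm=0.1):
    """Estimate the Gutenberg-Richter b-value from magnitudes >= mc."""
    above = [m for m in mags if m >= mc]
    mean_m = sum(above) / len(above)
    return math.log10(math.e) / (mean_m - (mc - dm / 2.0))

catalog = [0.9, 1.0, 1.1, 1.3, 1.5, 1.8, 2.2, 0.6, 0.7]  # toy magnitudes
print(round(b_value(catalog, mc=0.8), 2))
```

    A lower completeness magnitude (as achieved by the automatic workflow, 1.3 down to 0.8) admits more small events into the sum, which is what makes the b-value estimate more stable.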

    Multiple graph unsupervised feature selection

    Feature selection improves the quality of a model by filtering out noisy or redundant features. In unsupervised scenarios, selection is challenging because labels are unavailable. To overcome this, graphs that unfold the geometric structure of the manifold are usually used to regularize the selection process. These graphs can be constructed from either a local view or a global view. Because the local graph is more discriminative, previous methods tended to use it rather than the global graph, yet the global graph also carries useful information. In light of this, we propose a multiple-graph unsupervised feature selection method that leverages information from both local and global graphs. In addition, we enforce an ℓ-norm penalty to achieve more flexible sparse learning. Experiments inspecting the effects of the multiple graphs and the ℓ-norm are conducted on various datasets, and comparisons with other mainstream methods are also presented. The results support that multiple graphs can be better than a single graph for unsupervised feature selection, and the overall performance of the proposed method is higher than that of the comparison methods.
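
    The local-versus-global graph distinction above can be sketched concretely: a local kNN affinity graph and a global fully connected affinity graph are built over the same samples, and their graph Laplacians are combined into one regulariser. The construction details below (heat-kernel weights, k, equal combination weights) are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Build local (kNN) and global (fully connected) affinity graphs over the
# same samples and combine their Laplacians.  Weights and k are assumptions.

def affinity(X, k=None):
    """Heat-kernel affinity matrix; keep only k nearest neighbours if set."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    W = np.exp(-d ** 2)
    np.fill_diagonal(W, 0.0)
    if k is not None:
        for i in range(n):
            W[i, np.argsort(W[i])[:-k]] = 0.0   # zero all but k largest
        W = np.maximum(W, W.T)                  # symmetrise
    return W

def laplacian(W):
    """Unnormalised graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 4))
L_local = laplacian(affinity(X, k=3))       # local, discriminative view
L_global = laplacian(affinity(X))           # global view
L_multi = 0.5 * L_local + 0.5 * L_global    # combined regulariser
print(L_multi.shape)
```

    In a full method the combination weights would themselves be learned rather than fixed at 0.5; the point here is only that both Laplacians act on the same sample set and can be summed.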

    Learning Discrete Hashing Towards Efficient Fashion Recommendation

    In our daily life, matching clothing well is always troublesome, especially when shopping online and selecting a pair of matched pieces of clothing from tens of thousands of available selections. To help customers overcome this selection issue, recent studies in the recommender-system area have started to infer fashion matching results automatically. Traditional fashion recommendation is normally achieved by considering the visual similarity of clothing items and/or item co-purchase history from existing shopping transactions. Due to the high complexity of visual features and the lack of historical item-purchase records, most existing work is unlikely to make efficient and accurate recommendations. To address the problem, in this paper we propose a new model called Discrete Supervised Fashion Coordinates Hashing. Its main objective is to learn meaningful yet compact high-level features of clothing items, represented as binary hash codes. In detail, this learning process is supervised by a clothing matching matrix, which is initially constructed from the limited known matching pairs and subsequently from self-augmented ones. The proposed model jointly learns the intrinsic matching patterns from the matching matrix and the binary representations from the clothing items' images, where the visual feature of each clothing item is discretized into a fixed-length binary vector. The binary representation learning significantly reduces the memory cost and accelerates the recommendation. Experiments comparing the proposed approach with several state-of-the-art methods evidence its superior performance for efficient fashion recommendation.
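
    The speed and memory claims above come from the binary representation itself: once items are fixed-length binary codes, matching reduces to Hamming distance, computable with an XOR and a popcount. The 8-bit codes and item names below are toy values, not the model's learned representations.

```python
# Retrieval with binary hash codes: Hamming distance via XOR + popcount.
# Codes and catalogue entries are toy values for illustration.

def hamming(a, b):
    """Number of bit positions where the two codes differ."""
    return bin(a ^ b).count("1")

query = 0b10110010                          # hypothetical code for a query item
catalog = {
    "skirt":  0b10110011,                   # differs in 1 bit
    "jacket": 0b01001101,                   # differs in 8 bits
    "shoes":  0b10100110,                   # differs in 2 bits
}

best = min(catalog, key=lambda name: hamming(query, catalog[name]))
print(best)                                 # the closest-matching item
```

    Real systems use 32-128-bit codes and hardware popcount, so scanning millions of items stays cheap compared with distances over dense float features.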

    Using detected visual objects to index video database

    In this paper, we focus on how to use visual objects to index videos. Two tables are constructed for this purpose: the unique-object table and the occurrence table. The former stores the unique objects that appear in the videos, while the latter stores the occurrence information of these unique objects in the videos. In previous works, these two tables were generated manually by a top-down process: the unique-object table is first given by experts, and the occurrence table is then generated by annotators according to the unique-object table. Such a process, which depends heavily on human labor, obviously limits scalability, especially when the data are dynamic or large-scale. To improve this, we propose a bottom-up process to generate the two tables. The novelties are: we use an object detector instead of human annotation to create the occurrence table, and we propose a hybrid method consisting of local merge, global merge, and propagation to generate the unique-object table and fix the occurrence table. There are three other candidate methods for implementing the bottom-up process: recognition-based, matching-based, and tracking-based. By analyzing their mechanisms and evaluating their accuracy, we find that they are not suitable for the bottom-up process. The proposed hybrid method leverages the advantages of the matching-based and tracking-based methods. Our experiments show that the hybrid method is more accurate and efficient than the candidate methods, indicating that it is better suited to the proposed bottom-up process.
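
    The two-table structure above can be sketched as a bottom-up pass over detector output. The merge criterion here (identical appearance signatures) is a deliberately naive stand-in for the paper's hybrid local-merge/global-merge/propagation method; the detections are toy values.

```python
# Toy bottom-up construction of the two index tables from detector output.
# The "same signature == same object" merge rule is a naive stand-in for the
# hybrid merge method; detections are fabricated for illustration.

detections = [                       # (video, frame, appearance signature)
    ("video1", 10, ("red", "car")),
    ("video1", 42, ("red", "car")),
    ("video2",  7, ("black", "dog")),
    ("video2", 90, ("red", "car")),
]

unique_objects = {}                  # object_id -> appearance signature
occurrences = []                     # (object_id, video, frame)

for video, frame, signature in detections:
    # merge stand-in: detections with identical signatures are one object
    oid = next((k for k, v in unique_objects.items() if v == signature), None)
    if oid is None:
        oid = len(unique_objects)
        unique_objects[oid] = signature
    occurrences.append((oid, video, frame))

print(len(unique_objects))           # 2 unique objects
print(occurrences[0])                # (0, 'video1', 10)
```

    Replacing the equality test with appearance matching within a video (local merge) and across videos (global merge) is where the real difficulty, and the paper's contribution, lies.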