8 research outputs found

    Strategies for Searching Video Content with Text Queries or Video Examples

    Full text link
    The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos, so these videos are unsearchable by current search engines. Content-based video retrieval (CBVR) tackles this metadata-scarcity problem by directly analyzing the visual and audio streams of each video. CBVR encompasses multiple research topics, including low-level feature design, feature fusion, semantic detector training, and video search/reranking. We present novel strategies in these topics to enhance CBVR in both accuracy and speed under different query inputs, including pure textual queries and query by video examples. Our proposed strategies were incorporated into our submission for the TRECVID 2014 Multimedia Event Detection evaluation, where our system outperformed other submissions on both text queries and video example queries, demonstrating the effectiveness of our proposed approaches.
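    One component this abstract names, feature fusion, can be illustrated with a minimal sketch of score-level late fusion: per-feature retrieval scores for each indexed video are normalized and combined with weights. The function name, feature names, and weights below are illustrative assumptions, not the system described in the abstract.

    ```python
    import numpy as np

    def late_fuse(scores_per_feature: dict, weights: dict) -> np.ndarray:
        """Score-level late fusion for video retrieval.

        scores_per_feature maps a feature name (e.g. "cnn", "mfcc") to an
        array of raw retrieval scores, one per video in the index. Each
        score array is z-normalized so features on different scales can be
        combined, then a weighted sum produces the fused ranking score.
        """
        fused = None
        for name, scores in scores_per_feature.items():
            s = np.asarray(scores, dtype=float)
            s = (s - s.mean()) / (s.std() + 1e-8)   # z-normalize per feature
            contribution = weights.get(name, 1.0) * s
            fused = contribution if fused is None else fused + contribution
        return fused

    # Example: fuse visual and audio scores for five candidate videos.
    scores = {"cnn": [0.9, 0.2, 0.5, 0.7, 0.1], "mfcc": [0.3, 0.1, 0.8, 0.6, 0.2]}
    ranking = np.argsort(-late_fuse(scores, {"cnn": 2.0, "mfcc": 1.0}))
    print(ranking)  # video indices sorted from best to worst match
    ```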

    Kebutuhan Sumber Belajar Mahasiswa yang Mendukung Pembelajaran Berbasis Teknologi Informasi dan Komunikasi di Perguruan Tinggi (Students' Learning Resource Needs to Support ICT-Based Learning in Higher Education)

    Get PDF
    This study aims to gather information about the learning resources that students of Universitas Negeri Jakarta (UNJ) tend to need to support ICT-based learning. The study was conducted using a survey method, with questionnaires distributed randomly to students; 580 students from 22 study programmes across the six faculties of UNJ completed the questionnaires. Each faculty tends to need different learning resources, but videos are the learning resource students need most. The development of learning resources in higher education must therefore consider the learning resource needs of students in each faculty, not of the university as a whole.

    Understanding Human Actions in Video

    Full text link
    Understanding human behavior is crucial for any autonomous system that interacts with humans. For example, assistive robots need to know when a person is signaling for help, and autonomous vehicles need to know when a person is waiting to cross the street. However, identifying human actions in video is a challenging and unsolved problem. In this work, we address several of the key challenges in human action recognition. To enable better representations of video sequences, we develop novel deep learning architectures that improve representations both at the level of instantaneous motion and at the level of long-term context. In addition, to reduce reliance on fixed action vocabularies, we develop a compositional representation of actions that allows novel action descriptions to be represented as a sequence of sub-actions. Finally, we address the issue of data collection for human action understanding by creating a large-scale video dataset consisting of 70 million videos collected from internet video sharing sites, together with their matched descriptions. We demonstrate that these contributions improve the generalization performance of human action recognition systems on several benchmark datasets.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/162887/1/stroud_1.pd
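    The compositional idea in this abstract, representing a novel action as an ordered sequence of sub-actions, can be illustrated with a toy sketch: per-segment sub-action classifier scores are aligned to a requested sequence with a simple dynamic program. Names such as compose_action and the toy vocabulary are hypothetical, not from the thesis.

    ```python
    import numpy as np

    def compose_action(sub_action_scores: np.ndarray,
                       sequence: list,
                       vocab: dict) -> float:
        """Score a composite action described as an ordered list of sub-actions.

        sub_action_scores has shape (num_segments, num_sub_actions): each row
        holds classifier scores for one temporal segment of the video. The
        composite score is the best monotone alignment of the requested
        sub-action sequence to the video's segments.
        """
        idx = [vocab[s] for s in sequence]
        n_seg = sub_action_scores.shape[0]
        # dp[i][j] = best score placing the first i sub-actions in the first j segments
        dp = np.full((len(idx) + 1, n_seg + 1), -np.inf)
        dp[0, :] = 0.0
        for i, a in enumerate(idx, start=1):
            for j in range(1, n_seg + 1):
                skip = dp[i, j - 1]                                   # leave this segment unused
                take = dp[i - 1, j - 1] + sub_action_scores[j - 1, a] # assign segment j to sub-action i
                dp[i, j] = max(skip, take)
        return dp[len(idx), n_seg]

    # Example: score "make coffee" as the sequence grind -> pour -> stir.
    vocab = {"grind": 0, "pour": 1, "stir": 2}
    scores = np.array([[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.1, 0.2, 0.7]])
    print(compose_action(scores, ["grind", "pour", "stir"], vocab))  # 2.4
    ```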

    Instructional videos for unsupervised harvesting and learning of action examples

    No full text
    Online instructional videos have become a popular way for people to learn new skills, spanning art, cooking, and sports. Just as watching instructional videos is a natural way for humans to learn, machines can also gain knowledge from these videos. We propose to utilize the large number of instructional videos available online to harvest examples of various actions in an unsupervised fashion. The key observation is that in instructional videos, the instructor's actions are highly correlated with the instructor's narration. By leveraging this correlation, we can exploit the timing of action-related terms in the speech transcript to temporally localize actions in the video and harvest action examples. The proposed method is scalable, as it requires no human intervention. Experiments show that the harvested examples are of reasonably good quality, and that action detectors trained on data collected by our unsupervised method yield performance comparable to detectors trained on manually collected data on the TRECVID Multimedia Event Detection task.
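    The transcript-alignment idea described above can be sketched minimally: given a speech transcript with per-word timestamps and a small vocabulary of action terms, candidate clips are cut around each term occurrence. The names harvest_clips, action_terms, and the fixed window are illustrative assumptions, not the authors' implementation.

    ```python
    from typing import List, Tuple

    # A transcript is a list of (word, start_time_sec) pairs, e.g. from ASR output.
    Transcript = List[Tuple[str, float]]

    def harvest_clips(transcript: Transcript,
                      action_terms: set,
                      video_length: float,
                      window: float = 5.0) -> List[Tuple[str, float, float]]:
        """Return (action_term, clip_start, clip_end) tuples.

        A clip of +/- `window` seconds is taken around every transcript word
        that matches an action term, following the assumption that the
        instructor performs an action while (or shortly after) naming it.
        """
        clips = []
        for word, t in transcript:
            term = word.lower().strip(".,!?")
            if term in action_terms:
                start = max(0.0, t - window)
                end = min(video_length, t + window)
                clips.append((term, start, end))
        return clips

    # Example: harvest candidate "chop" and "whisk" examples from a cooking video.
    transcript = [("now", 12.0), ("chop", 12.4), ("the", 12.6), ("onions", 12.9),
                  ("and", 40.0), ("whisk", 40.3), ("the", 40.5), ("eggs", 40.8)]
    print(harvest_clips(transcript, {"chop", "whisk"}, video_length=300.0))
    ```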
