Automated camera ranking and selection using video content and scene context
PhD
When observing a scene with multiple cameras, an important problem to solve is to automatically
identify "what camera feed should be shown, and when?" The answer to this question is of interest
for a number of applications and scenarios ranging from sports to surveillance. In this thesis we
present a framework for the ranking of each video frame and camera across time and the camera
network, respectively. This ranking is then used for automated video production. In the first stage
information from each camera view and from the objects in it is extracted and represented in a way
that allows for object- and frame-ranking. First, objects are detected and ranked within and across
camera views. This ranking takes into account both visible and contextual information related to
the object. Then content ranking is performed based on the objects in the view and camera-network
level information. We propose two novel techniques for content ranking, namely Routing Based
Ranking (RBR) and Multivariate Gaussian based Ranking (MVG). In RBR we use a rule-based
framework where weighted fusion of object- and frame-level information takes place, while in MVG
the rank is estimated as a multivariate Gaussian distribution. Through experimental and subjective
validation we demonstrate that the proposed content ranking strategies allow the identification of
the best camera at each time instant.
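As an illustration of the MVG idea, a minimal sketch: frames are described by feature vectors, a multivariate Gaussian is fitted to them, and each candidate frame is ranked by its log-likelihood under that model. The features and numbers here are hypothetical, not the thesis's actual representation.

```python
import numpy as np

def fit_gaussian(features):
    """Estimate mean and (regularized) covariance of frame-level features."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return mu, cov

def mvg_score(x, mu, cov):
    """Unnormalized log-likelihood of a frame's features under the model."""
    d = x - mu
    return -0.5 * d @ np.linalg.inv(cov) @ d

# Hypothetical per-frame features: [object count, mean object size, motion]
train = np.array([[3, 0.20, 0.50],
                  [4, 0.25, 0.60],
                  [3, 0.18, 0.55],
                  [5, 0.30, 0.40]])
mu, cov = fit_gaussian(train)

candidates = np.array([[4, 0.24, 0.55],   # busy, typical view
                       [0, 0.01, 0.05]])  # nearly empty view
scores = [mvg_score(c, mu, cov) for c in candidates]
best = int(np.argmax(scores))             # index of the highest-ranked frame
```

The busy view resembles the training statistics and therefore outranks the nearly empty one.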
The second part of the thesis focuses on the automatic generation of N-to-1 videos based on the
ranked content. We demonstrate that in such production settings frequent inter-camera switching is
undesirable. Motivated by the need for a compromise between selecting the best camera most of the
time and minimising inter-camera switching, we demonstrate that state-of-the-art techniques for this
task are inadequate and fail in dynamic scenes. We propose three novel methods for automated camera
selection. The first method (gof) performs a joint optimization
of a cost function that depends on both the view quality and inter-camera switching so that a
pleasing best-view video sequence can be composed. The other two methods (dbn and util) incorporate
the selection decision into the ranking strategy. In dbn we model the best-camera selection
as a state sequence via Directed Acyclic Graphs (DAG) designed as a Dynamic Bayesian Network
(DBN), which encodes the contextual knowledge about the camera network and employs the past
information to minimize the inter-camera switches. In comparison, util utilizes the past as well
as the future information in a Partially Observable Markov Decision Process (POMDP) where the
camera-selection at a certain time is influenced by the past information and its repercussions in
the future. The performance of the proposed approach is demonstrated on multiple real and synthetic
multi-camera setups. We compare the proposed architectures against various baseline methods,
with encouraging results. The performance of the proposed approaches is also validated through
extensive subjective testing.
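The compromise that gof formalizes, accumulating view quality while charging a fixed cost per inter-camera switch, can be sketched with a Viterbi-style dynamic program. This is an illustrative stand-in with made-up quality values, not the thesis's actual cost function or optimizer.

```python
import numpy as np

def select_cameras(quality, switch_penalty):
    """Pick one camera per frame, trading per-frame view quality against
    a fixed cost for every inter-camera switch (Viterbi-style DP)."""
    T, C = quality.shape
    score = quality[0].astype(float).copy()   # best score ending at camera c
    back = np.zeros((T, C), dtype=int)        # best predecessor per (t, c)
    for t in range(1, T):
        new_score = np.empty(C)
        for c in range(C):
            # score of arriving at camera c from each possible predecessor
            trans = score - switch_penalty * (np.arange(C) != c)
            back[t, c] = int(np.argmax(trans))
            new_score[c] = quality[t, c] + trans[back[t, c]]
        score = new_score
    # backtrack the best camera sequence
    seq = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        seq.append(int(back[t, seq[-1]]))
    return seq[::-1]

# Toy qualities: camera 0 is best except for a brief blip on camera 1
quality = np.array([[1.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 1.1],
                    [1.0, 0.0],
                    [1.0, 0.0]])
loose = select_cameras(quality, switch_penalty=0.5)   # takes the blip
strict = select_cameras(quality, switch_penalty=1.0)  # stays on camera 0
```

Raising the penalty suppresses the short excursion to camera 1, which is exactly the best-view vs. switching trade-off described above.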
Automatic Mobile Video Remixing and Collaborative Watching Systems
In this thesis, the implications of combining collaboration with automation for remix creation are analyzed. We first present a sensor-enhanced Automatic Video Remixing System (AVRS), which intelligently processes mobile videos in combination with mobile device sensor information. The sensor-enhanced AVRS involves certain architectural choices which meet the key system requirements (leverage user-generated content, use sensor information, reduce end-user burden) and user experience requirements. Architecture adaptations are required to improve certain key performance parameters, and certain operating parameters need to be constrained for real-world deployment feasibility. Subsequently, sensor-less cloud-based AVRS and low-footprint sensor-less AVRS approaches are presented. The three approaches exemplify the importance of operating-parameter tradeoffs for system design. The approaches cover a wide spectrum, ranging from a multimodal multi-user client-server system (sensor-enhanced AVRS) to a mobile application which can automatically generate a multi-camera remix experience from a single video. Next, we present the findings from four user studies, involving 77 users, related to automatic mobile video remixing. The goal was to validate selected system design goals, provide insights for additional features, and identify the challenges and bottlenecks. Topics studied include the role of automation, the value of a video remix as event memorabilia, the requirements for different types of events, and the perceived user value of creating a multi-camera remix from a single video. System design implications derived from the user studies are presented. Subsequently, sport summarization, a specific form of remix creation, is analyzed. In particular, the role of the content capture method is analyzed with two complementary approaches.
The first approach performs saliency detection in casually captured mobile videos; in contrast, the second one creates multi-camera summaries from role-based captured content. Furthermore, a method for interactive customization of summaries is presented. Next, the discussion is extended to include the role of users' situational context and the consumed content in facilitating a collaborative watching experience. Mobile-based collaborative watching architectures are described, which facilitate a common shared context between the participants. The concept of movable multimedia is introduced to highlight the multi-device environment of current-day users. The thesis presents results which have been derived from end-to-end system prototypes tested in real-world conditions and corroborated with extensive user impact evaluation.
DESIGN FRAMEWORK FOR INTERNET OF THINGS BASED NEXT GENERATION VIDEO SURVEILLANCE
Modern artificial intelligence and machine learning open up a new era for video surveillance
systems. Next-generation video surveillance in the Internet of Things (IoT) environment is an
emerging research area because of high bandwidth demands, big-data generation, resource-constrained
video surveillance nodes, and the high energy consumption of real-time applications. In this thesis,
various opportunities and functional requirements that a next-generation video surveillance system
should achieve with the power of video analytics, artificial intelligence and machine learning are
discussed. This thesis also proposes a new video surveillance system architecture that introduces
fog computing into the IoT-based system, and describes the facilities and benefits of the proposed
system, which can meet the forthcoming requirements of surveillance. The different challenges and
issues faced by video surveillance in the IoT environment are examined, and a fog-cloud integrated
architecture is evaluated to address and eliminate those issues.
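As a toy illustration of why fog computing helps here, a fog node can gate uploads so that only frames carrying new information reach the cloud. The threshold rule below is an assumption for illustration, not the architecture proposed in the thesis.

```python
import numpy as np

def fog_filter(frames, threshold=10.0):
    """Keep only frames that differ enough from the last uploaded one,
    cutting bandwidth use and cloud load at the network edge."""
    uploads, prev = [], None
    for i, frame in enumerate(frames):
        if prev is None or np.abs(frame - prev).mean() > threshold:
            uploads.append(i)   # forward this frame to the cloud
            prev = frame
    return uploads

static = np.full((8, 8), 100.0)   # an unchanging scene
event = static + 50.0             # something happens
stream = [static, static.copy(), event, event.copy(), static.copy()]
sent = fog_filter(stream)         # only the first frame and the changes go up
```

Identical frames are dropped at the edge, so the cloud receives three frames instead of five in this toy stream.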
The focus of this thesis is to evaluate IoT-based video surveillance systems. To this end, two case
studies were performed to demonstrate the value of energy- and bandwidth-efficient video
surveillance. In the first case study, an IoT-based power-efficient color frame transmission and
generation algorithm for video surveillance applications is presented. The conventional approach is
to transmit all R, G and B components of all frames. With the proposed technique, instead of sending
all components, one color frame is sent first, followed by a series of gray-scale frames. After a
certain number of gray-scale frames, another color frame is sent, followed by the same number of
gray-scale frames, and this process is repeated throughout the surveillance video. In the decoder,
color information is extracted from the color frame and then used to colorize the gray-scale frames.
In the second case study, a bandwidth-efficient and low-complexity frame reproduction technique,
also applicable to IoT-based video surveillance applications, is presented. With this technique,
only those pixel intensities that differ substantially from the corresponding pixels of the previous
frame are sent; if a pixel intensity is the same or nearly the same as in the previous frame, the
information is not transferred. To this end, a bit stream is created for every frame according to a
predefined protocol. On the cloud side, the frame information can be reproduced by applying the
reverse protocol to the bit stream.
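The first case study's periodic color-frame scheme can be sketched as follows. The colorization step, reusing per-pixel chroma offsets from the most recent color frame, is a simplifying assumption for illustration and not necessarily the algorithm used in the thesis.

```python
import numpy as np

def encode(frames_rgb, period=4):
    """Every `period`-th frame goes out in full color; the rest as gray."""
    stream = []
    for i, frame in enumerate(frames_rgb):
        if i % period == 0:
            stream.append(("color", frame))
        else:
            stream.append(("gray", frame.mean(axis=2)))  # luma-like gray frame
    return stream

def decode(stream):
    """Colorize gray frames with per-pixel chroma offsets taken from the
    most recent color frame."""
    frames, chroma = [], None
    for kind, data in stream:
        if kind == "color":
            chroma = data - data.mean(axis=2, keepdims=True)
            frames.append(data)
        else:
            frames.append(data[..., None] + chroma)  # gray + stored offsets
    return frames

# A static scene reconstructs exactly; moving scenes would show color drift
# until the next color frame arrives.
frame = np.zeros((4, 4, 3))
frame[..., 0], frame[..., 1], frame[..., 2] = 120.0, 100.0, 80.0
video = [frame.copy() for _ in range(4)]
decoded = decode(encode(video, period=2))
```

Only one of every `period` frames carries three channels, which is where the transmission-energy saving comes from.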
Experimental results of the two case studies show that the proposed IoT-based approaches give better
results than traditional techniques in terms of both energy efficiency and video quality, and can
therefore enable sensor nodes in the IoT to perform more operations under energy constraints.
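The second case study's difference-based protocol can be sketched similarly; the packet format below is an illustrative assumption, not the thesis's predefined protocol.

```python
import numpy as np

def encode_diff(frames, threshold=20):
    """First frame is sent whole; afterwards only pixels whose intensity
    changed by more than `threshold` are packed into the bit stream."""
    stream, prev = [], None
    for frame in frames:
        if prev is None:
            stream.append(("full", frame.copy()))
        else:
            mask = np.abs(frame.astype(int) - prev.astype(int)) > threshold
            stream.append(("delta", np.argwhere(mask), frame[mask]))
        prev = frame
    return stream

def decode_diff(stream):
    """Cloud side: rebuild each frame from the previous one plus the deltas."""
    frames, current = [], None
    for packet in stream:
        if packet[0] == "full":
            current = packet[1].copy()
        else:
            _, idx, vals = packet
            current = current.copy()
            current[tuple(idx.T)] = vals   # apply the transmitted changes
        frames.append(current)
    return frames

base = np.zeros((4, 4), dtype=np.uint8)
changed = base.copy()
changed[0, 0] = 200   # big change: transmitted
changed[1, 1] = 5     # small change: dropped (the scheme is lossy)
decoded = decode_diff(encode_diff([base, changed]))
```

Sub-threshold changes are silently discarded, so reconstruction quality trades off directly against the bandwidth saved.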
Personalized production of basketball videos from multi-sensored data under limited display resolution
Integration of information from multiple cameras is essential in television production and intelligent surveillance systems. We propose an autonomous system for personalized production of basketball videos from multi-sensored data under limited display resolution. The problem consists in selecting the right view to display among the multiple video streams captured by the investigated camera network. A view is defined by the camera index and the parameters of the image cropped within the selected camera. We propose criteria for optimal planning of viewpoint coverage and camera selection. Perceptual comfort is discussed, as well as efficient integration of contextual information, which is implemented by smoothing the generated viewpoint/camera sequences to alleviate flickering visual artifacts and discontinuous story-telling artifacts. We design and implement the estimation process and verify it by experiments, which show that our method efficiently reduces those artifacts. (C) 2010 Elsevier Inc. All rights reserved.
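The flicker-suppression idea can be illustrated with a simple hysteresis rule over a per-frame best-camera sequence; this is a toy stand-in for the paper's actual smoothing of viewpoint/camera sequences.

```python
def smooth_selection(raw_cams, min_dwell=3):
    """Accept a camera change only when the new camera persists for at
    least `min_dwell` consecutive frames; otherwise keep the current one."""
    smoothed, current = [], raw_cams[0]
    for t, cam in enumerate(raw_cams):
        if cam != current and raw_cams[t:t + min_dwell] == [cam] * min_dwell:
            current = cam   # stable change: commit to the new camera
        smoothed.append(current)
    return smoothed

raw = [0, 0, 1, 0, 0, 1, 1, 1, 0, 0]   # noisy per-frame best-camera labels
clean = smooth_selection(raw)          # the one-frame blip at t=2 is removed
```

Only switches that persist survive, so short-lived detections no longer cause visible flicker in the produced video.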
Semantic Management of Location-Based Services in Wireless Environments
In recent years, interest in mobile computing has grown due to the ever-increasing use of mobile devices (e.g., smartphones and tablets) and their ubiquity. The low cost of these devices, combined with the large number of sensors and communication mechanisms they are equipped with, makes it possible to develop information systems that are useful to their users. Using a special type of sensor, positioning mechanisms, it is possible to develop Location-Based Services (LBS) that offer added value by considering the location of mobile device users in order to provide them with personalized information. For example, numerous LBS have been presented, including services to find taxis, detect nearby friends, assist in firefighting, and obtain photos and information about the surroundings. However, current LBS are designed for specific scenarios and goals and are therefore based on predefined schemas for modeling the elements involved in those scenarios. Moreover, the contextual knowledge they handle is implicit, which is why they only work for a specific purpose. For example, today a user arriving in a city has to know (and understand) which LBS could give them information about specific means of transport in that city, and these services are generally not reusable in other cities. Some ad hoc solutions for offering LBS to users have been proposed in the literature, but there is no general, flexible solution that can be applied to many different scenarios.
Developing such a general system by simply combining existing LBS is not straightforward, since it is a challenge to design a common framework capable of handling knowledge obtained from data sent by heterogeneous objects (including textual, multimedia and sensor data) and of considering situations in which the system has to adapt to contexts where knowledge changes dynamically and where devices may use different communication technologies (fixed network, wireless, etc.). Our proposal in this thesis is the SHERLOCK system (System for Heterogeneous mobilE Requests by Leveraging Ontological and Contextual Knowledge), which presents a general and flexible architecture to offer users LBS that may be of interest to them. SHERLOCK is based on semantic and agent technologies: 1) it uses ontologies to model information about users, devices, services and the environment, and a reasoner to manage these ontologies and infer knowledge that has not been made explicit; 2) it uses an agent-based architecture (with both static and mobile agents) that allows the different SHERLOCK devices to exchange knowledge, thereby keeping their local ontologies up to date, and to process their users' information requests by finding what they need, wherever it is. The use of these two technologies allows SHERLOCK to be flexible in terms of the services it offers the user (which are learned through interaction among the devices) and of the mechanisms for finding the information the user wants (which adapt to the underlying communication infrastructure).
Autonomous production of basketball videos from multi-sensored data with personalized viewpoints
We propose an autonomous system for personalized production of basketball videos from multi-sensored data under limited display resolution. We propose criteria for optimal planning of viewpoint coverage and camera selection for improved story-telling and perceptual comfort. Using statistical inference, we design and implement the estimation process. Experiments are made to verify the system, which show that our method efficiently alleviates flickering visual artifacts due to viewpoint switching, and discontinuous story-telling artifacts.
Understanding and designing for control in camera operation
Cinematographers often use supportive tools to craft desired camera moves. Recent technological advances have added new tools to the palette, such as gimbals, drones and robots. The combination of motor-driven actuation, computer vision and machine learning in such systems has also made new interaction techniques possible. In particular, a content-based interaction style was introduced in addition to the established axis-based style. On the one hand, content-based co-creation between humans and automated systems made it easier to reach high-level goals. On the other hand, however, the increased use of automation also introduced negative side effects.
Creatives usually want to feel in control while executing the camera motion, and in the end to feel like the authors of the recorded shots. While automation can assist experts or enable novices, it unfortunately also takes away desired control from operators. Thus, if we want to support cinematographers with new tools and interaction techniques, the following question arises: How should we design interfaces for camera motion control that, despite being increasingly automated, provide cinematographers with an experience of control?
Camera control has been studied for decades, especially in virtual environments. Applying content-based interaction to physical environments opens up new design opportunities but also faces less-researched, domain-specific challenges. To suit the needs of cinematographers, designs need to be crafted with care. In particular, they must adapt to the constraints of recording on location, which makes an interplay with established practices essential. Previous work has mainly focused on a technology-centered understanding of camera travel, which consequently influenced the design of camera control systems. In contrast, this thesis contributes to an understanding of the motives of cinematographers and how they operate on set, and provides a user-centered foundation informing cinematography-specific research and design.
The contribution of this thesis is threefold: First, we present ethnographic studies on expert users and their shooting practices on location. These studies highlight the challenges of introducing automation into a creative task (assistance vs. feeling in control). Second, we report on a domain-specific prototyping toolkit for in-situ deployment. The toolkit provides open-source software for low-cost replication, enabling the exploration of design alternatives. To better inform design decisions, we further introduce an evaluation framework for estimating the resulting quality and sense of control. By extending established methodologies with a recent neuroscientific technique, it provides data on explicit as well as implicit levels and is designed to be applicable to other domains of HCI. Third, we present evaluations of designs based on our toolkit and framework. We explored a dynamic interplay of manual control with various degrees of automation. Further, we examined different content-based interaction styles. Here, occlusion due to graphical elements was found and addressed by exploring visual reduction strategies and mid-air gestures. Our studies demonstrate that high degrees of quality and sense of control are achievable with our tools, which also support creativity and established practices.
BUILDING STUDENTS' CHARACTER BY ENGAGING SOCIAL STUDIES ISSUES IN LANGUAGE TEACHING
This paper focuses on analyzing the importance of character building in language teaching, and on finding ways to take advantage of engaging social studies issues within a pedagogical framework. Using social studies issues as topics in teaching English not only enables students to effectively acquire a foreign language, with its knowledge and skills, but also raises their awareness and critical thinking about the problems of our society. Globalization, with its positive and negative effects, has carried our generation away on currents of change, making them strangers in their own backyard and causing them to forget their own cultural identities. Many of the scenes and themes shown on television, the Internet and other media channels often run down the values and ideas of our positive national character traits. From this point of view, teaching language by engaging social studies issues to build students' character and national identity can enhance students' understanding of the importance of its moral values as inspiration to be a better person, and improve their English as well. This paper describes how the engagement of social studies issues can be an alternative learning tool utilized in language teaching to develop students' character and national identity, and the benefits students can gain from this approach in learning English.
Systems and methods for the autonomous production of videos from multi-sensored data (European Patent, EP 2428036 B1 granted in September 2015)
An autonomous computer-based method and system is described for personalized production of videos, such as team sport videos (e.g., basketball videos), from multi-sensored data under limited display resolution. Embodiments of the present invention relate to the selection of a view to display from among the multiple video streams captured by the camera network. Technical solutions are provided for perceptual comfort as well as an efficient integration of contextual information, which is implemented, for example, by smoothing generated viewpoint/camera sequences to alleviate flickering visual artefacts and discontinuous story-telling artefacts. A design and implementation of the viewpoint selection process is disclosed that has been verified by experiments, which show that the method and system of the present invention efficiently distribute the processing load across cameras, and effectively select viewpoints that cover the team action at hand while avoiding major perceptual artefacts.