601 research outputs found
An automatic calibration method for stereo-based 3D distributed smart camera networks
Stereo-based 3D distributed smart camera networks are useful in a broad range of applications. Knowledge of the relative locations and orientations of nodes in the network is an essential prerequisite for true 3D sensing. A novel spatial calibration method for a network of pre-calibrated stereo smart cameras is presented, which obtains pose estimates suitable for collaborative 3D vision in a distributed fashion using two stages of registration on robust 3D point sets. The method is initially described in a geometrical sense, then presented in a practical implementation using existing vision and registration algorithms. Experiments using both software simulations and physical devices are designed and executed to demonstrate performance
Distributed Robotic Vision for Calibration, Localisation, and Mapping
This dissertation explores distributed algorithms for calibration, localisation, and mapping in the context of a multi-robot network equipped with cameras and onboard processing, comparing against centralised alternatives where all data is transmitted to a singular external node on which processing occurs. With the rise of large-scale camera networks, and as low-cost on-board processing becomes increasingly feasible in robotics networks, distributed algorithms are becoming important for robustness and scalability. Standard solutions to multi-camera computer vision require the data from all nodes to be processed at a central node which represents a significant single point of failure and incurs infeasible communication costs. Distributed solutions solve these issues by spreading the work over the entire network, operating only on local calculations and direct communication with nearby neighbours.
This research considers a framework for a distributed robotic vision platform for calibration, localisation, mapping tasks where three main stages are identified: an initialisation stage where calibration and localisation are performed in a distributed manner, a local tracking stage where visual odometry is performed without inter-robot communication, and a global mapping stage where global alignment and optimisation strategies are applied. In consideration of this framework, this research investigates how algorithms can be developed to produce fundamentally distributed solutions, designed to minimise computational complexity whilst maintaining excellent performance, and designed to operate effectively in the long term. Therefore, three primary objectives are sought aligning with these three stages
Socio-economic vision graph generation and handover in distributed smart camera networks
In this article we present an approach to object tracking handover in a network of smart cameras, based on self-interested autonomous agents, which exchange responsibility for tracking objects in a market mechanism, in order to maximise their own utility. A novel ant-colony inspired mechanism is used to learn the vision graph, that is, the camera neighbourhood relations, during runtime, which may then be used to optimise communication between cameras. The key benefits of our completely decentralised approach are on the one hand generating the vision graph online, enabling efficient deployment in unknown scenarios and camera network topologies, and on the other hand relying only on local information, increasing the robustness of the system. Since our market-based approach does not rely on a priori topology information, the need for any multicamera calibration can be avoided. We have evaluated our approach both in a simulation study and in network of real distributed smart cameras
Finding Field of View Overlap by Motion Analysis
Network cameras has in the recent years become more powerful. Each camera is independent and has its own surveillance task. It is reasonable that network cameras in the future should cooperate together to increase surveillance effectiveness. There is a need to find cameras sharing the same field of view in order for an operator to switch perspective. This thesis investigates how multiple network cameras can cooperate by finding the shared field of view between cameras. With the shared field of view, we implement an additional knowledge above system of network cameras and new use cases arises. Our method consist of applying a grid of cells on each camera's video stream and study movement detection. We gather contradicting proof of connectedness between each cell in the whole network of cameras. Our method avoids problems with feature detection such as different perspectives or image quality. We found that our method works with promising results and we can find shared field of view between cameras. There is a limitation in memory of storing all cells and we can only find overlap in regions with movement. This field has not been researched so much, making evaluation hard, as many approaches focuses on feature detection.Network cameras has significantly increased in the recent years. Each camera is independent and its monitoring task can easily be remotely altered. This work focuses on increasing effectiveness of a system of cameras by adding additional knowledge on top of the system and by utilizing multiple cameras when a shared field of view has been found
Multimedia Forensics
This book is open access. Media forensics has never been more relevant to societal life. Not only media content represents an ever-increasing share of the data traveling on the net and the preferred communications means for most users, it has also become integral part of most innovative applications in the digital information ecosystem that serves various sectors of society, from the entertainment, to journalism, to politics. Undoubtedly, the advances in deep learning and computational imaging contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge in establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape powered by innovative imaging technologies and sophisticated tools, based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensics capabilities that relate to media attribution, integrity and authenticity verification, and counter forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students a holistic view of the field
Multimedia Forensics
This book is open access. Media forensics has never been more relevant to societal life. Not only media content represents an ever-increasing share of the data traveling on the net and the preferred communications means for most users, it has also become integral part of most innovative applications in the digital information ecosystem that serves various sectors of society, from the entertainment, to journalism, to politics. Undoubtedly, the advances in deep learning and computational imaging contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge in establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape powered by innovative imaging technologies and sophisticated tools, based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensics capabilities that relate to media attribution, integrity and authenticity verification, and counter forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students a holistic view of the field
Recommended from our members
User-centred video abstraction
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University LondonThe rapid growth of digital video content in recent years has imposed the need for the development of technologies with the capability to produce condensed but semantically rich versions of the input video stream in an effective manner. Consequently, the topic of Video Summarisation is becoming increasingly popular in multimedia community and numerous video abstraction approaches have been proposed accordingly. These recommended techniques can be divided into two major categories of automatic and semi-automatic in accordance with the required level of human intervention in summarisation process. The fully-automated methods mainly adopt the low-level visual, aural and textual features alongside the mathematical and statistical algorithms in furtherance to extract the most significant segments of original video. However, the effectiveness of this type of techniques is restricted by a number of factors such as domain-dependency, computational expenses and the inability to understand the semantics of videos from low-level features. The second category of techniques however, attempts to alleviate the quality of summaries by involving humans in the abstraction process to bridge the semantic gap. Nonetheless, a single user’s subjectivity and other external contributing factors such as distraction will potentially deteriorate the performance of this group of approaches. Accordingly, in this thesis we have focused on the development of three user-centred effective video summarisation techniques that could be applied to different video categories and generate satisfactory results. According to our first proposed approach, a novel mechanism for a user-centred video summarisation has been presented for the scenarios in which multiple actors are employed in the video summarisation process in order to minimise the negative effects of sole user adoption. Based on our recommended algorithm, the video frames were initially scored by a group of video annotators ‘on the fly’. This was followed by averaging these assigned scores in order to generate a singular saliency score for each video frame and, finally, the highest scored video frames alongside the corresponding audio and textual contents were extracted to be included into the final summary. The effectiveness of our approach has been assessed by comparing the video summaries generated based on our approach against the results obtained from three existing automatic summarisation tools that adopt different modalities for abstraction purposes. The experimental results indicated that our proposed method is capable of delivering remarkable outcomes in terms of Overall Satisfaction and Precision with an acceptable Recall rate, indicating the usefulness of involving user input in the video summarisation process. In an attempt to provide a better user experience, we have proposed our personalised video summarisation method with an ability to customise the generated summaries in accordance with the viewers’ preferences. Accordingly, the end-user’s priority levels towards different video scenes were captured and utilised for updating the average scores previously assigned by the video annotators. Finally, our earlier proposed summarisation method was adopted to extract the most significant audio-visual content of the video. Experimental results indicated the capability of this approach to deliver superior outcomes compared with our previously proposed method and the three other automatic summarisation tools. Finally, we have attempted to reduce the required level of audience involvement for personalisation purposes by proposing a new method for producing personalised video summaries. Accordingly, SIFT visual features were adopted to identify the video scenes’ semantic categories. Fusing this retrieved data with pre-built users’ profiles, personalised video abstracts can be created. Experimental results showed the effectiveness of this method in delivering superior outcomes comparing to our previously recommended algorithm and the three other automatic summarisation techniques
Static, dynamic, and adaptive heterogeneity in distributed smart camera networks
We study heterogeneity among nodes in self-organizing smart camera networks, which use strategies based on social and economic knowledge to target communication activity efficiently. We compare homogeneous configurations, when cameras use the same strategy, with heterogeneous configurations, when cameras use different strategies. Our first contribution is to establish that static heterogeneity leads to new outcomes that are more efficient than those possible with homogeneity. Next, two forms of dynamic heterogeneity are investigated: nonadaptive mixed strategies and adaptive strategies, which learn online. Our second contribution is to show that mixed strategies offer Pareto efficiency consistently comparable with the most efficient static heterogeneous configurations. Since the particular configuration required for high Pareto efficiency in a scenario will not be known in advance, our third contribution is to show how decentralized online learning can lead to more efficient outcomes than the homogeneous case. In some cases, outcomes from online learning were more efficient than all other evaluated configuration types. Our fourth contribution is to show that online learning typically leads to outcomes more evenly spread over the objective space. Our results provide insight into the relationship between static, dynamic, and adaptive heterogeneity, suggesting that all have a key role in achieving efficient self-organization
Automatic Food Intake Assessment Using Camera Phones
Obesity is becoming an epidemic phenomenon in most developed countries. The fundamental cause of obesity and overweight is an energy imbalance between calories consumed and calories expended. It is essential to monitor everyday food intake for obesity prevention and management. Existing dietary assessment methods usually require manually recording and recall of food types and portions. Accuracy of the results largely relies on many uncertain factors such as user\u27s memory, food knowledge, and portion estimations. As a result, the accuracy is often compromised. Accurate and convenient dietary assessment methods are still blank and needed in both population and research societies.
In this thesis, an automatic food intake assessment method using cameras, inertial measurement units (IMUs) on smart phones was developed to help people foster a healthy life style. With this method, users use their smart phones before and after a meal to capture images or videos around the meal. The smart phone will recognize food items and calculate the volume of the food consumed and provide the results to users. The technical objective is to explore the feasibility of image based food recognition and image based volume estimation.
This thesis comprises five publications that address four specific goals of this work: (1) to develop a prototype system with existing methods to review the literature methods, find their drawbacks and explore the feasibility to develop novel methods; (2) based on the prototype system, to investigate new food classification methods to improve the recognition accuracy to a field application level; (3) to design indexing methods for large-scale image database to facilitate the development of new food image recognition and retrieval algorithms; (4) to develop novel convenient and accurate food volume estimation methods using only smart phones with cameras and IMUs.
A prototype system was implemented to review existing methods. Image feature detector and descriptor were developed and a nearest neighbor classifier were implemented to classify food items. A reedit card marker method was introduced for metric scale 3D reconstruction and volume calculation.
To increase recognition accuracy, novel multi-view food recognition algorithms were developed to recognize regular shape food items. To further increase the accuracy and make the algorithm applicable to arbitrary food items, new food features, new classifiers were designed. The efficiency of the algorithm was increased by means of developing novel image indexing method in large-scale image database. Finally, the volume calculation was enhanced through reducing the marker and introducing IMUs. Sensor fusion technique to combine measurements from cameras and IMUs were explored to infer the metric scale of the 3D model as well as reduce noises from these sensors
Modeling and Optimizing the Coverage of Multi-Camera Systems
This thesis approaches the problem of modeling a multi-camera system\u27s performance from system and task parameters by describing the relationship in terms of coverage. This interface allows a substantial separation of the two concerns: the ability of the system to obtain data from the space of possible stimuli, according to task requirements, and the description of the set of stimuli required for the task. The conjecture is that for any particular system, it is in principle possible to develop such a model with ideal prediction of performance. Accordingly, a generalized structure and tool set is built around the core mathematical definitions of task-oriented coverage, without tying it to any particular model. A family of problems related to coverage in the context of multi-camera systems is identified and described. A comprehensive survey of the state of the art in approaching such problems concludes that by coupling the representation of coverage to narrow problem cases and applications, and by attempting to simplify the models to fit optimization techniques, both the generality and the fidelity of the models are reduced. It is noted that models exhibiting practical levels of fidelity are well beyond the point where only metaheuristic optimization techniques are applicable. Armed with these observations and a promising set of ideas from surveyed sources, a new high-fidelity model for multi-camera vision based on the general coverage framework is presented. This model is intended to be more general in scope than previous work, and despite the complexity introduced by the multiple criteria required for fidelity, it conforms to the framework and is thus tractable for certain optimization approaches. Furthermore, it is readily extended to different types of vision systems. This thesis substantiates all of these claims. The model\u27s fidelity and generality is validated and compared to some of the more advanced models from the literature. Three of the aforementioned coverage problems are then approached in application cases using the model. In one case, a bistatic variant of the sensing modality is used, requiring a modification of the model; the compatibility of this modification, both conceptually and mathematically, illustrates the generality of the framework
- …