    Advanced Hough-based method for on-device document localization

    The demand for on-device document recognition systems is growing as privacy and security requirements become stricter. In such systems, no data is transferred from the end device to third-party information processing servers. Response time is vital to the user experience of on-device document recognition. Combined with the lack of discrete GPUs, powerful CPUs, or large RAM capacity on consumer-grade end devices such as smartphones, these time limits put significant constraints on the computational complexity of algorithms intended for on-device execution. In this work, we consider document localization in an image without prior knowledge of the document content or its internal structure. According to published works, at least 5 systems offer solutions for on-device document localization. All of these systems use a localization method that can be considered Hough-based. The precision of such systems seems to be lower than that of state-of-the-art solutions that were not designed to account for limited computational resources. We propose an advanced Hough-based method. In contrast with other approaches, it accounts for the geometric invariants of the central projection model and combines both edge and color features for document boundary detection. The proposed method achieved the second-best result on the SmartDoc dataset in terms of precision, surpassed only by a U-Net-like neural network. When evaluated on the more challenging MIDV-500 dataset, the proposed algorithm achieved the best precision among published methods. Our method remains applicable to on-device computation. This work is partially supported by the Russian Foundation for Basic Research (projects 18-29-26035 and 19-29-09092).
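
For readers unfamiliar with the term "Hough-based", the following is a minimal sketch of the classical Hough line transform that such localization methods build on. It is not the advanced method proposed in the abstract: it ignores color features and the projective invariants mentioned there. The function names hough_lines and strongest_line, the 1-degree angular step, and the binary edge-map input are all assumptions made for illustration.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate votes in (rho, theta) space for each non-zero edge pixel.

    A line is parameterized as rho = x*cos(theta) + y*sin(theta).
    """
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))              # upper bound on |rho|
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)  # [0, pi)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for t_idx, theta in enumerate(thetas):
        # Shift rho by diag so that negative values map to valid rows.
        rhos = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        acc[:, t_idx] = np.bincount(rhos, minlength=2 * diag + 1)
    return acc, thetas, diag

def strongest_line(edges):
    """Return (rho, theta) of the accumulator cell with the most votes."""
    acc, thetas, diag = hough_lines(edges)
    r_idx, t_idx = np.unravel_index(np.argmax(acc), acc.shape)
    return r_idx - diag, thetas[t_idx]
```

In a document localization setting, one would typically take several strong peaks of the accumulator as candidate document borders and then search for four lines that form a plausible quadrilateral. That selection step is where the cited systems differ, and it is not shown here.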

    Digital life stories: Semi-automatic (auto)biographies within lifelog collections

    Our life stories enable us to reflect upon and share our personal histories. Emerging digital technologies make it increasingly feasible to collect life experiences digitally and, consequently, to create a digital counterpart to our personal narratives. In this work, lifelogging tools are used to collect digital artifacts continuously and passively throughout our day. These include images, documents, emails and webpages accessed, as well as text messages and mobile activity. This range of data, when brought together, is known as a lifelog. Given the complexity, volume and multimodal nature of such collections, there are clearly significant challenges to be addressed in order to achieve coherent and meaningful digital narratives of events from our life histories. This work investigates the construction of personal digital narratives from lifelog collections. It examines the underlying questions, issues and challenges relating to the construction of personal digital narratives from lifelogs. Fundamentally, it addresses how to organize and transform data sampled from an individual’s day-to-day activities into a coherent narrative account. This enquiry is enabled by three 20-month long-term lifelogs collected by participants and produces a narrative system which enables the semi-automatic construction of digital stories from lifelog content. Inspired by probative studies conducted into current practices of curation, from which a set of fundamental requirements is established, this solution employs a 2-dimensional spatial framework for storytelling. It delivers integrated support for the structuring of lifelog content and its distillation into storyform through information retrieval approaches. We describe and contribute flexible algorithmic approaches to achieve both. This research inquiry also yields qualitative and quantitative insights into such digital narratives and their generation, composition and construction.
The opportunities for such personal narrative accounts to enable recollection, reminiscence and reflection with the collection owners are established, and their benefit in sharing past personal experiences is outlined. Finally, in a novel investigation with motivated third parties, we demonstrate the opportunities such narrative accounts may have beyond the scope of the collection owner: in personal, societal and cultural explorations, in artistic endeavours, and as a generational heirloom.

    Fish4Knowledge: Collecting and Analyzing Massive Coral Reef Fish Video Data

    This book gives a start-to-finish overview of the whole Fish4Knowledge project in 18 short chapters, each describing one aspect of the project. The Fish4Knowledge project explored the possibilities of big video data, in this case from undersea video. Recording and analyzing 90 thousand hours of video from ten camera locations, the project gives a 3-year view of fish abundance in several tropical coral reefs off the coast of Taiwan. The research system built a remote recording network, over 100 Tb of storage, supercomputer processing, video target detection and

    Providing effective memory retrieval cues through automatic structuring and augmentation of a lifelog of images

    Lifelogging is an area of research concerned with digitally capturing many aspects of an individual's life, and a significant challenge within this rapidly emerging field is managing the images that an individual passively captures of their daily life. Possible applications vary from helping those with neurodegenerative conditions recall events from memory, to the maintenance and augmentation of extensive image collections of a tourist's trips. However, a large lifelog of images can quickly amass, with an average of 700,000 images captured each year using a device such as the SenseCam. We address the problem of managing this vast collection of personal images by investigating automatic techniques that: (1) identify distinct events within a full day of lifelog images (which typically consists of 2,000 images), e.g. breakfast, working on PC, meeting, etc.; (2) find similar events to a given event in a person's lifelog, e.g. "show me other events where I was in the park"; (3) determine those events that are more important or unusual to the user and also select a relevant keyframe image for visual display of an event, e.g. a "meeting" is more interesting to review than "working on PC"; (4) augment the images from a wearable camera with higher quality images from external "Web 2.0" sources, e.g. "find me pictures taken by others of the U2 concert in Croke Park". In this dissertation we discuss novel techniques to realise each of these facets and how effective they are. This work is of benefit not only to the lifelogging community, but also to cognitive psychology researchers studying the potential benefits of lifelogging devices to those with neurodegenerative diseases.

    Remote Sensing for Land Administration

    A tremendous amount of digital visual data is being collected every day, and we need efficient and effective algorithms to extract useful information from that data. Considering the complexity of visual data and the expense of human labor, we expect algorithms to have enhanced generalization capability and depend less on domain knowledge. While many topics in computer vision have benefited from machine learning, some document analysis and image quality assessment problems still have not found the best way to utilize it. In the context of document images, a compelling need exists for reliable methods to categorize and extract key information from captured images. In natural image content analysis, accurate quality assessment has become a critical component for many applications. Most current approaches, however, rely on the heuristics designed by human observations on severely limited data. These approaches typically work only on specific types of images and are hard to generalize on complex data from real applications. This dissertation looks to address the challenges of processing heterogeneous visual data by applying effective learning methods that directly model the data with minimal preprocessing and feature engineering. We focus on three important problems - text line detection, document image categorization, and image quality assessment. The data we work on typically contains unconstrained layouts, styles, or noise, which resemble the real data from applications. First, we present a graph-based method, learning the line structure from training data for text line segmentation in handwritten document images, and a general framework to detect multi-oriented scene text lines using Higher-Order Correlation Clustering. Our method depends less on domain knowledge and is robust to variations in fonts or languages. Second, we introduce a general approach for document image genre classification using Convolutional Neural Networks (CNN). 
The introduction of CNNs for document image genre classification largely reduces the need for hand-crafted features or domain knowledge. Third, we present our CNN-based methods for general-purpose No-Reference Image Quality Assessment (NR-IQA). Our methods bridge the gap between NR-IQA and CNNs and open the door to a broad range of deep learning methods. With excellent local quality estimation ability, our methods demonstrate state-of-the-art performance on both distortion identification and quality estimation.

    Combining Machine Learning with Computer Vision for Precision Agriculture Applications

    University of Minnesota Ph.D. dissertation. April 2018. Major: Computer Science. Advisor: Nikolaos Papanikolopoulos. 1 computer file (PDF); x, 93 pages. Financial and social elements of modern societies are closely connected to the cultivation of corn. Because corn is produced on a massive scale, deficiencies during the cultivation process translate directly into major financial losses. Existing field monitoring solutions utilize aerial and ground means to identify sectors of the farmland with under-performing crops. Nevertheless, an inference element is still absent: the automated diagnosis of the cause and severity of the deficiency. The early detection and treatment of crop deficiencies and the frequent evaluation of their growth status are thus tasks of great significance. Towards an automated health condition assessment, this thesis introduces schemes for the computation of plant health indices. First, we propose a methodology to detect nitrogen (N) deficiencies in corn fields and assess their severity at an early stage using low-cost RGB sensors. The introduced methodology is twofold. First, a low-complexity recommendation scheme identifies candidate plants exhibiting nitrogen deficiency, and second, a detection elimination step completes the inference loop by deciding which of the candidate plants are actually exhibiting that condition. Experimental results on a diverse real-world dataset achieve 90.6% accuracy for the detection of N-deficient regions and support the extension of this methodology to other crops and deficiencies that show similar visual characteristics. Second, based on the 3D reconstruction of small batches of corn plants at growth stages between "V3" and "V6", an automated alternative to existing manual and cumbersome phenotype estimation methodologies is presented. The use of 3D models provides more information than planar methods, mainly because leaf occlusions are alleviated.
High-resolution images of corn stalks are collected and used to obtain 3D models of plants of interest. Based on the extracted 3D point clouds, a plethora of phenotypic characteristics are calculated for each 3D reconstruction, such as the number of plants depicted with 88.1% accuracy, Leaf Area Index (LAI) with 92.48% accuracy, the height with 89.2% accuracy, the leaf length with 74.8% accuracy, and the location and the angles of leaves with respect to the stem. The last two variables are connected by showing how the angles tend to change with respect to the leaf position on the stem as the crops grow. An experimental validation using both artificially made corn plants emulating real-world scenarios and real corn plants in different growth stages supports the efficacy of the proposed methodology. Although the proposed methodologies are agnostic to the platform that performs the data collection, for the presented experiments a MikroKopter Okto XL equipped with a Nikon D7200 RGB sensor and a DJI Matrice 100 with Zenmuse X3 and Zenmuse Z3 high-resolution RGB cameras were used. The flight altitude ranged between 6 and 15 m, and the resolution of the images varied within a range of 0.2 to 0.47 cm/pixel. Thorough data collection and interpretation lead to a better understanding of the needs not only of the farm as a whole but also of each individual plant, providing a much higher granularity to potential treatment strategies. Through the thoughtful utilization of modern computer vision techniques, it is possible to achieve positive financial and environmental results for these tasks. The conclusions of this work suggest a fully automated scheme for information gathering in modern farms, capable of replacing current labor-intensive procedures and thus greatly improving the timely detection of crop deficiencies.

    Expressive movement generation with machine learning

    Movement is an essential aspect of our lives. Not only do we move to interact with our physical environment, but we also express ourselves and communicate with others through our movements. In an increasingly computerized world where various technologies and devices surround us, our movements are essential parts of our interaction with and consumption of computational devices and artifacts. In this context, incorporating an understanding of our movements within the design of the technologies surrounding us can significantly improve our daily experiences. This need has given rise to the field of movement computing: developing computational models of movement that can perceive, manipulate, and generate movements. In this thesis, we contribute to the field of movement computing by building machine-learning-based solutions for automatic movement generation. In particular, we focus on using machine learning techniques and motion capture data to create controllable, generative movement models. We also contribute to the field by creating datasets, tools, and libraries that we have developed during our research. We start our research by reviewing the works on building automatic movement generation systems using machine learning techniques and motion capture data. Our review covers background topics such as high-level movement characterization, training data, feature representation, machine learning models, and evaluation methods. Building on our literature review, we present WalkNet, an interactive agent walking movement controller based on neural networks. The expressivity of virtual, animated agents plays an essential role in their believability. Therefore, WalkNet integrates controlling the expressive qualities of movement with the goal-oriented behaviour of an animated virtual agent. It allows us to control the generation based on the valence and arousal levels of affect, the movement’s walking direction, and the mover’s movement signature in real-time.
Following WalkNet, we look at controlling movement generation using more complex stimuli such as music represented by audio signals (i.e., non-symbolic music). Music-driven dance generation involves a highly non-linear mapping between temporally dense stimuli (i.e., the audio signal) and movements, which makes for a more challenging movement modelling problem. To this end, we present GrooveNet, a real-time machine learning model for music-driven dance generation.