
    Investigating the latency cost of statistical learning of a Gaussian mixture simulating on a convolutional density network with adaptive batch size technique for background modeling

    Background modeling is a promising field of study in video analysis, with a wide range of applications in video surveillance. Deep neural networks have proliferated in recent years as a result of effective learning-based approaches to motion analysis. However, these strategies provide only a partial description of the observed scenes' properties, since they use a single-valued mapping to estimate the temporal conditional averages of the target background. On the other hand, statistical learning in the imagery domain, especially with Gaussian mixture models, has become one of the most widely used approaches owing to its high adaptability to dynamic context transformation. Specifically, these probabilistic models adjust latent parameters to maximize the expectation of realistically observed data; however, they concentrate on contextual dynamics in short-term analysis, and over a prolonged investigation statistical methods cannot preserve generalization across long-term variation of the image data. Balancing the trade-off between traditional machine learning models and deep neural networks requires an integrated approach that ensures accuracy while maintaining a high speed of execution. In this research, we present a novel two-stage approach for change detection using two convolutional neural networks. The first architecture is based on unsupervised statistical learning of Gaussian mixtures and is used to classify the salient features of scenes. The second implements a lightweight pipeline for foreground detection. Our two-stage system has a total of approximately 3.5K parameters yet converges quickly to complex motion patterns. Our experiments on publicly accessible datasets demonstrate that the proposed networks not only generalize regions of moving objects with promising results in unseen scenarios, but are also competitive in the quality and effectiveness of foreground segmentation. Apart from modeling the data's underlying generator as a non-convex optimization problem, we briefly examine the communication cost of network training by using a distributed data-parallel scheme to simulate a stochastic gradient descent algorithm with communication avoidance for parallel machine learning.
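    For orientation, the sketch below shows classical per-pixel Gaussian-mixture background modeling using OpenCV's MOG2 estimator. It illustrates the statistical baseline this paper builds on, not the authors' two-stage network; the input file name and parameter values are placeholders.

```python
# Minimal GMM background-modeling sketch using OpenCV's MOG2 estimator.
# Illustrates the classical per-pixel Gaussian-mixture approach, not the
# paper's two-stage convolutional architecture.
import cv2

cap = cv2.VideoCapture("surveillance.mp4")  # placeholder input video
# history/varThreshold are illustrative defaults, not tuned settings
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                          detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Each pixel is matched against its mixture of Gaussians; unmatched
    # pixels are labeled foreground (255), shadows 127, background 0.
    fg_mask = mog2.apply(frame)
    cv2.imshow("foreground", fg_mask)
    if cv2.waitKey(30) & 0xFF == 27:  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```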

    A survey of the application of soft computing to investment and financial trading


    Delicar: A smart deep learning based self driving product delivery car in perspective of Bangladesh

    The rapid expansion of a country’s economy is highly dependent on timely product distribution, which is hampered by severe traffic congestion. Additional staff are also required to accompany the delivery vehicle while it transports documents or records to another destination. This study proposes Delicar, a self-driving product delivery vehicle that can drive itself on the road and report its current geographical location to the authority in real time on a map. The equipped camera module captures the road image and transfers it to the computer via socket server programming. The Raspberry Pi sends the camera image and waits for the steering angle value. The image is fed to a pre-trained deep learning model that predicts the steering angle for that situation. The steering angle value is then passed back to the Raspberry Pi, which directs the L298 motor driver as to which direction the wheels should follow. Based on this direction, the L298 selects forward, left, right, or backward movement. A 3-cell 12V LiPo battery supplies power to the Raspberry Pi and the L298 motor driver, and a buck converter regulates a 5V 3A supply to keep the Raspberry Pi running. The NVIDIA CNN architecture has been followed, containing nine layers, including five convolution layers and three dense layers, to develop the steering angle predictive model. GeoIP2 (a Python library) retrieves the longitude and latitude from the equipped system’s IP address to report the live geographical position to the authorities, after which Folium is used to depict the geographical location. Moreover, the system’s infrastructure is very low-cost and easy to install.
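    A minimal Keras sketch of an NVIDIA PilotNet-style steering-angle regressor is given below for reference. The abstract does not state Delicar's exact layer sizes or input resolution, so the values here follow the published NVIDIA architecture and should be read as assumptions; counting conventions also vary, so the model below has a normalization layer, five convolutions, and three hidden dense layers plus a single-unit regression output.

```python
# Hedged sketch of an NVIDIA PilotNet-style steering-angle regressor.
# Layer sizes follow the published NVIDIA architecture; Delicar's exact
# configuration is not given in the abstract, so treat these as assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_pilotnet(input_shape=(66, 200, 3)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Rescaling(1.0 / 127.5, offset=-1.0),   # normalize to [-1, 1]
        layers.Conv2D(24, 5, strides=2, activation="relu"),
        layers.Conv2D(36, 5, strides=2, activation="relu"),
        layers.Conv2D(48, 5, strides=2, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Flatten(),
        layers.Dense(100, activation="relu"),
        layers.Dense(50, activation="relu"),
        layers.Dense(10, activation="relu"),
        layers.Dense(1),  # predicted steering angle (regression output)
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_pilotnet()
model.summary()
```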

    Study and Development of Some Novel Image Segmentation Techniques

    Some fuzzy-technique-based segmentation methods are studied and implemented, and some fuzzy c-means (FCM) clustering-based segmentation algorithms are developed in this thesis to suppress high and low uniform random noise. Fuzzy rule-based segmentation methods were not developed because they are application dependent. In many cases, real-life images are affected by noise, and FCM clustering-based segmentation does not give good results under such conditions. Various extensions of the FCM method for segmentation exist in the literature, but most of them modify the objective function and hence change the basic FCM algorithm available in MATLAB toolboxes. Efforts have therefore been made to develop FCM-based algorithms that achieve better segmentation without modifying the objective function. The fuzzy-technique-based segmentation methods that are studied and developed are summarized here.

    (A) Fuzzy edge detection based segmentation: Two fuzzy edge detection methods are studied and implemented for segmentation: (i) FIS-based edge detection and (ii) the fast multilevel fuzzy edge detector (FMFED). (i) The fuzzy inference system (FIS) based edge detector consists of fuzzy inference rules defined in such a way that the FIS output (“edges”) is high only for those pixels belonging to edges in the input image. Robustness to contrast and lighting variations was also taken into consideration while developing these rules. The output of the FIS-based edge detector is compared with the results of the existing Sobel, LoG, and Canny edge detectors. The algorithm is found to be application dependent and time consuming. (ii) Fast multilevel fuzzy edge detector: To realise fast and accurate detection of edges, the FMFED algorithm is proposed. It first enhances the image contrast by means of a fast multilevel fuzzy enhancement algorithm using a simple transformation function based on two image thresholds. Second, the edges are extracted from the enhanced image using a two-stage edge detection operator that identifies edge candidates based on local characteristics of the image and then determines the true edge pixels using an operator based on the extremum of the gradient values. Finally, the edge image is segmented by morphological edge linking.

    (B) FCM based segmentation: Three fuzzy clustering based segmentation methods are developed: (i) contrast-limited adaptive histogram equalization fuzzy c-means (CLAHEFCM), (ii) modified spatial fuzzy c-means (MSFCM), and (iii) neighbourhood attraction fuzzy c-means (NAFCM). (i) CLAHEFCM: This proposed algorithm presents a color segmentation process for low-contrast or unevenly illuminated images. It first enhances the contrast of the image using contrast-limited adaptive histogram equalization. The method then divides the color space into a given number of clusters, fixed initially; the image is converted from RGB to Lab color space before clustering, which is performed with the fuzzy c-means algorithm. The image is segmented based on the color of a region, that is, areas having the same color are grouped together, each pixel being assigned to the cluster to which it belongs the most. The method has been applied to a number of color test images and is observed to give good segmentation results. (ii) MSFCM: The proposed algorithm likewise divides the color space into an initially fixed number of clusters, with the image converted from RGB to Lab color space before clustering. A robust segmentation technique based on an extension of the traditional FCM clustering algorithm is proposed, in which the spatial information of each pixel is taken into consideration to obtain a noise-free segmentation result. The image is segmented based on the color of a region, each pixel being assigned to the cluster to which it belongs the most. The method has been applied to some color test images, and its performance has been compared with FCM and FCM-based methods to show its superiority over them. The proposed technique is observed to be an efficient and easy method for the segmentation of noisy images. (iii) NAFCM: A new algorithm based on IFCM neighbourhood attraction is used without changing the distance function of FCM, thereby avoiding an extra neural-network optimization step for adjusting the parameters of the distance function; it is called neighbourhood attraction FCM (NAFCM). During clustering, each pixel attempts to attract its neighbouring pixels towards its own cluster. This neighbourhood attraction depends on two factors: the pixel intensities (feature attraction) and the spatial position of the neighbours (distance attraction), the latter also depending on the neighbourhood structure. The NAFCM algorithm is tested on a synthetic image (chapter 6, figures 6.3-6.6) and a number of skin tumour images, and it is observed to produce excellent clustering results under high-noise conditions when compared with the other FCM-based clustering methods.
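    Since all three proposed methods extend the standard fuzzy c-means updates, a minimal NumPy sketch of plain FCM is given below for reference. It implements only the classical center and membership update equations, not the CLAHE, spatial, or neighbourhood-attraction extensions described above; variable names and defaults are generic.

```python
# Minimal NumPy sketch of standard fuzzy c-means (FCM), the base algorithm
# that CLAHEFCM, MSFCM, and NAFCM extend. Generic illustration only.
import numpy as np

def fcm(X, c=3, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """X: (n_samples, n_features); c: number of clusters; m > 1: fuzzifier."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # memberships sum to 1
    for _ in range(n_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the data.
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared distances from every sample to every center.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # Membership update: u_ik = 1 / sum_j (d_ik/d_jk)^(2/(m-1));
        # d2 holds squared distances, hence the exponent 1/(m-1).
        U_new = 1.0 / ((d2[:, :, None] / d2[:, None, :])
                       ** (1.0 / (m - 1))).sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Segmenting an image by clustering its pixel colors (e.g., in Lab space):
#   U, V = fcm(lab_pixels.reshape(-1, 3), c=4)
#   labels = U.argmax(axis=1).reshape(h, w)
```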

    Socio-Cognitive and Affective Computing

    Social cognition focuses on how people process, store, and apply information about other people and social situations, and on the role that cognitive processes play in social interactions. On the other hand, the term cognitive computing is generally used to refer to new hardware and/or software that mimics the functioning of the human brain and helps to improve human decision-making; in this sense, it is a type of computing whose goal is to discover more accurate models of how the human brain/mind senses, reasons, and responds to stimuli. Socio-Cognitive Computing should be understood as a set of interdisciplinary theoretical frameworks, methodologies, methods, and hardware/software tools for modeling how the human brain mediates social interactions. In addition, Affective Computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects, a fundamental aspect of socio-cognitive neuroscience; it is an interdisciplinary field spanning computer science, electrical engineering, psychology, and cognitive science. Physiological Computing is a category of technology in which electrophysiological data recorded directly from human activity are used to interface with a computing device. This technology becomes even more relevant when computing can be integrated pervasively into everyday life environments. Thus, Socio-Cognitive and Affective Computing systems should be able to adapt their behavior according to the Physiological Computing paradigm. This book integrates proposals from researchers who use signals from the brain and/or body to infer people's intentions and psychological state in smart computing systems. The design of such systems combines knowledge and methods of ubiquitous and pervasive computing, as well as physiological data measurement and processing, with those of socio-cognitive and affective computing.

    Unsupervised quantification of entity consistency between photos and text in real-world news

    In today’s information age, the World Wide Web and social media are important sources for news and information. Different modalities (in the sense of information encoding) such as photos and text are typically used to communicate news more effectively or to attract attention. Communication scientists, linguists, and semioticians have studied the complex interplay between modalities for decades and investigated, e.g., how their combination can carry additional information or add a new level of meaning. The number of shared concepts or entities (e.g., persons, locations, and events) between photos and text is an important aspect to evaluate the overall message and meaning of an article. Computational models for the quantification of image-text relations can enable many applications. For example, they allow for more efficient exploration of news, facilitate semantic search and multimedia retrieval in large (web) archives, or assist human assessors in evaluating news for credibility. To date, only a few approaches have been suggested that quantify relations between photos and text. However, they either do not explicitly consider the cross-modal relations of entities, which are important in the news, or rely on supervised deep learning approaches that can only detect the cross-modal presence of entities covered in the labeled training data.
To address this research gap, this thesis proposes an unsupervised approach that can quantify entity consistency between photos and text in multimodal real-world news articles. The first part of this thesis presents novel approaches based on deep learning for information extraction from photos to recognize events, locations, dates, and persons. These approaches are an important prerequisite for measuring the cross-modal presence of entities in text and photos. First, an ontology-driven event classification approach that leverages new loss functions and weighting schemes is presented. It is trained on a novel dataset of 570,540 photos and an ontology with 148 event types. The proposed system outperforms approaches that do not use structured ontology information. Second, a novel deep learning approach for geolocation estimation is proposed that uses additional contextual information on the environmental setting (indoor, urban, natural) and from earth partitions of different granularity. The proposed solution outperforms state-of-the-art approaches, which are trained with significantly more photos. Third, we introduce the first large-scale dataset for date estimation with more than one million photos taken between 1930 and 1999, along with two deep learning approaches that treat date estimation as a classification and a regression problem, respectively. Both approaches achieve very good results that are superior to human annotations. Finally, a novel approach is presented that identifies public persons and their co-occurrences in news photos extracted from the Internet Archive, which collects time-versioned snapshots of web pages that are rarely enriched with metadata relevant to multimedia retrieval. Experimental results confirm the effectiveness of the deep learning approach for person identification. The second part of this thesis introduces an unsupervised approach capable of quantifying image-text relations in real-world news. Unlike related work, the proposed solution automatically provides novel measures of cross-modal consistency for different entity types (persons, locations, and events) as well as the overall context. The approach does not rely on any predefined datasets to cope with the large amount and diversity of entities and topics covered in the news. State-of-the-art tools for natural language processing are applied to extract named entities from the text. Example photos for these entities are automatically crawled from the Web. The proposed methods for information extraction from photos are applied to both news images and example photos to quantify the cross-modal consistency of entities. Two tasks are introduced to assess the quality of the proposed approach in real-world applications. Experimental results for document verification and retrieval of news with either low (potential misinformation) or high cross-modal similarities demonstrate the feasibility of the approach and its potential to support human assessors in studying news.
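    As a rough illustration of the text-side pipeline, the sketch below extracts person, location, and event entities from an article with spaCy's named entity recognizer. The thesis's actual NLP toolchain and consistency measures are not specified in the abstract, so the model name and label mapping here are assumptions.

```python
# Illustrative sketch of the text-side entity extraction step: pull person,
# location, and event entities from a news article with spaCy's NER.
# The model name and label mapping are generic assumptions; the thesis's
# actual toolchain and consistency measures are more involved.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

TYPE_MAP = {"PERSON": "person", "GPE": "location",
            "LOC": "location", "EVENT": "event"}

def extract_entities(article_text):
    doc = nlp(article_text)
    entities = {"person": set(), "location": set(), "event": set()}
    for ent in doc.ents:
        kind = TYPE_MAP.get(ent.label_)
        if kind:
            entities[kind].add(ent.text)
    return entities

# Each extracted entity can then be paired with example photos crawled from
# the Web and compared against the news image to score per-type consistency.
print(extract_entities("Angela Merkel visited Paris for the G7 summit."))
```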