Search CORE

172 research outputs found

Autonomous Cleaning of Corrupted Scanned Documents - A Generative Modeling Approach

Author: Dai Zhenwen
Lücke Jörg
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/07/2012
Field of study

We study the task of cleaning scanned text documents that are strongly corrupted by dirt such as manual line strokes, spilled ink etc. We aim at autonomously removing dirt from a single letter-size page based only on the information the page contains. Our approach, therefore, has to learn character representations without supervision and requires a mechanism to distinguish learned representations from irregular patterns. To learn character representations, we use a probabilistic generative model parameterizing pattern features, feature variances, the features' planar arrangements, and pattern frequencies. The latent variables of the model describe pattern class, pattern position, and the presence or absence of individual pattern features. The model parameters are optimized using a novel variational EM approximation. After learning, the parameters represent, independently of their absolute position, planar feature arrangements and their variances. A quality measure defined based on the learned representation then allows for an autonomous discrimination between regular character patterns and the irregular patterns making up the dirt. The irregular patterns can thus be removed to clean the document. For a full Latin alphabet we found that a single page does not contain sufficiently many character examples. However, even if heavily corrupted by dirt, we show that a page containing a lower number of character types can efficiently and autonomously be cleaned solely based on the structural regularity of the characters it contains. In different examples using characters from different alphabets, we demonstrate generality of the approach and discuss its implications for future developments.Comment: oral presentation and Google Student Travel Award; IEEE conference on Computer Vision and Pattern Recognition 201

arXiv.org e-Print Archive

Crossref

KINE[SIS]TEM'17 From Nature to Architectural Matter

Author: Oliveira M. J.
Osório F.
Publication venue: DINÂMIA'CET - IUL
Publication date: 01/05/2017
Field of study

Kine[SiS]tem – From Kinesis + System. Kinesis is a non-linear movement or activity of an organism in response to a stimulus. A system is a set of interacting and interdependent agents forming a complex whole, delineated by its spatial and temporal boundaries, influenced by its environment. How can architectural systems moderate the external environment to enhance comfort conditions in a simple, sustainable and smart way? This is the starting question for the Kine[SiS]tem’17 – From Nature to Architectural Matter International Conference. For decades, architectural design was developed despite (and not with) the climate, based on mechanical heating and cooling. Today, the argument for net zero energy buildings needs very effective strategies to reduce energy requirements. The challenge ahead requires design processes that are built upon consolidated knowledge, make use of advanced technologies and are inspired by nature. These design processes should lead to responsive smart systems that deliver the best performance in each specific design scenario. To control solar radiation is one key factor in low-energy thermal comfort. Computational-controlled sensor-based kinetic surfaces are one of the possible answers to control solar energy in an effective way, within the scope of contradictory objectives throughout the year.FC

Repositório Institucional do ISCTE-IUL

A Comparative Study of Text Summarization on E-mail Data Using Unsupervised Learning Approaches

Author: Thomas Tijo
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2020
Field of study

Over the last few years, email has met with enormous popularity. People send and receive a lot of messages every day, connect with colleagues and friends, share files and information. Unfortunately, the email overload outbreak has developed into a personal trouble for users as well as a financial concerns for businesses. Accessing an ever-increasing number of lengthy emails in the present generation has become a major concern for many users. Email text summarization is a promising approach to resolve this challenge. Email messages are general domain text, unstructured and not always well developed syntactically. Such elements introduce challenges for study in text processing, especially for the task of summarization. This research employs a quantitative and inductive methodologies to implement the Unsupervised learning models that addresses summarization task problem, to efficiently generate more precise summaries and to determine which approach of implementing Unsupervised clustering models outperform the best. The precision score from ROUGE-N metrics is used as the evaluation metrics in this research. This research evaluates the performance in terms of the precision score of four different approaches of text summarization by using various combinations of feature embedding technique like Word2Vec /BERT model and hybrid/conventional clustering algorithms. The results reveals that both the approaches of using Word2Vec and BERT feature embedding along with hybrid PHA-ClusteringGain k-Means algorithm achieved increase in the precision when compared with the conventional k-means clustering model. Among those hybrid approaches performed, the one using Word2Vec as feature embedding method attained 55.73% as maximum precision value

Arrow@TUDublin

Towards autonomous diagnostic systems with medical imaging

Author: Vlontzos Athanasios
Publication venue: Computing, Imperial College London
Publication date: 01/12/2022
Field of study

Democratizing access to high quality healthcare has highlighted the need for autonomous diagnostic systems that a non-expert can use. Remote communities, first responders and even deep space explorers will come to rely on medical imaging systems that will provide them with Point of Care diagnostic capabilities. This thesis introduces the building blocks that would enable the creation of such a system. Firstly, we present a case study in order to further motivate the need and requirements of autonomous diagnostic systems. This case study primarily concerns deep space exploration where astronauts cannot rely on communication with earth-bound doctors to help them through diagnosis, nor can they make the trip back to earth for treatment. Requirements and possible solutions about the major challenges faced with such an application are discussed. Moreover, this work describes how a system can explore its perceived environment by developing a Multi Agent Reinforcement Learning method that allows for implicit communication between the agents. Under this regime agents can share the knowledge that benefits them all in achieving their individual tasks. Furthermore, we explore how systems can understand the 3D properties of 2D depicted objects in a probabilistic way. In Part II, this work explores how to reason about the extracted information in a causally enabled manner. A critical view on the applications of causality in medical imaging, and its potential uses is provided. It is then narrowed down to estimating possible future outcomes and reasoning about counterfactual outcomes by embedding data on a pseudo-Riemannian manifold and constraining the latent space by using the relativistic concept of light cones. By formalizing an approach to estimating counterfactuals, a computationally lighter alternative to the abduction-action-prediction paradigm is presented through the introduction of Deep Twin Networks. Appropriate partial identifiability constraints for categorical variables are derived and the method is applied in a series of medical tasks involving structured data, images and videos. All methods are evaluated in a wide array of synthetic and real life tasks that showcase their abilities, often achieving state-of-the-art performance or matching the existing best performance while requiring a fraction of the computational cost.Open Acces

Spiral - Imperial College Digital Repository

Multimedia Retrieval

Author
Publication venue: Springer
Publication date: 01/01/2007
Field of study

University of Twente Research Information

Intelligence artificielle: Les défis actuels et l'action d'Inria - Livre blanc Inria

Author: Braunschweig Bertrand
Publication venue: INRIA
Publication date: 01/01/2021
Field of study

Livre blanc Inria N°01International audienceInria white papers look at major current challenges in informatics and mathematics and show actions conducted by our project-teams to address these challenges. This document is the first produced by the Strategic Technology Monitoring & Prospective Studies Unit. Thanks to a reactive observation system, this unit plays a lead role in supporting Inria to develop its strategic and scientific orientations. It also enables the institute to anticipate the impact of digital sciences on all social and economic domains. It has been coordinated by Bertrand Braunschweig with contributions from 45 researchers from Inria and from our partners. Special thanks to Peter Sturm for his precise and complete review.Les livres blancs d’Inria examinent les grands défis actuels du numérique et présentent les actions menées par noséquipes-projets pour résoudre ces défis. Ce document est le premier produit par la cellule veille et prospective d’Inria. Cette unité, par l’attention qu’elle porte aux évolutions scientifiques et technologiques, doit jouer un rôle majeur dans la détermination des orientations stratégiques et scientifiques d’Inria. Elle doit également permettre à l’Institut d’anticiper l’impact des sciences du numérique dans tous les domaines sociaux et économiques. Ce livre blanc a été coordonné par Bertrand Braunschweig avec des contributions de 45 chercheurs d’Inria et de ses partenaires. Un grand merci à Peter Sturm pour sa relecture précise et complète. Merci également au service STIP du centre de Saclay – Île-de-France pour la correction finale de la version française

INRIA a CCSD electronic archive server

Data Mining

Author
Publication venue: 'IntechOpen'
Publication date: 27/07/2022
Field of study

The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising and leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more in fields such as science, medicine, economics, engineering, computers, and even business analytics. This book presents basic concepts, ideas, and research in data mining

Directory of Open Access Books (DOAB)

Advances in Artificial Intelligence: Models, Optimization, and Machine Learning

Author
Publication venue: 'MDPI AG'
Publication date: 06/07/2022
Field of study

The present book contains all the articles accepted and published in the Special Issue “Advances in Artificial Intelligence: Models, Optimization, and Machine Learning” of the MDPI Mathematics journal, which covers a wide range of topics connected to the theory and applications of artificial intelligence and its subfields. These topics include, among others, deep learning and classic machine learning algorithms, neural modelling, architectures and learning algorithms, biologically inspired optimization algorithms, algorithms for autonomous driving, probabilistic models and Bayesian reasoning, intelligent agents and multiagent systems. We hope that the scientific results presented in this book will serve as valuable sources of documentation and inspiration for anyone willing to pursue research in artificial intelligence, machine learning and their widespread applications

Directory of Open Access Books (DOAB)