
    A Learning Approach for Adaptive Image Segmentation

    Get PDF
    As noted in many papers, key parameters of image segmentation algorithms are manually tuned by their designers. This induces a lack of flexibility in the segmentation step of many vision systems. Dynamic control of these parameters could drastically improve the results of this crucial step. We propose a scheme that automatically selects a segmentation algorithm and tunes its key parameters using a preliminary supervised learning stage. This paper details the learning approach, which comprises three steps: (1) optimal parameter extraction, (2) algorithm-selection learning, and (3) generalization of parametrization learning. The contribution is twofold: segmentation is adapted to the image to be segmented, and at the same time the scheme can be used as a generic framework, independent of any application domain.
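Step (1) of the scheme, optimal parameter extraction, can be illustrated with a minimal sketch; the thresholding "algorithm", the IoU quality measure, the synthetic image and all parameter values below are illustrative stand-ins, not the authors' actual method:

```python
import numpy as np

def segment(image, threshold):
    """Toy parameterised segmentation: binary thresholding stands in
    for any segmentation algorithm with a tunable key parameter."""
    return (image > threshold).astype(int)

def iou(pred, truth):
    """Intersection-over-union between two binary masks (quality measure)."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 1.0

def optimal_parameter(image, truth, candidates):
    """Step (1): exhaustive search for the parameter value that
    maximises segmentation quality on a ground-truthed training image."""
    scores = [iou(segment(image, t), truth) for t in candidates]
    return candidates[int(np.argmax(scores))]

# Synthetic training pair: bright square on a dark background.
rng = np.random.default_rng(0)
image = rng.normal(0.2, 0.05, (32, 32))
image[8:24, 8:24] += 0.6
truth = np.zeros((32, 32), int)
truth[8:24, 8:24] = 1

best = optimal_parameter(image, truth, np.linspace(0.1, 0.9, 17))
```

In the full scheme, the (image features, optimal parameters) pairs collected this way would feed the later learning stages, so that a regressor can predict good parameters for unseen images.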

    Pre-processing, classification and semantic querying of large-scale Earth observation spaceborne/airborne/terrestrial image databases: Process and product innovations.

    Get PDF
    According to Wikipedia, “big data is the term adopted for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The big data challenges typically include capture, curation, storage, search, sharing, transfer, analysis and visualization”. Proposed by the intergovernmental Group on Earth Observations (GEO), the visionary goal of the Global Earth Observation System of Systems (GEOSS) implementation plan for the years 2005-2015 is the systematic transformation of multi-source Earth Observation (EO) “big data” into timely, comprehensive and operational EO value-adding products and services, submitted to the GEO Quality Assurance Framework for Earth Observation (QA4EO) calibration/validation (Cal/Val) requirements. To date, the GEOSS mission cannot be considered fulfilled by the remote sensing (RS) community. This is tantamount to saying that past and existing EO image understanding systems (EO-IUSs) have been outpaced by the rate of collection of EO sensory big data, whose quality and quantity are ever-increasing. This fact is supported by several observations. For example, no European Space Agency (ESA) EO Level 2 product has ever been systematically generated at the ground segment. By definition, an ESA EO Level 2 product comprises a single-date multi-spectral (MS) image radiometrically calibrated into surface reflectance (SURF) values corrected for geometric, atmospheric, adjacency and topographic effects, stacked with its data-derived scene classification map (SCM), whose thematic legend is general-purpose, user- and application-independent and includes quality layers, such as cloud and cloud-shadow. Since no GEOSS exists to date, present EO content-based image retrieval (CBIR) systems lack EO image understanding capabilities. 
Hence, no semantic CBIR (SCBIR) system exists to date either, where semantic querying is a synonym of semantics-enabled knowledge/information discovery in multi-source big image databases. In set theory, if set A is a strict superset of (or strictly includes) set B, then A ⊃ B. This doctoral project moved from the working hypothesis that SCBIR ⊃ computer vision (CV), where vision is a synonym of scene-from-image reconstruction and understanding, ⊃ EO image understanding (EO-IU) in operating mode, a synonym of GEOSS, ⊃ ESA EO Level 2 product ⊃ human vision. Meaning that a necessary, but not sufficient, pre-condition for SCBIR is CV in operating mode, this working hypothesis has two corollaries. First, human visual perception, encompassing well-known visual illusions such as the Mach bands illusion, acts as a lower bound of CV within the multi-disciplinary domain of cognitive science, i.e., CV is required to include a computational model of human vision. Second, a necessary, but not sufficient, pre-condition for the yet-unfulfilled GEOSS development is the systematic generation at the ground segment of the ESA EO Level 2 product. Starting from this working hypothesis, the overarching goal of this doctoral project was to contribute to research and technical development (R&D) toward filling the analytic and pragmatic information gap from EO big sensory data to EO value-adding information products and services. This R&D objective was conceived to be twofold. First, to develop an original EO-IUS in operating mode, a synonym of GEOSS, capable of systematic ESA EO Level 2 product generation from multi-source EO imagery. 
EO imaging sources vary in terms of: (i) platform, either spaceborne, airborne or terrestrial; and (ii) imaging sensor, either (a) optical, encompassing radiometrically calibrated or uncalibrated, panchromatic or colour, true- or false-colour red-green-blue (RGB), multi-spectral (MS), super-spectral (SS) or hyper-spectral (HS) images, featuring spatial resolution from low (> 1 km) to very high (< 1 m), or (b) synthetic aperture radar (SAR), specifically bi-temporal RGB SAR imagery. The second R&D objective was to design and develop a prototypical implementation of an integrated closed-loop EO-IU for semantic querying (EO-IU4SQ) system as a GEOSS proof-of-concept in support of SCBIR. The proposed closed-loop EO-IU4SQ system prototype consists of two subsystems for incremental learning. A primary (dominant, necessary but not sufficient) hybrid (combined deductive/top-down/physical model-based and inductive/bottom-up/statistical model-based) feedback EO-IU subsystem in operating mode requires no human-machine interaction to automatically transform, in linear time, a single-date MS image into an ESA EO Level 2 product as initial condition. A secondary (dependent) hybrid feedback EO Semantic Querying (EO-SQ) subsystem is provided with a graphic user interface (GUI) to streamline human-machine interaction in support of spatiotemporal EO big data analytics and SCBIR operations. EO information products generated as output by the closed-loop EO-IU4SQ system monotonically increase their value-added with closed-loop iterations.
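As a flavour of the radiometric calibration an ESA EO Level 2 product builds on, here is a minimal sketch of the standard conversion from at-sensor spectral radiance to top-of-atmosphere reflectance for one optical band; the numerical values (radiance, exo-atmospheric solar irradiance ESUN, Sun elevation) are illustrative, and a real Level 2 product additionally corrects for atmospheric, adjacency and topographic effects:

```python
import math

def toa_reflectance(radiance, esun, sun_elev_deg, d_au=1.0):
    """Convert at-sensor spectral radiance (W m-2 sr-1 um-1) into
    top-of-atmosphere reflectance: rho = pi * L * d^2 / (ESUN * cos(theta_s)),
    where theta_s is the solar zenith angle and d the Earth-Sun
    distance in astronomical units."""
    sun_zenith = math.radians(90.0 - sun_elev_deg)
    return (math.pi * radiance * d_au ** 2) / (esun * math.cos(sun_zenith))

# Illustrative band values, not taken from any specific sensor:
rho = toa_reflectance(radiance=80.0, esun=1536.0, sun_elev_deg=45.0)
```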

    A Theory for All Music: Problems and Solutions in the Analysis of non-Western Forms

    Get PDF
    Professor Rahn takes the approach to the analysis of Western art music developed recently by theorists such as Benjamin Boretz and extends it to address non-Western forms. In the process, he rejects recent ethnomusicological formulations based on mentalism, cultural determinism, and the psychology of perception as potentially fruitful bases for analysing music in general. Instead, he stresses the desirability of formulating a theory to deal with all music rather than merely Western forms, emphasizes the need to evaluate an analysis and compare it with other interpretations, and demonstrates how this may be done. The theoretical concepts which form the basis of Rahn's approach are discussed and applied: first to individual pieces of non-Western music which have enjoyed a fairly high profile in the ethnomusicological literature, and second to repertoires or groups of pieces. The author also discusses the fields of anthropology and psychology, showing how his approach serves as a starting point for studies of perception and of the concepts, norms, and values found in specific music cultures. In conclusion, he lists what he considers to be musical universals and takes up the more controversial issues implicit in his discussion.

    Language and culture in perception: a three-pronged investigation of phylogenetic, ontogenetic and cross-cultural evidence

    Get PDF
    Brown and Lenneberg (1954) and Rosch Heider (1972) were among the first to conduct psychological investigations to test the Whorfian view that language affects thought. Both asked about colour categories. The debate has continued, with some research supporting a relativist (Whorfian) account (Davidoff, Davies & Roberson, 1999; Boroditsky, 2001) and some supporting a universalist account (e.g., Kay & Regier, 2003; Spelke & Kinzler, 2007). The present thesis adds to the debate by taking three different approaches, i.e., cross-cultural, ontogenetic and phylogenetic frames, in which to carry out investigations of the categorization of various perceptual continua. The hallmark of categorical perception is a mental warping of perceptual space, such as has been found for phonemes (Pisoni & Tash, 1974) and colour (Bornstein & Monroe, 1980; Bornstein & Korda, 1984). With respect to colours, two colours that cross a category boundary seem more distant than two otherwise equally spaced colours from the same category. Warping is tested using cognitive methods such as two-alternative forced choice and matching-to-sample. Evidence is considered for the continua under investigation, i.e., colour and animal patterns. Experiments 1 and 2 find evidence of categorical perception for human primates but not for monkeys. Experiment 3 finds that Himba and English human adults categorize differently, particularly for colours crossing a category boundary, but also show broad similarity in solving the same matching-to-sample task as used with the monkeys (experiment 1), who showed clear differences from humans. Experiments 4 and 5 tested Himba and English toddlers and found categorical perception of colour mainly for toddlers who knew their colour terms, despite prior findings (Franklin et al., 2005) indicative of universal colour categories. 
In experiment 6, Himba and English categorical perception of animal patterns was tested for the first time, and results indicate a cross-category advantage for participants who knew the animal pattern terms. Therefore, a weak Whorfian view of linguistic relativity's role in obtaining categorical perception effects is presented. Although there is some evidence of an inherent human way of grouping, drawn from the results of experiments 1 and 3, the results of all experiments (1-6) show that linguistic labels and categorical perception effects go hand in hand; categorization effects are not found when linguistic terms have not been acquired at test and have not had a chance to affect cognition. This was true for all populations under observation in this set of studies, providing further support for effects of language and culture in perception.
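The cross-category advantage described above is commonly quantified by converting two-alternative forced-choice accuracies into d' sensitivity and comparing boundary-straddling pairs with equally spaced within-category pairs; the accuracies below are invented for illustration and are not the thesis's data:

```python
import math
from statistics import NormalDist

def dprime_2afc(accuracy):
    """Sensitivity d' of an unbiased observer in a two-alternative
    forced-choice task: d' = sqrt(2) * z(proportion correct)."""
    return math.sqrt(2) * NormalDist().inv_cdf(accuracy)

# Invented accuracies for equally spaced stimulus pairs:
cross = dprime_2afc(0.85)    # pairs straddling a category boundary
within = dprime_2afc(0.70)   # pairs within a single category
cp_effect = cross - within   # a positive value indicates categorical perception
```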

    Visual perception: an information-based approach to understanding biological and artificial vision

    Get PDF
    The central issues of this dissertation are (a) what we should be doing, i.e., what problems we should be trying to solve, in order to build computer vision systems, and (b) what relevance biological vision has to the solution of these problems. The approach taken to tackle these issues centres mostly on the clarification and use of information-based ideas, and on an investigation into the nature of the processes underlying perception. The primary objective is to demonstrate that information theory and extensions of it, together with measurement theory, are powerful tools in helping to find solutions to these problems. The quantitative meaning of information is examined, from its origins in physical theories, through Shannon information theory, Gabor representations and codes, towards semantic interpretations of the term. The application of information theory to the understanding of the developmental and functional properties of biological visual systems is also discussed. This includes a review of the current state of knowledge of the architecture and function of the early visual pathways, particularly the retina, and a discussion of the possible coding functions of cortical neurons. The nature of perception is discussed from a number of points of view: the types and function of explanations of perceptual systems and how these relate to the operation of the system; the role of the observer in describing perceptual functions in other systems or organisms; the status and role of objectivist and representational viewpoints in understanding vision; the philosophical basis of perception; the relationship between pattern recognition and perception; and the interpretation of perception in terms of a theory of measurement. These two threads of research, information theory and measurement theory, are brought together in an overview and reinterpretation of the cortical role in mammalian vision. 
Finally, the application of some of the coding and recognition concepts to industrial inspection problems is described. The coding processes used are unusual in that coded images are used as the input to a simple neural network classifier, rather than a heuristic feature set. The relationship between the Karhunen-Loève transform and the singular value decomposition is clarified as background to the coding technique used to code the images. This coding technique has also been used to code long sequences of moving images to investigate the possibility of recognizing people on the basis of their gait or posture, and this application is briefly described.
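The clarified relationship between the Karhunen-Loève transform and the singular value decomposition can be checked numerically: the KLT basis (eigenvectors of the sample covariance matrix) coincides, up to sign, with the right singular vectors of the centred data matrix, and the covariance eigenvalues equal the squared singular values divided by n − 1. A minimal sketch with synthetic data standing in for image vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))   # 100 samples of 5-D "image" vectors
Xc = X - X.mean(axis=0)         # centre the data

# KLT basis: eigen-decomposition of the sample covariance matrix
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues

# SVD of the centred data matrix (descending singular values)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Squared singular values reproduce the covariance eigenvalues...
assert np.allclose(sorted(eigvals), sorted(s**2 / (len(Xc) - 1)))

# ...and each KLT basis vector matches a right singular vector up to sign:
aligned = np.abs(np.sum(eigvecs[:, ::-1] * Vt.T, axis=0))
```

This equivalence is why, in practice, the KLT/PCA basis is computed via the SVD of the data matrix rather than by forming the covariance matrix explicitly.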

    Perceptual control of interceptive timing

    Get PDF

    Multi-modal multi-semantic image retrieval

    Get PDF
    The rapid growth in the volume of visual information, e.g. images and video, can overwhelm users’ ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted in order to extract knowledge from these images and enhance retrieval performance. A KB framework is presented to promote semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared that supports multi-semantics (polysemy) for concepts. The framework builds up an effective knowledge base pertaining to a domain-specific image collection, e.g. sports, and is able to disambiguate and assign high-level semantics to ‘unannotated’ images. Local feature analysis of visual content, namely using Scale Invariant Feature Transform (SIFT) descriptors, has been deployed in the ‘Bag of Visual Words’ (BVW) model as an effective method to represent visual content information and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based upon the use of unstructured visual words and upon a (structured) hierarchical ontology KB model. The structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content, compared to a vector space model, through exploiting local conceptual structures and their relationships. 
The key contributions of this framework in using local features for image representation are threefold. First, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm, which takes term weights and the spatial locations of keypoints into account; consequently, semantic information is preserved. Second, a technique to detect domain-specific ‘non-informative visual words’, which are ineffective at representing the content of visual data and degrade its categorisation ability. Third, a method to combine an ontology model with a visual word model to resolve synonym (visual heterogeneity) and polysemy problems. The experimental results show that this approach can discover semantically meaningful visual content descriptions and recognise specific events, e.g. sports events, depicted in images efficiently. Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhance visual content interpretation is to use any associated textual information that accompanies an image as a cue to predict its meaning, by transforming this textual information into a structured annotation for the image, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct types of information representation and modality, there are some strong, invariant, implicit connections between images and any accompanying text. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, Natural Language Processing (NLP) is exploited first in order to extract concepts from image captions. Next, an ontology-based knowledge model is deployed in order to resolve natural language ambiguities. To deal with the accompanying textual information, two methods to extract knowledge from it have been proposed. 
First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of latent semantic indexing (LSI) in relation to a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) of metadata. The use of the ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus leverage these to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and leads to a narrowing of the semantic gap between lower-level machine-derived and higher-level human-understandable conceptualisations.
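The Bag of Visual Words representation described above can be sketched as follows; the 8-D random vectors stand in for 128-D SIFT descriptors, and the codebook is drawn naively from the pooled descriptors rather than by k-means or by the SLAC algorithm the thesis proposes:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins for local SIFT descriptors (real ones are 128-D);
# three images' worth, drawn around different means.
descriptors = [rng.normal(loc=c, scale=0.3, size=(40, 8))
               for c in (0.0, 1.0, 2.0)]

# "Codebook" of visual words: here simply k descriptors sampled from the
# pooled data (a real system would cluster millions of descriptors).
pooled = np.vstack(descriptors)
k = 3
codebook = pooled[rng.choice(len(pooled), size=k, replace=False)]

def bovw_histogram(desc, codebook):
    """Assign each local descriptor to its nearest visual word and return
    the normalised word-frequency histogram representing the image."""
    dists = np.linalg.norm(desc[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

hists = [bovw_histogram(d, codebook) for d in descriptors]
```

Retrieval then reduces to comparing these fixed-length histograms, e.g. by cosine similarity, regardless of how many keypoints each image contained.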

    Real-world categories don't allow uniform feature spaces - not just across categories but within categories also [Open peer commentary on Schyns, P.G., Goldstone, R.L., & Thibaut, J. The development of features in object concepts] [Letter]

    Get PDF
    The Schyns et al. target article demonstrates that different classifications entail different representations, implying “flexible space learning.” We argue that flexibility is required even at the within-category level.

    Object-oriented analysis of remote sensing images for land cover mapping: Conceptual foundations and a segmentation method to derive a baseline partition for classification

    Full text link
    The approach commonly used to analyse satellite images for cartographic purposes yields unsatisfactory results, mainly because it relies solely on the spectral patterns of individual pixels, almost completely ignoring the spatial structure of the image. Moreover, equating land-cover classes with homogeneous material types implies that any arbitrarily delimited part of a map polygon remains a referent of the concept defined by its label. This is inconsistent with the hierarchical model of landscape increasingly accepted in Landscape Ecology, which assumes that homogeneity depends on the scale of observation and is in any case more semantic than biophysical, and that landscapes are therefore intrinsically heterogeneous and composed of units (patches) that function simultaneously as wholes distinct from their surroundings and as parts of a larger whole. A new (object-oriented) approach compatible with this model is therefore needed, in which the basic units of analysis are delimited according to the spatial variation of the phenomenon under study. This thesis aims to contribute to this paradigm shift in remote sensing, and its specific objectives are: 1. To highlight the shortcomings of the approach traditionally employed in satellite image classification. 2. To lay the conceptual foundations of an alternative approach based on basic zones classifiable as objects. 3. To develop and implement a demonstration version of an automatic method that converts a multi-spectral image into a vector layer formed by those zones. The proposed strategy is to produce, based on the spatial structure of the images, a partition in which each region can be considered relatively homogeneous and distinct from its neighbours, and moreover exceeds (although not by much) the size of the minimum mapping unit. 
Each region is assumed to correspond to a stand that, after classification, will be aggregated with neighbouring stands into a larger region which, as a whole, can be seen as an instance of a certain type of object, later represented on the map as polygons of a particular class.
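The minimum-mapping-unit constraint mentioned above can be sketched as a greedy merge of undersized regions; region adjacency and spectral similarity, which the actual method would have to respect, are deliberately ignored in this toy version:

```python
def enforce_mmu(region_sizes, mmu):
    """Repeatedly merge any region smaller than the minimum mapping
    unit (mmu) into an adjacent region in the list (the next one if it
    exists, otherwise the previous one) until all regions reach the mmu."""
    regions = list(region_sizes)
    changed = True
    while changed and len(regions) > 1:
        changed = False
        for i, size in enumerate(regions):
            if size < mmu:
                small = regions.pop(i)
                # merge into the neighbouring region in list order
                j = i if i < len(regions) else i - 1
                regions[j] += small
                changed = True
                break  # restart: merging changes the region list
    return regions

# Regions of 5, 30, 2 and 40 pixels with a 10-pixel minimum mapping unit:
result = enforce_mmu([5, 30, 2, 40], mmu=10)
```

Total area is conserved by construction, and every surviving region meets the size constraint, mirroring the requirement that each partition cell slightly exceed the minimum mapping unit.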