How can cells in the anterior medial face patch be viewpoint invariant?

Author: Jim Mutch
Joel Z. Leibo
Tomaso Poggio
Publication date: 25/03/2011

In a recent paper, Freiwald and Tsao (2010) found evidence that the responses of cells in the macaque anterior medial (AM) face patch are invariant to significant changes in viewpoint. The monkey subjects had no prior experience with the individuals depicted in the stimuli and were never given an opportunity to view the same individual from different viewpoints sequentially. These results cannot be explained by a mechanism based on temporal association of experienced views. Employing a biologically plausible model of object recognition (software available at cbcl.mit.edu), we show two mechanisms which could account for these results. First, we show that hair style and skin color provide sufficient information to enable viewpoint recognition without resorting to any mechanism that associates images across views. It is likely that a large part of the effect described in patch AM is attributable to these cues. Separately, we show that it is possible to further improve view-invariance using class-specific features (see Vetter 1997). Faces, as a class, transform under 3D rotation in similar enough ways that it is possible to use previously viewed example faces to learn a general model of how all faces rotate. Novel faces can be encoded relative to these previously encountered “template” faces and thus recognized with some degree of invariance to 3D rotation. Since each object class transforms differently under 3D rotation, it follows that invariant recognition from a single view requires a recognition architecture with a detection step determining the class of an object (e.g. face or non-face) prior to a subsequent identification stage utilizing the appropriate class-specific features

Crossref

Nature Precedings

Unsupervised learning of clutter-resistant visual representations from natural videos

Author: Leibo Joel Z.
Liao Qianli
Poggio Tomaso
Publication date: 23/04/2015

Populations of neurons in inferotemporal cortex (IT) maintain an explicit code for object identity that also tolerates transformations of object appearance e.g., position, scale, viewing angle [1, 2, 3]. Though the learning rules are not known, recent results [4, 5, 6] suggest the operation of an unsupervised temporal-association-based method e.g., Foldiak's trace rule [7]. Such methods exploit the temporal continuity of the visual world by assuming that visual experience over short timescales will tend to have invariant identity content. Thus, by associating representations of frames from nearby times, a representation that tolerates whatever transformations occurred in the video may be achieved. Many previous studies verified that such rules can work in simple situations without background clutter, but the presence of visual clutter has remained problematic for this approach. Here we show that temporal association based on large class-specific filters (templates) avoids the problem of clutter. Our system learns in an unsupervised way from natural videos gathered from the internet, and is able to perform a difficult unconstrained face recognition task on natural images: Labeled Faces in the Wild [8]

arXiv.org e-Print Archive

DSpace@MIT
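
The temporal-association idea mentioned in the abstract above (Foldiak's trace rule) can be illustrated with a minimal sketch. This is not the system described in the paper: the function name `trace_rule_update`, the linear unit, and the parameters `delta` and `alpha` are assumptions chosen for illustration only. The unit keeps a decaying average (trace) of its recent responses, so a frame that follows a strongly driving frame still pulls the weights toward itself.

```python
def trace_rule_update(weights, frames, delta=0.2, alpha=0.1):
    """Run a Foldiak-style trace rule over a sequence of frames.

    weights: initial weight vector of one linear unit (hypothetical setup)
    frames:  list of input vectors, ordered in time
    delta:   how quickly the trace follows the current response
    alpha:   learning rate
    """
    w = list(weights)
    trace = 0.0
    for x in frames:
        # Instantaneous response of the unit to the current frame.
        y = sum(wi * xi for wi, xi in zip(w, x))
        # The trace mixes the current response with recent history.
        trace = (1.0 - delta) * trace + delta * y
        # Move the weights toward the current frame, gated by the trace.
        w = [wi + alpha * trace * (xi - wi) for wi, xi in zip(w, x)]
    return w


# The unit initially responds only to the first pattern. Because the trace
# is still active when the second pattern appears, the unit also acquires
# some weight on the second pattern.
video = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
learned = trace_rule_update([1.0, 0.0], video)
```

With `delta=1.0` the trace has no memory, and in this toy sequence the weights do not change, which is the contrast the rule relies on.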

How can cells in the anterior medial face patch be viewpoint invariant?

Author: Jim Mutch
Joel Leibo
Tomaso Poggio
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Crossref

Neurons That Confuse Mirror-Symmetric Object Views

Author: Jim Mutch
Joel Z Leibo
Steve Smale
Tomaso Poggio
Publication date: 01/01/2010

Neurons in inferotemporal cortex that respond similarly to many pairs of mirror-symmetric images -- for example, 45 degree and -45 degree views of the same face -- have often been reported. The phenomenon seemed to be an interesting oddity. However, the same phenomenon has also emerged in simple hierarchical models of the ventral stream. Here we state a theorem characterizing sufficient conditions for this curious invariance to occur in a rather large class of hierarchical networks and demonstrate it with simulations

CiteSeerX

DSpace@MIT

How Important is Weight Symmetry in Backpropagation?

Author: Leibo Joel Z.
Liao Qianli
Poggio Tomaso
Publication venue: Center for Brains, Minds and Machines (CBMM), arXiv
Publication date: 29/11/2015

Gradient backpropagation (BP) requires symmetric feedforward and feedback connections: the same weights must be used for the forward and backward passes. This “weight transport problem” [1] is thought to be one of the main reasons for BP’s biological implausibility. Using 15 different classification datasets, we systematically study to what extent BP really depends on weight symmetry. In a study that turned out to be surprisingly similar in spirit to Lillicrap et al.’s demonstration [2] but orthogonal in its results, our experiments indicate that: (1) the magnitudes of feedback weights do not matter to performance; (2) the signs of feedback weights do matter: the more concordant the signs between feedforward connections and their corresponding feedback connections, the better; (3) with feedback weights having random magnitudes and 100% concordant signs, we were able to achieve the same or even better performance than SGD; (4) some normalizations/stabilizations are indispensable for such asymmetric BP to work, namely Batch Normalization (BN) [3] and/or a “Batch Manhattan” (BM) update rule. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.

arXiv.org e-Print Archive

DSpace@MIT

Association for the Advancement of Artificial Intelligence: AAAI Publications
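
The sign-concordant feedback described in the abstract above can be sketched as follows. This is a toy illustration, not the authors' code: the helper names, the magnitude range, and the two-by-two weight matrix are assumptions, and the Batch Normalization and "Batch Manhattan" steps that the abstract calls indispensable are omitted. The backward pass sends the output error through a separate matrix `B` that shares only the signs of the forward weights `W`.

```python
import random


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sign_concordant_feedback(W, rng):
    """Build feedback weights with the signs of W but random magnitudes."""
    return [[sign(w) * rng.uniform(0.1, 1.0) for w in row] for row in W]


def backward_hidden_error(B, output_error):
    """Send the output error back to the hidden layer through B.

    B has shape (n_outputs, n_hidden). Standard backpropagation would use
    the forward weights W here; asymmetric variants substitute another
    matrix of the same shape.
    """
    n_hidden = len(B[0])
    return [
        sum(B[k][j] * output_error[k] for k in range(len(B)))
        for j in range(n_hidden)
    ]


# Hypothetical forward weights from 2 hidden units to 2 output units.
W = [[0.5, -0.2], [-0.3, 0.8]]
B = sign_concordant_feedback(W, random.Random(0))

exact = backward_hidden_error(W, [1.0, 0.0])   # symmetric BP
approx = backward_hidden_error(B, [1.0, 0.0])  # sign-concordant feedback
```

In this example only one output unit carries error, so each hidden-unit error computed with `B` has the same sign as the exact one while its magnitude differs.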

Throwing Down the Visual Intelligence Gauntlet

Author: Leibo Joel Z
Poggio Tomaso
Tan Cheston
Publication date: 01/01/2012

In recent years, scientific and technological advances have produced artificial systems that have matched or surpassed human capabilities in narrow domains such as face detection and optical character recognition. However, the problem of producing truly intelligent machines remains far from solved. In this chapter, we first describe some of these recent advances, and then review one approach to moving beyond these limited successes: the neuromorphic approach of studying and reverse-engineering the networks of neurons in the human brain (specifically, the visual system). Finally, we discuss several possible future directions in the quest for visual intelligence. This research was sponsored by grants from DARPA (IPTO and DSO), National Science Foundation (NSF-0640097, NSF-0827427), AFSOR-THRL (FA8650-05-C-7262). Additional support was provided by: Adobe, Honda Research Institute USA, King Abdullah University of Science and Technology grant to B. DeVore, NEC, Sony and especially by the Eugene McDermott Foundation.

DSpace@MIT

Learning and disrupting invariance in visual recognition

Author: Isik Leyla
Leibo Joel Z
Poggio Tomaso
Publication date: 01/01/2011

Learning by temporal association rules such as Foldiak's trace rule is an attractive hypothesis that explains the development of invariance in visual recognition. Consistent with these rules, several recent experiments have shown that invariance can be broken by appropriately altering the visual environment, but found puzzling differences in the effects at the psychophysical versus single-cell levels. We show a) that associative learning provides appropriate invariance in models of object recognition inspired by Hubel and Wiesel, b) that we can replicate the "invariance disruption" experiments using these models with a temporal association learning rule to develop and maintain invariance, and c) that we can thereby explain the apparent discrepancies between psychophysical and single-cell effects. We argue that these models account for the stability of perceptual invariance despite the underlying plasticity of the system, the variability of the visual world and expected noise in the biological mechanisms.

CiteSeerX

DSpace@MIT

Using machine learning for automated de-identification and clinical coding of free text data in electronic medical records

Author: Liu Leibo
Publication venue: UNSW, Sydney
Publication date: 01/01/2023

The widespread adoption of Electronic Medical Records (EMRs) in hospitals continues to increase the amount of patient data that are digitally stored. Although the primary use of the EMR is to support patient care by making all relevant information accessible, governments and health organisations are looking for ways to unleash the potential of these data for secondary purposes, including clinical research, disease surveillance and automation of healthcare processes and workflows. EMRs include large quantities of free text documents that contain valuable information. The greatest challenges in using the free text data in EMRs include the removal of personally identifiable information and the extraction of relevant information for specific tasks such as clinical coding. Machine learning-based automated approaches can potentially address these challenges. This thesis aims to explore and improve the performance of machine learning models for automated de-identification and clinical coding of free text data in EMRs, as captured in hospital discharge summaries, and facilitate the applications of these approaches in real-world use cases. It does so by 1) implementing an end-to-end de-identification framework using an ensemble of deep learning models; 2) developing a web-based system for de-identification of free text (DEFT) with an interactive learning loop; 3) proposing and implementing a hierarchical label-wise attention transformer model (HiLAT) for explainable International Classification of Diseases (ICD) coding; and 4) investigating the use of extreme multi-label long text transformer-based models for automated ICD coding. The key findings include: 1) An end-to-end framework using an ensemble of deep learning base-models achieved excellent performance on the de-identification task. 2) A new web-based de-identification software system (DEFT) can be readily and easily adopted by data custodians and researchers to perform de-identification of free text in EMRs. 
3) A novel domain-specific transformer-based model (HiLAT) achieved state-of-the-art (SOTA) results for predicting ICD codes on a Medical Information Mart for Intensive Care (MIMIC-III) dataset comprising the discharge summaries (n=12,808) that are coded with at least one of the 50 most frequent diagnosis and procedure codes. In addition, the label-wise attention scores for the tokens in the discharge summary presented a potential explainability tool for checking the face validity of ICD code predictions. 4) An optimised transformer-based model, PLM-ICD, achieved the latest SOTA results for ICD coding on all the discharge summaries of the MIMIC-III dataset (n=59,652). The segmentation method, which split the long text consecutively into multiple small chunks, addressed the problem of applying transformer-based models to long text datasets. However, using transformer-based models on extremely large label sets needs further research. These findings demonstrate that the de-identification and clinical coding tasks can benefit from the application of machine learning approaches, present practical tools for implementing these approaches, and highlight priorities for further research.

UNSWorks
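
The segmentation step mentioned in the abstract above, splitting a long text consecutively into small chunks so that each fits a transformer's input limit, might look like the sketch below. The function name `chunk_tokens` and the chunk size are assumptions for illustration; the thesis's actual tokenisation and chunk length are not given in the abstract.

```python
def chunk_tokens(tokens, chunk_size):
    """Split a token sequence into consecutive, non-overlapping chunks.

    Every chunk holds chunk_size tokens except possibly the last one,
    which holds whatever remains.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# A hypothetical discharge summary, already split into word-level tokens.
summary = "patient admitted with chest pain and discharged after two days".split()
chunks = chunk_tokens(summary, 4)
```

Each chunk can then be encoded separately and the per-chunk outputs combined before predicting labels; how that combination is done is a modelling choice outside this sketch.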