Analysis, interpretation and synthesis of facial expressions
Thesis (Ph.D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 121-130). By Irfan Aziz Essa.
Coding, Analysis, Interpretation, and Recognition of Facial Expressions
We describe a computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with a geometric and a physical (muscle) model describing the facial structure. Our method produces a reliable parametric representation of the face's independent muscle action groups, as well as an accurate estimate of facial motion. Previous efforts at analysis of facial expression have been based on the Facial Action Coding System (FACS), a representation developed in order to allow human psychologists to code expression from static pictures. To avoid use of this heuristic coding scheme, we have used our computer vision system to probabilistically characterize facial motion and muscle activation in an experimental population, thus deriving a new, more accurate representation of human facial expressions that we call FACS+. We use this new representation for recognition in two different ways. The first method uses the physics-based model directly, by recognizing…
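As a rough illustration of this kind of pipeline, the sketch below pairs a dense optical flow estimate with a linear projection onto per-muscle motion basis fields. OpenCV's Farneback flow and the least-squares projection are stand-ins chosen for brevity; the paper's optimal-estimation flow and physical muscle model are considerably more involved, and the `basis` fields here are hypothetical placeholders.

```python
# Minimal sketch, not the paper's method: Farneback flow stands in for the
# optimal-estimation optical flow, and a linear least-squares projection
# stands in for the physical (muscle) model.
import numpy as np
import cv2

def estimate_flow(prev_gray, next_gray):
    """Dense optical flow between two grayscale frames, shape (H, W, 2)."""
    return cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

def muscle_activations(flow, basis):
    """Project an observed flow field onto per-muscle-group basis fields.

    flow:  (H, W, 2) observed motion field.
    basis: (K, H, W, 2) hypothetical motion fields, one per muscle action
           group (in the real system these come from the physical model).
    Returns a K-vector of activations via linear least squares.
    """
    A = basis.reshape(basis.shape[0], -1).T   # (H*W*2, K) design matrix
    b = flow.reshape(-1)                      # (H*W*2,) observations
    activations, *_ = np.linalg.lstsq(A, b, rcond=None)
    return activations
```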
Machine Learning for Video-Based Rendering
We recently introduced a new paradigm for computer animation, video textures, which allows us to use a recorded video to generate novel animations by replaying the video samples in a new order. Video sprites are a special type of video texture. Instead of storing whole images, the object of interest is separated from the background and the video samples are stored as a sequence of alpha-matted sprites with associated velocity information. They can be rendered anywhere on the screen to create a novel animation of the object. To create such an animation, we have to find a sequence of sprite samples that is both visually smooth and shows the desired motion. In this paper, we address both problems. To estimate visual smoothness, we train a linear classifier to estimate visual similarity between video samples. If the motion path is known in advance, we then use a beam search algorithm to find a good sample sequence. We can also specify the motion interactively by precomputing a set of cost functions using Q-learning.
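A hedged sketch of the beam-search step described above: given placeholder cost functions for visual smoothness and path-following (standing in for the paper's trained linear classifier and learned Q-values), a standard beam search assembles a low-cost sprite sequence. The function names and data layout are assumptions made for illustration.

```python
# Illustrative beam search over sprite samples; the cost functions are
# hypothetical stand-ins for the learned similarity and motion costs.
import heapq

def beam_search(sprites, visual_cost, motion_cost, length, beam_width=20):
    """sprites: list of sample indices.
    visual_cost(i, j): transition cost from sample i to sample j.
    motion_cost(j, t): cost of showing sample j at time step t.
    Returns the lowest-cost sequence of the requested length found."""
    # Each beam entry is (cumulative_cost, sequence-so-far), best first.
    beam = [(motion_cost(s, 0), [s]) for s in sprites]
    beam = heapq.nsmallest(beam_width, beam)
    for t in range(1, length):
        candidates = []
        for cost, seq in beam:
            for s in sprites:
                step = visual_cost(seq[-1], s) + motion_cost(s, t)
                candidates.append((cost + step, seq + [s]))
        # Keep only the beam_width cheapest partial sequences.
        beam = heapq.nsmallest(beam_width, candidates)
    return min(beam)[1]
```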
Localization and 3D Reconstruction of Urban Scenes Using GPS
Using off-the-shelf Global Positioning System (GPS)
units, we reconstruct buildings in 3D by exploiting the reduction
in signal to noise ratio (SNR) that occurs when
the buildings obstruct the line-of-sight between the moving
units and the orbiting satellites. We measure the size and
height of skyscrapers as well as automatically constructing
a density map representing the location of multiple buildings
in an urban landscape. If deployed on a large scale, via
a cellular service provider’s GPS-enabled mobile phones or
GPS-tracked delivery vehicles, the system could provide an
inexpensive means of continuously creating and updating
3D maps of urban environments
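A minimal sketch of the underlying idea, under assumed data layouts: each low-SNR observation casts a vote along the horizontal projection of the receiver-to-satellite ray into a 2D grid, and the accumulated votes approximate a building-density map. The SNR threshold and ray-marching scheme here are illustrative assumptions, not the paper's actual processing pipeline.

```python
# Toy building-density accumulator: low SNR suggests an obstruction somewhere
# along the receiver-to-satellite ray, so every cell on that ray gets a vote.
import numpy as np

def accumulate_density(observations, grid_shape, snr_threshold=30.0):
    """observations: iterable of (rx, ry, sat_azimuth_rad, snr) tuples, with
    the receiver position already expressed in grid coordinates.
    Returns a grid normalized to [0, 1] of obstruction votes."""
    grid = np.zeros(grid_shape)
    for rx, ry, az, snr in observations:
        if snr >= snr_threshold:
            continue                    # line of sight likely clear; skip
        for step in range(1, max(grid_shape)):
            x = int(rx + step * np.cos(az))
            y = int(ry + step * np.sin(az))
            if not (0 <= x < grid_shape[0] and 0 <= y < grid_shape[1]):
                break                   # ray has left the mapped area
            grid[x, y] += 1.0           # vote: something on this ray blocks the signal
    return grid / max(grid.max(), 1.0)
```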
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling
frozen LLMs to perform both understanding and generation tasks involving
non-linguistic modalities such as images or videos. SPAE converts between raw
pixels and interpretable lexical tokens (or words) extracted from the LLM's
vocabulary. The resulting tokens capture both the semantic meaning and the
fine-grained details needed for visual reconstruction, effectively translating
the visual content into a language comprehensible to the LLM, and empowering it
to perform a wide array of multimodal tasks. Our approach is validated through
in-context learning experiments with frozen PaLM 2 and GPT 3.5 on a diverse set
of image understanding and generation tasks. Our method marks the first
successful attempt to enable a frozen LLM to generate image content while
surpassing state-of-the-art performance in image understanding tasks, under the
same setting, by over 25%.Comment: NeurIPS 2023 spotligh
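To make the tokenization idea concrete, here is a toy sketch of quantizing one visual patch feature against a frozen LLM's token-embedding table, so that the patch is described by actual vocabulary words. The pyramid of token layers, the learned encoder, and the scoring used in the paper are all omitted; every shape and name below is an assumption for illustration.

```python
# Toy sketch of SPAE's central idea: express visual content as the LLM's own
# words by nearest-neighbor matching against its frozen embedding table.
import numpy as np

def tokens_for_patch(patch_feature, token_embeddings, vocab, k=3):
    """patch_feature: (D,) visual feature for one image patch.
    token_embeddings: (V, D) frozen LLM token-embedding table.
    vocab: list of V token strings.
    Returns the k vocabulary tokens whose embeddings best match the patch,
    ranked by cosine similarity."""
    feat = patch_feature / np.linalg.norm(patch_feature)
    table = token_embeddings / np.linalg.norm(
        token_embeddings, axis=1, keepdims=True)
    scores = table @ feat                    # (V,) cosine similarities
    best = np.argsort(scores)[::-1][:k]      # indices of top-k matches
    return [vocab[i] for i in best]
```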
Burnout among surgeons before and during the SARS-CoV-2 pandemic: an international survey
Background: The SARS-CoV-2 pandemic has had many significant impacts within the surgical realm, and surgeons have been obligated to reconsider almost every aspect of daily clinical practice. Methods: This is a cross-sectional study reported in compliance with the CHERRIES guidelines and conducted through an online platform from June 14th to July 15th, 2020. The primary outcome was the burden of burnout during the pandemic, indicated by the validated Shirom-Melamed Burnout Measure. Results: Nine hundred fifty-four surgeons completed the survey. The median length of practice was 10 years; 78.2% of respondents were male, with a median age of 37 years; 39.5% were consultants, 68.9% were general surgeons, and 55.7% were affiliated with an academic institution. Overall, there was a significant increase in the mean burnout score during the pandemic; longer years of practice and older age were significantly associated with less burnout. There were significant reductions in the median number of outpatient visits, operated cases, on-call hours, emergency visits, and research work, and 48.2% of respondents felt that the training resources were insufficient. The majority (81.3%) of respondents reported that their hospitals were involved in the management of COVID-19; 66.5% felt their roles had been minimized, 41% were asked to assist in non-surgical medical practices, and 37.6% of respondents were included in COVID-19 management. Conclusions: There was significant burnout among trainees. Almost all aspects of clinical and research activity were affected, with a significant reduction in the volume of research, outpatient clinic visits, surgical procedures, on-call hours, and emergency cases, hindering training. Trial registration: The study was registered on clinicaltrials.gov (NCT04433286) on 16/06/2020.
CAREER: Developing and evaluating a spatio-temporal representation for analysis, modeling, recognition and synthesis of facial expressions
Issued as final report. National Science Foundation.
Contact detection, collision forces and friction for physically based virtual world modeling
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Civil Engineering, 1990. Includes bibliographical references (leaves 142-145). By Irfan A. Essa.
Aware Home: Sensing, Interpretation, and Recognition of Everyday Activities
Presentation given in the Wilby Room of the Georgia Tech Library and Information Center. The Aware Home project is a unique living laboratory for the exploration of ubiquitous computing in a domestic setting. Dr. Essa's talk will present ongoing research in the area of developing technologies within a residential setting that will affect our everyday living, specifically concentrating on the sensing and perception technologies that can enable a home environment to be aware of the whereabouts and activities of its occupants. The discussion will include the use of computer vision, audition work, and other efforts in computational perception to track and monitor the residents, as well as methods being developed to recognize the residents' activities over short and extended periods. The technological, design, and engineering research challenges inherent in this problem domain, and the focus on awareness to help maintain independence and quality of life for an aging population, will also be explored. The project is located in the Georgia Tech Broadband Institute's Residential Laboratory.
Coding, Analysis, Interpretation, and Recognition of Facial Expressions
We describe a computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with geometric, physical and motion-based dynamic models describing the facial structure. Our method produces a reliable parametric representation of the face's independent muscle action groups, as well as an accurate estimate of facial motion.
Previous efforts at analysis of facial expression have been based on the Facial Action Coding System (FACS), a representation developed in order to allow human psychologists to code expression from static pictures. To avoid use of this heuristic coding scheme, we have used our computer vision system to probabilistically characterize facial motion and muscle activation in an experimental population, thus deriving a new, more accurate representation of human facial expressions that we call FACS+.
Finally, we show how this method can be used for coding, analysis, interpretation, and recognition of facial expressions.