Freeform User Interfaces for Graphical Computing
Report number: 甲15222 ; Degree conferred: 2000-03-29 ; Degree category: Doctorate by coursework (課程博士) ; Degree: Doctor of Engineering (博士(工学)) ; Diploma number: 博工第4717号 ; Graduate school / department: Graduate School of Engineering, Information Engineering…
Investigation of new learning methods for visual recognition
Visual recognition is one of the most difficult and prevalent problems in computer vision and pattern recognition due to the challenges in understanding the semantics and contents of digital images. Two major components of a visual recognition system are discriminative feature representation and efficient, accurate pattern classification. This dissertation therefore focuses on developing new learning methods for visual recognition.
Based on the conventional sparse representation, which shows its robustness for visual recognition problems, a series of new methods is proposed. Specifically, first, a new locally linear K nearest neighbor method, or LLK method, is presented. The LLK method derives a new representation, which is an approximation to the ideal representation, by optimizing an objective function based on a host of criteria for sparsity, locality, and reconstruction. The novel representation is further processed by two new classifiers, namely, an LLK based classifier (LLKc) and a locally linear nearest mean based classifier (LLNc), for visual recognition. The proposed classifiers are shown to connect to the Bayes decision rule for minimum error. Second, a new generative and discriminative sparse representation (GDSR) method is proposed by taking advantage of both a coarse modeling of the generative information and a modeling of the discriminative information. The proposed GDSR method integrates two new criteria, namely, a discriminative criterion and a generative criterion, into the conventional sparse representation criterion. A new generative and discriminative sparse representation based classification (GDSRc) method is then presented based on the derived new representation. Finally, a new Score space based multiple Metric Learning (SML) method is presented for a challenging visual recognition application, namely, recognizing kinship relations or kinship verification. The proposed SML method, which goes beyond the conventional Mahalanobis distance metric learning, not only learns the distance metric but also models the generative process of features by taking advantage of the score space. The SML method is optimized by solving a constrained, non-negative, and weighted variant of the sparse representation problem.
To assess the feasibility of the proposed learning methods, they are applied to several visual recognition tasks, including face recognition, scene recognition, object recognition, computational fine art analysis, action recognition, fine-grained recognition, and kinship verification. The experimental results show that the proposed learning methods achieve better performance than other popular methods.
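The locally linear K nearest neighbor idea above can be illustrated with a minimal sketch: reconstruct a test sample from its K nearest training neighbors via regularized least squares, then assign the class whose neighbors reconstruct it with the smallest residual. This is a simplified stand-in (the ridge term `reg` and the per-class residual rule are illustrative assumptions, not the dissertation's exact sparsity/locality/reconstruction criteria).

```python
import numpy as np

def llk_classify(x, X_train, y_train, k=5, reg=1e-3):
    """Classify x by locally linear reconstruction from its k nearest
    neighbors (simplified sketch of the LLK idea)."""
    # 1. Find the k nearest training samples to x.
    dists = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dists)[:k]
    N, labels = X_train[idx], y_train[idx]
    # 2. Solve ridge-regularized least squares for reconstruction
    #    weights w:  min ||x - N^T w||^2 + reg * ||w||^2.
    G = N @ N.T + reg * np.eye(k)
    w = np.linalg.solve(G, N @ x)
    # 3. Assign the class whose neighbors' weighted contribution
    #    reconstructs x with the smallest residual.
    best, best_err = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        err = np.linalg.norm(x - w[mask] @ N[mask])
        if err < best_err:
            best, best_err = c, err
    return best
```

The per-class residual comparison plays the role the abstract assigns to the LLK-based classifier; the nearest-mean variant would compare against class means of the weighted neighbors instead.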
Machine-assisted mixed methods: augmenting humanities and social sciences with artificial intelligence
The increasing capacities of large language models (LLMs) present an
unprecedented opportunity to scale up data analytics in the humanities and
social sciences, augmenting and automating qualitative analytic tasks
previously typically allocated to human labor. This contribution proposes a
systematic mixed methods framework to harness qualitative analytic expertise,
machine scalability, and rigorous quantification, with attention to
transparency and replicability. Sixteen machine-assisted case studies are
showcased as proof of concept. Tasks include linguistic and discourse analysis,
lexical semantic change detection, interview analysis, historical event cause
inference and text mining, detection of political stance, text and idea reuse,
genre composition in literature and film, social network inference, automated
lexicography, missing metadata augmentation, and multimodal visual cultural
analytics. In contrast to the focus on English in the emerging LLM
applicability literature, many examples here deal with scenarios involving
smaller languages and historical texts prone to digitization distortions. In
all but the most difficult tasks requiring expert knowledge, generative LLMs
can demonstrably serve as viable research instruments. LLM (and human)
annotations may contain errors and variation, but the agreement rate can and
should be accounted for in subsequent statistical modeling; a bootstrapping
approach is discussed. The replications among the case studies illustrate how
tasks previously requiring potentially months of team effort and complex
computational pipelines can now be accomplished by an LLM-assisted scholar in
a fraction of the time. Importantly, this approach is not intended to replace,
but to augment researcher knowledge and skills. With these opportunities in
sight, qualitative expertise and the ability to pose insightful questions have
arguably never been more critical.
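The bootstrapping idea mentioned above can be sketched as follows: resample the annotated items with replacement and, in each replicate, flip each binary machine label with probability equal to the disagreement rate, so that annotation uncertainty propagates into the confidence interval. This is one plausible instantiation under stated assumptions (binary labels, a single known agreement rate), not the paper's exact procedure.

```python
import numpy as np

def bootstrap_proportion(labels, agreement, n_boot=2000, seed=0):
    """Estimate a proportion from machine annotations while propagating
    annotation uncertainty (illustrative sketch). `agreement` is the
    machine-human agreement rate on a gold subset."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n = len(labels)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(labels, size=n, replace=True)  # resample items
        flips = rng.random(n) < (1.0 - agreement)          # annotation noise
        stats[b] = np.mean(np.where(flips, 1 - sample, sample))
    # Point estimate plus a percentile confidence interval.
    return np.mean(stats), np.percentile(stats, [2.5, 97.5])
```

With 80% positive labels and 90% agreement, the interval widens relative to a naive bootstrap, reflecting the extra uncertainty contributed by annotation error.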
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
Evaluating text-to-image models is notoriously difficult. A strong recent
approach for assessing text-image faithfulness is based on QG/A (question
generation and answering), which uses pre-trained foundational models to
automatically generate a set of questions and answers from the prompt, and
output images are scored on whether answers extracted with a visual question
answering (VQA) model are consistent with the answers derived from the prompt. This
kind of evaluation is naturally dependent on the quality of the underlying QG
and QA models. We identify and address several reliability challenges in
existing QG/A work: (a) QG questions should respect the prompt (avoiding
hallucinations, duplications, and omissions) and (b) VQA answers should be
consistent (not asserting that there is no motorcycle in an image while also
claiming the motorcycle is blue). We address these issues with Davidsonian
Scene Graph (DSG), an empirically grounded evaluation framework inspired by
formal semantics. DSG is an automatic, graph-based QG/A framework that is
modularly implemented to be adaptable to any QG/A module. DSG produces atomic and unique
questions organized in dependency graphs, which (i) ensure appropriate semantic
coverage and (ii) sidestep inconsistent answers. With extensive experimentation
and human evaluation on a range of model configurations (LLM, VQA, and T2I), we
empirically demonstrate that DSG addresses the challenges noted above. Finally,
we present DSG-1k, an open-sourced evaluation benchmark that includes 1,060
prompts, covering a wide range of fine-grained semantic categories with a
balanced distribution. We release the DSG-1k prompts and the corresponding DSG
questions.
Comment: Project website: https://google.github.io/ds
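The dependency-graph mechanism described above can be sketched minimally: a child question's "yes" answer only counts when every parent question was also credited, so the evaluator never credits "the motorcycle is blue" when "is there a motorcycle?" failed. This is a simplified illustration of the DSG scoring idea, not the released implementation.

```python
def dsg_score(questions, parents, answers):
    """Score an image against atomic yes/no questions arranged in a
    dependency graph (simplified DSG-style sketch). `parents[i]` lists
    the question indices that question i depends on; `answers[i]` is
    the VQA model's yes/no answer. The graph is assumed acyclic."""
    n = len(questions)

    def credited(i):
        # A question is credited only if answered 'yes' AND every
        # parent question is also credited; failed parents invalidate
        # children, which sidesteps inconsistent answers.
        return answers[i] and all(credited(p) for p in parents[i])

    # Fraction of questions consistently answered 'yes'.
    return sum(credited(i) for i in range(n)) / n
```

For the abstract's motorcycle example: if the existence question is answered "no", the color question contributes nothing even if the VQA model claims "blue".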
Learning visual representations of style
Learning Visual Representations of Style, by Nanne van Noord. An artist's style is visible in his or her work; regardless of the form or subject of an artwork, art experts can recognize that style. Whether it concerns a landscape or a portrait, the connoisseurship of art experts enables them to recognize the artist's style. Translating this connoisseurship to a computer, so that the computer can recognize an artist's style and (re)produce artworks in that style, is the central aim of this research. For the visual analysis of artworks, computers use image processing techniques. Traditionally, these techniques consist of algorithms developed by computer scientists to detect predefined visual features. Because these features were developed for analyzing the content of photographs, they are of limited use for analyzing the style of visual art. Moreover, there is no definitive answer as to which visual features are indicative of style. To overcome these limitations, this research uses deep learning, a methodology that has revolutionized the image processing field in recent years. The power of deep learning stems from its capacity for self-learning: instead of relying on predefined features, the computer can learn the appropriate features itself. In this research we developed algorithms that enable the computer to 1) learn to recognize an artist's style, and 2) generate new images in an artist's style.
Based on the work presented in this dissertation, we can conclude that the computer is indeed able to learn to recognize an artist's style, even in a challenging setting with thousands of artworks and several hundred artists. We can also conclude that it is possible, on the basis of existing artworks, to generate new artworks in the artist's style. Specifically, a colorless image of an artwork can be colored in the artist's style, and when parts of an artwork are missing, those missing pieces can be filled in (retouched). Although we are not yet able to generate entirely new artworks, this research is a major step in that direction. Moreover, the techniques and methods developed in this research hold promise as digital tools to support art experts and restorers.
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.
Comment: 10 pages, 19 figures