Object classification in images of Neoclassical furniture using Deep Learning
This short paper outlines research results on object classification in images
of Neoclassical furniture. The motivation was to provide an object-recognition
framework that supports the alignment of furniture images with a
symbolic-level model. The main use case is a data-driven, bottom-up research
routine in the Neoclassica research framework, which strives to deliver
tools for analyzing the spread of aesthetic forms, understood as a
cultural transfer process.
Representational Capacity of Deep Neural Networks -- A Computing Study
There is some theoretical evidence that deep neural networks with multiple
hidden layers have a potential for more efficient representation of
multidimensional mappings than shallow networks with a single hidden layer. The
question is whether it is possible to exploit this theoretical advantage for
finding such representations with the help of numerical training methods. Tests
using prototypical problems with a known mean-square minimum did not confirm
this hypothesis: minima found with deep networks were consistently
worse than those found with shallow networks. This does not directly
contradict the theoretical findings; it is possible that the superior
representational capacity of deep networks is genuine, while finding the mean-square
minimum of such deep networks is a substantially harder problem than
for shallow ones.
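The experimental setup described above can be sketched in plain NumPy: data are generated by a small "teacher" network, so the mean-square minimum is known to be zero, and a shallow and a deep student are then trained by gradient descent. All sizes, seeds, widths, and the finite-difference optimizer below are illustrative assumptions, not the paper's actual configuration, and this tiny sketch only illustrates the setup rather than reproducing the paper's finding.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(64, 1))
teacher_W = rng.normal(size=(1, 3))
teacher_v = rng.normal(size=(3, 1))
Y = np.tanh(X @ teacher_W) @ teacher_v     # realizable target: minimum MSE is 0

def shallow(p, X):
    """One hidden layer of width 8 (16 parameters)."""
    W1, v = p[:8].reshape(1, 8), p[8:16].reshape(8, 1)
    return np.tanh(X @ W1) @ v

def deep(p, X):
    """Two hidden layers of width 4 (24 parameters)."""
    W1 = p[:4].reshape(1, 4)
    W2 = p[4:20].reshape(4, 4)
    v = p[20:24].reshape(4, 1)
    return np.tanh(np.tanh(X @ W1) @ W2) @ v

def mse(f, p):
    return float(np.mean((f(p, X) - Y) ** 2))

def train(f, n_params, steps=200, lr=0.1, eps=1e-5):
    p = 0.5 * rng.normal(size=n_params)
    for _ in range(steps):
        grad = np.zeros_like(p)
        for i in range(n_params):          # central finite differences
            d = np.zeros_like(p)
            d[i] = eps
            grad[i] = (mse(f, p + d) - mse(f, p - d)) / (2 * eps)
        p -= lr * grad
    return mse(f, p)

loss_shallow = train(shallow, 16)
loss_deep = train(deep, 24)
print(f"shallow MSE: {loss_shallow:.5f}  deep MSE: {loss_deep:.5f}")
```

On problems of this kind, the abstract reports that the deep student's attained minimum was consistently worse than the shallow one's, even though the global minimum of zero is achievable by construction.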
Make Deep Networks Shallow Again
Deep neural networks have a good success record and are thus viewed as the
best architecture choice for complex applications. Their main shortcoming has
been, for a long time, the vanishing gradient, which prevented numerical
optimization algorithms from converging acceptably. A breakthrough has been
achieved by the concept of residual connections -- an identity mapping parallel
to a conventional layer. This concept is applicable to stacks of layers of the
same dimension and substantially alleviates the vanishing gradient problem. A
stack of residual connection layers can be expressed as an expansion of terms
similar to the Taylor expansion. This expansion suggests the possibility of
truncating the higher-order terms and obtaining an architecture consisting of a
single broad layer composed of all initially stacked layers in parallel. In
other words, a sequential deep architecture is replaced by a parallel
shallow one. Prompted by this theory, we investigated the performance
capabilities of the parallel architecture in comparison to the sequential one.
The computer vision datasets MNIST and CIFAR10 were used to train both
architectures for a total of 6912 combinations of varying numbers of
convolutional layers, numbers of filters, kernel sizes, and other meta
parameters. Our findings demonstrate a surprising equivalence between the deep
(sequential) and shallow (parallel) architectures. Both layouts produced
similar results in terms of training and validation set loss. This discovery
implies that a wide, shallow architecture can potentially replace a deep
network without sacrificing performance. Such substitution has the potential to
simplify network architectures, improve optimization efficiency, and accelerate
the training process.
Comment: to be published at KDIR2023, Rome
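The expansion behind the truncation argument can be made exact with linear residual layers: a two-layer residual stack (I + A2)(I + A1)x expands to x + A1 x + A2 x + A2 A1 x, and dropping the second-order term A2 A1 x leaves the parallel "shallow" form x + A1 x + A2 x. A minimal sketch, with illustrative matrix names and sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
A1 = 0.1 * rng.normal(size=(d, d))   # residual branch of layer 1
A2 = 0.1 * rng.normal(size=(d, d))   # residual branch of layer 2
x = rng.normal(size=d)

deep = x + A1 @ x + A2 @ (x + A1 @ x)   # sequential residual stack (I+A2)(I+A1)x
shallow = x + A1 @ x + A2 @ x           # truncated, parallel form

# the discrepancy is exactly the dropped second-order term A2 A1 x
residual_term = A2 @ (A1 @ x)
print(np.allclose(deep - shallow, residual_term))
```

With small residual branches (scaled by 0.1 here), the dropped higher-order term is correspondingly small, which is the intuition for why the parallel layout can match the sequential one.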
A Multilingual Test Collection for the Semantic Search of Entity Categories
Humans naturally organise and classify the world into sets and categories. Expressed in natural language, these categories are present
in all data artefacts, from structured to unstructured, and play a fundamental role as tags, dataset predicates, or ontology attributes.
A better understanding of the syntactic structure of categories and of how to match them semantically is a fundamental problem in
computational linguistics. Despite the high popularity of entity search, entity categories have not received equivalent
attention. This paper presents the task of semantic search of entity categories by defining, developing, and making publicly
available a multilingual test collection comprising English, Portuguese, and German. The test collections were designed to meet the
demands of the entity-search community for more representative and semantically complex query sets. In addition, we
provide comparative baselines and a brief analysis of the results.
Investigating a Second-Order Optimization Strategy for Neural Networks
In summary, this cumulative dissertation investigates the application of the conjugate gradient (CG) method to the optimization of artificial neural networks (NNs) and compares it with common first-order optimization methods, especially stochastic gradient descent (SGD).
The presented research results show that CG can effectively optimize both small and very large networks. However, the default machine precision of 32 bits can lead to problems; the best results are only achieved with 64-bit computations. The research also emphasizes the importance of the initialization of the NNs' trainable parameters and shows that an initialization using singular value decomposition (SVD) leads to drastically lower error values. Surprisingly, shallower NNs achieve better results than deep NNs with a comparable number of trainable parameters, regardless of the particular NN that generated the artificial data; shallow but wide NNs, in both Transformer and CNN architectures, likewise often outperform their deeper counterparts. Overall, the research results recommend a re-evaluation of the previous preference for extremely deep NNs and emphasize the potential of CG as an optimization method.
How to compute a shape: Optical character recognition for hieratic
This paper is written in the framework of the project Altägyptische Kursivschriften and presents an experiment in applying OCR to Hieratic, examining the final step of the OCR pipeline. A convolutional neural network is used to classify individual hieratic characters. The results show that the classification of hieratic characters is possible in principle, but they also indicate where improvements and combinations with other methods promise better performance of the recognition models.
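The classification step can be sketched as a single convolution + pooling + softmax forward pass over a grayscale glyph image, in plain NumPy. The actual model in the paper is a trained convolutional network; the random weights, image size, filter count, and class count here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

img = rng.uniform(size=(28, 28))        # stand-in for a normalized hieratic glyph
kernels = rng.normal(size=(4, 3, 3))    # 4 (here untrained) 3x3 filters
features = np.stack([np.maximum(conv2d(img, k), 0.0) for k in kernels])  # ReLU
pooled = features.mean(axis=(1, 2))     # global average pooling -> 4 features
W_out = rng.normal(size=(4, 10))        # 10 candidate character classes
probs = softmax(pooled @ W_out)

print(probs.argmax(), probs.sum())      # predicted class and total probability
```

A trained model would learn the kernels and output weights from labeled glyph images; the forward pass and the softmax output over candidate characters stay the same.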
