Active Learning for Reducing Labeling Effort in Text Classification Tasks
Labeling data can be an expensive task, as it is usually performed manually by
domain experts. This is cumbersome for deep learning, which depends on
large labeled datasets. Active learning (AL) is a paradigm that aims to reduce
labeling effort by using only the data that the model deems most
informative. Little research has been done on AL in a text classification
setting, and next to none has involved the more recent, state-of-the-art Natural
Language Processing (NLP) models. Here, we present an empirical study that
compares different uncertainty-based algorithms with BERT as the
classifier. We evaluate the algorithms on two NLP classification datasets:
Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore
heuristics that aim to solve presupposed problems of uncertainty-based AL,
namely that it is unscalable and that it is prone to selecting outliers.
Furthermore, we explore the influence of the query-pool size on the performance
of AL. Although the proposed heuristics did not improve the performance of AL,
our results show that uncertainty-based AL with BERT outperforms random
sampling of data. This difference in performance can decrease as the
query-pool size gets larger.
Comment: Accepted as a conference paper at the joint 33rd Benelux Conference
on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine
Learning (BNAIC/BENELEARN 2021). This camera-ready version submitted to
BNAIC/BENELEARN adds several improvements, including a more thorough
discussion of related work and an extended discussion section. 28 pages
including references and appendices.
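As an illustration of the uncertainty-based selection step described above, the following is a minimal sketch, not the study's implementation. It assumes the classifier (BERT in the study) has already produced class probabilities for every unlabeled example in the pool. The scoring functions are three common uncertainty measures (least confidence, margin, entropy), and the names `select_queries` and `random_queries` are hypothetical. `query_size` plays the role of the query-pool size, i.e. how many examples are sent to annotators per round. The study's proposed heuristics against poor scalability and outlier selection are not reproduced here.

```python
import math
import random
from typing import Callable, List, Sequence

# Each pool entry is the model's predicted class distribution for one unlabeled text.
Probs = Sequence[float]


def least_confidence(probs: Probs) -> float:
    """1 minus the top class probability; higher means more uncertain."""
    return 1.0 - max(probs)


def margin(probs: Probs) -> float:
    """Negated gap between the two most likely classes, so small gaps score high.

    Assumes at least two classes.
    """
    first, second = sorted(probs, reverse=True)[:2]
    return -(first - second)


def entropy(probs: Probs) -> float:
    """Shannon entropy (natural log) of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs: List[Probs], query_size: int,
                   score: Callable[[Probs], float] = entropy) -> List[int]:
    """Indices of the `query_size` pool examples with the highest uncertainty score."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]), reverse=True)
    return ranked[:query_size]


def random_queries(pool_size: int, query_size: int, seed: int = 0) -> List[int]:
    """Random-sampling baseline: `query_size` distinct indices drawn uniformly."""
    return random.Random(seed).sample(range(pool_size), query_size)
```

For example, with pool probabilities `[[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.50, 0.50]]` and `query_size=2`, all three measures pick indices 3 and 1, the two examples whose predictions are closest to a coin flip. A larger `query_size` labels more examples per round from a single model state, which is one plausible reason the gap to random sampling can shrink.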
Graph Theory and Universal Grammar
Thesis archived under Ordinance No. 227/2017 of 25 July (Registration of Foreign Degree).
In the last few years, Noam Chomsky (1994; 1995; 2000; 2001) has gone quite far in
the direction of simplifying syntax, including eliminating X-bar theory and the levels
of D-structure and S-structure entirely, as well as reducing movement rules to a
combination of the more primitive operations of Copy and Merge. What remain in
the Minimalist Program are the operations Merge and Agree and the levels of LF
(Logical Form) and PF (Phonological Form).
My doctoral thesis attempts to offer an economical theory of syntactic structure
from a graph-theoretic point of view (cf. Diestel, 2005), with special emphasis on the
elimination of category and projection labels and the Inclusiveness Condition
(Chomsky 1994). The major influences for the development of such a theory have
been Chris Collins’ (2002) seminal paper “Eliminating labels”, John Bowers’ (2001)
unpublished manuscript “Syntactic Relations” and the Cartographic Paradigm (see
Belletti, Cinque and Rizzi’s volumes on OUP for a starting point regarding this
paradigm).
A syntactic structure will be regarded here as a graph consisting of the set of
lexical items, the set of relations among them, and nothing more.
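One way to make this definition concrete is a data structure whose only contents are a set of lexical items (the vertices) and a set of relations between them (directed edges), with no field for category or projection labels. The sketch below is an illustrative assumption, not the thesis's formalism: the class name `SyntacticGraph`, the treatment of relations as unlabeled ordered pairs, and the example relations are all hypothetical. The membership check in `relate` loosely mirrors the Inclusiveness Condition by refusing to introduce any object that is not already a lexical item.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set, Tuple

# A relation is just an ordered pair of lexical items; it carries no label.
Relation = Tuple[str, str]


@dataclass
class SyntacticGraph:
    """Hypothetical label-free structure: lexical items plus relations, nothing more.

    There is deliberately no field for category or projection labels.
    """
    items: FrozenSet[str]
    relations: Set[Relation] = field(default_factory=set)

    def relate(self, a: str, b: str) -> None:
        # Inclusiveness-style check: relations may only connect items already
        # in the lexical set, so no new objects (e.g. labels) can be introduced.
        if a not in self.items or b not in self.items:
            raise ValueError(f"{a!r} or {b!r} is not a lexical item of this structure")
        self.relations.add((a, b))

    def neighbours(self, a: str) -> Set[str]:
        """Items that `a` is directly related to."""
        return {b for (x, b) in self.relations if x == a}
```

For example, after `g = SyntacticGraph(frozenset({"the", "cat", "slept"}))`, the calls `g.relate("slept", "cat")` and `g.relate("cat", "the")` succeed, `g.neighbours("cat")` returns `{"the"}`, and `g.relate("cat", "NP")` raises `ValueError` because a label such as `NP` is not a lexical item. Repeated words (two occurrences of *the*) would need distinct tokens such as `"the_1"` and `"the_2"`, since the items form a set.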
Statistical aspects of forensic genetics: Models for qualitative and quantitative STR data
This PhD thesis deals with statistical models intended for forensic genetics, the part of forensic medicine concerned with the analysis of DNA evidence from criminal cases, together with the calculation of alleged paternity and kinship in family reunification cases. The main focus of the thesis is on crime cases, as these differ from the other types of cases: the biological material is often used to identify a person rather than to establish kinship. Common to all cases, however, is that the DNA is used as evidence to assess the probability of observing the biological material under different hypotheses. Most countries use commercially manufactured DNA kits for typing a person’s DNA profile. With these kits, the DNA profile consists of the states of 10-15 DNA loci that vary greatly from person to person in the population. Thus, only a small fraction of the genome is typed, but owing to the large variability, it is possible to identify individuals with very high probability. These probabilities are used when calculating the weight of evidence, which in some cases corresponds to the likelihood of observing a given suspect’s DNA profile in the population. By assessing the probability of the DNA evidence under competing hypotheses the biological …
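The weight-of-evidence calculation described above can be illustrated with the textbook likelihood ratio for a single-source stain that matches the suspect: LR = P(E | Hp) / P(E | Hd), where the numerator is 1 and the denominator is the probability that a random person has the same profile. The minimal sketch below assumes Hardy-Weinberg equilibrium within loci (p² for homozygotes, 2pq for heterozygotes) and independence across loci (the product rule). It omits subpopulation corrections, mixtures and the quantitative models that are the subject of the thesis, and the locus names and allele frequencies in the example are invented.

```python
from typing import Dict, Tuple

# A genotype at one STR locus is a pair of allele designations.
Genotype = Tuple[str, str]


def genotype_probability(genotype: Genotype, freqs: Dict[str, float]) -> float:
    """Hardy-Weinberg genotype probability: p^2 if homozygous, 2pq if heterozygous."""
    a, b = genotype
    if a == b:
        return freqs[a] ** 2
    return 2 * freqs[a] * freqs[b]


def profile_probability(profile: Dict[str, Genotype],
                        freqs: Dict[str, Dict[str, float]]) -> float:
    """Product rule: multiply genotype probabilities across loci (assumes independence)."""
    prob = 1.0
    for locus, genotype in profile.items():
        prob *= genotype_probability(genotype, freqs[locus])
    return prob


def likelihood_ratio(profile: Dict[str, Genotype],
                     freqs: Dict[str, Dict[str, float]]) -> float:
    """LR = P(E | Hp) / P(E | Hd) for a single-source stain matching the suspect.

    Under Hp (the suspect is the source) the match is certain, so P = 1;
    under Hd (an unrelated random person is the source) P = profile probability.
    """
    return 1.0 / profile_probability(profile, freqs)
```

With the invented frequencies `{"L1": {"12": 0.1, "14": 0.2}, "L2": {"8": 0.25}}` and the profile `{"L1": ("12", "14"), "L2": ("8", "8")}`, the profile probability is 2 × 0.1 × 0.2 × 0.25² = 0.04 × 0.0625 = 0.0025, so the LR is about 400. With 10-15 loci the product becomes very small, which is why typing only a small fraction of the genome can still yield very high evidential weight.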
De-Sign Environment Landscape City: Proceedings
The VI International Conference on Drawing, De_Sign Environment Landscape City_Genoa 2020, deals with: Survey and Representation of Architecture and the Environment; Drawing for the landscape; De-signs for the Project: traces, visions and pre-visions; Margins, signs of memory and the city in progress; Visual culture and communication from idea to project; Architectural landmarks; Color and the environment; Perception and territorial identity; Landscape cultural iconographic heritage: art, literature and design implications; Signs and Drawings for Design and Advanced Representation. Federico Babina, architect and graphic designer, presents ARCHIVISION, and Professor Eduardo Carazo Lefort of the University of Valladolid, recipient of the Targa d’Oro (Gold Plaque) of the Unione Italiana Disegno, delivers the Lectio Magistralis.
Evolutionary Genomics: Statistical and Computational Methods
This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that leads to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistical and computational skills and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.