338 research outputs found

    Design and enhanced evaluation of a robust anaphor resolution algorithm

    Syntactic coindexing restrictions are by now known to be of central importance to practical anaphor resolution approaches. Since, in particular due to structural ambiguity, the assumption of the availability of a unique syntactic reading proves to be unrealistic, robust anaphor resolution relies on techniques to overcome this deficiency. This paper describes the ROSANA approach, which generalizes the verification of coindexing restrictions in order to make it applicable to the deficient syntactic descriptions that are provided by a robust state-of-the-art parser. By a formal evaluation on two corpora that differ with respect to text genre and domain, it is shown that ROSANA achieves high-quality robust coreference resolution. Moreover, by an in-depth analysis, it is proven that the robust implementation of syntactic disjoint reference is nearly optimal. The study reveals that, compared with approaches that rely on shallow preprocessing, the largely nonheuristic disjoint reference algorithmization opens up the possibility for a slight improvement. Furthermore, it is shown that more significant gains are to be expected elsewhere, particularly from a text-genre-specific choice of preference strategies. The performance study of the ROSANA system crucially rests on an enhanced evaluation methodology for coreference resolution systems, the development of which constitutes the second major contribution of the paper. As a supplement to the model-theoretic scoring scheme that was developed for the Message Understanding Conference (MUC) evaluations, additional evaluation measures are defined that, on the one hand, support the developer of anaphor resolution systems, and, on the other hand, shed light on application aspects of pronoun interpretation.

    Relative-fuzzy: a novel approach for handling complex ambiguity for software engineering of data mining models

    Two main classes of uncertainty are commonly defined: fuzziness and ambiguity, where ambiguity is a ‘one-to-many’ relationship between the syntax and semantics of a proposition. This definition appears to ignore the ‘many-to-many’ type of ambiguity. In this thesis, we use the term complex-uncertainty for this many-to-many type of ambiguity. This research proposes a new approach for handling the complex ambiguity type of uncertainty that may exist in data, for software engineering of predictive Data Mining (DM) classification models. The proposed approach is based on Relative-Fuzzy Logic (RFL), a novel type of fuzzy logic. RFL defines a new formulation of the problem of ambiguity type of uncertainty in terms of States Of Proposition (SOP). RFL describes its membership (semantic) value by using the new definition of Domain of Proposition (DOP), which is based on the relativity principle as defined by possible-worlds logic. Proposing RFL requires answering one question: how can these two approaches, fuzzy logic and possible-worlds logic, be combined to produce a new membership value set (and later a logic) that is able to handle fuzziness and multiple viewpoints at the same time? This goal is achieved by giving possible-worlds logic the ability to quantify multiple viewpoints, to model fuzziness in each of these viewpoints, and to express the result in a new set of membership values. Furthermore, a new architecture of Hierarchical Neural Network (HNN) called ML/RFL-Based Net has been developed in this research, along with a new learning algorithm and a new recalling algorithm. The architecture, learning algorithm and recalling algorithm of ML/RFL-Based Net follow the principles of RFL. This new type of HNN is considered to be an RFL computation machine.
The ability of the Relative Fuzzy-based DM prediction model to tackle the problem of complex ambiguity type of uncertainty has been tested. A special-purpose Integrated Development Environment (IDE) called RFL4ASR, which generates a DM prediction model for speech recognition, has also been developed in this research. This special-purpose IDE extends the definition of the traditional IDE. Using multiple sets of TIMIT speech data, the ML/RFL-Based Net prediction model achieves a classification accuracy of 69.2308%. This accuracy is higher than the best results of the WEKA data mining machines given the same speech data.

    Towards a knowledge‑hub destination: analysis and recommendation for implementing TOD for Qatar national library metro station

    During the past two decades, Qatar, a developing country, has invested heavily in infrastructure development to address several challenges caused by rapid urbanization. Qatar has made a significant step toward its urban sustainability vision through the construction of the Doha Metro system. By adopting Transit-Oriented Development (TOD), Qatar is overcoming some urban challenges. TOD promotes compact, walkable, and mixed-use development around the transit nodes, which enhances the public realm by providing pedestrian-oriented and active spaces. Additionally, Qatar aims to transition to a knowledge-based economy by developing an environment that will attract knowledge and creative human power. Qatar Foundation is taking the lead toward implementing Knowledge-Based Urban Development (KBUD) through its flagship project, Education City (EC). This study therefore aims to evaluate the integration of TOD and KBUD strategies to leverage the potential of TOD in attracting knowledge and creative economy industries. The selected case study is the Qatar National Library (QNL) metro station at the EC in Doha. The study examines the potential of QNL as a destination TOD to enhance the area’s mission as a driver for a knowledge-based economy. The methodological approach is based on the analytical concepts obtained from the Integrated Modification Methodology as a sustainable urban design process. The study’s results revealed that void and function, followed by volume, are the weakest layers of the study area’s Complex Adaptive System, and that they require morphological modification to achieve sustainability and a knowledge-hub TOD. The study offers recommendations to assist planners and designers in making better decisions toward regenerating urban areas through a knowledge-hub TOD, contributing to the spill out of knowledge and creativity into the public realm and creating a human-centric, vibrant public space adjacent to metro stations.

    Learning Efficient Disambiguation

    This dissertation analyses the computational properties of current performance-models of natural language parsing, in particular Data Oriented Parsing (DOP), points out some of their major shortcomings and suggests suitable solutions. It provides proofs that various problems of probabilistic disambiguation are NP-Complete under instances of these performance-models, and it argues that none of these models accounts for attractive efficiency properties of human language processing in limited domains, e.g. that frequent inputs are usually processed faster than infrequent ones. The central hypothesis of this dissertation is that these shortcomings can be eliminated by specializing the performance-models to the limited domains. The dissertation addresses "grammar and model specialization" and presents a new framework, the Ambiguity-Reduction Specialization (ARS) framework, that formulates the necessary and sufficient conditions for successful specialization. The framework is instantiated into specialization algorithms and applied to specializing DOP. The novelties of these learning algorithms are that 1) they limit the hypotheses-space to include only "safe" models, 2) they are expressed as constrained optimization formulae that minimize the entropy of the training tree-bank given the specialized grammar, under the constraint that the size of the specialized model does not exceed a predefined maximum, and 3) they enable integrating the specialized model with the original one in a complementary manner. The dissertation provides experiments with initial implementations and compares the resulting Specialized DOP (SDOP) models to the original DOP models with encouraging results. Comment: 222 pages

    A comparative study of different gradient approximations for Restricted Boltzmann Machines

    This project is a theoretical study of Restricted Boltzmann Machines (RBMs) that focuses on gradient approximations for RBMs. RBMs suffer from the difficulty of learning accurately with the exact gradient. CD-k, an efficient algorithm for approximating the gradient based on Contrastive Divergence (CD) and Markov Chain Monte Carlo (MCMC), was proposed and has become the mainstream way to train RBMs. To improve the efficiency of the algorithm and mitigate the bias in the approximation, many CD-related algorithms have since emerged, such as Persistent Contrastive Divergence (PCD) and Weighted Contrastive Divergence (WCD). This project presents a comprehensive comparison of gradient approximation algorithms, mainly CD, PCD and WCD. The experimental results indicate that, among all the algorithms tested, WCD has the fastest and best convergence for parameter learning. Increasing the number of Gibbs sampling steps and adding a persistent chain to the CD-related algorithms can enhance performance and alleviate the bias in the approximation, and Parallel Tempering can further improve the results. Moreover, the cosine similarity between the approximate gradients and the exact gradients is studied, and the results show that the CD series of algorithms and the WCD series of algorithms are heterogeneous. The general conclusions of this project can serve as a reference when training RBMs.
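
    The CD-k procedure mentioned in this abstract can be illustrated with a minimal sketch. It assumes a binary RBM with weight matrix W, visible biases b and hidden biases c; the function names (sample_hidden, sample_visible, cd_k_gradient) are hypothetical and are not taken from the project itself. It shows only the textbook form of the estimate: run k steps of Gibbs sampling from a data vector and take the difference between data-driven and sample-driven statistics.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, W, c, rng):
    """Return P(h_j = 1 | v) for every hidden unit and a binary sample."""
    probs = [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(c))]
    return probs, [1.0 if rng.random() < p else 0.0 for p in probs]


def sample_visible(h, W, b, rng):
    """Return P(v_i = 1 | h) for every visible unit and a binary sample."""
    probs = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b))]
    return probs, [1.0 if rng.random() < p else 0.0 for p in probs]


def cd_k_gradient(v0, W, b, c, k=1, rng=None):
    """Approximate the log-likelihood gradient for one data vector v0.

    Illustrative assumption: plain CD-k, with the Gibbs chain started
    at the data vector and run for k steps.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = rng or random.Random(0)
    # Positive phase: hidden probabilities driven by the data.
    ph0, h = sample_hidden(v0, W, c, rng)
    vk = v0
    # Negative phase: k alternating Gibbs steps.
    for _ in range(k):
        _, vk = sample_visible(h, W, b, rng)
        phk, h = sample_hidden(vk, W, c, rng)
    # Gradient estimate: data statistics minus chain statistics.
    dW = [[v0[i] * ph0[j] - vk[i] * phk[j] for j in range(len(c))]
          for i in range(len(b))]
    db = [v0[i] - vk[i] for i in range(len(b))]
    dc = [ph0[j] - phk[j] for j in range(len(c))]
    return dW, db, dc
```

    A persistent variant in the spirit of PCD would, under the same assumptions, start each chain from the previous chain state rather than from the data vector; that variant is not shown here.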

    Robust and flexible multi-scale medial axis computation

    The principle of the multi-scale medial axis (MMA) is important in that any object is detected at a blurring scale proportional to the size of the object. Thus it provides a sound balance between noise removal and preserving detail. The robustness of the MMA has been reflected in many existing applications in object segmentation, recognition, description and registration. This thesis aims to improve the computational aspects of the MMA. The MMA is obtained by computing ridges in a “medialness” scale-space derived from an image. In computing the medialness scale-space, we propose an edge-free medialness algorithm, the Concordance-based Medial Axis Transform (CMAT). It not only depends on the symmetry of the positions of boundaries, but also is related to the symmetry of the intensity contrasts at boundaries. Therefore it excludes spurious MMA branches arising from isolated boundaries. In addition, the localisation accuracy for the position and width of an object, as well as the robustness under noisy conditions, is preserved in the CMAT. In computing ridges in the medialness space, we propose the sliding window algorithm for extracting locally optimal scale ridges. It is simple and efficient in that it can readily separate the scale dimension from the search space but avoids the difficult task of constructing surfaces of connected maxima. It can extract a complete set of MMA for interfering objects in scale-space, e.g. embedded or adjacent objects. These algorithms are evaluated using a quantitative study of their performance for 1-D signals and qualitative testing on 2-D images

    ATHENA Research Book

    The ATHENA European University is an alliance of nine Higher Education Institutions with the mission of fostering excellence in research and innovation by facilitating international cooperation. The ATHENA acronym stands for Advanced Technologies in Higher Education Alliance. The partner institutions are from France, Germany, Greece, Italy, Lithuania, Portugal, and Slovenia: the University of Orléans, the University of Siegen, the Hellenic Mediterranean University, the Niccolò Cusano University, the Vilnius Gediminas Technical University, the Polytechnic Institute of Porto, and the University of Maribor. In 2022 institutions from Poland and Spain joined the alliance: the Maria Curie-Skłodowska University and the University of Vigo. This research book presents a selection of the ATHENA university partners' research activities. It incorporates peer-reviewed original articles, reprints and student contributions. The ATHENA Research Book provides a platform that promotes joint and interdisciplinary research projects of both advanced and early-career researchers.

    State-of-the-art generalisation research in NLP: a taxonomy and review

    The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is not well understood, nor are there any common standards to evaluate it. In this paper, we aim to lay the groundwork to improve both of these issues. We present a taxonomy for characterising and understanding generalisation research in NLP, we use that taxonomy to present a comprehensive map of published generalisation studies, and we make recommendations for which areas might deserve attention in the future. Our taxonomy is based on an extensive literature review of generalisation research, and contains five axes along which studies can differ: their main motivation, the type of generalisation they aim to solve, the type of data shift they consider, the source by which this data shift is obtained, and the locus of the shift within the modelling pipeline. We use our taxonomy to classify over 400 previous papers that test generalisation, for a total of more than 600 individual experiments. Considering the results of this review, we present an in-depth analysis of the current state of generalisation research in NLP, and make recommendations for the future. Along with this paper, we release a webpage where the results of our review can be dynamically explored, and which we intend to update as new NLP generalisation studies are published. With this work, we aim to make steps towards making state-of-the-art generalisation testing the new status quo in NLP. Comment: 35 pages of content + 53 pages of references