17 research outputs found

    Automatic Lemmatizer Construction with Focus on OOV Words Lemmatization

    On Heads and Coordination in Valence Acquisition

    Abstract. The aim of this paper is to present the design of a partial syntactic annotation of the IPI PAN Corpus of Polish [22] and the corresponding extension of the corpus search engine Poliqarp [25,12], developed at the Institute of Computer Science PAS and currently employed in Polish and Portuguese corpus projects. In particular, we will argue for the need to distinguish between, and represent both, syntactic and semantic heads, and we will sketch the representation of coordination, an area traditionally controversial in both theoretical and computational linguistics. The annotation is designed to maximise the usefulness of the resulting corpus for the task of automatic valence acquisition.

    The Chalcidoidea bush of life: evolutionary history of a massive radiation of minute wasps.

    Chalcidoidea are mostly parasitoid wasps that include as many as 500 000 estimated species. Capturing phylogenetic signal from such a massive radiation can be daunting, and Chalcidoidea is an excellent example of a hyperdiverse group that has remained recalcitrant to phylogenetic resolution. We combined 1007 exons obtained with Anchored Hybrid Enrichment with 1048 ultra-conserved elements (UCEs) for 433 taxa, including all extant families, >95% of all subfamilies, and 356 genera chosen to represent the vast diversity of the superfamily. Going back and forth between the molecular results and our collective knowledge of morphology and biology, we detected bias in the analyses that was driven by the saturation of nucleotide data. Our final results are based on a concatenated analysis of the least saturated exons and UCE datasets (2054 loci, 284 106 sites). Our analyses support an expected sister relationship with Mymarommatoidea. Seven previously recognized families were not monophyletic, so support for a new classification is discussed. In some cases, natural history appears to be more informative than morphology, as illustrated by the elucidation of a clade of plant gall associates and a clade of taxa with planidial first-instar larvae. The phylogeny suggests a transition from smaller soft-bodied wasps to larger and more heavily sclerotized wasps, with egg parasitism as potentially ancestral for the entire superfamily. Deep divergences in Chalcidoidea coincide with an increase in insect families in the fossil record, and an early shift to phytophagy corresponds with the beginning of the "Angiosperm Terrestrial Revolution". Our dating analyses suggest a middle Jurassic origin of 174 Ma (167.3-180.5 Ma) and a crown age of 162.2 Ma (153.9-169.8 Ma) for Chalcidoidea. During the Cretaceous, Chalcidoidea may have undergone a rapid radiation in southern Gondwana with subsequent dispersals to the Northern Hemisphere. This scenario is discussed with regard to knowledge about the host taxa of chalcid wasps, their fossil record and Earth's palaeogeographic history.

    Car-Sharing between Two Locations: Online Scheduling with Flexible Advance Bookings

    We study an online scheduling problem motivated by applications such as car-sharing. Users submit ride requests, and the scheduler aims to accept requests of maximum total profit using a single server (car). Each ride request specifies the pick-up time and the pick-up location (one of two locations, with the other location being the destination). The scheduler has to decide whether or not to accept a request immediately at the time the request is submitted (booking time). We consider two variants of the problem with respect to constraints on the booking time: in the fixed booking time variant, a request must be submitted a fixed amount of time before the pick-up time; in the variable booking time variant, a request can be submitted at any time during a certain time interval that precedes the pick-up time. We present lower bounds on the competitive ratio for both variants and propose a greedy algorithm that achieves the best possible competitive ratio.
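    The abstract does not spell out the greedy rule, so the following is only a minimal sketch under stated assumptions: a fixed travel time between the two locations, a car that may drive empty to reposition, and equal profit for every request. Under these assumptions it accepts each request on arrival if the single car can still serve every accepted ride. The names `Request`, `feasible` and `greedy_schedule` are hypothetical, and this is not necessarily the greedy algorithm of the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    booking_time: float  # when the request is submitted
    pickup_time: float   # when the ride must start
    pickup_loc: int      # 0 or 1; the destination is the other location


def feasible(rides: List[Request], travel_time: float, start_loc: int = 0) -> bool:
    """Return True if one car, starting at start_loc at time 0, can serve all rides.

    Assumption: every trip (loaded or empty) between the two locations
    takes travel_time, and the car may reposition empty at any time.
    """
    loc, free_at = start_loc, 0.0
    for r in sorted(rides, key=lambda r: r.pickup_time):
        if loc != r.pickup_loc:
            # An empty move is needed to reach the pick-up location in time.
            if free_at + travel_time > r.pickup_time:
                return False
        elif free_at > r.pickup_time:
            # Car is at the right location but still busy.
            return False
        # Serve the ride: the car ends at the other location.
        loc, free_at = 1 - r.pickup_loc, r.pickup_time + travel_time
    return True


def greedy_schedule(requests: List[Request], travel_time: float) -> List[Request]:
    """Process requests in booking order; accept each one that keeps the schedule feasible."""
    accepted: List[Request] = []
    for r in sorted(requests, key=lambda r: r.booking_time):
        if feasible(accepted + [r], travel_time):
            accepted.append(r)  # decision is immediate and irrevocable
    return accepted
```

    For example, with `travel_time=1`, a request booked at time 0 for pick-up at location 0 at time 2 is accepted, and a later request for pick-up at location 1 at time 2.5 is rejected, because the car only arrives there at time 3.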

    Corrective Dependency Parsing

    Abstract. We present a discriminative model for correcting errors in automatically generated dependency trees. We show that by focusing on “structurally local” errors, we can improve the overall quality of the dependency structure. Defining the task by way of a locality constraint allows us to search over a large set of alternative dependency trees simply by making small perturbations to individual dependency edges. The technique requires no additional training data, as it uses the original training data and parser to generate a set of parses from which the training examples are derived. We present experimental results on a Czech corpus using four different parsers, both projective and non-projective, showing the robustness of the technique.
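    To make "small perturbations to individual dependency edges" concrete, here is a minimal sketch that enumerates trees differing from a given tree only in the head of one token. Trees are encoded as head arrays, with index 0 as an artificial root. The neighbourhood used (the token's grandparent and the other dependents of its current head) is an assumption chosen for illustration, not necessarily the candidate set used in the paper, and `is_tree` and `local_alternatives` are hypothetical names.

```python
from typing import Iterator, List, Optional

Heads = List[Optional[int]]  # heads[i] is the head of token i; heads[0] is unused


def is_tree(heads: Heads) -> bool:
    """Check that following heads from every token reaches the root (0) without a cycle."""
    for i in range(1, len(heads)):
        seen = set()
        node = i
        while node != 0:
            if node in seen:
                return False  # cycle detected
            seen.add(node)
            node = heads[node]
    return True


def local_alternatives(heads: Heads, i: int) -> Iterator[Heads]:
    """Yield valid trees obtained by reattaching token i within an assumed local neighbourhood."""
    h = heads[i]
    candidates = set()
    if h != 0:
        candidates.add(heads[h])  # grandparent (may be the root, 0)
    # Siblings: other tokens attached to the same head.
    candidates.update(j for j in range(1, len(heads)) if heads[j] == h and j != i)
    for c in sorted(candidates):
        new = list(heads)
        new[i] = c  # perturb a single dependency edge
        if is_tree(new):
            yield new
```

    For the tree `[None, 2, 0, 2, 3]` (token 2 attached to the root, tokens 1 and 3 to token 2, token 4 to token 3), reattaching token 4 yields only `[None, 2, 0, 2, 2]`, moving it up to its grandparent.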

    A Manual for Tectogrammatical Tagging of the Prague Dependency Treebank

    Introduction: Three layers of tagging of the Prague Dependency Treebank. This manual is intended to introduce the reader to the practice of syntactic tagging in the framework of the Prague Dependency Treebank (henceforth PDT). After a brief Introduction, a list of the symbols used is given (Sect. 1), followed by a description of the automatic procedure dealing with grammatemes (Sect. 2.1) and by instructions covering further transducing (non-automatic, for the time being) of morphemic and analytic data t…