78 research outputs found

    Compositional Morphology for Word Representations and Language Modelling

    This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model. Our approach is evaluated in the context of log-bilinear language models, rendered suitably efficient for implementation inside a machine translation decoder by factoring the vocabulary. We perform both intrinsic and extrinsic evaluations, presenting results on a range of languages which demonstrate that our model learns morphological representations that both perform well on word similarity tasks and lead to substantial reductions in perplexity. When used for translation into morphologically rich languages with large vocabularies, our models obtain improvements of up to 1.2 BLEU points relative to a baseline system using back-off n-gram models.
    Comment: Proceedings of the 31st International Conference on Machine Learning (ICML)

    Probabilistic Modelling of Morphologically Rich Languages

    This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often rely on the simplistic assumption that words are opaque symbols. This assumption does not fit morphologically complex languages well, where words can have rich internal structure and sub-word elements are shared across distinct word forms. Our approach is to encode basic notions of morphology into the assumptions of three different types of language models, with the intention that leveraging shared sub-word structure can improve model performance and help overcome data sparsity that arises from morphological processes. In the context of n-gram language modelling, we formulate a new Bayesian model that relies on the decomposition of compound words to attain better smoothing, and we develop a new distributed language model that learns vector representations of morphemes and leverages them to link together morphologically related words. In both cases, we show that accounting for word sub-structure improves the models' intrinsic performance and provides benefits when applied to other tasks, including machine translation. We then shift the focus beyond the modelling of word sequences and consider models that automatically learn what the sub-word elements of a given language are, given an unannotated list of words. We formulate a novel model that can learn discontiguous morphemes in addition to the more conventional contiguous morphemes that most previous models are limited to. This approach is demonstrated on Semitic languages, and we find that modelling discontiguous sub-word structures leads to improvements in the task of segmenting words into their contiguous morphemes.
    Comment: DPhil thesis, University of Oxford, submitted and accepted 2014. http://ora.ox.ac.uk/objects/uuid:8df7324f-d3b8-47a1-8b0b-3a6feb5f45c

    Suicides on Commuter Rail in California: Possible Patterns — A Case Study, Research Report 10-05

    Suicides on rail systems constitute a significant social concern. Reports in local media, whether in newspapers, television, or radio, have brought awareness to this very sensitive and personal subject. This is also true for the San Francisco Bay Area. These events also cause severe trauma for the train operators and staff of the system as well as disruption and cost to society. The overall objective of this project was to conduct a pilot study to identify possible patterns in suicides associated with urban commuter rail systems in California. The Caltrain commuter rail system in the San Francisco Bay Area was used as the subject system for the pilot study. The primary intent of the data analysis was to determine whether suicides along the Caltrain tracks exhibited patterns. Pattern detection in this study was conducted primarily on the basis of time and location. Because the data were readily available, gender was also included in the analysis, although it is not a factor connected to the rail system. It was concluded that the data did show some patterns for suicides with respect to time and location. Some of the patterns can be explained, while the reasons for others are not immediately obvious. However, the patterns in the latter category did not indicate a particularly attractive location or possible source for suicides.

    Natural Language Processing with Small Feed-Forward Networks

    We show that small and shallow feed-forward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models. Motivated by resource-constrained environments like mobile phones, we showcase simple techniques for obtaining such small neural network models, and investigate different tradeoffs when deciding how to allocate a small memory budget.
    Comment: EMNLP 2017 short paper

    Scrotal tick damage as a cause of infertility in communal bulls in Moretele, South Africa

    Calving rate in communal cattle influences both food security and socio-economics in rural households. A previous study indicated that scrotal damage caused by ticks could affect the fertility of communal bulls and reduce the annual calving rate. The objectives of the study were to investigate the annual calving rate in communal herds by counting calves during herd visits, perform breeding soundness examinations on bulls and identify adult ticks attached to their genitalia. This prospective longitudinal survey was based on participatory rural appraisal. Calving rates were estimated in cows (n = 2398) from 100 randomly selected communal herds in Moretele over 12 months in 2013, during routine visits by animal health technicians. Randomly selected bulls (n = 50) from these herds were tested for Brucella abortus, Trichomonas foetus and Campylobacter fetus subspecies venerealis. The calving rate was 35.86% (0.359). The mean scrotal circumference was 37.63 ± 3.42 cm. Total sperm motility was 78.73 ± 35.73%; progressive sperm motility was 27.39 ± 15.81% and non-progressive sperm motility was 51.34 ± 19.92%. Thirty-five of the 38 bulls examined for breeding soundness exhibited severe scrotal and preputial lesions caused by the adult ticks Amblyomma hebraeum and Hyalomma rufipes. Tick control methods used included spraying (n = 20), pour-on (n = 11), no control (n = 1) and various (n = 18). It was concluded that in Moretele genital tick damage had a more serious impact on the fertility of communal bulls than contagious diseases. Targeted acaricidal spot treatment of the genitalia of communal bulls to prevent infestation is recommended, as tick control strategies used by farmers appeared to be inadequate.
    The University of Hohenheim in Germany and the National Research Foundation. http://www.jsava.co.za/index.php/jsava pm2020 Animal and Wildlife Sciences; Production Animal Studies

    Antimicrobial susceptibility of gram-negative pathogens isolated from patients with complicated intra-abdominal infections in South African hospitals (SMART Study 2004-2009): impact of the new carbapenem breakpoints

    BACKGROUND: The Study for Monitoring Antimicrobial Resistance Trends (SMART) follows trends in resistance among aerobic and facultative anaerobic gram-negative bacilli (GNB) isolated from complicated intra-abdominal infections (cIAIs) in patients around the world. METHODS: During 2004–2009, three centralized clinical microbiology laboratories serving 59 private hospitals in three large South African cities collected 1,218 GNB from cIAIs and tested them for susceptibility to 12 antibiotics according to the 2011 Clinical and Laboratory Standards Institute (CLSI) guidelines. RESULTS: Enterobacteriaceae comprised 83.7% of the isolates. Escherichia coli was the species isolated most commonly (46.4%), and 7.6% of these were extended-spectrum β-lactamase (ESBL)-positive. The highest ESBL rate was documented for Klebsiella pneumoniae (41.2%). Overall, ertapenem was the antibiotic most active against susceptible species for which it has breakpoints (94.6%) followed by amikacin (91.9%), piperacillin-tazobactam (89.3%), and imipenem-cilastatin (87.1%), whereas rates of resistance to ceftriaxone, cefotaxime, ciprofloxacin, and levofloxacin were documented to be 29.7%, 28.7%, 22.5%, and 21.1%, respectively. Multi-drug resistance (MDR), defined as resistance to three or more antibiotic classes, was significantly more common in K. pneumoniae (27.9%) than in E. coli (4.9%; p < 0.0001) or Proteus mirabilis (4.1%; p < 0.05). Applying the new CLSI breakpoints for carbapenems, susceptibility to ertapenem was reduced significantly in ESBL-positive E. coli compared with ESBL-negative isolates (91% vs. 98%; p < 0.05), but this did not apply to imipenem-cilastatin (95% vs. 99%; p = 0.0928). A large disparity between imipenem-cilastatin and ertapenem susceptibility in P. mirabilis and Morganella morganii was documented (24% vs. 96% and 15% vs. 92%, respectively), as most isolates of these two species had imipenem-cilastatin minimum inhibitory concentrations in the 2–4 mcg/mL range, which is no longer regarded as susceptible. CONCLUSIONS: This study documented substantial resistance to standard antimicrobial therapy among GNB commonly isolated from cIAIs in South Africa. With the application of the new CLSI carbapenem breakpoints, discrepancies were noted between ertapenem and imipenem-cilastatin with regard to the changes in their individual susceptibilities. Longitudinal surveillance of susceptibility patterns is useful to guide recommendations for empiric antibiotic use in cIAIs.
    Merck & Co., Inc. http://www.liebertpub.com/overview/surgical-infections/53/ am2013 ay201

    Downregulation of pyrophosphate: d-fructose-6-phosphate 1-phosphotransferase activity in sugarcane culms enhances sucrose accumulation due to elevated hexose-phosphate levels

    Transgenic sugarcane clones with 45–95% reduced cytosolic pyrophosphate: d-fructose-6-phosphate 1-phosphotransferase (PFP, EC 2.7.1.90) activity displayed no visible phenotypic change, but significant changes were evident in in vivo metabolite levels and fluxes during internode development. In three independent transgenic lines, sucrose concentrations increased between three- and sixfold in immature internodes, compared to the levels in the wildtype control. There was an eightfold increase in the hexose-phosphate:triose-phosphate ratio in immature internodes, a significant restriction in the triose phosphate to hexose phosphate cycle and a significant increase in sucrose cycling as monitored by 13C nuclear magnetic resonance. This suggests that an increase in the hexose-phosphate concentrations, resulting from a restriction in the conversion of hexose phosphates to triose phosphates, drives sucrose synthesis in the young internodes. These effects became less pronounced as the tissue matured. Decreased expression of PFP also resulted in an increase of the ATP/ADP and UTP/UDP ratios, and an increase of the total uridine nucleotide and, at a later stage, the total adenine nucleotide pool, revealing strong interactions between PPi metabolism and general energy metabolism. Finally, decreased PFP leads to a reduction of PPi levels in older internodes, indicating that in these developmental stages PFP acts in the gluconeogenic direction. The lowered PPi levels might also contribute to the absence of increases in sucrose contents in the more mature tissues of transgenic sugarcane with reduced PFP activity.

    Understanding continent-wide variation in vulture ranging behavior to assess feasibility of Vulture Safe Zones in Africa: Challenges and possibilities

    Protected areas are intended as tools in reducing threats to wildlife and preserving habitat for their long-term population persistence. Studies on ranging behavior provide insight into the utility of protected areas. Vultures are one of the fastest declining groups of birds globally and are popular subjects for telemetry studies, but continent-wide studies are lacking. To address how vultures use space and identify the areas and location of possible vulture safe zones, we assess home range size and their overlap with protected areas by species, age, breeding status, season, and region using a large continent-wide telemetry dataset that includes 163 individuals of three species of threatened Gyps vulture. Immature vultures of all three species had larger home ranges and used a greater area outside of protected areas than breeding and non-breeding adults. Cape vultures had the smallest home range sizes and the lowest level of overlap with protected areas. Rüppell's vultures had larger home range sizes in the wet season, when poisoning may increase due to human-carnivore conflict. Overall, our study suggests challenges for the creation of Vulture Safe Zones to protect African vultures. At a minimum, areas of 24,000 km² would be needed to protect the entire range of an adult African White-backed vulture and areas of more than 75,000 km² for wider-ranging Rüppell's vultures. Vulture Safe Zones in Africa would generally need to be larger than existing protected areas, which would require widespread conservation activities outside of protected areas to be successful.