On the naturalness of software
Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of statistical approaches to speech recognition, natural language translation, question-answering, and text mining and comprehension.
We begin with the conjecture that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations, and thus, like natural language, it is also likely to be repetitive and predictable. We then proceed to ask whether (a) code can be usefully modeled by statistical language models and (b) such models can be leveraged to support software engineers. Using the widely adopted n-gram model, we provide empirical evidence supportive of a positive answer to both these questions. We show that code is also very regular, and, in fact, even more so than natural languages. As an example use of the model, we have developed a simple code completion engine for Java that, despite its simplicity, already improves Eclipse's completion capability. We conclude the paper by laying out a vision for future research in this area.
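As a rough illustration of the n-gram idea described in this abstract, the following minimal sketch trains a bigram model on code tokens, scores a token sequence by cross-entropy, and suggests likely next tokens. It is not the authors' implementation: the tokenizer, the add-one smoothing, the class name `BigramModel`, and the tiny training snippet are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(code):
    """Split source text into identifiers, numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing (an assumed choice)."""

    def __init__(self, tokens):
        self.vocab = set(tokens)
        # How often each token appears as a context (i.e. has a successor).
        self.context_counts = Counter(tokens[:-1])
        # following[prev][nxt] = number of times nxt directly followed prev.
        self.following = defaultdict(Counter)
        for prev, nxt in zip(tokens, tokens[1:]):
            self.following[prev][nxt] += 1

    def prob(self, prev, nxt):
        # The extra +1 in the denominator reserves mass for unseen tokens.
        v = len(self.vocab) + 1
        return (self.following[prev][nxt] + 1) / (self.context_counts[prev] + v)

    def cross_entropy(self, tokens):
        """Average negative log2 probability per predicted token (bits per token)."""
        pairs = list(zip(tokens, tokens[1:]))
        return -sum(math.log2(self.prob(p, n)) for p, n in pairs) / len(pairs)

    def suggest(self, prev, k=3):
        """Return up to k tokens most often observed after `prev`."""
        return [tok for tok, _ in self.following[prev].most_common(k)]


# Hypothetical training text: two near-identical Java-style loops.
train = tokenize(
    "for (int i = 0; i < n; i++) { sum += a[i]; } "
    "for (int j = 0; j < m; j++) { total += b[j]; }"
)
model = BigramModel(train)

print(model.suggest("for"))
print(round(model.cross_entropy(train), 3))
```

Lower cross-entropy means the model finds the text more predictable, which is the sense in which the abstract calls code "regular". A completion engine in this spirit would rank candidate tokens using `suggest` or `prob`.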
Cumulative weather effects can impact across the whole life cycle
Predicting how species will be affected by future climatic change requires the underlying environmental drivers to be identified. As vital rates vary over the lifecycle, structured population models derived from statistical environment-demography relationships are often used to inform such predictions. Environmental drivers are typically identified independently for different vital rates and demographic classes. However, these rates often exhibit positive temporal covariance, suggesting the vital rates respond to common environmental drivers. Additionally, models often only incorporate average weather conditions during a single, a priori chosen time window (e.g. monthly means). Mismatches between these windows and the period when the vital rates are sensitive to variation in climate decrease the predictive performance of such approaches. We used a demographic structural equation model (SEM) to demonstrate that a single axis of environmental variation drives the majority of the (co)variation in survival, reproduction, and twinning across six age-sex classes in a Soay sheep population. This axis provides a simple target for the complex task of identifying the drivers of vital rate variation. We used functional linear models (FLMs) to determine the critical windows of three local climatic drivers, allowing the magnitude and direction of the climate effects to differ over time. Previously unidentified lagged climatic effects were detected in this well-studied population. The FLMs had a better predictive performance than selecting a critical window a priori, but not than a large-scale climate index. Positive covariance amongst vital rates and temporal variation in the effects of environmental drivers are common, suggesting our SEM-FLM approach is a widely applicable tool for exploring the joint responses of vital rates to environmental change.
Genetic diversity and population structure of Angiostrongylus vasorum parasites within and between local urban foxes (Vulpes vulpes)
Angiostrongylus vasorum is a nematode parasite of the pulmonary arteries and heart that infects domestic and wild canids. Dogs (Canis familiaris) and red foxes (Vulpes vulpes) are the most commonly affected definitive hosts. Recent studies suggest that angiostrongylosis is an emerging disease, and that red foxes may play an important role in the epidemiology of the parasite. Genetic analyses of parasites collected from dogs and foxes throughout Europe have shown that the same parasite haplotypes are commonly shared between different host species. However, the extent of genetic diversity within local A. vasorum populations and individual hosts is unknown. The objective of the present study was to assess the occurrence of genetic diversity among A. vasorum (a) recovered from different foxes within the Greater London area (a localised population, single worm per fox dataset); and (b) hosted within single foxes (multiple worms per fox dataset). During 2016, A. vasorum worms were collected from foxes culled for other purposes in London. DNA was extracted from each parasite and a partial fragment of the mitochondrial cytochrome oxidase subunit 1 (mtCOI) gene was amplified and sequenced. Sequences from the single worm dataset were compared with those published elsewhere. Combined, 19 haplotypes were described of which 15 were identified from foxes found in London, indicating that considerable genetic diversity can be detected within a local geographic area. Analysis of the multiple worm dataset identified 22 haplotypes defining worms recovered from just six foxes, emphasising the relevance of wild canines as reservoirs of genetic diversity. This is the first study to explore the genetic complexity of individual fox-hosted A. vasorum populations.
Exploring population responses to environmental change when there is never enough data: a factor analytic approach
© 2018 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society. Temporal variability in the environment drives variation in vital rates, with consequences for population dynamics and life-history evolution. Integral projection models (IPMs) are data-driven structured population models widely used to study population dynamics and life-history evolution in temporally variable environments. However, many datasets have insufficient temporal replication for the environmental drivers of vital rates to be identified with confidence, limiting their use for evaluating population level responses to environmental change. Parameter selection, where the kernel is constructed at each time step by randomly selecting the time-varying parameters from their joint probability distribution, is one approach to including stochasticity in IPMs. We consider a factor analytic (FA) approach for modelling the covariance matrix of time-varying parameters, whereby latent variable(s) describe the covariance among vital rate parameters. This decreases the number of parameters to estimate and, where the covariance is positive, the latent variable can be interpreted as a measure of environmental quality. We demonstrate this using simulation studies and two case studies. The simulation studies suggest the FA approach provides similarly accurate estimates of stochastic population growth rate to estimating an unstructured covariance matrix. We demonstrate how the latent parameter can be perturbed to show how selection on reproductive delays in the monocarp Carduus nutans changes under different environmental conditions. We develop a demographic model of the fire dependent herb Eryngium cuneifolium to show how a putative driver of the variation in environmental quality can be incorporated with the addition of a single parameter. Using perturbation analyses we determine optimal management strategies for this species.
This approach estimates fewer parameters than previous approaches and allows novel eco-evolutionary insights. Predictions on population dynamics and life-history evolution under different environmental conditions can be made without necessarily identifying causal factors. Putative environmental drivers can be incorporated with relatively few parameters, allowing for predictions on how populations will respond to changes in the environment.
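The parameter saving mentioned in this abstract can be sketched with a one-factor covariance structure. The sketch below assumes the standard factor-analytic form (loadings times loadings transposed, plus parameter-specific variances on the diagonal), which may differ from the exact parameterisation used in the paper. All function names and numeric values are hypothetical.

```python
import math
import random


def one_factor_covariance(loadings, specific_vars):
    """Covariance implied by a single latent variable:
    loadings[i] * loadings[j], plus specific_vars[i] on the diagonal."""
    p = len(loadings)
    return [
        [loadings[i] * loadings[j] + (specific_vars[i] if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]


def unstructured_param_count(p):
    """Free parameters in an unstructured p x p covariance matrix."""
    return p * (p + 1) // 2


def one_factor_param_count(p):
    """Free parameters with one factor: p loadings plus p specific variances."""
    return 2 * p


def draw_parameters(means, loadings, specific_vars, rng):
    """Draw one year's vital rate parameters.
    The shared latent draw plays the role of 'environmental quality'."""
    quality = rng.gauss(0.0, 1.0)
    return [
        m + l * quality + rng.gauss(0.0, math.sqrt(s))
        for m, l, s in zip(means, loadings, specific_vars)
    ]


# Hypothetical values for three time-varying parameters.
means = [0.8, 1.5, -0.3]
loadings = [0.5, 0.4, 0.3]        # all positive: good years raise every rate
specific_vars = [0.1, 0.2, 0.05]

cov = one_factor_covariance(loadings, specific_vars)
rng = random.Random(1)
year = draw_parameters(means, loadings, specific_vars, rng)

print(unstructured_param_count(6), one_factor_param_count(6))
```

With six time-varying parameters, the unstructured matrix needs 21 covariance parameters while this one-factor form needs 12, and the gap widens as more parameters are added. Shifting `quality` by a fixed amount before drawing is one way to mimic the perturbation of the latent variable that the abstract describes.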
AIGO: towards a unified framework for the analysis and the inter-comparison of GO functional annotations
BACKGROUND: In response to the rapid growth of available genome sequences, efforts have been made to develop automatic inference methods to functionally characterize them. Pipelines that infer functional annotation are now routinely used to produce new annotations at a genome scale and for a broad variety of species. These pipelines differ widely in their inference algorithms, confidence thresholds and data sources for reasoning. This heterogeneity makes a comparison of the relative merits of each approach extremely complex. The evaluation of the quality of the resultant annotations is also challenging given there is often no existing gold-standard against which to evaluate precision and recall. RESULTS: In this paper, we present a pragmatic approach to the study of functional annotations. An ensemble of 12 metrics, describing various aspects of functional annotations, is defined and implemented in a unified framework, which facilitates their systematic analysis and inter-comparison. The use of this framework is demonstrated on three illustrative examples: analysing the outputs of state-of-the-art inference pipelines, comparing electronic versus manual annotation methods, and monitoring the evolution of publicly available functional annotations. The framework is part of the AIGO library (http://code.google.com/p/aigo) for the Analysis and the Inter-comparison of the products of Gene Ontology (GO) annotation pipelines. The AIGO library also provides functionalities to easily load, analyse, manipulate and compare functional annotations and also to plot and export the results of the analysis in various formats. CONCLUSIONS: This work is a step toward developing a unified framework for the systematic study of GO functional annotations. This framework has been designed so that new metrics on GO functional annotations can be added in a very straightforward way.
Atherogenic Lipid Stress Induces Platelet Hyperactivity Through CD36-Mediated Hyposensitivity To Prostacyclin: The Role Of Phosphodiesterase 3A
Prostacyclin (PGI2) controls platelet activation and thrombosis through a cyclic adenosine monophosphate (cAMP) signalling cascade. However, in patients with cardiovascular diseases this protective mechanism fails for reasons that are unclear. Using both pharmacological and genetic approaches we describe a mechanism by which oxidised low density lipoproteins (oxLDL) associated with dyslipidaemia promote platelet activation through impaired PGI2 sensitivity and diminished cAMP signalling. In functional assays using human platelets, oxLDL modulated the inhibitory effects of PGI2, but not a PDE-insensitive cAMP analogue, on platelet aggregation, granule secretion and in vitro thrombosis. Examination of the mechanism revealed that oxLDL promoted the hydrolysis of cAMP through the phosphorylation and activation of phosphodiesterase 3A (PDE3A), leading to diminished cAMP signalling. PDE3A activation by oxLDL required Src family kinases, Syk and protein kinase C. The effects of oxLDL on platelet function and cAMP signalling were blocked by pharmacological inhibition of CD36, mimicked by CD36-specific oxidised phospholipids and ablated in CD36-/- murine platelets. The injection of oxLDL into wild-type mice strongly promoted FeCl3-induced carotid thrombosis in vivo, which was prevented by pharmacological inhibition of PDE3A. Furthermore, blood from dyslipidaemic mice was associated with increased oxidative lipid stress, reduced platelet sensitivity to PGI2 ex vivo and diminished PKA signalling. In contrast, platelet sensitivity to a PDE-resistant cAMP analogue remained normal. Genetic deletion of CD36 protected dyslipidaemic animals from PGI2 hyposensitivity and restored PKA signalling. These data suggest that CD36 can translate atherogenic lipid stress into platelet hyperactivity through modulation of inhibitory cAMP signalling.
Broadband terahertz heterodyne spectrometer exploiting synchrotron radiation at megahertz resolution
A new spectrometer allowing both high resolution and broadband coverage in the terahertz (THz) domain is proposed. This instrument exploits the heterodyne technique between broadband synchrotron radiation and a quantum cascade laser (QCL) based molecular THz laser that acts as the local oscillator (LO). Proof of principle for exploitation for spectroscopy is provided by the recording of molecular absorptions of hydrogen sulfide (H2S) and methanol (CH3OH) around 1.073 THz. Ultimately, the spectrometer will make it possible to cover the 1-4 THz region in 5 GHz windows at Doppler resolution.
The implications of seasonal climatic effects for managing disturbance dependent populations under a changing climate
The frequency of ecological disturbances, such as fires, is changing due to changing land use and climatic conditions. Disturbance-adapted species may thus require the manipulation of disturbance regimes to persist. However, the effects of changes in other abiotic factors, such as climatic conditions, are frequently disregarded in studies of such systems. Where climatic effects are included, relatively simple approaches that disregard seasonal variation in the effects are typically used. We compare predictions of population persistence using different fire return intervals (FRIs) under recent and predicted future climatic conditions for the rare fire-dependent herb Eryngium cuneifolium. We used functional linear models (FLMs) to estimate the cumulative effect of climatic variables across the annual cycle, allowing the strength and direction of the climatic impacts to differ over the year. We then estimated extinction probabilities and minimum population sizes under past and forecasted future climatic conditions and a range of FRIs. Under forecasted climate change, E. cuneifolium is predicted to persist under a much broader range of FRIs, because increasing temperatures are associated with faster individual growth. Climatic impacts on fecundity do not result in a temporal trend in this vital rate due to antagonistic seasonal effects operating through winter and summer temperatures. These antagonistic seasonal climatic effects highlight the importance of capturing the seasonal dependence of climatic effects when forecasting their future fate. Synthesis. Awareness of the potential effects of climate change on disturbance-adapted species is necessary for developing suitable management strategies for future environmental conditions. However, our results suggest that widely used simple methods for modelling climate impacts, that disregard seasonality in such effects, may produce misleading inferences.
Protein Kinase A Regulates Platelet Phosphodiesterase 3A through an A-Kinase Anchoring Protein Dependent Manner
Platelet activation is critical for haemostasis, but if unregulated can lead to pathological thrombosis. Endogenous platelet inhibitory mechanisms are mediated by prostacyclin (PGI2)-stimulated cAMP signalling, which is regulated by phosphodiesterase 3A (PDE3A). However, spatiotemporal regulation of PDE3A activity in platelets is unknown. Here, we report that platelets possess multiple PDE3A isoforms with seemingly identical molecular weights (100 kDa). One isoform contained a unique N-terminal sequence that corresponded to PDE3A1 in nucleated cells but with negligible contribution to overall PDE3A activity. The predominant cytosolic PDE3A isoform did not possess the unique N-terminal sequence and accounted for >99% of basal PDE3A activity. PGI2 treatment induced a dose- and time-dependent increase in PDE3A phosphorylation which was PKA-dependent and associated with an increase in phosphodiesterase enzymatic activity. The effects of PGI2 on PDE3A were modulated by A-kinase anchoring protein (AKAP) disruptor peptides, suggesting an AKAP-mediated PDE3A signalosome. We identified AKAP7, AKAP9, AKAP12, AKAP13, and moesin expressed in platelets but focussed on AKAP7 as a potential PDE3A binding partner. Using a combination of immunoprecipitation, proximity ligation techniques, and activity assays, we identified a novel PDE3A/PKA RII/AKAP7 signalosome in platelets that integrates propagation and termination of cAMP signalling through coupling of PKA and PDE3A.
Statistical methods in language processing
The term statistical methods here refers to a methodology that has been dominant in computational linguistics since about 1990. It is characterized by the use of stochastic models, substantial data sets, machine learning, and rigorous experimental evaluation. The shift to statistical methods in computational linguistics parallels a movement in artificial intelligence more broadly. Statistical methods have so thoroughly permeated computational linguistics that almost all work in the field draws on them in some way. There has, however, been little penetration of the methods into general linguistics. The methods themselves are largely borrowed from machine learning and information theory. We limit attention to that which has direct applicability to language processing, though the methods are quite general and have many nonlinguistic applications. Not every use of statistics in language processing falls under statistical methods as we use the term. Standard hypothesis testing and experimental design, for example, are not covered in this article. WIREs Cogni Sci 2011 2 315–322. DOI: 10.1002/wcs.111. Peer reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/83468/1/111_ftp.pd