Historical Bio-Linguistics: A biostatistic approach to the study of linguistic phylogenies and the correlation of genetic, linguistic and geographical data.
Demographic events often leave traces in languages and genes: this prompted Darwin’s prediction that the evolutionary tree of human populations would provide the best possible phylogeny of language relationships. We tested Darwin’s expectation through long-distance genome-language comparisons across Eurasia, relying on independently assessed quantitative tools on both sides. To do so, we had to resort to a linguistic method able to compare across different families, based on abstract syntactic characters, which proved more apt for long-term historical reconstruction than phonemic ones.
More rule than exception: parallel evidence of ancient migrations in grammars and genomes of Finno-Ugric speakers
To reconstruct aspects of human demographic history, linguistics and genetics complement each other, reciprocally suggesting testable hypotheses on population relationships and interactions. Relying on a linguistic comparative method based on syntactic data, here we focus on the non-straightforward relation of genes and languages among Finno-Ugric (FU) speakers, in comparison to their Indo-European (IE) and Altaic (AL) neighbors. Syntactic analysis, in agreement with the indications of more traditional linguistic levels, supports at least three distinct clusters, corresponding to these three Eurasian families; yet, the outliers of the FU group show linguistic convergence with their geographical neighbors. By analyzing genome-wide data in both ancient and contemporary populations, we uncovered remarkably matching patterns, with north-western FU speakers linguistically and genetically closer in parallel degrees to their IE-speaking neighbors, and eastern FU speakers to AL speakers. Therefore, our analysis indicates that plausible cross-family linguistic interference effects were accompanied, and possibly caused, by recognizable demographic processes. In particular, based on the comparison of modern and ancient genomes, our study identified the Pontic-Caspian steppes as the possible origin of the demographic processes that led to the expansion of FU languages into Europe.
Learning implicational models of universal grammar parameters
The use of parameters in the description of natural language syntax has to balance between the need to discriminate among (sometimes subtly different) languages, which can be seen as a cross-linguistic version of Chomsky's descriptive adequacy (Chomsky, 1964), and the complexity of the acquisition task that a large number of parameters would imply, which is a problem for explanatory adequacy. Here we first present a novel approach in which machine learning is used to detect hidden dependencies in a table of parameters. The result is a dependency graph in which some of the parameters can be fully predicted from others. These findings can then be subjected to linguistic analysis, which may either refute them by providing typological counter-examples of languages not included in the original dataset, dismiss them on theoretical grounds, or uphold them as tentative empirical laws worthy of further study. Machine learning is also used to explore the full sets of parameters that are sufficient to distinguish one historically established language family from others. These results provide a new type of empirical evidence about the historical adequacy of parameter theories.
Machine Learning Models of Universal Grammar Parameter Dependencies
The use of parameters in the description of natural language syntax has to balance between the need to discriminate among (sometimes subtly different) languages, which can be seen as a cross-linguistic version of Chomsky’s (1964) descriptive adequacy, and the complexity of the acquisition task that a large number of parameters would imply, which is a problem for explanatory adequacy. Here we present a novel approach in which a machine learning algorithm is used to find dependencies in a table of parameters. The result is a dependency graph in which some of the parameters can be fully predicted from others. These empirical findings can then be subjected to linguistic analysis, which may either refute them by providing typological counter-examples of languages not included in the original dataset, dismiss them on theoretical grounds, or uphold them as tentative empirical laws worthy of further study.
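The idea of a parameter being "fully predicted from others" can be illustrated with a minimal sketch. It is not the machine learning algorithm used in the paper, which the abstract does not specify; it is a plain exhaustive check for functional dependencies in a small binary table. The table `TOY_TABLE`, the language names, the parameter names `P1` to `P3` and both function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: languages mapped to binary parameter settings.
# These values are invented for illustration and come from no real dataset.
TOY_TABLE = {
    "LangA": {"P1": 1, "P2": 1, "P3": 0},
    "LangB": {"P1": 1, "P2": 0, "P3": 1},
    "LangC": {"P1": 0, "P2": 0, "P3": 1},
    "LangD": {"P1": 0, "P2": 0, "P3": 1},
}


def is_function_of(table, target, predictors):
    """True if the target value is determined by the predictor values
    for every language in the table."""
    seen = {}
    for values in table.values():
        key = tuple(values[p] for p in predictors)
        # setdefault stores the first target value seen for this key;
        # a later, different value means the mapping is not a function.
        if seen.setdefault(key, values[target]) != values[target]:
            return False
    return True


def find_dependencies(table, max_predictors=2):
    """List (target, predictors) pairs, smallest predictor sets first,
    skipping supersets of predictor sets already found for a target."""
    params = sorted(next(iter(table.values())))
    found = []
    for target in params:
        others = [p for p in params if p != target]
        for size in range(1, max_predictors + 1):
            for predictors in combinations(others, size):
                already = any(
                    t == target and set(prev) <= set(predictors)
                    for t, prev in found
                )
                if not already and is_function_of(table, target, predictors):
                    found.append((target, predictors))
    return found


for target, predictors in find_dependencies(TOY_TABLE):
    print(f"{target} is predictable from {', '.join(predictors)}")
```

Each pair returned corresponds to one edge set in a dependency graph. As the abstract notes, such a dependency holds only for the languages in the table, so a single counter-example from a language outside the dataset would refute it.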
Comparative genomics of European Avian Pathogenic E. coli (APEC)
Background
Avian pathogenic Escherichia coli (APEC) causes colibacillosis, which results in significant economic losses to the poultry industry worldwide. However, the diversity between isolates remains poorly understood. Here, a total of 272 APEC isolates collected from the United Kingdom (UK), Italy and Germany were characterised using multiplex polymerase chain reactions (PCRs) targeting 22 equally weighted factors covering virulence genes, R-type and phylogroup. Following this analysis, 95 of the selected strains were further analysed using Whole Genome Sequencing (WGS).
Results
The most prevalent phylogroups were B2 (47%) and A1 (22%), although there were national differences with Germany presenting group B2 (35.3%), Italy presenting group A1 (53.3%) and UK presenting group B2 (56.1%) as the most prevalent. R-type R1 was the most frequent type (55%) among APEC, but multiple R-types were also frequent (26.8%). Following compilation of all the PCR data which covered a total of 15 virulence genes, it was possible to build a similarity tree using each PCR result unweighted to produce 9 distinct groups. The average number of virulence genes was 6-8 per isolate, but no positive association was found between phylogroup and number or type of virulence genes. A total of 95 isolates representing each of these 9 groupings were genome sequenced and analysed for in silico serotype, Multilocus Sequence Typing (MLST), and antimicrobial resistance (AMR). The UK isolates showed the greatest variability in terms of serotype and MLST compared with German and Italian isolates, whereas the lowest prevalence of AMR was found for German isolates. Similarity trees were compiled using sequencing data and notably single nucleotide polymorphism data generated ten distinct genogroups. The frequency of genogroups across Europe comprised 26.3% belonging to Group 8 (serogroups O2, O4, O18 and MLST types ST95, ST140, ST141, ST428, ST1618 and others), 18.9% belonging to Group 1 (serogroup O78 and MLST types ST23, ST2230), 15.8% belonging to Group 10 (serogroups O8, O45, O91, O125ab and variable MLST types), 14.7% belonging to Group 7 (serogroups O4, O24, O35, O53, O161 and MLST type ST117) and 13.7% belonging to Group 9 (serogroups O1, O16, O181 and others and MLST types ST10, ST48 and others). The other groups (2, 3, 4, 5 and 6) each contained relatively few strains.
For some of the genogroups (e.g. groups 6 and 7), a partial overlap between SNP grouping and PCR grouping was observable (matching PCR groups 8 (13 of 22 isolates) and 1 (14 of 16 isolates)). However, it was not possible to obtain a clear correlation between genogroups and unweighted PCR groupings. This may be due to the genome plasticity of E. coli that enables strains to carry the same virulence factors even if the overall genotype is substantially different.
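A similarity tree built from unweighted presence/absence results can be sketched as follows. This is an illustrative assumption about the general technique, not the software or distance measure used in the study, which the abstract does not name. The sketch uses a simple mismatch distance (every marker counts equally) and average-linkage agglomeration. The isolate names, the four-marker profiles in `TOY_PROFILES` and both function names are hypothetical.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) profiles for four isolates.
# Invented for illustration; not taken from the study's PCR data.
TOY_PROFILES = {
    "iso1": (1, 1, 0, 0),
    "iso2": (1, 1, 0, 1),
    "iso3": (0, 0, 1, 1),
    "iso4": (0, 0, 1, 0),
}


def mismatch_distance(a, b):
    """Fraction of markers on which two profiles differ (all markers unweighted)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage_tree(profiles):
    """Repeatedly merge the two closest clusters; return a nested-tuple tree."""
    # Each cluster is a pair: (subtree, list of member isolate names).
    clusters = [(name, [name]) for name in sorted(profiles)]

    def cluster_distance(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        total = sum(mismatch_distance(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


print(average_linkage_tree(TOY_PROFILES))
```

Cutting such a tree at a chosen distance yields discrete groups. Two isolates can share a marker profile while differing elsewhere in the genome, which is one way the PCR-based and SNP-based groupings described above can disagree.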
Conclusions
The conclusion to be drawn from the lack of correlations is that firstly, APEC are very diverse and secondly, it is not possible to rely on any one or more basic molecular or phenotypic tests to define APEC with clarity, reaffirming the need for whole genome analysis approaches which we describe here.
This study highlights the presence of previously unreported serotypes and MLSTs for APEC in Europe. Moreover, it is a first step towards a cautious reconsideration of the merits of classical identification criteria such as R typing, phylogrouping and serotyping.
VIDIIA Hunter: a low-cost, smartphone connected, artificial intelligence-assisted COVID-19 rapid diagnostic platform approved for medical use in the UK
Introduction: Accurate and rapid diagnostics paired with effective tracking and tracing systems are key to halting the spread of infectious diseases, limiting the emergence of new variants and monitoring vaccine efficacy. The current gold standard test (RT-qPCR) for COVID-19 is highly accurate and sensitive, but is time-consuming, and requires expensive specialised, lab-based equipment.
Methods: Herein, we report on the development of a SARS-CoV-2 (COVID-19) rapid and inexpensive diagnostic platform that relies on a reverse-transcription loop-mediated isothermal amplification (RT-LAMP) assay and a portable smart diagnostic device. Automated image acquisition and an Artificial Intelligence (AI) deep learning model embedded in the Virus Hunter 6 (VH6) device remove any subjectivity in the interpretation of results. The VH6 device is also linked to a smartphone companion application that registers patients for swab collection and manages the entire process, thus ensuring tests are traced and data securely stored.
Results: Our designed AI-implemented diagnostic platform recognises the nucleocapsid protein gene of SARS-CoV-2 with high analytical sensitivity and specificity. A total of 752 NHS patient samples, 367 confirmed positives for coronavirus disease (COVID-19) and 385 negatives, were used for the development and validation of the test and the AI-assisted platform. The smart diagnostic platform was then used to test 150 positive clinical samples covering a dynamic range of clinically meaningful viral loads and 250 negative samples. When compared to RT-qPCR, our AI-assisted diagnostics platform was shown to be reliable, highly specific (100%) and sensitive (98–100% depending on viral load) with a limit of detection of 1.4 copies of RNA per µL in 30 min. Using this data, our CE-IVD and MHRA-approved test and associated diagnostic platform has been approved for medical use in the United Kingdom under the UK Health Security Agency’s Medical Devices (Coronavirus Test Device Approvals, CTDA) Regulations 2022. Laboratory and in-silico data presented here also indicates that the VIDIIA diagnostic platform is able to detect the main variants of concern in the United Kingdom (September 2023).
Discussion: This system could provide an efficient, time- and cost-effective platform to diagnose SARS-CoV-2 and other infectious diseases in resource-limited settings.
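For readers unfamiliar with how sensitivity and specificity are derived when a test is compared against a reference method such as RT-qPCR, a minimal sketch follows. The panel sizes (150 positives, 250 negatives) match the abstract, but the split into true and false calls in `HYPOTHETICAL_COUNTS` is an assumption made purely for illustration and is not the study's reported confusion matrix. The function names are likewise hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Share of reference-positive samples that the test also calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of reference-negative samples that the test also calls negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, NOT the study's data: 150 reference positives,
# of which 3 are assumed missed, and 250 reference negatives with no
# false positives.
HYPOTHETICAL_COUNTS = {"true_pos": 147, "false_neg": 3, "true_neg": 250, "false_pos": 0}

sens = sensitivity(HYPOTHETICAL_COUNTS["true_pos"], HYPOTHETICAL_COUNTS["false_neg"])
spec = specificity(HYPOTHETICAL_COUNTS["true_neg"], HYPOTHETICAL_COUNTS["false_pos"])
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

Because sensitivity depends only on the reference-positive samples, it can vary with the viral loads represented in the positive panel, which is consistent with the abstract reporting it as a range.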
Epidemiology and Taxonomy of Honey Bee Viruses in England and Wales.
Unfortunately, in the last few years a large-scale colony loss called Colony Collapse Disorder has reduced the overall number of hives in different countries. However, the cause of this is still not clear, and for this reason the Department for Environment, Food and Rural Affairs (DEFRA) funded this research in order to gain a better knowledge of honey bee virus epidemiology and taxonomy in England and Wales. A national-level sampling plan was designed to be statistically representative of the honey bee population present in England and Wales. In order to detect viruses at low prevalence levels, a new viral RNA extraction method based on virus precipitation was developed, and the viral RNAs obtained were screened for eight honey bee viruses using real-time RT-PCR. Once information about the prevalence and the distribution of the eight honey bee viruses was obtained, a further characterisation of Deformed wing virus (DWV) and Black queen cell virus (BQCV) was performed using classic RT-PCR coupled with Sanger sequencing, and phylogenetic trees were obtained using bioinformatic tools. The viruses, grouped according to their nucleotide similarity, were reported on a geographically referenced map in order to highlight the distribution of the variants found in England and Wales. The results obtained in this thesis have resulted in a better knowledge of the epidemiology and taxonomy of honey bee viruses, determination of the full sequence of Slow bee paralysis virus (SBPV) for the first time, and a new virus RNA extraction method that can be exploited in other research fields.
Epidemiology and taxonomy of honey bee viruses in England and Wales
EThOS - Electronic Theses Online Service, GB, United Kingdom
Syntactic theory and the science of (language) history
We will present:
i) Significant cross-family gene-language correlations among Eurasian populations.
ii) The possibility of formally evaluating the relative position of the IE, Uralic and Altaic phylogenies above within a wider set of other Eurasian and American languages.
On these grounds, we will try to argue for two conclusions: 1) capturing phylogenetic signals with a model embodying a high degree of universal hypotheses about language is not only possible, but also recommendable for the purposes of statistical reliability; 2) deeply deductive approaches advocated by some modern cognitive theories are useful for the scientific foundation of the study of human history.