47 research outputs found

    Drosophila evolution over space and time (DEST): a new population genomics resource

    Drosophila melanogaster is a leading model in population genetics and genomics, and a growing number of whole-genome datasets from natural populations of this species have been published in recent years. A major challenge is the integration of disparate datasets, often generated using different sequencing technologies and bioinformatic pipelines, which hampers our ability to address questions about the evolution of this species. Here we address these issues by developing a bioinformatics pipeline that maps pooled sequencing (Pool-Seq) reads from D. melanogaster to a hologenome consisting of fly and symbiont genomes and estimates allele frequencies using either a heuristic (PoolSNP) or a probabilistic (SNAPE-pooled) variant caller. We use this pipeline to generate the largest repository of genomic data available for D. melanogaster to date, encompassing 271 previously published and unpublished population samples from over 100 locations in more than 20 countries on four continents. Several of these locations have been sampled in different seasons across multiple years. This dataset, which we call Drosophila Evolution over Space and Time (DEST), is coupled with sampling and environmental metadata. A web-based genome browser and web portal provide easy access to the SNP dataset. We further provide guidelines on how to use Pool-Seq data for model-based demographic inference. Our aim is to provide this scalable platform as a community resource that can be easily extended via future efforts into an even more extensive cosmopolitan dataset. Our resource will enable population geneticists to analyze spatio-temporal genetic patterns and evolutionary dynamics of D. melanogaster populations in unprecedented detail.
    DrosEU is funded by a Special Topic Networks (STN) grant from the European Society for Evolutionary Biology (ESEB). MK (M. Kapun) was supported by the Austrian Science Foundation (grant no. FWF P32275); JG by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (H2020-ERC-2014-CoG-647900) and by the Spanish Ministry of Science and Innovation (BFU-2011-24397); TF by the Swiss National Science Foundation (SNSF grants PP00P3_133641, PP00P3_165836, and 31003A_182262) and a Mercator Fellowship from the German Research Foundation (DFG), held as an EvoPAD Visiting Professor at the Institute for Evolution and Biodiversity, University of Münster; AOB by the National Institutes of Health (R35 GM119686); MK (M. Kankare) by Academy of Finland grant 322980; VL by Danish Natural Science Research Council (FNU) grant 4002-00113B; FS by Deutsche Forschungsgemeinschaft (DFG) grant STA1154/4-1, Project 408908608; JP by Deutsche Forschungsgemeinschaft Projects 274388701 and 347368302; AU by an FPI fellowship (BES-2012-052999); ET by Israel Science Foundation (ISF) grant 1737/17; MSV, MSR and MJ by a grant from the Ministry of Education, Science and Technological Development of the Republic of Serbia (451-03-68/2020-14/200178); AP, KE and MT by a grant from the same Ministry (451-03-68/2020-14/200007); and TM by NSERC grant RGPIN-2018-05551.
    Peer reviewed.
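    The heuristic allele-frequency calling described above can be illustrated with a toy sketch. This is not the actual PoolSNP code; the threshold names, default values, and example counts below are invented for illustration only:

```python
# Toy PoolSNP-style heuristic: call an allele at a site only if its read
# count and its frequency in the pool both clear minimum thresholds.
# Thresholds here are illustrative, not the DEST pipeline's defaults.

def call_site(counts, min_count=4, min_freq=0.01):
    """counts: mapping base -> read count at one genomic site (pooled flies).
    Returns {base: frequency} for alleles passing both filters,
    or None if fewer than two alleles pass (i.e. no SNP is called)."""
    depth = sum(counts.values())
    if depth == 0:
        return None
    passing = {base: count / depth
               for base, count in counts.items()
               if count >= min_count and count / depth >= min_freq}
    # A SNP needs at least two segregating alleles.
    return passing if len(passing) >= 2 else None

# The rare G read (count 1) is filtered out as a likely sequencing error.
frequencies = call_site({"A": 48, "T": 10, "G": 1, "C": 0})
```

    A probabilistic caller such as SNAPE-pooled replaces the hard thresholds with a posterior over allele frequencies; the filtering intent is the same.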

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents covering a variety of research fields, such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performance. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title- and title/abstract-based search engines for relevant articles in biomedical research.
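    One of the baseline methods named above, Term Frequency–Inverse Document Frequency, reduces to weighting terms and comparing documents by cosine similarity. A minimal pure-Python sketch, with a whitespace tokenizer and plain log IDF chosen for illustration (production search engines use tuned variants):

```python
# Minimal TF-IDF document similarity: weight each term by how frequent it is
# in a document and how rare it is across the corpus, then compare documents
# with cosine similarity. Illustrative only, not the benchmark's implementation.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                        for term, count in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = ["fly genome population genetics",
        "fly genome sequencing data",
        "cooking pasta with tomato sauce"]
vectors = tfidf_vectors(docs)
# The two genomics snippets score higher with each other than with the
# unrelated document, which shares no terms with them.
```

    Okapi BM25 differs mainly in saturating the term-frequency component and normalizing by document length, which is one reason the two methods retrieve overlapping but distinct article sets.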

    Corrigendum to: Drosophila Evolution over Space and Time (DEST): a New Population Genomics Resource

    Drosophila melanogaster is a leading model in population genetics and genomics, and a growing number of whole-genome datasets from natural populations of this species have been published in recent years. A major challenge is the integration of disparate datasets, often generated using different sequencing technologies and bioinformatic pipelines, which hampers our ability to address questions about the evolution of this species. Here we address these issues by developing a bioinformatics pipeline that maps pooled sequencing (Pool-Seq) reads from D. melanogaster to a hologenome consisting of fly and symbiont genomes and estimates allele frequencies using either a heuristic (PoolSNP) or a probabilistic (SNAPE-pooled) variant caller. We use this pipeline to generate the largest repository of genomic data available for D. melanogaster to date, encompassing 271 previously published and unpublished population samples from over 100 locations in more than 20 countries on four continents. Several of these locations have been sampled in different seasons across multiple years. This dataset, which we call Drosophila Evolution over Space and Time (DEST), is coupled with sampling and environmental metadata. A web-based genome browser and web portal provide easy access to the SNP dataset. We further provide guidelines on how to use Pool-Seq data for model-based demographic inference. Our aim is to provide this scalable platform as a community resource that can be easily extended via future efforts into an even more extensive cosmopolitan dataset. Our resource will enable population geneticists to analyze spatio-temporal genetic patterns and evolutionary dynamics of D. melanogaster populations in unprecedented detail.
    DrosEU is funded by a Special Topic Networks (STN) grant from the European Society for Evolutionary Biology (ESEB). MK (M. Kapun) was supported by the Austrian Science Foundation (grant no. FWF P32275); JG by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (H2020-ERC-2014-CoG-647900) and by the Spanish Ministry of Science and Innovation (BFU-2011-24397); TF by the Swiss National Science Foundation (SNSF grants PP00P3_133641, PP00P3_165836, and 31003A_182262) and a Mercator Fellowship from the German Research Foundation (DFG), held as an EvoPAD Visiting Professor at the Institute for Evolution and Biodiversity, University of Münster; AOB by the National Institutes of Health (R35 GM119686); MK (M. Kankare) by Academy of Finland grant 322980; VL by Danish Natural Science Research Council (FNU) grant 4002-00113B; FS by Deutsche Forschungsgemeinschaft (DFG) grant STA1154/4-1, Project 408908608; JP by Deutsche Forschungsgemeinschaft Projects 274388701 and 347368302; AU by an FPI fellowship (BES-2012-052999); ET by Israel Science Foundation (ISF) grant 1737/17; MSV, MSR and MJ by a grant from the Ministry of Education, Science and Technological Development of the Republic of Serbia (451-03-68/2020-14/200178); AP, KE and MT by a grant from the same Ministry (451-03-68/2020-14/200007); and TM by NSERC grant RGPIN-2018-05551.
    Peer reviewed.

    Multidimensional signals and analytic flexibility: Estimating degrees of freedom in human speech analyses

    Recent empirical studies have highlighted the large degree of analytic flexibility in data analysis, which can lead to substantially different conclusions based on the same data set. Researchers have thus expressed concerns that these researcher degrees of freedom might facilitate bias and lead to claims that do not stand the test of time. Even greater flexibility is to be expected in fields in which the primary data lend themselves to a variety of possible operationalizations. The multidimensional, temporally extended nature of speech constitutes an ideal testing ground for assessing the variability in analytic approaches, which derives not only from aspects of statistical modeling but also from decisions regarding the quantification of the measured behavior. In the present study, we gave the same speech production data set to 46 teams of researchers and asked them to answer the same research question, resulting in substantial variability in reported effect sizes and their interpretation. Using Bayesian meta-analytic tools, we further find little to no evidence that the observed variability can be explained by analysts’ prior beliefs, expertise or the perceived quality of their analyses. In light of this idiosyncratic variability, we recommend that researchers more transparently share details of their analyses, strengthen the link between theoretical construct and quantitative system, and calibrate their (un)certainty in their conclusions.

    Solving Large p-median Problems by a Multistage Hybrid Approach Using Demand Points Aggregation and Variable Neighbourhood Search

    A hybridisation of a clustering-based technique and variable neighbourhood search (VNS) is designed to solve large-scale p-median problems. The approach is based on a multi-stage methodology in which learning from previous stages is taken into account when tackling the next stage. Each stage is made up of several subproblems that are solved by a fast procedure to produce good feasible solutions. Within each stage, the solutions returned are combined to make up a new promising subset of potential facilities. This augmented p-median problem is then solved by VNS. As these subproblems use aggregation, a cost evaluation based on the original demand points, rather than the aggregated ones, is computed for each of the aggregation-based solutions. The one yielding the least cost is then selected and its chosen facilities are included in the next stage. This multi-stage process is repeated until a stopping criterion is met. The approach is enhanced by an efficient way of aggregating the data and a neighbourhood reduction scheme for allocating demand points to their nearest facilities. The proposed approach is tested, using various values of p, on the largest data sets from the literature, with up to 89,600 demand points, with encouraging results.
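    The interchange (swap) move at the heart of local searches for the p-median problem, which VNS applies within systematically changing neighbourhoods, can be sketched on a toy one-dimensional instance. The instance, the p value, and the best-improvement strategy below are illustrative choices; the paper's aggregation stages and multi-stage learning are omitted:

```python
# Toy p-median: choose p facility sites among the demand points so that the
# summed distance from each demand point to its nearest open facility is
# minimal. Solved here by best-improvement one-swap (interchange) local
# search from a random start. Illustrative sketch, not the paper's method.
import random

def total_cost(open_facs, demands):
    """Each demand point is served by its nearest open facility
    (1-D instance, facilities located at demand points, distance = |difference|)."""
    return sum(min(abs(x - demands[f]) for f in open_facs) for x in demands)

def swap_local_search(demands, p, seed=0):
    rng = random.Random(seed)
    current = set(rng.sample(range(len(demands)), p))
    while True:
        best_cost, best_move = total_cost(current, demands), None
        for out_f in current:                                # close one facility...
            for in_f in set(range(len(demands))) - current:  # ...open another
                candidate = (current - {out_f}) | {in_f}
                cost = total_cost(candidate, demands)
                if cost < best_cost:
                    best_cost, best_move = cost, candidate
        if best_move is None:                                # local optimum reached
            return sorted(current), total_cost(current, demands)
        current = best_move

# Two clusters of demand points; the cluster medians (values 1 and 11)
# form the optimal facility pair.
facilities, cost = swap_local_search([0, 1, 2, 10, 11, 12], p=2)
```

    Each swap costs a full re-evaluation here, which is why, at the scale of 89,600 demand points, aggregation and neighbourhood reduction of the kind the paper proposes become essential.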

    Casimiroa edulis seed extracts show anticonvulsive properties in rats

    Single doses of 5, 10 and 100 mg/kg of Casimiroa edulis aqueous extract (AQ); 10, 100 and 1000 mg/kg of C. edulis ethanolic extract (E-OH); and 10, 30 and 12 mg/kg of propylene glycol (Pg), phenytoin (Phen) and phenobarbital (Phb), respectively, were orally administered to groups of adult male Wistar rats. All groups were then assayed for protection in the maximal electroshock (MES) and pentylenetetrazole (METsc) seizure-induction tests at hourly intervals over 8 h. In the MES test, maximal protection of 70% occurred at the 2nd and 4th h with the 10 mg/kg AQ and 100 mg/kg E-OH doses; that of Phen, Phb and Pg was 80, 90 and 10% at the 8th, 6th and 2nd h, respectively. Averaged values for rats unprotected against MES under 10 and 100 mg/kg of the AQ and E-OH extracts showed a shortened reflex duration as well as delayed latency and uprising times. In the METsc test, by contrast, AQ and E-OH produced only a prolonged latency and no protection, whereas the maximal protection of Phen and Phb was 80 and 100% at the 4th and 6th h. Thus, AQ is a tenfold more potent anticonvulsant extract than E-OH against MES.

    Interpretable fuzzy rule-based systems for financial fraud detection

    Systems for detecting financial statement fraud have attracted considerable interest in computational intelligence research. Diverse classification methods have been employed to perform automatic detection of fraudulent companies. However, previous research has aimed to develop highly accurate detection systems while neglecting their interpretability. Here we propose a novel fuzzy rule-based detection system that integrates a feature selection component and rule extraction to achieve a highly interpretable system in terms of rule complexity and granularity. Specifically, we use genetic feature selection to remove irrelevant attributes and then perform a comparative analysis of state-of-the-art fuzzy rule-based systems, including FURIA and evolutionary fuzzy rule-based systems. We show that using such systems leads not only to competitive accuracy but also to desirable interpretability. This finding has important implications for auditors and other users of financial statement fraud detection systems.
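    The kind of short, human-readable fuzzy rule this line of work advocates can be illustrated with a toy two-rule classifier. The membership functions, attribute names, and rules below are invented for illustration; they are not taken from the paper, from FURIA, or from any real detection system:

```python
# Toy Mamdani-style fuzzy rule-based classifier with two interpretable rules:
#   IF leverage IS high AND profitability IS low  THEN fraud
#   IF leverage IS low  AND profitability IS high THEN legitimate
# Inputs are hypothetical financial ratios normalized to [0, 1].

def tri(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def low(x):
    return tri(x, -0.5, 0.0, 0.5)

def high(x):
    return tri(x, 0.5, 1.0, 1.5)

def classify(leverage, profitability):
    """Fire both rules (fuzzy AND = min) and return the stronger rule's label."""
    fraud_strength = min(high(leverage), low(profitability))
    legit_strength = min(low(leverage), high(profitability))
    return "fraud" if fraud_strength > legit_strength else "legitimate"
```

    Interpretability here comes from the fact that every prediction can be traced back to a named rule and its membership degrees, which is the property the abstract weighs against raw accuracy.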