
    Formally analysing the concepts of domestic violence.

    The types of police inquiries performed these days are incredibly diverse. Data processing architectures are often not suited to cope with this diversity, since most of the case data is still stored as unstructured text. In this paper, Formal Concept Analysis (FCA) is showcased for its exploratory data analysis capabilities in discovering domestic violence intelligence from a dataset of unstructured police reports filed with the regional police Amsterdam-Amstelland in the Netherlands. From this data analysis it is shown that FCA can be a powerful instrument to operationally improve policing practice. For one, it is shown that the definition of domestic violence employed by the police is not always as clear as it should be, making it hard to use effectively for classification purposes. In addition, this paper presents newly discovered knowledge for automatically classifying certain cases as either domestic or non-domestic violence. Moreover, it provides practical advice for detecting incorrect classifications performed by police officers. A final aspect discussed is the set of problems encountered because of the sometimes unstructured way of working of police officers. The added value of this paper resides both in using FCA for exploratory data analysis and in applying FCA to the detection of domestic violence.
    Keywords: Formal concept analysis (FCA); Domestic violence; Knowledge discovery in databases; Text mining; Exploratory data analysis; Knowledge enrichment; Concept discovery
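
    To make the notion of a formal concept concrete, the following is a minimal sketch, not the paper's method or data. It assumes a tiny hypothetical context in which each police report (object) is described by a set of terms (attributes); the report names, the terms, and the brute-force enumeration are all illustrative assumptions. A formal concept is a pair (extent, intent) in which the extent is exactly the set of objects sharing every attribute in the intent, and the intent is exactly the set of attributes shared by every object in the extent.

```python
from itertools import combinations

# Hypothetical toy context: report -> terms found in its text (illustrative only).
CONTEXT = {
    "report_1": {"partner", "hit"},
    "report_2": {"partner", "hit", "same_address"},
    "report_3": {"stranger", "hit"},
}


def intent(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    attrs = set().union(*context.values())
    for obj in objects:
        attrs &= context[obj]
    return attrs


def extent(attrs, context):
    """Objects that have every attribute in `attrs`."""
    return {obj for obj, obj_attrs in context.items() if attrs <= obj_attrs}


def formal_concepts(context):
    """Enumerate all formal concepts by brute force.

    Closing every subset of objects is exponential in the number of objects,
    so this is only suitable for very small contexts.
    """
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            shared = intent(subset, context)
            concepts.add((frozenset(extent(shared, context)), frozenset(shared)))
    return concepts


if __name__ == "__main__":
    for ext, itt in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
        print(sorted(ext), sorted(itt))
```

    In this toy context, the reports mentioning both "partner" and "hit" form one concept; inspecting which concepts mix differently labelled reports is the kind of exploratory step the abstract alludes to.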

    Bridging the biodiversity data gaps: Recommendations to meet users’ data needs

    A strong case has been made for freely available, high quality data on species occurrence, in order to track changes in biodiversity. However, one of the main issues surrounding the provision of such data is that sources vary in quality, scope, and accuracy. Publishers must therefore face the challenge of maximizing quality, utility and breadth of data coverage in order to make such data useful to users. Here, we report a number of recommendations that stem from a content need assessment survey conducted by the Global Biodiversity Information Facility (GBIF). Through this survey, we aimed to distil the main user needs regarding biodiversity data. We find a broad range of recommendations from the survey respondents, principally concerning issues such as data quality, bias, and coverage, and extending to ease of access. We recommend a candidate set of actions for the GBIF that fall into three classes: 1) addressing data gaps, data volume, and data quality, 2) aggregating new kinds of data for new applications, and 3) promoting ease-of-use and providing incentives for wider use. Addressing the challenge of providing high quality primary biodiversity data can potentially serve the needs of many international biodiversity initiatives, including the new 2020 biodiversity targets of the Convention on Biological Diversity, the emerging global biodiversity observation network (GEO BON), and the new Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES).

    A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction

    This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a data mining viewpoint are scalability, data-privacy control and automatic parallelization.
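
    The idea of computing fitness through SQL can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework described in the paper: it uses an in-memory SQLite database in place of a parallel database server, a hypothetical `customers` table, a candidate rule encoded as a SQL predicate string, and plain accuracy as the fitness measure (the abstract does not say which measure the authors use).

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory table; the schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (age INTEGER, income INTEGER, buys TEXT)")
    rows = [
        (25, 20000, "no"),
        (35, 60000, "yes"),
        (45, 80000, "yes"),
        (22, 15000, "no"),
        (52, 30000, "no"),
        (40, 70000, "yes"),
    ]
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return conn


def rule_fitness(conn, antecedent_sql, target_class):
    """Accuracy of the rule 'IF <antecedent> THEN buys = <target_class>'.

    The antecedent is spliced into the query because it is produced by the
    GP itself; it must never come from untrusted input. The database does
    the counting, so only three numbers are returned to the GP process.
    """
    query = f"""
        SELECT
            SUM(CASE WHEN ({antecedent_sql}) AND buys = ? THEN 1 ELSE 0 END),
            SUM(CASE WHEN NOT ({antecedent_sql}) AND buys <> ? THEN 1 ELSE 0 END),
            COUNT(*)
        FROM customers
    """
    true_pos, true_neg, total = conn.execute(
        query, (target_class, target_class)
    ).fetchone()
    return (true_pos + true_neg) / total


if __name__ == "__main__":
    conn = make_demo_db()
    for candidate in ("income > 50000", "age > 30"):
        print(candidate, rule_fitness(conn, candidate, "yes"))
```

    Because each individual is evaluated by a single aggregate query, the raw records stay inside the database, which is consistent with the scalability and data-privacy advantages the abstract mentions.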

    Visual and computational analysis of structure-activity relationships in high-throughput screening data

    Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets.

    Tracking materials science data lineage to manage millions of materials experiments and analyses

    In an era of rapid advancement of algorithms that extract knowledge from data, data and metadata management are increasingly critical to research success. In materials science, there are few examples of experimental databases that contain many different types of information, and compared with other disciplines, the database sizes are relatively small. Underlying these issues are the challenges in managing and linking data across disparate synthesis and characterization experiments, which we address with the development of a lightweight data management framework that is generally applicable for experimental science and beyond. Five years of managing experiments with this system has yielded the Materials Experiment and Analysis Database (MEAD), which contains raw data and metadata from millions of materials synthesis and characterization experiments, as well as the analysis and distillation of that data into property and performance metrics via software in an accompanying open source repository. The unprecedented quantity and diversity of experimental data are searchable by experiment and analysis attributes generated by both researchers and data processing software. The search web interface allows users to visualize their search results and download zipped packages of data with full annotations of their lineage. The sheer scale of the data provides substantial challenges and opportunities for incorporating data science in the physical sciences, and MEAD’s data and algorithm management framework will foster increased incorporation of automation and autonomous discovery in materials and chemistry research.

    The Evaluation Of Molecular Similarity And Molecular Diversity Methods Using Biological Activity Data

    This paper reviews the techniques available for quantifying the effectiveness of methods for molecular similarity and molecular diversity, focusing in particular on similarity searching and on compound selection procedures. The evaluation criteria considered are based on biological activity data, both qualitative and quantitative, with rather different criteria needed depending on the type of data available.

    Inductive queries for a drug designing robot scientist

    It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments.