Recommended from our members

Data-driven approaches to empirical discovery

Author: Langley Pat
Publication venue: eScholarship, University of California
Publication date: 31/10/1988

eScholarship - University of California

Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy

Author: Bekhuis T
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/04/2006

Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. In this paper, background for data and text mining, as well as for knowledge discovery in databases (KDD) and in text (KDT), is presented, then a brief review of Swanson's ideas, followed by a discussion of recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks follow regarding (a) the limits of current strategies for evaluation of hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. A report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians. © 2006 Bekhuis; licensee BioMed Central Ltd.

Springer - Publisher Connector

PubMed Central

D-Scholarship@Pitt

Inference for Large Panel Data with Many Covariates

Author: Pelger Markus
Zou Jiacheng
Publication date: 20/01/2023

This paper proposes a novel testing procedure for selecting a sparse set of covariates that explains a large dimensional panel. Our selection method provides correct false detection control while having higher power than existing approaches. We develop the inferential theory for large panels with many covariates by combining post-selection inference with a novel multiple testing adjustment. Our data-driven hypotheses are conditional on the sparse covariate selection. We control for family-wise error rates for covariate discovery for large cross-sections. As an easy-to-use and practically relevant procedure, we propose Panel-PoSI, which combines the data-driven adjustment for panel multiple testing with valid post-selection p-values of a generalized LASSO, which allows us to incorporate priors. In an empirical study, we select a small number of asset pricing factors that explain a large cross-section of investment strategies. Our method dominates the benchmarks out-of-sample due to its better size and power.

arXiv.org e-Print Archive

New approaches to pattern discovery in signals via empirical mode decomposition

Author: Alexander Voznesenskiy
Dmitry Kaplun
Dmitry Klionskiy
Mikhail Kupriyanov
Publication venue: 'JVE International Ltd.'
Publication date: 30/06/2017

Empirical mode decomposition (EMD) is an adaptive, data-driven technique for processing and analyzing various types of non-stationary vibrational signals. EMD is a powerful and effective tool for signal preprocessing (denoising, detrending, regularity estimation) and time-frequency analysis. This paper discusses pattern discovery in signals via EMD. New approaches to this problem are introduced. In addition, the methods expounded here may be considered as a way of denoising and coping with the redundancy problem of EMD. A general classification of intrinsic mode functions (IMFs) in accordance with their physical interpretation is offered and an attempt is made to perform classification on the basis of regression theory, special classification statistics and a clustering algorithm. The main advantage of the suggested techniques is their capability of working automatically. Simulation studies have been undertaken on multiharmonic vibrational signals.

Enterprise Agility: Why Is Transformation so Hard?

Author: Barroca Leonor
Karvonen Teemu
Sharp Helen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Enterprise agility requires capabilities to transform, sense and seize new business opportunities more quickly than competitors. However, acquiring those capabilities, such as continuous delivery and scaling agility to product programmes, portfolios and business models, is challenging in many organisations. This paper introduces definitions of enterprise agility involving business management and cultural lenses for analysing large-scale agile transformation. The case organisation, in the higher education domain, leverages collaborative discovery sprints and an experimental programme to enable a bottom-up approach to transformation. Meanwhile, the prevalence of bureaucracy and organisational silos is often contradictory to agile principles and values. The case study results identify transformation challenges based on observations from a five-month research period. Initial findings indicate that increased focus on organisational culture and leveraging of both bottom-up innovation and supportive top-down leadership activities could enhance the likelihood of a successful transformation.

Open Research Online (The Open University)