Automatic document classification of biological literature

Authors: Chen, David; Muller, Hans-Michael; Sternberg, Paul W.
Publication date: 01/08/2006

Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature that marks up full text according to a shallow ontology including terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature.

Results: We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our method first applies a classifier trained with a support vector machine (SVM), followed by a novel phrase-based clustering algorithm. The clustering step autonomously creates cluster labels that are descriptive and understandable by humans. On a standard test set (Reuters-21578), this clustering engine outperformed previously published results (F-value of 0.55 vs. 0.49) while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept.

Conclusions: We have demonstrated a simple method to classify biological documents that improves on current methods. Although the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to other types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept.
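The abstract reports clustering quality as an F-value without defining it. One common definition for flat clusterings, assumed here because the abstract does not say which variant was used, matches each gold class i to its best-scoring cluster j and weights by class size: F = Σ_i (n_i / n) · max_j F(i, j), where F(i, j) is the harmonic mean of precision n_ij / n_j and recall n_ij / n_i. The sketch below assumes each document has exactly one gold class and one cluster (Reuters-21578 is partly multi-label, so this is a simplification), and the function name `clustering_f_measure` is hypothetical.

```python
from collections import Counter


def clustering_f_measure(classes, clusters):
    """Class-size-weighted F-measure of a flat clustering against gold classes.

    `classes` and `clusters` are parallel lists: the gold class and the
    assigned cluster id of each document (single-label assumption).
    """
    if len(classes) != len(clusters):
        raise ValueError("classes and clusters must have the same length")
    n = len(classes)
    class_sizes = Counter(classes)            # n_i
    cluster_sizes = Counter(clusters)         # n_j
    overlap = Counter(zip(classes, clusters))  # n_ij, missing pairs count as 0

    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = overlap[(c, k)]
            if n_ck == 0:
                continue  # F(i, j) is 0 when class and cluster share no documents
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        # Weight each class's best match by the class's share of all documents
        total += (n_c / n) * best
    return total
```

For example, `clustering_f_measure(["a", "a", "b", "b"], [0, 0, 0, 0])` puts everything in one cluster and scores about 0.667, while a clustering that matches the classes exactly scores 1.0.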

Springer - Publisher Connector

Directory of Open Access Journals

Caltech Authors

What are we learning from business training and entrepreneurship evaluations around the developing world?

Authors: McKenzie, David; Woodruff, Christopher
Publication venue: Department of Economics, University of Warwick
Publication date: 01/01/2013

Business training programs are a popular policy option for trying to improve the performance of enterprises around the world. The last few years have seen rapid growth in the number of evaluations of these programs in developing countries. We undertake a critical review of these studies with the goal of synthesizing the emerging lessons and understanding both the limitations of the existing research and the areas in which more work is needed. We find substantial heterogeneity in the length and content of the training programs evaluated and in the types of firms participating in them. Many evaluations suffer from low statistical power, measure impacts only within a year of training, and experience problems with survey attrition and with the measurement of firm profits and revenues. Over these short time horizons, training has relatively modest impacts on the survivorship of existing firms, but there is stronger evidence that training programs help prospective owners launch new businesses more quickly. Most studies find that existing firm owners implement some of the practices taught in training, but the magnitudes of these improvements are often relatively modest. Few studies find significant impacts on profits or sales, although a couple of the studies with more statistical power have done so. Some studies have also found benefits to microfinance organizations from offering training. To date there is little evidence to help guide policymakers as to whether any impacts found come from trained firms competing away sales from other businesses or from productivity improvements, and little evidence to guide the provision of training at market prices. We conclude by summarizing some directions and key questions for future studies.

CiteSeerX

Warwick Research Archives Portal Repository

Bioinformatics and computational tools for next-generation sequencing analysis in clinical genetics

Authors: Oliveira, J.; Pereira, R.; Sousa, M.
Publication venue: MDPI AG
Publication date: 01/01/2020

Clinical genetics plays an important role in healthcare by providing a definitive diagnosis for many rare syndromes. It can also inform genetic prevention and disease prognosis, and help select the best care and treatment options for patients. Next-generation sequencing (NGS) has transformed clinical genetics, making it possible to analyze hundreds of genes at unprecedented speed and at a lower cost than conventional Sanger sequencing. Despite the growing literature on NGS in clinical settings, a gap remains between (bio)informaticians, molecular geneticists, and clinicians; this review aims to fill it by presenting a general overview of NGS technology and workflow. First, we review the current NGS platforms, focusing on the two main ones, Illumina and Ion Torrent, and discuss the major strengths and weaknesses intrinsic to each. Next, we dissect the NGS bioinformatic analysis pipelines, with some emphasis on the algorithms commonly used to process the data and to analyze sequence variants. Finally, we place the main challenges in NGS bioinformatics in perspective for future developments. Even with the major advances made in NGS technology and bioinformatics, further improvements in bioinformatic algorithms are still required to deal with complex and genetically heterogeneous disorders.
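To give a feel for the final "analyze sequence variants" stage of such a pipeline, here is a toy single-site caller working from per-base read counts at one aligned position. It is a deliberately simplified illustration, not one of the algorithms the review covers: production callers model base and mapping qualities probabilistically, whereas this sketch applies fixed cutoffs. The function name `call_variant` and the thresholds `min_depth`, `min_alt_fraction`, and `hom_alt_fraction` are hypothetical; the genotype strings follow the VCF-style "0/1" (heterozygous) and "1/1" (homozygous alternate) notation.

```python
def call_variant(ref_base, base_counts, min_depth=10, min_alt_fraction=0.2,
                 hom_alt_fraction=0.8):
    """Toy variant call at one position from a {base: read_count} pileup.

    Returns (alt_base, alt_fraction, genotype) or None. All thresholds are
    hypothetical illustration values, not clinical recommendations.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return None  # too little coverage to call anything

    # Keep only observed non-reference bases
    alts = {base: count for base, count in base_counts.items()
            if base != ref_base and count > 0}
    if not alts:
        return None

    # Most frequent alternate allele and its share of all reads
    alt_base, alt_count = max(alts.items(), key=lambda item: item[1])
    fraction = alt_count / depth
    if fraction < min_alt_fraction:
        return None  # likely sequencing noise under this crude rule

    # Roughly half the reads -> heterozygous; nearly all -> homozygous alternate
    genotype = "1/1" if fraction >= hom_alt_fraction else "0/1"
    return alt_base, fraction, genotype
```

For example, `call_variant("A", {"A": 12, "G": 10})` reports a heterozygous `G` at an allele fraction of roughly 0.45, while a position with only 7 reads returns `None`.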

Rigorous RG algorithms and area laws for low energy eigenstates in 1D

Authors: Arad, Itai; Landau, Zeph; Vazirani, Umesh; Vidick, Thomas
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2017

One of the central challenges in the study of quantum many-body systems is the complexity of simulating them on a classical computer. A recent advance (Landau et al. in Nat Phys, 2015) gave a polynomial-time algorithm to compute a succinct classical description for unique ground states of gapped 1D quantum systems. Despite this progress, many questions remained unsolved, including whether efficient algorithms exist when the ground space is degenerate (and of polynomial dimension in the system size) or for the polynomially many lowest-energy states, and even whether such states admit succinct classical descriptions or area laws. In this paper we give a new algorithm, based on a rigorously justified RG-type transformation, for finding low-energy states of 1D Hamiltonians acting on a chain of n particles. In the process we resolve some of the aforementioned open questions, including giving a polynomial-time algorithm for poly(n)-degenerate ground spaces and an n^(O(log n)) algorithm for the poly(n) lowest-energy states (under a mild density condition). For these classes of systems, the existence of a succinct classical description and area laws had not been rigorously proved before this work. The algorithms are natural and efficient, and for the case of finding unique ground states of frustration-free Hamiltonians the running time is Õ(nM(n)), where M(n) is the time required to multiply two n × n matrices.
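To make the stated bounds concrete, one can substitute a specific matrix-multiplication algorithm for M(n) and rewrite the quasi-polynomial bound in base 2. The abstract leaves M(n) general, so the two choices below (schoolbook multiplication and Strassen's algorithm) are illustrations only, not claims about the paper's implementation.

```latex
% Schoolbook multiplication: M(n) = \Theta(n^3)
\tilde{O}\bigl(n\,M(n)\bigr) = \tilde{O}\bigl(n \cdot n^{3}\bigr) = \tilde{O}\bigl(n^{4}\bigr)

% Strassen's algorithm: M(n) = O\bigl(n^{\log_2 7}\bigr) \approx O\bigl(n^{2.807}\bigr)
\tilde{O}\bigl(n\,M(n)\bigr) = \tilde{O}\bigl(n^{1 + \log_2 7}\bigr) \approx \tilde{O}\bigl(n^{3.807}\bigr)

% The bound for the poly(n) lowest-energy states is quasi-polynomial:
n^{O(\log n)} = \bigl(2^{\log_2 n}\bigr)^{O(\log n)} = 2^{O\left((\log n)^{2}\right)}
```

The last line shows why n^(O(log n)) grows faster than any polynomial yet much more slowly than an exponential 2^(Θ(n)).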

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Caltech Authors