Protein (Multi-)Location Prediction: Using Location Inter-Dependencies in a Probabilistic Framework

Author: Shatkay Hagit
Simha Ramanuja
Publication date: 29/07/2013

Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins, assuming that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems have attempted to predict multiple locations of proteins, they typically treat locations as independent or capture inter-dependencies by treating each locations-combination present in the training set as an individual location-class. We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the multiple-location-prediction process, using a collection of Bayesian network classifiers. We evaluate our system on a dataset of single- and multi-localized proteins. Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without restricting predictions to be based only on location-combinations present in the training set.

Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)

arXiv.org e-Print Archive

Springer - Publisher Connector

A Linear Classifier Based on Entity Recognition Tools and a Statistical Approach to Method Extraction in the Protein-Protein Interaction Literature

Author: Conover Michael
Lourenço Anália
Nematzadeh Azadeh
Pan Fengxia
Rocha Luis M.
Shatkay Hagit
Wong Andrew
Publication date: 01/01/2011

We participated in the Article Classification and the Interaction Method subtasks (ACT and IMT, respectively) of the Protein-Protein Interaction task of the BioCreative III Challenge. For the ACT, we pursued an extensive testing of available Named Entity Recognition and dictionary tools, and used the most promising ones to extend our Variable Trigonometric Threshold linear classifier. For the IMT, we experimented with a primarily statistical approach, as opposed to employing a deeper natural language processing strategy. Finally, we also studied the benefits of integrating the method extraction approach that we have used for the IMT into the ACT pipeline. For the ACT, our linear article classifier leads to a ranking and classification performance significantly higher than all the reported submissions. For the IMT, our results are comparable to those of other systems, which took very different approaches. For the ACT, we show that the use of named entity recognition tools leads to a substantial improvement in the ranking and classification of articles relevant to protein-protein interaction. Thus, we show that our substantially expanded linear classifier is a very competitive classifier in this domain. Moreover, this classifier produces interpretable surfaces that can be understood as "rules" for human understanding of the classification. In terms of the IMT task, in contrast to other participants, our approach focused on identifying sentences that are likely to bear evidence for the application of a PPI detection method, rather than on classifying a document as relevant to a method. As BioCreative III did not perform an evaluation of the evidence provided by the system, we have conducted a separate assessment; the evaluators agree that our tool is indeed effective in detecting relevant evidence for PPI detection methods.

Comment: BMC Bioinformatics. In Press

arXiv.org e-Print Archive

Universidade do Minho: RepositoriUM

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Relationships between Urban Forest Patch Characteristics and Near-Ground Solar Radiation in Baltimore, MD

Author: Shatkay Ruth
Publication date: 01/01/2021

Forest patches in urban areas perform multiple ecological functions that can aid in regulating microclimate, managing stormwater flows, and improving air quality. Many of these functions and services are driven by solar radiation inputs below the forest canopy. However, the relationships between near-ground solar radiation and urban forest patch characteristics are not well studied or understood. For this thesis, we estimated near-ground solar radiation in six forest patches in Baltimore, MD, USA using hemispherical photographs to calculate global site factor (GSF). In addition, we determined patch compactness, as well as the origin, slope, aspect, distance from edge, and degree of invasion at each sampling site. Results show that patch attributes affect solar radiation inputs, although the strength of the relationships between GSF and the studied patch characteristics varies between sites. The identified patterns in near-ground solar radiation can be used to inform effective conservation and management of urban forest patches.

Digital Repository at the University of Maryland

How to Get the Most out of Your Curation Effort

Author: Rzhetsky Andrey
Shatkay Hagit
Wilbur W. John
Publication date: 21/12/2023

Large-scale annotation efforts typically involve several experts who may disagree with each other. We propose an approach for modeling disagreements among experts that assigns each annotation a confidence value (i.e., the posterior probability that it is correct). Our approach allows computing a certainty level for individual annotations, given annotator-specific parameters estimated from data. We developed two probabilistic models for performing this analysis, compared these models using computer simulation, and tested each model's actual performance, based on a large data set generated by human annotators specifically for this study. We show that even in the worst-case scenario, when all annotators disagree, our approach allows us to significantly increase the probability of choosing the correct annotation. Along with this publication we make publicly available a corpus of 10,000 sentences annotated according to several cardinal dimensions that we have introduced in earlier work. The 10,000 sentences were all 3-fold annotated by a group of eight experts, while a 1,000-sentence subset was further 5-fold annotated by five new experts. While the presented data represent a specialized curation task, our modeling approach is general; most data annotation studies could benefit from our methodology.
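
The idea of attaching a posterior probability to an annotation can be illustrated with a minimal sketch. This is not the authors' model: it assumes, purely for illustration, that each annotator independently picks the true label with a fixed accuracy, that errors are spread uniformly over the remaining labels, and that the prior over labels is uniform. The function name, parameter names, and accuracy values are all hypothetical.

```python
def posterior_over_labels(votes, accuracies, labels):
    """Return P(true label = c | votes) for every candidate label c.

    Illustrative assumptions (not taken from the paper):
      - annotators vote independently given the true label,
      - annotator i picks the true label with probability accuracies[i],
      - a wrong vote is uniform over the other len(labels) - 1 labels,
      - the prior over labels is uniform.
    """
    k = len(labels)
    scores = {}
    for candidate in labels:
        likelihood = 1.0
        for vote, acc in zip(votes, accuracies):
            if vote == candidate:
                likelihood *= acc
            else:
                likelihood *= (1.0 - acc) / (k - 1)
        scores[candidate] = likelihood  # uniform prior cancels out
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Worst case for majority voting: three annotators, three different answers.
posterior = posterior_over_labels(
    votes=["A", "B", "C"],
    accuracies=[0.9, 0.7, 0.6],  # hypothetical annotator-specific parameters
    labels=["A", "B", "C"],
)
best_label = max(posterior, key=posterior.get)
```

Under these assumptions, even when every annotator disagrees, the label chosen by the most reliable annotator receives more posterior mass than a uniform guess would give it, which is the kind of gain the abstract describes.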

Knowledge UChicago

Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users

Author: A. Rzhetsky
F. Pan
Friedman
H. Shatkay
Krallinger
Krauthammer
Raychaudhuri
Shatkay
Tanabe
W. J. Wilbur
Wilbur
Yeh
Publication venue: Oxford University Press

Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD), database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks.

Crossref

PubMed Central