51 research outputs found

New approaches to facilitate genome analysis

Author: Scordis Philip
Publication venue: UCL (University College London)
Publication date: 01/01/2001

In this era of concerted genome sequencing efforts, biological sequence information is abundant. With many prokaryotic and simple eukaryotic genomes completed, and with the genomes of more complex organisms nearing completion, the bioinformatics community, those charged with the interpretation of these data, are becoming concerned with the efficacy of current analysis tools. One step towards a more complete understanding of biology at the molecular level is the unambiguous functional assignment of every newly sequenced protein. The sheer scale of this problem precludes the conventional process of biochemically determining function for every example. Rather we must rely on demonstrating similarity to previously characterised proteins via computational methods, which can then be used to infer homology and hence structural and functional relationships. Our ability to do this with any measure of reliability unfortunately diminishes as the pools of experimentally determined sequence data become muddied with sequences that are themselves characterised with "in silico" annotation.
Part of the problem stems from the complexity of modelling biology in general, and of evolution in particular. For example, once similarity has been identified between sequences, in order to assign a common function it is important to identify whether the inferred homologous relationship has an orthologous or paralogous origin, which currently cannot be done computationally. The modularity of proteins also poses problems for automatic annotation, as similar domains may occur in proteins with very different functions. Once accepted into the sequence databases, incorrect functional assignments become available for mass propagation and the consequences of incorporating those errors in further "in silico" experiments are potentially catastrophic.
One solution to this problem is to collate families of proteins with demonstrable homologous relationships, derive a pattern that represents the essence of those relationships, and use this as a signature to trawl for similarity in the sequence databases. This approach not only provides a more sensitive model of evolution, but also allows annotation from all members of the family to contribute to any assignments made. This thesis describes the development of a new search method (FingerPRINTScan) that exploits the familial models in the PRINTS database to provide more powerful diagnosis of evolutionary relationships. FingerPRINTScan is both selective and sensitive, allowing both precise identification of super-family, family and sub-family relationships, and the detection of more distant ones. Illustrations of the diagnostic performance of the method are given with respect to the haemoglobin and transfer RNA synthetase families, and whole genome data.
FingerPRINTScan has become widely used in the biological community, e.g. as the primary search interface to PRINTS via a dedicated web site at the University of Manchester, and as one of the search components of InterPro at the European Bioinformatics Institute (EBI). Furthermore, it is currently responsible for facilitating the use of PRINTS in a number of significant annotation roles, such as the automatic annotation of TrEMBL at the EBI, and as part of the computational suite used to annotate the Drosophila melanogaster genome at Celera Genomics.

UCL Discovery

Financing the Impact of Terrorism: Can Insurers Cope?

Author: Scordis Nicos A.
Publication venue: St. John's Law Scholarship Repository
Publication date: 30/04/2012

bepress Legal Repository

St. John's University School of Law

A Study of Insurance Schemes Based on Public-Private Partnerships toward a Sustainable Society

Author: SCORDIS Nicos A.
諏澤吉彦
Publication venue: 京都産業大学マネジメント研究会
Publication date: 01/09/2013

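This study examines whether insurance schemes based on public-private partnerships can contribute to realizing a sustainable society, focusing on livelihood security systems, natural disaster insurance, and liability insurance. For insurance to perform its risk-transfer function, the insurability of risk, which is impaired by factors such as incentive problems and excessive capital costs, must be supplemented at low cost. For livelihood security systems, we found that a two-tier structure combining public and private insurance can effectively reduce moral hazard and adverse selection. For natural disaster insurance, public involvement in coverage limitations, reinsurance, and premium rate-making contributes to a stable supply of insurance coverage, but may also cause problems such as excessive property development in high-risk areas. For liability insurance covering automobile liability, product liability, and similar exposures, modifying the fault-based liability principle promotes safety efforts, whereas compulsory coverage and premium rate regulation risk aggravating adverse selection and moral hazard.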

Kyoto Sangyo University Academic Repository / 京都産業大学学術リ

The Impact of Insurance on a Sustainable Society Exposed to Natural Disaster Risks

Author: SCORDIS Nicos A.
諏澤吉彦
Publication venue: 京都産業大学マネジメント研究会
Publication date: 01/09/2014

Natural disasters caused by seismic activity and extreme weather events have an increasingly significant impact. This rise is, at least partly, attributed to global warming and/or economic growth in disaster-prone areas. Despite the encouragement by the Principles for Sustainable Insurance (PSI) suggesting that insurers finance macroeconomic risk, it is challenging for the private market alone to do so. A viable alternative is to finance macroeconomic risk through collaborations between insurers and governments (or other public institutions). We examine model plans of such private-public partnerships currently operating in Asia, North America, and Europe. We identify commonalities in the different plans including coverage limitations, government-sponsored reinsurance, strict rate regulation, and compulsory participation. We conclude that the plans contain features complementary to the insurability-of-risk concept and that they preserve the availability of insurance coverage. These features, however, exacerbate basis risk, encourage excessive development in high-risk locations, and increase the cost of screening uninsured exposures. We also observe that attempts to improve on one attribute of the plan create problems in other attributes. Finally, we offer suggestions for improving the design of public-private insurance plans.

Kyoto Sangyo University Academic Repository / 京都産業大学学術リ

The BIOMarkers in Atopic Dermatitis and Psoriasis (BIOMAP) glossary: developing a lingua franca to facilitate data harmonization and cross‐cohort analyses

Author: Apfelbacher
BIOMAP consortium
Bosma
Broderick
Christian
Dand
Flohr
Ghosh
Hangel
Hubenthal
Middelkamp-Hup
Min Josine L
Musters
Paternoster Lavinia
Rodriguez
Satagopam
Scordis
Smith Catherine
Spuls
Szymczak
Weidinger
Publication venue: 'Wiley'
Publication date: 01/01/2021

Explore Bristol Research

Predicting seizure recurrence after an initial seizure-like episode from routine clinical notes using large language models: A retrospective cohort study

Author: Ali Waqar
Alsentzer Emily
Bartmann Ana Paula
Beaulieu-Jones Brett K.
de Jong Johann
Kohane Isaac
Patra Arijit
Scordis Phil
Villamar Mauricio F.
Wissel Benjamin D.
Publication venue
Publication date: 26/11/2023

Background: The evaluation and management of first-time seizure-like events in children can be difficult because these episodes are not always directly observed and might be epileptic seizures or other conditions (seizure mimics). We aimed to evaluate whether machine learning models using real-world data could predict seizure recurrence after an initial seizure-like event. Methods: This retrospective cohort study compared models trained and evaluated on two separate datasets between Jan 1, 2010, and Jan 1, 2020: electronic medical records (EMRs) at Boston Children's Hospital and de-identified, patient-level, administrative claims data from the IBM MarketScan research database. The study population comprised patients with an initial diagnosis of either epilepsy or convulsions before the age of 21 years, based on International Classification of Diseases, Clinical Modification (ICD-CM) codes. We compared machine learning-based predictive modelling using structured data (logistic regression and XGBoost) with emerging techniques in natural language processing by use of large language models. Findings: The primary cohort comprised 14 021 patients at Boston Children's Hospital matching inclusion criteria with an initial seizure-like event and the comparison cohort comprised 15 062 patients within the IBM MarketScan research database. Seizure recurrence based on a composite expert-derived definition occurred in 57% of patients at Boston Children's Hospital and 63% of patients within IBM MarketScan. Large language models with additional domain-specific and location-specific pre-training on patients excluded from the study (F1-score 0·826 [95% CI 0·817-0·835], AUC 0·897 [95% CI 0·875-0·913]) performed best. All large language models, including the base model without additional pre-training (F1-score 0·739 [95% CI 0·738-0·741], AUROC 0·846 [95% CI 0·826-0·861]) outperformed models trained with structured data. 
With structured data only, XGBoost outperformed logistic regression, and XGBoost models trained with the Boston Children's Hospital EMR (logistic regression: F1-score 0·650 [95% CI 0·643-0·657], AUC 0·694 [95% CI 0·685-0·705], XGBoost: F1-score 0·679 [0·676-0·683], AUC 0·725 [0·717-0·734]) performed similarly to models trained on the IBM MarketScan database (logistic regression: F1-score 0·596 [0·590-0·601], AUC 0·670 [0·664-0·675], XGBoost: F1-score 0·678 [0·668-0·687], AUC 0·710 [0·703-0·714]). Interpretation: Physicians' clinical notes about an initial seizure-like event include substantial signals for prediction of seizure recurrence, and additional domain-specific and location-specific pre-training can significantly improve the performance of clinical large language models, even for specialised cohorts.
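
The F1-scores quoted in this abstract are the harmonic mean of precision and recall. As a minimal sketch of how such a figure is derived from binary predictions, the snippet below computes it by hand. The function name `f1_score_binary` and the label vectors are hypothetical illustration values, not data or code from the study.

```python
def f1_score_binary(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 = recurrence."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels for illustration only (not study data).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
precision, recall, f1 = f1_score_binary(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 ignores true negatives, it can diverge from AUC on imbalanced cohorts, which is one reason abstracts like this one report both.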

Knowledge UChicago

The Molecule Pages database

Author: Attwood
B. Riley
B. Saunders
Bader
Benson
E. Chenette
Finn
Gilman
Letunic
Li
M. Day
Mishra
Mulder
S. Lyon
S. Subramaniam
Scordis
Sonnhammer
Stark
Subramaniam
Publication venue: Oxford University Press
Publication date

The UCSD-Nature Signaling Gateway Molecule Pages (http://www.signaling-gateway.org/molecule) provides essential information on more than 3800 mammalian proteins involved in cellular signaling. The Molecule Pages contain expert-authored and peer-reviewed information based on the published literature, complemented by regularly updated information derived from public data source references and sequence analysis. The expert-authored data includes both a full-text review about the molecule, with citations, and highly structured data for bioinformatics interrogation, including information on protein interactions and states, transitions between states and protein function. The expert-authored pages are anonymously peer reviewed by the Nature Publishing Group. The Molecule Pages data is present in an object-relational database format and is freely accessible to the authors, the reviewers and the public from a web browser that serves as a presentation layer. The Molecule Pages are supported by several applications that along with the database and the interfaces form a multi-tier architecture. The Molecule Pages and the Signaling Gateway are routinely accessed by a very large research community

Crossref

PubMed Central

The PRINTS database: a fine-grained protein sequence annotation and analysis resource—its status in 2012

Author: A. Coletta
A. L. Mitchell
A. Pavlopoulou
A. Theodosiou
Altschul
Apweiler
Attwood
Attwood
Attwood
Attwood
C. Roma-Mateo
Chen
G. Muirhead
Gilks
Henikoff
Huang
I. Popov
Kawamura
Nordle
P. B. Philippou
Roma-Mateo
Schnoes
Scordis
Sonnhammer
T. K. Attwood
Vaughan
Wong
Wright
Publication venue: Oxford University Press
Publication date: 01/01/2012

The PRINTS database, now in its 21st year, houses a collection of diagnostic protein family ‘fingerprints’. Fingerprints are groups of conserved motifs, evident in multiple sequence alignments, whose unique inter-relationships provide distinctive signatures for particular protein families and structural/functional domains. As such, they may be used to assign uncharacterized sequences to known families, and hence to infer tentative functional, structural and/or evolutionary relationships. The February 2012 release (version 42.0) includes 2156 fingerprints, encoding 12 444 individual motifs, covering a range of globular and membrane proteins, modular polypeptides and so on. Here, we report the current status of the database, and introduce a number of recent developments that help both to render a variety of our annotation and analysis tools easier to use and to make them more widely available
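
To make the idea of a fingerprint as an ordered group of conserved motifs concrete, here is a deliberately simplified toy. It is not the PRINTS or FingerPRINTScan scoring scheme: motifs are treated as plain consensus strings matched by fraction of identical positions, and the names `scan_fingerprint` and `identities`, the motifs, the sequence, and the 0.75 cut-off are all hypothetical.

```python
def identities(window, consensus):
    """Count positions where the window and the consensus motif agree."""
    return sum(1 for a, b in zip(window, consensus) if a == b)


def scan_fingerprint(sequence, motifs, min_identity):
    """Greedy left-to-right scan for the motifs of a toy fingerprint.

    For each motif, in order, take the first window at or after the end of
    the previous match whose identity fraction is >= min_identity. A motif
    that is not found is skipped, so partial matches are reported too.
    """
    matched = []
    cursor = 0
    for motif in motifs:
        width = len(motif)
        for start in range(cursor, len(sequence) - width + 1):
            window = sequence[start:start + width]
            if identities(window, motif) >= min_identity * width:
                matched.append((motif, start))
                cursor = start + width  # later motifs must lie downstream
                break
    return matched


# Hypothetical three-motif fingerprint and query sequence.
motifs = ["GDSL", "HKRE", "WYTP"]
sequence = "MAGDSLKKAAHKREQQLLWFTPAA"
hits = scan_fingerprint(sequence, motifs, min_identity=0.75)
print(f"{len(hits)} of {len(motifs)} motifs matched: {hits}")
```

Reporting how many motifs matched, and in what order, is the point of the sketch: a sequence that matches only some motifs may still hint at a more distant relationship.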

Crossref

PubMed Central

The University of Manchester - Institutional Repository

Dokuz Eylul University Research Information System

Clustering of Alzheimer's and Parkinson's disease based on genetic burden of shared molecular mechanisms

Author: Corvol J.C. (Jean Christophe)
Domingo-Fernandez D.
Emon M.A.
Frohlich H.
Heinson A.
Hofmann-Apitius M
Scordis P.
Sood M.
Vrooman H.A. (Henri)
Wu P.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

One of the visions of precision medicine has been to re-define disease taxonomies based on molecular characteristics rather than on phenotypic evidence. However, achieving this goal is highly challenging, specifically in neurology. Our contribution is a machine-learning based joint molecular subtyping of Alzheimer’s (AD) and Parkinson’s Disease (PD), based on the genetic burden of 15 molecular mechanisms comprising 27 proteins (e.g. APOE) that have been described in both diseases. We demonstrate that our joint AD/PD clustering using a combination of sparse autoencoders and sparse non-negative matrix factorization is reproducible and can be associated with significant differences of AD and PD patient subgroups on a clinical, pathophysiological and molecular level. Hence, clusters are disease-associated. To our knowledge this work is the first demonstration of a mechanism based stratification in the field of neurodegenerative diseases. Overall, we thus see this work as an important step towards a molecular mechanism-based taxonomy of neurological disorders, which could help in developing better targeted therapies in the future by going beyond classical phenotype based disease definitions

Southampton (e-Prints Soton)

INRIA a CCSD electronic archive server

HAL Descartes

EUR Research Repository

Erasmus University Digital Repository

Fast index based algorithms and software for matching position specific scoring matrices

Author: A Kel
A Sandelin
B Dorohonceanu
D Weeks
G Castillo
H Gonnet
J Henikoff
J Henikoff
J Kärkkäinen
K Quandt
L Goldstein
LR Murphy
M Abouelhoda
M Beckstette
M Beckstette
M Gribskov
Michael Beckstette
N de Bruijn
N Hulo
P Embrechts
P Haverty
P Scordis
R Giegerich
R Staden
R Tatusov
Robert Giegerich
Robert Homann
S Kurtz
S Kurtz
S Rahmann
S Rajasekaran
Stefan Kurtz
T Kasai
T Li
T Wu
T Wu
TK Attwood
V Freschi
V Matys
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common, but computationally expensive task. RESULTS: We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of PSSMs in large databases. Our approach preprocesses the search space, e.g., a complete genome or a set of protein sequences, and builds an enhanced suffix array that is stored on file. This allows the searching of a database with a PSSM in sublinear expected time. Since ESAsearch benefits from small alphabets, we present a variant operating on sequences recoded according to a reduced alphabet. We also address the problem of non-comparable PSSM-scores by developing a method which allows the efficient computation of a matrix similarity threshold for a PSSM, given an E-value or a p-value. Our method is based on dynamic programming and, in contrast to other methods, it employs lazy evaluation of the dynamic programming matrix. We evaluated algorithm ESAsearch with nucleotide PSSMs and with amino acid PSSMs. Compared to the best previous methods, ESAsearch shows speedups of a factor between 17 and 275 for nucleotide PSSMs, and speedups up to factor 1.8 for amino acid PSSMs. Comparisons with the most widely used programs even show speedups by a factor of at least 3.8. Alphabet reduction yields an additional speedup factor of 2 on amino acid sequences compared to results achieved with the 20 symbol standard alphabet. The lazy evaluation method is also much faster than previous methods, with speedups of a factor between 3 and 330. 
CONCLUSION: Our analysis of ESAsearch reveals sublinear runtime in the expected case, and linear runtime in the worst case for sequences not shorter than $|\mathcal{A}|^m + m - 1$, where $m$ is the length of the PSSM and $\mathcal{A}$ a finite alphabet. In practice, ESAsearch shows superior performance over the most widely used programs, especially for DNA sequences. The new algorithm for accurate on-the-fly calculations of thresholds has the potential to replace formerly used approximation approaches. Beyond the algorithmic contributions, we provide a robust, well documented, and easy to use software package, implementing the ideas and algorithms presented in this manuscript.
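
For readers unfamiliar with PSSM searching, the sketch below shows the baseline task that this abstract speeds up: sliding a scoring matrix along a sequence and reporting windows that reach a threshold. It is not ESAsearch (there is no enhanced suffix array or alphabet reduction here); it is a plain linear scan with a simple lookahead cut-off that abandons a window once the threshold is out of reach. The function name `pssm_matches`, the toy matrix, the sequence, and the threshold are all hypothetical.

```python
def pssm_matches(sequence, pssm, threshold):
    """Return (start, score) for every window scoring >= threshold.

    pssm is a list of dicts, one per motif position, mapping a symbol
    to its score at that position.
    """
    m = len(pssm)
    # suffix_max[d] = best score still obtainable from positions d..m-1
    suffix_max = [0.0] * (m + 1)
    for d in range(m - 1, -1, -1):
        suffix_max[d] = suffix_max[d + 1] + max(pssm[d].values())

    hits = []
    for start in range(len(sequence) - m + 1):
        score = 0.0
        for d in range(m):
            # Symbols missing from the matrix can never match.
            score += pssm[d].get(sequence[start + d], float("-inf"))
            # Lookahead: stop if even a perfect remainder falls short.
            if score + suffix_max[d + 1] < threshold:
                break
        else:
            hits.append((start, score))
    return hits


# Toy nucleotide PSSM of length 3 favouring "ACG" (illustrative scores).
pssm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": 0},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]
for start, score in pssm_matches("TTACGACGATG", pssm, threshold=5):
    print(start, score)
```

This scan touches every window, so it is linear in the sequence length at best; the index-based approach described in the abstract is what allows sublinear expected time.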

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University