151 research outputs found

    Information Is Selection-A Review of Basics Shows Substantial Potential for Improvement of Digital Information Representation

    Get PDF
    Any piece of information is a selection from a set of possibilities. In this paper, this set is called a "domain". Digital information consists of number sequences, which are selections from a domain. At present, these number sequences are defined contextually in a very variable way, which impairs their comparability. Therefore, global uniformly defined "domain vectors" (DVs), with a structure containing a "Uniform Locator" ("UL"), referred to as "UL plus number sequence", are proposed. The "UL" is an efficient global pointer to the uniform online definition of the subsequent number sequence. DVs are globally defined, identified, comparable, and searchable by criteria which users can define online. In medicine, for example, patients, doctors, and medical specialists can define DVs online and can, therefore, form global criteria which are important for certain diagnoses. This allows for the immediate generation of precise diagnostic specific statistics of "similar medical cases", in order to discern the best therapy. The introduction of a compact DV data structure may substantially improve the digital representation of medical information

    Network-driven strategies to integrate and exploit biomedical data

    Get PDF
    [eng] In the quest for understanding complex biological systems, the scientific community has been delving into protein, chemical and disease biology, populating biomedical databases with a wealth of data and knowledge. Currently, the field of biomedicine has entered a Big Data era, in which computational-driven research can largely benefit from existing knowledge to better understand and characterize biological and chemical entities. And yet, the heterogeneity and complexity of biomedical data trigger the need for a proper integration and representation of this knowledge, so that it can be effectively and efficiently exploited. In this thesis, we aim at developing new strategies to leverage the current biomedical knowledge, so that meaningful information can be extracted and fused into downstream applications. To this goal, we have capitalized on network analysis algorithms to integrate and exploit biomedical data in a wide variety of scenarios, providing a better understanding of pharmacoomics experiments while helping accelerate the drug discovery process. More specifically, we have (i) devised an approach to identify functional gene sets associated with drug response mechanisms of action, (ii) created a resource of biomedical descriptors able to anticipate cellular drug response and identify new drug repurposing opportunities, (iii) designed a tool to annotate biomedical support for a given set of experimental observations, and (iv) reviewed different chemical and biological descriptors relevant for drug discovery, illustrating how they can be used to provide solutions to current challenges in biomedicine.[cat] En la cerca d’una millor comprensió dels sistemes biològics complexos, la comunitat científica ha estat aprofundint en la biologia de les proteïnes, fàrmacs i malalties, poblant les bases de dades biomèdiques amb un gran volum de dades i coneixement. En l’actualitat, el camp de la biomedicina es troba en una era de “dades massives” (Big Data), on la investigació duta a terme per ordinadors se’n pot beneficiar per entendre i caracteritzar millor les entitats químiques i biològiques. No obstant, la heterogeneïtat i complexitat de les dades biomèdiques requereix que aquestes s’integrin i es representin d’una manera idònia, permetent així explotar aquesta informació d’una manera efectiva i eficient. L’objectiu d’aquesta tesis doctoral és desenvolupar noves estratègies que permetin explotar el coneixement biomèdic actual i així extreure informació rellevant per aplicacions biomèdiques futures. Per aquesta finalitat, em fet servir algoritmes de xarxes per tal d’integrar i explotar el coneixement biomèdic en diferents tasques, proporcionant un millor enteniment dels experiments farmacoòmics per tal d’ajudar accelerar el procés de descobriment de nous fàrmacs. Com a resultat, en aquesta tesi hem (i) dissenyat una estratègia per identificar grups funcionals de gens associats a la resposta de línies cel·lulars als fàrmacs, (ii) creat una col·lecció de descriptors biomèdics capaços, entre altres coses, d’anticipar com les cèl·lules responen als fàrmacs o trobar nous usos per fàrmacs existents, (iii) desenvolupat una eina per descobrir quins contextos biològics corresponen a una associació biològica observada experimentalment i, finalment, (iv) hem explorat diferents descriptors químics i biològics rellevants pel procés de descobriment de nous fàrmacs, mostrant com aquests poden ser utilitzats per trobar solucions a reptes actuals dins el camp de la biomedicina

    Efficient search and comparison algorithms for 3D protein binding site retrieval and structure alignment from large-scale databases

    Get PDF
    Finding similar 3D structures is crucial for discovering potential structural, evolutionary, and functional relationships among proteins. As the number of known protein structures has dramatically increased, traditional methods can no longer provide the life science community with the adequate informatics capability needed to conduct large-scale and complex analyses. A suite of high-throughput and accurate protein structure search and comparison methods is essential. To meet the needs of the community, we develop several bioinformatics methods for protein binding site comparison and global structure alignment. First, we developed an efficient protein binding site search that is based on extracting geometric features both locally and globally. The main idea of this work was to capture spatial relationships among landmarks of binding site surfaces and bfuild a vocabulary of visual words to represent the characteristics of the surfaces. A vector model was then used to speed up the search of similar surfaces that share similar visual words with the query interface. Second, we developed an approach for accurate protein binding site comparison. Our algorithm provides an accurate binding site alignment by applying a two-level heuristic process which progressively refines alignment results from coarse surface point level to accurate residue atom level. This setting allowed us to explore different combinations of pairs of corresponding residues, thus improving the alignment quality of the binding site surfaces. Finally, we introduced a parallel algorithm for global protein structure alignment. Specifically, to speed up the time-consuming structure alignment process of protein 3D structures, we designed a parallel protein structure alignment framework to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, the framework is capable of parallelizing traditional structure alignment algorithms. Our findings can be applied in various research areas, such as prediction of protein inte

    Three Essays on Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing and Text Mining

    Get PDF
    Patient recruitment and enrollment are critical factors for a successful clinical trial; however, recruitment tends to be the most common problem in most clinical trials. The success of a clinical trial depends on efficiently recruiting suitable patients to conduct the trial. Every clinical trial research has a protocol, which describes what will be done in the study and how it will be conducted. Also, the protocol ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of clinical trial protocols is important because it specifies the necessary conditions that participants have to satisfy. Since clinical trial eligibility criteria are usually written in free text form, they are not computer interpretable. To automate the analysis of the eligibility criteria, it is therefore necessary to transform those criteria into a computer-interpretable format. Unstructured format of eligibility criteria additionally create search efficiency issues. Thus, searching and selecting appropriate clinical trials for a patient from relatively large number of available trials is a complex task. A few attempts have been made to automate the matching process between patients and clinical trials. However, those attempts have not fully integrated the entire matching process and have not exploited the state-of-the-art Natural Language Processing (NLP) techniques that may improve the matching performance. Given the importance of patient recruitment in clinical trial research, the objective of this research is to automate the matching process using NLP and text mining techniques and, thereby, improve the efficiency and effectiveness of the recruitment process. This dissertation research, which comprises three essays, investigates the issues of clinical trial subject recruitment using state-of-the-art NLP and text mining techniques. Essay 1: Building a Domain-Specific Lexicon for Clinical Trial Subject Eligibility Analysis Essay 2: Clustering Clinical Trials Using Semantic-Based Feature Expansion Essay 3: An Automatic Matching Process of Clinical Trial Subject Recruitment In essay1, I develop a domain-specific lexicon for n-gram Named Entity Recognition (NER) in the breast cancer domain. The domain-specific dictionary is used for selection and reduction of n-gram features in clustering in eassy2. The domain-specific dictionary was evaluated by comparing it with Systematized Nomenclature of Medicine--Clinical Terms (SNOMED CT). The results showed that it add significant number of new terms which is very useful in effective natural language processing In essay 2, I explore the clustering of similar clinical trials using the domain-specific lexicon and term expansion using synonym from the Unified Medical Language System (UMLS). I generate word n-gram features and modify the features with the domain-specific dictionary matching process. In order to resolve semantic ambiguity, a semantic-based feature expansion technique using UMLS is applied. A hierarchical agglomerative clustering algorithm is used to generate clinical trial clusters. The focus is on summarization of clinical trial information in order to enhance trial search efficiency. Finally, in essay 3, I investigate an automatic matching process of clinical trial clusters and patient medical records. The patient records collected from a prior study were used to test our approach. The patient records were pre-processed by tokenization and lemmatization. The pre-processed patient information were then further enhanced by matching with breast cancer custom dictionary described in essay 1 and semantic feature expansion using UMLS Metathesaurus. Finally, I matched the patient record with clinical trial clusters to select the best matched cluster(s) and then with trials within the clusters. The matching results were evaluated by internal expert as well as external medical expert
    corecore