193 research outputs found

    Bringing named entity recognition on Drupal content management system

    Published in "8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014)". Content management systems and frameworks (CMS/F) play a key role in Web development. They support common Web operations and provide a number of optional modules to implement customized functionalities. Given the increasing demand for text mining (TM) applications, it seems logical that CMS/F extend their offer of TM modules. In this regard, this work contributes to Drupal CMS/F with modules that support customized named entity recognition and enable the construction of domain-specific document search engines. Implementation relies on well-recognized Apache Information Retrieval and TM initiatives, namely Apache Lucene, Apache Solr and Apache Unstructured Information Management Architecture (UIMA). As proof of concept, we present here the development of a Drupal CMS/F that retrieves biomedical articles and performs automatic recognition of organism names to enable further organism-driven document screening.

    Evaluation of linear classifiers on articles containing pharmacokinetic evidence of drug-drug interactions

    Background. Drug-drug interaction (DDI) is a major cause of morbidity and mortality. [...] Biomedical literature mining can aid DDI research by extracting relevant DDI signals from either the published literature or large clinical databases. However, though drug interaction is an ideal area for translational research, the inclusion of literature mining methodologies in DDI workflows is still very preliminary. One area that can benefit from literature mining is the automatic identification of a large number of potential DDIs, whose pharmacological mechanisms and clinical significance can then be studied via in vitro pharmacology and in populo pharmaco-epidemiology. Experiments. We implemented a set of classifiers for identifying published articles relevant to experimental pharmacokinetic DDI evidence. These documents are important for identifying causal mechanisms behind putative drug-drug interactions, an important step in the extraction of large numbers of potential DDIs. We evaluate performance of several linear classifiers on PubMed abstracts, under different feature transformation and dimensionality reduction methods. In addition, we investigate the performance benefits of including various publicly available named entity recognition features, as well as a set of internally developed pharmacokinetic dictionaries. Results. We found that several classifiers performed well in distinguishing relevant and irrelevant abstracts. We found that the combination of unigram and bigram textual features gave better performance than unigram features alone, and also that normalization transforms that adjusted for feature frequency and document length improved classification. For some classifiers, such as linear discriminant analysis (LDA), proper dimensionality reduction had a large impact on performance. Finally, the inclusion of NER features and dictionaries was found not to help classification. Comment: Pacific Symposium on Biocomputing, 201
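The feature setup described in this abstract (unigrams plus bigrams, with a transform that adjusts for feature frequency and document length) can be illustrated with a minimal sketch. The abstract does not say which transforms were used, so the TF-IDF weighting and L2 length normalization below are assumptions chosen as one common instance, and the function names `ngrams`, `extract_features` and `tfidf_l2` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Count unigram and bigram features of a whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, 1) + ngrams(tokens, 2))


def tfidf_l2(docs):
    """Weight counts by a smoothed IDF, then scale each document to unit L2 norm.

    Assumption: TF-IDF and L2 scaling stand in for the unspecified
    frequency and document-length normalization mentioned in the abstract.
    """
    counts = [extract_features(d) for d in docs]
    n_docs = len(counts)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each feature counted once per document
    vectors = []
    for c in counts:
        weighted = {
            t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in weighted.values()))
        vectors.append({t: w / norm for t, w in weighted.items()} if norm else {})
    return vectors


vectors = tfidf_l2(["drug a inhibits drug b", "drug c"])
```

The resulting sparse vectors could then be passed to any linear classifier; the choice of classifier is left open here, as it is in the abstract.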

    A rational framework for production decision making in blood establishments

    SAD_BaSe is a blood bank data analysis software, created to assist in the management of blood donations and the blood production chain in blood establishments. In particular, the system keeps track of several collection and production indicators, and enables both the definition of collection and production strategies and the measurement of quality indicators required by the Quality Management System regulating the general operation of blood establishments. This paper describes the general scenario of blood establishments and their main requirements in terms of data management and analysis. It presents the architecture of SAD_BaSe and identifies its main contributions. Specifically, it brings forward the generation of customized reports driven by decision-making needs and the use of data mining techniques in the analysis of donor suspensions and donation discards.

    A multilayered graph-based framework to explore behavioural phenomena in social media conversations

    Objective: Social media is part of current health communications. This research aims to delve into the effects of social contagion, biased assimilation, and homophily in building and changing health opinions on social media. Materials and Methods: Conversations about COVID-19 vaccination on English and Spanish Twitter are the case studies. A new multilayered graph-based framework supports the integrated analysis of content similarity within and across posts, users, and conversations to interpret contrasting and confluent user stances. Deep learning models are applied to infer stance. Graph centrality and homophily scores support the interpretation of information reproduction. Results: The results show that semantically related English posts tend to present a similar stance about COVID-19 vaccination (r_stance = 0.51) whereas Spanish posts are more heterophilic (r_stance = 0.38). Neither case showed evidence of homophily regarding user influence or vaccine hashtags. Graph filters for Pfizer and Astrazeneca with a similarity threshold of 0.85 show stance homophily in English scenarios (i.e. r_stance = 0.45 and r_stance = 0.58, respectively) and small homophily in Spanish scenarios (i.e. r = 0.12 and r = 0.3, respectively). Highly connected users are a minority and are not socially influential. Spanish conversations showed stance homophily, i.e. most of the connected conversations promote vaccination (r_stance = 0.42), whereas English conversations are more likely to offer contrasting stances. Conclusion: The methodology proposed for quantifying the impact of natural and intentional social behaviours in health information reproduction can be applied to any of the main social platforms and any given topic of conversation. Its effectiveness was demonstrated by two case studies describing English and Spanish demographic and sociocultural scenarios. This study was supported by MCIN/AEI/ 10.13039/501100011033 under the scope of the CURMIS4th project (Grant PID2020–113673RB-I00), the Consellería de Educación, Universidades e Formación Profesional (Xunta de Galicia) under the scope of the strategic funding of ED431C2018/55-GRC Competitive Reference Group, the “Centro singular de investigación de Galicia” (accreditation 2019–2022), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UIDB/04469/2020 unit. SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) from the University of Vigo for hosting its IT infrastructure. Funding for open access charge: Universidade de Vigo/CISUG.

    SAD_BaSe: a blood bank data analysis software

    Published in "6th International Conference on Practical Applications of Computational Biology & Bioinformatics (ISBN: 978-3-642-28838-8)". The main goal of this project was to build a Web-based information system, SAD_BaSe, that monitors blood donations and the blood production chain in a user-friendly way. In particular, the system keeps track of several data indicators and supports their analysis, enabling the definition of collection and production strategies and the measurement of quality indicators required by the Quality Management System of blood establishments. Data mining supports the analysis of donor eligibility criteria.

    Using text mining techniques for classical music scores analysis

    Music Classification is a particular area of Computational Musicology that provides valuable insights about the evolution of composition patterns and assists in catalogue generation. The proposed work departs from earlier work by classifying music based on music score information. Text Mining techniques support music score processing while Classification techniques are used in the construction of decision models. Although research is still at an early stage, the work already provides valuable contributions to symbolic music representation processing and subsequent analysis. Score processing involved the counting of ascending and descending chromatic intervals, note duration and meta-information tagging. Analysis involved feature selection and the evaluation of several data mining algorithms, ensuring extensibility towards larger repositories or more complex problems. Experiments report the analysis of composition epochs on a subset of the Mutopia project open archive of classical LilyPond-annotated music scores.
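The interval-counting step mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the original work processes LilyPond-annotated scores, whereas the sketch assumes the pitches have already been converted to MIDI-style semitone numbers, and the function name `count_intervals` is hypothetical.

```python
from collections import Counter


def count_intervals(pitches):
    """Count ascending, descending and repeated intervals in a pitch sequence.

    Assumption: pitches are integers in semitones (MIDI-style numbers),
    so the difference between consecutive notes is a chromatic interval.
    """
    counts = Counter()
    for prev, curr in zip(pitches, pitches[1:]):
        step = curr - prev
        if step > 0:
            counts[("ascending", step)] += 1
        elif step < 0:
            counts[("descending", -step)] += 1
        else:
            counts[("repeat", 0)] += 1
    return counts


# C4, D4, E4, D4, C4, C4 expressed as MIDI note numbers
melody = [60, 62, 64, 62, 60, 60]
features = count_intervals(melody)
```

Counts of this kind could serve as one group of features per score, alongside note durations and metadata tags, before feature selection and classification.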

    Evaluating web site structure based on navigation profiles and site topology

    This work aims at pointing out the benefits of a topology-oriented, wide-scope but differentiated profile analysis. The goal was to reconcile advanced common website usage profiling techniques with the analysis of the website's topology information, outputting valuable knowledge in an intuitive and comprehensible way. Server load balancing, crawler activity evaluation and Web site restructuring are the primary analysis concerns and, in this regard, experiments over six months of data from a real-world Web site were considered successful. Fundação para a Ciência e a Tecnologia (FCT).

    Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain

    Scientific publications are the main vehicle to disseminate information in the field of biotechnology for wastewater treatment. Indeed, the new research paradigms and the application of high-throughput technologies have increased the rate of publication considerably. The problem is that manual curation becomes harder, error-prone and time-consuming, leading to a probable loss of information and inefficient knowledge acquisition. As a result, research outputs rarely reach engineers, hampering the calibration of mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we have developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. A workflow was built to process wastewater-related articles with the main goal of identifying physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations in the overall process, and presents the results obtained for a corpus of 50 full-text documents.
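To make the idea of extracting numerical parameters from text concrete, here is a minimal sketch based on a regular expression. It is not the workflow described in the abstract, whose components are not detailed here; the unit list, the pattern and the function name `extract_parameters` are all illustrative assumptions.

```python
import re

# Hypothetical, deliberately small unit list; a real workflow would need a
# curated dictionary of physico-chemical units.
UNITS = ["mg/L", "g/L", "°C", "days", "h"]

PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>"
    + "|".join(re.escape(u) for u in UNITS)
    + r")(?!\w)"  # the unit must not run on into a longer word
)


def extract_parameters(sentence):
    """Return (value, unit) pairs for every number followed by a known unit."""
    return [
        (float(m.group("value")), m.group("unit"))
        for m in PATTERN.finditer(sentence)
    ]


sentence = "The reactor ran at 35 °C with a COD of 2500 mg/L and an HRT of 12 h."
print(extract_parameters(sentence))
```

A pattern like this only finds value and unit pairs; linking each pair to the parameter it measures (for example COD or HRT) would need additional steps that this sketch does not attempt.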

    The in vitro and the in silico power couple: facilitating the discovery of novel anti infective strategies based on antimicrobial peptides and quorum sensing inhibitors

    Book of Abstracts of CEB Annual Meeting 2017. [Excerpt] The persistent growth of antibiotic resistance and the resilience of biofilm-related infections are pressing researchers to develop novel strategies to control infectious diseases. New antimicrobial strategies, namely concerning the use of i) antimicrobial peptides (AMP) (natural compounds with alternative mechanisms of action); ii) quorum-sensing inhibitors (QSI) (destabilisers of key communication mechanisms that regulate virulence and biofilm formation); and iii) antimicrobial combinations (which can lower effective concentrations and achieve synergy), can lead to more effective therapeutics for this ever-growing, world-wide problem. [...]

    ProFuelDB : an open-access database of physiological properties of biofuel-producing anaerobic prokaryotes

    Anaerobic microorganisms are attractive for the synthesis of fuels and chemicals, but information on the physiology of culturable anaerobes is dispersed in scientific literature. Here we present ProFuelDB, a web-based, publicly available database prototype compiling information on the physiology of anaerobic prokaryotes with relevance for biofuel production. It is foreseen that this prototype will evolve into a broader database of physiological properties of anaerobic prokaryotes with biotechnological relevance.