
    A multilayer network approach for guiding drug repositioning in neglected diseases

    Drug development for neglected diseases has historically been hampered by a lack of market incentives. The advent of public-domain resources containing chemical information from high-throughput screenings is changing the landscape of drug discovery for these diseases. In this work we took advantage of data from extensively studied organisms such as human, mouse, E. coli and yeast, among others, to develop a novel integrative network model to prioritize and identify candidate drug targets in neglected-pathogen proteomes, as well as bioactive drug-like molecules. We modeled genomic (protein) and chemical (bioactive compound) data as a multilayer weighted network graph that takes advantage of bioactivity data across 221 species, chemical similarities between 1.7 × 10⁵ compounds, and several functional relations among 1.67 × 10⁵ proteins. These relations comprised orthology, sharing of protein domains, and shared participation in defined biochemical pathways. We showcase the application of this network graph to the prioritization of new candidate targets, based on the information available in the graph for known compound-target associations. We validated this strategy by performing a cross-validation procedure for known mouse and Trypanosoma cruzi targets and showed that our approach outperforms classic alignment-based approaches. Moreover, our model provides additional flexibility, as two different network definitions could be considered, finding in both cases qualitatively different but sensible candidate targets. We also showcase the application of the network to suggest targets for orphan compounds that are active against Plasmodium falciparum in high-throughput screens. In this case our approach provided a reduced prioritization list of target proteins for the query molecules and showed the ability to propose new testable hypotheses for each compound. Moreover, we found that some predictions highlighted by our network model were supported by independent experimental validations found post facto in the literature.
    Fil: Berenstein, Ariel José. Fundación Instituto Leloir; Argentina. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Física; Argentina
    Fil: Magariños, María Paula. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); Argentina
    Fil: Chernomoretz, Ariel. Fundación Instituto Leloir; Argentina. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Física; Argentina
    Fil: Fernandez Aguero, Maria Jose. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); Argentina
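    The prioritization idea above can be sketched in a few lines: proteins and compounds live in one weighted graph whose edges encode chemical similarity, orthology/domain/pathway relations, and known bioactivities, and candidate targets for a query compound are ranked by their network proximity to it. The node names, weights, and the use of personalized PageRank below (via networkx) are illustrative assumptions, not the authors' exact scoring scheme.

```python
# Minimal sketch of a multilayer compound-protein network and target ranking.
# Node names, edge weights, and personalized PageRank are illustrative stand-ins.
import networkx as nx

G = nx.Graph()
# Chemical layer: compound-compound edges weighted by chemical similarity.
G.add_edge("cmpd_A", "cmpd_B", weight=0.8)
# Genomic layer: protein-protein edges from orthology, shared domains, or pathways.
G.add_edge("prot_Tc1", "prot_Mm1", weight=1.0)   # orthology
G.add_edge("prot_Tc1", "prot_Tc2", weight=0.5)   # shared pathway
# Cross-layer edges: known compound-target bioactivities.
G.add_edge("cmpd_A", "prot_Mm1", weight=1.0)

def rank_targets(graph, query_compound, top_k=5):
    """Rank protein nodes by weighted-network proximity to a query compound."""
    scores = nx.pagerank(graph, personalization={query_compound: 1.0}, weight="weight")
    proteins = [(n, s) for n, s in scores.items() if n.startswith("prot_")]
    return sorted(proteins, key=lambda x: x[1], reverse=True)[:top_k]

print(rank_targets(G, "cmpd_B"))  # candidate targets for an orphan compound
```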

    The Immune Epitope Database and Analysis Resource Program 2003–2018: reflections and outlook

    The Immune Epitope Database and Analysis Resource (IEDB) contains information related to antibodies and T cells across an expansive scope of research fields (infectious diseases, allergy, autoimmunity, and transplantation). Capture and representation of the data to reflect growing scientific standards and techniques have required continual refinement of our rigorous curation, query, and reporting processes, beginning with the automated classification of over 28 million PubMed abstracts and resulting in easily searchable data from over 20,000 published manuscripts. Data related to MHC binding and elution, nonpeptidics, natural processing, receptors, and 3D structure are first captured through manual curation and subsequently maintained through recuration to reflect evolving scientific standards. Upon promotion to the free, public database, users can query and export records of specific relevance via the online web portal, which undergoes iterative development to best enable efficient data access. In parallel, the companion Analysis Resource site hosts a variety of tools that assist in the bioinformatic analyses of epitopes and related structures, which can be applied to IEDB-derived and independent datasets alike. Available tools are classified into two categories: analysis and prediction. Analysis tools include epitope clustering, sequence conservancy, and more, while prediction tools cover T and B cell epitope binding, immunogenicity, and TCR/BCR structures. In addition to these tools, benchmarking servers that allow for unbiased performance comparison are also offered. In order to expand and support the user base of both the database and the Analysis Resource, the research team actively engages in community outreach through publication of ongoing work, conference attendance and presentations, hosting of user workshops, and the provision of online help. This review provides a description of the IEDB database infrastructure, curation and recuration processes, query and reporting capabilities, the Analysis Resource, and our community outreach efforts, including an assessment of the impact of the IEDB across the research community.
    Fil: Martini, Sheridan. La Jolla Institute for Allergy and Immunology; Estados Unidos
    Fil: Nielsen, Morten. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina. Technical University of Denmark; Dinamarca
    Fil: Peters, Bjoern. La Jolla Institute for Allergy and Immunology; Estados Unidos. University of California at San Diego; Estados Unidos
    Fil: Sette, Alessandro. La Jolla Institute for Allergy and Immunology; Estados Unidos. University of California at San Diego; Estados Unidos
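    As an illustration of one of the analysis concepts listed above, the sketch below computes a simple epitope sequence conservancy: the fraction of protein sequences containing a window that matches the epitope at or above an identity threshold. It is a simplified stand-in for the idea behind the IEDB conservancy analysis, not the Analysis Resource's actual implementation, and the sequences are invented.

```python
# Simplified epitope conservancy: fraction of sequences with a window matching
# the epitope at >= min_identity. Illustrative only; not the IEDB tool itself.
def best_window_identity(epitope, sequence):
    n = len(epitope)
    best = 0.0
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        best = max(best, sum(a == b for a, b in zip(epitope, window)) / n)
    return best

def conservancy(epitope, sequences, min_identity=1.0):
    hits = sum(best_window_identity(epitope, s) >= min_identity for s in sequences)
    return hits / len(sequences)

toy_sequences = ["MKTLLVAAGF", "MKTILVAASF", "MQTLLVGAGF"]  # invented sequences
print(conservancy("TLLVA", toy_sequences, min_identity=0.8))
```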

    Voice-over documentaries: synchronization techniques in English-Spanish and English-Polish translation

    This thesis discusses a fundamental aspect of voice-over translation in documentaries: synchronization techniques. The analysis is based on the classification of different types of synchrony (action synchrony, kinetic synchrony, voice-over isochrony, and literal synchrony) proposed by Franco, Matamala, and Orero (2010). The aim of this study is to provide insight into why and when different synchronies may be triggered, whether by textual and/or extralinguistic factors, in filmic material. The analysis is thus based on a comparison of the Spanish and Polish voice-over translations of the same documentary episode of the series Seconds from Disaster, originally produced in English. This qualitative comparison demonstrates how the existing asymmetries between voice-over practices in Spanish and in Polish influence the synchronization process and, consequently, the reception of the film.

    Constructing a semantic predication gold standard from the biomedical literature

    Background: Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology.
    Results: We obtain fair to moderate interannotator agreement in the practice phase (0.378-0.475). With improved guidelines and additional semantic equivalence criteria, the agreement increases by 12% (0.415 to 0.536) in the main annotation phase. In addition, we find that agreement increases to 0.688 when the agreement calculation is limited to those predications that are based only on the explicitly provided UMLS concepts and relations.
    Conclusions: While interannotator agreement in the practice phase confirms that conceptual annotation is a challenging task, the increasing agreement in the main annotation phase points out that an acceptable level of agreement can be achieved in multiple iterations, by setting stricter guidelines and establishing semantic equivalence criteria. Mapping text to ontological concepts emerges as the main challenge in conceptual annotation. Annotating predications involving biomolecular entities and processes is particularly challenging. While the resulting gold standard is mainly intended to serve as a test collection for our semantic interpreter, we believe that the lessons learned are applicable generally.
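    For readers unfamiliar with how agreement figures like those above are obtained, the sketch below computes Cohen's kappa for two annotators over a set of candidate predications using scikit-learn. The labels are invented, and the specific chance-corrected agreement statistic used in the study may differ.

```python
# Illustrative interannotator agreement on candidate predications.
# 1 = annotator judges that the predication holds in the sentence, 0 = it does not.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
annotator_b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]

print(f"Cohen's kappa: {cohen_kappa_score(annotator_a, annotator_b):.3f}")
```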

    CFAR Ship Detection in Polarimetric Synthetic Aperture Radar Images Based on Whitening Filter

    The polarimetric whitening filter (PWF) can be used to filter polarimetric synthetic aperture radar (PolSAR) images to improve the contrast between ships and the sea-clutter background; the output of the filter can therefore be used to detect ships. This paper deals with setting the detection threshold over PolSAR images filtered by the PWF. The two-parameter constant false alarm rate (2P-CFAR) detector is a common detection method for whitened polarimetric images. It assumes that the probability density function (PDF) of the filtered image intensity follows a log-normal distribution. However, this assumption does not always hold. In this paper, we propose a systematic analytical framework for CFAR algorithms based on the PWF or the multi-look PWF (MPWF). The framework covers the entire log-cumulant space in terms of the textural distributions in the product model, including the constant, gamma, inverse gamma, Fisher, beta, inverse beta, and generalized gamma distributions (GΓDs). We derive the analytical form of the PDF for each of the textural distributions, as well as the probability of false alarm (PFA). Finally, the threshold is derived by fixing the false alarm rate (FAR). Experimental results using both simulated and real data demonstrate that the derived expressions and CFAR algorithms are valid and robust.
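    Under the log-normal assumption mentioned above, the 2P-CFAR threshold has a closed form: if ln(x) ~ N(mu, sigma^2), then PFA = 1 - Phi((ln T - mu) / sigma), so T = exp(mu + sigma * Phi^{-1}(1 - PFA)). The sketch below estimates mu and sigma from PWF-filtered clutter samples and fixes the threshold for a chosen FAR; the clutter is simulated, and the paper's thresholds for the other textural distributions are derived analogously but not reproduced here.

```python
# 2P-CFAR threshold for PWF-filtered intensity under the log-normal assumption.
# Simulated clutter only; the other textural distributions are not covered here.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
clutter = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)  # stand-in for PWF output

def lognormal_cfar_threshold(samples, pfa):
    log_s = np.log(samples)
    mu, sigma = log_s.mean(), log_s.std(ddof=1)
    return np.exp(mu + sigma * norm.ppf(1.0 - pfa))

T = lognormal_cfar_threshold(clutter, pfa=1e-3)
print(f"threshold = {T:.3f}, empirical FAR = {np.mean(clutter > T):.2e}")
```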

    Entity-centric knowledge discovery for idiosyncratic domains

    Technical and scientific knowledge is produced at an ever-accelerating pace, leading to increasing issues when trying to automatically organize or process it, e.g., when searching for relevant prior work. Knowledge can today be produced both in unstructured (plain text) and structured (metadata or linked data) forms. However, unstructured content is still the most dominant form used to represent scientific knowledge. In order to facilitate the extraction and discovery of relevant content, new automated and scalable methods for processing, structuring and organizing scientific knowledge are called for. In this context, a number of applications are emerging, ranging from Named Entity Recognition (NER) and Entity Linking tools for scientific papers to specific platforms leveraging information extraction techniques to organize scientific knowledge. In this thesis, we tackle the tasks of Entity Recognition, Disambiguation and Linking in idiosyncratic domains with an emphasis on scientific literature. Furthermore, we study the related task of co-reference resolution with a specific focus on named entities. We start by exploring Named Entity Recognition, a task that aims to identify the boundaries of named entities in textual content. We propose a new method to generate candidate named entities based on n-gram collocation statistics and design several entity recognition features to further classify them. In addition, we show how external knowledge bases (either domain-specific like DBLP or generic like DBPedia) can be leveraged to improve the effectiveness of NER for idiosyncratic domains. Subsequently, we move to Entity Disambiguation, which is typically performed after entity recognition in order to link an entity to a knowledge base. We propose novel semi-supervised methods for word disambiguation leveraging the structure of a community-based ontology of scientific concepts. Our approach exploits the graph structure that connects different terms and their definitions to automatically identify the correct sense that was originally picked by the authors of a scientific publication. We then turn to co-reference resolution, a task that aims to identify entities that appear in various forms throughout a text. We propose an approach to typing entities that leverages an inverted index built on top of a knowledge base and subsequently re-assigns entities based on the semantic relatedness of the introduced types. Finally, we describe an application whose goal is to help researchers discover and manage scientific publications. In that context, we focus on the problem of selecting relevant tags to organize collections of research papers. We experimentally demonstrate that using a community-authored ontology together with information about the position of the concepts in the documents makes it possible to significantly increase the precision of tag selection over standard methods.
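    The collocation idea behind candidate named-entity generation can be sketched with pointwise mutual information (PMI) over bigrams: word pairs that co-occur far more often than chance are kept as candidates for later classification. The toy corpus, tokenization, and PMI threshold below are assumptions; the thesis combines such statistics with additional recognition features.

```python
# Bigram collocation candidates via pointwise mutual information (PMI).
# Toy corpus and threshold are illustrative only.
import math
from collections import Counter

corpus = [
    "support vector machines classify documents",
    "support vector machines are supervised models",
    "named entity recognition finds entity mentions",
    "entity recognition uses support vector machines",
]

tokens = [t for sentence in corpus for t in sentence.split()]
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))  # note: ignores sentence boundaries
n = len(tokens)

def pmi(w1, w2):
    p_xy = bigrams[(w1, w2)] / (n - 1)
    return math.log2(p_xy / ((unigrams[w1] / n) * (unigrams[w2] / n)))

candidates = [(b, pmi(*b)) for b, count in bigrams.items() if count >= 2]
for bigram, score in sorted(candidates, key=lambda x: -x[1]):
    if score > 1.0:  # assumed PMI cut-off for candidate entities
        print(bigram, round(score, 2))
```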

    Investigating the Quality Aspects of Crowd-Sourced Developer Forum: A Case Study of Stack Overflow

    Technical question and answer (Q&A) websites have changed how developers seek information on the web and have grown in popularity due to the shortcomings of official documentation and alternative knowledge-sharing resources. Stack Overflow (SO) is one of the largest and most popular online Q&A websites for developers, where they can share knowledge by answering questions and learn new skills by asking questions. Unfortunately, a large number of questions (up to 29%) are not answered at all, which might hurt the quality or purpose of this community-oriented knowledge base. In this thesis, we first attempt to detect potentially unanswered questions at submission time using machine learning models. We compare unanswered and answered questions quantitatively and qualitatively. The quantitative analysis suggests that the topics discussed in a question, the experience of the question submitter, and the readability of the question text often determine whether a question will be answered. Our qualitative study also reveals why questions remain unanswered, which could guide novice users in improving their questions. While analyzing SO questions, we observe that many of them remain unanswered and unresolved because they contain code segments with potential programming issues (e.g., errors, unexpected behavior); unfortunately, these issues cannot always be reproduced by other users. This irreproducibility might prevent SO questions from receiving answers, or appropriate answers. In our second study, we thus conduct an exploratory study on the reproducibility of the issues discussed in questions and the correlation between a question's issue-reproducibility status and corresponding answer metadata, such as the presence of an accepted answer. According to our analysis, a question with reproducible issues has at least a three-times-higher chance of receiving an accepted answer than a question with irreproducible issues. Users can improve the quality of questions and answers by editing; unfortunately, such edits may be rejected (i.e., rolled back) due to undesired modifications and ambiguities. We thus offer a comprehensive overview of the reasons for and ambiguities in SO rollback edits. We identify 14 reasons for rollback edits and eight ambiguities that are often present in those edits, and we develop algorithms to detect the ambiguities automatically. During these studies, we find that about half of the questions that received working solutions have negative scores, and about 18% of accepted answers do not receive maximum votes. Furthermore, many users complain about the downvotes cast on their questions and answers. These findings cast serious doubt on the reliability of the evaluation mechanism employed at SO. We thus focus on SO's assessment mechanism with the aim of ensuring an unbiased, reliable quality assessment. This study compares the subjective assessment of questions with their objective assessment using 2.5 million questions and ten text analysis metrics. We also develop machine learning models to classify promoted and discouraged questions and to predict them at submission time. We believe that the findings from our studies and the proposed techniques have the potential to (1) help users ask better questions with appropriate code examples, and (2) improve the editing and assessment mechanisms of SO to promote better content quality.
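    The prediction step described above can be illustrated with a small scikit-learn pipeline over question features such as length, readability, and asker reputation. The features, synthetic data, and model choice are placeholders for illustration, not the thesis's actual feature set or models.

```python
# Illustrative "will this question be answered?" classifier on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 500
# Columns: title length, body length, readability score, asker reputation.
X = np.column_stack([
    rng.integers(20, 150, n),
    rng.integers(100, 3000, n),
    rng.uniform(0, 100, n),
    rng.integers(1, 10_000, n),
])
# Synthetic label loosely tied to readability and reputation.
y = ((X[:, 2] > 40) & (X[:, 3] > 500)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```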

    Examining Twitter Content of State Emergency Management During Hurricane Joaquin

    Those tasked with disseminating life-protecting messages during crises have many factors to consider. Social media sites have become an information source for individuals during these times, and more research is needed examining the use of specific message strategies by emergency management agencies that may elicit attention and retransmission. This study examines Twitter content concerning Hurricane Joaquin. Content analysis of tweets from state emergency management accounts was performed to provide an overview of the content and stylistic elements used in tweets associated with the event. The findings are discussed in the context of both past research on the matter and implications for emergency management agencies responding to high-consequence events

    Conducting usability testing in agile software development environment

    Abstract. This Master's thesis presents a qualitative case study, triangulated with action research, on usability testing in Agile software development. First, a literature review was conducted to establish the theoretical basis for the research. After the literature review, the research method was constructed from usability testing methods and qualitative research tools. Once the research method had been described, its implementation was explained to show how the method was applied, followed by a presentation of the findings. The discussion then draws together the literature, research method, implementation, and findings, and the conclusions summarize the main points of the thesis. The literature review brought up several usability testing methods; from these, the research method was formed using UCD, interviews, SUS, and TAP. The implementation comprised eight questions for the semi-structured interview, five tasks for the second sprint's TAP, and four tasks for the third sprint's SUS. It was identified that TAP and SUS can be used in the same usability testing session. Moreover, nine hypotheses were formulated for testing, eight of which were successfully tested. It was also found that action research requires adaptation during the research, and that Agile readily accommodates changes to plans. Usability testing should be done using at least one usability testing method and by bringing end users into test sessions as early as possible. The benefit of the semi-structured interview is a more open, conversational exchange with the test user. TAP yields a descriptive analysis of how the test user experiences the usability of the application, while SUS provides the application's general usability level on a numeric scale. Steps for improving usability testing can be identified from the kinds of usability feedback that are needed; another way to identify improvements is to trial a new usability testing method. One way to bring usability testing into Agile software development is to conduct it at the end of every sprint.
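    Since the thesis leans on SUS for a numeric usability level, the sketch below applies the standard SUS scoring rule: for ten items answered on a 1-5 scale, odd-numbered items contribute (score - 1), even-numbered items contribute (5 - score), and the sum is multiplied by 2.5 to yield a 0-100 score. The example responses are invented.

```python
# Standard System Usability Scale (SUS) scoring; `responses` holds the 1-5
# answers to the ten SUS items in order. Example responses are invented.
def sus_score(responses):
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```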