
    i-JEN: Visual interactive Malaysia crime news retrieval system

    Supporting crime news investigation involves a mechanism to help monitor the current and past status of criminal events. We believe this could be well facilitated by focusing on the user interfaces and the crime event model aspects. In this paper we discuss the development of the Visual Interactive Malaysia Crime News Retrieval System (i-JEN) and describe the approach, the user studies conducted and planned, the system architecture, and future plans. Our main objectives are to construct crime-based events; investigate the use of crime-based events in improving classification and clustering; develop an interactive crime news retrieval system; visualize crime news in an effective and interactive way; integrate these components into a usable and robust system; and evaluate its usability and performance. The system will serve as a news monitoring system that aims to automatically organize, retrieve, and present crime news so as to support effective monitoring, searching, and browsing for the target user groups: the general public, news analysts, and police officers or crime investigators. The study will contribute to a better understanding of crime data consumption in the Malaysian context, and the developed system, with its visualisation features, will help address crime data and the eventual goal of combating crime.

    Protein (Multi-)Location Prediction: Using Location Inter-Dependencies in a Probabilistic Framework

    Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins, assuming that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems have attempted to predict multiple locations of proteins, they typically treat locations as independent or capture inter-dependencies by treating each location combination present in the training set as an individual location class. We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the multiple-location-prediction process, using a collection of Bayesian network classifiers. We evaluate our system on a dataset of single- and multi-localized proteins. Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without restricting predictions to be based only on location combinations present in the training set. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013).

    Informaticology: combining Computer Science, Data Science, and Fiction Science

    Motivated by an intention to remedy current complications with Dutch terminology concerning informatics, the term informaticology is positioned to denote an academic counterpart of informatics, where informatics is conceived of as a container for a coherent family of practical disciplines ranging from computer engineering and software engineering to network technology, data center management, information technology, and information management in a broad sense. Informaticology escapes from the limitations of instrumental objectives and the perspective of usage that both restrict the scope of informatics. That is achieved by including fiction science in informaticology, by ranking fiction science on equal terms with computer science and data science, and by framing (the study of) game design, development, assessment and distribution, ranging from serious gaming to entertainment gaming, as a chapter of fiction science. A suggestion for the scope of fiction science is specified in some detail. In order to illustrate the coherence of informaticology thus conceived, a potential application of fiction to the ontology of instruction sequences and to software quality assessment is sketched, thereby highlighting a possible role of fiction (science) within informaticology but outside gaming.

    Case Base Mining for Adaptation Knowledge Acquisition

    In case-based reasoning, the adaptation of a source case in order to solve the target problem is both crucial and difficult to implement. The reason for this difficulty is that, in general, adaptation strongly depends on domain-dependent knowledge. This fact motivates research on adaptation knowledge acquisition (AKA). This paper presents an approach to AKA based on the principles and techniques of knowledge discovery from databases and data mining. It is implemented in CABAMAKA, a system that explores the variations within the case base to elicit adaptation knowledge. This system has been successfully tested in an application of case-based reasoning to decision support in the domain of breast cancer treatment.

    What's unusual in online disease outbreak news?

    Background: Accurate and timely detection of public health events of international concern is necessary to help support risk assessment and response and save lives. Novel event-based methods that use the World Wide Web as a signal source offer potential to extend health surveillance into areas where traditional indicator networks are lacking. In this paper we address the issue of systematically evaluating online health news to support automatic alerting, using daily disease-country counts text-mined from real-world data using BioCaster. For 18 data sets produced by BioCaster, we compare 5 aberration detection algorithms (EARS C2, C3, W2, F-statistic and EWMA) for performance against expert-moderated ProMED-mail postings. Results: We report sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), mean alerts/100 days and F1, with 95% confidence intervals (CI), for 287 ProMED-mail postings on 18 outbreaks across 14 countries over a 366-day period. Results indicate that W2 had the best F1, with a slight benefit for day-of-week effect over C2. In drill-down analysis we highlight issues arising from the granular choice of country-level modeling, sudden drops in reporting due to day-of-week effects, and reporting bias. Automatic alerting has been implemented in BioCaster, available from http://born.nii.ac.jp. Conclusions: Online health news alerts have the potential to enhance manual analytical methods by increasing throughput, timeliness and detection rates. Systematic evaluation of health news aberrations is necessary to push forward our understanding of the complex relationship between news report volumes and case numbers and to select the best performing features and algorithms.
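The evaluation measures named in this abstract can be illustrated with a minimal sketch. It assumes that each day is labelled with a binary alert (from a detection algorithm) and a binary gold-standard flag (for example, whether an expert-moderated posting exists); the function name `alert_metrics` and this day-level framing are illustrative assumptions, not the authors' evaluation code, and confidence intervals are not computed here.

```python
def alert_metrics(alerts, gold):
    """Compare daily binary alerts against a gold standard of equal length.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(alerts) != len(gold):
        raise ValueError("alerts and gold must have the same length")

    pairs = [(bool(a), bool(g)) for a, g in zip(alerts, gold)]
    tp = sum(a and g for a, g in pairs)          # alert on an event day
    fp = sum(a and not g for a, g in pairs)      # alert on a quiet day
    fn = sum(g and not a for a, g in pairs)      # missed event day
    tn = sum(not a and not g for a, g in pairs)  # quiet day, no alert

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "alerts_per_100_days": ratio(100 * (tp + fp), len(pairs)),
    }
```

Calling `alert_metrics(alerts, gold)` once per algorithm and data set would yield one row of a comparison table; which algorithm produced the alerts is left outside the sketch.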

    Potentially Polluting Marine Sites GeoDB: An S-100 Geospatial Database as an Effective Contribution to the Protection of the Marine Environment

    Potentially Polluting Marine Sites (PPMS) are objects on, or areas of, the seabed that may release pollution in the future. A rationale for, and design of, a geospatial database to inventory and manipulate PPMS is presented. Built as an S-100 Product Specification, it is specified through human-readable UML diagrams and implemented through machine-readable GML files, and includes auxiliary information such as pollution-control resources and potentially vulnerable sites in order to support analyses of the core data. The design and some aspects of implementation are presented, along with metadata requirements and structure, and a perspective on potential uses of the database.

    Interoperability and FAIRness through a novel combination of Web technologies

    Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.

    Gene Expression Profiling of Bronchoalveolar Lavage Cells Preceding a Clinical Diagnosis of Chronic Lung Allograft Dysfunction.

    Background: Chronic Lung Allograft Dysfunction (CLAD) is the main limitation to long-term survival after lung transplantation. Although CLAD is usually not responsive to treatment, earlier identification may improve treatment prospects. Methods: In a nested case control study, 1-year post transplant surveillance bronchoalveolar lavage (BAL) fluid samples were obtained from incipient CLAD (n = 9) and CLAD free (n = 8) lung transplant recipients. Incipient CLAD cases were diagnosed with CLAD within 2 years, while controls were free from CLAD for at least 4 years following bronchoscopy. Transcription profiles in the BAL cell pellets were assayed with the HG-U133 Plus 2.0 microarray (Affymetrix). Differential gene expression analysis, based on an absolute fold change (incipient CLAD vs no CLAD) >2.0 and an unadjusted p-value ≤0.05, generated a candidate list containing 55 differentially expressed probe sets (51 up-regulated, 4 down-regulated). Results: The cell pellets in incipient CLAD cases were skewed toward immune response pathways, dominated by genes related to recruitment, retention, activation and proliferation of cytotoxic lymphocytes (CD8+ T-cells and natural killer cells). Both hierarchical clustering and a supervised machine learning tool were able to correctly categorize most samples (82.3% and 94.1% respectively) into incipient CLAD and CLAD-free categories. Conclusions: These findings suggest that a pathobiology, similar to AR, precedes a clinical diagnosis of CLAD. A larger prospective investigation of the BAL cell pellet transcriptome as a biomarker for CLAD risk stratification is warranted.
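The selection rule stated in the Methods (absolute fold change >2.0 and unadjusted p-value ≤0.05) can be sketched as a simple filter. This is an illustration only: it assumes linear-scale (not log-transformed) group means and precomputed p-values per probe set, and the names `signed_fold_change` and `select_candidates` and the tuple layout are hypothetical, not the authors' pipeline.

```python
def signed_fold_change(case_mean, control_mean):
    """Signed fold change on linear-scale means (assumed positive).

    Ratios below 1 are reported as negative reciprocals, so a halving
    of expression becomes -2.0 rather than 0.5.
    """
    ratio = case_mean / control_mean
    return ratio if ratio >= 1 else -1.0 / ratio


def select_candidates(probes, min_abs_fc=2.0, max_p=0.05):
    """Keep probe sets with |fold change| > min_abs_fc and p <= max_p.

    `probes` is an iterable of (name, case_mean, control_mean, p_value);
    this input layout is an assumption made for the example.
    """
    selected = []
    for name, case_mean, control_mean, p_value in probes:
        fc = signed_fold_change(case_mean, control_mean)
        if abs(fc) > min_abs_fc and p_value <= max_p:
            direction = "up" if fc > 0 else "down"
            selected.append((name, fc, direction))
    return selected
```

Because the p-values are unadjusted, a filter of this kind yields a candidate list rather than a set of confirmed findings, which matches the abstract's description of the 55 probe sets as candidates.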