
    Learning Deep Visual Object Models From Noisy Web Data: How to Make it Work

    Deep networks thrive when trained on large-scale data collections. This has given ImageNet a central role in the development of deep architectures for visual object classification. However, ImageNet was created during a specific period in time, and as such it is prone to aging, as well as to dataset bias issues. Moving beyond fixed training datasets will lead to more robust visual systems, especially when deployed on robots in new environments, where they must train on the objects they encounter. To make this possible, it is important to break free from the need for manual annotators. Recent work has begun to investigate how to use the massive amount of images available on the Web in place of manual image annotations. We contribute to this research thread with two findings: (1) a study correlating a given level of label noise to the expected drop in accuracy, for two deep architectures and two different types of noise, which clearly identifies GoogLeNet as a suitable architecture for learning from Web data; (2) a recipe for the creation of Web datasets with minimal noise and maximum visual variability, based on a visual and natural language processing concept expansion strategy. By combining these two results, we obtain a method for learning powerful deep object models automatically from the Web. We confirm the effectiveness of our approach through object categorization experiments using our Web-derived version of ImageNet on a popular robot vision benchmark database, and on a lifelong object discovery task on a mobile robot.
    Comment: 8 pages, 7 figures, 3 tables
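    The noise-vs-accuracy study described above can be illustrated with a minimal label-corruption sketch. This is not the paper's protocol; the function name and the uniform-flip noise model are illustrative assumptions:

```python
import random

def inject_label_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` in which a `noise_rate` fraction of
    entries are flipped uniformly to a *different* class (uniform
    label noise). Hypothetical helper, not from the paper."""
    rng = random.Random(seed)
    noisy = list(labels)
    # Choose exactly noise_rate * N distinct positions to corrupt.
    flip = rng.sample(range(len(labels)), k=int(noise_rate * len(labels)))
    for i in flip:
        choices = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(choices)
    return noisy

clean = [i % 5 for i in range(1000)]      # 1000 labels, 5 classes
noisy = inject_label_noise(clean, 5, 0.3)
corrupted = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
print(corrupted)  # 0.3 by construction
```

    Training the same architecture on datasets corrupted at several noise rates and plotting test accuracy against `noise_rate` yields the kind of noise-vs-accuracy curve the study correlates.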

    A scale-out RDF molecule store for distributed processing of biomedical data

    The computational analysis of protein-protein interaction and biomolecular pathway data paves the way to efficient in silico drug discovery and therapeutic target identification. However, relevant data sources are currently distributed across a wide range of disparate, large-scale, publicly-available databases and repositories and are described using a wide range of taxonomies and ontologies. Sophisticated integration, manipulation, processing and analysis of these datasets are required in order to reveal previously undiscovered interactions and pathways that will lead to the discovery of new drugs. The BioMANTA project focuses on utilizing Semantic Web technologies together with a scale-out architecture to tackle the above challenges and to provide efficient analysis, querying, and reasoning about protein-protein interaction data. This paper describes the initial results of the BioMANTA project. The fully-developed system will allow knowledge representation and processing that are not currently available in typical scale-out or Semantic Web databases. We present the design of the architecture, basic ontology and some implementation details that aim to provide efficient, scalable RDF storage and inferencing. The results of initial performance evaluation are also provided.
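    The core idea of an RDF "molecule" store, grouping all triples about one subject and partitioning those groups across machines, can be sketched in a few lines. This toy in-memory version is an assumption-laden illustration, not BioMANTA's actual implementation:

```python
from collections import defaultdict

class MoleculeStore:
    """Toy RDF store that partitions triples by subject ("molecule")
    across a fixed number of nodes, as a scale-out store might.
    Hypothetical sketch; names and API are illustrative only."""

    def __init__(self, num_nodes=4):
        self.nodes = [defaultdict(list) for _ in range(num_nodes)]

    def _node_for(self, subject):
        # All triples sharing a subject hash to the same node, so a
        # molecule can be retrieved without cross-node joins.
        return self.nodes[hash(subject) % len(self.nodes)]

    def add(self, s, p, o):
        self._node_for(s)[s].append((p, o))

    def molecule(self, s):
        """Return every (s, p, o) triple describing subject s."""
        return [(s, p, o) for p, o in self._node_for(s)[s]]

store = MoleculeStore()
store.add("protein:P53", "interactsWith", "protein:MDM2")
store.add("protein:P53", "hasGOTerm", "GO:0006915")
print(store.molecule("protein:P53"))
```

    Because each molecule lives on exactly one node, queries scoped to a single protein stay local, which is the property that makes this layout attractive for distributed processing.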

    Intelligent Agent-Based Data Mining in Electronic Markets

    The advent of web-based electronic commerce has brought a tremendous increase in the volume of "collectable data" that can be mined for valuable managerial knowledge. Utilizing intelligent agents can enhance the data mining procedures that are employed in this process. We focus on the role of data mining and intelligent agent technology in the B2C and B2B e-commerce models. By identifying the complex nature of information flows between the vast numbers of economic entities, we identify opportunities for applying data mining that can lead ultimately to knowledge discovery.

    Using information visualization techniques to support web service discovery

    The increasing number of web services published over the Web highlights the need for an effective method for users to find appropriate web services. Existing web service discovery methods do not effectively aid a user in finding suitable web services. The current methods provide textual lists of web services that the user is required to explore and manually evaluate. Thus, these methods lead to time-consuming and ineffective web service discovery. The aim of this research was to investigate using information visualization (IV) techniques to effectively support web service discovery. The node-and-link network IV technique was selected as the most appropriate IV technique to visualize web service collections. A prototype, called SerViz, was developed as a tool for interactive visualization of web service collections incorporating the node-and-link IV technique and an alphabetical list-based technique. SerViz used the Programmable Web web service collection as the sample web service collection. A usability evaluation was conducted to compare these techniques. Ninety percent of participants preferred the network IV technique for visualizing web service collections. The network IV technique was also faster for browsing. Several usability problems were identified with the network IV technique. This motivated a need for implementing an alternative IV technique in SerViz. The node-and-link tree IV technique was selected as it was more structured than the network IV technique. A usability evaluation was conducted to compare the network and tree IV techniques. Participants slightly preferred the tree IV technique as the technique to visualize web service collections. The tree IV technique was faster for browsing the web service collection while the network IV technique was faster for searching and filtering. This research has determined that IV techniques can be used to effectively support web service discovery. 
Future work will involve using IV techniques to support collaborative web service discovery.
    Keywords: Web Service Discovery, Information Visualization, Web Service Collections, Information Visualization Techniques
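    The node-and-link visualization evaluated above starts from a graph model of the service collection. A minimal way to derive that model from (service, category) pairs, with hypothetical names not taken from SerViz, is:

```python
from collections import defaultdict

def build_service_graph(services):
    """Build node-and-link data (nodes + edges) from a list of
    (service_name, category) pairs: one node per category, one per
    service, and an edge from each category to its services.
    Illustrative sketch only."""
    edges = []
    categories = defaultdict(list)
    for name, cat in services:
        categories[cat].append(name)
        edges.append((cat, name))
    nodes = sorted(categories) + sorted(n for n, _ in services)
    return nodes, edges

nodes, edges = build_service_graph([
    ("GoogleMaps", "Mapping"),
    ("OpenWeather", "Weather"),
    ("Mapbox", "Mapping"),
])
print(len(nodes), len(edges))  # 5 nodes, 3 edges
```

    Feeding `nodes` and `edges` to any force-directed layout produces the node-and-link network view; restricting each service to a single parent category instead yields the tree variant the study compared it against.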

    Discovery: an interactive resource for the rational selection and comparison of putative drug target proteins in malaria

    Background: Up to half a billion human clinical cases of malaria are reported each year, resulting in about 2.7 million deaths, most of which occur in sub-Saharan Africa. Due to the over- and misuse of anti-malarials, widespread resistance to all the known drugs is increasing at an alarming rate. Rational methods to select new drug target proteins and lead compounds are urgently needed. The Discovery system provides data mining functionality on extensive annotations of five malaria species together with the human and mosquito hosts, enabling the selection of new targets based on multiple protein and ligand properties.
    Methods: A web-based system was developed where researchers are able to mine information on malaria proteins and predicted ligands, as well as perform comparisons to the human and mosquito host characteristics. Protein features used include: domains, motifs, EC numbers, GO terms, orthologs, protein-protein interactions, protein-ligand interactions and host-pathogen interactions, among others. Searching by chemical structure is also available.
    Results: An in silico system for the selection of putative drug targets and lead compounds is presented, together with an example study on the bifunctional DHFR-TS from Plasmodium falciparum.
    Conclusion: The Discovery system allows for the identification of putative drug targets and lead compounds in Plasmodium species based on the filtering of protein and chemical properties.
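    Multi-property target filtering of the kind the Discovery system offers can be sketched as a simple predicate over annotated proteins. All identifiers, field names, and annotations below are hypothetical, not drawn from the actual database:

```python
def select_targets(proteins, required_go, exclude_if_human_ortholog=True):
    """Toy target filter: keep proteins annotated with a required GO
    term and (optionally) lacking a human ortholog. Hypothetical
    sketch of multi-property filtering, not the Discovery system's API."""
    hits = []
    for p in proteins:
        if required_go not in p["go_terms"]:
            continue
        if exclude_if_human_ortholog and p["human_ortholog"]:
            continue
        hits.append(p["id"])
    return hits

# Entirely invented example records.
candidates = [
    {"id": "proteinA", "go_terms": {"GO:0004146"}, "human_ortholog": False},
    {"id": "proteinB", "go_terms": {"GO:0004146"}, "human_ortholog": True},
    {"id": "proteinC", "go_terms": {"GO:0008270"}, "human_ortholog": False},
]
print(select_targets(candidates, "GO:0004146"))  # ['proteinA']
```

    Real selection pipelines combine many more properties (domains, EC numbers, interaction partners, ligand data), but each additional criterion is just another predicate in the same filter.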

    SwissBioisostere: a database of molecular replacements for ligand design

    The SwissBioisostere database (http://www.swissbioisostere.ch) contains information on molecular replacements and their performance in biochemical assays. It is meant to provide researchers in drug discovery projects with ideas for bioisosteric modifications of their current lead molecule, as well as to give interested scientists access to the details on particular molecular replacements. As of August 2012, the database contains 21 293 355 datapoints corresponding to 5 586 462 unique replacements that have been measured in 35 039 assays against 1948 molecular targets representing 30 target classes. The accessible data were created through detection of matched molecular pairs and mining bioactivity data in the ChEMBL database. The SwissBioisostere database is hosted by the Swiss Institute of Bioinformatics and available via a web-based interface.
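    The matched-molecular-pair idea underlying the database, two molecules identical except for one exchanged fragment, can be shown with a deliberately simplified model in which each molecule is just a set of named fragments. Real MMP detection works on chemical structures; everything here is an illustrative assumption:

```python
from itertools import combinations

def matched_pairs(molecules):
    """Toy matched-molecular-pair detection: each molecule is a set of
    fragments; two molecules form a pair when they share all fragments
    except exactly one, and the replacement is frag_a -> frag_b.
    Simplified sketch, not how SwissBioisostere mines ChEMBL."""
    pairs = []
    for (na, a), (nb, b) in combinations(molecules.items(), 2):
        if len(a - b) == 1 and len(b - a) == 1:
            pairs.append((na, nb, next(iter(a - b)), next(iter(b - a))))
    return pairs

# Invented fragment sets for three hypothetical molecules.
mols = {
    "m1": frozenset({"phenyl", "amide", "Cl"}),
    "m2": frozenset({"phenyl", "amide", "F"}),
    "m3": frozenset({"phenyl", "ester", "Cl"}),
}
for pair in matched_pairs(mols):
    print(pair)  # e.g. ('m1', 'm2', 'Cl', 'F')
```

    Aggregating such replacements across many assays, with their measured activity changes, is what turns raw pairs into a bioisostere lookup resource.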