
    WormBase 2012: more genomes, more data, new website

    Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community, to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users, and new collaborations with the larger science community.
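The abstract mentions a public RESTful API for building custom data-mining tools. A minimal sketch of how such an API might be consumed, using a hypothetical endpoint layout and a fabricated sample response (the real WormBase API, its paths, and its field names may differ):

```python
import json
from urllib.parse import quote

# Hypothetical base URL and response shape, for illustration only;
# consult the live WormBase API documentation for real endpoints.
BASE = "http://www.wormbase.org/rest"

def gene_overview_url(species, gene_id):
    """Build a REST-style resource URL for a gene report page."""
    return f"{BASE}/field/gene/{quote(species)}/{quote(gene_id)}/overview"

def parse_gene_overview(payload):
    """Pull the fields a mining script might need from a JSON response."""
    data = json.loads(payload)
    return {"id": data["id"], "name": data["name"], "species": data["species"]}

# A fabricated sample response standing in for a live API call:
sample = '{"id": "WBGene00000001", "name": "example-gene", "species": "c_elegans"}'
print(parse_gene_overview(sample))
```

Separating URL construction from response parsing keeps the parsing logic testable without network access.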

    Effectively incorporating selected multimedia content into medical publications

    Until fairly recently, medical publications have been handicapped by being restricted to non-electronic formats, effectively preventing the dissemination of complex audiovisual and three-dimensional data. However, authors and readers could significantly profit from advances in electronic publishing that permit the inclusion of multimedia content directly into an article. For the first time, the de facto gold standard for scientific publishing, the portable document format (PDF), is used here as a platform to embed a video and an audio sequence of patient data into a publication. Fully interactive three-dimensional models of a face and a schematic representation of a human brain are also part of this publication. We discuss the potential of this approach and its impact on the communication of scientific medical data, particularly with regard to electronic and open access publications. Finally, we emphasise how medical teaching can benefit from this new tool and comment on the future of medical publishing.

    Application of the Markov Chain Method in a Health Portal Recommendation System

    This study produced a recommendation system that can effectively recommend items on a health portal. Toward this aim, a transaction log that records users' traversal activities on the Medical College of Wisconsin's HealthLink, a health portal with a subject directory, was utilized and investigated. This study proposed a mixed method that included the transaction log analysis method, the Markov chain analysis method, and the inferential analysis method. The transaction log analysis method was applied to extract users' traversal activities from the log. The Markov chain analysis method was adopted to model users' traversal activities and then generate recommendation lists for topics, articles, and Q&A items on the health portal. The inferential analysis method was applied to test whether there are any correlations between recommendation lists generated by the proposed recommendation system and recommendation lists ranked by experts. The topics selected for this study are Infections, the Heart, and Cancer. These three topics were the three most viewed topics in the portal. The findings of this study revealed the consistency between the recommendation lists generated from the proposed system and the lists ranked by experts. At the topic level, two topic recommendation lists generated from the proposed system were consistent with the lists ranked by experts, while one topic recommendation list was highly consistent with the list ranked by experts. At the article level, one article recommendation list generated from the proposed system was consistent with the list ranked by experts, while 14 article recommendation lists were highly consistent with the lists ranked by experts. At the Q&A item level, three Q&A item recommendation lists generated from the proposed system were consistent with the lists ranked by experts, while 12 Q&A item recommendation lists were highly consistent with the lists ranked by experts.
The findings demonstrated the significance of users' traversal data extracted from the transaction log. The methodology applied in this study offers a systematic approach to building recommendation systems for other similar portals. The outcomes of this study can facilitate users' navigation, and provide a new method for building a recommendation system that recommends items at three levels: the topic level, the article level, and the Q&A item level.
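The first-order Markov chain modelling described above can be sketched as follows; the session data and page identifiers are invented for illustration and stand in for the transaction-log traversals:

```python
from collections import defaultdict

# Toy traversal sessions standing in for transaction-log data.
sessions = [
    ["infections", "heart", "cancer"],
    ["infections", "cancer", "heart"],
    ["heart", "cancer"],
    ["infections", "heart"],
]

# First-order Markov chain: count transitions between consecutive pages.
counts = defaultdict(lambda: defaultdict(int))
for session in sessions:
    for cur, nxt in zip(session, session[1:]):
        counts[cur][nxt] += 1

def recommend(page, k=2):
    """Rank candidate next pages by estimated transition probability."""
    nexts = counts[page]
    total = sum(nexts.values())
    ranked = sorted(nexts, key=lambda p: nexts[p] / total, reverse=True)
    return ranked[:k]

print(recommend("infections"))  # most likely pages after "infections"
```

With the toy sessions above, "infections" is followed by "heart" twice and "cancer" once, so the recommendation list is `['heart', 'cancer']`.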

    Survey on Publicly Available Sinhala Natural Language Processing Tools and Research

    Sinhala is the native language of the Sinhalese people, who make up the largest ethnic group of Sri Lanka. The language belongs to the Indo-European language family. However, due to poverty in both linguistic and economic capital, Sinhala, from the perspective of Natural Language Processing tools and research, remains a resource-poor language which has neither the economic drive its cousin English has nor the sheer push of the law of numbers a language such as Chinese has. A number of research groups from Sri Lanka have noticed this dearth and the resultant dire need for proper tools and research for Sinhala natural language processing. However, due to various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill that gap with a comprehensive literature survey of the publicly available Sinhala natural language tools and research, so that researchers working in this field can better utilize the contributions of their peers. As such, we shall upload this paper to arXiv and periodically update it to reflect the advances made in the field.

    Temporal Information Processing: A Survey

    Temporal Information Processing is a subfield of Natural Language Processing, valuable in many tasks like Question Answering and Summarization. The field is broad, ranging from classical theories of time and language to current computational approaches for Temporal Information Extraction. This latter trend consists of the automatic extraction of events and temporal expressions. These issues have attracted great attention, especially with the development of annotated corpora and annotation schemes, mainly TimeBank and TimeML. In this paper, we give a survey of Temporal Information Extraction from Natural Language texts.
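As a toy illustration of rule-based temporal expression extraction (far simpler than TimeML-compliant taggers, and with a pattern set invented here), a small regular-expression tagger might look like:

```python
import re

# A minimal, illustrative pattern covering a few English temporal
# expressions; real systems use far richer grammars and normalization.
TIMEX = re.compile(
    r"\b(\d{4}-\d{2}-\d{2}"            # ISO dates: 2012-03-15
    r"|\d{1,2}\s+(?:January|February|March|April|May|June|July|August"
    r"|September|October|November|December)\s+\d{4}"  # 15 March 2012
    r"|yesterday|today|tomorrow"        # deictic expressions
    r"|next\s+\w+day)\b",               # "next Monday" and the like
    re.IGNORECASE,
)

def extract_timex(text):
    """Return the temporal expressions found in text, in order."""
    return [m.group(0) for m in TIMEX.finditer(text)]

print(extract_timex("The meeting held on 2012-03-15 was moved to next Monday."))
```

A production extractor would also normalize each match to a calendar value (e.g., resolving "next Monday" against the document's creation date), which is where annotation schemes such as TimeML come in.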

    KERT: Automatic Extraction and Ranking of Topical Keyphrases from Content-Representative Document Titles

    We introduce KERT (Keyphrase Extraction and Ranking by Topic), a framework for topical keyphrase generation and ranking. By shifting from the unigram-centric traditional methods of unsupervised keyphrase extraction to a phrase-centric approach, we are able to directly compare and rank phrases of different lengths. We construct a topical keyphrase ranking function which implements the four criteria that represent high-quality topical keyphrases (coverage, purity, phraseness, and completeness). The effectiveness of our approach is demonstrated on two collections of content-representative titles in the domains of Computer Science and Physics.
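The abstract does not spell out the ranking function, so as a hedged sketch, crude frequency-based proxies for two of the four criteria (coverage and purity) could be combined as below. The corpus, candidate phrases, and scoring formula are invented for illustration and do not reproduce KERT's actual function:

```python
import math

# Toy corpus: each "title" is a list of already-chunked candidate phrases.
topic_titles = [
    ["support vector machine", "text classification"],
    ["support vector machine", "kernel methods"],
    ["text classification", "feature selection"],
]
background_titles = topic_titles + [
    ["quantum field theory"], ["support vector machine"], ["dark matter"],
]

def score(phrase):
    # Coverage proxy: fraction of topic titles containing the phrase.
    coverage = sum(phrase in t for t in topic_titles) / len(topic_titles)
    # Purity proxy: how concentrated the phrase is in the topic vs. background.
    bg = sum(phrase in t for t in background_titles) / len(background_titles)
    purity = coverage / bg if bg else 0.0
    return coverage * math.log(1 + purity)

candidates = ["support vector machine", "text classification", "kernel methods"]
ranked = sorted(candidates, key=score, reverse=True)
print(ranked)
```

Note how a phrase that is frequent everywhere ("support vector machine" also appears in the background) is penalized relative to one concentrated in the topic, which is the intuition behind the purity criterion.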

    Subject-relevant Document Recommendation: A Reference Topic-Based Approach

    Knowledge-intensive workers, such as academic researchers, medical professionals or patent engineers, have a pressing need to search for information relevant to their work. A content-based recommender system (CBRS) makes recommendations by analyzing the textual similarity between documents and users' preferences. Although content-based filtering has been one of the promising approaches to document recommendation, it encounters the over-specialization problem: a CBRS tends to recommend documents similar to those already in a user's preference profile. Rationally, the citations in an article reflect its author's interpretation and understanding of the domain at the time of writing. A cited article is associated with, and may reflect, the subject domain of its citing articles. Our study addresses the over-specialization problem to support the information needs of researchers. We propose a Reference Topic-based Document Recommendation (RTDR) technique, which exploits the citation information of a focal user's preferred documents and thereby recommends documents that are relevant to the subject domain of his or her preference. Our preliminary evaluation results suggest that the proposed RTDR outperforms the benchmarks.
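One simple way to operationalize the reference-topic idea, assuming a toy citation graph and Jaccard overlap as the relevance measure (both invented here, not necessarily the paper's actual method):

```python
# Toy citation graph: document -> set of cited reference IDs (invented).
references = {
    "pref1": {"r1", "r2", "r3"},
    "pref2": {"r2", "r4"},
    "candA": {"r2", "r3", "r5"},
    "candB": {"r6", "r7"},
}

def reference_topic(preferred):
    """Union of references cited by the user's preferred documents."""
    topic = set()
    for doc in preferred:
        topic |= references[doc]
    return topic

def rank_candidates(preferred, candidates):
    """Rank candidates by citation overlap with the reference topic."""
    topic = reference_topic(preferred)
    def overlap(doc):
        refs = references[doc]
        return len(refs & topic) / len(refs | topic)  # Jaccard similarity
    return sorted(candidates, key=overlap, reverse=True)

print(rank_candidates(["pref1", "pref2"], ["candA", "candB"]))
```

Because candidates are matched on shared references rather than shared vocabulary, a document with different wording but an overlapping citation neighborhood can still rank highly, which is how this sketch sidesteps over-specialization.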

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain in other domains.
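A minimal wrapper-style extractor in the spirit of the Enterprise-level techniques surveyed, using only Python's standard library; the page layout and class names are invented, and real systems infer such wrappers automatically or tolerate layout variation:

```python
from html.parser import HTMLParser

class ProductExtractor(HTMLParser):
    """Pull (title, price) pairs from a fixed product-listing layout."""

    def __init__(self):
        super().__init__()
        self.records, self._field = [], None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("title", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field == "title":
            self.records.append([data.strip(), None])
        elif self._field == "price":
            self.records[-1][1] = data.strip()
        self._field = None

    def handle_endtag(self, tag):
        self._field = None

page = """
<ul>
  <li><span class="title">Widget</span> <span class="price">$9.99</span></li>
  <li><span class="title">Gadget</span> <span class="price">$19.50</span></li>
</ul>
"""
parser = ProductExtractor()
parser.feed(page)
print(parser.records)  # [['Widget', '$9.99'], ['Gadget', '$19.50']]
```

This hand-written wrapper breaks as soon as the site changes its markup, which is precisely the brittleness that motivates the automatic wrapper-induction techniques the survey covers.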