The overwhelming amount of available scholarly literature in the life sciences poses significant challenges to scientists wishing to keep up with important developments related to their research, but also provides a useful resource for the discovery of recent information concerning genes, diseases, compounds and the interactions between them. In this paper, we describe an algorithm called Bio-LDA that uses extracted biological terminology to automatically identify latent topics, and provides a variety of measures to uncover putative relations among topics and bio-terms. Relationships identified using those approaches are combined with existing data in life science datasets to provide additional insight. Three case studies demonstrate the utility of the Bio-LDA model, including association predication, association search and connectivity map generation. This combined approach offers new opportunities for knowledge discovery in many areas of biology including target identification, lead hopping and drug repurposing.Comment: 14 pages, 8 figures, 10 table
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.