Agreeing to disagree: reconciling conflicting taxonomic views using a logic-based approach
Taxonomy alignment is a way to integrate two or more taxonomies. Semantic interoperability between datasets, information systems, and knowledge bases is facilitated by combining the different input taxonomies into merged taxonomies that reconcile apparent differences or conflicts. We show how alignment problems can be solved with a logic-based region connection calculus (RCC-5) approach, using five base relations to compare concepts: congruence, inclusion, inverse inclusion, overlap, and disjointness. To illustrate this method, we use different “geo-taxonomies”, which organize the United States into several, apparently conflicting, geospatial hierarchies. For example, we align T(CEN), a taxonomy derived from the Census Bureau’s regions map, with T(NDC), from the National Diversity Council (NDC), and with T(TZ), a taxonomy capturing the U.S. time zones. Using these case studies, we show how this logic-based approach can reconcile conflicts between taxonomies. We have implemented these case studies with Euler/X, an open source tool that has been applied primarily to solving complex alignment problems in biological classification. In this paper, we demonstrate the feasibility and broad applicability of this approach to other domains and alignment problems in support of semantic interoperability. Funding: NSF DEB-1155984, DBI-1342595, DBI-1643002.
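To make the five base relations concrete, the following sketch models each taxonomic concept as a set of atomic units (here, U.S. state codes) and derives the RCC-5 relation from plain set comparisons. This is a minimal illustration under that set-based assumption, not how Euler/X itself reasons, and the groupings are hypothetical rather than taken from the paper's case studies.

```python
def rcc5(a: set, b: set) -> str:
    """Return the RCC-5 base relation holding between concepts a and b."""
    if a == b:
        return "congruence"
    if a < b:
        return "inclusion"          # a is properly included in b
    if a > b:
        return "inverse inclusion"  # b is properly included in a
    if a & b:
        return "overlap"            # shared members, but neither includes the other
    return "disjointness"           # no members in common

# Hypothetical groupings for illustration (not the paper's taxonomies):
cen_northeast = {"CT", "ME", "MA", "NH", "NJ", "NY", "PA", "RI", "VT"}
tz_eastern = cen_northeast | {"OH", "GA", "FL"}  # a coarser, overlapping grouping

print(rcc5(cen_northeast, tz_eastern))    # -> inclusion
print(rcc5({"CA", "OR"}, cen_northeast))  # -> disjointness
```

In an actual alignment problem the relations are not computed from known extensions but asserted as constraints between the concepts of the two taxonomies, and a reasoner such as Euler/X checks those constraints for consistency and derives the merged taxonomy.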
Context determines content: an approach to resource recommendation in folksonomies
Through tagging in social bookmarking applications, so-called folksonomies emerge collaboratively. Folksonomies have been shown to contain information that is beneficial for resource recommendation. However, because folksonomies are not designed to support recommendation tasks, the various recommendation techniques have drawbacks. Graph-based recommendation in folksonomies, for example, suffers from the problem of concept drift, while vector-space-based approaches suffer from the sparseness of the available data. In this paper, we propose VSScore, a flexible framework that incorporates context-specific information into the recommendation process to tackle these issues. Additionally, as an alternative to the evaluation methodology LeavePostOut, we propose an adaptation, LeaveRTOut, for resource recommendation in folksonomies. In a subset of the resource recommendation tasks evaluated, the proposed framework VSScore performs significantly more effectively than the baseline algorithm FolkRank.
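VSScore itself is not reproduced here, but the general idea of folding context into a vector-space recommender can be sketched: represent the user profile and each resource as tag-count vectors, up-weight the tags of the active context, and rank resources by cosine similarity. The boost scheme and the tag data below are assumed simplifications, not the paper's actual scoring function.

```python
import math
from collections import Counter

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse tag vectors."""
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def score(profile: Counter, resource: Counter, context: set, boost: float = 3.0) -> float:
    """Up-weight context tags in the profile, then compare by cosine similarity."""
    weighted = Counter({t: c * (boost if t in context else 1.0)
                        for t, c in profile.items()})
    return cosine(weighted, resource)

profile = Counter({"python": 1, "recipes": 3, "web": 1})
resources = {
    "r1": Counter({"python": 2, "web": 2}),
    "r2": Counter({"recipes": 4, "baking": 1}),
}
context = {"python", "web"}  # tags of the session the user is working in (assumed)
ranked = sorted(resources, key=lambda r: score(profile, resources[r], context),
                reverse=True)
print(ranked)  # ['r1', 'r2']: the context boost overrides the profile's recipe bias
```

Without the boost, r2 would rank first on raw profile similarity; weighting the in-context tags is what counteracts the concept drift that a context-free profile accumulates.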
Resolving "orphaned" non-specific structures using machine learning and natural language processing methods
Scholarly publications of biodiversity literature contain a vast amount of information in human-readable format. The detailed morphological descriptions in these publications contain rich information that can be extracted to facilitate analysis and computational biology research. However, the idiosyncrasies of morphological descriptions still pose a number of challenges to machines. In this work, we demonstrate the use of two different approaches to resolve meronym (i.e., part-of) relations between anatomical parts and their anchor organs: a syntactic rule-based approach and an SVM-based (support vector machine) method. Both methods made use of domain ontologies. We compared the two approaches with two other baseline methods; the evaluation results show that the syntactic method (92.1% F1 score) outperformed the SVM method (80.7% F1 score) and that the part-of ontologies were valuable knowledge sources for the task. Notably, the mistakes made by the two approaches rarely overlapped. Additional tests will be conducted on the development version of the Explorer of Taxon Concepts toolkit before we make the functionality publicly available. Meanwhile, we will further investigate and leverage the complementary nature of the two approaches to drive the error rate down further, as in practical applications even a 1% error rate could lead to hundreds of errors. Funding: NSF DBI-1147266.
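The rule-based side of this comparison can be distilled into a toy example: given an "orphaned" part term and the organ mentions around it, prefer an anchor that a part-of ontology sanctions, and fall back to the nearest organ otherwise. The ontology fragment and rule set below are hypothetical stand-ins; the paper's system uses richer syntactic rules and real domain ontologies, and its SVM variant learns a comparable decision from features.

```python
from typing import Optional

# Hypothetical part-of ontology fragment: part term -> organs it may belong to.
PART_OF = {
    "apex":   {"leaf", "petal", "stem"},
    "margin": {"leaf", "petal"},
}

def resolve_anchor(part: str, organ_mentions: list) -> Optional[str]:
    """organ_mentions is ordered nearest-first from the orphaned part's position."""
    valid = PART_OF.get(part, set())
    for organ in organ_mentions:
        if organ in valid:  # rule 1: prefer an ontology-sanctioned whole
            return organ
    return organ_mentions[0] if organ_mentions else None  # rule 2: nearest organ

print(resolve_anchor("apex",   ["stem", "leaf"]))  # -> stem  (sanctioned and nearest)
print(resolve_anchor("margin", ["stem", "leaf"]))  # -> leaf  (stem is not a valid whole)
```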
Bringing a Semantic MediaWiki Flora to Life
The existing web representation of the Flora of North America (FNA) project needs improvement. Despite being electronically available, it has little more functionality than its printed counterpart. Over the past few years, our team has been working to build a new, more effective online presence for the FNA. The main objective is to capitalize on modern Natural Language Processing (NLP) tools built for biodiversity data (Explorer of Taxon Concepts, or ETC; Cui et al. 2016) and to present the FNA online in both machine- and human-readable formats. With machine-comprehensible data, the mobilization and usability of flora treatments are enhanced and data linkage to a Biodiversity Knowledge Graph (Page 2016) becomes possible. For example, the usability of treatments increases when morphological statements are parsed into fine-grained pieces of data using ETC, because these data can easily be traversed across taxonomic groups to reveal trends. Parsing and processing the FNA data in ETC also facilitates the development of new features in our online FNA, including one that lets users explore all treatments and illustrations produced by an author of interest. The current status of the ongoing project to develop a Semantic MediaWiki (SMW) platform for the FNA is presented here. New features recently implemented are introduced, challenges in assembling the Semantic MediaWiki are discussed, and future opportunities, including the integration of additional floras and data sources, are explored. Finally, the implications of standardizing taxonomic treatments, which work such as this entails, are discussed.
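As a hint of why parsed treatments are easier to traverse than prose, the sketch below turns one telegraphic morphological statement into organ/character/value records. The record layout and the parsing rules are assumptions made for illustration; ETC's actual output schema and parser are far richer.

```python
import re

def parse_statement(taxon: str, text: str) -> list:
    """Split a statement like 'Leaves 8-12 cm, ovate.' into fine-grained records."""
    organ, rest = text.split(" ", 1)
    records = []
    for piece in re.split(r",\s*", rest.rstrip(".")):
        m = re.match(r"(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)\s*(\w+)", piece)
        if m:  # a numeric range with a unit -> a quantitative character
            records.append({"taxon": taxon, "organ": organ.lower(),
                            "character": "length", "from": m.group(1),
                            "to": m.group(2), "unit": m.group(3)})
        else:  # anything else -> treat as a shape/state value
            records.append({"taxon": taxon, "organ": organ.lower(),
                            "character": "shape", "value": piece})
    return records

for record in parse_statement("Quercus alba", "Leaves 8-12 cm, ovate."):
    print(record)
```

Records in this shape can be queried across taxa (e.g., all taxa with ovate leaves over 10 cm), which is exactly the kind of traversal a printed or flat-HTML flora cannot support.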
Introducing Explorer of Taxon Concepts with a case study on spider measurement matrix building
Background: Taxonomic descriptions are traditionally composed in natural language and published in a format that cannot be used directly by computers. The Explorer of Taxon Concepts (ETC) project has been developing a set of web-based software tools that convert morphological descriptions published in telegraphic style into character data that can be reused and repurposed. This paper introduces the first semi-automated pipeline, to our knowledge, that converts morphological descriptions into taxon-character matrices to support systematics and evolutionary biology research. We then demonstrate and evaluate the use of the ETC Input Creation - Text Capture - Matrix Generation pipeline to generate body part measurement matrices from a set of 188 spider morphological descriptions and report the findings.
Results: From the given set of spider taxonomic publications, two versions of the input (original and normalized) were generated and used by the ETC Text Capture and ETC Matrix Generation tools. The tools produced two corresponding spider body part measurement matrices, and the matrix from the normalized input was found to be much more similar to a gold standard matrix hand-curated by the scientist co-authors. The lower performance on the original input was attributed to special conventions used in the original descriptions (e.g., the omission of measurement units). The results show that simple normalization of the description text greatly increased the quality of the machine-generated matrix and reduced editing effort. The machine-generated matrix also helped identify issues in the gold standard matrix.
Conclusions: ETC Text Capture and ETC Matrix Generation are low-barrier and effective tools for extracting measurement values from spider taxonomic descriptions, and they are more effective when the descriptions are self-contained. Special conventions that make the description text less self-contained challenge the automated extraction of data from biodiversity descriptions and hinder the automated reuse of the published knowledge. The tools will be updated to support the new requirements revealed in this case study.
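The normalization finding lends itself to a small sketch: re-attach the omitted measurement units before extraction, then pull body-part measurements into a matrix row. The rules below (bare numbers default to millimeters; a measurement is a word followed by a number and unit) are toy assumptions, not the actual ETC Text Capture or Matrix Generation logic.

```python
import re

def normalize(description: str, default_unit: str = "mm") -> str:
    """Re-attach the omitted unit to bare measurements such as 'carapace 2.31'."""
    return re.sub(r"(?<![\d.])(\d+(?:\.\d+)?)(?!\.?\d|\s*(?:mm|cm)\b)",
                  rf"\1 {default_unit}", description)

def extract_row(description: str) -> dict:
    """Collect body-part -> (value, unit) pairs for one matrix row."""
    return {part: (float(value), unit)
            for part, value, unit in re.findall(
                r"(\w+)\s+(\d+(?:\.\d+)?)\s*(mm|cm)\b", description)}

raw = "Total length 4.70, carapace 2.31, abdomen 2.40."
print(extract_row(normalize(raw)))
# {'length': (4.7, 'mm'), 'carapace': (2.31, 'mm'), 'abdomen': (2.4, 'mm')}
```

Running extract_row on the raw text instead yields an empty row, which mirrors the case study's observation that unit-omitting conventions in the original input hurt automated extraction.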