
    Advances in Processing, Mining, and Learning Complex Data: From Foundations to Real-World Applications

    Processing, mining, and learning complex data refers to an advanced area of data mining and knowledge discovery concerned with developing and analyzing approaches for discovering patterns and learning models from data with a complex structure (e.g., multirelational data, XML data, text data, image data, time series, sequences, graphs, streaming data, and trees) [1–5]. These kinds of data are commonly encountered in many social, economic, scientific, and engineering applications. Complex data pose new challenges for current research in data mining and knowledge discovery because they require new methods for processing, mining, and learning. Traditional data analysis methods often require the data to be represented as vectors [6]. However, many data objects in real-world applications, such as chemical compounds in biopharmacy, brain regions in brain health data, users in business networks, and time series in medical data, carry rich structural information (e.g., relationships between data points and temporal structure). A simple feature-vector representation inherently loses this structural information. In reality, objects may have complicated characteristics, depending on how they are assessed and characterized. Meanwhile, the data may come from heterogeneous domains [7], such as traditional tabular data, sequential patterns, graphs, time series, and semistructured data. Novel data analytics methods are needed to discover meaningful knowledge from data objects with such complex characteristics in advanced applications. This special issue contributes to fundamental research in processing, mining, and learning complex data, focusing on the analysis of complex data sources.
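
    The claim that flattening structured objects into feature vectors discards information can be made concrete with a small sketch. The two toy graphs below are hypothetical (three-node "molecules" with identical label counts but different connectivity) and networkx is assumed to be available; the example is illustrative only and is not taken from the special issue.

```python
# Two labeled graphs with the same bag of node labels but different structure.
# A flat feature vector of label counts cannot distinguish them, whereas a
# structure-aware representation (here, a Weisfeiler-Lehman graph hash) can.
from collections import Counter
import networkx as nx

g1 = nx.Graph([("a", "b"), ("b", "c")])                      # C-C-O chain
nx.set_node_attributes(g1, {"a": "C", "b": "C", "c": "O"}, "label")

g2 = nx.Graph([("a", "b"), ("b", "c")])                      # C-O-C chain
nx.set_node_attributes(g2, {"a": "C", "b": "O", "c": "C"}, "label")

def bag_of_labels(g):
    """Flatten a labeled graph into a vector of label counts (structure is lost)."""
    return Counter(nx.get_node_attributes(g, "label").values())

print(bag_of_labels(g1) == bag_of_labels(g2))                 # True: vectors identical
print(nx.weisfeiler_lehman_graph_hash(g1, node_attr="label")
      == nx.weisfeiler_lehman_graph_hash(g2, node_attr="label"))  # False: structure differs
```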

    frances: cloud-based historical text mining with deep learning and parallel processing

    frances is an advanced cloud-based digital text mining platform that leverages information extraction, knowledge graphs, natural language processing (NLP), deep learning, and parallel processing techniques. It has been specifically designed to unlock the full potential of historical digital textual collections, such as those from the National Library of Scotland, offering cloud-based capabilities and extended support for complex NLP analyses and data visualizations. frances enables real-time recurrent operational text mining and provides robust capabilities for temporal analysis, accompanied by automatic visualizations for easy result inspection. In this paper, we present the motivation behind the development of frances, emphasizing its innovative design and novel implementation aspects. We also outline future development directions. Additionally, we evaluate the platform through two comprehensive case studies in history and publishing history. Feedback from participants in these studies demonstrates that frances accelerates their work and facilitates rapid testing and dissemination of ideas.

    Concept graphs: Applications to biomedical text categorization and concept extraction

    As science advances, the underlying literature grows rapidly, providing valuable knowledge mines for researchers and practitioners. The text content that makes up these knowledge collections is often unstructured, and thus extracting relevant or novel information can be nontrivial and costly. In addition, human knowledge and expertise are being transformed into structured digital information in the form of vocabulary databases and ontologies. These knowledge bases hold substantial hierarchical and semantic relationships of common domain concepts. Consequently, automated learning tasks can be reinforced with those knowledge bases by constructing human-like representations of knowledge. This allows developing algorithms that simulate the human reasoning tasks of content perception, concept identification, and classification. This study explores the representation of text documents using concept graphs that are constructed with the help of a domain ontology. In particular, the target data sets are collections of biomedical text documents, and the domain ontology is a collection of predefined biomedical concepts and relationships among them. The proposed representation preserves those relationships and allows using the structural features of graphs in text mining and learning algorithms. Those features emphasize the significance of the relationship information that exists in the text content behind the interrelated topics and concepts of a document. The experiments presented in this study include text categorization and concept extraction applied to biomedical data sets. The experimental results demonstrate how the relationships extracted from text and captured in graph structures can be used to improve the performance of the aforementioned applications. The discussed techniques can be used in creating and maintaining digital libraries through enhancing indexing, retrieval, and management of documents, as well as in a broad range of domain-specific applications such as drug discovery, hypothesis generation, and the analysis of molecular structures in chemoinformatics.
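
    As a rough sketch of the general idea (not the paper's exact algorithm), the snippet below maps document terms to concepts from a toy ontology, links matched concepts to their is-a ancestors and to each other, and exposes a simple structural feature such as degree centrality. The ontology, synonym table, sample document, and feature choice are all illustrative assumptions; networkx is assumed.

```python
import itertools
import networkx as nx

# Toy domain "ontology": concept -> parent (is-a) relation, plus a synonym table.
ontology = {"aspirin": "nsaid", "ibuprofen": "nsaid", "nsaid": "drug", "headache": "symptom"}
synonyms = {"acetylsalicylic acid": "aspirin"}

def to_concepts(tokens):
    """Normalize tokens via synonyms and keep only those matching ontology concepts."""
    normalized = (synonyms.get(t, t) for t in tokens)
    return [t for t in normalized if t in ontology]

def concept_graph(tokens):
    """Build a graph of matched concepts, their is-a ancestors, and co-occurrence edges."""
    g = nx.Graph()
    found = to_concepts(tokens)
    for c in found:                                      # hierarchy edges from the ontology
        while c in ontology:
            g.add_edge(c, ontology[c], kind="is_a")
            c = ontology[c]
    for a, b in itertools.combinations(set(found), 2):   # same-document co-occurrence edges
        g.add_edge(a, b, kind="co_occurs")
    return g

doc = "aspirin relieved the headache better than ibuprofen".split()
g = concept_graph(doc)
# A structural feature a categorizer could use: hub concepts by degree centrality.
print(sorted(nx.degree_centrality(g).items(), key=lambda kv: -kv[1]))
```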

    A Review of Relational Machine Learning for Knowledge Graphs

    Relational machine learning studies methods for the statistical analysis of relational, or graph-structured, data. In this paper, we provide a review of how such statistical models can be “trained” on large knowledge graphs and then used to predict new facts about the world (which is equivalent to predicting new edges in the graph). In particular, we discuss two different kinds of statistical relational models, both of which can scale to massive datasets. The first is based on tensor factorization methods and related latent variable models. The second is based on mining observable patterns in the graph. We also show how to combine these latent and observable models to obtain improved modeling power at decreased computational cost. Finally, we discuss how such statistical models of graphs can be combined with text-based information extraction methods for automatically constructing knowledge graphs from the Web. In particular, we discuss Google’s Knowledge Vault project. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
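
    One simple member of the latent-feature family discussed above can be sketched in a few lines: each entity and relation receives an embedding, and a candidate triple is scored with a bilinear-diagonal (DistMult-style) product. The entities, relation, dimensionality, and random embeddings below are placeholders; a real system would learn the embeddings by minimizing a ranking loss over the observed triples of a knowledge graph.

```python
import numpy as np

rng = np.random.default_rng(0)
entities = ["Edinburgh", "Scotland", "Glasgow"]   # placeholder entities
relations = ["located_in"]                        # placeholder relation
dim = 8

E = {e: rng.normal(size=dim) for e in entities}   # entity embeddings (untrained)
R = {r: rng.normal(size=dim) for r in relations}  # relation embeddings (untrained)

def score(head, rel, tail):
    """Plausibility of the triple (head, rel, tail); higher suggests a likely edge."""
    return float(np.sum(E[head] * R[rel] * E[tail]))

# Link prediction as ranking: which tail best completes (Edinburgh, located_in, ?)
candidates = sorted(entities, key=lambda t: -score("Edinburgh", "located_in", t))
print(candidates)
```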

    frances: a deep learning NLP and text mining web tool to unlock historical digital collections: a case study on the Encyclopaedia Britannica

    Funding: This work was supported by the NLS Digital Fellowship and by the Google Cloud Platform research credit program. This work presents frances, an integrated text mining tool that combines information extraction, knowledge graphs, NLP, deep learning, parallel processing, and Semantic Web techniques to unlock the full value of historical digital textual collections, offering new capabilities for researchers to use powerful analysis methods without being distracted by technology and middleware details. To demonstrate these capabilities, we use the first eight editions of the Encyclopaedia Britannica offered by the National Library of Scotland (NLS) as an example digital collection to mine and analyse. We have developed novel parallel heuristics to extract terms (alongside metadata) from the original collection, which provides a mix of unstructured and semi-structured input data, and have populated a new knowledge graph with this information. Our Natural Language Processing models enable frances to perform advanced analyses that go significantly beyond simple search using the information stored in the knowledge graph. Furthermore, frances allows creating and running complex text mining analyses at scale. Our results show that the novel computational techniques developed within frances provide a vehicle for researchers to formalize and connect findings and insights derived from the analysis of large-scale digital corpora such as the Encyclopaedia Britannica.
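
    The overall pattern of turning extracted terms and metadata into a queryable knowledge graph can be illustrated with a small, hedged sketch. It does not reproduce frances's actual schema, namespaces, or code; rdflib, the placeholder namespace, and the sample record are assumptions made here purely for illustration.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/eb/")   # placeholder namespace, not frances's

# One hypothetical extracted record: a term, its source edition, and its definition text.
records = [
    {"term": "Astronomy",
     "edition": "first_edition_1771",
     "definition": "The science of the heavenly bodies."},
]

g = Graph()
for rec in records:
    term = EX[rec["term"]]
    edition = EX[rec["edition"]]
    g.add((term, RDF.type, EX.Term))                        # typed term node
    g.add((term, RDFS.label, Literal(rec["term"])))         # human-readable label
    g.add((term, EX.definition, Literal(rec["definition"])))
    g.add((term, EX.appearsIn, edition))                    # link term to its edition

print(g.serialize(format="turtle"))   # the graph can now be queried, e.g. with SPARQL
```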