
    Scalable Text Mining with Sparse Generative Models

    The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. General data mining methods based on machine learning face challenges with the scale of text data, posing a need for scalable text mining methods. This thesis proposes a solution to scalable text mining: generative models combined with sparse computation. A unifying formalization for generative text models is defined, bringing together research traditions that have used formally equivalent models but ignored parallel developments. This framework allows the use of methods developed in different processing tasks, such as retrieval and classification, yielding effective solutions across different text mining tasks. Sparse computation using inverted indices is proposed for inference on probabilistic models. This reduces the computational complexity of common text mining operations according to sparsity, yielding probabilistic models with the scalability of modern search engines. The proposed combination provides sparse generative models: a solution for text mining that is general, effective, and scalable. Extensive experimentation on text classification and ranked retrieval datasets is conducted, showing that the proposed solution matches or outperforms the leading task-specific methods in effectiveness, with an order of magnitude decrease in classification times for Wikipedia article categorization with a million classes. The developed methods were further applied in two 2014 Kaggle data mining prize competitions with over a hundred competing teams, earning first and second places.
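    The core idea of the abstract, inference that touches only the classes sharing terms with the query by walking an inverted index, can be illustrated with a minimal sketch. This is an assumption-laden toy (a multinomial model with additive smoothing, invented class and term names), not the thesis's exact formulation:

    ```python
    import math
    from collections import defaultdict

    # Sketch: a multinomial generative classifier whose inference walks an
    # inverted index, so scoring cost scales with the number of classes that
    # share terms with the query rather than with the total number of classes.
    class SparseGenerativeClassifier:
        def __init__(self, alpha=0.1):
            self.alpha = alpha                    # additive smoothing (assumed)
            self.postings = defaultdict(dict)     # term -> {class: count}
            self.class_totals = defaultdict(int)  # class -> total token count
            self.vocab = set()

        def fit(self, labeled_docs):
            for label, tokens in labeled_docs:
                for t in tokens:
                    self.postings[t][label] = self.postings[t].get(label, 0) + 1
                    self.class_totals[label] += 1
                    self.vocab.add(t)

        def predict(self, tokens):
            v = len(self.vocab)
            # Sparse step: candidate classes come from posting lists only.
            candidates = set()
            for t in tokens:
                candidates.update(self.postings.get(t, {}))
            best, best_score = None, -math.inf
            for c in candidates:
                score = 0.0
                for t in tokens:
                    count = self.postings.get(t, {}).get(c, 0)
                    score += math.log((count + self.alpha) /
                                      (self.class_totals[c] + self.alpha * v))
                if score > best_score:
                    best, best_score = c, score
            return best

    clf = SparseGenerativeClassifier()
    clf.fit([("sports", ["goal", "match", "team", "score"]),
             ("finance", ["stock", "market", "trade", "price"])])
    print(clf.predict(["market", "price"]))   # -> finance
    ```

    With a million classes, the candidate set for a short query is typically a tiny fraction of all classes, which is where the claimed speedup would come from.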

    Supply chain risk management : systematic literature review and a conceptual framework for capturing interdependencies between risks

    The purpose of this research is to conduct a comprehensive and systematic review of the literature in the field of 'Supply Chain Risk Management' and identify important gaps for future research. Furthermore, a conceptual risk management framework is proposed that encompasses a holistic view of the field. The 'Systematic Literature Review' method is used to examine quality articles published over a period of almost 15 years (2000 - June 2014). The findings of the study are validated through text mining software. The systematic literature review has identified the progress of research based on various descriptive and thematic typologies. The review and text mining analysis have also provided insight into major research gaps. Based on the identified gaps, a framework is developed that can help researchers model interdependencies between risk factors.

    Measuring impact of academic research in computer and information science on society

    Academic research in computer and information science (CIS) has contributed immensely to all aspects of society. As academic research today is substantially supported by various government sources, recent political changes have created ambivalence amongst academics about the future of research funding. With uncertainty looming, it is important to develop a framework to extract and measure information relating to the impact of CIS research on society, in order to justify public funding and demonstrate the actual contribution and impact of CIS research outside academia. A new method combining discourse analysis and text mining of a collection of over 1,000 pages of impact case study documents, written in free-text format for the Research Excellence Framework (REF) 2014, was developed to identify the most commonly used categories or headings for reporting the impact of CIS research by UK Universities (UKU). According to the research reported in REF 2014, between 2008 and 2013 UKU acquired 83 patents in various areas of CIS, created 64 spin-offs, generated £857.5 million in different financial forms, created substantial employment, reached over 6 billion users worldwide, and helped save over £1 billion through improved processes across various sectors internationally.

    EXACT2: the semantics of biomedical protocols

    © 2014 Soldatova et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. This article has been made available through the Brunel Open Access Publishing Fund. Background: The reliability and reproducibility of experimental procedures is a cornerstone of scientific practice. There is a pressing technological need for better representation of biomedical protocols to enable other agents (human or machine) to better reproduce results. A framework that ensures that all information required for the replication of experimental protocols is recorded is essential to achieving reproducibility. Methods: We have developed the ontology EXACT2 (EXperimental ACTions), designed to capture the full semantics of biomedical protocols required for their reproducibility. To construct EXACT2 we manually inspected hundreds of published and commercial biomedical protocols from several areas of biomedicine. After establishing a clear pattern for extracting the required information, we utilized text-mining tools to translate the protocols into a machine-amenable format. We have verified the utility of EXACT2 through the successful processing of previously 'unseen' (not used for the construction of EXACT2) protocols. Results: The paper reports on a fundamentally new version of EXACT2 that supports the semantically-defined representation of biomedical protocols. The ability of EXACT2 to capture the semantics of biomedical procedures was verified through a text mining use case, in which EXACT2 is used as a reference model for text mining tools to identify terms pertinent to experimental actions, and their properties, in biomedical protocols expressed in natural language. An EXACT2-based framework for the translation of biomedical protocols to a machine-amenable format is proposed. Conclusions: The EXACT2 ontology is sufficient to record, in a machine-processable form, the essential information about biomedical protocols. EXACT2 defines explicit semantics of experimental actions and can be used by various computer applications. It can serve as a reference model for the translation of biomedical protocols in natural language into a semantically-defined format. This work has been partially funded by the Brunel University BRIEF award and a grant from Occams Resources.

    Exploring Text Mining and Analytics for Applications in Public Security: An in-depth dive into a systematic literature review

    Text mining and related analytics emerge as a technological approach to support human activities in extracting useful knowledge from texts in several formats. From a managerial point of view, it can help organizations in planning and decision-making processes, providing information that was not previously evident in textual materials produced internally or externally. In this context, within the public/governmental scope, public security agencies are great beneficiaries of the tools associated with text mining in several respects, from applications in the criminal area to the collection of people's opinions and sentiments about the actions taken to promote their welfare. This article reports details of a systematic literature review focused on identifying the main areas of text mining application in public security, the most recurrent technological tools, and future research directions. The searches covered four major article databases (Scopus, Web of Science, IEEE Xplore, and ACM Digital Library), selecting 194 materials published between 2014 and the first half of 2021, among journals, conferences, and book chapters. There were several findings concerning the targets of the literature review, as presented in the results of this article.

    A Longitudinal Analysis of Job Skills for Entry-Level Data Analysts

    The explosive growth of the data analytics field has continued over the past decade with no signs of slowing down. Given the fast pace of technology change and the need for IT professionals to constantly keep up with the field, it is important to analyze the job skills and knowledge required in the data analyst and business intelligence (BI) analyst job market. In this research, we examine over 9,000 job postings for entry-level data analytics jobs over five years (2014-2018). Using a text mining approach and a custom text mining dictionary, we identify a preliminary set of analytic competencies sought in practice. Further, the longitudinal data also demonstrates how these key skills have evolved over time. We find that the three biggest trends are growing proficiency requirements for Python, Tableau, and R. We also find that an increasing number of jobs emphasize data visualization. Some skills, like Microsoft Access, SAP, and Cognos, declined in popularity over the time frame studied. Using the results of the study, universities can make informed curriculum decisions, and instructors can decide what skills to teach based on industry needs. Our custom text mining dictionary adds to the growing literature and can assist other researchers in this space.
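    The dictionary-based counting the abstract describes can be sketched as follows. The dictionary entries, pattern syntax, and once-per-posting counting rule here are illustrative assumptions, not the authors' actual dictionary:

    ```python
    import re
    from collections import Counter

    # Hypothetical skill dictionary: maps surface patterns (regex, applied to
    # lowercased text) to a canonical skill name. Real dictionaries map many
    # variants ("MS Access", "Microsoft Access") to one canonical entry.
    SKILL_DICT = {
        r"python": "Python",
        r"tableau": "Tableau",
        r"\br\b": "R",                     # standalone "R" only
        r"data visualization": "Data Visualization",
        r"ms access": "Microsoft Access",
        r"microsoft access": "Microsoft Access",
    }

    def count_skills(postings):
        """Count, for each skill, the number of postings mentioning it."""
        counts = Counter()
        for text in postings:
            low = text.lower()
            seen = set()
            for pattern, skill in SKILL_DICT.items():
                if re.search(pattern, low):
                    seen.add(skill)
            counts.update(seen)            # each skill counted once per posting
        return counts

    postings = ["Seeking an analyst with Python and Tableau experience",
                "Entry level: R and data visualization skills required"]
    print(count_skills(postings))
    ```

    Running the same counter over postings grouped by year would yield the longitudinal skill trends the study reports.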

    A patent time series processing component for technology intelligence by trend identification functionality

    © 2014, Springer-Verlag London. Technology intelligence denotes the concept and applications that transform data hidden in patents or scientific literature into technical insight for technology strategy-making support. The existing frameworks and applications of technology intelligence mainly focus on obtaining text-based knowledge with text mining components. However, how the corresponding technological trend of that knowledge evolves over time is seldom taken into consideration. In order to capture hidden trend turning points and improve the framework of existing technology intelligence, this paper proposes a patent time series processing component with trend identification functionality. We use a piecewise linear representation method to generate and quantify the trend of patent publication activities, then utilize the outcome to identify trend turning points and provide trend tags to the existing text mining component, thus making it possible to combine text-based and time-based knowledge to support technology strategy making more satisfactorily. A case study using Australian patents (1983-2012) in the Information and Communications Technology industry is presented to demonstrate the feasibility of the component when dealing with real-world tasks. The result shows that the new component identifies the trend reasonably well, and at the same time uncovers valuable trend turning points in historical patent time series.
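    Piecewise linear representation of a time series has several standard variants; the paper does not spell out which is used here, so the sketch below assumes a common top-down split: recursively break the series at the point of maximum deviation from the straight line between segment endpoints, and report breakpoints as candidate trend turning points. The patent counts are invented toy data:

    ```python
    # Top-down piecewise linear segmentation (one assumed PLR variant).
    def max_deviation(series, lo, hi):
        """Index and size of the largest vertical deviation from the chord lo->hi."""
        y0, y1 = series[lo], series[hi]
        best_i, best_d = None, 0.0
        for i in range(lo + 1, hi):
            y_line = y0 + (y1 - y0) * (i - lo) / (hi - lo)
            d = abs(series[i] - y_line)
            if d > best_d:
                best_i, best_d = i, d
        return best_i, best_d

    def segment(series, lo, hi, tol, breaks):
        i, d = max_deviation(series, lo, hi)
        if i is not None and d > tol:
            segment(series, lo, i, tol, breaks)   # refine left half
            breaks.append(i)                      # record turning point
            segment(series, i, hi, tol, breaks)   # refine right half

    def turning_points(series, tol=1.0):
        """Breakpoint indices where the linear trend changes, in order."""
        breaks = []
        segment(series, 0, len(series) - 1, tol, breaks)
        return breaks

    # Toy annual patent counts: flat, then sharp growth, then gentle decline.
    counts = [2, 3, 4, 5, 20, 35, 50, 48, 46, 44]
    print(turning_points(counts))   # -> [3, 6]
    ```

    In the paper's setting, the resulting breakpoints would be attached as trend tags to the text mining component's output, linking time-based and text-based knowledge.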

    The evolution of Latino threat narrative from 1997 to 2014

    This study presents preliminary findings of a project focusing on the evolution of the Latino threat narrative (LTN), a social process of portraying Latinos with derogatory terms. A total of 440,984 newspaper articles about Latinos across 13 news outlets from 1997 to 2014 were analyzed using text mining. The results of this study demonstrate a potential association between LTN in print news media and significant political and social events, including the September 11, 2001 terror attacks; the passage of restrictive immigration legislation in 2001, 2002, 2005, and 2006; and mass protests against immigration reform in 2006. The study also reveals greater intensity in the use of LTN-related words during the (Republican) Bush administration than during the immediately preceding and following (Democratic) administrations. This is the first work to use text mining techniques to explore the Latino threat narrative at large scale over a long period of time.