962 research outputs found
Text mining techniques for patent analysis.
Abstract Patent documents contain important research results. However, they are lengthy and rich in technical terminology such that it takes a lot of human efforts for analyses. Automatic tools for assisting patent engineers or decision makers in patent analysis are in great demand. This paper describes a series of text mining techniques that conforms to the analytical process used by patent analysts. These techniques include text segmentation, summary extraction, feature selection, term association, cluster generation, topic identification, and information mapping. The issues of efficiency and effectiveness are considered in the design of these techniques. Some important features of the proposed methodology include a rigorous approach to verify the usefulness of segment extracts as the document surrogates, a corpus-and dictionary-free algorithm for keyphrase extraction, an efficient co-word analysis method that can be applied to large volume of patents, and an automatic procedure to create generic cluster titles for ease of result interpretation. Evaluation of these techniques was conducted. The results confirm that the machine-generated summaries do preserve more important content words than some other sections for classification. To demonstrate the feasibility, the proposed methodology was applied to a realworld patent set for domain analysis and mapping, which shows that our approach is more effective than existing classification systems. The attempt in this paper to automate the whole process not only helps create final patent maps for topic analyses, but also facilitates or improves other patent analysis tasks such as patent classification, organization, knowledge sharing, and prior art searches
Towards Personalized and Human-in-the-Loop Document Summarization
The ubiquitous availability of computing devices and the widespread use of
the internet have generated a large amount of data continuously. Therefore, the
amount of available information on any given topic is far beyond humans'
processing capacity to properly process, causing what is known as information
overload. To efficiently cope with large amounts of information and generate
content with significant value to users, we require identifying, merging and
summarising information. Data summaries can help gather related information and
collect it into a shorter format that enables answering complicated questions,
gaining new insight and discovering conceptual boundaries.
This thesis focuses on three main challenges to alleviate information
overload using novel summarisation techniques. It further intends to facilitate
the analysis of documents to support personalised information extraction. This
thesis separates the research issues into four areas, covering (i) feature
engineering in document summarisation, (ii) traditional static and inflexible
summaries, (iii) traditional generic summarisation approaches, and (iv) the
need for reference summaries. We propose novel approaches to tackle these
challenges, by: i)enabling automatic intelligent feature engineering, ii)
enabling flexible and interactive summarisation, iii) utilising intelligent and
personalised summarisation approaches. The experimental results prove the
efficiency of the proposed approaches compared to other state-of-the-art
models. We further propose solutions to the information overload problem in
different domains through summarisation, covering network traffic data, health
data and business process data.Comment: PhD thesi
TOPIC MODELLING METHODOLOGY: ITS USE IN INFORMATION SYSTEMS AND OTHER MANAGERIAL DISCIPLINES
Over the last decade, quantitative text mining approaches to content analysis have gained increasing traction within information systems research, and related fields, such as business administration. Recently, topic models, which are supposed to provide their user with an overview of themes being dis-cussed in documents, have gained popularity. However, while convenient tools for the creation of this model class exist, the evaluation of topic models poses significant challenges to their users. In this research, we investigate how questions of model validity and trustworthiness of presented analyses are addressed across disciplines. We accomplish this by providing a structured review of methodological approaches across the Financial Times 50 journal ranking. We identify 59 methodological research papers, 24 implementations of topic models, as well as 33 research papers using topic models in In-formation Systems (IS) research, and 29 papers using such models in other managerial disciplines. Results indicate a need for model implementations usable by a wider audience, as well as the need for more implementations of model validation techniques, and the need for a discussion about the theoretical foundations of topic modelling based research
The 10 Research Topics in the Internet of Things
Since the term first coined in 1999 by Kevin Ashton, the Internet of Things
(IoT) has gained significant momentum as a technology to connect physical
objects to the Internet and to facilitate machine-to-human and
machine-to-machine communications. Over the past two decades, IoT has been an
active area of research and development endeavours by many technical and
commercial communities. Yet, IoT technology is still not mature and many issues
need to be addressed. In this paper, we identify 10 key research topics and
discuss the research problems and opportunities within these topics.Comment: 10 pages. IEEE CIC 2020 vision pape
COSPO/CENDI Industry Day Conference
The conference's objective was to provide a forum where government information managers and industry information technology experts could have an open exchange and discuss their respective needs and compare them to the available, or soon to be available, solutions. Technical summaries and points of contact are provided for the following sessions: secure products, protocols, and encryption; information providers; electronic document management and publishing; information indexing, discovery, and retrieval (IIDR); automated language translators; IIDR - natural language capabilities; IIDR - advanced technologies; IIDR - distributed heterogeneous and large database support; and communications - speed, bandwidth, and wireless
An Integrated Framework for Patent Analysis and Mining
Patent documents are important intellectual resources of protecting interests of individuals, organizations and companies. These patent documents have great research values, beneficial to the industry, business, law, and policy-making communities. Patent mining aims at assisting patent analysts in investigating, processing, and analyzing patent documents, which has attracted increasing interest in academia and industry. However, despite recent advances in patent mining, several critical issues in current patent mining systems have not been well explored in previous studies.
These issues include: 1) the query retrieval problem that assists patent analysts finding all relevant patent documents for a given patent application; 2) the patent documents comparative summarization problem that facilitates patent analysts in quickly reviewing any given patent documents pairs; and 3) the key patent documents discovery problem that helps patent analysts to quickly grasp the linkage between different technologies in order to better understand the technical trend from a collection of patent documents.
This dissertation follows the stream of research that covers the aforementioned issues of existing patent analysis and mining systems. In this work, we delve into three interleaved aspects of patent mining techniques, including (1) PatSearch, a framework of automatically generating the search query from a given patent application and retrieving relevant patents to user; (2) PatCom, a framework for investigating the relationship in terms of commonality and difference between patent documents pairs, and (3) PatDom, a framework for integrating multiple types of patent information to identify important patents from a large volume of patent documents.
In summary, the increasing amount and textual complexity of patent repository lead to a series of challenges that are not well addressed in the current generation systems. My work proposed reasonable solutions to these challenges and provided insights on how to address these challenges using a simple yet effective integrated patent mining framework
- …