
    Text segmentation techniques: A critical review

    Text segmentation is widely used for processing text. It is a method of splitting a document into smaller parts, usually called segments, each of which carries its own relevant meaning. These segments may be classified as words, sentences, topics, phrases, or any other information unit, depending on the text analysis task. This study presents the various reasons for using text segmentation in different analysis approaches, and categorises the types of documents and languages involved. The main contribution of this study is a summary of 50 research papers, illustrating a decade of research (January 2007 to January 2017) that applied text segmentation as its main approach to analysing text. The results reveal the popularity of text segmentation across different languages. Moreover, the “word” appears to be the most practical and usable segment, as it is a smaller unit than the phrase, sentence, or line.
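    As a concrete illustration of the segment granularities the review discusses, the following minimal Python sketch (not taken from the paper; the function name and splitting rules are illustrative) divides a document first into sentences and then into words:

```python
import re

def segment(document: str) -> dict:
    """Split a document into sentence and word segments.

    A minimal illustration of two of the segment granularities the
    review discusses; real systems use more robust tokenisers.
    """
    # Sentence segmentation: split after terminal punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    # Word segmentation: extract word characters from each sentence.
    words = [w for s in sentences for w in re.findall(r"\w+", s)]
    return {"sentences": sentences, "words": words}

doc = "Text segmentation splits documents. Each segment carries meaning!"
parts = segment(doc)
print(parts["sentences"])  # ['Text segmentation splits documents.', 'Each segment carries meaning!']
print(parts["words"][:4])  # ['Text', 'segmentation', 'splits', 'documents']
```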

    Text segmentation for analysing different languages

    Over the past several years, researchers have applied different methods of text segmentation. Text segmentation is defined as a method of splitting a document into smaller segments, each assumed to carry its own relevant meaning. These segments can be classified as tags, words, sentences, topics, phrases, or any other information unit. This study first reviews the types of text segmentation methods used in different types of documentation, and then discusses the various reasons for utilising them in opinion mining. The main contribution of this study is a summary of research papers from the past 10 years that applied text segmentation as their main approach to text analysis. The results show that word segmentation has been successfully and widely used for processing different languages.
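    Word segmentation is notably language dependent, which is one reason the reviewed papers span so many languages: whitespace splitting suffices for English, while Chinese has no word delimiters and requires a dictionary- or model-based segmenter. A minimal sketch, using the jieba library as one illustrative choice (the reviewed papers may use other tools):

```python
import jieba  # a widely used Chinese word segmentation library

english = "word segmentation is widely used for text analysis"
chinese = "我来到北京清华大学"

# English: whitespace already delimits words.
print(english.split())
# Chinese: jieba segments the character stream into words,
# e.g. ['我', '来到', '北京', '清华大学'].
print(jieba.lcut(chinese))
```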

    Big Data Computing for Geospatial Applications

    The convergence of big data and geospatial computing has brought forth challenges and opportunities to Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, while also demonstrating opportunities for using big data in geospatial applications. Crucial to the advancements highlighted in this book is the integration of computational thinking and spatial thinking, and the transformation of abstract ideas and models into concrete data structures and algorithms.

    Shape recognition through multi-level fusion of features and classifiers

    Shape recognition is a fundamental problem and a special type of image classification, where each shape is considered as a class. Current approaches to shape recognition mainly focus on designing low-level shape descriptors and classifying them using machine learning approaches. To achieve effective learning of shape features, it is essential to ensure that a comprehensive set of high-quality features can be extracted from the original shape data. We have therefore been motivated to develop methods for the fusion of features and classifiers to advance classification performance. In this paper, we propose a multi-level framework for the fusion of features and classifiers in the setting of granular computing. The proposed framework involves creating diversity among classifiers by adopting feature selection and fusion to create diverse feature sets, and by training diverse classifiers using different learning algorithms. The experimental results show that the proposed multi-level framework can effectively create diversity among classifiers, leading to considerable advances in classification performance.
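    The general idea of creating classifier diversity through feature selection and fusing the resulting predictions can be sketched as follows (a simplified illustration, not the paper's granular computing framework; the dataset and parameter choices are placeholders):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Diversity through feature selection (different k per member)
# and through different learning algorithms.
members = [
    make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=5000)),
    make_pipeline(SelectKBest(f_classif, k=40), SVC()),
    make_pipeline(SelectKBest(f_classif, k=60), RandomForestClassifier(random_state=0)),
]
preds = np.array([m.fit(X_tr, y_tr).predict(X_te) for m in members])

# Fuse the diverse classifiers by majority vote on each test sample.
fused = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
print("fused accuracy:", (fused == y_te).mean())
```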

    Data Mining

    The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising and leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, and trend or evolution prediction, among much else, in fields such as science, medicine, economics, engineering, computing, and business analytics. This book presents basic concepts, ideas, and research in data mining.

    Short Text Classification with Tolerance Near Sets

    Text classification is a classical machine learning application in Natural Language Processing which aims to assign labels to textual units such as documents, sentences, paragraphs, and queries. Applications of text classification include sentiment classification and news categorization. Sentiment classification identifies the polarity of a text, such as positive, negative or neutral, based on textual features. In this thesis, we implemented a modified form of a tolerance-based algorithm (TSC) to classify the sentiment polarities of tweets as well as news categories from text. The TSC algorithm is a supervised algorithm designed to perform short text classification with tolerance near sets (TNS). It uses vectors from a pre-trained SBERT model to create tolerance classes. The effectiveness of the TSC algorithm has been demonstrated by testing it on ten well-researched datasets. One of the datasets (Covid-Sentiment) was hand-crafted from tweets expressing opinions related to COVID. Experiments demonstrate that TSC outperforms five classical ML algorithms on one dataset and is comparable on all the others, using a weighted F1-score measure.
    Master of Science in Applied Computer Science
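    A simplified sketch of the tolerance-class idea behind TSC (an illustrative stand-in, not the thesis's exact algorithm): two text vectors are treated as tolerant when their cosine distance is at most a threshold eps, and a test vector takes the majority label of the tolerant training vectors. SBERT embeddings are assumed to be precomputed; the toy 3-d vectors below stand in for them.

```python
import numpy as np
from collections import Counter

def predict_tolerance(x, train_vecs, train_labels, eps=0.2):
    """Label x by majority vote over training vectors tolerant to x.

    Vectors u, v belong to the same tolerance class when their cosine
    distance 1 - cos(u, v) <= eps (a simplified tolerance relation).
    """
    # Normalise so the dot product equals cosine similarity.
    x = x / np.linalg.norm(x)
    V = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    dist = 1.0 - V @ x
    tolerant = np.where(dist <= eps)[0]
    if len(tolerant) == 0:  # no tolerant neighbour: fall back to the nearest one
        tolerant = [int(np.argmin(dist))]
    return Counter(train_labels[i] for i in tolerant).most_common(1)[0][0]

# Toy 3-d stand-ins for SBERT sentence embeddings.
vecs = np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 1.0, 0.1]])
labels = ["positive", "positive", "negative"]
print(predict_tolerance(np.array([0.95, 0.15, 0.05]), vecs, labels))  # positive
```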

    Granular Fuzzy Regression Domain Adaptation in Takagi-Sugeno Fuzzy Models

    In classical data-driven machine learning methods, massive amounts of labeled data are required to build a high-performance prediction model. However, in many real-world applications the amount of labeled data is insufficient to establish such a model. Transfer learning has recently emerged as a solution to this problem: it exploits the knowledge accumulated in auxiliary domains to help construct prediction models in a target domain with inadequate training data. Most existing transfer learning methods solve classification tasks; only a few are devoted to regression problems. In addition, current methods ignore the inherent phenomenon of information granularity in transfer learning. In this study, granular computing techniques are applied to transfer learning. Three granular fuzzy regression domain adaptation methods for determining the estimated values of a regression target are proposed, addressing three challenging cases in domain adaptation. The proposed methods change the input and/or output space of the source domain's model using space transformation, so that the fuzzy rules become more compatible with the target data. Experiments on synthetic and real-world datasets validate the effectiveness of the proposed methods.
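    For background, a first-order Takagi-Sugeno model combines local linear models through rule firing strengths; the paper's adaptation methods transform the input and/or output space before such inference. A minimal single-input sketch with Gaussian membership functions (the parameter values are illustrative, not from the paper):

```python
import numpy as np

def ts_predict(x, centers, sigmas, coefs, intercepts):
    """First-order Takagi-Sugeno inference for a scalar input x.

    Each rule i: IF x is Gaussian(centers[i], sigmas[i])
                 THEN y_i = coefs[i] * x + intercepts[i]
    Output: the firing-strength-weighted average of the rule consequents.
    """
    w = np.exp(-0.5 * ((x - centers) / sigmas) ** 2)  # rule firing strengths
    y_local = coefs * x + intercepts                  # local linear models
    return np.sum(w * y_local) / np.sum(w)

# Two rules approximating a piecewise trend.
centers = np.array([0.0, 5.0])
sigmas = np.array([2.0, 2.0])
coefs = np.array([1.0, -0.5])
intercepts = np.array([0.0, 7.5])
print(ts_predict(2.5, centers, sigmas, coefs, intercepts))  # midway blend of both rules
```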

    Emergent quality issues in the supply of Chinese medicinal plants: A mixed methods investigation of their contemporary occurrence and historical persistence

    Quality issues that emerged centuries ago in Chinese medicinal plants (CMP) were investigated to explore why they still persist in an era of advanced analytical testing and extensive legislation, so that a solution to improve CMP quality could be proposed. This is important for the 85% of the world’s population who rely on medicinal plants (MP) for primary healthcare, considering the adverse events, including fatalities, that arise from such quality issues. CMP are the most prevalent medicinal plants globally. This investigation used mixed methods, including 15 interviews with CMP expert key informants (KI), together with thematic analysis that identified the main CMP quality issues, explained why they persisted, and informed solutions. An unexplained case example, Eleutherococcus nodiflorus (EN), was analysed through the collection of 106 samples of EN, its known toxic adulterant Periploca sepium (PS), and a related substitute, Eleutherococcus senticosus (ES), across mainland China, Taiwan and the UK. The authenticity of the samples was determined using high-performance thin-layer chromatography. Misidentification, adulteration, substitution and toxicity were the main CMP quality issues identified. Adulteration was found to be widespread globally, with only 57.4% of EN samples authentic and 24.6% adulterated with cardiotoxic PS, mostly at markets and traditional pharmacies. The EN study further highlighted that CMP quality issues persist because the analytical methods and testing currently used are laboratory-bound, leaving gaps in detection throughout much of the supply chain. CMP quality could be more effectively tested with patented analytical technology (PAT) and simpler field-based testing, including indicator strip tests. Education highlighting the long-term economic value and communal benefit of delivering better-quality CMP to consumers was recommended, as a counter to the financial motivations behind practices that allow well-known and recurrent CMP quality issues to persist.

    Development of Mining Sector Applications for Emerging Remote Sensing and Deep Learning Technologies

    This thesis uses neural networks and deep learning to address practical, real-world problems in the mining sector. The main focus is on developing novel applications in the area of object detection from remotely sensed data. This area has many potential mining applications and is an important part of moving towards data-driven strategic decision making across the mining sector. The scientific contributions of this research are twofold: firstly, each of the three case studies demonstrates a new application coupling remote sensing and neural-network-based technologies for improved data-driven decision making; secondly, the thesis presents a framework to guide the implementation of these technologies in the mining sector, providing a guide for researchers and professionals undertaking further studies of this type.
    The first case study builds a fully connected neural network method to locate supporting rock bolts in 3D laser scan data. This method combines input features from the remote sensing and mobile robotics research communities, generating accuracy scores up to 22% higher than those found using either feature set in isolation. The neural network approach is also compared to the widely used random forest classifier and is shown to outperform it on the test datasets. Additionally, the algorithm's performance is enhanced by adding a confusion class to the training data and by grouping the output predictions using density-based spatial clustering, as sketched below. The method is tested on two datasets, gathered using different laser scanners, in different types of underground mines with different rock bolting patterns. In both cases the method is found to be highly capable of detecting the rock bolts, with recall scores of 0.87-0.96.
    The second case study investigates modern deep learning for LiDAR data. Here, multiple transfer learning strategies and LiDAR data representations are examined for the task of identifying historic mining remains. A transfer learning approach based on a lunar crater detection model is used, owing to the similarities between both the underlying data structures and the geometries of the objects to be detected. The relationship between dataset resolution and detection accuracy is also examined; the results show that the approach is capable of detecting pits and shafts to a high degree of accuracy, with precision and recall scores between 0.80 and 0.92, provided the input data is of sufficient quality and resolution. Alongside resolution, different LiDAR data representations are explored, showing that the precision-recall balance varies with the input representation.
    The third case study creates a deep convolutional neural network model to detect artisanal-scale mining from multispectral satellite data. This model is trained from initialisation without transfer learning and demonstrates that accurate multispectral models can be built from a smaller training dataset when appropriate design and data augmentation strategies are adopted. Alongside the deep learning model, novel mosaicing algorithms are developed both to improve cloud cover penetration and to decrease noise in the final prediction maps. When applied to the study area, the results from this model provide valuable information about the expansion, migration and forest encroachment of artisanal-scale mining in southwestern Ghana over the last four years.
    Finally, this thesis presents an implementation framework for these neural-network-based object detection models, generalising the findings from this research to new mining sector deep learning tasks. This framework can be used to identify applications that would benefit from neural network approaches, to build the models, and to apply these algorithms in a real-world environment. The case study chapters confirm that the neural network models are capable of interpreting remotely sensed data to a high degree of accuracy on real-world mining problems, while the framework guides the development of new models to solve a wide range of related challenges.
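    As an illustration of one step from the first case study, grouping per-point "bolt" predictions into discrete detections with density-based spatial clustering can be sketched with scikit-learn's DBSCAN (the point coordinates and parameter values below are placeholders, not the thesis's settings):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy stand-in for 3D laser-scan points the network labelled "bolt".
bolt_points = np.array([
    [0.02, 0.01, 1.50], [0.03, 0.00, 1.52], [0.01, 0.02, 1.49],  # bolt A
    [2.01, 0.05, 1.48], [2.02, 0.04, 1.51],                      # bolt B
    [9.00, 9.00, 9.00],                                          # isolated noise point
])

# eps: maximum neighbour distance (metres); min_samples: density threshold.
clustering = DBSCAN(eps=0.1, min_samples=2).fit(bolt_points)

# Label -1 marks noise; each remaining label is one detected bolt.
for label in sorted(set(clustering.labels_) - {-1}):
    centroid = bolt_points[clustering.labels_ == label].mean(axis=0)
    print(f"bolt detection {label}: centroid {np.round(centroid, 2)}")
```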
