
    Self-Organized Ordering of Terms and Documents in NSF Awards Data

    We present the results of an analysis of a text corpus of 129,000 abstracts of NSF-sponsored basic research projects between 1990 and 2003. The methods used in the analysis include term extraction based on a reference corpus and an entropy measure, and the Self-Organizing Map algorithm for the formation of a term map and a document map. Methodologically, the basic approach builds on earlier developments, such as word category maps and the WEBSOM method, but at the level of detail we report several new aspects and quantitative comparison results between methodological variants. The data covers a fairly large proportion of US-based scientific research during recent years. The analysis results indicate the basic patterns discernible in the data, both at the level of the awards and in the terminology used in them.

    Text Mining of Patient Demographics and Diagnoses from Psychiatric Assessments

    Automatic extraction of patient demographics and psychiatric diagnoses from clinical notes allows for the collection of patient data on a large scale. This data could be used for a variety of research purposes, including outcomes studies or developing clinical trials. However, current research has not yet discussed the automatic extraction of demographics and psychiatric diagnoses in detail. The aim of this study is to apply text mining to extract patient demographics (age, gender, marital status, and education level) and admission diagnoses from the psychiatric assessments at a mental health hospital, and to assign codes to each category. Gender is coded as either Male or Female; marital status is coded as Single, Married, Divorced, or Widowed; and education level is coded from Some High School through Graduate Degree (PhD/JD/MD etc. level). Classifications for diagnoses are based on the DSM-IV. For each category, a rule-based approach was developed utilizing keyword-based regular expressions as well as constituency trees and typed dependencies. We employ a two-step approach that first maximizes recall through the development of keyword-based patterns and, if necessary, maximizes precision by using NLP-based rules to handle the problem of ambiguity. To develop and evaluate our method, we annotated a corpus of 200 assessments, using a portion of the corpus for developing the method and the rest as a test set. F-score was satisfactory for each category (Age: 0.997; Gender: 0.989; Primary Diagnosis: 0.983; Marital Status: 0.875; Education Level: 0.851), as was coding accuracy (Age: 1.0; Gender: 0.989; Primary Diagnosis: 0.922; Marital Status: 0.889; Education Level: 0.778). These results indicate that a rule-based approach could be considered for extracting these types of information in the psychiatric field. At the same time, the results showed a drop in performance from the development set to the test set, which is partly due to the need for more generality in the rules developed.
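The two-step idea in this abstract (high-recall keyword patterns first, then a precision step for ambiguous matches) might be sketched as below for marital status. The keyword lists are hypothetical, since the abstract does not give the study's rules, and the second step here is a simple longest-match heuristic standing in for the constituency-tree and typed-dependency rules the authors describe.

```python
import re

# Hypothetical keyword patterns; the study's actual rules are not given in the abstract.
MARITAL_KEYWORDS = {
    "Single": r"\b(?:single|never married|unmarried)\b",
    "Married": r"\bmarried\b",
    "Divorced": r"\bdivorced\b",
    "Widowed": r"\b(?:widowed|widow|widower)\b",
}


def candidates(text):
    """Step 1 (recall): collect every keyword hit as a (code, matched text) pair."""
    found = []
    for code, pattern in MARITAL_KEYWORDS.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((code, match.group(0)))
    return found


def code_marital_status(text):
    """Step 2 (precision): resolve competing hits by preferring the longest match.

    This heuristic is an assumption standing in for the NLP-based rules; it lets
    "never married" (Single) win over the bare keyword "married".
    """
    found = candidates(text)
    if not found:
        return None
    code, _ = max(found, key=lambda pair: len(pair[1]))
    return code


print(code_marital_status("Patient has never married and lives alone."))
```

A phrase such as "a single episode of depression" would still be coded as Single by this sketch, which is the kind of ambiguity the abstract says the NLP-based rules were introduced to handle.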

    Analysis of science process skills of chemical education students through Self-project Based Learning (SjBL) in the Covid-19 pandemic era

    This study analyzed students' science process skills developed through project-based learning during the Covid-19 pandemic. The prohibition of face-to-face lectures prevented experimental activities from being conducted on campus. This condition encouraged project-based learning in which students carry out extraction experiments independently (Self-Project Based Learning, SjBL). The method used was a Pre-Experimental One-Shot Case Study design involving 94 fourth-semester chemistry education students. The instruments were a science process skills assessment sheet, an activity observation sheet, and a student response questionnaire. Students scored highest on two indicators: (1) determining tools and materials and (2) determining research variables. Both skills were in the excellent category. The skills of determining the work steps and making a data table were in the good category. The skills of formulating research objectives, making hypotheses, analyzing, and drawing conclusions were in the fairly good category. Making a problem statement was in the poor category. Overall, students' science process skills were in the good category. The implementation of student projects was rated excellent, and students responded positively to carrying out projects during the pandemic. The results of this study contribute to future science learning. Efforts are needed to train prospective chemistry teachers in science process skills so that teachers with good science process skills are produced.

    Integration of an E-Commerce Adoption Success Model Based on Technological Frames of References (TFR) – Knowledge Management (KM)

    This study examines variables that support the implementation of e-Commerce websites as digital platforms for the business activities of SMEs. Previous studies have identified variables/factors that support e-Commerce implementation, each from its own perspective; the factors considered in this study are extracted from all of those perspectives and are expected to accommodate the different points of view and earlier research. Factor extraction is carried out with Principal Component Analysis (PCA) in SPSS version 22 to identify variables/factors arising from varied needs; factors that share components are extracted and grouped into one category to minimize ambiguity among the e-Commerce enablers. Data were collected through interviews and online dissemination to 75 respondents, comprising clothing line business owners and staff who use digital platforms, yielding supporting data on the use of digital platforms for e-Commerce in daily business. The extraction yielded 5 main drivers that act as e-Commerce enablers, based on several e-Commerce implementation models grounded in Technological Frames of References (TFR) and Knowledge Management: Organizational Trigger (OT), Environmental Trigger (ET), Individual Intention (II), Knowledge Aspect and Capability (KAC), and Technology Infrastructure (IT), each with its extracted components, so as to minimize ambiguity while still accommodating various needs and perspectives. Keywords: Specification Design, Information System, Integrated Document, Rational Unified Process, MASS Carg

    Using pixel-based and object-based methods to classify urban hyperspectral features

    Object-based image analysis methods have been developed recently. They have since become a very active research topic in the remote sensing community, mainly because researchers have begun to study the spatial structures within the data. In contrast, pixel-based methods only use the spectral content of data. To evaluate the applicability of object-based image analysis methods for land-cover information extraction from hyperspectral data, a comprehensive comparative analysis was performed. In this study, six supervised classification methods were selected from the pixel-based category: maximum likelihood (ML), Fisher linear likelihood (FLL), support vector machine (SVM), binary encoding (BE), spectral angle mapper (SAM) and spectral information divergence (SID). The classifiers were applied to several features extracted from the original spectral bands in order to avoid the Hughes phenomenon and obtain a sufficient number of training samples. Three supervised and four unsupervised feature extraction methods were used. Pixel-based classification was conducted in the first step of the proposed algorithm. The effective feature number (EFN) was then obtained. Image objects were thereafter created using the fractal net evolution approach (FNEA), the segmentation method implemented in eCognition software. Several experiments were carried out to find the best segmentation parameters. The classification accuracy of these objects was compared with the accuracy of the pixel-based methods. In these experiments, the Pavia University Campus hyperspectral dataset was used. This dataset was collected by the ROSIS sensor over an urban area in Italy. The results reveal that, for any combination of feature extraction and classification methods, the performance of object-based methods was better than that of pixel-based ones. Furthermore, the statistical analysis of results shows that, on average, there is almost an 8 percent improvement in classification accuracy when the object-based methods are used.
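Of the pixel-based classifiers listed in this abstract, the spectral angle mapper (SAM) is the simplest to illustrate: a pixel is assigned to the class whose reference spectrum forms the smallest angle with it. The sketch below is a generic textbook version in plain Python, not the authors' pipeline; the four-band reference spectra and class names are made-up values for illustration only.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_angle)


def classify_sam(pixel, references):
    """Return the label whose reference spectrum has the smallest angle to the pixel."""
    return min(references, key=lambda label: spectral_angle(pixel, references[label]))


# Hypothetical 4-band reference spectra (illustrative values, not from the dataset).
REFERENCES = {
    "vegetation": [0.05, 0.08, 0.45, 0.50],
    "asphalt": [0.10, 0.11, 0.12, 0.13],
}

# A brighter vegetation pixel: same spectral shape, twice the magnitude.
print(classify_sam([0.10, 0.16, 0.90, 1.00], REFERENCES))
```

Because the angle ignores vector length, SAM is insensitive to overall brightness, which is why the scaled pixel above still matches the vegetation reference.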

    Biologically inspired feature extraction for rotation and scale tolerant pattern analysis

    Biologically motivated information processing has been an important area of scientific research for decades. The central topic addressed in this dissertation is the utilization of lateral inhibition and, more generally, linear networks with recurrent connectivity, along with complex-log conformal mapping, in machine-based implementations of information encoding, feature extraction and pattern recognition. The reasoning behind, and a method for, spatially uniform implementation of an inhibitory/excitatory network model in the framework of the non-uniform log-polar transform is presented. For the space-invariant connectivity model characterized by a Toeplitz-Block-Toeplitz matrix, the overall network response is obtained without matrix inverse operations, provided the connection matrix generating function is bounded by unity. It is shown that for a network whose inter-neuron connection function is expandable in a Fourier series in polar angle, the overall network response is steerable. The decorrelating/whitening characteristics of networks with lateral inhibition are used to develop space-invariant pre-whitening kernels specialized for specific categories of input signals. These filters have an extremely small memory footprint and are successfully utilized to improve the performance of adaptive neural whitening algorithms. Finally, a method for feature extraction based on a localized Independent Component Analysis (ICA) transform in the log-polar domain, aided by the previously developed pre-whitening filters, is implemented. Since the output codes produced by ICA are very sparse, a small number of non-zero coefficients was sufficient to encode input data and obtain reliable pattern recognition performance.
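One common way to obtain the steady-state response of a linear recurrent network without inverting a matrix is a truncated Neumann series. The sketch below assumes the network settles at r = x + W r, so that r = x + W x + W^2 x + ..., and assumes the "bounded by unity" condition means the series converges; both are interpretations of the abstract rather than details it states. The 3-neuron weight matrix is a made-up Toeplitz example with inhibitory nearest-neighbour connections.

```python
def mat_vec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def network_response(W, x, terms=50):
    """Approximate r = (I - W)^-1 x by summing x + W x + W^2 x + ... (no inverse)."""
    response = list(x)
    term = list(x)
    for _ in range(terms):
        term = mat_vec(W, term)  # next power of W applied to x
        response = [r + t for r, t in zip(response, term)]
    return response


# Hypothetical lateral inhibition: each neuron inhibits its nearest neighbours by 0.2.
W = [
    [0.0, -0.2, 0.0],
    [-0.2, 0.0, -0.2],
    [0.0, -0.2, 0.0],
]
x = [1.0, 1.0, 1.0]

r = network_response(W, x)
print(r)
```

The middle neuron receives inhibition from both neighbours, so its response ends up lower than that of the edge neurons, which is the contrast-enhancing behaviour usually associated with lateral inhibition.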

    A Stacked Multi-Granularity Convolution Denoising Auto-Encoder

    With the development of big data, artificial intelligence has provided many intelligent solutions to urban life. For instance, image-based intelligent technology, such as image classification of diseases, is widely used in daily life. However, real-world images are mostly unlabeled, which limits the performance of many image-based intelligent models. Therefore, how to use a large amount of unlabeled image data to build an efficient and high-quality model for better urban life has become an urgent research topic. In this paper, we propose an unsupervised image feature extraction method referred to as a stacked multi-granularity convolution denoising auto-encoder (SMGCDAE). The algorithm is based on a convolutional neural network (CNN), yet it introduces a multi-granularity kernel. This approach resolves issues with image unicity by extracting a diverse category of high-level features. In addition, the denoising auto-encoder ensures stability and improves the classification accuracy by extracting more robust features. The algorithm was assessed using three image benchmark datasets and a series of meningitis images, achieving higher average accuracy than other methods. These results suggest that the algorithm is capable of extracting more discriminative high-level features and thus offers superior performance compared with the existing methodologies.

    ECONOMIC VALUATION OF ECOSYSTEM SERVICES: A CASE STUDY FOR SUSTAINABLE MANAGEMENT OF DEGRADED PEATLANDS IN LATVIA

    Ecosystem services (ES) are the benefits that people obtain from using ecosystems and can be divided into three categories: provisioning services, regulating and supporting services, and cultural services. The strategic importance of ecosystem services is set by the EU Biodiversity Strategy, which put ecosystem services firmly on the EU policy agenda. The aim of the paper is to present and discuss a model for the economic (monetary) valuation of ecosystem services for sustainable management of degraded peatlands in Latvia. Based on an economic valuation of ES, it is possible to compare different territories and different management scenarios. Peatland ecosystems globally represent a major store of soil carbon, a sink for carbon dioxide and a source of atmospheric methane. Climate change may threaten these stocks through peat oxidation caused by drought in areas where peat extraction has been carried out, as well as through the increased risk of forest fires. In Latvia, no strategy has yet been developed for implementing standard approaches and basic principles for the management of degraded peatlands. There are several options for re-cultivation of degraded peatlands, but for sustainable land use it is very important to choose the optimal option from the economic, ecological and societal perspectives. The research was based on data obtained from a biophysical ES assessment of 28 indicators for 3 scenarios from a 5-, 25- and 50-year perspective. The collection of primary data, as well as the aggregation and comparative assessment of secondary data, was carried out using approved scientific research methods and ES assessment indicators. The obtained data were adapted to the Latvian socio-economic situation by using correction factors. Depending on the ES category, the following assessment methods were used: the market pricing method; the benefit transfer method and the direct market pricing method; the avoided costs method. Economic valuation of peatland re-cultivation scenarios assists land-use planners and policymakers in making ecologically, economically and socio-culturally sustainable land-use decisions, where ecological and economic data are used for a calculated assessment of the land-use options.

    Enhancing Performance in Medical Articles Summarization with Multi-Feature Selection

    The research aimed to provide summaries of information on extraordinary events for public health surveillance systems, based on extraction from online medical articles. The data set comprises 7,346 items. Online medical articles typically contain more than one paragraph, and the core of the story, or the important sentences, can be scattered at the beginning, middle and end of a paragraph. This study therefore produced summaries that retain the important phrases related to extraordinary events scattered across every paragraph of an online medical article. The summarization method is maximal marginal relevance with an n-best value of 0.7. Multi-feature selection here refers to the use of features to improve the performance of the summarization system. The first set of features comprises the title, statistics on word and noun occurrences, and tf-idf weighting. A further feature is the word-level category in medical content patterns, used to identify the important sentences of each paragraph. Important sentences are classified into three categories: core sentences, explanatory sentences, and supporting sentences. System testing was divided into extrinsic and intrinsic tests. The extrinsic test compares the summaries chosen by experts with the output of the system, while the intrinsic test compares three n-best weighting values, feature selection combinations, and feature selection combinations merged with the word-level category in medical content. The extrinsic evaluation result was 72%. The intrinsic evaluation of the feature selection combination merged with the word category in medical content gave 91.6% precision, 92.6% recall and an f-measure of 92.2%.
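Maximal marginal relevance (MMR), the summarization method named in this abstract, picks sentences one at a time, trading relevance to a query against redundancy with sentences already chosen. The sketch below is a generic version using bag-of-words cosine similarity; treating the reported value of 0.7 as the MMR trade-off weight `lam` is an assumption, and the example sentences and query are invented rather than taken from the study's data.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase tokens split on whitespace."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, query, k, lam=0.7):
    """Greedily pick k sentences by lam * relevance - (1 - lam) * redundancy."""
    vectors = [bow(s) for s in sentences]
    q = bow(query)
    selected = []
    remaining = list(range(len(sentences)))
    while remaining and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * cosine(vectors[i], q) - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [sentences[i] for i in selected]


# Invented example sentences; not drawn from the study's corpus.
SENTENCES = [
    "dengue outbreak reported in the city",
    "dengue outbreak reported in the city today",
    "outbreak strains hospital capacity",
]
print(mmr_select(SENTENCES, "dengue outbreak", k=2))
```

In this example the near-duplicate second sentence is skipped in favour of the third, which is less relevant to the query but adds new information.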