A Mathematical Measurement For Korean Text Mining and Its Application
Department of Mathematical Sciences
In modern society we are buried beneath an overwhelming amount of text data on the internet, and we are less inclined to simply surf the web to pass the time. To address this problem, and in particular to grasp the essential parts of the text data we are presented with, there have been numerous studies on the relationship between text data and the ease of perceiving the text's meaning. However, most of these studies focused on English text data. Because they did not take the linguistic characteristics of other languages into account, the same methods are not suitable for Korean text. A dedicated method that exploits the characteristics of Korean is required to analyze Korean text data. We therefore propose a new framework for Korean text mining across various texts, based on suitable mathematical measurements.
The framework is constructed with three parts:
1) text summarization
2) text clustering
3) relational text learning.
Text summarization is the method of extracting the essential sentences from the text. As a measure of importance, we propose specific formulas which focus on the characteristics of Korean. These formulas will provide the input features for the fuzzy summarization system.
However, this method has a significant defect for large data sets: the number of summarized sentences increases with the word count of a text. To solve this, we propose using text clustering. This field has been studied for a long time and involves a trade-off between accuracy and speed. Considering the syllable features of Asian languages, we have designed the "Syllable Vector" as a new measurement. Implemented with text clustering, it has shown remarkable performance, achieving both high accuracy and speed by effectively reducing dimensions.
Thirdly, we consider the relational features of text data. The approaches above deal with each document by itself; that is, they treat documents as independent of one another. To handle the relations between documents, we designed a new architecture for text learning using neural networks (NN). One of the most remarkable recent works in natural language processing (NLP) is "word2vec", which is built with artificial neural networks. Our proposed model has a learning structure of bipartite layers that uses meta information between text data, with a focus on citation relationships. This structure reflects the latent topic of a text using the cited information. It can overcome the shortcomings of conventional systems based on the term-document matrix.
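The abstract does not define the "Syllable Vector", so the following is only a minimal sketch of one plausible reading: each document is represented by counts of precomposed Hangul syllables, which keeps the vocabulary small compared with word-level features. The function names (`syllable_counts`, `syllable_vectors`, `cosine`) and the choice of cosine similarity are assumptions made for illustration, not the authors' method.

```python
from collections import Counter

# Unicode block of precomposed Hangul syllables (U+AC00 to U+D7A3).
HANGUL_START, HANGUL_END = 0xAC00, 0xD7A3


def syllable_counts(text):
    """Count precomposed Hangul syllables, ignoring all other characters."""
    return Counter(ch for ch in text if HANGUL_START <= ord(ch) <= HANGUL_END)


def syllable_vectors(docs):
    """Map each document to a count vector over the shared syllable vocabulary."""
    counts = [syllable_counts(d) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[s] for s in vocab] for c in counts]


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy documents sharing three syllables.
vocab, vectors = syllable_vectors(["한국어 텍스트", "한국어 문서"])
similarity = cosine(vectors[0], vectors[1])
```

Vectors of this kind could be passed to any standard clustering routine; the dimension is bounded by the number of distinct syllables rather than the number of distinct words.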
Probing the Ion Transport Properties of Ultrashort Carbon Nanotubes Integrated with Supported Lipid Bilayers via Electrochemical Analysis.
Supported lipid bilayers (SLBs) are commonly used to investigate interactions between cell membranes and their environment. These model platforms can be formed on electrode surfaces and analyzed using electrochemical methods for bioapplications. Carbon nanotube porins (CNTPs) integrated with SLBs have emerged as promising artificial ion channel platforms. In this study, we present the integration and ion transport characterization of CNTPs in in vivo environments. We combine experimental and simulation data obtained from electrochemical analysis to analyze the membrane resistance of the equivalent circuits. Our results show that carrying CNTPs on a gold electrode results in high conductance for monovalent cations (K+ and Na+) and low conductance for divalent cations (Ca2+).
GenHPF: General Healthcare Predictive Framework with Multi-task Multi-source Learning
Despite the remarkable progress in the development of predictive models for
healthcare, applying these algorithms on a large scale has been challenging.
Algorithms trained on a particular task, based on specific data formats
available in a set of medical records, tend to not generalize well to other
tasks or databases in which the data fields may differ. To address this
challenge, we propose General Healthcare Predictive Framework (GenHPF), which
is applicable to any EHR with minimal preprocessing for multiple prediction
tasks. GenHPF resolves heterogeneity in medical codes and schemas by converting
EHRs into a hierarchical textual representation while incorporating as many
features as possible. To evaluate the efficacy of GenHPF, we conduct multi-task
learning experiments with single-source and multi-source settings, on three
publicly available EHR datasets with different schemas for 12 clinically
meaningful prediction tasks. Our framework significantly outperforms baseline
models that utilize domain knowledge in multi-source learning, improving
average AUROC by 1.2%P in pooled learning and 2.6%P in transfer learning while
also showing comparable results when trained on a single EHR dataset.
Furthermore, we demonstrate that self-supervised pretraining using multi-source
datasets is effective when combined with GenHPF, resulting in a 0.6%P AUROC
improvement compared to models without pretraining. By eliminating the need for
preprocessing and feature engineering, we believe that this work offers a solid
framework for multi-task and multi-source learning that can be leveraged to
speed up the scaling and usage of predictive algorithms in healthcare.
Comment: Accepted by IEEE Journal of Biomedical and Health Informatics
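To make the idea of converting EHR records into a hierarchical textual representation more concrete, here is a minimal sketch. It assumes that each event is a (table name, row) pair and that text is built by concatenating the table name with column names and values; the helper names and the toy records are hypothetical and are not taken from GenHPF itself.

```python
def event_to_text(table_name, row):
    """Flatten one record into 'table column value column value ...'."""
    parts = [table_name]
    for column, value in row.items():
        if value is None:  # skip missing fields instead of emitting 'None'
            continue
        parts.extend([column, str(value)])
    return " ".join(parts)


def patient_to_texts(events):
    """Patient level: an ordered list of event strings (patient -> events -> tokens)."""
    return [event_to_text(table, row) for table, row in events]


# Hypothetical events with made-up table and column names.
events = [
    ("lab", {"item": "creatinine", "value": 1.2, "unit": "mg/dL"}),
    ("prescription", {"drug": "heparin", "dose": 5000, "route": None}),
]
texts = patient_to_texts(events)
```

Because the schema is carried in the text, records from databases with different column names can be fed to the same text encoder without a hand-written mapping between codes.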
Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation
As an emerging field in Machine Learning, Explainable AI (XAI) has been
offering remarkable performance in interpreting the decisions made by
Convolutional Neural Networks (CNNs). To achieve visual explanations for CNNs,
methods based on class activation mapping and randomized input sampling have
gained great popularity. However, the attribution methods based on these
techniques provide lower resolution and blurry explanation maps that limit
their explanation power. To circumvent this issue, visualization based on
various layers is sought. In this work, we collect visualization maps from
multiple layers of the model based on an attribution-based input sampling
technique and aggregate them to reach a fine-grained and complete explanation.
We also propose a layer selection strategy that applies to the whole family of
CNN-based models, based on which our extraction framework is applied to
visualize the last layers of each convolutional block of the model. Moreover,
we perform an empirical analysis of the efficacy of derived lower-level
information to enhance the represented attributions. Comprehensive experiments
conducted on shallow and deep models trained on natural and industrial
datasets, using both ground-truth and model-truth based evaluation metrics
validate our proposed algorithm by meeting or outperforming the
state-of-the-art methods in terms of explanation ability and visual quality,
demonstrating that our method shows stability regardless of the size of objects
or instances to be explained.
Comment: 9 pages, 9 figures. Accepted at the Thirty-Fifth AAAI Conference on
Artificial Intelligence (AAAI-21)
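The randomized input sampling family mentioned in the abstract can be illustrated with a small sketch. It is not the authors' attribution-based, multi-layer method. It only shows the general idea of scoring masked copies of an input and crediting each input element with the scores of the masks that kept it. The function `sampled_saliency`, the one-dimensional "image", and `toy_model` are assumptions chosen so the example runs without any deep learning library.

```python
import random


def sampled_saliency(model, x, n_masks=200, keep_prob=0.5, seed=0):
    """Estimate per-element saliency from random binary masks.

    For each element, returns the average model score over the masks
    that kept that element.
    """
    rng = random.Random(seed)
    weighted = [0.0] * len(x)
    hits = [0] * len(x)
    for _ in range(n_masks):
        mask = [1 if rng.random() < keep_prob else 0 for _ in x]
        score = model([xi * mi for xi, mi in zip(x, mask)])
        for i, mi in enumerate(mask):
            if mi:
                weighted[i] += score
                hits[i] += 1
    return [w / h if h else 0.0 for w, h in zip(weighted, hits)]


def toy_model(x):
    """Hypothetical classifier score that depends only on element 2."""
    return x[2]


saliency = sampled_saliency(toy_model, [0.3, 0.8, 1.0, 0.5])
```

In this toy setting the element the model actually uses receives the highest saliency, while the others receive roughly half of it, because they are credited only when the informative element happens to be kept in the same mask.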
Homogeneous bilayer graphene film based flexible transparent conductor
Graphene is considered a promising candidate to replace conventional
transparent conductors due to its low opacity, high carrier mobility and
flexible structure. Multi-layer graphene or stacked single layer graphenes have
been investigated in the past but both have their drawbacks. The uniformity of
multi-layer graphene is still questionable, and single layer graphene stacks
require many transfer processes to achieve sufficiently low sheet resistance.
In this work, bilayer graphene film grown with low pressure chemical vapor
deposition was used as a transparent conductor for the first time. The
technique was demonstrated to be highly efficient in fabricating a conductive
and uniform transparent conductor compared to multi-layer or single layer
graphene. Four transfers of bilayer graphene yielded a transparent conducting
film with a sheet resistance of 180 Ω/□ at a transmittance of
83%. In addition, bilayer graphene films transferred onto plastic substrate
showed remarkable robustness against bending, with sheet resistance change less
than 15% at 2.14% strain, a 20-fold improvement over commercial indium oxide
films.
Comment: Published in Nanoscale, Nov. 2011 : http://www.rsc.org/nanoscal
A semi-analytic method with an effect of memory for solving fractional differential equations
In this paper, we propose a new modification of the multistage generalized differential transform method (MsGDTM) for solving fractional differential equations. In MsGDTM, the key is how to impose an initial condition in each sub-domain to obtain an accurate approximate solution. In several works in the literature (Odibat et al. in Comput. Math. Appl. 59:1462-1472, 2010; Alomari in Comput. Math. Appl. 61:2528-2534, 2011; Gokdoğan et al. in Math. Comput. Model. 54:2132-2138, 2011), the authors updated the initial condition in each sub-domain by using the approximate solution in the previous sub-domain. However, we point out that this approach makes it hard to account for the memory effect, which is a basic property of fractional differential equations. Here we provide a new algorithm that imposes the initial conditions by using the integral operator, which enhances accuracy. Several illustrative examples are presented, showing that the proposed technique is robust and accurate for solving fractional differential equations.
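The memory effect can be made concrete with a short worked equation. This is a sketch under an assumption: the abstract does not state which fractional derivative is used, so the Caputo definition of order $\alpha$ with $n-1 < \alpha < n$ is assumed here, and the symbols $y$, $t_0$, and $t_k$ (the start of the $k$-th sub-domain) are illustrative.

```latex
% Caputo derivative (assumed definition), n-1 < \alpha < n
{}^{C}D^{\alpha}_{t_0}\, y(t)
  = \frac{1}{\Gamma(n-\alpha)} \int_{t_0}^{t} (t-s)^{\,n-\alpha-1}\, y^{(n)}(s)\, ds .

% For t in the k-th sub-domain (t > t_k > t_0), split the integral at t_k:
{}^{C}D^{\alpha}_{t_0}\, y(t)
  = \underbrace{\frac{1}{\Gamma(n-\alpha)} \int_{t_0}^{t_k} (t-s)^{\,n-\alpha-1}\, y^{(n)}(s)\, ds}_{\text{memory of earlier sub-domains}}
  + \underbrace{\frac{1}{\Gamma(n-\alpha)} \int_{t_k}^{t} (t-s)^{\,n-\alpha-1}\, y^{(n)}(s)\, ds}_{=\;{}^{C}D^{\alpha}_{t_k}\, y(t)} .
```

Restarting the problem at $t_k$ with only $y(t_k)$ as the initial condition keeps the second term and drops the first, which is generally nonzero. This is the sense in which a purely local update of the initial condition loses the memory of the solution.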