
    Acronym-Expansion Disambiguation for Intelligent Processing of Enterprise Information

    An acronym is an abbreviation of several words formed in such a way that the abbreviation itself is a pronounceable word. Acronyms occur frequently throughout various documents, especially those of a technical nature, such as research papers and patents. While acronyms can enhance document readability, their use across a variety of fields has a negative effect on business intelligence. To resolve this problem, we propose a method of acronym-expansion disambiguation for collecting high-quality enterprise information. In experimental evaluations, we demonstrate its efficiency through objective comparisons.
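
    As a rough illustration of the disambiguation task (not the paper's method, which the abstract does not detail), one can score each candidate expansion of an acronym by the overlap between its typical context words and the words surrounding the acronym in the document. The expansions and context profiles below are hypothetical:

```python
# Illustrative sketch only -- the paper's actual disambiguation method is
# not described in the abstract. Each candidate expansion carries a
# hypothetical profile of typical context words; the expansion whose
# profile overlaps most with the surrounding document text wins.
def disambiguate(context_words, expansions):
    context = set(context_words)
    return max(expansions, key=lambda exp: len(expansions[exp] & context))

context = "the pda schedules tasks on the embedded processor".split()
expansions = {
    "personal digital assistant": {"embedded", "processor", "mobile"},
    "pushdown automaton": {"grammar", "stack", "language"},
}
print(disambiguate(context, expansions))  # -> personal digital assistant
```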

    Least Slack Time Rate first: an Efficient Scheduling Algorithm for Pervasive Computing Environment

    Real-time systems, such as those in pervasive computing, must complete a task within a predetermined time while ensuring that the execution results are logically correct. Such systems require intelligent scheduling methods that can promptly and adequately distribute the given tasks to one or more processors. In this paper, we propose LSTR (Least Slack Time Rate first), a new and simple scheduling algorithm for multi-processor environments, and demonstrate its efficient performance through various tests.
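
    A minimal sketch of the idea in Python, assuming the slack-time rate of a task is its slack (time to deadline minus remaining execution time) divided by the time remaining until its deadline; the abstract does not give the exact formula, so this definition and the task set are assumptions:

```python
def slack_time_rate(task, now):
    # Slack = time left until the deadline minus remaining execution time.
    remaining_window = task["deadline"] - now
    slack = remaining_window - task["remaining_exec"]
    # Rate = slack relative to the remaining window (assumed definition).
    # Tasks at or past their deadline are treated as maximally urgent.
    return slack / remaining_window if remaining_window > 0 else float("-inf")

def pick_next(tasks, now):
    # LSTR: dispatch the ready task with the least slack-time rate.
    return min(tasks, key=lambda t: slack_time_rate(t, now))

tasks = [
    {"name": "sensor-poll", "deadline": 10, "remaining_exec": 4},
    {"name": "ui-update",   "deadline": 12, "remaining_exec": 3},
    {"name": "log-flush",   "deadline": 30, "remaining_exec": 5},
]
print(pick_next(tasks, now=0)["name"])  # -> sensor-poll (rate 0.6)
```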

    Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety

    In recent years, several studies have proposed making use of the Twitter micro-blogging service to track various trends in online media and discussion. In this study, we examine the use of Twitter to track discussions of food safety in the Korean language. Given the irregularity of keyword use in most tweets, we focus on machine-learning classifiers and feature-set selection to classify the collected tweets. We build classifier models using the Naive Bayes, Naive Bayes Multinomial, Support Vector Machine, and Decision Tree algorithms, all of which show good performance. To select an optimal feature set, we construct a basic feature set as a standard for performance comparison, against which further test feature sets can be evaluated. Experiments show that precision and F-measure performance are best when using a Naive Bayes Multinomial classifier model with a test feature set defined by extracting the Substantive, Predicate, Modifier, and Interjection parts of speech.
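
    A minimal sketch of the winning configuration, assuming tweets have already been morphologically analyzed into (token, POS) pairs by a Korean analyzer; the training examples and labels below are hypothetical, not the study's data:

```python
# Sketch (not the authors' pipeline): Multinomial Naive Bayes over tokens
# filtered to the selected parts of speech named in the abstract.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

KEEP_POS = {"Substantive", "Predicate", "Modifier", "Interjection"}

def filter_tokens(tagged_tweet):
    # Keep only tokens whose POS tag is in the selected feature set.
    return " ".join(tok for tok, pos in tagged_tweet if pos in KEEP_POS)

# Toy (token, POS) pairs with hypothetical intent labels.
train = [
    ([("계란", "Substantive"), ("위험하다", "Predicate")], "warning"),
    ([("식당", "Substantive"), ("추천하다", "Predicate")], "opinion"),
]
X = [filter_tokens(tweet) for tweet, _ in train]
y = [label for _, label in train]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X, y)
print(model.predict([filter_tokens([("우유", "Substantive"), ("위험하다", "Predicate")])]))
```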

    Analysis of Learning Influence of Training Data Selected by Distribution Consistency

    This study suggests a method for selecting core data that are helpful for machine learning. Specifically, we form a two-dimensional distribution based on the similarity of the training data and compose grids with fixed ratios on the distribution. In each grid cell, we select data based on the distribution consistency (DC) of the target class data and examine how this affects the classifier. We use CIFAR-10 for the experiment and set various grid ratios from 0.5 to 0.005. The influence of these variables was analyzed using different training data sizes selected by high-DC, low-DC (the inverse of high DC), and random (no criteria) selection. As a result, accuracy improved by an average of 0.95 points (±0.65) and by 1.54 points (±0.59) for grid ratios of 0.008 and 0.005, respectively. These outcomes represent an improvement over the existing approach (data distribution search). In this study, we confirmed that learning performance improves when the training data are selected using very small grids and high-DC settings.
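
    A minimal sketch of the selection step, with assumed details the abstract leaves open: points are placed on a 2D similarity layout, the layout is split into fixed-ratio grid cells, each cell is scored by how consistently it contains a single class (its DC), and data from the most consistent cells are kept:

```python
# Sketch of DC-based selection (assumed details, not the paper's exact
# procedure). DC of a cell = fraction of its points in the majority class.
import numpy as np

def select_high_dc(points_2d, labels, grid_ratio, top_frac=0.5):
    # Cell size is a fixed ratio of the layout's extent per axis.
    mins = points_2d.min(axis=0)
    extent = points_2d.max(axis=0) - mins
    cells = np.floor((points_2d - mins) / (extent * grid_ratio + 1e-12)).astype(int)
    cell_ids = [tuple(c) for c in cells]

    # Score each occupied cell by its distribution consistency.
    dc = {}
    for cid in set(cell_ids):
        in_cell = [l for l, c in zip(labels, cell_ids) if c == cid]
        counts = np.bincount(in_cell)
        dc[cid] = counts.max() / counts.sum()

    # Keep the points that sit in the most consistent cells.
    scores = np.array([dc[c] for c in cell_ids])
    k = int(len(labels) * top_frac)
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)
print(len(select_high_dc(X, y, grid_ratio=0.05)), "points selected")
```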

    OurPlaces: Cross-Cultural Crowdsourcing Platform for Location Recommendation Services

    This paper presents a cross-cultural crowdsourcing platform, called OurPlaces, where people from different cultures can share their spatial experiences. We built a three-layered architecture composed of: (i) places (locations that people have visited); (ii) cognition (how people have experienced these places); and (iii) users (those who have visited these places). Notably, cognition is represented as a pairing of two similar places from different cultures (e.g., Versailles and Gyeongbokgung in France and Korea, respectively). As a case study, we applied the OurPlaces platform to a cross-cultural tourism recommendation system and conducted a simulation using a dataset collected from TripAdvisor. The tourist places were classified into four types (i.e., hotels, restaurants, shopping malls, and attractions). In addition, user feedback (e.g., ratings, rankings, and reviews) from various nationalities (assumed to be equivalent to cultures) was exploited to measure the similarities between tourist places and to generate the cognition layer on the platform. To demonstrate the effectiveness of the OurPlaces-based system, we compared it with a Pearson correlation-based system as a baseline. The experimental results show that the proposed system outperforms the baseline by up to 2.5% and 4.1% in terms of MAE and RMSE, respectively.
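
    A minimal sketch of the cognition-layer idea, pairing places from two cultures by the similarity of their user-feedback profiles; the rating vectors, place inventories, and the use of Pearson correlation as the similarity measure are illustrative assumptions:

```python
# Sketch (illustrative only): pair each place from one culture with the
# most similar place from another, using Pearson correlation over
# hypothetical per-aspect rating vectors.
import numpy as np

def pearson(a, b):
    return np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1]

france = {"Versailles": [5, 4, 5, 4], "Louvre": [5, 5, 4, 3]}
korea = {"Gyeongbokgung": [5, 4, 5, 5], "Namdaemun": [3, 4, 2, 3]}

# For each French place, find its most similar Korean counterpart.
for name_f, ratings_f in france.items():
    best = max(korea, key=lambda k: pearson(ratings_f, korea[k]))
    print(f"{name_f} <-> {best}")
```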

    Ensemble-Based Out-of-Distribution Detection

    To design an efficient deep learning model that can be used in the real world, it is important to detect out-of-distribution (OOD) data well. Various studies have been conducted to solve the OOD problem. The current state-of-the-art approach uses a confidence score based on the Mahalanobis distance in a feature space. Although it outperforms previous approaches, its results are sensitive to the quality of the trained model and the complexity of the dataset. Herein, we propose a novel OOD detection method that trains a feature space better suited to OOD detection. The proposed method uses an ensemble of the features trained by a softmax-based classifier and a network based on distance metric learning (DML). Through the complementary interaction of these two networks, the trained feature space has a more tightly clustered distribution and fits a class-wise Gaussian distribution well. Therefore, OOD data can be efficiently detected by setting a threshold in the trained feature space. To evaluate the proposed method, we applied it to various combinations of image datasets. The results show that the overall performance of the proposed approach is superior to that of other methods, including the state-of-the-art approach, on every combination of datasets.
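
    A minimal sketch of the Mahalanobis-distance confidence score that this line of work builds on (the ensemble/DML training itself is out of scope here); the encoder features are simulated with synthetic Gaussian data:

```python
# Sketch: fit class-conditional Gaussians with a shared covariance in a
# feature space, then score samples by distance to the nearest class mean.
import numpy as np

def fit_gaussians(features, labels):
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([features[labels == c] - means[c] for c in classes])
    precision = np.linalg.inv(np.cov(centered, rowvar=False))
    return means, precision

def confidence(x, means, precision):
    # Confidence = negative Mahalanobis distance to the closest class mean;
    # OOD samples fall below a chosen threshold.
    dists = [(x - m) @ precision @ (x - m) for m in means.values()]
    return -min(dists)

rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(5, 1, (100, 4))])
labels = np.array([0] * 100 + [1] * 100)
means, precision = fit_gaussians(feats, labels)

in_dist, ood = feats[0], rng.normal(20, 1, 4)
print(confidence(in_dist, means, precision) > confidence(ood, means, precision))  # True
```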

    Domain Terminology Collection for Semantic Interpretation of Sensor Network Data

    Many studies have investigated the management of data delivered over sensor networks and attempted to standardize their relations. Sensor data come from numerous tangible and intangible sources, and existing work has focused on the integration and management of the sensor data themselves. However, the data should be interpreted according to the sensor environment and related objects, even when the data type, and even the value, is exactly the same. This means that sensor data should have semantic connections with all related objects, and so a knowledge base that covers all domains should be constructed. In this paper, we suggest a method of domain-terminology collection based on Wikipedia category information in order to prepare seed data for such knowledge bases. However, Wikipedia has two weaknesses, namely, loops and unreasonable generalizations in its category structure. To overcome these weaknesses, we utilize a horizontal bootstrapping method for category searches and domain-term collection. Both the category-article and article-link relations defined in Wikipedia are employed as terminology indicators, and we use a new measure to calculate the similarity between categories. By evaluating various aspects of the proposed approach, we show that it outperforms the baseline method with wider coverage and higher precision. The collected domain terminologies can assist the construction of domain knowledge bases for the semantic interpretation of sensor data.
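
    A minimal sketch of a category-similarity step, assuming (the paper's own measure is not given in the abstract) that two categories are similar when they share member articles; the category contents below are hypothetical:

```python
# Sketch (assumed measure): score Wikipedia categories by the Jaccard
# overlap of their member articles; categories above a threshold would be
# searched next (horizontal bootstrapping) and their articles collected
# as domain terms.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical category -> member-article mappings.
categories = {
    "Wireless_sensor_network": {"ZigBee", "Mote", "6LoWPAN"},
    "Internet_of_things": {"ZigBee", "6LoWPAN", "MQTT"},
    "French_cuisine": {"Baguette", "Croissant"},
}

seed = "Wireless_sensor_network"
for name, articles in categories.items():
    if name != seed:
        print(name, jaccard(categories[seed], articles))
# -> Internet_of_things 0.5, French_cuisine 0.0
```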

    Context Representation and Fusion: Advancements and Opportunities

    The acceptance and usability of context-aware systems have led to their wide use in various domains and have attracted the attention of researchers in the area of context-aware computing. Making user context information available to such systems is the center of attention. However, very little emphasis has been given to the processes of context representation and context fusion, which are integral parts of context-aware systems. Context representation and fusion facilitate the recognition of dependencies and relationships among data sources in order to extract a better understanding of user context. The problem is more critical when data emerge from heterogeneous sources of diverse nature, such as sensors, user profiles, and social interactions, and at different timestamps. Both processes are followed in one way or another; however, they are not discussed explicitly in the realization of context-aware systems. In other words, most context-aware systems underestimate the importance of context representation and fusion. This research focuses explicitly on both processes and streamlines their place in the overall architecture of context-aware system design and development. Various applications of context representation and fusion in context-aware systems are also highlighted, and a detailed review of both processes and their applications is provided. Future research directions (challenges) that need proper attention to achieve the goal of realizing context-aware systems are also highlighted.