35 research outputs found

    Semantical-coordinate Terms Detection from Hierarchical Knowledge Using Web Snippets

    Abstract: In this paper, we describe a method to detect semantical-coordinate terms. Semantical-coordinate terms are often used as objects of comparative validation and as examples of a given term, and they enrich linguistic expression and knowledge processing. Semantical-coordinate terms should be hyponyms of the same hypernym, and their usage and concepts should be similar. Although hierarchical knowledge is useful for viewing concepts, hierarchically coordinate terms are sometimes inappropriate as semantical-coordinate terms under our assumptions, because some of them may have multiple hypernyms and general perceptions of the terms are not taken into consideration. On the other hand, when only Web context is used to detect semantical-coordinate terms, public perceptions and usage of the terms can be incorporated, but their concepts are not taken into consideration. Therefore, we propose a hybrid method that uses both hierarchical knowledge and Web snippets. We conducted bench-scale tests to detect semantical-coordinate terms for several terms and discuss the results in this paper. Through these tests and discussions, we confirmed that the semantical-coordinate terms detected by our proposed method were appropriate not only hierarchically but also semantically and intuitively.
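    The abstract does not spell out how the two knowledge sources are combined. A minimal sketch, assuming a hypothetical taxonomy given as a dict from each term to its set of hypernyms and a plain list of snippet strings: hierarchical candidates are the terms sharing at least one hypernym with the query term, and a candidate is kept only if it co-occurs with the query term in at least `min_count` snippets. The function names, the co-occurrence filter and the threshold are illustrative assumptions, not the authors' method.

```python
def hierarchical_coordinates(term, hypernyms):
    """Terms that share at least one hypernym with `term`.

    `hypernyms` maps each term to a set of its hypernyms (hypothetical format).
    """
    parents = hypernyms.get(term, set())
    return {other for other, ps in hypernyms.items()
            if other != term and ps & parents}


def coordinate_terms(term, hypernyms, snippets, min_count=1):
    """Hierarchical candidates filtered and ranked by snippet co-occurrence.

    A candidate is kept only if it appears together with `term` in at least
    `min_count` snippets (an illustrative threshold). Results are sorted by
    descending co-occurrence count, then alphabetically.
    """
    candidates = hierarchical_coordinates(term, hypernyms)
    counts = {
        c: sum(1 for s in snippets if term in s and c in s)
        for c in candidates
    }
    kept = [c for c in candidates if counts[c] >= min_count]
    return sorted(kept, key=lambda c: (-counts[c], c))
```

    Substring matching is deliberately naive here (e.g. "apple" also matches inside "pineapple"); a practical system would tokenize the snippets first. The snippet filter is what lets a term with multiple hypernyms, such as one that is both a fruit and a company, be separated from siblings that are rarely mentioned alongside it.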

    Can You Fool AI by Doing a 180? – A Case Study on Authorship Analysis of Texts by Arata Osada

    This paper attempts to answer a twofold question covering the areas of ethics and authorship analysis. First, since authorship analysis methods assume that an author can be recognized by the content he or she creates, we wanted to find out whether an author identification system could correctly attribute works to an author who had undergone a major psychological transition over the course of years. Second, from the point of view of the evolution of an author's ethical values, we examined what it would mean if the authorship attribution system had difficulty detecting single authorship. We set out to answer these questions by performing a binary authorship analysis task using a text classifier based on a pre-trained transformer model and a baseline method relying on conventional similarity metrics. For the test set, we chose works by Arata Osada, a Japanese educator and specialist in the history of education: half of them are books written before World War II and the other half were written in the 1950s, and between these two periods he underwent a transformation in his political opinions. We confirmed that for texts authored by Arata Osada over a time span of more than 10 years, classification accuracy drops by a large margin and is substantially lower than for texts by other non-fiction writers, while the confidence scores of the predictions remain at a level similar to that for a shorter time span. This indicates that the classifier was in many instances tricked into deciding that texts written over a span of multiple years were actually written by two different people, which in turn leads us to believe that such a change can affect authorship analysis, and that historical events have a great impact on a person's ethical outlook as expressed in their writings.
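    The abstract mentions a baseline relying on conventional similarity metrics without naming them. As a hedged illustration, assuming cosine similarity over character trigram counts and a hypothetical decision threshold of 0.5, a binary same-author decision could look like the following; the metrics and threshold used in the paper may differ.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (whitespace is kept)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def same_author(text_a, text_b, threshold=0.5, n=3):
    """Hypothetical binary decision: same author if similarity >= threshold."""
    return cosine(char_ngrams(text_a, n), char_ngrams(text_b, n)) >= threshold
```

    Character n-grams are a convenient choice for Japanese text because they require no word segmentation.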

    ANALYSING ROLES OF POSITION IN CURLING BASED ON SHOT-SCORES

    This paper reports on an analysis of the characteristics of each position in curling based on shot-scores. We computed average shot-scores for each position from data on 26 curling games and analyzed the correlation between the differences in average shot-scores and the differences in final game scores. The results show strong correlations for the lead and the fourth and weak correlations for the second and the third. This indicates that the roles of the lead and the fourth relate directly to the game score, whereas the roles of the second and the third relate to the progress of the game. We were able to confirm the relationship between the score and each position of the target team.
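    The analysis above reduces to one correlation per position between per-game differences. A sketch, assuming Pearson's correlation coefficient and a hypothetical per-game record holding each position's average shot-score difference (target team minus opponent) and the final score difference; the record layout is an assumption, and no data from the paper is reproduced.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


POSITIONS = ("lead", "second", "third", "fourth")


def position_correlations(games, positions=POSITIONS):
    """Correlate each position's shot-score difference with the score difference.

    Each game is a dict with a hypothetical layout:
      {"shot_diff": {position: float, ...}, "score_diff": int}
    where both differences are target team minus opponent.
    """
    score_diffs = [g["score_diff"] for g in games]
    return {
        p: pearson([g["shot_diff"][p] for g in games], score_diffs)
        for p in positions
    }
```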

    Modeling Learning Motivation of Students Based on Analysis of Class Evaluation Questionnaire

    In this paper, we present our research on modeling students' learning motivation by analyzing class evaluation questionnaires, which students answer at the end of each term so that courses can be improved in future semesters. We first defined three elements influencing learning motivation: (1) interest, (2) usefulness in the future and (3) satisfaction. We designed an original questionnaire asking about these three elements and conducted it in multiple classes across different school years. Next, we conducted an experiment to classify students' motivation for learning using the answers provided. The results of the experiment showed that students' learning motivation can be estimated using the three elements defined in this study.

    The Bayesian Optimal Algorithm for Query Refinement in Information Retrieval

    Summary: To make information retrieval more efficient, it is critical to improve the user's original query, because novice users cannot be expected to formulate precise and effective queries. Queries can often be improved by adding extra terms that appear in relevant documents but were not included in the original query. This is called query expansion. Query refinement, a variant of query expansion, interactively recommends new terms related to the original query. Because previous research did not offer any criterion that guarantees optimality, this paper proposes an optimal algorithm for query refinement based on the Bayes criterion.
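    The summary does not state the Bayes-optimal criterion itself, so this sketch illustrates only the query-expansion step it builds on: collecting terms that appear in relevant documents but not in the original query. Ranking candidates by the number of relevant documents that contain them is a simple stand-in heuristic chosen here for illustration, not the paper's optimal algorithm; lowercase whitespace tokenization is also an assumption.

```python
from collections import Counter


def candidate_terms(query, relevant_docs, k=5):
    """Terms from relevant documents that are missing from the query.

    Candidates are ranked by document frequency within `relevant_docs`
    (descending), with ties broken alphabetically; the top `k` are returned.
    """
    query_terms = set(query.lower().split())
    doc_freq = Counter()
    for doc in relevant_docs:
        # Count each term at most once per document.
        doc_freq.update(set(doc.lower().split()) - query_terms)
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

    In an interactive refinement setting, a list like this would be shown to the user rather than appended to the query automatically.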

    Improving Polish to English Neural Machine Translation with Transfer Learning: Effects of Data Volume and Language Similarity

    This paper investigates the impact of data volume and the use of similar languages on transfer learning in a machine translation task. We find that more data generally leads to better performance, as it allows the model to learn more patterns and generalizations. However, related languages can also be particularly effective when only limited data is available for a specific language pair, as the model can leverage the similarities between the languages to improve performance. To demonstrate this, we fine-tune the mBART model for a Polish-English translation task using the OPUS-100 dataset. We evaluate the model under various transfer learning configurations, including different transfer source languages and different shot levels for Polish, and report the results. Our experiments show that combining related languages with larger amounts of data outperforms models trained on related languages or on larger amounts of data alone. Additionally, we show the importance of related languages in zero-shot and few-shot configurations.

    Statistical Analysis of Automatic Seed Word Acquisition to Improve Harmful Expression Extraction in Cyberbullying Detection

    We study the social problem of cyberbullying, defined as a new form of bullying that takes place in Internet space. This paper proposes a method for automatic acquisition of seed words to improve the performance of the original cyberbullying detection method by Nitta et al. [1]. We repeated their experiment under exactly the same settings and found that the method, which is based on a Web mining technique, has lost over 30 percentage points of performance since it was proposed in 2013. We therefore hypothesize about the reasons for this decrease and propose a number of improvements, from which we experimentally choose the best one. Furthermore, we collect several seed word sets using different approaches and evaluate their precision. We found that the influential factor in the extraction of harmful expressions is not the number of seed words, but the way the seed words were collected and filtered.
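    Comparing seed word sets by precision, as described above, can be sketched as follows, assuming each seed set yields a list of extracted expressions and a hypothetical gold-standard set of harmful expressions is available. Precision here is the share of distinct extracted expressions that appear in the gold set; the data layout and function names are illustrative assumptions.

```python
def precision(extracted, gold):
    """Fraction of distinct extracted expressions found in the gold set."""
    extracted = set(extracted)
    if not extracted:
        return 0.0
    return len(extracted & set(gold)) / len(extracted)


def compare_seed_sets(results, gold):
    """Rank seed sets by the precision of the expressions each one extracted.

    `results` maps a seed-set name to the list of expressions extracted with
    it (hypothetical layout). Returns (name, precision) pairs, best first,
    with ties broken by name.
    """
    scored = [(name, precision(exprs, gold)) for name, exprs in results.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

    Because precision ignores how many seed words each set contains, this kind of comparison fits the finding that the collection and filtering method matters more than seed set size.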

    Brute-Force Sentence Pattern Extortion from Harmful Messages for Cyberbullying Detection

    Cyberbullying, or humiliating people using the Internet, has existed almost since the beginning of Internet communication. The relatively recent introduction of smartphones and tablet computers has caused cyberbullying to evolve into a serious social problem. In Japan, members of a parent-teacher association (PTA) attempted to address the problem by scanning the Internet for cyberbullying entries. To help these PTA members and other interested parties confront this difficult task, we propose a novel method for automatic detection of malicious Internet content. This method is based on a combinatorial approach resembling brute-force search algorithms, but applied to language classification. The method extracts sophisticated patterns from sentences and uses them in classification. Experiments performed on actual cyberbullying data reveal an advantage of our method vis-à-vis previous methods. Next, we implemented the method in an application for Android smartphones to automatically detect possibly harmful content in messages. The method performed well in the Android environment, but it still needs to be optimized for time efficiency before it can be used in practice.
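    The abstract describes the pattern extraction only as a combinatorial, brute-force approach. A minimal sketch, assuming that (1) patterns are all ordered combinations of up to `max_len` tokens from a tokenized sentence, with `*` marking a gap between tokens that are not adjacent, (2) each pattern is weighted by (harmful - neutral) / (harmful + neutral) occurrence counts, giving a value in [-1, 1], and (3) a sentence is flagged when the summed weights of its patterns are positive. These choices are illustrative assumptions; the published method may weight and threshold patterns differently.

```python
from collections import Counter
from itertools import combinations


def sentence_patterns(tokens, max_len=3):
    """All ordered token combinations of length 1..max_len.

    A '*' is inserted wherever two chosen tokens are not adjacent in the
    original sentence, e.g. ["you", "are", "stupid"] -> "you * stupid".
    """
    patterns = set()
    for size in range(1, min(max_len, len(tokens)) + 1):
        for idx in combinations(range(len(tokens)), size):
            parts = [tokens[idx[0]]]
            for prev, cur in zip(idx, idx[1:]):
                if cur - prev > 1:
                    parts.append("*")
                parts.append(tokens[cur])
            patterns.add(" ".join(parts))
    return patterns


def pattern_weights(harmful, neutral, max_len=3):
    """Weight each pattern: +1 if seen only in harmful, -1 if only in neutral."""
    pos, neg = Counter(), Counter()
    for tokens in harmful:
        pos.update(sentence_patterns(tokens, max_len))
    for tokens in neutral:
        neg.update(sentence_patterns(tokens, max_len))
    return {p: (pos[p] - neg[p]) / (pos[p] + neg[p])
            for p in pos.keys() | neg.keys()}


def classify(tokens, weights, max_len=3):
    """Flag a tokenized sentence as harmful if its pattern weights sum above 0."""
    score = sum(weights.get(p, 0.0) for p in sentence_patterns(tokens, max_len))
    return score > 0
```

    The number of combinations grows rapidly with sentence length and `max_len`, which is consistent with the time-efficiency concern raised for the Android implementation.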

    WWW-based Figurative Descriptions for Japanese Word
