Predicting judging-perceiving of Myers-Briggs Type Indicator (MBTI) in online social forum
The Myers-Briggs Type Indicator (MBTI) is a well-known personality test that assigns a personality type to a user by using four trait dichotomies. For many years, people have used the MBTI as an instrument to develop self-awareness and to guide their personal decisions. Previous research has had good success in predicting the Extraversion-Introversion (E/I), Sensing-Intuition (S/N) and Thinking-Feeling (T/F) dichotomies from textual data but has struggled with the Judging-Perceiving (J/P) dichotomy. The J/P dichotomy is an inseparable part of the MBTI that has a significant influence on human behavior, perception and decisions towards their surroundings. It assesses how someone interacts with the world when making decisions. This research set out to evaluate the performance of individual features and classifiers for the J/P dichotomy in personality computing. In the end, data leakage was found in the dataset originating from the Personality Forum Café, which was used in recent research. The results obtained in previous research on this dataset are therefore suggested to be overly optimistic. Using the same settings, this research managed to outperform previous research. Five machine learning algorithms were compared, and the LightGBM model is recommended for the task of predicting the J/P dichotomy in MBTI personality computing.
Automatic lexicon generator
Over the past decades, the computer revolution has opened up many possibilities for new fields of investigation. Greater accessibility of information and the falling cost of powerful computers have spawned new efforts towards understanding complex tasks. The lexicon in particular has long been recognized as interesting and challenging because of its complexity. It is the knowledge of individual words in the language, and it has been perceived as a central component of all types of natural language processing systems. In this paper we present an algorithm for an automatic lexicon generator that generates a lexicon from an input document by making use of the Apple Pie Parser. Generating the lexicon automatically drastically reduced the time and manpower required. Psycholinguists as well as computational linguists can benefit from this automatic lexicon construction.
Identifying the influential spreaders in multilayer interactions of online social networks
Online social networks (OSNs) comprise multiple layers of interaction through which users become friends, information is propagated, ideas are shared, and interaction is constructed within an OSN. Identifying the most influential spreaders in a network is a significant step towards improving the use of existing resources, either to speed up the spread of information for applications such as viral marketing or to hinder it for applications such as virus blocking and rumor restraint. User communications facilitated by OSNs can overcome the temporal and spatial limitations of traditional communications in an exceptional way, thereby presenting new layers of social interaction, which coincide and collaborate with current interaction layers to redefine the multiplex OSN. In this paper, the effects of different topological network structures on the identification of influential spreaders are investigated. The analysis of the results concluded that the accuracy of influential spreader identification in OSNs can be improved not only by improving identification algorithms but also by developing a network topology that represents the information diffusion well. Moreover, this paper proposes a topological representation for an OSN that takes into account both multilayer interactions and overlaying links as weights. The measurement results are found to be more reliable when the identification algorithms are applied to the proposed topological representation than when they are applied to single-layer representations.
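The idea of folding several interaction layers into one weighted topology can be illustrated with a minimal Python sketch. It is not the representation or the identification algorithm from the paper: the layer names, the sample users, the rule "weight = number of layers in which a link appears", and the use of weighted degree as a stand-in ranking measure are all assumptions made for illustration.

```python
from collections import defaultdict


def collapse_layers(layers):
    """Merge per-layer undirected edge lists into one weighted graph.

    Assumption: the weight of a link is the number of layers in which
    that link appears (overlapping links reinforce each other).
    """
    weights = defaultdict(int)
    for edges in layers.values():
        # Deduplicate within a layer and ignore self-loops.
        pairs = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
        for pair in pairs:
            weights[pair] += 1
    return dict(weights)


def weighted_degree(weights):
    """Sum the weights of all links incident to each node."""
    strength = defaultdict(int)
    for pair, weight in weights.items():
        for node in pair:
            strength[node] += weight
    return dict(strength)


# Hypothetical three-layer OSN: who is friends with, retweets, or mentions whom.
layers = {
    "friendship": [("ann", "bob"), ("bob", "cat")],
    "retweet": [("ann", "bob"), ("ann", "cat")],
    "mention": [("bob", "ann")],
}

weights = collapse_layers(layers)
strength = weighted_degree(weights)
# Rank candidates by weighted degree, breaking ties alphabetically.
ranking = sorted(strength, key=lambda node: (-strength[node], node))
print(ranking)
```

Any centrality measure that accepts edge weights could replace `weighted_degree` here; the point of the sketch is only that the ranking is computed on the combined weighted graph instead of on a single layer.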
Retrieving answers from multiple documents using semantic skolem indexing
The representation of document content is a very important factor in the retrieval process. Failure to create a good knowledge representation will lead to failure in retrieval, no matter how good the retrieval engine is. Therefore, this research focused on creating a reliable knowledge representation for our retrieval engine. We use skolem clauses to capture the information conveyed by multiple text documents and use them as an index language. This research also focuses on utilizing the skolem index as the knowledge representation in a question answering system. The system is capable of retrieving the answer as well as stating the exact document from which the answer is derived.
Skolem preprocessing using WordNet and lexicon in building effective knowledge representation
We live in an information-intensive environment in which various forms of digital content have been growing exponentially. In this era of digital data, knowledge representation is considered a crucial component of any information retrieval system. It is also considered a major problem, especially when it comes to representing the content of unstructured text effectively. Although 100% accuracy remains impossible to achieve, many researchers are working on documenting these data with many different techniques so that they can be communicated effectively and easily. Indexing is an important element that determines the success of retrieval. Since we are dealing with multiple documents, the data must be preprocessed before it is indexed. Thus, this paper presents an approach to this preprocessing. The semantic data, which are represented as skolem clauses, are preprocessed with the help of the automatic lexicon generator output and WordNet. This preprocessing plays an important role in getting rid of redundant data before it is indexed into the semantic matrix. Besides redundancy, it also helps in dealing with a common problem in indexing multiple documents: similar sentences that have more or less the same meaning but are constructed from different sets of words. In conclusion, the integration of WordNet and the lexicon leads to better results in terms of building an effective knowledge representation.
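The redundancy problem described above, sentences with the same meaning built from different words, can be sketched in Python as follows. This is only an illustration under stated assumptions: it works on plain sentences instead of skolem clauses, the hand-written `SYNONYMS` table is a hypothetical stand-in for WordNet and the lexicon generator output, and comparing unordered word sets is a deliberate simplification that ignores word order.

```python
def canonical_form(sentence, synonym_to_head, stopwords):
    """Reduce a sentence to a set of canonical content words.

    Each word is lower-cased, stripped of punctuation, dropped if it is a
    stopword, and replaced by its synonym-group head word when one is known.
    """
    words = (token.strip(".,;:!?") for token in sentence.lower().split())
    return frozenset(
        synonym_to_head.get(word, word)
        for word in words
        if word and word not in stopwords
    )


def drop_redundant(sentences, synonym_to_head, stopwords):
    """Keep the first sentence of every group that shares a canonical form."""
    seen = set()
    kept = []
    for sentence in sentences:
        key = canonical_form(sentence, synonym_to_head, stopwords)
        if key not in seen:
            seen.add(key)
            kept.append(sentence)
    return kept


# Hypothetical synonym table; a real system would derive it from WordNet.
SYNONYMS = {"purchased": "buy", "bought": "buy", "automobile": "car"}
STOPWORDS = {"a", "an", "the"}

sentences = [
    "John bought a car.",
    "John purchased an automobile.",
    "Mary sold the car.",
]
kept = drop_redundant(sentences, SYNONYMS, STOPWORDS)
print(kept)
```

In this toy example the second sentence maps to the same canonical word set as the first, so only the first and third sentences would go on to be indexed.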
Mining Facebook in Identifying Software Engineering Students' Personality and Job Matching
Getting a job that suits their capabilities is the dream of every job seeker. But in real life, job seekers, especially fresh graduates, may end up choosing the wrong career path because they are unaware of their own strengths and weaknesses and lack proper guidance. When this happens, they tend to perform poorly in the job market. Understanding a person's personality helps in placing them in the right job and organization. Our research focuses on how Facebook can be used as a platform to judge the personality of a student and how this helps in matching the student to the right job. The scope of this research is limited to software engineering students. A system was developed that is expected to help these students become aware of their own personality based on their user-generated data from the Facebook wall. The Big Five Personality Model is used to gauge the personality of each individual. In addition, the system suggests the software engineering jobs that best fit the students based on their gauged personality. This should help them avoid taking up the wrong career.
Cyberbullying Detection in Social Networks: A Comparison Between Machine Learning and Transfer Learning Approaches
Information and Communication Technologies have fueled social networking and facilitated communication. However, cyberbullying on these platforms has detrimental ramifications. User-dependent mechanisms such as reporting, blocking, and removing bullying posts online are manual and ineffective. Bag-of-words text representation without metadata has limited the classification of cyberbullying post text. This research developed an automatic system for cyberbullying detection with two approaches: Conventional Machine Learning and Transfer Learning. This research adopted the AMiCA data, which encompass a significant amount of cyberbullying context and a structured annotation process. Textual, sentiment and emotional, static and contextual word embedding, psycholinguistic, term list, and toxicity features were used in the Conventional Machine Learning approach. This study was the first to use toxicity features to detect cyberbullying. It is also the first to use the latest psycholinguistic features from the Linguistic Inquiry and Word Count (LIWC) 2022 tool, as well as Empath’s lexicon, to detect cyberbullying. The contextual embeddings of ggeluBert, tnBert, and DistilBert performed alike; however, the DistilBert embeddings were selected for their higher F-measure. Textual features, DistilBert embeddings, and toxicity features, which set a new benchmark, were the top three unique features when fed individually. The model’s performance was boosted to an F-measure of 64.8% after feeding a combination of textual, sentiment, DistilBert embedding, psycholinguistic, and toxicity features to the Logistic Regression model, which outperformed Linear SVC with faster training time and efficient handling of high-dimensional features. The Transfer Learning approach fine-tuned optimized versions of Pre-trained Language Models, namely DistilBert, DistilRoBerta, and Electra-small, which were found to train faster than their base forms. The fine-tuned DistilBert achieved the highest F-measure of 72.42%, surpassing Conventional Machine Learning (CML). Our research concluded that Transfer Learning was the best approach, offering higher performance with less effort because feature engineering and resampling were omitted.
Job recommendation using Facebook personality scores
Facebook is one of the most popular social media sites and has become part of our lives. User-generated Facebook data are useful and can be used to gauge personality. However, previous studies did not use Facebook data for personality assessment and mapping for professional purposes. The current study mainly aims to identify personality features using user-generated content on Facebook. A computational score is created, and a model is developed that uses these scores to make job recommendations that match the personality of the user. The Facebook personality score is benchmarked against the Big Five Inventory (BFI) test score to determine accuracy. The accuracy of the Facebook personality scores against the BFI test scores reached 93.1%. The findings of this study benefit job candidates, especially fresh graduates, by assisting them in identifying a career that suits their personality. This study also helps create awareness among individuals by identifying their personality strengths and weaknesses through the use of Facebook information. This study can help employers find candidates who fit the needs of the company by gauging their personality through Facebook data.
Identification of significant features and machine learning technique in predicting helpful reviews
Consumers nowadays rely heavily on online reviews in making their purchase decisions. However, they are often overwhelmed by the sheer number of product reviews generated on online platforms. It is therefore essential to identify the helpful reviews, as this significantly reduces the number of reviews that each consumer has to consider. A review is regarded as helpful if it contains significant information that helps the reader make a purchase decision. Many reviews posted online lack sufficient information for the decision-making process. Past research has neglected much useful information that can be utilized in predicting helpful reviews. This research identifies significant information, represented as features categorized as linguistic, metadata, readability, subjectivity, and polarity, that contributes to predicting helpful online reviews. Five machine learning models were compared on two Amazon open datasets consisting of 9,882,619 and 65,222 user reviews, respectively. With the significant features, the Random Forest technique outperformed the other techniques used by previous researchers, achieving an accuracy of 89.36%.