252 research outputs found

    Researching the Research: Applying Machine Learning Techniques to Dissertation Classification

    This research examines industry-based dissertation research in a doctoral computing program through the lens of machine learning algorithms to determine whether natural language processing-based categorization of abstracts alone is adequate for classification. It categorizes dissertations by both their abstracts and their full text using the GraphLab Create library from Apple’s Turi to determine whether abstract analysis is an adequate measure of content categorization, which we found it was not. We also compare the dissertation categorizations using IBM’s Watson Discovery deep machine learning tool. Our research provides perspectives on the practicality of the manual classification of technical documents, and it provides insights into (1) the categories of academic work created by experienced full-time working professionals in a computing doctoral program, (2) the viability and performance of automated categorization based on abstract analysis compared with full-text dissertation analysis, and (3) natural language processing versus human manual text classification abstraction.

    Bridging the demand and the offer in data science

    During the last several years, we have observed an exponential increase in the demand for Data Scientists in the job market. As a result, a number of trainings, courses, books, and university educational programs (at undergraduate, graduate, and postgraduate levels) have been labeled as “Big data” or “Data Science”; the fil rouge of each of them is the aim of training people with the right competencies and skills to satisfy the needs of the business sector. In this paper, we report on some of the exercises done in analyzing the current Data Science education offer and matching it with the needs of the job market to propose a scalable matching service, i.e., COmpetencies ClassificatiOn (E‐CO‐2), based on Data Science techniques. The E‐CO‐2 service can help to extract relevant information from Data Science–related documents (course descriptions, job ads, blogs, or papers), which enables the comparison of demand and offer in the field of Data Science education and HR management, ultimately helping to establish the profession of Data Scientist.

    Improving Text Classification with Semantic Information

    The Air Force contracts a variety of positions, from Information Technology to maintenance services. There is currently no automated way to verify that quotes for services are reasonably priced. Small training data sets and word sense ambiguity are challenges that such a tool would encounter, and additional semantic information could help. This thesis hypothesizes that leveraging a semantic network could improve text-based classification. This thesis uses information from ConceptNet to augment a Naive Bayes Classifier. The leveraged semantic information would add relevant words from the category domain to the model that did not appear in the training data. The experiment compares variations of a Naive Bayes Classifier leveraging semantic information, including an Ensemble Model, against classifiers that do not. Results show a significant performance increase in a smaller data set but not a larger one. Out of all models tested, an Ensemble Based Classifier performs the best on both data sets. The results show that ConceptNet does not add enough new or relevant information to affect classifier performance on large data sets

    Context based multimedia information retrieval

    Schema Matching for Large-Scale Data Based on Ontology Clustering Method

    Holistic schema matching is the process of identifying semantic correspondences among multiple schemas at once. The key challenge in holistic schema matching lies in selecting a method that maintains both effectiveness and efficiency. Effectiveness refers to the quality of matching, while efficiency refers to the time and memory consumed by the matching process. Several approaches have been proposed for holistic schema matching, and they have mainly depended on clustering techniques. Clustering aims to group similar fields within the schemas into multiple groups or clusters. However, fields in schemas contain complicated semantic relations that vary with schema level. An ontology, which is a hierarchy of taxonomies, has the ability to identify semantic correspondences at various levels. Hence, this study proposes an ontology-based clustering approach for holistic schema matching. Two datasets have been used from ICQ query interfaces consisting of 40 interfaces, which refer to Airfare and Job. The ontology used in this study has been built using XBenchMatch, a benchmark lexicon that contains rich semantic correspondences for the field of schema matching. To accommodate schema matching using the ontology, a rule-based clustering approach is used with multiple distance measures including Dice, Cosine and Jaccard. The evaluation has been conducted using the common information retrieval metrics: precision, recall and f-measure. To assess the performance of the proposed ontology-based clustering, two experiments have been compared. The first experiment applies the ontology-based clustering approach (i.e. using ontology and rule-based clustering), while the second experiment applies the traditional clustering approaches without the use of ontology.
Results show that the proposed ontology-based clustering approach outperformed the traditional clustering approaches without ontology, achieving an f-measure of 94% for the Airfare dataset and 92% for the Job dataset. This emphasizes the strength of ontology in identifying correspondences with semantic level variation.
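The Dice, Cosine and Jaccard measures named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: field labels are compared as lowercase whitespace-separated tokens, and the function names and example labels are hypothetical.

```python
import math
from collections import Counter


def tokens(label):
    """Split a field label into lowercase tokens (an assumed, simple tokenizer)."""
    return label.lower().split()


def dice(a, b):
    """Dice coefficient: 2 * |A & B| / (|A| + |B|) over token sets."""
    sa, sb = set(tokens(a)), set(tokens(b))
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 1.0


def jaccard(a, b):
    """Jaccard coefficient: |A & B| / |A | B| over token sets."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def cosine(a, b):
    """Cosine similarity between token-count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical field labels from two query interfaces.
left, right = "departure city", "arrival city"
print(dice(left, right), jaccard(left, right), cosine(left, right))
```

Each function returns a similarity in the range 0 to 1; a distance, where one is needed for clustering, is commonly taken as one minus the similarity.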

    Automated illustration of multimedia stories

    Submitted in partial fulfillment of the requirements for the degree of Master in Computer Science. We have all had the problem of forgetting what we read just a few sentences before. This comes from the problem of attention and is more common among children and the elderly. People feel either bored or distracted by something more interesting. The challenge is how multimedia systems can assist users in reading and remembering stories. One solution is to use pictures to illustrate stories as a means to captivate one's interest, as a picture either tells a story or makes the viewer imagine one. This thesis researches the problem of automated story illustration as a method to increase the readers’ interest and attention. We formulate the hypothesis that an automated multimedia system can help users in reading a story by stimulating their reading memory with adequate visual illustrations. We propose a framework that tells a story and attempts to capture the readers’ attention by providing illustrations that spark the readers’ imagination. The framework automatically creates a multimedia presentation of the news story by (1) rendering news text in a sentence-by-sentence fashion, (2) providing mechanisms to select the best illustration for each sentence and (3) selecting the set of illustrations that guarantees the best sequence. These mechanisms are rooted in image and text retrieval techniques. To further improve users’ attention, users may also activate a text-to-speech functionality according to their preference or reading difficulties. First experiments show how Flickr images can illustrate BBC news articles and provide a better experience to news readers. On top of the illustration methods, a user feedback feature was implemented to refine the selection of illustrations. With this feature, users can help the framework select more accurate results.
Finally, empirical evaluations were performed to test the user interface, the image/sentence association algorithms and the users’ feedback functionalities. The respective results are discussed.

    Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

    Peer reviewed

    Fairness and Bias in Algorithmic Hiring

    Employers are adopting algorithmic hiring technology throughout the recruitment pipeline. Algorithmic fairness is especially applicable in this domain due to its high stakes and structural inequalities. Unfortunately, most work in this space provides partial treatment, often constrained by two competing narratives, optimistically focused on replacing biased recruiter decisions or pessimistically pointing to the automation of discrimination. Whether, and more importantly what types of, algorithmic hiring can be less biased and more beneficial to society than low-tech alternatives currently remains unanswered, to the detriment of trustworthiness. This multidisciplinary survey caters to practitioners and researchers with a balanced and integrated coverage of systems, biases, measures, mitigation strategies, datasets, and legal aspects of algorithmic hiring and fairness. Our work supports a contextualized understanding and governance of this technology by highlighting current opportunities and limitations, providing recommendations for future work to ensure shared benefits for all stakeholders

    ResuMatcher: A Personalized Resume-Job Matching System

    Today, online recruiting web sites such as Monster and Indeed.com have become one of the main channels for people to find jobs. These web platforms have provided their services for more than ten years, and have saved a lot of time and money for both job seekers and organizations who want to hire people. However, traditional information retrieval techniques may not be appropriate for users. The reason is that the number of results returned to a job seeker may be huge, so job seekers are required to spend a significant amount of time reading and reviewing their options. One popular approach to resolving this difficulty is recommender systems, a technology that has been studied for a long time. In this thesis we propose a personalized job-résumé matching system, which could help job seekers find appropriate jobs more easily. We create a finite state transducer based information extraction library to extract models from résumés and job descriptions. We devise a new statistical-based ontology similarity measure to compare the résumé models and the job models. Since the most appropriate jobs are returned first, the users of the system may get better results than from current job finding web sites. To evaluate the system, we computed the Normalized Discounted Cumulative Gain (NDCG) and precision@k of our system, and compared it to three other existing models as well as the live results from Indeed.com.
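The two ranking metrics named in this abstract, NDCG and precision@k, can be sketched as follows. This is a minimal illustration, not the thesis's evaluation code: it assumes graded relevance scores listed in ranked order, the linear-gain form of DCG with a log2 discount (other variants exist), and hypothetical function names and example scores.

```python
import math


def precision_at_k(relevances, k):
    """Fraction of the top-k results with relevance > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in relevances[:k] if r > 0) / k


def dcg_at_k(relevances, k):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), ranks starting at 1."""
    return sum(r / math.log2(rank + 2) for rank, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance judgments for one job seeker's ranked results.
ranked = [3, 0, 2, 1]
print(precision_at_k(ranked, 3), ndcg_at_k(ranked, 3))
```

NDCG rewards placing the most relevant jobs first, while precision@k only counts how many of the top k results are relevant at all, which is why the two are often reported together.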