1,655 research outputs found

    Deep Learning for Period Classification of Historical Texts

    In this study, we address the interesting task of classifying historical texts by their assumed period of writing. This task is useful in digital humanities studies, where many texts have unidentified publication dates. For years, the typical approach to temporal text classification was supervised machine learning. These algorithms require careful feature engineering and considerable domain expertise to design a feature extractor that transforms the raw text into a feature vector from which the classifier can learn to classify unseen inputs. Recently, deep learning has produced highly promising results for various tasks in natural language processing (NLP). The primary advantage of deep learning is that the feature layers are not designed by human engineers; instead, the features are learned from data using a general-purpose learning procedure. We investigated deep learning models for period classification of historical texts, comparing three common models: paragraph vectors, convolutional neural networks (CNN), and recurrent neural networks (RNN). We demonstrate that the CNN and RNN models outperformed both the paragraph vector model and conventional supervised machine-learning algorithms. In addition, we constructed word embeddings for each time period and analyzed semantic changes in word meanings over time.
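The abstract does not say how semantic change was measured across the period-specific embeddings. One common alignment-free measure, assumed here purely for illustration, compares a word's nearest neighbors in two separately trained spaces: the fewer neighbors they share, the more the word's usage has shifted. The 2-D vectors, the words, and the helper names (`unit`, `nearest_neighbors`, `neighbor_shift`) are hypothetical toy choices, not the study's embeddings or code.

```python
import math


def unit(degrees):
    """Toy 2-D unit vector at the given angle (stands in for a learned embedding)."""
    r = math.radians(degrees)
    return [math.cos(r), math.sin(r)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, emb, k):
    """The k words most cosine-similar to `word` within one embedding space."""
    scored = [(cosine(emb[word], vec), other) for other, vec in emb.items() if other != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return {other for _, other in scored[:k]}


def neighbor_shift(word, emb_a, emb_b, k=2):
    """1 - Jaccard overlap of `word`'s k nearest neighbors in two periods.

    Only the shared vocabulary is searched, so the two separately trained
    spaces never need to be rotated into alignment.
    0.0 = identical neighbors, 1.0 = completely different neighbors.
    """
    shared = emb_a.keys() & emb_b.keys()
    near_a = nearest_neighbors(word, {w: emb_a[w] for w in shared}, k)
    near_b = nearest_neighbors(word, {w: emb_b[w] for w in shared}, k)
    return 1.0 - len(near_a & near_b) / len(near_a | near_b)


# Hypothetical embeddings for two periods: "broadcast" moves from the
# farming cluster (angles near 0 degrees) to the media cluster (near 90).
period_1 = {"seed": unit(5), "sow": unit(8), "plough": unit(1), "broadcast": unit(20),
            "radio": unit(85), "television": unit(80), "newspaper": unit(88)}
period_2 = dict(period_1, broadcast=unit(83))

print(neighbor_shift("broadcast", period_1, period_2))  # 1.0
print(neighbor_shift("seed", period_1, period_2))       # 0.0
```

Comparing neighbor sets sidesteps the fact that embeddings trained on different periods live in arbitrarily rotated coordinate systems; the usual alternative is to align the spaces first (for example with an orthogonal Procrustes rotation) and then compare a word's vectors directly.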

    Papers in Southeast Asian Linguistics No. 9: Language policy, language planning and sociolinguistics in South-East Asia

    Religion, division of labor and conflict: anti-Semitism in Germany over 600 years

    We study the role of economic incentives in shaping the co-existence of Jews, Catholics and Protestants, using novel data from Germany for 1,000+ cities. The Catholic usury ban and higher literacy rates gave Jews a specific advantage in the moneylending sector. Following the Protestant Reformation (1517), the Jews lost these advantages in regions that became Protestant. We show 1) a change in the geography of anti-Semitism, with persecutions of Jews and anti-Jewish publications becoming more common in Protestant areas relative to Catholic areas; and 2) a more pronounced change in cities where Jews had already established themselves as moneylenders. These findings are consistent with the interpretation that, following the Protestant Reformation, Jews living in Protestant regions were exposed to competition with the Christian majority, especially in moneylending, leading to an increase in anti-Semitism.

    Traditional forces of exclusion: A review of the quantitative literature on the economic situation of indigenous peoples, Afro-descendants, and persons with disabilities

    (Available in English) The unequal distribution of wealth in Latin America and the Caribbean is linked to the unequal distribution of assets (human and physical) and to differentiated access to markets and services. These circumstances, and the corresponding social tensions, must be understood in terms of traditional forces of exclusion; the segments of the population that experience unfavorable outcomes can also be identified by characteristics such as ethnicity, race, gender, and physical disabilities. In addition to reviewing the literature on social exclusion, this paper reviews several topics: (i) relative deprivation (in land and housing, physical infrastructure, health, and income); (ii) labor market issues, including access to markets in general as well as informality, segregation, and discrimination; (iii) the points of transaction involving political representation, social protection, and violence; and (iv) areas in which the analysis is still weak and avenues for further research in the region.

    Sentiment analysis for hate speech detection on social media: TF-IDF weighted N-Grams based approach

    Thesis submitted in partial fulfillment of the requirements for the Degree of Master of Science in Information Technology (MSIT) at Strathmore University. Hate speech on social media has unfortunately become common in the Kenyan online community, largely due to advances in mobile computing and the internet. Incidents of hate speech on social media can spread quickly among online users and escalate into acts of violence and hate crimes through incitement, as happened during the 2007-2008 Post-Election Violence. With the highly contested 2017 general elections approaching, monitoring hate speech on social media platforms is critical in order to detect hate speech as early as possible and prevent further escalation that may result in violence. Current efforts by the National Cohesion and Integration Commission to monitor hate speech on social media use web crawlers to collect possible instances of hate speech based on specific keywords. Human monitors then have to analyze the collected data to determine which instances are actually hate speech. This manual analysis is not only time-consuming and overwhelming but also introduces subjective notions of what constitutes hate speech. This research proposed applying machine learning techniques to build a binary text classifier that detects hate speech on Twitter. Hate speech data was collected and labelled to build the corpora. A Support Vector Machine model was trained and validated on the labelled text data using unigram features and term frequency-inverse document frequency (TF-IDF) weighting. The research employed an experimental approach to determine which combination of features, weighting schemes, and classifiers gives the best performance on the collected hate speech data. Bigram features weighted using TF-IDF and fed into a Support Vector Machine classifier gave the best classification performance, with an accuracy of 76.22 percent and an area under the Receiver Operating Characteristic (ROC) curve (AUC) of 0.76.
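The thesis does not publish its pipeline, so the following is only a minimal standard-library sketch of TF-IDF-weighted bigram features fed to a linear SVM. It assumes lower-cased whitespace tokenization, bigram-only features, and the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 (scikit-learn's default), and it trains the SVM with a Pegasos-style sub-gradient method, which may differ from the solver actually used. The training sentences and helper names (`bigrams`, `fit_idf`, `tfidf`, `train_linear_svm`, `predict`) are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Lower-cased whitespace tokens joined into adjacent word pairs."""
    tokens = text.lower().split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def fit_idf(docs):
    """Smoothed IDF per bigram: ln((1 + n) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(bigrams(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalized sparse TF-IDF vector; bigrams unseen in training are dropped."""
    tf = Counter(b for b in bigrams(doc) if b in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {term: v / norm for term, v in vec.items()} if norm else {}


def dot(w, x):
    return sum(w.get(term, 0.0) * v for term, v in x.items())


def train_linear_svm(xs, ys, lam=0.01, epochs=20):
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1/-1. Examples are visited in a fixed order (not sampled
    at random) and there is no bias term, to keep the sketch short.
    """
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            violates_margin = y * dot(w, x) < 1
            # Regularization step: shrink every weight towards zero.
            w = {term: v * (1.0 - eta * lam) for term, v in w.items()}
            # Hinge-loss step: move towards the example only if it is inside the margin.
            if violates_margin:
                for term, v in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * v
    return w


def predict(w, idf, doc):
    return 1 if dot(w, tfidf(doc, idf)) > 0 else -1


# Hypothetical toy corpus: +1 = hate speech, -1 = not hate speech.
train_docs = [
    "drive them out of our town",
    "we will drive them out",
    "drive them out now",
    "wishing everyone a peaceful election day",
    "have a peaceful election day",
    "vote on election day",
]
labels = [1, 1, 1, -1, -1, -1]

idf = fit_idf(train_docs)
weights = train_linear_svm([tfidf(d, idf) for d in train_docs], labels)
print(predict(weights, idf, "they drive them out"))            # 1
print(predict(weights, idf, "enjoy a peaceful election day"))  # -1
```

In practice the same setup is usually assembled from scikit-learn's `TfidfVectorizer(ngram_range=(2, 2))` and `LinearSVC`; the hand-written version just makes each step visible. Accuracy and ROC AUC would then be measured on held-out tweets rather than on the training sentences.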