    Text based classification of companies in CrunchBase

    This paper introduces two fuzzy-fingerprint-based text classification techniques that were successfully applied to automatically label companies from CrunchBase, based purely on their unstructured textual description. This is a real and very challenging problem due to the large set of possible labels (more than 40) and also to the fact that the textual descriptions do not have to abide by any criteria and are, therefore, extremely heterogeneous. Fuzzy fingerprints are a recently introduced technique that can be used for performing fast classification. They perform well in the presence of unbalanced datasets and can cope with a very large number of classes. In the paper, a comparison is performed against some of the best text classification techniques commonly used to address similar problems. When applied to the CrunchBase dataset, the fuzzy-fingerprint-based approach outperformed the other techniques.

    How Agricultural Economists Increase the Value of Agribusiness Research

    Historically, there has been declining cooperation between agribusiness firms and agricultural economists. In new product marketing research, firms tend to conduct their own analyses, partially due to confidentiality, and these usually consist of simple univariate or bivariate statistics such as chi-squared tests of independence. The primary objective of this paper is to demonstrate, through a case study, one way in which agricultural economists can add value to agribusiness firms' research. Results from the econometric model offer a richer explanation of consumer behavior and may be more useful to agribusiness firms.

    Magnetoelectric effects in an organo-metallic quantum magnet

    We observe a bilinear magnetic field-induced electric polarization of 50 $\mu$C/m$^2$ in single crystals of NiCl$_2$-4SC(NH$_2$)$_2$ (DTN). DTN forms a tetragonal structure that breaks inversion symmetry, with the highly polar thiourea molecules all tilted in the same direction along the c-axis. Application of a magnetic field between 2 and 12 T induces canted antiferromagnetism of the Ni spins and the resulting magnetization closely tracks the electric polarization. We speculate that the Ni magnetic forces acting on the soft organic lattice can create significant distortions and modify the angles of the thiourea molecules, thereby creating a magnetoelectric effect. This is an example of how magnetoelectric effects can be constructed in organo-metallic single crystals by combining magnetic ions with electrically polar organic elements. Comment: 3 pages, 3 figures

    Detecting relevant tweets in very large tweet collections: the London Riots case study

    In this paper we address the problem of detecting relevant tweets in very large tweet collections that contain a large number of different trending topics. We use a large database of tweets collected during the 2011 London Riots as a case study to demonstrate the application of the proposed techniques. In order to extract relevant content, we extend, formalize and apply a recent technique, called Twitter Topic Fuzzy Fingerprints, which, in the scope of social media, outperforms other well-known text-based classification methods while being less computationally demanding, an essential feature when processing large volumes of streaming data. Using this technique we were able to detect 45% additional relevant tweets within the database.

    Creating classification models from textual descriptions of companies using crunchbase

    This paper compares different models for multilabel text classification, using information collected from Crunchbase, a large database that holds information about more than 600,000 companies. Each company is labeled with one or more categories, from a subset of 46 possible categories, and the proposed models predict the categories based solely on the company's textual description. A number of natural language processing strategies have been tested for feature extraction, including stemming, lemmatization, and part-of-speech tags. This is a highly unbalanced dataset, where the frequency of each category ranges from 0.7% to 28%. Our findings reveal that the description text of each company contains features that allow its area of activity, expressed by its corresponding categories, to be predicted with about 70% precision and 42% recall. In a second set of experiments, a multiclass problem that attempts to find the most probable category, we obtained about 67% accuracy using SVM and Fuzzy Fingerprints. The resulting models may constitute an important asset for automatic classification of texts, not only company descriptions but also other texts, such as web pages, text blogs, news pages, etc.

    Using geolocated tweets for characterization of Twitter in Portugal and the Portuguese administrative regions

    The information published by the millions of public social network users is an important source of knowledge that can be used in academic, socioeconomic or demographic studies (distribution of male and female population, age, marital status, birth), in lifestyle analysis (interests, hobbies, social habits), or to study online behavior (time spent online, interaction with friends, or discussion about brands, products or politics). This work uses a database of about 27 million Portuguese geolocated tweets, produced in Portugal by 97.8 K users during a 1-year period, to extract information about the behavior of the geolocated Portuguese Twitter community. It shows that this information makes it possible to extract overall indicators such as: the daily periods of increased activity per region; prediction of regions where the concentration of the population is higher or lower in certain periods of the year; how regional inhabitants feel about life; or what is talked about in each region. We also analyze the behavior of the geolocated Portuguese Twitter users based on the tweeted contents, and find indications that their behavior differs in certain relevant aspects from other Twitter communities, hypothesizing that this is in part due to the abnormally high percentage of young teenagers in the community. Finally, we present a small case study on Portuguese tourism in the Algarve region. To the best of our knowledge, this work is the first study that shows geolocated Portuguese users' behavior on Twitter focusing on geographic regional use.

    Improving Twitter gender classification using multiple classifiers

    User profile information is important for many studies, but essential information, such as gender and age, is not provided when creating a Twitter account. However, clues about the user profile, such as age and gender, behaviors, and preferences, can be extracted from other content provided by the user. The main focus of this paper is to infer the gender of the user from unstructured information, including the username, screen name, description and picture, or from the user-generated content. Our experiments use an English labelled dataset containing 6.5M tweets from 65K users, and a Portuguese labelled dataset containing 5.8M tweets from 58K users. We use supervised approaches, considering four groups of features extracted from different sources: user name and screen name, user description, content of the tweets, and profile picture. A final classifier that combines the predictions of the four partial classifiers achieves 93.2% accuracy for English and 96.9% accuracy for Portuguese data.

    Creating extended gender labelled datasets of Twitter users

    The gender information of a Twitter user is not known a priori when analysing Twitter data, because user registration does not include gender information. This paper proposes an approach for creating extended gender labelled datasets of Twitter users. The process involves creating a smaller database of active Twitter users and manually labelling their gender. The process continues by extracting features from unstructured information found on each user profile and by creating a gender classification model. The model is then applied to a larger dataset, thus providing automatic labels and corresponding confidence scores, which can be used to estimate the most accurately labeled users. The resulting databases can be further enriched with additional information extracted, for example, from the profile picture and from the user location. The proposed approach was successfully applied to English and Portuguese users, leading to two large datasets containing more than 57K labeled users each.

    Electron Spin Resonance of defects in the Haldane System Y(2)BaNiO(5)

    We calculate the electron paramagnetic resonance (EPR) spectra of the antiferromagnetic spin-1 chain compound Y(2)BaNi(1-x)Mg(x)O(5) for different values of x and temperature T much lower than the Haldane gap (~100K). The low-energy spectrum of an anisotropic Heisenberg Hamiltonian, with all parameters determined from experiment, has been solved using DMRG. The observed EPR spectra are quantitatively reproduced by this model. The presence of end-chain S=1/2 states is clearly observed as the main peak in the spectrum and the remaining structure is completely understood. Comment: 5 pages, 4 figures included

    MISNIS: an intelligent platform for Twitter topic mining

    Twitter has become a major tool for spreading news, for dissemination of positions and ideas, and for the commenting and analysis of current world events. However, with more than 500 million tweets flowing per day, it is necessary to find efficient ways of collecting, storing, managing, mining and visualizing all this information. This is especially relevant if one considers that Twitter has no way of indexing tweet contents, and that the only available categorization “mechanism” is the #hashtag, which is totally dependent on a user's will to use it. This paper presents an intelligent platform and framework, named MISNIS - Intelligent Mining of Public Social Networks’ Influence in Society - that addresses these issues and allows a non-technical user to easily mine a given topic from a very large tweet corpus and obtain relevant contents and indicators such as user influence or sentiment analysis. When compared to other existing similar platforms, MISNIS is an expert system that includes specifically developed intelligent techniques that: (1) Circumvent the Twitter API restrictions that limit access to 1% of all flowing tweets. The platform has been able to collect more than 80% of all flowing Portuguese language tweets in Portugal when online; (2) Intelligently retrieve most tweets related to a given topic even when the tweets do not contain the topic #hashtag or user indicated keywords. A 40% increase in the number of retrieved relevant tweets has been reported in real world case studies. The platform is currently focused on Portuguese language tweets posted in Portugal. However, most developed technologies are language independent (e.g. intelligent retrieval, sentiment analysis, etc.), and technically MISNIS can be easily expanded to cover other languages and locations.