Social media analytics: a survey of techniques, tools and platforms
This paper is written for (social science) researchers seeking to analyze the wealth of social media now available. It presents a comprehensive review of software tools for social networking media, wikis, really simple syndication feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the paper also provides a methodology and a critique of social media tools. Analyzing social media, in particular Twitter feeds for sentiment analysis, has become a major research and business activity due to the availability of web-based application programming interfaces (APIs) provided by Twitter, Facebook and News services. This has led to an ‘explosion’ of data services, software tools for scraping and analysis, and social media analytics platforms. It is also a research area undergoing rapid change and evolution due to commercial pressures and the potential for using social media data for computational (social science) research. Using a simple taxonomy, this paper provides a review of leading software tools and how to use them to scrape, cleanse and analyze the spectrum of social media. In addition, it discusses the requirements of an experimental computational environment for social media research and presents, as an illustration, the system architecture of a social media (analytics) platform built by University College London. The principal contribution of this paper is to provide an overview (including code fragments) for scientists seeking to utilize social media scraping and analytics either in their research or business. The data retrieval techniques presented in this paper are valid at the time of writing (June 2014), but they are subject to change since social media data scraping APIs are rapidly changing.
Human Resources Recommender system based on discrete variables
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
Natural Language Processing and Understanding has become one of the most exciting and challenging
fields in the area of Artificial Intelligence and Machine Learning. In a rapidly changing business
environment, transforming data so that it is easy to interpret is among the greatest competitive
advantages a company can have. The purpose of this dissertation is to implement a recommender
system for the Human Resources department of a company that will aid the decision-making
process of filling a specific job position with the right candidate. The recommender system will be
fed with applicants, each represented by their skills, and will produce a subset of the most suitable
candidates for a given job position. This work uses StarSpace, a novel neural embedding model whose
aim is to represent entities in a common vector space and then compute similarity measures among them.
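The idea of ranking candidates by similarity in a shared vector space can be sketched as follows. This is only an illustration under stated assumptions, not the dissertation's implementation: the vectors are hand-written rather than learned by StarSpace, a candidate is represented as the mean of their skill vectors, cosine similarity is used as the similarity measure, and every name (`SKILL_VECS`, `CANDIDATES`, `JOB_VEC`, `rank_candidates`) is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_candidates(job_vec, candidates, skill_vecs, top_k=2):
    """Return the top_k (name, score) pairs, best match first."""
    scored = []
    for name, skills in candidates.items():
        # Assumption: a candidate is the average of their skill embeddings.
        cand_vec = mean_vector([skill_vecs[s] for s in skills])
        scored.append((name, cosine(job_vec, cand_vec)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-written embeddings standing in for learned ones.
SKILL_VECS = {
    "python": [1.0, 0.0, 0.0],
    "sql": [0.8, 0.2, 0.0],
    "recruiting": [0.0, 1.0, 0.0],
    "design": [0.0, 0.0, 1.0],
}
CANDIDATES = {
    "alice": ["python", "sql"],
    "bob": ["recruiting"],
    "carol": ["design", "python"],
}
JOB_VEC = [1.0, 0.1, 0.0]  # hypothetical embedding of a data-oriented position

ranking = rank_candidates(JOB_VEC, CANDIDATES, SKILL_VECS)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the candidate whose skills point in roughly the same direction as the job vector is ranked first; in a trained model the same ranking step would run on learned embeddings.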
Large Scale Generative Multimodal Attribute Extraction for E-commerce Attributes
E-commerce websites (e.g. Amazon) have a plethora of structured and
unstructured information (text and images) present on the product pages.
Sellers often either don't label or mislabel values of the attributes (e.g.
color, size etc.) for their products. Automatically identifying these attribute
values from an eCommerce product page that contains both text and images is a
challenging task, especially when the attribute value is not explicitly
mentioned in the catalog. In this paper, we present a scalable solution for
this problem in which we pose attribute extraction as a question-answering
task, which we solve using MXT, consisting of three key components:
(i) MAG (Multimodal Adaptation Gate), (ii) Xception network,
and (iii) T5 encoder-decoder. Our system consists of a generative
model that generates attribute values for a given product by using both
textual and visual characteristics (e.g. images) of the product. We show that
our system is capable of handling zero-shot attribute prediction (when
attribute value is not seen in training data) and value-absent prediction (when
attribute value is not mentioned in the text) which are missing in traditional
classification-based and NER-based models respectively. We have trained our
models using distant supervision, removing dependency on human labeling, thus
making them practical for real-world applications. With this framework, we are
able to train a single model for 1000s of (product-type, attribute) pairs, thus
reducing the overhead of training and maintaining separate models. Extensive
experiments on two real world datasets show that our framework improves the
absolute recall@90P by 10.16% and 6.9% from the existing state of the art
models. In a popular e-commerce store, we have deployed our models for 1000s of
(product-type, attribute) pairs.
Comment: ACL 2023 Industry Track, 8 Page
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather the large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users, which
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential of cross-fertilization, i.e., the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.
Comment: Knowledge-based System
Low-complexity Multiclass Encryption by Compressed Sensing
The idea that compressed sensing may be used to encrypt information from
unauthorised receivers has already been envisioned, but never explored in depth
since its security may seem compromised by the linearity of its encoding
process. In this paper we apply this simple encoding to define a general
private-key encryption scheme in which a transmitter distributes the same
encoded measurements to receivers of different classes, which are provided
partially corrupted encoding matrices and are thus allowed to decode the
acquired signal at provably different levels of recovery quality.
The security properties of this scheme are thoroughly analysed: firstly, the
properties of our multiclass encryption are theoretically investigated by
deriving performance bounds on the recovery quality attained by lower-class
receivers with respect to high-class ones. Then we perform a statistical
analysis of the measurements to show that, although not perfectly secure,
compressed sensing grants some level of security that comes at almost-zero cost
and thus may benefit resource-limited applications.
In addition to this we report some exemplary applications of multiclass
encryption by compressed sensing of speech signals, electrocardiographic tracks
and images, in which quality degradation is quantified as the inability of
some feature extraction algorithms to obtain sensitive information from
suitably degraded signal recoveries.
Comment: IEEE Transactions on Signal Processing, accepted for publication.
Article in pres
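The effect of handing a lower-class receiver a partially corrupted encoding matrix can be sketched with a toy example. The abstract does not say how the matrices are corrupted, so the sign flips used here are an assumption made for illustration, as are the matrix `A_TRUE`, the sparse signal `X` and the helper names. The sketch shows only the linear encoding y = A x and the mismatch a lower-class receiver sees; it does not perform sparse recovery.

```python
def measure(matrix, signal):
    """Linear encoding y = A x, computed row by row."""
    return [sum(a * x for a, x in zip(row, signal)) for row in matrix]


def corrupt(matrix, flips):
    """Copy the matrix and flip the sign at each (row, col) position.

    Assumption: sign flipping is just one possible way to produce a
    'partially corrupted' matrix for a lower-class receiver.
    """
    out = [row[:] for row in matrix]
    for r, c in flips:
        out[r][c] = -out[r][c]
    return out


# Hypothetical 3x4 encoding matrix with +/-1 entries (the private key).
A_TRUE = [
    [1, -1, 1, 1],
    [-1, 1, 1, -1],
    [1, 1, -1, 1],
]
X = [0, 2, 0, -1]  # hypothetical sparse signal

# The transmitter sends the same measurements to every receiver.
y = measure(A_TRUE, X)

# A lower-class receiver only knows a corrupted version of the matrix.
A_LOW = corrupt(A_TRUE, [(0, 1), (2, 3)])

# What the lower-class matrix would predict for the same signal.
y_low = measure(A_LOW, X)

# The gap equals (A_TRUE - A_LOW) x: it acts like signal-dependent noise
# on the lower-class receiver's model of the measurements.
mismatch = [a - b for a, b in zip(y, y_low)]

print("y        =", y)
print("y_low    =", y_low)
print("mismatch =", mismatch)
```

Because the mismatch is nonzero only in rows where a flipped entry meets a nonzero signal component, the fraction of corrupted entries gives a handle on how much recovery quality a lower-class receiver loses.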