Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Surveys
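The inductive process this abstract describes (building a classifier from preclassified documents) can be illustrated with a minimal sketch. The choice of a multinomial Naive Bayes learner with add-one smoothing, the whitespace tokenizer, and the toy training set are all assumptions made here for illustration; the abstract does not single out any one learner or document representation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed document representation: lowercase bag of whitespace-separated words.
    return text.lower().split()


def train(labeled_docs):
    """Learn per-category statistics from (text, category) pairs."""
    doc_counts = Counter()              # number of training documents per category
    term_counts = defaultdict(Counter)  # term frequencies per category
    vocab = set()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        for token in tokenize(text):
            term_counts[category][token] += 1
            vocab.add(token)
    return doc_counts, term_counts, vocab


def classify(model, text):
    """Return the category with the highest log-posterior under Naive Bayes."""
    doc_counts, term_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in sorted(doc_counts):
        score = math.log(doc_counts[category] / total_docs)  # log prior
        denom = sum(term_counts[category].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # terms unseen in training are ignored
                # Add-one (Laplace) smoothed term likelihood.
                score += math.log((term_counts[category][token] + 1) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical preclassified documents, for illustration only.
TRAINING = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sports"),
    ("player scores late goal", "sports"),
]
MODEL = train(TRAINING)
```

Calling `classify(MODEL, "football goal")` on this toy model selects the `sports` category. Evaluating such a classifier properly would require held-out test documents, which this sketch omits.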
Brazilian Congress structural balance analysis
In this work, we study the behavior of Brazilian politicians and political
parties with the help of clustering algorithms for signed social networks. For
this purpose, we extract and analyze a collection of signed networks
representing voting sessions of the lower house of the Brazilian National Congress.
We process all available voting data for the period between 2011 and 2016, by
considering voting similarities between members of the Congress to define
weighted signed links. The solutions obtained by solving Correlation Clustering
(CC) problems are the basis for investigating deputies' voting networks as well
as questions about loyalty, leadership, coalitions, political crisis, and
social phenomena such as mediation and polarization.
Comment: 27 pages, 15 tables, 6 figures; entire article was revised, new
references added (including international press); correcting typing error
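As a rough illustration of how voting similarities can define weighted signed links, and of the imbalance that Correlation Clustering seeks to minimize, consider the sketch below. The vote encoding (+1 for yes, -1 for no, 0 for abstention or absence), the averaged-agreement similarity, and the deputy names are assumptions made here; the abstract does not specify the exact weighting used.

```python
from itertools import combinations

# Hypothetical voting records: one entry per voting session.
# Assumed encoding: +1 = yes, -1 = no, 0 = abstained or absent.
VOTES = {
    "deputy_a": [1, 1, -1, 1],
    "deputy_b": [1, 1, -1, -1],
    "deputy_c": [-1, -1, 1, -1],
}


def voting_similarity(u, v):
    """Average agreement in [-1, 1] over sessions where both deputies voted."""
    shared = [(a, b) for a, b in zip(u, v) if a != 0 and b != 0]
    if not shared:
        return 0.0
    return sum(a * b for a, b in shared) / len(shared)


def signed_network(votes):
    """Build weighted signed links; pairs with zero similarity get no link."""
    edges = {}
    for p, q in combinations(sorted(votes), 2):
        weight = voting_similarity(votes[p], votes[q])
        if weight != 0.0:
            edges[(p, q)] = weight
    return edges


def imbalance(edges, partition):
    """Correlation Clustering cost of a partition: total weight of positive
    links between clusters plus absolute weight of negative links inside them."""
    cost = 0.0
    for (p, q), weight in edges.items():
        same_cluster = partition[p] == partition[q]
        if (weight > 0 and not same_cluster) or (weight < 0 and same_cluster):
            cost += abs(weight)
    return cost


EDGES = signed_network(VOTES)
```

With these toy records, grouping `deputy_a` and `deputy_b` together and `deputy_c` apart yields zero imbalance, whereas placing all three in a single cluster leaves both negative links inside it. Finding the minimum-imbalance partition on real data is a hard optimization problem that this sketch does not attempt to solve.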
Corporate venture capital, strategic alliances, and the governance of newly public firms
We examine the effect of investments by corporate venture capitalists (CVCs) on the governance structures of venture-backed IPOs. One of the main differences between CVCs and traditional venture capitalists (TVCs) is that the former often invest for strategic reasons and enter into various types of strategic alliances with their portfolio firms that last well beyond the IPO. We argue that the presence of such strategic alliances will have a significant impact on the governance structure of CVC-backed firms when they go public and in the following years. Using a sample of venture-backed IPOs, we evaluate several hypotheses concerning the role of CVCs in the corporate governance of newly public firms. We find that strategic CVC-backed IPOs have weaker CEOs and more outsiders on the board and on the compensation committee than a carefully selected sample of matching firms. In addition, the probability of forced CEO turnover is higher for strategic CVC-backed IPOs, while at the same time these firms use staggered boards more frequently. In contrast, the governance structures of purely financial CVC-backed IPO firms and their matching firms do not exhibit any significant differences.
Context and Keyword Extraction in Plain Text Using a Graph Representation
Document indexing is an essential task performed by archivists or automatic
indexing tools. To retrieve documents relevant to a query, the keywords
describing each document have to be carefully chosen. Archivists have to
identify the right topic of a document before starting to extract the keywords.
For an archivist indexing specialized documents, experience plays an important
role, but indexing documents on different topics is much harder. This article
proposes an innovative method for an indexing support system. This system takes
as input an ontology and a plain text document and provides as output
contextualized keywords of the document. The method has been evaluated by
exploiting Wikipedia's category links as a termino-ontological resource.
Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the European Parliament
In this paper, we want to study the informative value of negative links in
signed complex networks. For this purpose, we extract and analyze a collection
of signed networks representing voting sessions of the European Parliament
(EP). We first process some data collected by the VoteWatch Europe Website for
the whole 7th term (2009-2014), by considering voting similarities between
Members of the EP to define weighted signed links. We then apply a selection of
community detection algorithms, designed to process only positive links, to
these data. We also apply Parallel Iterative Local Search (Parallel ILS), an
algorithm recently proposed to identify balanced partitions in signed networks.
Our results show that, contrary to the conclusions of a previous study focusing
on other data, the partitions detected by ignoring or considering the negative
links are indeed remarkably different for these networks. The relevance of
negative links for graph partitioning therefore is an open question which
should be further explored.
Comment: in 2nd European Network Intelligence Conference (ENIC), Sep 2015,
Karlskrona, Swede