3,043 research outputs found
Who is the director of this movie? Automatic style recognition based on shot features
We show how low-level formal features, such as shot duration, meant as length
of camera takes, and shot scale, i.e. the distance between the camera and the
subject, are distinctive of a director's style in art movies. So far such
features were thought of not having enough varieties to become distinctive of
an author. However our investigation on the full filmographies of six different
authors (Scorsese, Godard, Tarr, Fellini, Antonioni, and Bergman) for a total
number of 120 movies analysed second by second, confirms that these
shot-related features do not appear as random patterns in movies from the same
director. For feature extraction we adopt methods based on both conventional
and deep learning techniques. Our findings suggest that feature sequential
patterns, i.e. how features evolve in time, are at least as important as the
related feature distributions. To the best of our knowledge this is the first
study dealing with automatic attribution of movie authorship, which opens up
interesting lines of cross-disciplinary research on the impact of style on the
aesthetic and emotional effects on the viewers
Multilingual Language Processing From Bytes
We describe an LSTM-based model which we call Byte-to-Span (BTS) that reads
text as bytes and outputs span annotations of the form [start, length, label]
where start positions, lengths, and labels are separate entries in our
vocabulary. Because we operate directly on unicode bytes rather than
language-specific words or characters, we can analyze text in many languages
with a single model. Due to the small vocabulary size, these multilingual
models are very compact, but produce results similar to or better than the
state-of- the-art in Part-of-Speech tagging and Named Entity Recognition that
use only the provided training datasets (no external data sources). Our models
are learning "from scratch" in that they do not rely on any elements of the
standard pipeline in Natural Language Processing (including tokenization), and
thus can run in standalone fashion on raw text
Approaching Questions of Text Reuse in Ancient Greek Using Computational Syntactic Stylometry
We are investigating methods by which data from dependency syntax treebanks of ancient Greek can be applied to questions of authorship in ancient Greek historiography. From the Ancient Greek Dependency Treebank were constructed syntax words (sWords) by tracing the shortest path from each leaf node to the root for each sentence tree. This paper presents the results of a preliminary test of the usefulness of the sWord as a stylometric discriminator. The sWord data was subjected to clustering analysis. The resultant groupings were in accord with traditional classifications. The use of sWords also allows a more fine-grained heuristic exploration of difficult questions of text reuse. A comparison of relative frequencies of sWords in the directly transmitted Polybius book 1 and the excerpted books 9–10 indicate that the measurements of the two texts are generally very close, but when frequencies do vary, the differences are surprisingly large. These differences reveal that a certain syntactic simplification is a salient characteristic of Polybius’ excerptor, who leaves conspicuous syntactic indicators of his modifications
Approaching Questions of Text Reuse in Ancient Greek Using Computational Syntactic Stylometry
We are investigating methods by which data from dependency syntax treebanks of ancient Greek can be applied to questions of authorship in ancient Greek historiography. From the Ancient Greek Dependency Treebank were constructed syntax words (sWords) by tracing the shortest path from each leaf node to the root for each sentence tree. This paper presents the results of a preliminary test of the usefulness of the sWord as a stylometric discriminator. The sWord data was subjected to clustering analysis. The resultant groupings were in accord with traditional classifications. The use of sWords also allows a more fine-grained heuristic exploration of difficult questions of text reuse. A comparison of relative frequencies of sWords in the directly transmitted Polybius book 1 and the excerpted books 9–10 indicate that the measurements of the two texts are generally very close, but when frequencies do vary, the differences are surprisingly large. These differences reveal that a certain syntactic simplification is a salient characteristic of Polybius’ excerptor, who leaves conspicuous syntactic indicators of his modifications
Toward Visualization and Analysis of Traceability Relationships in Distributed and Offshore Software Development Projects
Offshore software development projects provoke new issues to the collaborative endeavor of software development due to their global distribution and involvement of various people, processes, and tools. These problems relate to the geographical distance and the associated time-zone differences; cultural, organizational, and process issues; as well as language problems. However, existing tool support is neither adequate nor grounded in empirical observations. This paper presents two empirical studies of global software development teams and their usage of tools. The results are then used to motivate and inform the construction of more useful software development tools. The focus is on issues that are tool-related but have not yet been solved by existing tools. The two software tools presented as solutions, Ariadne and TraVis, explicitly address yet unresolved issues in global software development and also integrate with prevalent other solutions
Automatic privacy and utility evaluation of anonymized documents via deep learning
Text anonymization methods are evaluated by comparing their outputs with human-based anonymizations through standard information retrieval (IR) metrics. On the one hand, the residual disclosure risk is quantified with the recall metric, which gives the proportion of re-identifying terms successfully detected by the anonymization algorithm. On the other hand, the preserved utility is measured with the precision metric, which accounts the proportion of masked terms that were also annotated by the human experts. Nevertheless, because these evaluation metrics were meant for information retrieval rather than privacy-oriented tasks, they suffer from several drawbacks. First, they assume a unique ground truth, and this does not hold for text anonymization, where several masking choices could be equally valid to prevent re-identification. Second, annotation-based evaluation relies on human judgements, which are inherently subjective and may be prone to errors. Finally, both metrics weight terms uniformly, thereby ignoring the fact that the influence on the disclosure risk or on utility preservation of some terms may be much larger than of others. To overcome these drawbacks, in this thesis we propose two novel methods to evaluate both the disclosure risk and the utility preserved in anonymized texts. Our approach leverages deep learning methods to perform this evaluation automatically, thereby not requiring human annotations. For assessing disclosure risks, we propose using a re-identification attack, which we define as a multi-class classification task built on top of state-of-the art language models. To make it feasible, the attack has been designed to capture the means and computational resources expected to be available at the attacker's end. For utility assessment, we propose a method that measures the information loss incurred during the anonymization process, which relies on a neural masked language modeling. We illustrate the effectiveness of our methods by evaluating the disclosure risk and retained utility of several well-known techniques and tools for text anonymization on a common dataset. Empirical results show significant privacy risks for all of them (including manual anonymization) and consistently proportional utility preservation
Recovering from External Disturbances in Online Manipulation through State-Dependent Revertive Recovery Policies
Robots are increasingly entering uncertain and unstructured environments.
Within these, robots are bound to face unexpected external disturbances like
accidental human or tool collisions. Robots must develop the capacity to
respond to unexpected events. That is not only identifying the sudden anomaly,
but also deciding how to handle it. In this work, we contribute a recovery
policy that allows a robot to recovery from various anomalous scenarios across
different tasks and conditions in a consistent and robust fashion. The system
organizes tasks as a sequence of nodes composed of internal modules such as
motion generation and introspection. When an introspection module flags an
anomaly, the recovery strategy is triggered and reverts the task execution by
selecting a target node as a function of a state dependency chart. The new
skill allows the robot to overcome the effects of the external disturbance and
conclude the task. Our system recovers from accidental human and tool
collisions in a number of tasks. Of particular importance is the fact that we
test the robustness of the recovery system by triggering anomalies at each node
in the task graph showing robust recovery everywhere in the task. We also
trigger multiple and repeated anomalies at each of the nodes of the task
showing that the recovery system can consistently recover anywhere in the
presence of strong and pervasive anomalous conditions. Robust recovery systems
will be key enablers for long-term autonomy in robot systems. Supplemental info
including code, data, graphs, and result analysis can be found at [1].Comment: 8 pages, 8 figures, 1 tabl
- …