Where clouds are made...
Where clouds are made... explored Didcot A Power Station in the last few months of its active life as it approached its closure in 2013. This was a commissioned project, jointly funded by Npower and South Oxfordshire Council.
Through the project and exhibition we explored the different physical and social relationships that people had with the power station over its working life. For our exhibition at Cornerstone Arts Centre, Didcot, we made a wooden scaffolding structure referencing part of the prefab construction process. It was built to the same scale as one of the cooling towers, but represented only a fragment of the whole. The almost imperceptible arc across the gallery floor drew attention to the enormity of the structure and the difficulty of comprehending its scale. Visible through this construction was a series of laser-cut wall drawings playing with the 1970s language of dials and switches from the control room.
Computational mechanics: from theory to practice
In the last fifty years, computational mechanics has gained the attention of a large number of disciplines, ranging from physics and mathematics to biology, encompassing all the disciplines that deal with complex systems or processes. With ϵ-machines, computational mechanics provides powerful models that can help characterize these systems. To date, a growing number of studies employ such methodologies; nevertheless, work that makes the approach accessible in practice is still lacking. Starting from this observation, this thesis investigates a more practical approach to computational mechanics, so as to make it suitable for applications in a wide spectrum of domains. ϵ-machines are first examined in a robotics setting, to determine whether they can be exploited in contexts with typically complex dynamics, such as swarms; experiments are conducted on random-walk behavior and on an aggregation task. Statistical complexity is then studied and tested on the logistic map and, as a more applied case, used in the analysis of electroencephalograms as a classification parameter, discriminating between patients with different sleep disorders and healthy subjects.
The number of applications that may benefit from such techniques is enormous; this work will hopefully broaden the prospects for their practical use.
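Estimating statistical complexity presupposes a symbolized time series and reliable block statistics before any ϵ-machine can be reconstructed (for instance with an algorithm such as CSSR). As a minimal sketch of that preprocessing on the logistic map mentioned above, the following Python code generates a trajectory, binarizes it with the standard generating partition, and estimates the entropy rate from block entropies; all parameter values are illustrative assumptions, not those used in the thesis.

```python
import numpy as np
from collections import Counter

def logistic_series(r, x0=0.4, n=100_000, discard=1_000):
    """Iterate the logistic map x_{t+1} = r * x_t * (1 - x_t)."""
    x = x0
    for _ in range(discard):          # let transients die out
        x = r * x * (1 - x)
    out = np.empty(n)
    for i in range(n):
        x = r * x * (1 - x)
        out[i] = x
    return out

def symbolize(series, threshold=0.5):
    """Binary generating partition of the logistic map (0 below, 1 above)."""
    return ''.join('1' if v > threshold else '0' for v in series)

def block_entropy(symbols, k):
    """Shannon entropy (bits) of the distribution of length-k blocks."""
    blocks = Counter(symbols[i:i + k] for i in range(len(symbols) - k + 1))
    total = sum(blocks.values())
    probs = np.array([c / total for c in blocks.values()])
    return float(-(probs * np.log2(probs)).sum())

# Entropy-rate estimate h ≈ H(k) - H(k-1); structure in the symbol sequence
# shows up as the estimate converging with growing block length k.
seq = symbolize(logistic_series(r=3.8))
for k in (2, 4, 6, 8):
    print(k, block_entropy(seq, k) - block_entropy(seq, k - 1))
```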
Finding viable seed URLs for web corpora: A scouting approach and comparative study of available sources
The conventional tools of the "web as corpus" framework rely heavily on URLs obtained from search engines. Recently, the corresponding querying process has become much slower, or impossible to perform on a low budget. I try to find acceptable substitutes, i.e. viable link sources for web corpus construction. To this end, I perform a study of possible alternatives, including social networks as well as the Open Directory Project and Wikipedia. Four different languages (Dutch, French, Indonesian and Swedish), taken as examples, show that complementary approaches are needed. My scouting approach using open-source software leads to a URL directory enriched with metadata which may be used to start a web crawl. This is more than a drop-in replacement for existing tools, since said metadata enables researchers to filter and select URLs that fit particular needs: the URLs are classified according to their language, their length, and a few other indicators such as host- and markup-based data.
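To make the filtering step concrete, here is a minimal Python sketch. It assumes a hypothetical tab-separated directory with (URL, language, text length) fields, standing in for the metadata-enriched directory described above, and it caps the number of seeds per host so that no single site dominates the crawl; the file format, field names, and thresholds are illustrative assumptions.

```python
from urllib.parse import urlparse

# Hypothetical record format: one URL per line with tab-separated metadata,
# e.g. "https://example.nl/page\tnl\t1523" (URL, language code, text length).
def load_seed_candidates(path, language, min_length=1000):
    """Keep URLs whose recorded language and text length fit the target corpus."""
    seeds = []
    with open(path, encoding='utf-8') as fh:
        for line in fh:
            url, lang, length = line.rstrip('\n').split('\t')
            if lang != language or int(length) < min_length:
                continue
            seeds.append((urlparse(url).netloc, url))
    return seeds

def balance_hosts(seeds, per_host=5):
    """Cap seeds per host so one large website cannot dominate the crawl."""
    counts, balanced = {}, []
    for host, url in seeds:
        if counts.get(host, 0) < per_host:
            counts[host] = counts.get(host, 0) + 1
            balanced.append(url)
    return balanced

# Example: start URLs for a Dutch crawl.
# start_urls = balance_hosts(load_seed_candidates('directory.tsv', 'nl'))
```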
A one-pass valency-oriented chunker for German
Non-finite-state parsers provide fine-grained information, but they are computationally demanding; it is therefore interesting to see how far a shallow parsing approach can go. The transducer described here performs pattern-based matching over POS tags, using regular expressions that take advantage of the characteristics of German grammar. The process aims at finding linguistically relevant phrases with good precision, which in turn enables an estimation of the actual valency of a given verb. The chunker reads its input exactly once instead of using cascades, which greatly benefits computational efficiency. This finite-state chunking approach does not return a tree structure, but rather yields various kinds of linguistic information useful to the language researcher. Possible applications include simulation of text comprehension on the syntactic level, creation of selective benchmarks, and failure analysis.
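As a toy illustration of regular-expression matching over POS tags, the following Python sketch encodes an STTS-style tag sequence as a string and extracts noun phrases in a single left-to-right pass; the pattern and example sentence are simplified assumptions, not the actual transducer, whose patterns cover far more of German grammar.

```python
import re

# Toy sentence with STTS-style POS tags (simplified for illustration).
SENT = [('der', 'ART'), ('alte', 'ADJA'), ('Mann', 'NN'),
        ('gibt', 'VVFIN'), ('dem', 'ART'), ('Kind', 'NN'),
        ('einen', 'ART'), ('Apfel', 'NN')]

# A simplistic noun-phrase pattern: optional article, adjectives, then a noun.
NP = re.compile(r'(?:ART )?(?:ADJA )*(?:NN|NE)')

def chunk_nps(sentence):
    """Single pass over the tag string; matches are mapped back to token
    positions, and matches not aligned to a token boundary are discarded."""
    tag_seq = [tag for _, tag in sentence]
    text = ' '.join(tag_seq) + ' '
    starts, pos = {}, 0
    for i, tag in enumerate(tag_seq):
        starts[pos] = i
        pos += len(tag) + 1
    chunks = []
    for m in NP.finditer(text):
        first = starts.get(m.start())
        if first is not None:
            chunks.append([w for w, _ in
                           sentence[first:first + len(m.group().split())]])
    return chunks

print(chunk_nps(SENT))
# [['der', 'alte', 'Mann'], ['dem', 'Kind'], ['einen', 'Apfel']]
```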
Challenges in web corpus construction for low-resource languages in a post-BootCaT world
Software available under an open-source license: FLUX (Filtering and Language-identification for URL Crawling Seeds), https://github.com/adbar/flux-toolchain. The state-of-the-art tools of the "web as corpus" framework rely heavily on URLs obtained from search engines. Recently, this querying process has become very slow or impossible to perform on a low budget. In order to find reliable data sources for Indonesian, I perform a case study of different kinds of URL sources and crawling strategies. First, I classify URLs extracted from the Open Directory Project and Wikipedia for Indonesian, Malay, Danish, and Swedish in order to enable comparisons. Then I perform web crawls focused on Indonesian, using the mentioned sources as start URLs. My scouting approach using open-source software results in a URL database with metadata which can be used to replace, or at least complement, the BootCaT approach.
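As an illustration of one seed-filtering step, the following Python sketch downloads candidate pages and keeps those that look like the target language. The tiny stopword lists are illustrative stand-ins for the proper language identification performed by the FLUX toolchain; incidentally, the overlap between the Indonesian and Malay lists hints at why these two languages are hard to tell apart.

```python
import urllib.request

# Tiny stopword sets as a stand-in for a real language identifier;
# the words and the 5% threshold below are chosen for illustration only.
STOPWORDS = {
    'id': {'yang', 'dan', 'di', 'itu', 'dengan', 'untuk', 'tidak', 'ini'},
    'ms': {'yang', 'dan', 'di', 'itu', 'dengan', 'untuk', 'tidak', 'ialah'},
    'da': {'og', 'i', 'det', 'at', 'en', 'den', 'til', 'er', 'som'},
    'sv': {'och', 'i', 'att', 'det', 'som', 'en', 'på', 'är', 'av'},
}

def guess_language(text):
    """Return the language whose stopwords cover the largest share of tokens."""
    tokens = text.lower().split()
    if not tokens:
        return None
    scores = {lang: sum(t in sw for t in tokens) / len(tokens)
              for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0.05 else None

def filter_seeds(urls, target='id'):
    """Fetch each candidate page and keep it if it matches the target language."""
    kept = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read(50_000).decode('utf-8', errors='replace')
        except OSError:
            continue                      # unreachable host: skip the candidate
        if guess_language(html) == target:
            kept.append(url)
    return kept
```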
Blind Reason? The cybernetic era and its apparatuses [La Raison aveugle ? L'époque cybernétique et ses dispositifs]
Programme of the study day: http://calenda.org/220954. Martin Heidegger's claim (in 1966) that cybernetics would henceforth take the place of philosophy sets the tone for the pessimistic vision of a society dominated by technology. As for its strangeness, technological modernity is frequently experienced under the sign of acceleration, of expansion, of the impoverishment of lived experience, and, following Heidegger, it is diagnosed as a retreat of the human in the face of a rationality bent on the advance of uniformization and functionality and on the mathematical pursuit of efficiency. With Gilbert Hottois, one can trace a filiation between this operative techno-logy (with the discourses it implies) and the calculability of signs in Leibniz. The criterion of truth, in the sense of this ars characteristica, is understood in terms of logical veracity and is detached from any interpretation, which opens the way to a combinatorial reason said to be "blind" ("cognitio caeca vel symbolica"). The import of Leibnizian mechanics for modern technology, and more precisely for computer systems, is well known, as is the primacy of the field of the visible in philosophy, from the term "idea" to the association of mind and light, for example. It therefore seems opportune to offer a critique of technology conceived as a blind Reason that misjudges the import of signs. The concretization of Reason in the form of a machine and the shaping of the human on this model (for Foucault), the reign of cybernetics understood as the science of the systematized government of living beings (for Heidegger), and the technosciences (for Henry) are so many entry points into the critique of these logics and apparatuses.
Two comparable corpora of German newspaper text gathered on the web: Bild & Die Zeit: Technical report
This technical report documents the creation of two comparable corpora of German newspaper text, focused on the daily tabloid Bild and the weekly newspaper Die Zeit. Two specialized crawlers and corpus builders were designed to crawl the domain names bild.de and zeit.de with the objective of gathering as many complete articles as possible. High content quality was achieved through specially designed boilerplate-removal and metadata-recording code. As a result, two separate corpora were created. Currently, the latest version for Bild dates from 2011 and the latest version for Die Zeit from early 2013; the corpora comprise 60 476 and 134 222 articles respectively. Whereas the crawler designed for Bild has been discontinued due to frequent layout changes on the website, the one for Die Zeit is still actively maintained, and its code has been made available under an open-source license.
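To give an idea of what such a corpus builder does per page, here is a minimal Python sketch of boilerplate removal and metadata recording using BeautifulSoup; the selectors and meta fields are illustrative assumptions, since the real extraction rules are tailored to bild.de and zeit.de and must be updated whenever the layouts change.

```python
from bs4 import BeautifulSoup   # pip install beautifulsoup4

def extract_article(html):
    """Strip boilerplate and record metadata from one article page.
    The selectors below are illustrative; real ones are site-specific."""
    soup = BeautifulSoup(html, 'html.parser')
    # Drop elements that never carry article text.
    for tag in soup(['script', 'style', 'nav', 'aside', 'header', 'footer']):
        tag.decompose()
    # Hypothetical content containers; adjust to the site's actual layout.
    node = soup.find('article') or soup.find('div', class_='article-body')
    if node is None:
        return None                       # layout not recognized: skip page
    title = soup.find('meta', attrs={'property': 'og:title'})
    date = soup.find('meta', attrs={'name': 'date'})
    paragraphs = [p.get_text(' ', strip=True) for p in node.find_all('p')]
    return {
        'title': title.get('content') if title else None,
        'date': date.get('content') if date else None,
        'text': '\n'.join(p for p in paragraphs if p),
    }
```

Keeping the metadata (title, date) alongside the clean text is what makes the resulting corpora usable for diachronic queries rather than as a bag of sentences.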
Challenges in the linguistic exploitation of specialized republishable web corpora
Short paper talk at the RESAW 2015 conference (Aarhus, Denmark).

I would like to present work on text corpora in German, gathered on the web and processed so as to be made available to linguists and a broader user community via a web interface. The corpora are specialized in the sense that each addresses a particular text genre or source. Web crawling techniques are used to download the documents, which are then stored roughly the way web archives store them. More precisely, I would like to discuss two cases where texts are expected to be republishable: a "standard" case, political speeches, and a "borderline" case, German blogs under CC licenses. The work is performed in the context of a digital dictionary of German. The primary user base consists of lexicographers, who need valuable or at least exploitable evidence in the form of precise quotes or definition elements. The actual gathering and processing of the corpora is described elsewhere (anonymized references). In this talk I would like to focus on a series of challenges that must be solved in order to make data from web archives accessible to researchers and to study web text corpora: metadata extraction, quality assurance, licensing, and "scientificity".

1. Proper metadata extraction is needed to make downstream applications possible. It has to be performed meticulously: experience shows that even small or rare mistakes, in date encoding for instance, may cause an application to be disregarded or discarded by researchers in the humanities, since linguistic trends cannot be identified properly if the content is not ordered in time. The easily available metadata of the speeches contrast with the diverse content types, encodings, and markup patterns of the blogs. Compromises have to be made without sacrificing recall, since republishable texts are rather rare.

2. Regarding the content, quality assurance is paramount: users expect high quality, all the more since they may feel reluctant to use web texts for their studies. In fact, providing "hi-fi" web corpora also means promoting the cause of web sources and the modernization of research methodology.

3. The results are hosted in Germany, so German copyright law applies, which can be considered more restrictive than others. Additionally, there are a number of issues with licensing in general and CC licenses in particular, even with manual verification: the CC ND and (to a lesser extent) NC clauses can hinder proper republication. There are also potential copyright issues regarding blog comments.

To sum up, much work goes into ensuring the "scientificity" of web texts and making them not only available but also citable in a scholarly sense.
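To illustrate the first challenge, here is a minimal Python sketch of defensive date extraction: it normalizes two common date formats and rejects implausible values, since a single mis-encoded date can make diachronic queries unreliable. The patterns and the plausibility window are illustrative assumptions; a production extractor needs many more formats and a fallback order.

```python
import re
from datetime import date

# Patterns for two common German web date formats (the list is illustrative).
PATTERNS = [
    (re.compile(r'(\d{4})-(\d{2})-(\d{2})'), ('y', 'm', 'd')),        # 2013-05-17
    (re.compile(r'(\d{1,2})\.(\d{1,2})\.(\d{4})'), ('d', 'm', 'y')),  # 17.05.2013
]

def extract_date(text, earliest=1995, latest=2016):
    """Return the first plausible publication date as ISO 8601, or None.
    The plausibility window guards against the rare but damaging encoding
    mistakes that make content impossible to order in time."""
    for pattern, order in PATTERNS:
        for match in pattern.finditer(text):
            parts = dict(zip(order, (int(g) for g in match.groups())))
            try:
                d = date(parts['y'], parts['m'], parts['d'])
            except ValueError:
                continue                  # e.g. 31.02. -> reject, keep scanning
            if earliest <= d.year <= latest:
                return d.isoformat()
    return None

print(extract_date('Veröffentlicht am 17.05.2013 um 12:00'))  # 2013-05-17
```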