CCOM-HuQin: an Annotated Multimodal Chinese Fiddle Performance Dataset
HuQin is a family of traditional Chinese bowed string instruments. Playing
techniques (PTs) embodied in various playing styles add abundant emotional
coloring and aesthetic feelings to HuQin performance. The complex applied
techniques make HuQin music a challenging source for fundamental MIR tasks such
as pitch analysis, transcription and score-audio alignment. In this paper, we
present a multimodal performance dataset of HuQin music that contains
audio-visual recordings of 11,992 single PT clips and 57 annotated musical
pieces of classical excerpts. We systematically describe the HuQin PT taxonomy
based on musicological theory and practical use cases. Then we introduce the
dataset creation methodology and highlight the annotation principles featuring
PTs. We analyze the statistics in different aspects to demonstrate the variety
of PTs played in HuQin subcategories and perform preliminary experiments to
show the potential applications of the dataset in various MIR tasks and
cross-cultural music studies. Finally, we propose future work to be extended on
the dataset. Comment: 15 pages, 11 figures
Statistical Parsing by Machine Learning from a Classical Arabic Treebank
Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic.
Classical Arabic has been studied in depth by grammarians for over a thousand years using a traditional grammar known as i’rāb (إعراب). Using this grammar to develop a representation for parsing is challenging, as it describes syntax using a hybrid of phrase-structure and dependency relations. This work aims to advance the state-of-the-art for hybrid parsing by introducing a formal representation for annotation and a resource for machine learning. The main contributions are the first treebank for Classical Arabic and the first statistical dependency-based parser in any language for ellipsis, dropped pronouns and hybrid representations.
A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic.
The Quran was chosen for annotation because a large body of work exists providing detailed syntactic analysis. Volunteer crowdsourcing is used for annotation in combination with expert supervision. A practical result of the annotation effort is the corpus website: http://corpus.quran.com, an educational resource with over two million users per year.
Ensemble Named Entity Recognition (NER): Evaluating NER Tools in the Identification of Place Names in Historical Corpora
The field of Spatial Humanities has advanced substantially in recent years. The identification and extraction of toponyms and spatial information mentioned in historical text collections has allowed its use in innovative ways, making possible the application of spatial analysis and the mapping of these places with geographic information systems. For instance, automated place name identification is possible with Named Entity Recognition (NER) systems. Statistical NER methods based on supervised learning, in particular, are highly successful with modern datasets. However, there are still major challenges to address when dealing with historical corpora. These challenges include language changes over time, spelling variations, transliterations, OCR errors, and sources written in multiple languages, among others. In this article, considering a task of place name recognition over two collections of historical correspondence, we report an evaluation of five NER systems and an approach that combines these through a voting system. We found that although the individual performance of each NER system was corpus dependent, the ensemble combination was able to achieve consistent measures of precision and recall, outperforming the individual NER systems. In addition, the results showed that these NER systems are not strongly dependent on preprocessing and translation to Modern English.
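The voting idea described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the voting rule, so a simple majority over character-offset spans is assumed here; the function name `vote_place_names`, the span representation, and the sample predictions are all hypothetical and not taken from the article.

```python
from collections import Counter


def vote_place_names(predictions, min_votes=None):
    """Keep spans proposed by at least `min_votes` systems.

    predictions: one set of (start, end) character spans per NER system.
    min_votes:   defaults to a strict majority (an assumption, not the
                 rule used in the article).
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    # Count each span once per system, even if a system lists it twice.
    counts = Counter(span for system in predictions for span in set(system))
    return {span for span, votes in counts.items() if votes >= min_votes}


# Hypothetical output of five NER systems on one letter.
systems = [
    {(0, 6), (10, 15)},
    {(0, 6)},
    {(0, 6), (20, 25)},
    {(10, 15)},
    {(0, 6), (10, 15)},
]
agreed = vote_place_names(systems)
```

With five systems the default threshold is three votes, so a span proposed by a single system is discarded while spans that most systems agree on are kept. Lowering `min_votes` trades precision for recall.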
Retracing the 1910 Carruthers Royal Geographical Society Expedition to the Turgen Mountains of Mongolia – Reconstruction of a Century of Glacial Change
The Turgen Mountains lie in northwestern Mongolia, roughly 80 kilometers south of the Russian border. The area was visited in 1910 by a Royal Geographical Society (RGS) expedition led by Douglas Carruthers. They undertook an extensive survey of the range and produced a detailed topographic map. They also documented the extent of the glaciers with photographs. This modern study consisted of three phases. The first step was to procure the historical documents from the RGS in London, including copies of the photos, journal entries, and the map. Field work in Mongolia entailed traveling to the remote study site and retracing portions of the 1910 expedition. Camera locations were matched to the historical photographs and repeat images taken. In addition, the termini of the two main glacial lobes were surveyed by GPS. Finally, spatial analysis was conducted in the computer laboratory using a GIS to generate a "historic" elevation model from the 1910 map and compare it to a modern DEM generated from SRTM data. Map analysis software was employed to evaluate cartometric accuracy of the 1910 map against modern Russian topographic sheets. The results of the DEM and map analysis were then validated using the field GPS data and remotely sensed imagery to quantitatively describe the changes in the glacial system. The repeat photography was analyzed using photogrammetric techniques to measure glacier changes. Also, a custom cartographic product was produced in the style of the 1910 Carruthers map. It displays the extent of the glaciers in 2010 and the locations of repeat photography stations for future expeditions. Placing the results of this study alongside previous work paints a clear picture of the Turgen glacial regime over the last century. The results suggest that while the snow and ice volume on the summits appears to be intact, lower elevation glaciers show significant ablation.
This study successfully demonstrates the utility of using historic expedition documents to extend the modern record of glacial change.
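The comparison of a historic elevation model with a modern DEM can be sketched as a cell-by-cell difference. This is only an illustration under stated assumptions: both grids are taken to be already co-registered at the same resolution, `elevation_change` is a hypothetical helper, and the elevation values are invented placeholders, not data from the study.

```python
def elevation_change(historic, modern):
    """Return (difference grid, mean change) for two equal-size grids.

    Each grid is a list of rows of elevations in metres; None marks a
    cell with no data. The difference is modern minus historic, so
    negative values indicate surface lowering.
    """
    if len(historic) != len(modern) or any(
        len(h_row) != len(m_row) for h_row, m_row in zip(historic, modern)
    ):
        raise ValueError("grids must have the same shape")

    diff = [
        [None if h is None or m is None else m - h for h, m in zip(h_row, m_row)]
        for h_row, m_row in zip(historic, modern)
    ]
    valid = [d for row in diff for d in row if d is not None]
    mean_change = sum(valid) / len(valid) if valid else None
    return diff, mean_change


# Placeholder 2x2 grids (metres); values are illustrative only.
historic_dem = [[3000.0, 3100.0], [3200.0, None]]
modern_dem = [[2990.0, 3100.0], [3195.0, 3300.0]]
diff_grid, mean_change = elevation_change(historic_dem, modern_dem)
```

In practice a GIS or raster library would handle reprojection and resampling before any differencing; the sketch covers only the final subtraction step.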
International Summerschool Computer Science 2014: Proceedings of Summerschool 7.7. - 13.7.2014
Proceedings of International Summerschool Computer Science 2014
Meaning refinement to improve cross-lingual information retrieval
Magdeburg, Univ., Faculty of Computer Science, dissertation, 2012, by Farag Ahme
The Diffusion of a Personal Health Record for Patients with Type 2 Diabetes Mellitus in Primary Care
- …