4 research outputs found

ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development

Author: Lifshitz Yair
Marmor Yanir
Misgav Kinneret
Publication date: 17/07/2023

We introduce "ivrit.ai", a comprehensive Hebrew speech dataset, addressing the distinct lack of extensive, high-quality resources for advancing Automated Speech Recognition (ASR) technology in Hebrew. With over 3,300 speech hours and over a thousand diverse speakers, ivrit.ai offers a substantial compilation of Hebrew speech across various contexts. It is delivered in three forms to cater to varying research needs: raw unprocessed audio; data post-Voice Activity Detection; and partially transcribed data. The dataset stands out for its legal accessibility, permitting use at no cost, thereby serving as a crucial resource for researchers, developers, and commercial entities. ivrit.ai opens up numerous applications, offering vast potential to enhance AI capabilities in Hebrew. Future efforts aim to expand ivrit.ai further, thereby advancing Hebrew's standing in AI research and technology.

Comment: 9 pages, 1 table and 3 figures

arXiv.org e-Print Archive

Non-verbal information in spontaneous speech -- towards a new framework of analysis

Author: Barboy Moshe
Ben-Artzy Eran
Biron Tirza
Golubchik Alona
Harel David
Marmor Yanir
Szekely Smadar
Winter Yaron
Publication date: 13/03/2024

Non-verbal signals in speech are encoded by prosody and carry information that ranges from conversation action to attitude and emotion. Despite its importance, the principles that govern prosodic structure are not yet adequately understood. This paper offers an analytical schema and a technological proof-of-concept for the categorization of prosodic signals and their association with meaning. The schema interprets surface representations of multi-layered prosodic events. As a first step towards implementation, we present a classification process that disentangles prosodic phenomena of three orders. It relies on fine-tuning a pre-trained speech recognition model, enabling simultaneous multi-class/multi-label detection. It generalizes over a large variety of spontaneous data, performing on a par with, or superior to, human annotation. In addition to a standardized formalization of prosody, disentangling prosodic patterns can direct a theory of communication and speech organization. A welcome by-product is an interpretation of prosody that will enhance speech- and language-related technologies.

arXiv.org e-Print Archive

Assessing individual risk and the latent transmission of COVID-19 in a population with an interaction-driven temporal model

Author: Alex Abbey
Osnat Mokryn
Yanir Marmor
Yuval Shahar
Publication venue: Nature Portfolio
Publication date: 01/08/2023

Interaction-driven modeling of diseases over real-world contact data has been shown to promote the understanding of the spread of diseases in communities. This temporal modeling follows the path-preserving order and timing of the contacts, which are essential for accurate modeling. Yet, other important aspects were overlooked. Various airborne pathogens differ in the duration of exposure needed for infection. Also, from the individual perspective, COVID-19 progression differs between individuals, and its severity is statistically correlated with age. Here, we enrich an interaction-driven model of COVID-19 and similar airborne viral diseases with (a) meeting duration and (b) personal disease progression. The enriched model enables predicting outcomes at both the population and the individual levels. It further allows predicting individual risk of engaging in social interactions as a function of the virus characteristics and its prevalence in the population. We further showed that the enigmatic nature of asymptomatic transmission stems from the latent effect of the network density on this transmission and that asymptomatic transmission has a substantial impact only in sparse communities.

Directory of Open Access Journals

Ninety-Nine Percent? Re-Examining the Consensus on the Anthropogenic Contribution to Climate Change

Author: Avner Niv
David Dentelski
Mor Roses
Ran Damari
Yanir Marmor
Yonatan Dubi
Publication venue: MDPI AG
Publication date: 01/10/2023

Anthropogenic activity is considered a central driver of current climate change. A recent paper, studying the consensus regarding the hypothesis that the recent increase in global temperature is predominantly human-made via the emission of greenhouse gases (see text for reference), argued that the scientific consensus in the peer-reviewed scientific literature pertaining to this hypothesis exceeds 99%. This conclusion was reached after the authors scanned the abstracts and titles of some 3000 papers and mapped them according to their (abstract) statements regarding the above hypothesis. Here, we point out some major flaws in the methodology, analysis, and conclusions of the study. Using the data provided in the study, we show that the 99% consensus, as defined by the authors, is actually an upper-limit evaluation because of the large number of “neutral” papers which were counted as pro-consensus in the paper, and probably does not reflect the true situation. We further analyze these results by evaluating how so-called “skeptic” papers fit the consensus and find that biases in the literature, which were not accounted for in the aforementioned study, may place the consensus on the low side. Finally, we show that the rating method used in the study suffers from a subjective bias which is reflected in large variations between ratings of the same paper by different raters. All these lead to the conclusion that the conclusions of the study do not follow from the data.

Directory of Open Access Journals