
    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
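    A minimal sketch of the ASR-plus-IR coupling the survey describes: hypothetical 1-best ASR transcripts are treated as text documents and indexed with a TF-IDF weighted inverted index. The transcripts, document identifiers, and the TranscriptIndex class below are invented for illustration and are not from the survey.

```python
# Sketch: index ASR transcripts like text documents and retrieve by TF-IDF.
import math
from collections import Counter, defaultdict

def tokenize(text):
    return [t for t in text.lower().split() if t.isalnum()]

class TranscriptIndex:
    def __init__(self):
        self.postings = defaultdict(dict)   # term -> {doc_id: term frequency}
        self.doc_lengths = {}

    def add(self, doc_id, transcript):
        counts = Counter(tokenize(transcript))
        self.doc_lengths[doc_id] = sum(counts.values())
        for term, tf in counts.items():
            self.postings[term][doc_id] = tf

    def search(self, query, k=5):
        n_docs = len(self.doc_lengths)
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(n_docs / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += (tf / self.doc_lengths[doc_id]) * idf
        return sorted(scores.items(), key=lambda x: -x[1])[:k]

# Example with invented transcripts (real ASR output would be noisier).
index = TranscriptIndex()
index.add("ep1", "today we discuss spoken content retrieval for podcasts")
index.add("ep2", "the weather forecast predicts rain over the weekend")
print(index.search("spoken retrieval"))
```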

    Searching Spontaneous Conversational Speech: Proceedings of ACM SIGIR Workshop (SSCS2008)


    Phoneme-based Video Indexing Using Phonetic Disparity Search

    This dissertation presents and evaluates an approach to the video indexing problem by investigating a categorization method that transcribes audio content through Automatic Speech Recognition (ASR) combined with Dynamic Contextualization (DC), Phonetic Disparity Search (PDS) and Metaphone indexation. The suggested approach applies genome pattern matching algorithms with computational summarization to build a database infrastructure that provides an indexed summary of the original audio content. PDS complements the contextual phoneme indexing approach by optimizing topic search performance and accuracy in large video content structures. A prototype was established to translate news broadcast video into text and phonemes automatically by using ASR utterance conversions. Each phonetic utterance extraction was then categorized, converted to Metaphones, and stored in a repository with contextual topical information attached and indexed for posterior search analysis. Following the original design strategy, a custom parallel interface was built to measure the capabilities of dissimilar phonetic queries and provide an interface for result analysis. The postulated solution provides evidence of superior topic matching when compared to traditional word and phoneme search methods. Experimental results demonstrate that PDS can be 3.7% better than the same phoneme query, while Metaphone search proved to be 154.6% better than the same phoneme search and 68.1% better than the equivalent word search.
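    As an illustration of the phonetic indexing idea, the sketch below reduces ASR word hypotheses to coarse phonetic keys so that a query can match a segment despite recognition errors. The phonetic_key function is a deliberately crude stand-in for Metaphone, and the segment identifiers and words are invented; this is not the dissertation's PDS or Metaphone implementation.

```python
# Sketch: map words to rough phonetic keys and index video segments by key.
from collections import defaultdict

def phonetic_key(word):
    """Very rough phonetic reduction: collapse a few commonly confused
    consonant groups, then drop vowels and 'h' after the first letter."""
    word = word.lower()
    for src, dst in {"ph": "f", "ck": "k", "gh": "g",
                     "sh": "x", "ch": "x", "th": "t"}.items():
        word = word.replace(src, dst)
    return word[0] + "".join(c for c in word[1:] if c not in "aeiouh")

index = defaultdict(set)          # phonetic key -> set of video segment ids

def add_segment(segment_id, asr_words):
    for w in asr_words:
        index[phonetic_key(w)].add(segment_id)

def search(query_words):
    hits = [index.get(phonetic_key(w), set()) for w in query_words]
    return set.intersection(*hits) if hits else set()

# Example: "whether" misrecognized as "weather" still shares a phonetic key.
add_segment("news-042", ["weather", "report", "tonight"])
print(search(["whether", "report"]))
```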

    The Business Impact of Social Media - Sentiment Analysis Approach -

    The purpose of this study is to verify the reliability of seven sentiment domains extracted from social media as data for sentiment analysis experiments on predicting automotive market share, and to examine how customer opinions affect corporate performance. The study proceeded in three phases. In the first phase, constructing the sentiment lexicon, a total of 45,447 Voice of the Customer (VOC) posts concerning 26 automobile manufacturers in the United States, dated January 1, 2013 to December 31, 2015, were crawled from automotive communities; after part-of-speech (POS) tagging, the frequencies of negative and positive sentiment words were measured to build a sentiment lexicon, and their polarities were measured to define seven sentiment domains. In the second phase, reliability analysis, auto-correlation analysis and principal component analysis (PCA) were used to verify that the data were suitable for the experiment. In the third phase, two linear regression models were used to examine how the seven sentiment domains affect corporate performance, i.e., automotive market share, for four US-market manufacturers: GM, Ford, FCA, and Volkswagen. As a result, 4,815 negative and 2,021 positive sentiment words were extracted to build the sentiment lexicon; based on this lexicon, the extracted and classified negative and positive words were combined with automotive-industry terms, and the characteristics of the sentiments were examined through auto-correlation analysis and PCA. The auto-correlation analysis revealed that a consistent pattern exists in the sentiment data, that the sentiment in each domain is auto-correlated, and that the sentiments show time-series behavior. The PCA showed that the seven sentiment domains are linked to negativity, positivity, and neutrality as principal components. Based on the reliability of the VOC sentiment data established by the auto-correlation analysis and PCA, two linear regression models were built. The first model used the negative sentiments Sadness, Anger, and Fear and the positive sentiment domains Delight and Satisfaction as independent variables, with market share as the dependent variable; the second model added Shame and Frustration, which the PCA identified as neutral, to the independent variables in order to check whether neutral sentiments have a significant effect on market share. The analysis found that each company has sentiments that significantly affect its market share, and that the influence of sentiment differs between Model 1 and Model 2. This study shows that sentiment carrying the information present in the data can accompany changes in the automotive market based on past values. Furthermore, when applying the availability of market data, exploiting automotive-market-related information and the auto-correlation of sentiment can contribute substantially to research on sentiment analysis as well as, in various ways, to business performance in real markets.
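    A hedged sketch of the modelling step in the third phase: market share is regressed on monthly scores for the seven sentiment domains by ordinary least squares. The figures below are random placeholders rather than the thesis's VOC measurements; only the regression mechanics are illustrated.

```python
# Sketch: regress market share on seven sentiment-domain scores (toy data).
import numpy as np

rng = np.random.default_rng(0)
domains = ["Sadness", "Anger", "Fear", "Delight",
           "Satisfaction", "Shame", "Frustration"]

months = 36                                  # Jan 2013 - Dec 2015
X = rng.normal(size=(months, len(domains)))  # placeholder sentiment scores
true_w = np.array([-0.4, -0.3, -0.2, 0.5, 0.6, 0.0, 0.1])
y = 17.0 + X @ true_w + rng.normal(scale=0.2, size=months)   # market share (%)

# Ordinary least squares with an intercept (Model 1 uses 5 domains,
# Model 2 adds the two "neutral" domains; here all 7 are included).
X1 = np.column_stack([np.ones(months), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

print("intercept:", round(coef[0], 2))
for name, w in zip(domains, coef[1:]):
    print(f"{name:13s} {w:+.2f}")
```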

    Natural Language Processing: Emerging Neural Approaches and Applications

    This Special Issue highlights the most recent research being carried out in the NLP field and discusses related open issues, with a particular focus both on emerging approaches for language learning, understanding, production, and grounding, interactively or autonomously from data, in cognitive and neural systems, and on their potential or real applications in different domains.

    A comparison of features for large population speaker identification

    Speech recognition systems all have one criterion in common: they perform better in a controlled environment using clean speech. Though performance can be excellent, even exceeding human capabilities for clean speech, systems fail when presented with speech data from more realistic environments such as telephone channels. The differences when using a recognizer in clean and noisy environments are extreme, and this is one of the major obstacles to producing commercial recognition systems for use in normal environments. It is this lack of performance of speaker recognition systems over telephone channels that this work addresses. The human auditory system is a speech recognizer with excellent performance, especially in noisy environments. Since humans are better at ignoring noise than any machine, auditory-based methods are promising approaches, as they attempt to model the working of the human auditory system. These methods have been shown to outperform more conventional signal processing schemes for speech recognition, speech coding, word recognition and phone classification tasks. Since speaker identification has received a lot of attention in speech processing because of the real-world applications awaiting it, it is attractive to evaluate its performance using auditory models as features. Firstly, this study aims at improving the results for speaker identification. The improvements were made through the use of parameterized feature-sets together with the application of cepstral mean removal for channel equalization. The study is further extended to compare an auditory-based model, the Ensemble Interval Histogram (EIH), with mel-scale features, which were shown to perform almost error-free on clean speech. The previous studies showing the EIH to be more robust to noise were conducted on speaker-dependent, small-population, isolated-word tasks and are now extended to speaker-independent, larger-population, continuous speech. This study investigates whether the EIH representation is more resistant to telephone noise than the mel-cepstrum, as was shown in the previous studies, when, for the first time, it is applied to the speaker identification task using a state-of-the-art Gaussian mixture model system.
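    A minimal sketch of the Gaussian mixture model speaker-identification setup with cepstral mean removal as the channel equalization step. The feature frames below are random stand-ins for real cepstral (MFCC or EIH) vectors, and the speaker labels and mixture settings are assumptions for illustration only.

```python
# Sketch: per-speaker GMMs over channel-normalized cepstral features.
import numpy as np
from sklearn.mixture import GaussianMixture

def cepstral_mean_removal(frames):
    """Subtract the per-utterance mean of each cepstral coefficient, which
    cancels stationary (e.g. telephone channel) convolutional effects."""
    return frames - frames.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
n_dim = 13                                    # e.g. 13 cepstral coefficients

# Enrollment: one GMM per speaker, trained on normalized placeholder frames.
models = {}
for speaker in ["spk_a", "spk_b"]:
    frames = rng.normal(loc=rng.normal(size=n_dim), size=(500, n_dim))
    models[speaker] = GaussianMixture(
        n_components=8, covariance_type="diag", random_state=0
    ).fit(cepstral_mean_removal(frames))

# Identification: pick the model with the highest average log-likelihood.
test = cepstral_mean_removal(rng.normal(size=(200, n_dim)))
scores = {spk: gmm.score(test) for spk, gmm in models.items()}
print(max(scores, key=scores.get))
```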

    Term-driven E-Commerce

    This thesis addresses the textual dimension of e-commerce. Its fundamental hypothesis is that information and transaction in electronic commerce are bound to text: wherever products and services are offered, sought, perceived, and evaluated, natural-language expressions are used. From this it follows, on the one hand, how important it is to capture the variance of textual descriptions in e-commerce; on the other hand, the extensive textual resources that accumulate in e-commerce interactions can be drawn upon to gain a better understanding of natural language.

    Nodalida 2005 - proceedings of the 15th NODALIDA conference


    Accessing spoken interaction through dialogue processing [online]

    Written language is one of our primary means for documenting our lives, achievements, and environment. Our capabilities to record, store and retrieve audio, still pictures, and video are undergoing a revolution and may support, supplement or even replace written documentation. This technology enables us to record information that would otherwise be lost, to lower the cost of documentation, and to enhance high-quality documents with original audiovisual material. The indexing of the audio material is the key technology to realize those benefits. This work presents effective alternatives to keyword-based indices which restrict the search space and may in part be calculated with very limited resources. Indexing speech documents can be done at various levels. Stylistically, a document belongs to a certain database, which can be determined automatically with high accuracy using very simple features; the resulting factor in search space reduction is in the order of 4-10, while topic classification yielded a factor of 18 in a news domain. Since documents can be very long they need to be segmented into topical regions. A new probabilistic segmentation framework as well as new features (speaker initiative and style) prove to be very effective compared to traditional keyword-based methods. At the topical segment level, activities (storytelling, discussing, planning, ...) can be detected using a machine learning approach with limited accuracy; however, even human annotators do not annotate them very reliably. A maximum search space reduction factor of 6 is theoretically possible on the databases used. A topical classification of these regions has been attempted on one database; the detection accuracy for that index, however, was very low. At the utterance level, dialogue acts such as statements, questions, backchannels (aha, yeah, ...), etc. are recognized using a novel discriminatively trained HMM procedure. The procedure can be extended to recognize short sequences such as question/answer pairs, so-called dialogue games. Dialogue acts and games are useful for building classifiers for speaking style. Similarly, a user may remember a certain dialogue act sequence and may search for it in a graphical representation. In a study with very pessimistic assumptions, users were able to pick one out of four similar and equiprobable meetings correctly with an accuracy of ~43% using graphical activity information. Dialogue acts may be useful in this situation as well, but the sample size did not allow final conclusions to be drawn. However, the user study failed to show any effect for detailed basic features such as formality or speaker identity.
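    As an illustration of utterance-level dialogue act tagging as sequence decoding, the sketch below runs Viterbi over invented act-to-act transition probabilities and per-utterance observation scores. It is a plain generative HMM illustration, not the thesis's discriminatively trained procedure, and all probabilities and utterances are made up.

```python
# Sketch: Viterbi decoding of a dialogue act sequence (toy probabilities).
import numpy as np

acts = ["statement", "question", "backchannel"]

# Invented transition matrix: e.g. questions tend to be followed by statements.
trans = np.array([[0.6, 0.2, 0.2],
                  [0.7, 0.1, 0.2],
                  [0.5, 0.3, 0.2]])
start = np.array([0.6, 0.3, 0.1])

# Per-utterance observation likelihoods P(utterance | act), e.g. from a simple
# lexical classifier ("?", "yeah", ...). Rows are utterances in order.
obs = np.array([[0.7, 0.2, 0.1],    # "we should finish the report"
                [0.1, 0.8, 0.1],    # "can you send it today?"
                [0.2, 0.1, 0.7]])   # "yeah"

def viterbi(start, trans, obs):
    n_utts, n_acts = obs.shape
    score = np.log(start) + np.log(obs[0])
    back = np.zeros((n_utts, n_acts), dtype=int)
    for t in range(1, n_utts):
        cand = score[:, None] + np.log(trans) + np.log(obs[t])[None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n_utts - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [acts[i] for i in reversed(path)]

print(viterbi(start, trans, obs))   # expected: statement, question, backchannel
```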