Al Jazeera In 1000 Academic Studies

178.
Name: Ahmed M.A. Ali
Title: Multi-dialect Arabic Broadcast Speech Recognition (Al Jazeera as a Source)
Institution: University of Edinburgh
Country: United Kingdom
Date: 2018
Language: English
Abstract: Dialectal Arabic speech research suffers from a lack of labelled resources and standardized orthography. This thesis makes three contributions. 1) Arabic Dialect Identification: We deal mainly with Arabic speech for which the spoken dialect is not known in advance. This part has two contributions. First, we use crowdsourcing to annotate a multi-dialectal speech corpus collected from the Al Jazeera TV channel. From almost 1,000 hours of audio, we obtained utterance-level dialect labels for 57 hours covering four major varieties of dialectal Arabic: Egyptian, Levantine, Gulf (Arabian Peninsula), and North African (Moroccan). Second, we build an Arabic dialect identification (ADI) system. 2) Arabic Speech Recognition: We build an Arabic automatic speech recognition (ASR) system and create an open research community to advance it. This part aims to create a publicly available framework for Arabic ASR research, to build a robust Arabic ASR system, and to report a competitive word error rate (WER). 3) Evaluation of Dialectal Speech: The third part concerns evaluating dialectal speech that has no orthographic rules. Our methods learn from multiple transcribers and align the speech hypothesis to overcome the non-orthographic aspects.
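
The abstract refers to word error rate (WER) and to evaluating against multiple transcribers. As a rough illustration only, the sketch below computes the standard WER (word-level edit distance divided by the number of reference words) and, as a simple stand-in for multi-transcriber scoring, takes the lowest WER across several reference transcripts. The function names `word_error_rate` and `min_wer_over_references` are hypothetical, and taking the minimum is an assumption made for illustration; it is not the evaluation method developed in the thesis.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def min_wer_over_references(references: Sequence[str], hypothesis: str) -> float:
    """Illustrative assumption: score against the closest of several transcripts."""
    return min(word_error_rate(ref, hypothesis) for ref in references)
```

With a single reference, a hypothesis that substitutes one word and drops another out of four reference words scores a WER of 0.5. With several references, the same hypothesis is scored against whichever transcript it matches most closely, which is one simple way to tolerate spelling variation between transcribers.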

