137.
Name: Motaz Saad
Title: Mining Documents and Sentiments in Cross-lingual Context (Sources: Wikipedia, Euronews, BBC and Al Jazeera)
Institution: Université de Lorraine
Country: France
Date: 2015
Language: French
Abstract: The aim of this study is to investigate sentiments in comparable documents. The data used in this research was collected in two stages. First, the researcher collected English, French and Arabic comparable corpora from the Wikipedia and Euronews websites and aligned each corpus from these two sources at the document level. He then gathered English and Arabic news documents from local and foreign news agencies: the English documents were collected from the BBC website and the Arabic documents from the Al Jazeera Arabic website. At the analysis level, the study presented cross-lingual document similarity measures to automatically retrieve and align comparable documents. The researcher then employed a cross-lingual sentiment annotation methodology to label the source and target documents with the identified sentiments. Finally, the study used statistical measures to compare the agreement of sentiments between the source and target documents in each comparable pair. The methods used to analyze documents in this thesis are language-independent: although the focus was on English, French and Arabic, they can be applied to any language pair.