Publications by Soha Sultan
2011
Systems Group Master's Thesis, no. 11; Department of Computer Science, May 2011
Supervised by: Prof. Donald Kossmann
We introduce two approaches to augmenting English-Arabic statistical machine translation
(SMT) with linguistic knowledge. The first approach improves SMT by adding linguistically
motivated syntactic features to particular phrases. These added features are based on the English
syntactic information, namely part-of-speech tags and dependency parse trees. We achieved
improvements of 0.2 and 0.6 in BLEU score on two different data sets over the state-of-the-art
SMT baseline system. The second approach improves morphological agreement in machine
translation output through post-processing. Our method uses the projection of the English dependency
parse tree onto the Arabic sentence in addition to the Arabic morphological analysis
in order to extract the agreement relations between words in the Arabic sentence. Afterwards,
classifiers for individual morphological features are trained using syntactic and morphological
information from both the source and target languages. The predicted morphological features
are then used to generate the correct surface forms. Our method achieves a statistically significant
improvement over the baseline system according to human evaluation.
@mastersthesis{abc,
  abstract = {We introduce two approaches to augmenting English-Arabic statistical machine translation (SMT) with linguistic knowledge. The first approach improves SMT by adding linguistically motivated syntactic features to particular phrases. These added features are based on the English syntactic information, namely part-of-speech tags and dependency parse trees. We achieved improvements of 0.2 and 0.6 in BLEU score on two different data sets over the state-of-the-art SMT baseline system. The second approach improves morphological agreement in machine translation output through post-processing. Our method uses the projection of the English dependency parse tree onto the Arabic sentence in addition to the Arabic morphological analysis in order to extract the agreement relations between words in the Arabic sentence. Afterwards, classifiers for individual morphological features are trained using syntactic and morphological information from both the source and target languages. The predicted morphological features are then used to generate the correct surface forms. Our method achieves a statistically significant improvement over the baseline system according to human evaluation.},
  author = {Soha Sultan},
  school = {Department of Computer Science},
  type = {Systems Group Master's Thesis},
  note = {No. 11},
  month = may,
  title = {Applying Morphology to English-Arabic Statistical Machine Translation},
  year = {2011}
}