03 - Translation Alignment and Machine Learning for Classical Languages


This presentation will focus on assisted Translation Alignment for the study of Classical and historical languages. Translation Alignment is an important task in Natural Language Processing (Kay and Roscheisen 1993), and aligned corpora are a crucial resource not only for training computational models (Dou and Neubig 2021) but also for language learning (Palladino et al. 2021), translation studies (Véronis 2000; Pataridze-Kindt 2018), and automatic bilingual lexicon extraction (Yousef 2019). Ancient languages, however, pose their own challenges when it comes to establishing translation equivalents and training language models.

We will introduce the Ugarit Translation Alignment Editor, a web environment that facilitates the manual creation of parallel texts aligned at the word and sentence level. The project currently hosts 45 languages, including Ancient Greek, Latin, Persian, Arabic, Egyptian, Akkadian, Hebrew, Hittite, and Chinese, and has more than 500 users. The underlying database collects all translation pairs into a dynamic lexicon available to every user: searching for a term in the interface retrieves all the words aligned with it, together with their original context.
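The abstract does not specify how Ugarit stores or queries its lexicon. As a hedged illustration only, the idea of turning aligned word pairs into a searchable lexicon that returns each match with its context can be sketched as an in-memory index. The class and field names (`AlignedPair`, `AlignmentLexicon`) and the language codes are hypothetical; a real system would sit on a database rather than a Python dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One manually aligned word pair (hypothetical schema, not Ugarit's)."""
    source_word: str
    target_word: str
    source_lang: str
    target_lang: str
    context: str  # sentence in which the source word was aligned


class AlignmentLexicon:
    """Hypothetical in-memory lexicon built from aligned word pairs."""

    def __init__(self):
        # lowercased word form -> list of pairs in which that form occurs
        self._index = defaultdict(list)

    def add(self, pair):
        # Index under both sides so a search works from either language;
        # the set avoids indexing a pair twice when both forms coincide.
        keys = {pair.source_word.lower(), pair.target_word.lower()}
        for key in keys:
            self._index[key].append(pair)

    def search(self, term):
        """Return every aligned pair containing `term`, with its context."""
        # .get() does not insert missing keys into the defaultdict
        return list(self._index.get(term.lower(), []))


lexicon = AlignmentLexicon()
john_1_1 = "ἐν ἀρχῇ ἦν ὁ λόγος"
lexicon.add(AlignedPair("λόγος", "Word", "grc", "en", john_1_1))
lexicon.add(AlignedPair("λόγος", "Verbum", "grc", "la", john_1_1))

for hit in lexicon.search("λόγος"):
    print(hit.target_lang, hit.target_word, "|", hit.context)
```

Indexing both sides of each pair is what lets a user start from either the ancient word or one of its translations and still reach the original context.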

Ugarit has expanded the use of Translation Alignment to communities previously underrepresented in this field, and it is currently used worldwide for language teaching, research, translation studies, data visualization, and cultural and lexical studies (Palladino and Yousef 2023; Shamsian 2022; Crane 2019; Shukhoshvili 2017). During the presentation, we will illustrate the tool's main functions and discuss some current uses of parallel corpora for the study of Classical texts. Finally, we will describe how the aligned corpora in Ugarit are being used to train a contextualized multilingual model that performs Translation Alignment of Ancient Greek texts automatically. In conclusion, we will address current challenges and outline future plans.
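The abstract does not describe how the trained model turns its output into word alignments. As a hedged illustration, one common strategy for embedding-based aligners is to compare the contextual vectors of every source and target token and keep only pairs that are each other's best match (the intersection of the two argmax directions). The sketch below is not presented as the Ugarit model's method: the function names are hypothetical, and the made-up 3-dimensional vectors stand in for real model embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutual_argmax_align(src_vecs, tgt_vecs):
    """Return (i, j) pairs where source token i and target token j are
    each other's most similar token (hypothetical helper)."""
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    # best target index for each source token
    best_tgt = [max(range(len(tgt_vecs)), key=lambda j: row[j]) for row in sim]
    # best source index for each target token
    best_src = [max(range(len(src_vecs)), key=lambda i: sim[i][j])
                for j in range(len(tgt_vecs))]
    # keep only reciprocal best matches
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


# Toy "contextual embeddings" (invented numbers, for illustration only)
src_tokens = ["ἦν", "ὁ", "λόγος"]
src_vecs = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.1], [0.0, 1.0, 0.0]]
tgt_tokens = ["the", "Word", "was"]
tgt_vecs = [[0.9, 0.0, 0.0], [0.1, 1.0, 0.0], [0.0, 0.2, 1.0]]

for i, j in mutual_argmax_align(src_vecs, tgt_vecs):
    print(src_tokens[i], "->", tgt_tokens[j])
# ἦν -> was
# ὁ -> the
# λόγος -> Word
```

Requiring a mutual best match favours precision: a token without a reciprocal partner is left unaligned rather than forced onto a weak match.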


Presenters

Chiara Palladino, Furman University; Anna Muh, University of Washington



  SCS-18