Exercise 2 - OmegaT and IBM Model 1

Second part: do a small translation job using OmegaT
Third part: do a basic exercise and discuss some basic questions about Model 1 (and, optionally, some harder questions)

In this part we will again look at Google Translate for the sentences you corrected in exercise 1.
Take the 5 sentences for which you got bad output from Google Translate.
Do you get the same output as before? Or were your corrections partially or fully adopted?

Create a new project (see the "instant start" guide to OmegaT, Chapter 2 of the manual; you can find a direct link in Google) and call the project "mytest" (without quotes). Make the source language EN-US or EN-GB (depending on whether you prefer to write in American or British English). Make a note of where the project was created (the path on disk).
Go to the main directory of the project, then the source subdirectory of the project, and create a text file called "text1.txt" containing 5 sentences in English (you could use the ones from the Google Translate exercise if you have them). Make sure to use proper punctuation; OmegaT knows how to segment English sentences.
IMPORTANT: look at the mytest-omegat.tmx file located in the main project directory and discuss its contents. What is this file for? How should you modify it if you switch language directions (translating German to English)?
How much support for segmenting and fuzzy matching is there in German or other languages that interest you (see the OmegaT manual)? Compare this with support for segmentation and fuzzy matching in English.

Start by convincing yourself that the incredibly simple estimation you do by running the main loop of the pseudo-code once gives the same results as explicitly enumerating the alignments in slide 41 (the slide where we calculated counts by working on four alignment functions by explicitly enumerating each one). You have to start with the t values on slide 41 to do this, and you apply them to just the pair of two-word sentences on slide 41.
What is the alignment structure modeled by IBM Model 1 in the pseudo-code presented above? Is the structure symmetric with respect to English and Foreign?
How many entries does t(e|f) have after the initialization (line 1 of the pseudo-code)? Remember that if N is the number of English types, then t(e|f)=1/N for all e and f.
Can you think of a way to initialize that would involve setting some of the parameters in t(e|f) to zero or any other constant without affecting the results? Think about whether any of the entries in t will not be used.
Under what conditions will an English word e in a particular sentence pair be left unaligned in the Viterbi alignment? What about a French word f?

Advanced Questions about Model 1 (Optional)

Under what circumstances would we prefer that an English word e is unaligned (note that this question is about gold standard word alignment, not modeling)?
Suppose you are given Model 1 parameters estimated by someone else. What is a short formula which determines the Viterbi alignment of a fixed sentence pair E and F?
How could we force cognates (for a language pair like French/English) to be aligned correctly? (Warning, this is a trick question.)
Is there some simple way (either heuristically or by modifying the model; either one is fine) where we could break the independence assumption in Model 1 and allow the alignment of a word at position j to be influenced by the word at. Look at the "grow" heuristic in the slides.
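
The basic Model 1 exercise asks you to check by hand that one pass of the main loop gives the same counts as explicitly enumerating the four alignment functions of a pair of two-word sentences. If you want to verify your arithmetic, the following Python sketch does both computations. Everything in it is an assumption for illustration: the sentence pair, the t values (these are made-up numbers, NOT the values from slide 41) and the function names. It also assumes there is no NULL word, so two-word sentences have exactly four alignment functions. Replace the t values with the ones from the slide before comparing with your own results.

```python
from collections import defaultdict
from itertools import product


def counts_by_enumeration(e_sent, f_sent, t):
    """Expected counts obtained by listing every alignment function."""
    # An alignment function maps each English position to one foreign position.
    alignments = list(product(range(len(f_sent)), repeat=len(e_sent)))
    # Score of an alignment: product of t(e_j | f_a(j)) over English positions.
    scores = []
    for a in alignments:
        p = 1.0
        for j, i in enumerate(a):
            p *= t[(e_sent[j], f_sent[i])]
        scores.append(p)
    total = sum(scores)
    counts = defaultdict(float)
    for a, p in zip(alignments, scores):
        # Each link in the alignment collects the alignment's posterior p / total.
        for j, i in enumerate(a):
            counts[(e_sent[j], f_sent[i])] += p / total
    return dict(counts)


def counts_by_main_loop(e_sent, f_sent, t):
    """Expected counts from the factored update, one English word at a time."""
    counts = defaultdict(float)
    for e in e_sent:
        # Normalize over the foreign words in this sentence only.
        norm = sum(t[(e, f)] for f in f_sent)
        for f in f_sent:
            counts[(e, f)] += t[(e, f)] / norm
    return dict(counts)


# Hypothetical sentence pair and hypothetical t(e|f) values, keyed as (e, f).
E = ["the", "house"]
F = ["das", "Haus"]
T = {
    ("the", "das"): 0.7, ("house", "das"): 0.3,
    ("the", "Haus"): 0.4, ("house", "Haus"): 0.6,
}

if __name__ == "__main__":
    enum_counts = counts_by_enumeration(E, F, T)
    loop_counts = counts_by_main_loop(E, F, T)
    for pair in sorted(enum_counts):
        print(pair, round(enum_counts[pair], 4), round(loop_counts[pair], 4))
```

The two columns printed for each word pair should agree. The enumeration grows exponentially with sentence length, while the main loop only needs one sum per English word, which is why the pseudo-code uses the second form.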