Problems with project files and segmentation
Thread poster: Viivi
Viivi
Finland
Local time: 20:14
Aug 17, 2010

Hello again!

I am a beginner in terms of OmegaT, and I apologise for my lack of tech-savviness. Anyway, I have now managed to get the glossaries to work and did a short test translation, which worked just fine.

Now I have a doc-file with lots of text boxes, pictures and whatnot. I converted it to an odt-file in Open Office Writer, created a new project in OmegaT and this is roughly what I get:

< f0 >U< /f0 >< f1 >n< /f1>< f2>a< /f2>< f3>u< /f3>< f4>t< /f4>< f5>h< /f5>< f6>o< /f6>< f7>r< /f7>< f8>i< /f8>< f9>z< /f9>< f10>e< /f10>< f11>d < /f11>< f12>c< /f12>< f13>ha< /f13>< f14>ng< /f14>< f15>e< /f15>< f16>s < /f16>< f17>o< /f17>< f18>r < /f18>< f19>mo< /f19>< f20>d< /f20>< f21>ifi< /f21>< f22>c< /f22>< f23>a< f24>ti< /f24>< f25>o< /f25>< f26>n < /f26>< f27>t< /f27>< f28>o < /f28>< f29>t< /f29>< f30>h< /f30>< f31>i< /f31>< f32>s < /f32>< f33>s< /f33>< f34>ys< /f34>< f35>t< /f35> ...

(I had to add some spaces after the < so you can see what it looks like to me.)

That particular segment/sentence is supposed to be about unauthorized changes etc. It is obvious I cannot translate like this.

Does anybody have any ideas what is wrong with my documents/project?

I am using OmegaT 2.1.7_1. The source language is English (US) and the target language is Finnish. I have four doc-files, which I have converted into odt-files. I have not yet tried to add glossaries or translation memories to the project. I simply added the project files. I am assuming the source files are too fancy somehow...? I suspect they might even have originally been something other than Word documents.

Is there any way to translate these with the help of OmegaT?


 
esperantisto
Local time: 20:14
Member (2006)
English to Russian
Nothing wrong with OmegaT, actually
Aug 17, 2010

Is your file really .doc? The text looks typical of .docx (MS Word 2007), which is a really lousy format. If you can obtain the original .docx file, do so and try translating it with the latest build of OmegaT (1.8.0). Otherwise, if you have Microsoft Office, try exporting the file to RTF and converting it back to .doc, then to ODT.

Also search the OmegaT Yahoo! group; the topic of tag reduction has been discussed there.


 
Susan Welsh
United States
Local time: 13:14
Russian to English
A couple of notes
Aug 17, 2010

These things are known in the trade as "tag soup." Apart from what esperantisto wrote, let me add that you also get a lot of this junk if a document has been converted from a PDF--especially when there are lots of graphic elements, text boxes, etc., which yours has. Note that esperantisto was referring to build 2.1.8.0--he left out the 2, which might confuse you.

If your document is not a .docx, but rather a conversion from PDF, and esperantisto's instructions don't help, you should ask the client for the original file. (This is one reason that people charge extra for translating from PDFs.) Even though I have ABBYY PDF Converter, which does a pretty good job, I have found that converting PDFs is no good for translating in a CAT tool, because of the tag soup. I usually strip it down to "text" (via Adobe Reader), translate it, and then reformat it. But something with as many graphics as yours has would be quite time consuming for me, given my level of expertise with Word/OOo Writer.
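
To see what text is hiding inside a tag-soup segment, something like the following Python sketch might help. It is only an illustration and not part of OmegaT: it assumes the tags look like the `<f0>` ... `</f0>` pairs in the sample from the first post (without the spaces that were added there for display), and the function names are made up for this example.

```python
import re

# Assumption: inline tags have the form <fN> or </fN>, as in the sample
# segment posted above. Other tag shapes are not handled here.
TAG_PATTERN = re.compile(r"</?f\d+>")


def strip_inline_tags(segment: str) -> str:
    """Return the segment text with every <fN> and </fN> tag removed."""
    return TAG_PATTERN.sub("", segment)


def count_inline_tags(segment: str) -> int:
    """Count the opening and closing <fN> tags found in the segment."""
    return len(TAG_PATTERN.findall(segment))


# A short stretch of tag soup: one formatting tag pair per letter.
sample = "<f0>U</f0><f1>n</f1><f2>a</f2><f3>u</f3><f4>t</f4><f5>h</f5>"

print(strip_inline_tags(sample))   # Unauth
print(count_inline_tags(sample))   # 12
```

This only recovers the readable text and shows how many tags surround it; the formatting is lost, so the translation would still have to be reformatted by hand afterwards.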

good luck


 
esperantisto
Local time: 20:14
Member (2006)
English to Russian
The last resort
Aug 17, 2010

If no fancy formatting is required, reset everything to the default formatting of the style (select the text and press Ctrl+M in OOo Writer).
