Patents in pdf images: how to convert them for use with CAT tools? Thread poster: Silvia Barra (X)
| Silvia Barra (X) Italy Local time: 14:04 English to Italian + ...
Hi to all, I've asked many colleagues this question but never received a satisfactory answer: no one knew of a solution. Maybe there is some news... I often translate patents, which I receive as image PDFs (often from a fax). I'd like to translate them using Trados or similar programs, but I don't know whether they can be converted into a usable form, or how to do that. Some documents can be extracted quite successfully by OCR, but the vast majority cannot. Programs like Infix that allow editing of PDF files are not useful with image PDFs. Do you have any trick, program, or solution for this problem? Thanks in advance, Silvia | | |
I am afraid that in the case of scanned images the only way is OCR. Of course the quality of the result depends on the quality of the input image. | | | Erik Freitag Germany Local time: 14:04 Member (2006) Dutch to German + ...
Dear Silvia, the obvious solution used by many translators is of course OCR. You say that the "vast majority" of your texts can't be OCRed. Why is that? If OCR doesn't help, then I can't think of another technical solution. Maybe outsourcing to a typist is a valid option for you? Regards, Erik PS: I have had very good experiences with ABBYY FineReader. OCR of patent texts received by fax: no problem.
[Edited at 2009-01-07 14:23 GMT] | | |
Silvia Barra wrote: Hi to all, I've asked many colleagues this question but never received a satisfactory answer: no one knew of a solution. Maybe there is some news... I often translate patents, which I receive as image PDFs (often from a fax). I'd like to translate them using Trados or similar programs, but I don't know whether they can be converted into a usable form, or how to do that. Some documents can be extracted quite successfully by OCR, but the vast majority cannot. Programs like Infix that allow editing of PDF files are not useful with image PDFs. Do you have any trick, program, or solution for this problem? Silvia Dear Silvia, for the past 8 years I have been using ABBYY FineReader Professional as my main OCR program, and I have been and still am very pleased with it. I started with version 5 and am now using version 9, and it has kept improving over time. FineReader can convert PDFs into editable formats (Word, Excel, RTF, or TXT files), so it has been my main tool for turning PDFs into something directly translatable. However, six weeks ago I came across another program, Able2Extract, which works better for me than FineReader, at least for some PDFs. For instance, FineReader can't or won't read protected PDFs, whereas Able2Extract can. Able2Extract is also more accurate in preserving the layout of the original PDF (together with background images). Accuracy in text recognition is FineReader's strongest point over Able2Extract; otherwise the latter would have replaced it as my main PDF converter. If both programs fail to turn the PDF into something editable, I would use this workaround (being aware that it is rather cumbersome): print out the PDF, then scan the printed text with FineReader to turn it into editable text. I hope this answers your questions, and I wish you the best of luck for 2009.
Bogdan
[Edited at 2009-01-07 14:23 GMT] | |
Heinrich Pesch Finland Local time: 15:04 Member (2003) Finnish to German + ... I use ABBYY FineReader | Jan 7, 2009 |
I scan page by page, selecting only the text, without the line numbers. After scanning I remove the line breaks. If you are lucky, the OCR will produce a file that needs no changes prior to translation. Often, though, the quality of the PDF is very poor (consisting of images from faxed documents). Then it is better to translate without a CAT tool, just typing the translation into a new document. Don't forget that the PDF is the original document, so check all numbers especially carefully before delivery to the customer. And don't forget to charge for the time you spend on conversion. Regards, Heinrich | | | Why not get the source text directly? | Jan 7, 2009 |
If the patents originate from the US, you could get the text directly from the US Patent and Trademark Office: http://www.uspto.gov/main/patents.htm Of course, this approach won't work if the patent has only just been applied for and has therefore not yet been published. | | | Peter Manda (X) Local time: 08:04 German to English + ...
At least for a more recent patent, if the patent is a US patent and it has been filed with the USPTO, you can obtain the text of the patent from the website. I suspect that the same may be true of patents filed with the WPTO and at the patent office in Munich (and probably other patent agencies). I would suggest that if you really want to get hold of a non-PDF copy, you (a) check with the client; and then (b) check with the issuing office. There may be a fee for obtaining the non-PDF copy, and I think that fee is certainly something your agency or client should cover. | | | Heinrich Pesch Finland Local time: 15:04 Member (2003) Finnish to German + ...
Peter Manda wrote: At least for a more recent patent, if the patent is a US patent and it has been filed with the USPTO, you can obtain the text of the patent from the website. I suspect that the same may be true of patents filed with the WPTO and at the patent office in Munich (and probably other patent agencies). I would suggest that if you really want to get hold of a non-PDF copy, you (a) check with the client; and then (b) check with the issuing office. There may be a fee for obtaining the non-PDF copy, and I think that fee is certainly something your agency or client should cover. I once used a copy of a patent from the patent office site instead of the faxed patent from the agency, but found out too late that the faxed version contained changes and additions by the author. Bad mistake! | |
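For anyone comfortable with a little scripting, the two points Heinrich raises in this thread (removing line numbers and line breaks after OCR, and watching for differences between the patent office text and the faxed copy) can be sketched in Python. This is only a sketch under assumptions, not a tested workflow: the function names `clean_ocr_text` and `changed_sentences` are made up for illustration, the margin line numbers are assumed to be multiples of five at the start of a line, and sentences are split naively at full stops and semicolons. It uses only the standard library and does not replace a careful read of the faxed original.

```python
import difflib
import re

# Assumption: margin line numbers are multiples of five (5, 10, 15, ...) at the
# start of a line. Reference numerals such as "12" are left alone, but a numeral
# like "10" at the start of a line WOULD be removed, so check the result.
MARGIN_NUMBER = re.compile(r"^\s*\d{0,2}[05](?:\s+|$)")


def clean_ocr_text(raw: str) -> str:
    """Drop margin line numbers and join hard line breaks within paragraphs."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", raw.strip()):  # blank line = paragraph break
        lines = (MARGIN_NUMBER.sub("", line).strip() for line in block.splitlines())
        paragraphs.append(" ".join(line for line in lines if line))
    return "\n\n".join(paragraphs)


def changed_sentences(office_text: str, faxed_text: str) -> list[str]:
    """Return sentences present in only one version, prefixed '- ' or '+ '."""
    def sentences(text: str) -> list[str]:
        # Naive split after a full stop or semicolon; abbreviations will also split.
        return re.split(r"(?<=[.;])\s+", text.strip())

    diff = difflib.ndiff(sentences(office_text), sentences(faxed_text))
    return [entry for entry in diff if entry.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Made-up sample text, standing in for OCR output and a patent office copy.
    ocr = ("5 A valve (12) is connected\nto the housing by a seal.\n\n"
           "10 The seal is made of silicone rubber.")
    office = ("A valve (12) is connected to the housing by a seal.\n\n"
              "The seal is made of rubber.")
    for entry in changed_sentences(office, clean_ocr_text(ocr)):
        print(entry)
```

Lines prefixed with `- ` appear only in the patent office text and lines prefixed with `+ ` only in the faxed version, so every reported line is a passage to check by eye against the fax before translating from the office copy.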
Silvia Barra (X) Italy Local time: 14:04 English to Italian + ... TOPIC STARTER Did not think about it! | Jan 8, 2009 |
Thank you: I had not thought of the patent source! I tried with the patent I'm currently translating and found it, so I can use my CAT tool (it's faster). I agree with Heinrich: in fact this very patent has some pages modified with respect to the version on the website, but I am carefully checking the two versions against each other, so no problem. As for OCR, I'll look into ABBYY FineReader. Until now I have used other OCR programs, but some documents I received were very difficult to read even for human eyes, so with OCR it was a disaster. Anyway, thank you for your always precious suggestions and experience. Good night, Silvia | | | Silvia Barra (X) Italy Local time: 14:04 English to Italian + ... TOPIC STARTER
Silvia Barra wrote: As for OCR, I'll look into ABBYY FineReader. I've tried ABBYY FineReader: it's great software indeed! It recognised almost all of an image PDF that had poor resolution and that other OCR programs handled badly. Thank you for the suggestion! Have a nice day (and a nice weekend)! Silvia | | | Silvia Barra (X) Italy Local time: 14:04 English to Italian + ... TOPIC STARTER Also for French documents? | Feb 2, 2009 |
Silvia Barra wrote: I've tried ABBYY FineReader: it's great software indeed! It recognised almost all of an image PDF that had poor resolution and that other OCR programs handled badly. Thank you for the suggestion! Have a nice day (and a nice weekend)! Silvia Unfortunately the trial period expired before I could test how the software handles French. Does anyone use it for that language? Thanks, Silvia