Deleting spaces in Chinese Word documents
Thread poster: Iris Kleinophorst
Iris Kleinophorst
Germany
Chinese to German
Oct 14, 2013

Hi

Does anyone know a tool, function or regex to delete unnecessary spaces between Chinese characters, or between Chinese characters and Arabic numerals/Latin letters, e.g. in scanned PDF files? The document contains numerous other expressions where the spaces have to be kept, so a plain search and replace of spaces in Word does not work.

TIA
Iris


 
lbone
China
Member (2006)
English to Chinese
regex case by case Oct 14, 2013

I think this job needs manual work by a person rather than one or several regular expressions. Chinese characters are not easy to define with a single regular expression (and Microsoft Word does not support standard regular expressions), and sometimes blank spaces are meaningful, as in this title:

  第十四回 林如海捐馆扬州城 贾宝玉路谒北静王

Besides Chinese characters, spaces, digits, English letters and common English punctuation marks, there are also Korean and Japanese characters and non-standard or double-byte symbols. You will need to assess and handle the spaces in each of these cases separately.
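
For anyone who still wants to try the regex route outside Word, here is a minimal Python sketch, not a definitive solution. It assumes the text has already been exported to plain text, and it assumes that "Chinese character" means only the basic CJK Unified Ideographs block plus Extension A. The name `strip_cjk_spaces` and the choice of character ranges are my own for illustration. Kana, hangul, full-width symbols and punctuation are deliberately left alone.

```python
import re

# Assumption: only these two Unicode blocks count as "Chinese characters".
# CJK Extension A (U+3400-U+4DBF) and CJK Unified Ideographs (U+4E00-U+9FFF).
HAN = r"\u3400-\u4dbf\u4e00-\u9fff"
# Assumption: "Arabic numbers/Latin letters" means plain ASCII digits and letters.
ALNUM = r"0-9A-Za-z"
# ASCII space, no-break space and ideographic (full-width) space.
SPACES = r"[ \u00a0\u3000]+"

# Spaces that sit between two Han characters.
HAN_HAN = re.compile(rf"(?<=[{HAN}]){SPACES}(?=[{HAN}])")
# Spaces between a Han character and an ASCII digit/letter, in either order.
HAN_ALNUM = re.compile(
    rf"(?<=[{HAN}]){SPACES}(?=[{ALNUM}])"
    rf"|(?<=[{ALNUM}]){SPACES}(?=[{HAN}])"
)


def strip_cjk_spaces(text: str) -> str:
    """Remove spaces next to Han characters; keep spaces between Latin words."""
    text = HAN_HAN.sub("", text)
    return HAN_ALNUM.sub("", text)


if __name__ == "__main__":
    print(strip_cjk_spaces("共 3 页, see Word 2010"))
    # The chapter title quoted above loses its meaningful spaces as well:
    print(strip_cjk_spaces("第十四回 林如海捐馆扬州城 贾宝玉路谒北静王"))
```

The second call shows the limitation described above: the pattern cannot tell a stray OCR space from an intentional one, so titles like this one still have to be checked by hand.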



 
Lawrence Lam
China
English to Chinese
wildcards Nov 2, 2013



Finding and replacing characters using wildcards.


 

