I think this is a job that needs a person's time rather than one or several expressions. Chinese characters are not easy to define with a simple regular expression (and Microsoft Word does not support standard regular expressions), and sometimes the blank spaces are meaningful, as in this title:
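第十四回 林如海捐馆扬州城 贾宝玉路谒北静王 ("Chapter 14: Lin Ruhai dies in Yangzhou; Jia Baoyu pays his respects to the Prince of Beijing on the road")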
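A minimal Python sketch shows the problem. It assumes that "Chinese character" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and `naive_collapse` is a hypothetical helper, not an existing tool. The one-line rule that repairs an OCR-spaced line also deletes the meaningful spaces in the title:

```python
import re

# Assumption: "Chinese character" = basic CJK Unified Ideographs (U+4E00 to U+9FFF).
HAN = "[\u4e00-\u9fff]"

# Naive rule: delete any run of ASCII spaces that has a Han character on both sides.
NAIVE = re.compile(f"(?<={HAN}) +(?={HAN})")


def naive_collapse(text: str) -> str:
    return NAIVE.sub("", text)


ocr_line = "林 如 海 捐 馆"
title = "第十四回 林如海捐馆扬州城 贾宝玉路谒北静王"
print(naive_collapse(ocr_line))  # 林如海捐馆 (wanted)
print(naive_collapse(title))     # 第十四回林如海捐馆扬州城贾宝玉路谒北静王 (meaningful spaces lost)
```

The expression cannot tell letter-spacing noise from a deliberate separator, because both look like "Han, space, Han".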
Besides Chinese characters, spaces, digits, English letters and common English punctuation marks, a document may also contain Korean/Japanese characters and non-standard or double-byte symbols. Spaces next to those have to be judged and handled separately.
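As a rough illustration of how far automation can go before a person has to step in, here is a hypothetical Python pre-pass. It assumes (1) "Chinese character" means the basic CJK Unified Ideographs block, (2) OCR letter-spacing shows up as three or more single Han characters separated by single ASCII spaces, and (3) a single space between a Han character and an ASCII digit or letter is always unwanted, as the question below requests. Lines containing kana, Hangul, CJK symbols (including the ideographic space U+3000) or full-width forms are still processed by those rules but are also reported by line number for manual review. The name `prepass` and all three heuristics are illustrative assumptions, not an established tool.

```python
import re

# Assumption: "Chinese character" = basic CJK Unified Ideographs (U+4E00 to U+9FFF).
HAN = "[\u4e00-\u9fff]"

# Hypothetical heuristic: OCR letter-spacing is a chain of at least three single
# Han characters separated by single ASCII spaces ("林 如 海"), with no Han
# character directly before or after the chain. Space-separated multi-character
# runs, as in a chapter title, do not match and are left alone.
SPACED_RUN = re.compile(f"(?<!{HAN}){HAN}(?: {HAN}){{2,}}(?!{HAN})")

# A single ASCII space between a Han character and an ASCII digit/letter.
HAN_ASCII = re.compile(f"(?<={HAN}) (?=[0-9A-Za-z])|(?<=[0-9A-Za-z]) (?={HAN})")

# Characters this sketch does not try to judge: CJK symbols and punctuation
# (including the ideographic space U+3000), kana, Hangul syllables, full-width forms.
NEEDS_REVIEW = re.compile("[\u3000-\u303f\u3040-\u30ff\uac00-\ud7af\uff00-\uffef]")


def prepass(text: str) -> tuple[str, list[int]]:
    """Apply the two rules above; return the new text and the 1-based
    numbers of lines that still need a human decision."""
    out_lines, review = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        line = SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), line)
        line = HAN_ASCII.sub("", line)
        if NEEDS_REVIEW.search(line):
            review.append(number)
        out_lines.append(line)
    return "\n".join(out_lines), review


sample = "第十四回 林如海捐馆扬州城 贾宝玉路谒北静王\n林 如 海 捐 馆 扬 州 城\n共 3 页\nカタカナ の 文"
fixed, review = prepass(sample)
print(fixed)
# 第十四回 林如海捐馆扬州城 贾宝玉路谒北静王
# 林如海捐馆扬州城
# 共3页
# カタカナ の 文
print(review)  # [4]
```

The title survives only because its spaces separate multi-character runs; a title made of single characters would be collapsed, which is exactly the kind of case that still needs a person.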
Iris Kleinophorst wrote:
Hi
does anyone know a tool, function or regex to delete unnecessary spaces between Chinese characters, or between Chinese characters and Arabic numerals/Latin letters, e.g. in text from scanned PDF files? The document contains numerous other expressions where the spaces have to be kept, so a plain search and replace of spaces in Word does not work.
TIA
Iris