UTF8 range for Chinese (Localization)


Thread poster: Samuel Murray

Samuel Murray

Netherlands
Local time: 12:54
Member (2006)
English to Afrikaans

May 4, 2021

Hello everyone

I have a file in which some segments contain Chinese characters. I need to identify these segments, so I'm hoping to search for the specific Unicode characters that are Chinese. Can anyone tell me what the UTF-8 character range for Chinese characters is?

Thanks
Samuel

Added: found it, under "CJK scripts and symbols" here:
https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane

However, I discovered that searching for the presence of all of these characters would be very inefficient, so instead I converted all my source text to one character per line and then removed duplicate lines, to get a list of all characters used in the source text. Then I just deleted non-Chinese characters, and thus had a much smaller list of characters to search (and no need to search hexadecimally either).
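A minimal Python sketch of that "distinct characters" idea, for anyone who wants to try it without a text editor. The sample segments, the function names and the short list of Unicode blocks are all assumptions made for illustration; the manual "delete non-Chinese characters" step is replaced here by a block lookup, which is not what the original script did.

```python
# Sketch of the "distinct characters" approach described above.
# The sample segments and the block list are illustrative assumptions.

# A few Unicode blocks that hold Han characters (not an exhaustive list).
HAN_BLOCKS = [
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]


def is_han(ch: str) -> bool:
    """Return True if ch falls in one of the blocks listed above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HAN_BLOCKS)


def distinct_han_chars(segments: list[str]) -> set[str]:
    """Collect every distinct character in the text, keeping only Han ones."""
    return {ch for seg in segments for ch in seg if is_han(ch)}


def segments_with_han(segments: list[str]) -> list[int]:
    """Return the indexes of segments containing at least one Han character."""
    wanted = distinct_han_chars(segments)
    return [i for i, seg in enumerate(segments) if any(ch in wanted for ch in seg)]


segments = ["Hello world", "你好, world", "Prix: 10 €", "再见"]
print(sorted(distinct_han_chars(segments)))
print(segments_with_han(segments))  # [1, 3]
```

The reduced character list is what makes the later search cheap: only characters that actually occur in the text are ever tested.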

[Edited at 2021-05-04 11:00 GMT]

esperantisto

Local time: 13:54
Member (2006)
English to Russian

SITE LOCALIZER

My range

May 4, 2021

Here is the range that I use (even though you have already found one, maybe it will be handy):

Code:

⺮-𰻞
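As a rough illustration of how such a range might be used, here is a small Python sketch. It assumes the range is meant to sit inside a regex character class, and the function name `has_cjk` is made up for the example. One caveat, as far as I can tell: the span between those two characters is wide and also takes in kana and Hangul, so it flags Japanese and Korean text as well as Chinese.

```python
import re

# Range copied from the post above, wrapped in a character class
# (assumption: that is how it is meant to be used in a regex).
CJK_RANGE = re.compile("[⺮-𰻞]")


def has_cjk(segment: str) -> bool:
    """Return True if the segment contains any character inside the range."""
    return CJK_RANGE.search(segment) is not None


print(has_cjk("Hello world"))  # False
print(has_cjk("你好, world"))   # True
```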

Samuel Murray

Netherlands
Local time: 12:54
Member (2006)
English to Afrikaans

TOPIC STARTER

@Esperantisto

May 4, 2021

esperantisto wrote:
Here is the range that I use (even though you have already found, maybe, it will be handy)...

Thanks, I'll give that a try as well (then I can use regex).

As it happens, my source text contained only about 1000 distinct Chinese characters, so testing for each of them one by one across 2000 segments was doable and took only about 20 seconds (not including the time it took to script it in AutoIt, of course). I'm curious whether a regex approach would be quicker (not counting preprocessing time).
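One way to check this is to time both approaches on the same data. The sketch below is in Python rather than AutoIt, and the segments are invented, so the numbers it prints say nothing about the original script. It only shows how a character-by-character test and a single compiled pattern could be compared, and that both find the same segments.

```python
import re
import timeit

# Hypothetical sample data: 2000 segments, two of which contain Chinese.
segments = ["Segment number %d" % i for i in range(2000)]
segments[10] = "包含中文的句段"
segments[1500] = "Mixed 文本 segment"

# Distinct Chinese characters in the segments (main CJK block only, for brevity).
chars = sorted({ch for seg in segments for ch in seg if "\u4e00" <= ch <= "\u9fff"})


def one_by_one() -> set[int]:
    """Test each character against each segment in turn."""
    hits = set()
    for ch in chars:
        for i, seg in enumerate(segments):
            if ch in seg:
                hits.add(i)
    return hits


# One character class built from the same character list.
pattern = re.compile("[" + re.escape("".join(chars)) + "]")


def with_regex() -> set[int]:
    """Scan each segment once with a single compiled pattern."""
    return {i for i, seg in enumerate(segments) if pattern.search(seg)}


assert one_by_one() == with_regex() == {10, 1500}
print("one by one:", timeit.timeit(one_by_one, number=10))
print("regex:     ", timeit.timeit(with_regex, number=10))
```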

LIZ LI

China
Local time: 18:54
French to Chinese

@Samuel

May 4, 2021

Here's a free conversion page between UTF8 & Chinese:
https://www.ip138.com/utf8/

Copy & paste the source for Chinese > UTF8 in the UPPER dialog box, then click the 1st green button below;
OR
Copy & paste the source for UTF8 > Chinese in the LOWER dialog box, then click the 2nd green button below.

If you want to do it manually, you may also try http://www.mytju.com/classcode/tools/encode_utf8.asp
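The same kind of conversion can also be done locally. This is a small Python sketch, not a reproduction of what the linked pages output; it simply shows the difference between a character's Unicode code point (what a regex range works with) and its UTF-8 bytes.

```python
text = "中文"

# Unicode code points of each character.
print([f"U+{ord(ch):04X}" for ch in text])  # ['U+4E2D', 'U+6587']

# UTF-8 bytes of the same text, shown as hex.
encoded = text.encode("utf-8")
print(encoded.hex(" "))  # e4 b8 ad e6 96 87

# And back from UTF-8 bytes to text.
print(bytes.fromhex("e4 b8 ad e6 96 87").decode("utf-8"))  # 中文
```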

[Edited at 2021-05-04 13:24 GMT]

[Edited at 2021-05-04 13:24 GMT] ▲ Collapse
