Ambrose Li

Apple’s “Sin-Ji” voice is a case of cultural insensitivity


In Mac OS X, or at least in the versions of Mac OS X that I’ve used, the Cantonese voice is a female voice named “Sin-Ji”.

The voice speaks pretty decent Cantonese, but it can’t speak English, which makes the voice pretty useless: let’s face it, in today’s world it’s not possible to avoid using at least a few English words. The voice also can’t handle phonetic input,[Note 1] so we can’t even enter English words phonetically.

Many Cantonese words also have no established orthography. Without support for phonetics, the voice is pretty useless even if sticking to pure Cantonese were feasible.


Then there is the name: What kind of name is “Sin-Ji”?

It doesn’t look Cantonese.

It doesn’t sound Cantonese either, if you play the sample sentence.

In Hong Kong, the embodiment of the Cantonese-speaking place, “Ji” isn’t even likely in a name, unless we’re talking about a Mandarin name.

The name “Sin-Ji” has always struck me as out of place: every voice Apple provides has a name typical of, or at least plausible in,[Note 2] the language the voice speaks, except their Cantonese voice, which has a typical Hong Kong accent but is given a mysterious, undecipherable name that’s neither Cantonese nor typical of Hong Kong.


Today it dawned on me: “Sin-Ji” is the Japanese pronunciation of Cindy.‍[Note 3]

Cindy, of course, is a pretty common name in Hong Kong. Had Apple named the voice “Cindy” instead (especially if they had made it capable of speaking English; I don’t care if it’s good UK English or English with a heavy accent), that name would have come off as perfectly appropriate.

I cannot understand why Apple failed to see why “Sin-Ji” is wrong; it’s wrong on so many levels:

  1. “Cindy” doesn’t look Asian, therefore it must be spelt “Sin-Ji”, even though you’d not find a “Sin-Ji” in Hong Kong.
  2. All Asians are the same, therefore an English name used in a bilingual Cantonese/English-speaking place should be romanized as Japanese.
  3. All Japanese romanization systems are the same, therefore “Sin-Ji” is fine, even though as Japanese it should be spelt either Sin-Di (Kunrei) or Shin-Ji (Hepburn).
  4. The voice is “Cantonese”, therefore its inability to speak English must be fine, even though any realistic use case will require synthesizing a mix of Cantonese and English.
  5. If a word can’t be written in Chinese characters, not being able to synthesize it must be fine, even though tons of Cantonese words have no established orthography.[Note 4]

If this is not cultural insensitivity, I don’t know what is.

Okay, maybe it’s bad requirements analysis.

Notes

  1. Not in any standard transcription but in Apple’s own idiosyncratic system. See Apple Computer, Inc., “Techniques for Customizing Synthesized Speech,” last modified September 5, 2006, https://developer.apple.com/library/archive/documentation/UserExperience/Conceptual/SpeechSynthesisProgrammingGuide/FineTuning/FineTuning.html#//apple_ref/doc/uid/TP40004365-CH5-SW6, Table 3-1.
  2. There’s no such thing as a typical name in Chinese or Korean (they’re constructed like Indigenous or Anglo-Saxon names), but the names Apple gave to the two Mandarin voices, Ya-Ling and Ting-Ting, are both plausible (e.g., 雅玲 (*Yǎlíng) ‘Elegant-Bright’, 婷婷 (Tíngting) ‘Graceful’).
  3. Sin (シン, usually romanized as shin) is the Cin- in Cindy. Ji (ジ) would be the -dy in Cindy. The di sound does not actually exist in Japanese, but on the d row the letter at the i position (ヂ, in theory di and romanized as di in Kunrei) is pronounced ji.
  4. Apple’s synthesizer also mispronounces words that have multiple pronunciations. Even if a word can be written, there’s no guarantee you’ll get correct synthesized speech.