Hunting down the similar words between Japanese & Turkish

It is a common myth in Turkey that the Japanese and the Turks shared a common root at some point in history. There are several reasons for this belief: the historical proximity of proto-Turkic tribes to the Japanese is one, and the similarity of the languages is another. Some linguists even proposed that Asian languages had a common root, a proposal known as the Altaic language theory. However, to my knowledge, this theory has been abandoned by the linguistic community. The contemporary perspective is that the two languages don’t share a common root.

However, there are some strange similarities between the two languages. There is even a book[7] that aims to compare Japanese and Turkish words to establish the link between them. Like many people, I wanted to learn a new language during the pandemic. Intrigued by this topic, I opted to learn Japanese and experience the language. Learning Japanese is entertaining at the beginning, but the difficulty increases exponentially. Nonetheless, I have observed peculiar similarities between the two languages, and I think they are worth investigating to check the claims.

Disclaimer: I am not a linguistics expert, and my Japanese knowledge is limited. However, I have access to reliable sources to investigate Turkish etymology. I can only give insights on one end, Turkish, as the Japanese language remains a mystery to me. If you are Japanese, a linguist, or familiar with both languages, kindly note that my opinions are those of a person with a computer science background. If you have any ideas or corrections, do not hesitate to share them with me.

Arguments supporting the claim: #

1. Chinese influence on old Turkish and Japanese #

Japanese has an extensive set of characters known as Kanji, which were adapted from Chinese scripts. There are approximately 3000 Kanji characters, and the Japanese have integrated them into their language in various ways, ultimately making them a core part of Japanese. Even though Turkish doesn’t use Kanji, it wouldn’t be surprising if Chinese influence was present in old Turkish. Therefore, some words mentioned in this blog are more likely to have Chinese roots than Japanese ones.

2. Similarities in Japanese and Turkish culture #

Turks and Japanese share cultural similarities, as highlighted in this Quora response[8]. However, it’s important to note that these similarities do not necessarily mean that the two nations are fundamentally the same. Turkish culture is a blend of various cultures from across Eurasia. It is possible that these similarities are merely coincidental, considering the numerous differences between the two cultures.

Counter-arguments against the claim: #

1. Lack of historical evidence #

The historical record provides limited evidence to support significant interaction between these two nations. While a relationship might have existed centuries ago, the absence of documentation makes this hypothesis speculative. Without more corroborative artifacts or accounts, assertions of past connections remain unverified and require further empirical investigation.

2. Lots of dissimilarities between the languages #

This is a strong argument in the discussion. A person familiar with one language and learning the other can easily see that the differences far outweigh the similarities. Even if there was a common ground, it has long since been abandoned. The similarities have vanished over time, and the languages have diverged significantly. Languages are constantly evolving, transforming so much over the centuries that speakers from different eras would struggle to understand each other.

Dataset #

I used two dictionaries to bridge the gap between the languages: English-Japanese and English-Turkish. The reason for this approach is that an open-source Japanese-Turkish dictionary is not available. Thus, English is the intermediary between the two languages.

To briefly introduce those dictionaries:

  • The EN-TR dictionary contains 38,249 pairs, including sayings/phrases and word pairs. Some English words have multiple Turkish equivalents, which are either synonyms or closely related terms.

  • The EN-JP dictionary consists of 13,712 pairs, with a portion dedicated to sayings and phrases. Only a small number of English words have more than one Japanese equivalent. It appears that the dictionary’s authors did not include synonyms, resulting in mostly one-to-one mappings.

Data preparation #

To identify similar words, the initial step is to match semantically equivalent words. I used English as the intermediary language, pairing it with Turkish and Japanese to create a roughly aligned JP-TR dictionary.
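The pairing step described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `pivot_join`, the (english, translation) tuple layout, and the use of romanized Japanese are all hypothetical, since the post does not describe the dictionary file format.

```python
from collections import defaultdict


def pivot_join(en_tr_pairs, en_jp_pairs):
    """Align Turkish and Japanese entries that share the same English key.

    Both inputs are iterables of (english, translation) tuples.
    This layout is an assumption, not the author's actual file format.
    """
    turkish_by_english = defaultdict(list)
    for english, turkish in en_tr_pairs:
        turkish_by_english[english.strip().lower()].append(turkish)

    aligned = []
    for english, japanese in en_jp_pairs:
        key = english.strip().lower()
        # One English word may have several Turkish equivalents,
        # so every combination is kept as a candidate pair.
        for turkish in turkish_by_english.get(key, []):
            aligned.append((key, turkish, japanese))
    return aligned


# Tiny made-up sample, only to show the shape of the result.
en_tr = [("black", "kara"), ("black", "siyah"), ("now", "şimdi")]
en_jp = [("Black", "kuroi"), ("book", "hon")]
print(pivot_join(en_tr, en_jp))
```

English entries that appear in only one of the two dictionaries are dropped, which is why the aligned dictionary is only rough.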

In Japanese, most characters of the alphabet represent syllables, the exceptions being the vowels i, e, o, u, a and the consonant n. Turkish also uses syllables, but they serve as an intermediary stage between letters and words.

The Turkish alphabet differs from the Latin alphabet with vowels ö, ü, ı and consonants ç, ş, ğ, while lacking the letters w, q, x. The letter ş can correspond to the Japanese letter し (shi), and ç can correspond to ち (chi).
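The post does not say whether the Turkish spelling was normalized before matching. If it was, the two correspondences mentioned above could be applied with a small mapping like the one below. Writing ş as "sh" and ç as "ch" is my assumption based on the romanized forms shi and chi; the names `TURKISH_TO_ROMAJI` and `normalize_turkish` are hypothetical, and the other Turkish letters are deliberately left unchanged.

```python
# Assumed mapping: only the two correspondences named in the text.
TURKISH_TO_ROMAJI = {"ş": "sh", "ç": "ch"}


def normalize_turkish(word):
    """Rewrite ş and ç so they can line up with romanized Japanese.

    The input is expected to be lowercase already. Python's str.lower()
    does not follow Turkish casing rules for I/ı and İ/i, so it is
    not called here.
    """
    return "".join(TURKISH_TO_ROMAJI.get(letter, letter) for letter in word)


print(normalize_turkish("şimdi"))
print(normalize_turkish("çay"))
```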

What type of words are we looking for? #

Historical records indicate that Turkic communities migrated from East Asia between the 6th and 11th centuries, known as the Turkic migration. This factor must be taken into account when filtering out words, as definitions from later periods are likely to differ. We will focus on terms related to the sun, world, earth, humans and societal relationships, primitive warfare equipment, animals, water, and similar concepts. Technological terms, for instance, should not match; if they do, it is due to Western influence on both languages, as with cosmetic or chemical terms.

Methodology #

Let’s move on to sequence matching. Over the years, this concept has been explored across various disciplines, including bioinformatics. Scientists in this field aim to find similarities between DNA sequences using a straightforward yet powerful technique known as Global Sequence Alignment. This dynamic programming method, developed by Saul B. Needleman and Christian D. Wunsch, involves creating a matrix with the characters of one sequence along the x-axis and those of the other along the y-axis. Each cell is then filled by examining the left, top, and top-left diagonal cells. The gap score is added to the left and top values, while a match or mismatch score is added to the diagonal value, depending on whether the two characters are equal. The cell takes the highest of the three results.

After filling all the cells, we trace back. Starting from the bottom-right cell, at each step we move up, left, or diagonally up-left, following the neighbour that produced the current cell’s score. A diagonal move pairs the two characters. An upward or leftward move writes a character from one sequence against a dash in the other. The traceback finishes when we reach the top-left cell.

Let’s work through an example with the words kara and くろい (kuroi).

Scores: match = 1, mismatch = -2, gap = -1

We put kuroi on the x-axis and kara on the y-axis. Each step to the right along the first row inserts a gap, so each cell is the value on its left minus 1. The same process applies going down the first column.

        K   U   R   O   I
    0  -1  -2  -3  -4  -5
K  -1
A  -2
R  -3
A  -4

Let’s examine the second row, second column:

        K
    0  -1
K  -1   X

We consider the values from the top, left, and top-left diagonals. Recall that moving up or left indicates a gap, while a diagonal move involves a letter comparison:

  • Up: -1 - 1 = -2
  • Left: -1 - 1 = -2
  • Diagonal (both characters are K - match): 0 + 1 = 1

Since the diagonal match yields the highest value, the maximum value is 1. Therefore, we fill that cell with 1:

        K
    0  -1
K  -1   1

Applying the same logic to the second row, the scores are as follows. Since there are no subsequent letters that match K, we decrease each cell score by one across the row:

        K   U   R   O   I
    0  -1  -2  -3  -4  -5
K  -1   1   0  -1  -2  -3
A  -2
R  -3
A  -4

Same thing for the third row:

        K   U   R   O   I
    0  -1  -2  -3  -4  -5
K  -1   1   0  -1  -2  -3
A  -2   0  -1  -2  -3  -4
R  -3
A  -4

In the fourth row, R will match, so we increment the score for that cell while decreasing the rest:

        K   U   R   O   I
    0  -1  -2  -3  -4  -5
K  -1   1   0  -1  -2  -3
A  -2   0  -1  -2  -3  -4
R  -3  -1  -2   0  -1  -2
A  -4

And the final table looks like this:

        K   U   R   O   I
    0  -1  -2  -3  -4  -5
K  -1   1   0  -1  -2  -3
A  -2   0  -1  -2  -3  -4
R  -3  -1  -2   0  -1  -2
A  -4  -2  -3  -1  -2  -3
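The filling rule walked through above can be checked with a short sketch. It is not the author's code; the function name `fill_matrix` and its parameter names are my own, and the default scores are the ones given in the example (match = 1, mismatch = -2, gap = -1).

```python
def fill_matrix(row_word, col_word, match=1, mismatch=-2, gap=-1):
    """Fill a Needleman-Wunsch score matrix.

    row_word labels the rows (y-axis), col_word labels the columns (x-axis).
    """
    rows, cols = len(row_word) + 1, len(col_word) + 1
    scores = [[0] * cols for _ in range(rows)]

    # First row and first column: every step away from the corner is a gap.
    for j in range(1, cols):
        scores[0][j] = scores[0][j - 1] + gap
    for i in range(1, rows):
        scores[i][0] = scores[i - 1][0] + gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if row_word[i - 1] == col_word[j - 1] else mismatch
            scores[i][j] = max(
                scores[i - 1][j - 1] + pair,  # diagonal: compare two letters
                scores[i - 1][j] + gap,       # from the top: gap
                scores[i][j - 1] + gap,       # from the left: gap
            )
    return scores


table = fill_matrix("KARA", "KUROI")
for label, row in zip(" KARA", table):
    print(label, " ".join(f"{value:3d}" for value in row))
```

Run on KARA and KUROI, this reproduces the values of the final table above, with -3 in the bottom-right cell.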

The table is complete. Let’s trace back:

K U R O I
X
K
A
R
A

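Each leftward move inserts a dash into the word written down the first column (kara), and each upward move inserts a dash into the word written along the top row (kuroi). Because the traceback runs from the end of the words to the start, the aligned strings come out reversed and have to be flipped. The result is:

--AR-AK = KA-RA--

IO-RU-K = K-UR-OI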
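The traceback can be sketched as below, using the completed score table from the example as input. Several cells in this table can be reached with the same score from more than one neighbour, and the post does not say how such ties are broken. Checking left first, then up, then diagonal is an assumption that happens to reproduce the alignment shown above; the function name `traceback` is hypothetical.

```python
# Completed score table from the worked example (rows: KARA, columns: KUROI).
SCORES = [
    [0, -1, -2, -3, -4, -5],
    [-1, 1, 0, -1, -2, -3],
    [-2, 0, -1, -2, -3, -4],
    [-3, -1, -2, 0, -1, -2],
    [-4, -2, -3, -1, -2, -3],
]


def traceback(scores, row_word, col_word, gap=-1):
    """Walk from the bottom-right cell to the top-left cell.

    Tie-breaking order (left, up, diagonal) is an assumption.
    """
    i, j = len(row_word), len(col_word)
    row_out, col_out = [], []
    while i > 0 or j > 0:
        if j > 0 and scores[i][j] == scores[i][j - 1] + gap:
            # Left: consume a column letter, dash in the row word.
            row_out.append("-")
            col_out.append(col_word[j - 1])
            j -= 1
        elif i > 0 and scores[i][j] == scores[i - 1][j] + gap:
            # Up: consume a row letter, dash in the column word.
            row_out.append(row_word[i - 1])
            col_out.append("-")
            i -= 1
        else:
            # Diagonal: pair the two letters.
            row_out.append(row_word[i - 1])
            col_out.append(col_word[j - 1])
            i -= 1
            j -= 1
    # The strings were built back to front, so flip them.
    return "".join(reversed(row_out)), "".join(reversed(col_out))


print(traceback(SCORES, "KARA", "KUROI"))  # ('KA-RA--', 'K-UR-OI')
```

A different tie-breaking order would give a different alignment with the same total score of -3.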

That looks much better! By employing this method, we achieved pairwise matching. We will repeat this process for every possible word pair. To finalize, we need to convert these matchings into a score to identify the best matches. A simple approach is to divide the number of matched characters by the length of the sequence.
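One way to read this score is sketched below. The post does not say which length is used as the denominator, so taking the aligned length (dashes included) is my assumption, and `match_ratio` is a hypothetical name.

```python
def match_ratio(aligned_a, aligned_b):
    """Matched characters divided by the aligned length.

    Both arguments are alignment strings of equal length that may
    contain dashes. Using the aligned length as the denominator
    is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return matches / len(aligned_a)


# Two letters (K and R) line up out of seven aligned positions.
print(match_ratio("KA-RA--", "K-UR-OI"))
```

For the kara and kuroi alignment this gives 2/7, so the pair would rank low under this particular reading of the score.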

This method resulted in fairly accurate matches. However, some words in Japanese and Turkish are borrowed from Western languages. To filter these out, I applied the algorithm to English-Turkish and English-Japanese pairs. When a match occurred between Japanese and Turkish words, I filtered them out if they matched the English word at a rate above 60%. This process removed Western words like caffeine, catamaran, and tomography, which originate from foreign languages.
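The loanword filter could look roughly like the sketch below. It assumes that a pair is dropped only when both the Turkish and the Japanese word match the English word above the threshold, which is one possible reading of the description. The names `similarity` and `is_western_loan`, the tie-breaking order in the traceback, and the example words are mine, not the author's.

```python
def similarity(a, b, match=1, mismatch=-2, gap=-1):
    """Needleman-Wunsch match ratio: matched letters / aligned length."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, rows):
        score[i][0] = score[i - 1][0] + gap
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back, counting matched letters and alignment steps.
    i, j, matches, steps = len(a), len(b), 0, 0
    while i > 0 or j > 0:
        if j > 0 and score[i][j] == score[i][j - 1] + gap:
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            if a[i - 1] == b[j - 1]:
                matches += 1
            i, j = i - 1, j - 1
        steps += 1
    return matches / steps if steps else 0.0


def is_western_loan(turkish, japanese, english, threshold=0.6):
    """True when both words resemble the English word above the threshold."""
    return (similarity(turkish, english) > threshold
            and similarity(japanese, english) > threshold)


# Illustrative words chosen for this sketch, not taken from the post.
print(is_western_loan("radyo", "rajio", "radio"))  # True
print(is_western_loan("kara", "kuroi", "black"))   # False
```

Pairs flagged this way would be removed before the remaining candidates are reviewed.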

The one missing element, in my opinion, is the Chinese connection. Unfortunately, I lack knowledge of Chinese, which means I can’t effectively filter out words with Chinese origins.

Remarks #

When you see a trailing dash (-) on a Turkish word, it marks a verb stem. The full infinitive adds “-mek” or “-mak,” which means “to.” For example, yap means “do,” and yap- (yapmak) means “to do.”

Only words with Turkish roots are considered. Words with origins in Arabic, Persian (quite common), French, or Latin (less common) are excluded.

The explanations below do not confirm or prove the connection. They merely show a strong similarity according to the algorithm, which was sufficient for me to remove the irrelevant matches.

Results #

şimdi (imdi) & 今 (ima) #

Meaning: now

The word şimdi is made of two words, “” + “imdi”, which together translate to “just now”. The root “imdi” goes back to an older form “amtı” or “amdı”, which in turn contains the shorter “am”. The root “am” is not used in modern Turkish, at least not to say “now”, although some sources indicate it has been used in other Turkish dialects. The Japanese 今 (ima) matches semantically, and the two are phonetically very close.

First occurrence: Orkhon inscriptions (A.D. 735) - ilim amtı kanı (Where is my country now)

kanı & 感情 (kanjou) #

Meaning: sentiment

The word kanı has a root “kan-”, meaning to believe; to satisfy. The Japanese 感 (kan) means feeling, sensation, or admiration. They are semantically and phonetically similar.

First occurrence: Türkische Turfantexte 1-9, Before A.D. 900 - közünürteki küsüşleri kandı ((his) desires in this world have been satisfied.)

koyu & 濃い (koi) #

Meaning: dark or deep (colour)

This is a known similarity: the words are semantically equivalent and phonetically close. It is a strong match.

hun & 本 (hon) #

Meaning: book

“hun” is not used as “book” in modern Turkish, and it was fairly hard to verify this Turkish word. In Japanese, it is one of the earliest words you will learn and use extensively. I could only find the Turkish word in the dictionary I used and in one online source.

tanrı & 天父 (tenshu) | tanrısal & 天来 (tenrai) #

Meaning: (1) god (2) divine

I assume there are other Asian languages that use similar sounds to name celestial powers. For example, in Chinese, it is 上帝 (Shàngdì). The original word in old Turkish is tengri, which has softened to tanrı. It is another strong match.

tomdaş & 友達 (tomodachi) #

Meaning: friend

“tomdaş” is not used in modern Turkish; people say “arkadaş” or “yoldaş” instead. Some dictionaries list it as a reference. The Japanese word is in use in modern Japanese.

dost & 同志 (doushi), 同士 (doushi) #

Meaning: close friend

This is another good candidate, since it means close friend in both languages, but it might not be a correct match. The known root of the word is thought to be (old) Persian, which suggests that Turkish people may have learned it after they moved from East Asia. It is possibly an Avestan word, an old Persian language that dates back to A.D. 200-400.

First occurrence: Kutadgu Bilig, 1069 - öz asġı tiler dostḳa birme köŋül (Don’t give your heart to a friend who takes care of his/her own interests)

tartış- & 戦い (tatakai), 戦う (tatakau) #

Meaning: to discuss/debate/conflict (Turkish), to fight or war (Japanese)

The Turkish word has the root “tart-”, meaning to measure. The Japanese word is used as battle, war, or to fight, to compete. The words are used in different contexts, although they are semantically close and phonetically similar. This is another mediocre link.

First occurrence: Kaşgarî, Divan-i Lugati’t-Türk, 1073 - ol meniŋ birle ya tartışdı (He discussed/competed with me on bow stretching)

Another occurrence: Kul Mes’ud, Kelile ve Dimne terc., Before A.D. 1347 - bu dartışık bunlar arasında ulaldı, savaşa başladılar. (The dispute between those, they started to fight)

terazi & 釣り合い (tsuriai) #

Meaning: balance - scale

This word is used in daily Turkish. It is derived from “tart-” too.

First occurrence: Albert von Le Coq, Türkische Manichäica aus Chotscho, 1919 - “tanmış üzütüg tutupan tarazuk içinte olgurtur tiyür.”

art & 後 (ato) #

Meaning: back, rear

It is used in daily Japanese and also in modern Turkish (art arda, ardıl, ardından).

First occurrence: Divan-i Lugati’t-Türk, A.D. 1073

yansı- & 反射 (hansha) #

Meaning: reflect

They are weakly connected but worth mentioning. The root is “yan-” or “yanıt”, meaning to return (back) and response. There is also the Japanese verb 射す (sasu) / 差す (sasu), which means to shine. What I found interesting is that another word, “yansıla-”, means to go against, while 反 (han) alone means anti. There are also words that use 反 (han) to express being against something; for example, 反対 (hantai) means opposition. I believe this is open to discussion; again, it is a weak connection.

First occurrence: Kutadgu Bilig, A.D. 1069 - cevāb birse sözke yanut kılsa söz

kok- & 嗅ぐ (kagu) | koku & 芳香 (houkou) #

Meaning: (1) to smell and (2) smell

“koku” is derived from “kok-”, which means to smell. The Japanese word 嗅ぐ (kagu) is also quite similar to it.

First occurrence: Divan-i Lugati’t-Türk, 1073 - et ḳōḳdı (the meat smelled)

koru- & 囲う (kakou) #

Meaning: to protect (Turkish), to enclose; to surround (Japanese)

This is a mediocre match, although it is close enough to mention. As far as I understand, the Japanese word is used in a more physical sense, while the Turkish word is also used in an abstract sense.

First occurrence: Orkhon inscriptions, A.D. 735 - “korıgu éki üç kişilegin tezip bardı” (the guard fled with two or three men)

korku & 恐怖 (kyoufu) #

Meaning: fear

The word korku is derived from “kork-”, which means “to fear”. The root of “kork-” is thought to be “koru-” (to protect). Well, the one who needs to be protected has something to fear? Surely, a nice link. The Japanese 怖 (kowa) means frightening, 怖い (kowai) means scary, and 恐れる (osoreru) means to fear. None of them links up with “to protect”, so it is only a mediocre-to-weak connection.

First occurrence: Orkhon inscriptions, A.D. 735 - “öküş tiyin neke korkur biz” (Why are we afraid of their crowd?)

yabani & 野蛮 (yaban) #

Meaning: savage, wild

A direct example, for which you can find other sources on the internet. Both mean savage and sound the same. The word “yaban” has a synonym, “vahşi”, which is also encountered in the script.

First occurrence: Kutadgu Bilig, A.D. 1069 - yā vahşī bolup men biyābānda yügrü (or I will be a savage and walk in the wilderness)

okur & 読者 (dokusha) #

Meaning: reader

This word is used in modern Turkish. The root of it is “oku-”, meaning “to read”.

First occurrence: Albert von Le Coq, Türkische Manichäica aus Chotscho, 1919 (source before 900) - bu emig iki kata okıyu tegintim. (I tried to read this magic twice)

uy- & 応 (ou) | uygun & 応分 (oubun) #

Meaning: to suit, to fit

Another word that matches, although the meaning differs a little. In Japanese, 応 (ou) means agreement, while “uy-” means to fit. 応分 (oubun) means according to one’s abilities; appropriate; reasonable, which is a close match to “uygun”.

First occurrence: Şine-Usu[4][5] - ben Seleŋge keçe uḏu yorıdım. (I crossed the Selenge river and walked afterward)

soğu- & 寒 (samu), 冷める (sameru) #

Meaning: cold, getting cold

This is another mediocre match. Both words are in use in their respective languages. 冷や (hiya) means cold water, and there are some suggestions that “soğu-” has the root “su” (water) in Turkish.

First occurrence: Divan-i Lugati’t-Türk, 1073 - suw soġıdı (…) er soġundı.

ses & 声 (sei) #

Meaning: voice

They are semantically the same, although the “ses” word may be derived from noise reflection.

First occurrence: Darir, Anternâme (1390) - “itler daχı ürmez ses sem yok”

buyruk & 武器 (buki) #

Meaning: order (in Turkish) - weapon (in Japanese)

This is not a direct match, although the root of the Turkish word offers an interesting link. The root is “buyur-”, which means to order or to command. The Japanese “武” means the art of war; martial arts; military arts; military force. It is reasonable to think that in early ages, the one who held the weapon gave the orders. Both words might have been preserved as archaisms, which is why I wanted to add the pair here.

First occurrence: Orkhon inscriptions (A.D. 735) - türgeş kaġan buyruḳı (Commander of Türgeş ruler)

doğuştan & 当然 (douzen) #

Meaning: naturally, inborn

There is another word, 東洋 (touyou), which means east. “Doğu” also means “east”, and this link might have survived through the years. The weak point is that 東 on its own is pronounced higashi or azuma rather than tou. The pair is a weak match, although further research would help to clarify it.

First occurrence: Maitrisimit Nom Bitig, Before A.D. 1000 - uluġ tamularda tuġdımız … amtı bo kiçig tamularda tuġmış erür (We were born in big hells, now we’re born in these little hells)

yar- & 割る(waru) 割れる(wareru) #

Meaning: to split

Another verb similarity. The word is used in modern Turkish. Both words have the same meaning, but they differ phonetically. It is a mediocre match.

First occurrence: Irk Bitig, Before A.D. 900 - adıġıŋ karnı yarılmiş, toŋuzuŋ azıgı sınmiş

Final Thoughts #

Out of 10,000+ words, a couple of matches are expected, but the words mentioned above seem like more than a coincidence. Who knows, maybe they are linked through Chinese, since early Turkish tribes lived close to the Chinese and the Japanese borrowed a lot of words from Chinese. In that case, both languages may have similar words without the two communities ever encountering each other. Or maybe, at some point in history, those tribes lived close enough to each other to share common ground between the languages. Vocabulary is not the whole story: comparing grammar is essential to evaluate the closeness of two languages, but I already feel that I am walking a thin line, so it is better to stop here. The effectiveness of the sequence-matching algorithm at surfacing similar words is quite remarkable. After all, it was meant for DNA matching but ended up being used in a completely different area. Hopefully, you enjoyed the read. Thank you for reading, and see you next time.

Sources: #

[1] Japanese meanings : https://jisho.org/

[2] Turkish etymology : https://www.nisanyansozluk.com

[3] Turkish dictionary : https://sozluk.gov.tr/

[4] Şine-Usu scripts : https://belleten.gov.tr/tam-metin/2556

[5] Zwei uigurische Runeninschriften in der Nord-Mongolei, aufgefunden und mit Transskription, Uebersetzung und Bemerkungen veröffentlicht von G.J. Ramstedt : This source is not available on the Internet, although Turkish articles mention this as a reference.

[6] Altaic: Rise and Fall of a Linguistic Hypothesis https://www.youtube.com/watch?v=z0zkHH6ZOEk

[7] Türkçe ve Japonca’nın akrabalığı https://www.amazon.com.tr/T%C3%9CRK%C3%87E-VE-JAPONCANIN-AKRABALI%C4%9EI-Kolektif/dp/9758839934

[8] What are the similarities between Turkish and Japanese culture? https://www.quora.com/What-are-the-similarities-between-Turkish-and-Japanese-culture