Hunting down similar words between Japanese and Turkish

It is a common belief in Turkey that the Japanese and the Turks shared a common root at some point in history. There are several reasons for this belief: the historical proximity of proto-Turkic tribes to the Japanese is one, and the similarity of the languages is another. Some linguists even proposed that these Asian languages had a common root, a hypothesis known as the Altaic language theory. However, to my knowledge, this theory has been abandoned by the linguistic community. The contemporary view is that the two languages don’t share a common root.

However, there are some strange similarities between the two languages. There is even a book[7] that compares Japanese and Turkish words to establish a link between them. Intrigued by this topic, I decided to learn Japanese and try it myself. Like many others, I started learning something new during the pandemic while I had time at home. I found it fascinating that I was learning Japanese faster than German (pun intended). However, I cannot conclude that Japanese is similar to Turkish just because it is easier to learn than German. Maybe it’s only because I’m at the beginning of learning Japanese. Nonetheless, I have observed peculiar similarities between the languages. Therefore, I wanted to investigate further by searching for shared vocabulary between the two languages to find any common ground.

Disclaimer: I am not a linguistics expert, and my knowledge of Japanese is limited. However, I have access to reliable sources to investigate Turkish etymology. I can only give insights on one end, Turkish, as the Japanese language remains a mystery to me. If you are Japanese, a linguist, or familiar with both languages, kindly note that my opinions are those of a person with a computer science background. If you have any ideas or corrections, do not hesitate to share them with me.

Supportive arguments:

1- Chinese influence on old Turkish and Japanese

Japanese has an extensive set of kanji, which were taken from Chinese script. There are roughly 3,000 kanji out there, and the Japanese imported those words into their language in different ways, so that in the end they became a core part of Japanese. I believe nobody would be shocked if Chinese had influenced old Turkish as well. For some of the words I will share at the end, I think there is a high likelihood that they have Chinese roots rather than Japanese ones. One would probably need to read for years to develop a solid understanding.

2- Similarities in Japanese and Turkish culture

Stepping away from the languages, the two nations share similarities that look strange when you think about them. You can find a number of them in this Quora response[8]. Surely it is not easy to say -ehm- “these cultural similarities show that the two nations were the same at some point.” After all, Turkish culture is a mix of everything in Eurasia. Maybe the two cultures really are similar, or maybe, given the number of differences, the similarities are just coincidences.

Counter-arguments:

1- Lack of historical evidence

2- Lots of dissimilarities between languages

This is a strong argument in the discussion. Since I started learning Japanese, I have also come to think that there are a lot of differences between Turkish and Japanese. So I believe that even if there was a common ground, it broke apart a long time ago: the similarities vanished over time, and the languages drifted in different directions. Languages are dynamic. They inevitably shift, and in the end, two people who speak the same language in different centuries couldn’t understand each other, assuming time travel were possible.

Anyway, I treat this as an exercise for my own pleasure, and since I am a computer engineer who knows sequence matching algorithms, let’s go and find those words!

Dataset

To match words between the two languages, I first found two dictionaries, English-Japanese and English-Turkish. Unfortunately, I couldn’t find a reliable Japanese-Turkish dictionary. Therefore, using English as an intermediary language was the best option. Here are the ENJP and ENTR dictionaries. After collecting them, I used the library pdfplumber to extract word pairs. The first analysis shows that:

There are 38,249 pairs in the EN-TR dictionary. Part of it is sayings and phrases, while most of it is word pairings. Some English words have more than one Turkish equivalent. These are synonyms or close matches.

There are 13,712 pairs in the EN-JP dictionary. Only a portion of the dictionary is sayings or phrases. A small portion of the English words have more than one Japanese term. I assume the authors of the dictionary didn’t include synonyms. In other words, most of the EN-JP dictionary is a one-to-one mapping.
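A minimal sketch of what the extraction step could look like. The post does not show the dictionary layout, so the `" : "` separator and the function names `parse_pairs` and `extract_pairs` are assumptions made for illustration; only `pdfplumber.open`, `pdf.pages`, and `page.extract_text()` are real pdfplumber calls.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str], sep: str = " : ") -> List[Tuple[str, str]]:
    """Split 'english<sep>translation' lines into (english, translation) pairs.

    The separator is a guess; a real dictionary PDF will need its own rule.
    """
    pairs = []
    for line in lines:
        if sep not in line:
            continue  # skip headers, page numbers and other non-entry lines
        english, translation = line.split(sep, 1)
        english, translation = english.strip(), translation.strip()
        if english and translation:
            pairs.append((english, translation))
    return pairs


def extract_pairs(pdf_path: str, sep: str = " : ") -> List[Tuple[str, str]]:
    """Read every page of a dictionary PDF and collect its word pairs."""
    import pdfplumber  # third-party; imported lazily so parse_pairs works without it

    pairs = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""  # a page without text may yield None
            pairs.extend(parse_pairs(text.splitlines(), sep))
    return pairs
```

Keeping the line parsing separate from the PDF reading makes it easy to try the parsing rule on a few pasted lines before running it over a whole dictionary.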

Language preparation

In order to derive similar words, the first thing to do is to match the semantically equivalent words. To do that, I used English as the pivot language and joined the Turkish and Japanese entries on it, so that we would have a roughly aligned JP-TR dictionary.
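The pivot step can be sketched as a join on the English key. This is an illustration under assumptions, not the code used for the post: the names `group_by_english` and `pivot_align` are hypothetical, and every Turkish candidate is paired with every Japanese candidate that shares the same English entry.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (english, translation)


def group_by_english(pairs: Iterable[Pair]) -> Dict[str, Set[str]]:
    """Collect all translations listed under the same English entry."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for english, word in pairs:
        grouped[english].add(word)
    return grouped


def pivot_align(en_tr: Iterable[Pair], en_jp: Iterable[Pair]) -> List[Tuple[str, str, str]]:
    """Return (english, turkish, japanese) triples for shared English entries."""
    tr = group_by_english(en_tr)
    jp = group_by_english(en_jp)
    triples = []
    for english in sorted(tr.keys() & jp.keys()):  # entries present in both dictionaries
        for tr_word in sorted(tr[english]):
            for jp_word in sorted(jp[english]):
                triples.append((english, tr_word, jp_word))
    return triples
```

English entries that appear in only one dictionary are dropped, which is why the aligned list is only "roughly" a JP-TR dictionary.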

In Japanese, most characters of the syllabary stand for a syllable, that is, a consonant plus a vowel. The exceptions are the bare vowels a, i, u, e, o and the consonant n. Turkish also has syllables, but there a syllable is an intermediary stage between letters and words.

The Turkish alphabet differs from the basic Latin alphabet in having the letters ö, ü, ı, ç, ş, and ğ, and in not having w, q, and x. We are lucky that ş exists in Japanese as the sound in shi, although to pronounce it correctly a Japanese speaker would write shi-e. Similarly, ç is chi, again written chi-e. ğ is a difficult one: you can think of it as the sound of a breeze. Japanese doesn’t have the consonants “v” and “l” (violin becomes “baiorin”, lion becomes “raion”).
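The post does not say whether the Turkish words were normalized before matching. If one wanted to bring Turkish spelling closer to romanized Japanese using the correspondences above, a sketch could look like this. Every entry in the hypothetical `TR_TO_ROMAJI` table is an assumption for illustration (in particular dropping ğ and folding ı, ö, ü onto plain vowels), and the input is assumed to be lowercase already.

```python
# Hypothetical mapping; each entry is an illustrative assumption, not a linguistic rule.
TR_TO_ROMAJI = str.maketrans({
    "ş": "sh",  # ş is the consonant heard in Japanese "shi"
    "ç": "ch",  # ç is the consonant heard in Japanese "chi"
    "ğ": "",    # the soft g has no Japanese counterpart, so it is dropped here
    "ı": "i",   # the vowels below are folded onto the nearest plain vowel
    "ö": "o",
    "ü": "u",
    "l": "r",   # Japanese has no "l" (lion -> raion)
    "v": "b",   # Japanese has no "v" (violin -> baiorin)
})


def normalize_turkish(word: str) -> str:
    """Rewrite a lowercase Turkish word with romaji-like spelling."""
    return word.translate(TR_TO_ROMAJI)
```

With this table, `normalize_turkish("şimdi")` gives `"shimdi"`. Whether such folding helps or hurts the matching would have to be checked against the results.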

What type of words are we looking for?

Written history records that Turkic communities departed from East Asia between the 6th and 11th centuries, in what is known as the Turkic migration. This is a factor in filtering out words: terms coined in later eras are expected to be different. We will be checking words that plausibly existed in prehistoric times: words for the sun, the world, earth, humans, primitive war, animals, water, and so on. Technological terms, for example, won’t pair. If they do, they pair because of Western influence on both languages. There are a lot of such foreign words, which are written in katakana in Japanese and “turkishized” in Turkish. Cosmetic and chemical terms are some examples.

Methodology

After an extensive explanation, it is time to move to sequence matching. Now, it is algorithms o’clock. Sequence matching has been investigated over the years in different disciplines. Bioinformatics is one of them. Scientists who want to measure the similarity between DNA or RNA sequences have a simple yet powerful method called global sequence alignment. It is a dynamic programming approach to comparing biological sequences, developed by Saul B. Needleman and Christian D. Wunsch. The idea is to create a matrix with the characters of one sequence along the x-axis and the characters of the other along the y-axis, then fill each cell by checking its left, upper, and upper-left diagonal neighbours. The gap score is added to the left and upper values, while the match or mismatch score is added to the diagonal value, depending on whether the two characters are the same. The cell takes the highest of the three results.
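Written as a recurrence, the fill rule described above looks as follows. The symbols are notation introduced here, not taken from the post: F is the score matrix, a and b are the two words, g is the gap score, and s(a_i, b_j) is the match score when the two characters are equal and the mismatch score otherwise.

```latex
\begin{aligned}
F_{0,0} &= 0, \qquad F_{i,0} = i \cdot g, \qquad F_{0,j} = j \cdot g, \\
F_{i,j} &= \max\bigl( F_{i-1,j-1} + s(a_i, b_j),\; F_{i-1,j} + g,\; F_{i,j-1} + g \bigr).
\end{aligned}
```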

After filling all the cells, we start tracing back. Starting from the bottom-right cell, at each step we move up, left, or diagonally up-left, following the option that produced the highest score for the current cell. A diagonal step pairs two characters; a step up or left puts a dash in the respective sequence. The traceback finishes when we reach the top-left cell.

Let’s work through an example with the words kara (Turkish) and kuroi (Japanese).

Scores for:

match = 1,

mismatch = -2,

gap = -1

We put kuroi on the x-axis and kara on the y-axis. Then we fill the first row and column as shown below. Each step to the right along the first row adds a gap, so -1 is added to the value on its left. The same goes for the first column.

         K   U   R   O   I
    0  -1  -2  -3  -4  -5
K  -1
A  -2
R  -3
A  -4

Then let’s check the cell in the second row, second column:

         K
    0  -1
K  -1   X

We consider the up, left, and up-left diagonal neighbours. Remember that up or left means a dash, while diagonal means a letter comparison:

up: -1 - 1 = -2

left: -1 - 1 = -2

diagonal: 0 + 1 = 1 (both characters are K, a match)

Since the matching case gives the highest value, the value to fill into that cell is 1.

         K
    0  -1
K  -1   1

You can fill the second row now.

         K   U   R   O   I
    0  -1  -2  -3  -4  -5
K  -1   1   0  -1  -2  -3
A  -2
R  -3
A  -4

And the third row:

         K   U   R   O   I
    0  -1  -2  -3  -4  -5
K  -1   1   0  -1  -2  -3
A  -2   0  -1  -2  -3  -4
R  -3
A  -4

And the final table looks like this:

         K   U   R   O   I
    0  -1  -2  -3  -4  -5
K  -1   1   0  -1  -2  -3
A  -2   0  -1  -2  -3  -4
R  -3  -1  -2   0  -1  -2
A  -4  -2  -3  -1  -2  -3
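The fill procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the post; the function name `fill_matrix` and its parameter names are made up here, and the default scores are the ones from the example.

```python
from typing import List


def fill_matrix(rows_word: str, cols_word: str,
                match: int = 1, mismatch: int = -2, gap: int = -1) -> List[List[int]]:
    """Build the global-alignment score table for two words."""
    n, m = len(rows_word), len(cols_word)
    table = [[0] * (m + 1) for _ in range(n + 1)]

    # First column and first row: only gaps are possible there.
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = rows_word[i - 1] == cols_word[j - 1]
            diagonal = table[i - 1][j - 1] + (match if same else mismatch)
            up = table[i - 1][j] + gap
            left = table[i][j - 1] + gap
            table[i][j] = max(diagonal, up, left)
    return table


for row in fill_matrix("KARA", "KUROI"):
    print(row)
```

Each printed row corresponds to one row of the table above, with the first printed row being the unlabeled row of gap scores.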

The table is finished. Let’s trace back:

         K   U   R   O   I
    X
K       🡤
A
R               🡤
A

Each step left along the x-axis puts a dash in the word on the column (kara). Each step up puts a dash in the word on the row (kuroi). Because we trace from the end, the strings come out reversed and have to be flipped. The result is

--AR-AK = KA-RA--

IO-RU-K = K-UR-OI
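Here is a self-contained sketch of the whole procedure, fill plus traceback. Several cells in this example have tied options, and the post does not state how ties are broken; trying left first, then up, then diagonal is an assumption inferred from the result shown above. The function name `align` is hypothetical.

```python
from typing import Tuple


def align(rows_word: str, cols_word: str,
          match: int = 1, mismatch: int = -2, gap: int = -1) -> Tuple[str, str]:
    """Globally align two words and return them padded with dashes."""
    n, m = len(rows_word), len(cols_word)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = rows_word[i - 1] == cols_word[j - 1]
            table[i][j] = max(table[i - 1][j - 1] + (match if same else mismatch),
                              table[i - 1][j] + gap,
                              table[i][j - 1] + gap)

    # Traceback from the bottom-right cell; the strings are built in reverse.
    i, j = n, m
    out_rows, out_cols = [], []
    while i > 0 or j > 0:
        if j > 0 and table[i][j] == table[i][j - 1] + gap:      # step left
            out_rows.append("-")
            out_cols.append(cols_word[j - 1])
            j -= 1
        elif i > 0 and table[i][j] == table[i - 1][j] + gap:    # step up
            out_rows.append(rows_word[i - 1])
            out_cols.append("-")
            i -= 1
        else:                                                   # diagonal step
            out_rows.append(rows_word[i - 1])
            out_cols.append(cols_word[j - 1])
            i -= 1
            j -= 1
    return "".join(reversed(out_rows)), "".join(reversed(out_cols))


print(align("KARA", "KUROI"))
```

A different tie order would give a different alignment with the same total score, so the exact placement of dashes should not be over-interpreted.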

That looks way better! By doing this we made a pairwise matching. We will do this for every possible word pair. To finish, we need to convert each alignment into a score to find the best matches. A naive version would use (number of matches) / (length of the sequence) as the score. I found out later that some extra rules in matching can increase the quality.
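A sketch of the naive score, under the assumption that "length of the sequence" means the length of the aligned strings including dashes (the post does not spell this out). The function name `naive_score` is made up for illustration.

```python
def naive_score(aligned_a: str, aligned_b: str) -> float:
    """Fraction of alignment positions where both words carry the same letter."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return matches / len(aligned_a)


# The kara / kuroi alignment has 2 matching letters (K and R) in 7 positions.
print(naive_score("KA-RA--", "K-UR-OI"))
```

Dividing by the aligned length penalizes alignments that need many gaps; dividing by the shorter word's length instead would be more forgiving and is an equally plausible reading of the text.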

This approach led to quite good matches, but some similar words in Japanese and Turkish are borrowed from Western languages. To filter those out, I ran the same algorithm on the English-Turkish and English-Japanese pairs. Whenever a Japanese-Turkish pair matches, I discard it if the words also match the English word above a threshold (60%+). Doing that removed Western loanwords such as caffeine, catamaran, and tomography.
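The loanword filter can be sketched as below. Two things here are assumptions, not the post's method: the similarity comes from Python's standard `difflib.SequenceMatcher` as a stand-in for the alignment score, and a pair is dropped when either the Turkish or the Japanese word resembles the English one (the post does not say whether one side or both must match). The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (english, turkish, japanese romaji)


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; the post uses its own alignment score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_probable_loanword(english: str, turkish: str, japanese: str,
                         threshold: float = 0.6) -> bool:
    """Flag a pair when either word is close to the English entry."""
    return (similarity(turkish, english) >= threshold
            or similarity(japanese, english) >= threshold)


def drop_loanwords(triples: List[Triple], threshold: float = 0.6) -> List[Triple]:
    """Keep only the triples that do not look like Western borrowings."""
    return [t for t in triples if not is_probable_loanword(*t, threshold=threshold)]


candidates = [("caffeine", "kafein", "kafein"), ("dark", "koyu", "koi")]
print(drop_loanwords(candidates))
```

In this toy run the caffeine pair is removed while koyu / koi survives, since neither word resembles "dark".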

The only thing I believe is missing is the Chinese connection. Unfortunately, I have no knowledge of Chinese, so I can’t effectively eliminate the words with Chinese roots.

RESULTS

  • When you see a dash (-) at the end of a Turkish word, it is pronounced as “-mek” or “-mak”, meaning “to”. So yap means do and yap- (yapmak) means to do.
  • Only the words which have a Turkish root are considered. If a word has its root in Arabic, Persian (quite common), French, or Latin (less common), it is excluded.
  • The explanations below don’t verify or prove the connection. They only show a strong similarity according to the algorithm above, enough for me to remove the irrelevant ones.

şimdi (imdi) & 今 (ima)

Meaning: now

The word şimdi is made of two words, “” + “imdi”, which together translate to “just now”. The root “imdi” goes back to another root, “amtı” or “amdı”, which contains the shorter word “am”. The root “am” is not used in modern Turkish, at least not to say “now”, although some sources indicate it has been used in other Turkish dialects. The Japanese 今 (ima) matches semantically, and the two are phonetically very close.

First occurrence: Orkhon inscriptions (A.D. 735) - ilim amtı kanı (Where is my country now)

kanı & 感情 (kanjou)

Meaning: sentiment

The word kanı has the root “kan-”, meaning to believe; to satisfy. The Japanese 感 (kan) means feeling, sensation, or admiration. They are semantically and phonetically similar.

First occurrence: Türkische Turfantexte 1-9, Before A.D. 900 - közünürteki küsüşleri kandı ((his) desires in this world have been satisfied.)

koyu & 濃い (koi)

Meaning: dark or deep (colour)

This is a known similarity: the words are semantically equivalent and phonetically close. It is a strong match.

hun & 本 (hon)

Meaning: book

“hun” is not used for book in modern Turkish, and it was fairly hard to verify this Turkish word. In Japanese, it is one of the earliest words you will learn and use extensively. I could only find it in the dictionary I used and in one online source; that was all.

tanrı & 天父 (tenshu) | tanrısal & 天来 (tenrai)

Meaning: (1) god (2) divine

I assume there are other Asian languages that use similar sounds to name celestial powers. For example, in Chinese, it is 上帝 (Shàngdì). The original word in old Turkish is tengri, which has softened to tanrı. It is another strong match.

tomdaş & 友達 (tomodachi)

Meaning: friend

“tomdaş” is not present in modern Turkish; people say “arkadaş” or “yoldaş” instead. Some dictionaries list it as a reference. The Japanese word is in use in modern Japanese.

dost & 同志 (doushi), 同士 (doushi)

Meaning: close friend

This word is another good one, meaning close friend in both languages, but it might not be a correct match. The reason is that its known root is thought to be (old) Persian, which means Turkish people may have learned this word after they moved from East Asia. It is possible that this is an Avestan word, an old Persian language that dates back to A.D. 200-400.

First occurrence: Kutadgu Bilig, 1069 - öz asġı tiler dostḳa birme köŋül (Don’t give your heart to a friend who takes care of his/her own interests)

tartış- & 戦い (tatakai), 戦う (tatakau)

Meaning: to discuss/debate/conflict (Turkish), to fight or war (Japanese)

The Turkish word has the root “tart-”, meaning to measure. The Japanese word is used for battle, war, or to fight, to compete. The words are used in different contexts, although they are semantically close and phonetically similar. This is another mediocre link.

First occurrence: Kaşgarî, Divan-i Lugati’t-Türk, 1073 - ol meniŋ birle ya tartışdı (He discussed/competed with me on bow stretching)

Another occurrence: Kul Mes’ud, Kelile ve Dimne terc., Before A.D. 1347 - bu dartışık bunlar arasında ulaldı, savaşa başladılar. (The dispute between those, they started to fight)

terazi & 釣り合い (tsuriai)

Meaning: balance - scale

This word is used in daily Turkish. It is derived from “tart-” too.

First occurrence: Albert von Le Coq, Türkische Manichäica aus Chotscho, 1919 - “tanmış üzütüg tutupan tarazuk içinte olgurtur tiyür.”

art & 後 (ato)

Meaning: back - rear

It is used in daily Japanese and also in modern Turkish (art arda, ardıl, ardından).

First occurrence: “Divan-i Lugati’t-Türk” A.D. 1073

yansı- & 反射 (hansha)

Meaning: reflect

They are weakly connected but worth mentioning. The root is “yan-” or “yanıt”, meaning to return (back) and response. There is also the verb 射す (sasu) / 差す (sasu), which means to shine. What I found interesting is that another word, “yansıla-”, means to go against, while 反 (han) alone means anti. There are also words that use 反 (han) to express being against; for example, 反対 (hantai) means opposition. I believe this is open to discussion; again, weakly connected.

First occurrence: Kutadgu Bilig, A.D. 1069 - cevāb birse sözke yanut kılsa söz

kok- & 嗅ぐ (kagu) | koku & 芳香 (houkou)

Meaning: (1) to smell and (2) smell

The word koku is derived from “kok-”, which means to smell. The Japanese word 嗅ぐ (kagu) is also quite similar to it.

First occurrence: Divan-i Lugati’t-Türk, 1073 - et ḳōḳdı (the meat smelled)

koru- & 囲う (kakou)

Meaning: to protect (Turkish), to enclose; to surround (Japanese)

This is a mediocre match, although it is close enough to mention. As far as I understand, while the Japanese word is used more physically, the Turkish word is used in an abstract sense as well.

First occurrence: Orkhon inscriptions, A.D. 735 - “korıgu éki üç kişilegin tezip bardı” (the guard fled with two or three men)

korku & 恐怖 (kyoufu)

Meaning: fear

The word korku is derived from “kork-”, which means “to fear”. The root of “kork-” is given as “koru-” (to protect). Well, doesn’t the one who needs to be protected have something to fear? Surely a nice link. The Japanese 怖 (kowa) means frightening, 怖い (kowai) means scary, and 恐れる (osoreru) means to fear. None of them links up with “to protect”, so it is only a mediocre-to-weak connection.

First occurrence: Orkhon inscriptions, A.D. 735 - “öküş tiyin neke korkur biz” (Why are we afraid of their crowd?)

yabani & 野蛮 (yaban)

Meaning: savage, wild

A direct example, for which you can find other sources on the internet. Both mean savage and have the same phonetics. The word “yaban” has a synonym, “vahşi”, which also appears in the script.

First occurrence: Kutadgu Bilig, A.D. 1069 - yā vahşī bolup men biyābānda yügrü (or I will be a savage and walk in wilderness)

okur & 読者 (dokusha)

Meaning: reader

This word is used in modern Turkish. The root of it is “oku-”, meaning “to read”.

First occurrence: Albert von Le Coq, Türkische Manichäica aus Chotscho, 1919 (source before 900) - bu emig iki kata okıyu tegintim. (I tried to read this magic twice)

uy- & 応 (ou) | uygun & 応分 (oubun)

Meaning: to suit, to fit

Another pair that matches, although the meaning differs a little. In Japanese, 応 (ou) means agreement, while “uy-” means to fit. 応分 (oubun) means according to one’s abilities; appropriate; reasonable, which is a close match to “uygun”.

First occurrence: Şine-Usu[4][5] - ben Seleŋge keçe uḏu yorıdım. (I crossed the Selenge river and walked afterward)

soğu- & 寒 (samu), 冷める (sameru)

Meaning: cold, getting cold

This is another mediocre match. Both words are present in their respective languages. 冷や (hiya) means cold water, and there are some conjectures that “soğu-” has the root “su” (water) in Turkish.

First occurrence: Divan-i Lugati’t-Türk, 1073 - suw soġıdı (…) er soġundı.

ses & 声 (sei)

Meaning: voice

They are semantically the same, although the word “ses” may be derived from noise reflection.

First occurrence: Darir, Anternâme (1390) - “itler daχı ürmez ses sem yok”

buyruk & 武器 (buki)

Meaning: order (in Turkish) - weapon (in Japanese)

This is not a direct match, although the root of the Turkish word has an interesting match. “buyur-” is the root of the word, which means to order; to command. The Japanese “武” means the art of war; martial arts; military arts; military force. It is reasonable to think that in early ages, the one who held the weapon gave the orders. Well, both words may have survived as archaisms. Therefore I wanted to add the pair here.

First occurrence: Orkhon inscriptions (A.D. 735) - türgeş kaġan buyruḳı (Commander of Türgeş ruler)

doğuştan & 当然 (douzen)

Meaning: naturally, inborn

There is another word, 東洋 (touyou), which means east. “Doğu” means “east”, and this might have survived through the years. The part that doesn’t link up is that 東 is pronounced higashi or azuma instead of tou. The pair is a weak match, although further research would help to clarify.

First occurrence: Maitrisimit Nom Bitig, Before A.D. 1000 - uluġ tamularda tuġdımız … amtı bo kiçig tamularda tuġmış erür (We were born in big hells, now we’re born in these little hells)

yar- & 割る (waru), 割れる (wareru)

Meaning: to split

Another verb similarity. The word is used in modern Turkish. Both words are used with the same meaning, but they differ phonetically. It is a mediocre match.

First occurrence: Irk Bitig, Before A.D. 900 - adıġıŋ karnı yarılmiş, toŋuzuŋ azıgı sınmiş

Final Thoughts

Out of 10,000+ words, a couple of matches are expected, but the words mentioned above seem to be more than a coincidence. Who knows, maybe they are linked through Chinese, since early Turkish tribes lived close to the Chinese and Japanese has borrowed a lot of words from Chinese. In that case, both languages may have similar words without the communities ever encountering each other. Or maybe, at some point in history, those tribes lived close enough to each other to share a common ground between their languages.

Vocabulary is not the whole story: comparing grammar is essential to evaluate the closeness of two languages, but I already feel that I am walking a thin line, so it is better to stop here.

The effectiveness of the sequence-matching algorithm at surfacing close words is quite remarkable. After all, it was meant to be used for DNA matching but ended up in a completely different area. Hopefully, you enjoyed the read. Thank you for reading, see you next time.

Sources:

[1] Japanese meanings : https://jisho.org/

[2] Turkish etymology : https://www.nisanyansozluk.com

[3] Turkish dictionary : https://sozluk.gov.tr/

[4] Şine-Usu scripts : https://belleten.gov.tr/tam-metin/2556

[5] Zwei uigurische Runeninschriften in der Nord-Mongolei, aufgefunden und mit Transskription, Uebersetzung und Bemerkungen veröffentlicht von G.J. Ramstedt : This source is not available on the Internet, although Turkish articles mention this as a reference.

[6] Altaic: Rise and Fall of a Linguistic Hypothesis https://www.youtube.com/watch?v=z0zkHH6ZOEk

[7] Türkçe ve Japonca’nın akrabalığı https://www.amazon.com.tr/T%C3%9CRK%C3%87E-VE-JAPONCANIN-AKRABALI%C4%9EI-Kolektif/dp/9758839934

[8] What are the similarities between Turkish and Japanese culture? https://www.quora.com/What-are-the-similarities-between-Turkish-and-Japanese-culture