Skip to the content.

Languages/Scripts supported in different versions of Tesseract

Languages

LangCode Language 3.02 3.04 4.00 4.0.0 4.0.0 4.0.0
        Nov. 2016 tessdata tessdata_best tessdata_fast
               
afr Afrikaans x x x x x x
amh Amharic   x x x x x
ara Arabic x x x x x x
asm Assamese   x x x x x
aze Azerbaijani   x x x x x
aze_cyrl Azerbaijani - Cyrilic x x x x x x
bel Belarusian x x x x x x
ben Bengali x x x x x x
bod Tibetan   x x x x x
bos Bosnian   x x x x x
bre Breton     x x x x
bul Bulgarian x x x x x x
cat Catalan; Valencian x x x x x x
ceb Cebuano   x x x x x
ces Czech x x x x x x
chi_sim Chinese - Simplified x x x x x x
chi_tra Chinese - Traditional x x x x x x
chr Cherokee x x x x x x
cos Corsican       x x x
cym Welsh   x x x x x
dan Danish x x x x x x
dan_frak Danish - Fraktur (contrib) x x        
deu German x x x x x x
deu_frak German - Fraktur (contrib) x x        
deu_latf German (Fraktur Latin)     x x x x
dzo Dzongkha   x x x x x
ell Greek, Modern (1453-) x x x x x x
eng English x x x x x x
enm English, Middle (1100-1500) x x x x x x
epo Esperanto x x x x x x
equ Math / equation detection module x x   x x x
est Estonian x x x x x x
eus Basque x x x x x x
fao Faroese       x x x
fas Persian   x x x x x
fil Filipino (old - Tagalog)       x x x
fin Finnish x x x x x x
fra French x x x x x x
frk German - Fraktur (now deu_latf) x x x x x x
frm French, Middle (ca.1400-1600) x x x x x x
fry Western Frisian       x x x
gla Scottish Gaelic       x x x
gle Irish   x x x x x
glg Galician x x x x x x
grc Greek, Ancient (to 1453) (contrib) x x x x x x
guj Gujarati   x x x x x
hat Haitian; Haitian Creole   x x x x x
heb Hebrew x x x x x x
hin Hindi x x x x x x
hrv Croatian x x x x x x
hun Hungarian x x x x x x
hye Armenian       x x x
iku Inuktitut   x x x x x
ind Indonesian x x x x x x
isl Icelandic x x x x x x
ita Italian x x x x x x
ita_old Italian - Old x x x x x x
jav Javanese   x x x x x
jpn Japanese x x x x x x
kan Kannada x x x x x x
kat Georgian   x x x x x
kat_old Georgian - Old   x x x x x
kaz Kazakh   x x x x x
khm Central Khmer   x x x x x
kir Kirghiz; Kyrgyz   x x x x x
kmr Kurmanji (Kurdish - Latin Script)     x x x x
kor Korean x x x x x x
kor_vert Korean (vertical)     x x x x
kur Kurdish (Arabic Script)   x        
lao Lao   x x x x x
lat Latin   x x x x x
lav Latvian x x x x x x
lit Lithuanian x x x x x x
ltz Luxembourgish     x x x x
mal Malayalam x x x x x x
mar Marathi   x x x x x
mkd Macedonian x x x x x x
mlt Maltese x x x x x x
mon Mongolian     x x x x
mri Maori     x x x x
msa Malay x x x x x x
mya Burmese   x x x x x
nep Nepali   x x x x x
nld Dutch; Flemish x x x x x x
nor Norwegian x   x x x x
oci Occitan (post 1500)   x x x x x
ori Oriya   x x x x x
osd Orientation and script detection module x x x x x x
pan Panjabi; Punjabi   x x x x x
pol Polish x x x x x x
por Portuguese x x x x x x
pus Pushto; Pashto   x x x x x
que Quechua     x x x x
ron Romanian; Moldavian; Moldovan x x x x x x
rus Russian x x x x x x
san Sanskrit   x x x x x
sin Sinhala; Sinhalese   x x x x x
slk Slovak x x x x x x
slk_frak Slovak - Fraktur (contrib) x x        
slv Slovenian x x x x x x
snd Sindhi     x x x x
spa Spanish; Castilian x x x x x x
spa_old Spanish; Castilian - Old x x x x x x
sqi Albanian x x x x x x
srp Serbian x x x x x x
srp_latn Serbian - Latin   x x x x x
sun Sundanese     x x x x
swa Swahili x x x x x x
swe Swedish x x x x x x
syr Syriac   x x x x x
tam Tamil x x x x x x
tat Tatar     x x x x
tel Telugu x x x x x x
tgk Tajik   x x x x x
tgl Tagalog (new - Filipino) x x x      
tha Thai x x x x x x
tir Tigrinya   x x x x x
ton Tonga     x x x x
tur Turkish x x x x x x
uig Uighur; Uyghur   x x x x x
ukr Ukrainian x x x x x x
urd Urdu   x x x x x
uzb Uzbek   x x x x x
uzb_cyrl Uzbek - Cyrilic   x x x x x
vie Vietnamese x x x x x x
yid Yiddish   x x x x x
yor Yoruba     x x x x

Scripts

  Script 3.02 3.04 4.00 4.0.0 4.0.0 4.0.0
        Nov 2016 tessdata tessdata_best tessdata_fast
arab Arabic       x x x
armn Armenian       x x x
beng Bengali       x x x
cans Canadian_Aboriginal       x x x
cher Cherokee       x x x
cyrl Cyrillic       x x x
deva Devanagari       x x x
ethi Ethiopic       x x x
frak Fraktur       x x x
geor Georgian       x x x
grek Greek       x x x
gujr Gujarati       x x x
guru Gurmukhi       x x x
hans HanS (Han simplified)       x x x
hans-vert HanS_vert (Han simplified vertical)       x x x
hant HanT (Han traditional)       x x x
hant-vert HanT_vert (Han traditional vertical)       x x x
hang Hangul       x x x
hang-vert Hangul_vert (Hangul vertical)       x x x
hebr Hebrew       x x x
jpan Japanese       x x x
jpan-vert Japanese_vert (Japanese vertical)       x x x
knda Kannada       x x x
khmr Khmer       x x x
laoo Lao       x x x
latn Latin       x x x
mlym Malayalam       x x x
mymr Myanmar       x x x
orya Oriya(Odia)       x x x
sinh Sinhala       x x x
syrc Syriac       x x x
taml Tamil       x x x
telu Telugu       x x x
thaa Thaana       x x x
thai Thai       x x x
tibt Tibetan       x x x
viet Vietnamese       x x x

For detalls about the languages that each Script.traindata file supports, see the files that end with langs.txt (e.g. Latin.langs.txt) here.