How to transliterate Cyrillic to Latin using Python 2.7? - not correct translation output -


I am trying to transliterate Cyrillic to Latin from an Excel file. I was working from the code at the bottom and cannot figure out why it isn't working.
When I try to translate a simple text string, Python outputs "eeeeee eee" instead of the correct translation. How can I fix this so that it gives me the right translation? I have been trying to figure this out all day!

    symbols = (u"абвгдеёзийклмнопрстуфхъыьэАБВГДЕЁЗИЙКЛМНОПРСТУФХЪЫЬЭ",
               u"abvgdeezijklmnoprstufh'y'eabvgdeezijklmnoprstufh'y'e")

    tr = {ord(a): ord(b) for a, b in zip(*symbols)}

    text = u'Добрый Ден'
    print text.translate(tr)

    >>eeeeee eee

I would appreciate any help!

Your source input is wrong. However you entered your source and text literals, Python did not read them as the right Unicode codepoints.

Instead, I suspect that the PYTHONIOENCODING variable has been set with the error handler set to replace. This causes Python to replace every codepoint it does not recognize with a question mark. All of the Cyrillic input is treated as not recognized.
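The effect of a replace error handler can be reproduced in isolation. The sketch below is only an analogy: it encodes the sample text to ASCII with errors set to 'replace', which is not necessarily the exact path the input took on the asker's machine. The variable names are made up for illustration.

```python
# The sample text, written with escapes so this file itself stays ASCII.
text = u'\u0414\u043e\u0431\u0440\u044b\u0439 \u0414\u0435\u043d'

# ASCII cannot represent Cyrillic; with the 'replace' handler each
# unencodable character becomes a question mark instead of raising.
replaced = text.encode('ascii', 'replace')
print(repr(replaced))

# Only two distinct byte values survive: 32 (space) and 63 (question mark).
code_points = sorted(set(bytearray(replaced)))
print(code_points)
```

Once the characters have been replaced this way, the original letters cannot be recovered, which is why the fix has to happen at the input stage.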

As a result, the only codepoint in the translation map is 63, the question mark, mapped to the last character in symbols[1] (which is the expected behaviour for a dictionary comprehension with only 1 unique key):

    >>> unichr(63)
    u'?'
    >>> unichr(101)
    u'e'

The same problem applies to the text unicode string; it too consists only of question marks. The translation mapping replaces each of them with the letter e:

    >>> u'?????? ???'.translate({63: 101})
    u'eeeeee eee'
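The collapse of the translation map can be simulated without any terminal problem at all. This is a minimal sketch that assumes every Cyrillic character had already been turned into a question mark before the comprehension ran; broken_cyrillic and broken_text are hypothetical names standing in for what Python actually received.

```python
# The Latin half of the mapping, exactly as in the question.
latin = u"abvgdeezijklmnoprstufh'y'eabvgdeezijklmnoprstufh'y'e"

# Stand-in for the Cyrillic half after every character was replaced by "?".
broken_cyrillic = u"?" * len(latin)

# Every pair has the same key (63), so each one overwrites the previous
# entry and only the last value survives: 101, the code point of "e".
tr = {ord(a): ord(b) for a, b in zip(broken_cyrillic, latin)}
print(tr)

# The text literal was damaged in the same way, so each "?" becomes "e".
broken_text = u"?????? ???"
result = broken_text.translate(tr)
print(result)
```

The space is left alone because 32 is not a key in the map, which matches the "eeeeee eee" output in the question.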

You need to either avoid entering Cyrillic characters as literals or fix your input method.

In a terminal, this is a function of the codec that your terminal (or Windows console) supports. Configure the correct codepage (Windows) or locale (POSIX systems) so that input and output use an encoding that supports Cyrillic (UTF-8 would be best).
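Before changing the codepage or locale, it can help to see what Python currently believes about its input and output. The following is a small diagnostic sketch; io_encoding_report is a hypothetical helper name, and the values it prints depend entirely on the environment it runs in.

```python
import os
import sys


def io_encoding_report():
    """Collect the settings that decide how console text is decoded."""
    return {
        # Unset (None) unless someone configured it explicitly.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        # May be None, for example when the stream is a pipe on Python 2.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


report = io_encoding_report()
for name in sorted(report):
    print("%s: %r" % (name, report[name]))
```

A PYTHONIOENCODING value ending in ":replace", or a stdin encoding that cannot represent Cyrillic, would be consistent with the diagnosis above.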

In a Python source file, tell Python which encoding is used for string literals with a codec comment at the top of the file.
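A sketch of what such a file could look like, assuming the editor really saves it as UTF-8 (the declaration must match the bytes on disk, and it has to appear on the first or second line). The mapping and sample text are taken from the question.

```python
# -*- coding: utf-8 -*-
# The line above declares the encoding of this source file, so the
# Cyrillic literals below are decoded to the intended code points.

symbols = (u"абвгдеёзийклмнопрстуфхъыьэАБВГДЕЁЗИЙКЛМНОПРСТУФХЪЫЬЭ",
           u"abvgdeezijklmnoprstufh'y'eabvgdeezijklmnoprstufh'y'e")

# One entry per Cyrillic code point, mapped to its Latin counterpart.
tr = {ord(a): ord(b) for a, b in zip(*symbols)}

text = u"Добрый Ден"
result = text.translate(tr)
print(result)
```

With the literals read correctly, the map holds one entry per distinct Cyrillic letter instead of a single entry for the question mark. Capitals come out lowercase because the Latin string in the question is all lowercase.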

Avoiding literals means using Unicode escape sequences:

    symbols = (
        u'\u0430\u0431\u0432\u0433\u0434\u0435\u0451\u0437\u0438\u0439\u043a\u043b\u043c'
        u'\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u044a\u044b\u044c\u044d'
        u'\u0410\u0411\u0412\u0413\u0414\u0415\u0401\u0417\u0418\u0419\u041a\u041b\u041c'
        u'\u041d\u041e\u041f\u0420\u0421\u0422\u0423\u0424\u0425\u042a\u042b\u042c\u042d',
        u"abvgdeezijklmnoprstufh'y'eabvgdeezijklmnoprstufh'y'e"
    )
    tr = {ord(a): ord(b) for a, b in zip(*symbols)}

    text = u'\u0414\u043e\u0431\u0440\u044b\u0439 \u0414\u0435\u043d'

    print text.translate(tr)
