
Python Unicode Casting on Variable Bug

user2287463 Published in 2018-02-13 22:41:13Z

I've run into this weird Python 2 behavior related to unicode and variables:

>>> u"\u2730".encode('utf-8').encode('hex')
'e29cb0'

This is the kind of result I need, but I want to control the first part (u"\u2730") dynamically.

>>> type(u"\u2027")
<type 'unicode'>

Good, so the first part is cast as unicode. Now declaring a string variable and casting it to unicode:

>>> a='20'
>>> b='27'
>>> myvar='\u'+a+b.decode('utf-8')
>>> type(myvar)
<type 'unicode'>
>>> print myvar
\u2027

It seems that now I can use the variable in my original code, right?

>>> myvar.encode('utf-8').encode('hex')
'5c7532303237'

The result, as you can see, is not the original one. It seems that Python is treating 'myvar' as a string instead of unicode. Am I missing something?

Anyway, my final goal is to loop over Unicode from \u0000 to \uFFFF, cast each character to a string, and cast that string to hex. Is there an easy way?

juanpa.arrivillaga Reply to 2018-02-14 07:35:49Z

You are confusing the Unicode escape sequence with the literal characters \ and u. It's like confusing r"\n" (or "\\n") with an actual newline. You want to decode the str with 'unicode_escape':

>>> a='20'
>>> b='27'
>>> myvar='\u'+a+b.decode('utf-8')
>>> myvar
u'\\u2027'
>>> myvar.decode('unicode_escape')
u'\u2027'
>>> print(myvar.decode('unicode_escape'))
‧
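
To make the distinction concrete, here is a small Python 3 sketch (the variable names are illustrative and not part of the answer above). It builds the six-character text \u2027 from its parts, shows that its hex dump is just the ASCII bytes of those six characters, and then interprets the escape to get the single character U+2027.

```python
# Python 3 sketch; the names `escaped` and `decoded` are illustrative.
a = '20'
b = '27'

# Six separate characters: backslash, 'u', '2', '0', '2', '7'.
escaped = '\\u' + a + b

# One character, U+2027, produced by interpreting the escape sequence.
decoded = escaped.encode('ascii').decode('unicode_escape')

print(len(escaped), escaped.encode('utf-8').hex())  # 6 5c7532303237
print(len(decoded), decoded.encode('utf-8').hex())  # 1 e280a7
```

The first hex string is what the question's myvar produced; the second is what the literal u"\u2027" would give.
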
Mark Tolonen Reply to 2018-02-14 05:17:19Z

unichr() in Python 2 or chr() in Python 3 is the way to construct a character from a number. \uxxxx escape codes are only interpreted when typed directly in a string literal in code.

Python 2:

>>> a='20'
>>> b='27'
>>> unichr(int(a+b,16))
u'\u2027'

Python 3:

>>> a='20'
>>> b='27'
>>> chr(int(a+b,16))
'‧'
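
For the loop from \u0000 to \uFFFF asked about in the question, a minimal Python 3 sketch could look like the following. The helper name utf8_hex_table is made up for illustration, and skipping the surrogate range is an assumption about what is wanted: those code points are not characters, and Python 3 refuses to encode them as UTF-8.

```python
def utf8_hex_table(start=0x0000, stop=0xFFFF):
    """Yield (code point, UTF-8 hex string) pairs for start..stop inclusive.

    Hypothetical helper for illustration only.
    """
    for cp in range(start, stop + 1):
        # Surrogates (U+D800..U+DFFF) cannot be encoded as UTF-8
        # in Python 3, so they are skipped here.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        yield cp, chr(cp).encode('utf-8').hex()


table = dict(utf8_hex_table())
print(table[0x2027])  # e280a7
print(table[0x2730])  # e29cb0
```

In Python 2 the per-character step would presumably be unichr(cp).encode('utf-8').encode('hex'), matching the expression used in the question.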