
Python: What does the error UnicodeDecodeError: 'charmap' codec can't decode byte XXX in position XXX mean?

The error UnicodeDecodeError means that you, or some code you call, tried to convert bytes to a string using the wrong encoding.

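Python usually uses UTF-8 to read text from a file or from the network (for files, open() actually defaults to the locale's preferred encoding, which on Windows is often a code page such as cp1252; that is where the 'charmap' in the message comes from)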

fh = open(...)
text = fh.read()
r = requests.get(...)
text = r.text  # r.content would give the raw bytes instead

but sometimes a file or web page keeps its text in a different encoding. On Windows you can sometimes get text in Latin (alias ISO-8859) or a similar encoding. Windows can also use CP1250 (alias Windows-1250) for filenames on disk. Old web pages created on Windows also used Latin (alias ISO-8859). And sometimes a web page declares that it uses UTF-8, but someone pasted in text from a file that used a different encoding (e.g. Latin).

In that case you need to give the right encoding explicitly:
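To see why the encoding matters even when no error is raised, here is a minimal sketch that decodes the same byte with three encodings. The sample character is my own choice for illustration; it is not taken from any particular file.

```python
# One Polish letter saved as Latin2 (ISO-8859-2) becomes a single byte.
data = "ą".encode("latin2")   # b'\xb1'

# Decode that same byte with three different encodings.
decoded = {}
for name in ("latin2", "latin1", "cp1250"):
    decoded[name] = data.decode(name)
    print(name, repr(decoded[name]))
```

Only latin2 gives back the original letter. The other two decode without any error but give a different character, so a missing exception doesn't prove that the encoding was right.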

fh = open(..., encoding='latin-1')
text = fh.read()
r = requests.get(...)
r.encoding = 'latin-1'  # set it before reading r.text
text = r.text
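A small runnable sketch of the file case. The file name legacy.txt and its content are made up for this example, and the file is created in a temporary folder only so the code can run anywhere. It also shows the errors argument of open(), which can be used when you don't know the encoding and can accept some damaged characters.

```python
import os
import tempfile

# Hypothetical file, created here only to make the sketch runnable.
path = os.path.join(tempfile.mkdtemp(), "legacy.txt")
with open(path, "w", encoding="latin-1") as fh:
    fh.write("café")

# Explicit encoding: bytes are decoded the same way they were written.
with open(path, encoding="latin-1") as fh:
    text = fh.read()

# Wrong encoding, but errors="replace" puts U+FFFD in place of bytes
# that can't be decoded instead of raising UnicodeDecodeError.
with open(path, encoding="utf-8", errors="replace") as fh:
    lossy = fh.read()

print(repr(text))
print(repr(lossy))
```

Using errors="replace" loses information, so it is more of a workaround for quick inspection than a real fix.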

But first you have to work out which encoding you need. If the error shows a byte value, as in can't decode byte 0x81, you can try decoding that byte with different encodings

# these raise UnicodeDecodeError
b'\x81'.decode('utf8')    # alias `utf-8`
b'\x81'.decode('cp1250')  # alias `windows-1250` (`cp` means `CodePage`)

# these decode without error (to the control character '\x81')
b'\x81'.decode('latin')   #  alias `iso8859,   iso-8859`
b'\x81'.decode('latin1')  #  alias `iso8859_1, iso-8859-1, L1`
b'\x81'.decode('latin2')  #  alias `iso8859_2, iso-8859-2, L2`

b'\x81'.decode('iso8859')
b'\x81'.decode('iso8859-1')
b'\x81'.decode('iso8859-2')
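The same check can be done in a loop. This is only a sketch: try_encodings is a helper name I made up, and the list of candidates is an assumption you should adjust to the encodings you suspect.

```python
def try_encodings(data, candidates):
    """Return {encoding: decoded text, or None if decoding failed}."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


outcome = try_encodings(b'\x81', ['utf-8', 'cp1250', 'latin1', 'latin2'])
for name, text in outcome.items():
    print(name, 'failed' if text is None else repr(text))
```

An encoding that decodes without error is only a candidate. You still have to look at the decoded text and check that it makes sense.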

A list of all encodings can be found in the Python documentation: codecs, Standard Encodings.

There is also the module chardet (https://chardet.readthedocs.io/en/latest/usage.html), which takes bytes (text before decoding) and tries to recognize which encoding they are in. The module requests uses this module (newer versions use the similar charset_normalizer), but sometimes even this doesn't help. It can pick the wrong encoding because it follows the official standards rather than what is actually common on the internet; for example, for a text/* response without a declared charset it falls back to ISO-8859-1.

An example which saves Polish native characters to a file using the Latin2 encoding, then reads the file back as UTF-8 and gets UnicodeDecodeError

# save Polish native chars as Latin2 (Central Europe)
fh = open('output.txt', 'w', encoding='latin2')
fh.write("ęóąśłżźćń")
fh.close()

# load text as UTF-8 (given explicitly, because the default depends on the system)
fh = open('output.txt', encoding='utf-8')
text = fh.read()
fh.close()

# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 0: invalid continuation byte

An example which reads the file in bytes mode and uses chardet to try to recognize the encoding and decode the text

# save Polish native chars as Latin2 (Central Europe)
fh = open('output.txt', 'w', encoding='latin2')
fh.write("ęóąśłżźćń")
fh.close()

import chardet

fh = open('output.txt', "rb")
data = fh.read()  # raw bytes, not decoded yet
fh.close()

result = chardet.detect(data)
print(result)

# {'encoding': 'IBM855', 'confidence': 0.2844332309487475, 'language': 'Russian'}

text = data.decode(result['encoding'])
print(text)

Result

{'encoding': 'IBM855', 'confidence': 0.2844332309487475, 'language': 'Russian'}

Жз▒Х│┐╝Ты

It gives the wrong result, but for a longer text it might give a better one.
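One way to protect against such a weak guess is to check the confidence value before using it. This is only a sketch: pick_encoding is a made-up helper, the 0.5 threshold is an arbitrary assumption, and the fallback latin2 is used only because we know how the file above was saved. The dictionary is the result printed above, so the code runs without chardet installed.

```python
def pick_encoding(result, fallback, min_confidence=0.5):
    """Use the detected encoding only if the detector was confident enough."""
    encoding = result.get('encoding')
    confidence = result.get('confidence') or 0.0
    if encoding is None or confidence < min_confidence:
        return fallback
    return encoding


# Same bytes as in output.txt, and the guess reported for them above.
data = "ęóąśłżźćń".encode('latin2')
guess = {'encoding': 'IBM855', 'confidence': 0.2844332309487475, 'language': 'Russian'}

encoding = pick_encoding(guess, fallback='latin2')
print(encoding, data.decode(encoding))
```

With a confidence of about 0.28 the guess is rejected and the fallback is used, so the text is decoded correctly.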


There is another situation that can produce a similar error. It can happen when we print text on a console that uses an encoding other than UTF-8 and cannot display some characters (in that case Python reports the closely related UnicodeEncodeError). This used to be common with the Windows console, which used Latin or CP1250. Fixing it required some changes in the Windows registry.

The problem can also appear when a script's output is redirected, or when it runs without a console, because then Python can't find out which encoding the console uses. Then the output encoding may also need to be set explicitly.
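The console problem can be simulated without a real console. In this sketch a text stream with a chosen encoding stands in for the console. write_like_console is a made-up helper, and cp1252 is only an assumed example of a console code page that lacks most Polish letters.

```python
import io


def write_like_console(text, encoding, errors="strict"):
    """Write text through a stream with the given encoding; return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


try:
    write_like_console("ęóąśłżźćń", "cp1252")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print("cannot print:", exc.reason)

# errors="replace" avoids the exception but loses the characters.
lossy = write_like_console("ęóąśłżźćń", "cp1252", errors="replace")

# An encoding that covers all characters works without loss.
utf8 = write_like_console("ęóąśłżźćń", "utf-8")
```

On a real console, sys.stdout.encoding shows which encoding is in use, and sys.stdout.reconfigure(encoding='utf-8') (Python 3.7+) or the environment variable PYTHONIOENCODING can change it.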


Notes:

PyPNG struct.error

While saving to a PNG file with PyPNG I ran into the error struct.error.

bitdepth=16 lets you store values from 0 to 65535 (that is, (256*256)-1)

If a value is greater than 65535, this message appears:

struct.error: 'H' format requires 0 <= number <= 65535

If a value is less than …
