• character-encoding
  • utf-8
  • unicode

UTF-8 is strict: text stored in another encoding is rarely a valid UTF-8 byte sequence, so it usually cannot be decoded as UTF-8 by mistake. This is a feature, not a bug, because it forces you to know which encoding you are actually handling, and that avoids a number of pesky bugs further downstream in your processing pipeline.
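As a minimal sketch of this strictness, the snippet below encodes a sample string (chosen here purely for illustration) as Latin-1 and then tries to read the bytes back as UTF-8. The variable names are made up for the example.

```python
# "café" in Latin-1 is b'caf\xe9'; the lone \xe9 byte is not valid UTF-8 here.
latin1_bytes = "café".encode("latin-1")

error = None
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(f"cannot decode byte at offset {exc.start}: {exc.reason}")

# By contrast, Latin-1 maps every byte to some character, so decoding with it
# never fails, even when the result is wrong for the data at hand.
always_decodes = b"\xff".decode("latin-1")
```

Note that the second decode succeeding says nothing about whether Latin-1 is the right encoding; it only shows that a permissive codec hides the problem that UTF-8 reports.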

Typically, the OP simply wants Python not to choke on a stray \xff byte some megabytes into the file, instead of actually understanding what this error means.

However, the proper solution is to investigate the actual data and then decide what to do: decode with a different encoding in your Python code, fix the file manually, ask the person who created the file, revert a previous incorrect encoding step (mojibake), or, in the worst case, discard the data which cannot be processed without knowing what it is supposed to represent.
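A rough sketch of what such an investigation could look like in Python follows. The helper names `describe_decode_error` and `try_encodings`, the candidate encodings, and the sample bytes are all hypothetical and chosen only for illustration; the right candidates depend on where your data came from.

```python
def describe_decode_error(data: bytes, encoding: str = "utf-8", context: int = 20):
    """Return None if data decodes cleanly, else (offset, surrounding bytes)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        start = max(exc.start - context, 0)
        return exc.start, data[start:exc.start + context]
    return None


def try_encodings(data: bytes, candidates=("utf-8", "cp1252", "latin-1")):
    """Decode with each candidate; map the name to the text, or None on failure."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical sample: bytes that were written as cp1252, not UTF-8.
sample = b"price: 10\xa3 (caf\xe9)"
problem = describe_decode_error(sample)   # where does UTF-8 decoding fail?
guesses = try_encodings(sample)           # inspect these by eye

# Reverting mojibake: UTF-8 bytes that were wrongly decoded as cp1252
# can often be recovered by undoing that exact step.
garbled = "café".encode("utf-8").decode("cp1252")   # 'cafÃ©'
repaired = garbled.encode("cp1252").decode("utf-8")
```

A candidate that decodes without an error is not necessarily correct, so the output still needs a human to check that the text makes sense. In a real file you would read the bytes with `open(path, "rb")` and pass them to these helpers.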