This patch replaces the previous broken approach to TOC string decoding
that used `.encode().decode('unicode_escape')` with proper parsing of
the escape sequences cdrdao is known to generate.
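The failure mode of the old approach can be reproduced in a few lines. This is an illustrative sketch rather than code taken from whipper, and the sample string is an assumption chosen because it contains a single non-ASCII character.

```python
# Illustration only: why .encode().decode('unicode_escape') garbles text.
title = "M\u00d8L"  # "MØL"

# str.encode() yields UTF-8 bytes: the 'Ø' becomes the two bytes 0xC3 0x98.
utf8_bytes = title.encode()

# 'unicode_escape' treats every non-escape byte as a Latin-1 character,
# so the two UTF-8 bytes turn into two separate characters.
garbled = utf8_bytes.decode("unicode_escape")

print(repr(garbled))
```

The round trip only preserves pure ASCII input, which is why non-ASCII CD-Text came out garbled.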
The new parser is also lenient with invalid escape sequences, which can
occur due to improper escaping in cdrdao. See:
https://github.com/cdrdao/cdrdao/issues/32
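A minimal sketch of what such a lenient parser could look like follows. It is not whipper's actual implementation: the function name `unescape_toc_string` is hypothetical, and the set of escape forms handled here (an escaped backslash, an escaped double quote, and a three-digit octal byte value) is an assumption about cdrdao's output. Octal values are mapped to the Latin-1 character with the same code point.

```python
import re

# Assumed escape forms: \\  \"  and \ooo (three octal digits, value <= 0o377).
_ESCAPE_RE = re.compile(r'\\([0-3][0-7]{2}|[\\"])')


def unescape_toc_string(raw):
    """Decode known escapes; leave anything unrecognised untouched."""
    def replace(match):
        token = match.group(1)
        if len(token) == 3:
            # Octal byte value, interpreted as a Latin-1 code point.
            return chr(int(token, 8))
        # Escaped backslash or double quote: drop the leading backslash.
        return token

    return _ESCAPE_RE.sub(replace, raw)


print(unescape_toc_string('M\\330L'))
```

Because unrecognised sequences simply fail to match the pattern, they pass through unchanged instead of raising an error, which is one way to get the leniency described above.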
Latin-1:
This new parsing method should work for Latin-1 strings for both old and
new versions of cdrdao, as long as those strings don't trigger the
improper escaping issues in upstream cdrdao.
This has been verified with the album Diorama from the Danish black
metal band MØL.
MS-JIS:
This new parsing method should also work for MS-JIS strings as long as
the .toc file was generated by cdrdao 1.2.5+ and the strings don't
trigger improper escaping issues in upstream cdrdao.
Unfortunately, I don't have any CD with CD-Text in MS-JIS, so I could
not verify this.
cdrdao versions before 1.2.5 will still cause whipper to produce
mojibake (garbled characters) when reading MS-JIS CD-Text, as those
versions do not encode strings in UTF-8.
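To illustrate how this kind of mojibake arises, the sketch below decodes MS-JIS bytes with the wrong codec. It assumes that Python's `cp932` codec is an adequate stand-in for MS-JIS and that the wrong decoding step is Latin-1; the sample text is made up and whipper's real code path may differ.

```python
# Illustration only: MS-JIS bytes read with the wrong codec.
original = "\u30c6\u30b9\u30c8"  # "テスト" ("test" in katakana)

# Encode as MS-JIS; each of these characters takes two bytes.
raw = original.encode("cp932")

# Reading those bytes as Latin-1 never fails, but yields unrelated characters.
garbled = raw.decode("latin-1")

# The damage is reversible only if the original encoding is known.
recovered = garbled.encode("latin-1").decode("cp932")

print(repr(garbled))
```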
Other encodings:
As far as I know, CD-Text officially supports only ASCII, Latin-1 and
MS-JIS, but I wouldn't be surprised if there are unofficial encodings
out there, given the strange strings I've seen in some bug reports.
If you have a CD with garbled CD-Text, please submit a bug report
indicating the performer, album name and language, and attach the .toc
file so that the produced strings can be compared to the expected text.
Fixes https://github.com/whipper-team/whipper/issues/169
Signed-off-by: Alicia Boya García <ntrrgc@gmail.com>
- Removed unused code not portable due to buffer() use
- raw_input() does not exist in Python 3
- Fixed octal constant syntax for Python 3
- Fixed TypeError
- Replaced `if not exists: makedirs(path)` with a single call to `makedirs(path, exist_ok=True)`
- Removed the explicit `object` base class, which is redundant in Python 3: pylint's useless-object-inheritance (R0205) check
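The Python 3 idioms listed above can be sketched together as follows. All names here (`Example`, `ensure_dir`, `ask`) are hypothetical and chosen for illustration; they are not identifiers from whipper.

```python
import os


class Example:
    """In Python 3 every class is new-style; no explicit (object) base."""


def ensure_dir(path):
    # Python 3 octal literals need the 0o prefix (0755 is a syntax error).
    # exist_ok=True replaces the "if not exists: makedirs(path)" pattern.
    os.makedirs(path, mode=0o755, exist_ok=True)


def ask(prompt):
    # input() is the Python 3 replacement for raw_input().
    return input(prompt)
```

Calling `ensure_dir` twice on the same path is safe, since an already existing directory no longer raises an error.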
Signed-off-by: JoeLametta <JoeLametta@users.noreply.github.com>
This commit also includes:
- whitespace / code formatting fixes
- minor syntax-related changes: `except <exception_name>, e` -> `except <exception_name> as e`
- 3 instances of pointless instructions have been rewritten [sorted] (spotted by a semi-automatic check)
The unrelated changes shouldn't have any real impact on whipper's behaviour.
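For reference, the exception syntax change looks like this in practice. The function below is a hypothetical example, not code from whipper.

```python
def parse_track_number(text):
    """Return the track number as an int, or None if it is not numeric."""
    try:
        return int(text)
    except ValueError as e:  # Python 2 only form: "except ValueError, e:"
        print("invalid track number: %s" % e)
        return None
```

The `as` form works in both Python 2.6+ and Python 3, whereas the comma form is a syntax error in Python 3.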
Some of this seems to be debug code which has been left in, some of it
seems to just be old code that was commented out and never put back in
and probably just forgotten about. Either way, we use git for a reason,
so there's no need for these code snippets to stick around. The code
history can be inspected and old code retrieved that way.