Ubuntu Python 2.4 CJKCodecs
#!/usr/bin/env python
# Python 2.4 script: round-trips every code point through each codec.
import sys

encodings = ('utf-7 utf-8 utf-16 utf-16-be utf-16-le '
             'euc-kr cp949 johab iso-2022-kr '
             'cp932 shift-jis shift-jisx0213 shift-jis-2004 euc-jp '
             'euc-jisx0213 euc-jis-2004 iso-2022-jp iso-2022-jp-1 '
             'iso-2022-jp-2 iso-2022-jp-3 iso-2022-jp-ext iso-2022-jp-2004 '
             'big5 cp950 '
             'gb2312 gbk big5hkscs hz gb18030')
encodings = encodings.split(' ')

def test(encoding, cause_segv=False):
    # Returns False if any code point raises UnicodeError on the round trip.
    try:
        if cause_segv:
            [x for x in [unichr(x) for x in xrange(0, 0x10ffff+1)]
             if x.encode(encoding, errors).decode(encoding, errors) == x]
        else:
            [x for x in [unichr(x) for x in xrange(0, 0x10ffff+1)]
             if x.encode(encoding, errors).decode(encoding) == x]
        return True
    except UnicodeError, exc:
        if not cause_segv:
            print exc
        return False

def consistency_test(encoding):
    # A character passes if it round-trips, or if the encoder gave up on it
    # (output equals echar: '' for ignore, '?' for replace).
    chars = [unichr(x) for x in xrange(0, 0x10ffff+1)]
    for char in chars:
        enc = char.encode(encoding, errors)
        dec = enc.decode(encoding)
        if dec != char and enc != echar:
            return False
    return True

def exc_test_run():
    print 'Unwanted exception test'
    for encoding in encodings:
        sys.stdout.write(encoding+': ')
        sys.stdout.flush()
        if test(encoding):
            if consistency_test(encoding):
                print 'OK'
            else:
                print 'No exceptions, but got x.encode(e).decode(e) != x'

def segv_test_run():
    print 'Segfault test'
    for encoding in encodings:
        sys.stdout.write(encoding+': ')
        sys.stdout.flush()
        test(encoding, True)
        print 'OK'

# errors and echar are module-level globals read by the functions above.
errors, echar = 'ignore', ''
print 'Error policy: ignore'
exc_test_run()
errors, echar = 'replace', '?'
print 'Error policy: replace'
exc_test_run()
segv_test_run()
errors, echar = 'ignore', ''
print 'Error policy: ignore'
segv_test_run()
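One of the Ubuntu bug reports uses the code above to show that Japanese codec support is misbehaving, so I changed the order slightly, as shown above, and tested every encoding that Python 2.4 supports.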
Error policy: ignore
Unwanted exception test
utf-7: 'utf7' codec can't decode bytes in position 0-3: code pairs are not supported
utf-8: OK
utf-16: 'utf16' codec can't decode bytes in position 2-3: unexpected end of data
utf-16-be: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data
utf-16-le: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data
euc-kr: OK
cp949: OK
johab: OK
iso-2022-kr: 'iso2022_kr' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
cp932: No exceptions, but got x.encode(e).decode(e) != x
shift-jis: No exceptions, but got x.encode(e).decode(e) != x
shift-jisx0213: OK
shift-jis-2004: OK
euc-jp: No exceptions, but got x.encode(e).decode(e) != x
euc-jisx0213: OK
euc-jis-2004: OK
iso-2022-jp: 'iso2022_jp' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-1: 'iso2022_jp_1' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-2: 'iso2022_jp_2' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-3: 'iso2022_jp_3' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-ext: 'iso2022_jp_ext' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-2004: 'iso2022_jp_2004' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
big5: OK
cp950: No exceptions, but got x.encode(e).decode(e) != x
gb2312: OK
gbk: No exceptions, but got x.encode(e).decode(e) != x
big5hkscs: OK
hz: 'hz' codec can't decode byte 0x7e in position 0: incomplete multibyte sequence
gb18030: Traceback (most recent call last):
  File "./test3.py", line 50, in ?
    exc_test_run()
  File "./test3.py", line 36, in exc_test_run
    if test(encoding):
  File "./test3.py", line 17, in test
    [x for x in [unichr(x) for x in xrange(0, 0x10ffff+1)]
RuntimeError: unicode mapping invalid
The results are as shown above, from Python 2.4.2 on Ubuntu Dapper. I am curious what the results look like on other distributions, and I would also like to ask why these problems occur.
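One plausible contributor to the "!=" lines for the Japanese codecs is that some of their mappings are not one-to-one. Below is a minimal sketch of how to look at a single character by hand. It is written for Python 3, not the Python 2.4 used in this thread, and `roundtrip` is a name made up here for illustration. The yen sign U+00A5 is used as the sample on the assumption that the shift_jis codec maps it to the byte 0x5c, which then decodes as a backslash.

```python
def roundtrip(ch, encoding, errors="strict"):
    """Encode one character and decode it back; return (bytes, decoded text)."""
    raw = ch.encode(encoding, errors)
    return raw, raw.decode(encoding, errors)


if __name__ == "__main__":
    # Assumption: U+00A5 YEN SIGN is accepted by shift_jis but does not
    # come back as the same character.
    raw, back = roundtrip("\u00a5", "shift_jis")
    print(repr(raw), repr(back), back == "\u00a5")

    # With errors="ignore", a character the codec cannot represent is
    # dropped, which is why the script compares against echar == ''.
    print(roundtrip("\uac00", "shift_jis", "ignore"))
```

If the decoded text differs from the input while the encoded bytes are non-empty, the test script in this thread reports exactly that case as "No exceptions, but got x.encode(e).decode(e) != x".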
On FreeBSD 7.0-current (Feb-25 2006), Python 2.4.2 gave the same results as the log above.
Since this is about Python, how about raising it on bbs.python.or.kr or in #perky?
As far as I know, he is also the CJK codec maintainer...
So ask Hye-Shik Chang (장혜식), who is a PSF member as well... :)
\(´∇`)ノ \(´∇`)ノ \(´∇`)ノ \(´∇`)ノ
def ed():neTdiVeR in range(thEeArTh)
The win32 Python 2.4.2 I used is not a UCS4 build, so I substituted the following code.
And I got the following results.
A few things I know:
- 토끼군
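Background on the "not a UCS4 build" remark: on a narrow (UCS2) build of Python 2, unichr() rejects values above 0xFFFF, and characters outside the BMP are stored as UTF-16 surrogate pairs. The sketch below is Python 3 and is not the replacement code mentioned in the post; `to_surrogate_pair` and `utf16_units` are hypothetical helpers that only show the arithmetic behind such a pair and compare it with what the standard utf-16-be codec produces.

```python
import struct


def to_surrogate_pair(cp):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def utf16_units(cp):
    """Ask the standard codec for the same two 16-bit units, for comparison."""
    return struct.unpack(">HH", chr(cp).encode("utf-16-be"))


if __name__ == "__main__":
    for cp in (0x10000, 0x1F600, 0x10FFFF):
        pair = to_surrogate_pair(cp)
        print(hex(cp), [hex(u) for u in pair], pair == utf16_units(cp))
```

A test loop on a narrow build would have to construct such pairs itself for everything above 0xFFFF, so its results for those code points are not necessarily comparable with a UCS4 build.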
So the test code I posted was not much good. I ran the test with 토끼군's modified code above and got the following results. In my case nothing segfaulted.
Error policy: ignore
Unwanted exception test
utf-7: x = 0xdc00: 'utf7' codec can't decode bytes in position 0-3: code pairs are not supported
utf-8: No exceptions, but got x.encode(e).decode(e) != x when x = 0x10000
utf-16: x = 0xd800: 'utf16' codec can't decode bytes in position 2-3: unexpected end of data
utf-16-be: x = 0xd800: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data
utf-16-le: x = 0xd800: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data
euc-kr: OK
cp949: OK
johab: OK
iso-2022-kr: x = 0x1b: 'iso2022_kr' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
cp932: No exceptions, but got x.encode(e).decode(e) != x when x = 0xa2
shift-jis: No exceptions, but got x.encode(e).decode(e) != x when x = 0xa5
shift-jisx0213: OK
shift-jis-2004: OK
euc-jp: No exceptions, but got x.encode(e).decode(e) != x when x = 0xa5
euc-jisx0213: OK
euc-jis-2004: OK
iso-2022-jp: x = 0x1b: 'iso2022_jp' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-1: x = 0x1b: 'iso2022_jp_1' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-2: x = 0x1b: 'iso2022_jp_2' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-3: x = 0x1b: 'iso2022_jp_3' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-ext: x = 0x1b: 'iso2022_jp_ext' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-2004: x = 0x1b: 'iso2022_jp_2004' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
big5: OK
cp950: No exceptions, but got x.encode(e).decode(e) != x when x = 0xa2
gb2312: OK
gbk: No exceptions, but got x.encode(e).decode(e) != x when x = 0x30fb
big5hkscs: OK
hz: x = 0x7e: 'hz' codec can't decode byte 0x7e in position 0: incomplete multibyte sequence
gb18030: No exceptions, but got x.encode(e).decode(e) != x when x = 0x30fb
Error policy: replace
Unwanted exception test
utf-7: x = 0xdc00: 'utf7' codec can't decode bytes in position 0-3: code pairs are not supported
utf-8: No exceptions, but got x.encode(e).decode(e) != x when x = 0x10000
utf-16: x = 0xd800: 'utf16' codec can't decode bytes in position 2-3: unexpected end of data
utf-16-be: x = 0xd800: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data
utf-16-le: x = 0xd800: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data
euc-kr: No exceptions, but got x.encode(e).decode(e) != x when x = 0x10000
cp949: No exceptions, but got x.encode(e).decode(e) != x when x = 0x10000
johab: No exceptions, but got x.encode(e).decode(e) != x when x = 0x10000
iso-2022-kr: x = 0x1b: 'iso2022_kr' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
cp932: No exceptions, but got x.encode(e).decode(e) != x when x = 0xa2
shift-jis: No exceptions, but got x.encode(e).decode(e) != x when x = 0xa5
shift-jisx0213: No exceptions, but got x.encode(e).decode(e) != x when x = 0x10000
shift-jis-2004: No exceptions, but got x.encode(e).decode(e) != x when x = 0x10000
euc-jp: No exceptions, but got x.encode(e).decode(e) != x when x = 0xa5
euc-jisx0213: No exceptions, but got x.encode(e).decode(e) != x when x = 0x10000
euc-jis-2004: No exceptions, but got x.encode(e).decode(e) != x when x = 0x10000
iso-2022-jp: x = 0x1b: 'iso2022_jp' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-1: x = 0x1b: 'iso2022_jp_1' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-2: x = 0x1b: 'iso2022_jp_2' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-3: x = 0x1b: 'iso2022_jp_3' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-ext: x = 0x1b: 'iso2022_jp_ext' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
iso-2022-jp-2004: x = 0x1b: 'iso2022_jp_2004' codec can't decode byte 0x1b in position 0: incomplete multibyte sequence
big5: No exceptions, but got x.encode(e).decode(e) != x when x = 0x10000
cp950: No exceptions, but got x.encode(e).decode(e) != x when x = 0xa2
gb2312: No exceptions, but got x.encode(e).decode(e) != x when x = 0x10000
gbk: No exceptions, but got x.encode(e).decode(e) != x when x = 0x30fb
big5hkscs: No exceptions, but got x.encode(e).decode(e) != x when x = 0x10000
hz: x = 0x7e: 'hz' codec can't decode byte 0x7e in position 0: incomplete multibyte sequence
gb18030: No exceptions, but got x.encode(e).decode(e) != x when x = 0x30fb
Segfault test
utf-7: OK
utf-8: OK
utf-16: OK
utf-16-be: OK
utf-16-le: OK
euc-kr: OK
cp949: OK
johab: OK
iso-2022-kr: OK
cp932: OK
shift-jis: OK
shift-jisx0213: OK
shift-jis-2004: OK
euc-jp: OK
euc-jisx0213: OK
euc-jis-2004: OK
iso-2022-jp: OK
iso-2022-jp-1: OK
iso-2022-jp-2: OK
iso-2022-jp-3: OK
iso-2022-jp-ext: OK
iso-2022-jp-2004: OK
big5: OK
cp950: OK
gb2312: OK
gbk: OK
big5hkscs: OK
hz: OK
gb18030: OK
Error policy: ignore
Segfault test
utf-7: OK
utf-8: OK
utf-16: OK
utf-16-be: OK
utf-16-le: OK
euc-kr: OK
cp949: OK
johab: OK
iso-2022-kr: OK
cp932: OK
shift-jis: OK
shift-jisx0213: OK
shift-jis-2004: OK
euc-jp: OK
euc-jisx0213: OK
euc-jis-2004: OK
iso-2022-jp: OK
iso-2022-jp-1: OK
iso-2022-jp-2: OK
iso-2022-jp-3: OK
iso-2022-jp-ext: OK
iso-2022-jp-2004: OK
big5: OK
cp950: OK
gb2312: OK
gbk: OK
big5hkscs: OK
hz: OK
gb18030: OK
----
I paint objects as I think them, not as I see them.
Ubuntu Dapper user / Ubuntu KoreanTeam / Launchpad karma 3940
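The "x = ..." and "when x = ..." parts of the log name the first code point that failed for each codec. A rough Python 3 sketch of a checker in that spirit follows. It is an assumption about the general approach, not 토끼군's code; `first_roundtrip_failure` and its `limit` parameter are names invented here. Lone surrogates (U+D800 to U+DFFF) are skipped because Python 3 does not treat them as encodable text, so the utf-7 and utf-16 failures in the log cannot be reproduced this way.

```python
def first_roundtrip_failure(encoding, errors="ignore", limit=0x110000):
    """Return (code_point, reason) for the first failure, or None if none is found."""
    # Bytes the encoder emits when it gives up on a character, mirroring the
    # original script's echar ('' for ignore, '?' for replace).
    echar = {"ignore": b"", "replace": b"?"}[errors]
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are skipped in this Python 3 sketch
        ch = chr(cp)
        try:
            raw = ch.encode(encoding, errors)
            back = raw.decode(encoding)  # strict decode, like the exception test
        except UnicodeError as exc:
            return cp, str(exc)
        if back != ch and raw != echar:
            return cp, "x.encode(e).decode(e) != x"
    return None


if __name__ == "__main__":
    # limit=0x10000 keeps the run short by staying inside the BMP.
    for name in ("utf-8", "euc-kr", "shift_jis"):
        print(name, first_roundtrip_failure(name, limit=0x10000))
```

Unlike the original script, this folds the exception test and the consistency test into one pass, so each codec is scanned only once.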