SB-EXTERNAL-FORMATS

This is noodling towards a new external format implementation in SBCL.
Currently it lives right here, outside the SBCL source tree. I'm not
working on this right now, so if someone wants to run with it, they
should feel free. Patches that move things along are welcome.

* Tasks
  - UTF-16 error handling
  - UTF-16 surrogate policy:
    Allow writing valid surrogate pairs. Signal an error only for
    invalid pairs -- but allow writing them with a restart. (Since we
    allow surrogates as lisp-characters, it would seem silly to
    complain about writing a valid pair of them. This means reading them
    back will canonicalize the representation, but that seems sane.)
  - UTF-8 surrogate policy
  - Missing encodings:
    - CP 932, CP 936, CP 949, CP 950
    - EBCDIC-US
    - EUC-JP
    - GBK
    - SHIFT_JIS
    - UCS-2
    - UCS-4
    - UTF-32
    - UTF-8b
    - X-MAC-CYRILLIC
  - Document ENCODE-STRING
  - Add :NULL-TERMINATE to ENCODE-STRING
  - Implement ENCODE-STRING-INTO
  - Implement DECODE-OCTETS-INTO
  - Make DECODE-OCTETS work on null-terminated buffers of unknown size
  - Proper test suite: not sure whether e.g. DECODED-LENGTH and
    ENCODED-LENGTH currently work
  - Sort out BOM API:
    - Consider (OPEN ... :EXTERNAL-FORMAT :UTF-16), which should check
      for a BOM immediately on opening; later reads should not check again.
    - Consider decoding parts of a binary file -- BOM might need to be
      checked multiple times.
    - How to best support writing a BOM? Maybe just WRITE-BOM?

  - Documentation strings for encodings should say what languages / areas they
    are appropriate for.
  - Name external-format objects using the user-supplied name and, unless it
    is the canonical name, print the canonical name in parentheses when
    describing the format.
  - Naming convention: case-insensitive strings?
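
  The UTF-16 surrogate policy from the task list could look roughly like
  the sketch below. It is only an illustration, not code from this
  library: LONE-SURROGATE, ENCODE-UTF-16LE and the helper predicates are
  hypothetical names, and the input is a vector of code points rather
  than a string so that the sketch does not depend on whether the
  implementation allows surrogate characters. Valid pairs are written
  unchanged; an unpaired surrogate signals an error, with restarts that
  let the caller write it anyway or substitute another code unit.

```lisp
;; Sketch only: all names here are hypothetical, not part of the library.
(define-condition lone-surrogate (error)
  ((code :initarg :code :reader lone-surrogate-code)
   (index :initarg :index :reader lone-surrogate-index))
  (:report (lambda (c stream)
             (format stream "Unpaired surrogate #x~4,'0X at index ~D."
                     (lone-surrogate-code c) (lone-surrogate-index c)))))

(defun high-surrogate-p (code) (<= #xD800 code #xDBFF))
(defun low-surrogate-p (code) (<= #xDC00 code #xDFFF))

(defun encode-utf-16le (codes)
  "Encode the vector CODES of code points as little-endian UTF-16 octets."
  (let ((out (make-array 0 :element-type '(unsigned-byte 8)
                           :adjustable t :fill-pointer 0))
        (i 0)
        (n (length codes)))
    (flet ((emit (unit)
             ;; Low octet first, then high octet.
             (vector-push-extend (ldb (byte 8 0) unit) out)
             (vector-push-extend (ldb (byte 8 8) unit) out)))
      (loop while (< i n)
            do (let ((code (aref codes i)))
                 (cond
                   ;; A valid pair already in the input: write it as is.
                   ((and (high-surrogate-p code)
                         (< (1+ i) n)
                         (low-surrogate-p (aref codes (1+ i))))
                    (emit code)
                    (emit (aref codes (1+ i)))
                    (incf i))
                   ;; An unpaired surrogate: error, but offer restarts.
                   ((or (high-surrogate-p code) (low-surrogate-p code))
                    (restart-case (error 'lone-surrogate :code code :index i)
                      (continue ()
                        :report "Write the unpaired surrogate anyway."
                        (emit code))
                      (use-value (replacement)
                        :report "Write a replacement code unit instead."
                        (emit replacement))))
                   ;; Supplementary plane: split into a surrogate pair.
                   ((> code #xFFFF)
                    (let ((v (- code #x10000)))
                      (emit (logior #xD800 (ldb (byte 10 10) v)))
                      (emit (logior #xDC00 (ldb (byte 10 0) v)))))
                   (t (emit code)))
                 (incf i))))
    out))
```

  With this sketch, (encode-utf-16le (vector #xD83D #xDE00)) and
  (encode-utf-16le (vector #x1F600)) produce the same octets, which is
  why reading a written pair back canonicalizes the representation. A
  caller who wants the permissive behaviour can wrap the call in
  (handler-bind ((lone-surrogate #'continue)) ...).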

* Design
  Octet sources and sinks can be either vectors or saps.

  Character sources and sinks are strings. The API level is
  responsible for pulling out the storage vector.

  Encoding objects contain the methods for computing the length,
  encoding, and decoding. Also length estimation?

  External-format objects contain an encoding, a choice of line-ending
  style, and a replacement character.
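
  One way to picture the split between encoding objects and
  external-format objects is the sketch below. All names (TOY-ENCODING,
  TOY-FORMAT, *LATIN-1*, TOY-FORMAT-ENCODE) are hypothetical and chosen
  for illustration; passing the replacement character down to the
  encoder function is likewise an assumption, not something these notes
  specify.

```lisp
;; Sketch only: hypothetical names, not the library's actual definitions.
(defstruct toy-encoding
  name
  encoded-length   ; function (string) -> number of octets
  encode           ; function (string replacement-char) -> octet vector
  decode)          ; function (octets) -> string

(defstruct toy-format
  encoding
  (line-ending :lf)     ; :LF or :CRLF
  (replacement #\?))

(defparameter *latin-1*
  (make-toy-encoding
   :name :latin-1
   :encoded-length #'length
   :encode (lambda (string replacement)
             (map '(vector (unsigned-byte 8))
                  (lambda (char)
                    (let ((code (char-code char)))
                      ;; Unencodable characters become the replacement.
                      (if (< code 256) code (char-code replacement))))
                  string))
   :decode (lambda (octets) (map 'string #'code-char octets))))

(defun toy-format-encode (external-format string)
  "Apply the line-ending style, then hand the string to the encoding."
  (let ((prepared
          (with-output-to-string (out)
            (loop for char across string
                  do (when (and (char= char #\Newline)
                                (eq (toy-format-line-ending external-format)
                                    :crlf))
                       (write-char #\Return out))
                     (write-char char out)))))
    (funcall (toy-encoding-encode (toy-format-encoding external-format))
             prepared
             (toy-format-replacement external-format))))
```

  The encoding knows nothing about line endings here; two formats that
  share *LATIN-1* but differ in :LINE-ENDING reuse the same encoder.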

  Error functions may return either the character code of the
  replacement character, or NIL to indicate that it is to be skipped.
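
  A minimal sketch of that error-function contract, using ASCII decoding
  to keep it short. DECODE-ASCII is a hypothetical name, and passing the
  offending octet as the only argument is an assumption made for the
  example.

```lisp
;; Sketch only: DECODE-ASCII is a hypothetical function.
(defun decode-ascii (octets error-function)
  "Decode OCTETS as ASCII. For each octet above 127, call ERROR-FUNCTION
with that octet; it returns the character code of a replacement, or NIL
to skip the octet."
  (with-output-to-string (out)
    (loop for octet across octets
          do (if (< octet 128)
                 (write-char (code-char octet) out)
                 (let ((code (funcall error-function octet)))
                   (when code
                     (write-char (code-char code) out)))))))

;; Replace the bad octet:
;;   (decode-ascii #(72 105 255 33) (constantly (char-code #\?)))  => "Hi?!"
;; Skip the bad octet:
;;   (decode-ascii #(72 105 255 33) (constantly nil))              => "Hi!"
```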

* Open Questions
  How to do more complex error handling? E.g. decoding invalid UTF-8
  sequences into strings describing those sequences.

  Seems like the handler would need to know the position where the
  failure occurred. With that and CONTINUE the final result can be
  amended as needed. Something along the lines of my original
  multi-char replacements might be more efficient, but this seems like
  a tangential issue.

  Would also like to be able to ABORT at the position of the invalid
  char/octets.
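
  To make the question concrete, here is one possible shape, again on
  ASCII for brevity. It is a sketch under assumptions, not a proposal
  for the final API: OCTET-DECODING-ERROR, DECODE-ASCII-WITH-RESTARTS
  and the STOP-DECODING restart are hypothetical names, and
  STOP-DECODING reads "ABORT at the position" as "return what has been
  decoded so far". The condition carries the index of the failure, and
  USE-VALUE lets a handler splice an arbitrary string into the result.

```lisp
;; Sketch only: all names here are hypothetical.
(define-condition octet-decoding-error (error)
  ((octet :initarg :octet :reader decoding-error-octet)
   (index :initarg :index :reader decoding-error-index))
  (:report (lambda (c stream)
             (format stream "Cannot decode octet #x~2,'0X at index ~D."
                     (decoding-error-octet c) (decoding-error-index c)))))

(defun decode-ascii-with-restarts (octets)
  "Decode OCTETS as ASCII, signalling OCTET-DECODING-ERROR on octets > 127."
  (with-output-to-string (out)
    (loop for octet across octets
          for index from 0
          do (if (< octet 128)
                 (write-char (code-char octet) out)
                 (restart-case
                     (error 'octet-decoding-error :octet octet :index index)
                   (continue ()
                     :report "Skip the invalid octet.")
                   (use-value (string)
                     :report "Splice a string into the output instead."
                     (write-string string out))
                   (stop-decoding ()
                     :report "Return what has been decoded so far."
                     (return)))))))

;; A handler that describes each bad octet and where it was found:
;;   (handler-bind ((octet-decoding-error
;;                    (lambda (c)
;;                      (use-value (format nil "<~2,'0X@~D>"
;;                                         (decoding-error-octet c)
;;                                         (decoding-error-index c))))))
;;     (decode-ascii-with-restarts #(72 105 255 33)))
;;   => "Hi<FF@2>!"
```

  Whether one handler call per bad octet is efficient enough, compared
  with multi-character replacements, is left open here as it is above.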
