Discussion:
safer Unicode parsing
(too old to reply)
Bruno Haible
2016-11-17 18:14:49 UTC
Permalink
Raw Message
Hi,

The unistr/* modules and, with them, libunistring contain code for parsing
byte streams to Unicode characters. Currently it is optimized for speed
at the expense of safety (garbage in - garbage out). The code for safer parsing
(garbage in - error code) is already present for years, but needs to be
explicitly enabled through -DCONFIG_UNICODE_SAFETY.

I'm proposing to make this the default. It's time to favour safety over speed:
* We've seen so many CVEs in the past years, due to loose checking.
There is a theoretical possibility that such loose checking in libunistring
might be used to construct security attacks.
* No one nowadays will say "let's go back from UTF-8 to ASCII" because of
the small performance overhead of a few added machine instructions.

libunistring passes "make check" with these changes.


2016-11-17 Bruno Haible <***@clisp.org>

Enable Unicode decoder safety unconditionally.
* lib/unistr.in.h (u32_mbtouc_unsafe): Assume CONFIG_UNICODE_SAFETY.
* lib/unistr/u8-mblen.c (u8_mblen): Likewise.
* lib/unistr/u8-mbtouc-unsafe.c (u8_mbtouc_unsafe): Likewise.
* lib/unistr/u8-mbtouc-unsafe-aux.c (u8_mbtouc_unsafe_aux): Likewise.
* lib/unistr/u8-prev.c (u8_prev): Likewise.
* lib/unistr/u8-strmblen.c (u8_strmblen): Likewise.
* lib/unistr/u8-strmbtouc.c (u8_strmbtouc): Likewise.
* lib/unistr/u16-mblen.c (u16_mblen): Likewise.
* lib/unistr/u16-mbtouc-unsafe.c (u16_mbtouc_unsafe): Likewise.
* lib/unistr/u16-mbtouc-unsafe-aux.c (u16_mbtouc_unsafe_aux): Likewise.
* lib/unistr/u16-prev.c (u16_prev): Likewise.
* lib/unistr/u16-strmblen.c (u16_strmblen): Likewise.
* lib/unistr/u16-strmbtouc.c (u16_strmbtouc): Likewise.
* lib/unistr/u32-mblen.c (u32_mblen): Likewise.
* lib/unistr/u32-mbtouc-unsafe.c (u32_mbtouc_unsafe): Likewise.
* lib/unistr/u32-prev.c (u32_prev): Likewise.
* lib/unistr/u32-next.c (u32_next): Likewise.
* lib/unistr/u32-strmblen.c (u32_strmblen): Likewise.
* lib/unistr/u32-strmbtouc.c (u32_strmbtouc): Likewise.
* lib/uniconv/u8-conv-to-enc.c (u8_conv_to_encoding): Likewise.
* lib/uniconv/u8-strconv-to-enc.c (u8_strconv_to_encoding): Likewise.
* tests/unistr/test-u16-prev.c (check_invalid): Enable the
CONFIG_UNICODE_SAFETY tests unconditionally.
* tests/unistr/test-u32-mblen.c (main): Likewise.
* tests/unistr/test-u32-mbtouc.h (test_function): Likewise.
* tests/unistr/test-u32-prev.c (check_invalid): Likewise.
* tests/unistr/test-u32-next.c (main): Likewise.
* tests/unistr/test-u32-strmblen.c (main): Likewise.
* tests/unistr/test-u32-strmbtouc.c (main): Likewise.
* lib/unistr/u8-check.c (u8_check): Remove old dead code.
* lib/unistr/u8-mbtouc.c (u8_mbtouc): Likewise.
* lib/unistr/u8-mbtouc-aux.c (u8_mbtouc_aux): Likewise.
* lib/unistr/u8-mbtoucr.c (u8_mbtoucr): Likewise.
* lib/unistr/u8-uctomb.c (u8_uctomb): Likewise.
* lib/unistr/u8-uctomb-aux.c (u8_uctomb_aux): Likewise.
* lib/unistr/u16-check.c (u16_check): Update comment.
* NEWS: Mention the changes that callers should be aware of.
Paul Eggert
2016-11-17 18:47:44 UTC
Permalink
Raw Message
-#if FULL_SAFETY || CONFIG_UNICODE_SAFETY
Since this removes the only use of FULL_SAFETY, its definition in
tests/unistr/test-u32-mbtouc.c should also be removed.

Otherwise the change looks good to me; thanks.
Bruno Haible
2016-11-19 13:14:37 UTC
Permalink
Raw Message
Post by Paul Eggert
-#if FULL_SAFETY || CONFIG_UNICODE_SAFETY
Since this removes the only use of FULL_SAFETY, its definition in
tests/unistr/test-u32-mbtouc.c should also be removed.
Good catch; thanks. Pushed with this additional change.

Bruno
--
In memoriam Farhád Asdaqí <http://www.iranhrdc.org/english/publications/reports/3149-a-faith-denied-the-persecution-of-the-baha-is-of-iran.html?p=29>
Loading...