Astral Plane characters in Erlang JSON/RFC4627 implementation
This page is a mirrored copy of an article originally posted on the (now sadly defunct) LShift blog; see the archive index here.
Fri, 16 November 2007
Sam Ruby examines support for astral-plane characters in various JSON implementations. His post prompted me to check my Erlang implementation of RFC4627. I found that astral-plane characters encoded directly as UTF-8, UTF-16, or UTF-32 all worked properly, but the RFC4627-mandated surrogate-pair “\uXXXX” escapes broke. A few minutes' hacking later:
Eshell V5.5.5 (abort with ^G)
1> {ok, Utf8Encoded, []} = rfc4627:decode("\"\\u007a\\u6c34\\ud834\\udd1e\"").
{ok,<<122,230,176,180,240,157,132,158>>,[]}
2> xmerl_ucs:from_utf8(Utf8Encoded).
[122,27700,119070]
3> rfc4627:encode(Utf8Encoded).
[34,122,230,176,180,240,157,132,158,34]
4>
Much better.
You can get the updated code from github.com/tonyg/erlang-rfc4627.
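For anyone curious what handling a surrogate pair involves, here is a minimal sketch of the arithmetic. It is not the code from erlang-rfc4627: the module name surrogate_sketch and both function names are made up for illustration, and it assumes an Erlang release that ships the standard unicode module.

```erlang
%% Illustrative sketch only: not taken from erlang-rfc4627.
%% The module and function names are hypothetical.
-module(surrogate_sketch).
-export([combine/2, to_utf8/1]).

%% Combine a high (lead) and a low (trail) UTF-16 surrogate into one
%% Unicode code point in the range 16#10000..16#10FFFF.
%% The guards reject anything that is not a well-formed pair.
combine(Hi, Lo) when Hi >= 16#D800, Hi =< 16#DBFF,
                     Lo >= 16#DC00, Lo =< 16#DFFF ->
    16#10000 + ((Hi - 16#D800) bsl 10) + (Lo - 16#DC00).

%% Encode a single code point as a UTF-8 binary using the standard
%% unicode module.
to_utf8(CodePoint) when is_integer(CodePoint) ->
    unicode:characters_to_binary([CodePoint], unicode, utf8).
```

With the escapes from the session above, surrogate_sketch:combine(16#D834, 16#DD1E) gives 119070, and surrogate_sketch:to_utf8(119070) gives <<240,157,132,158>>, matching the last code point and the last four bytes in the shell output.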
Comments
On 1 June, 2009 at 12:20 pm, wrote:
root@testbed2:~/packages/erlang-rfc4627# erl
Erlang R13B (erts-5.7.1) [source] [64-bit] [smp:3:3] [rq:3] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.7.1 (abort with ^G)
1> {ok, Utf8Encoded, []} =
1> rfc4627:decode(āā\u007a\u6c34\ud834\udd1eā”).
* 2: illegal character
1>
On 12 June, 2009 at 12:42 am, wrote:
By contrast,
~/dev/erlang-rfc4627$ erl -pa ebin
Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]
Eshell V5.7.2 (abort with ^G)
1> rfc4627:decode("\"\\u007a\\u6c34\\ud834\\udd1e\"").
{ok,<<122,230,176,180,240,157,132,158>>,[]}
2>
I can’t reproduce the problem. Perhaps it’s a cut-and-paste error?