Commits · eed8b691783264013142ed0273e08f5a7f913569 · Anton / libtcg

Jan 24, 2020

Similar to g_string_free(), optionally return the underlying char*.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20200110153039.1379601-10-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

164c374b

Aug 21, 2019

json: Move switch 'fall through' comment to correct place · 6f0dd6c5

Philippe Mathieu-Daudé authored 5 years ago


Reported by GCC9 when building with CFLAG -Wimplicit-fallthrough=2:

  qobject/json-parser.c: In function ‘parse_literal’:
  qobject/json-parser.c:492:24: error: this statement may fall through [-Werror=implicit-fallthrough=]
    492 |     case JSON_INTEGER: {
        |                        ^
  qobject/json-parser.c:524:5: note: here
    524 |     case JSON_FLOAT:
        |     ^~~~

Correctly place the 'fall through' comment.
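
As a minimal sketch of the placement (the enum and the function name
classify() are hypothetical, not taken from json-parser.c): the comment
sits after the closing brace of the block, immediately before the next
case label, which is where GCC is expected to look for it.

```c
#include <assert.h>

enum token_kind { TOK_INTEGER, TOK_FLOAT, TOK_OTHER };

/*
 * Hypothetical classifier: integers that do not fit are retried as floats.
 * Returns 1 for an integer result, 2 for a float result, 0 otherwise.
 */
int classify(enum token_kind kind, int fits_in_integer)
{
    switch (kind) {
    case TOK_INTEGER: {
        if (fits_in_integer) {
            return 1;
        }
    }
    /* fall through */
    case TOK_FLOAT:
        return 2;
    default:
        return 0;
    }
}
```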

Reported-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190719131425.10835-2-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

Jun 11, 2019

qemu-common: Move qemu_isalnum() etc. to qemu/ctype.h · 856dfd8a

Markus Armbruster authored 5 years ago


Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-3-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Mar 26, 2019

json: Fix off-by-one assert check in next_state() · 19e8ff48

Liam Merwick authored 6 years ago


The assert checking if the value of lexer->state in next_state(),
which is used as an index to the 'json_lexer' array, incorrectly
checks for an index value less than or equal to ARRAY_SIZE(json_lexer).
Fix assert so that it just checks for an index less than the array size.
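
A minimal sketch of the corrected bound check, assuming a made-up
three-entry table (next_table and next_state_checked() are illustrative
names, not the real lexer code):

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical three-state table; the real json_lexer table is larger. */
static const int next_table[3] = { 1, 2, 0 };

int next_state_checked(size_t state)
{
    /* Valid indexes are 0 .. ARRAY_SIZE - 1, so the comparison must be '<'. */
    assert(state < ARRAY_SIZE(next_table));
    return next_table[state];
}
```

With '<=' the assertion would let state == 3 through, and the lookup
would read one element past the end of the array.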

Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <1553169472-25325-1-git-send-email-liam.merwick@oracle.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

Jan 24, 2019

json: Fix % handling when not interpolating · bbc0586c

Christophe Fergeau authored 6 years ago

Commit 8bca4613 added support for %% in json strings when interpolating,
but in doing so broke handling of % when not interpolating.

When parse_string() is fed a string token containing '%', it skips the
'%' regardless of ctxt->ap, i.e. even when it's not interpolating.  If the
'%' is the string's last character, it fails an assertion.  Else, it
"merely" swallows the '%'.

Fix parse_string() to handle '%' specially only when interpolating.
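
The intended behavior can be sketched roughly as follows.  This is not
parse_string() itself: copy_json_string() and its parameters are
hypothetical, and it reports a lone '%' as an error where the real code
may do something else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Copy src to dst (capacity dst_size, including the terminating NUL).
 * When interpolating, "%%" collapses to "%" and a lone '%' is an error.
 * When not interpolating, '%' is an ordinary character.
 * Returns true on success, false on a lone '%' or insufficient space.
 */
bool copy_json_string(const char *src, bool interpolating,
                      char *dst, size_t dst_size)
{
    size_t n = 0;

    if (dst_size == 0) {
        return false;
    }
    while (*src) {
        char c = *src++;

        if (interpolating && c == '%') {
            if (*src != '%') {
                return false;       /* lone '%' in a template string */
            }
            src++;                  /* skip the second '%' of "%%" */
        }
        if (n + 1 >= dst_size) {
            return false;           /* no room for c plus the NUL */
        }
        dst[n++] = c;
    }
    dst[n] = '\0';
    return true;
}
```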

To gauge the bug's impact, let's review non-interpolating users of this
parser, i.e. code passing NULL context to json_message_parser_init():

* tests/check-qjson.c, tests/test-qobject-input-visitor.c,
  tests/test-visitor-serialization.c

  Plenty of tests, but we still failed to cover the buggy case.

* monitor.c: QMP input

* qga/main.c: QGA input

* qobject_from_json():

  - qobject-input-visitor.c: JSON command line option arguments of
    -display and -blockdev

    Reproducer: -blockdev '{"%"}'

  - block.c: JSON pseudo-filenames starting with "json:"

    Reproducer: https://bugzilla.redhat.com/show_bug.cgi?id=1668244#c3

  - block/rbd.c: JSON key pairs

    Pseudo-filenames starting with "rbd:".

Command line, QMP and QGA input are trusted.

Filenames are trusted when they come from command line, QMP or HMP.
They are untrusted when they come from image file headers.
Example: QCOW2 backing file name.  Note that this is *not* the security
boundary between host and guest.  It's the boundary between host and an
image file from an untrusted source.

Neither failing an assertion nor skipping a character in a filename of
your choice looks exploitable.  Note that we don't support compiling
with NDEBUG.

Fixes: 8bca4613
Cc: qemu-stable@nongnu.org
Signed-off-by: Christophe Fergeau <cfergeau@redhat.com>
Message-Id: <20190102140535.11512-1-cfergeau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
[Commit message extended to discuss impact]
Signed-off-by: Markus Armbruster <armbru@redhat.com>

Dec 13, 2018

json: Fix to reject duplicate object member names · 00382fa8

Markus Armbruster authored 6 years ago


The JSON parser happily accepts duplicate object member names.  The
last value wins.  Reproducer #1:

    $ qemu-system-x86_64 -qmp stdio
    {"QMP": {"version": {"qemu": {"micro": 93, "minor": 0, "major": 3},
    "package": "v3.1.0-rc3-7-g87a45d86ed"}, "capabilities": []}}
    {'execute':'qmp_capabilities'}
    {"return": {}}
    {'execute':'blockdev-add','arguments':{'driver':'null-co',
     'node-name':'foo','node-name':'bar'}}
    {"return": {}}
    {'execute':'query-named-block-nodes'}
    {"return": [{ [...] "node-name": "bar" [...] }]}

Reproducer #2 is iotest 229.

Fix the parser to reject duplicates, and fix iotest 229 not to use
them.
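
A sketch of the kind of check involved, assuming the member names of
one object are available as an array of strings (has_duplicate_name()
is a hypothetical helper; a real parser would more likely detect the
duplicate while inserting into its dictionary):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if any member name occurs more than once. */
bool has_duplicate_name(const char *const names[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if (strcmp(names[i], names[j]) == 0) {
                return true;
            }
        }
    }
    return false;
}
```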

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20181206121743.20762-1-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[Trailing whitespace tidied up]
Signed-off-by: Markus Armbruster <armbru@redhat.com>

Oct 26, 2018

qobject: Catch another straggler for use of qdict_put_str() · 73969720

Philippe Mathieu-Daudé authored 6 years ago


Patch created mechanically by rerunning:

  $  spatch --sp-file scripts/coccinelle/qobject.cocci \
            --macro-file scripts/cocci-macro-file.h \
            --dir . --in-place

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20180705155811.20366-2-f4bug@amsat.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

Sep 24, 2018

json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP · 1e960b46

Markus Armbruster authored 6 years ago


The lexer ignores whitespace like this:

         on whitespace      on non-ws   spontaneously
    IN_START --> IN_WHITESPACE --> JSON_SKIP --> IN_START
                    ^    |
                     \__/  on whitespace

This accumulates a whitespace token in state IN_WHITESPACE, only to
throw it away on the transition via JSON_SKIP to the start state.
Wasteful.  Go from IN_START to IN_START on whitespace directly,
dropping the whitespace character.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180831075841.13363-7-armbru@redhat.com>

json: Eliminate lexer state IN_ERROR · 2ce4ee64

Markus Armbruster authored 6 years ago


Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180831075841.13363-6-armbru@redhat.com>

json: Nicer recovery from lexical errors · 0f07a5d5

Markus Armbruster authored 6 years ago


When the lexer chokes on an input character, it consumes the
character, emits a JSON error token, and enters its start state.  This
can lead to suboptimal error recovery.  For instance, input

    0123 ,

produces the tokens

    JSON_ERROR    01
    JSON_INTEGER  23
    JSON_COMMA    ,

Make the lexer skip characters after a lexical error until a
structural character ('[', ']', '{', '}', ':', ','), an ASCII control
character, or '\xFE', or '\xFF'.

Note that we must not skip ASCII control characters, '\xFE', '\xFF',
because those are documented to force the JSON parser into known-good
state, by docs/interop/qmp-spec.txt.

The lexer now produces

    JSON_ERROR    01
    JSON_COMMA    ,
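
The skipping rule can be sketched like this.  Both function names are
hypothetical, and treating only bytes below 0x20 as ASCII control
characters is an assumption of this sketch; how DEL (0x7F) is handled
is not stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True for characters that end error recovery (set taken from the text). */
bool ends_error_recovery(unsigned char c)
{
    switch (c) {
    case '[': case ']': case '{': case '}': case ':': case ',':
        return true;            /* structural characters */
    case 0xFE: case 0xFF:
        return true;            /* bytes documented to reset the parser */
    default:
        return c < 0x20;        /* assumed: control characters below 0x20 */
    }
}

/* Return the index of the first character at or after pos that is not skipped. */
size_t skip_after_error(const char *buf, size_t len, size_t pos)
{
    while (pos < len && !ends_error_recovery((unsigned char)buf[pos])) {
        pos++;
    }
    return pos;
}
```

For the input "0123 ," with the error detected after "01", skipping
starts at index 2 and stops at the comma, so "23 " is swallowed instead
of becoming a separate JSON_INTEGER token.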

Update qmp-test for the nicer error recovery: QMP now reports just one
error for input %p instead of two.  Also drop the newline after %p; it
was needed to tease out the second error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180831075841.13363-5-armbru@redhat.com>
[Conflict with commit ebb4d82d resolved]

json: Make lexer's "character consumed" logic less confusing · c0ee3afa

Markus Armbruster authored 6 years ago


The lexer uses macro TERMINAL_NEEDED_LOOKAHEAD() to decide whether a
state transition consumes the input character.  It returns true when
the state transition is defined with the TERMINAL() macro.  To detect
that, it checks whether input '\0' would have resulted in the same
state transition, and the new state is not IN_ERROR.

Why does that even work?  For all states, the new state on input '\0'
is either IN_ERROR or defined with TERMINAL().  If the state
transition equals the one we'd get for input '\0', it goes to IN_ERROR
or to the argument of TERMINAL().  We never use TERMINAL(IN_ERROR),
because it makes no sense.  Thus, if it doesn't go to IN_ERROR, it
must be defined with TERMINAL().

Since this isn't quite confusing enough, we negate the result to get
@char_consumed, and ignore it when @flush is true.

Instead of deriving the lookahead bit from the state transition, make
it explicit.  This is easier to understand, and a bit more flexible,
too.
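
One way to make the lookahead bit explicit is to store it as a flag
next to the new state.  The sketch below assumes a flag value of 0x80
and invented state names; it illustrates the idea rather than the
actual table layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOOKAHEAD 0x80u   /* assumed flag: the input character is not consumed */

enum { IN_START = 0, IN_DIGITS = 1, TOKEN_INTEGER = 2 };   /* hypothetical */

/* Hypothetical transition out of IN_DIGITS for input character c. */
uint8_t digits_transition(unsigned char c)
{
    if (c >= '0' && c <= '9') {
        return IN_DIGITS;                 /* consume the digit, stay put */
    }
    return TOKEN_INTEGER | LOOKAHEAD;     /* token ends, c is looked at again */
}

bool char_consumed(uint8_t transition)
{
    return !(transition & LOOKAHEAD);
}

unsigned new_state(uint8_t transition)
{
    return transition & ~LOOKAHEAD;
}
```

Nothing has to be inferred from what input '\0' would have done: each
transition says directly whether it consumes its character.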

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180831075841.13363-4-armbru@redhat.com>

json: Clean up how lexer consumes "end of input" · 852dfa76

Markus Armbruster authored 6 years ago


When the lexer isn't in its start state at the end of input, it's
working on a token.  To flush it out, it needs to transit to its start
state on "end of input" lookahead.

There are two ways to the start state, depending on the current state:

* If the lexer is in a TERMINAL(JSON_FOO) state, it can emit a
  JSON_FOO token.

* Else, it can go to IN_ERROR state, and emit a JSON_ERROR token.

There are complications, however:

* The transition to IN_ERROR state consumes the input character and
  adds it to the JSON_ERROR token.  The latter is inappropriate for
  the "end of input" character, so we suppress that.  See also recent
  commit a2ec6be7 "json: Fix lexer to include the bad character in
  JSON_ERROR token".

* The transition to a TERMINAL(JSON_FOO) state doesn't consume the
  input character.  In that case, the lexer normally loops until it is
  consumed.  We have to suppress that for the "end of input" input
  character.  If we didn't, the lexer would consume it by entering
  IN_ERROR state, emitting a bogus JSON_ERROR token.  We fixed that in
  commit bd3924a3.

However, simply breaking the loop this way assumes that the lexer
needs exactly one state transition to reach its start state.  That
assumption is correct now, but it's unclean, and I'll soon break it.
Clean up: instead of breaking the loop after one iteration, break it
after it reached the start state.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180831075841.13363-3-armbru@redhat.com>

json: Fix lexer for lookahead character beyond '\x7F' · 2a96042a

Markus Armbruster authored 6 years ago


The lexer fails to end a valid token when the lookahead character is
beyond '\x7F'.  For instance, input

    true\xC2\xA2

produces the tokens

    JSON_ERROR     true\xC2
    JSON_ERROR     \xA2

This should be

    JSON_KEYWORD   true
    JSON_ERROR     \xC2
    JSON_ERROR     \xA2

instead.

The culprit is

    #define TERMINAL(state) [0 ... 0x7F] = (state)

It leaves [0x80..0xFF] zero, i.e. IN_ERROR.  Has always been broken.
Fix it to initialize the complete array.
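
The effect of the short range can be reproduced without GNU range
designators.  In this sketch fill_terminal_row() is a hypothetical
stand-in for the macro, and the value chosen for JSON_KEYWORD is
arbitrary; only IN_ERROR being zero comes from the text above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { IN_ERROR = 0, JSON_KEYWORD = 5 };   /* JSON_KEYWORD value is made up */

/* Set entries 0 .. last to state, leaving the rest zero, i.e. IN_ERROR. */
void fill_terminal_row(uint8_t row[256], unsigned last, uint8_t state)
{
    memset(row, IN_ERROR, 256);
    for (unsigned c = 0; c <= last && c < 256; c++) {
        row[c] = state;
    }
}
```

Filling up to 0x7F leaves the entry for '\xC2' at IN_ERROR, which is
why "true\xC2" turned into a single error token.  Filling up to 0xFF
ends the keyword token properly.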

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180831075841.13363-2-armbru@redhat.com>

Aug 24, 2018

json: Update references to RFC 7159 to RFC 8259 · 37aded92

Markus Armbruster authored 6 years ago


RFC 8259 (December 2017) obsoletes RFC 7159 (March 2014).

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180823164025.12553-59-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

json: Support %% in JSON strings when interpolating · 8bca4613

Markus Armbruster authored 6 years ago


The previous commit makes JSON strings containing '%' awkward to
express in templates: you'd have to mask the '%' with a Unicode
escape \u0025.  No template currently contains such JSON strings.
Support the printf conversion specification %% in JSON strings as a
convenience anyway, because it's trivially easy to do.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-58-armbru@redhat.com>

json: Improve safety of qobject_from_jsonf_nofail() & friends · 16a48599

Markus Armbruster authored 6 years ago


The JSON parser optionally supports interpolation.  This is used to
build QObjects by parsing string templates.  The templates are C
literals, so parse errors (such as invalid interpolation
specifications) are actually programming errors.  Consequently, the
functions providing parsing with interpolation
(qobject_from_jsonf_nofail(), qobject_from_vjsonf_nofail(),
qdict_from_jsonf_nofail(), qdict_from_vjsonf_nofail()) pass
&error_abort to the parser.

However, there's another, more dangerous kind of programming error:
since we use va_arg() to get the value to interpolate, behavior is
undefined when the variable argument isn't consistent with the
interpolation specification.

The same problem exists with printf()-like functions, and the solution
is to have the compiler check consistency.  This is what
GCC_FMT_ATTR() is about.

To enable this type checking for interpolation as well, we carefully
chose our interpolation specifications to match printf conversion
specifications, and decorate functions parsing templates with
GCC_FMT_ATTR().

Note that this only protects against undefined behavior due to type
errors.  It can't protect against use of invalid interpolation
specifications that happen to be valid printf conversion
specifications.

However, there's still a gaping hole in the type checking: GCC
recognizes '%' as start of printf conversion specification anywhere in
the template, but the parser recognizes it only outside JSON strings.
For instance, if someone were to pass a "{ '%s': %d }" template, GCC
would require a char * and an int argument, but the parser would
va_arg() only an int argument, resulting in undefined behavior.

Avoid undefined behavior by catching the programming error at run
time: have the parser recognize and reject '%' in JSON strings.
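
A rough sketch of such a run-time check, written as a standalone scan
over the template rather than as part of the parser.  The function name
percent_inside_string() is hypothetical.  It assumes that both '...'
and "..." delimit strings and that a backslash escapes the next
character.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if the template contains '%' inside a quoted JSON string. */
bool percent_inside_string(const char *tmpl)
{
    char quote = '\0';          /* current delimiter, or NUL outside strings */

    for (; *tmpl; tmpl++) {
        if (quote) {
            if (*tmpl == '\\' && tmpl[1]) {
                tmpl++;         /* skip the escaped character */
            } else if (*tmpl == quote) {
                quote = '\0';   /* end of string */
            } else if (*tmpl == '%') {
                return true;    /* GCC and the parser would disagree here */
            }
        } else if (*tmpl == '\'' || *tmpl == '"') {
            quote = *tmpl;      /* start of string */
        }
    }
    return false;
}
```

The template "{ '%s': %d }" from the example above is flagged, while
"{ 'key': %d }" is not.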

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-57-armbru@redhat.com>

json: Keep interpolation state in JSONParserContext · ada74c3b

Markus Armbruster authored 6 years ago


The recursive descent parser passes along a pointer to
JSONParserContext.  It additionally passes a pointer to interpolation
state (a va_list *) as needed to reach its consumer
parse_interpolation().

Stuffing the latter pointer into JSONParserContext saves us the
trouble of passing it along, so do that.
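
The pattern can be sketched as follows.  ParserContext, next_int_arg()
and sum_args() are made-up names for illustration; the point is only
that the innermost consumer reaches the variable arguments through the
context instead of through an extra parameter.

```c
#include <assert.h>
#include <stdarg.h>

typedef struct ParserContext {
    va_list *ap;    /* interpolation arguments, NULL when not interpolating */
} ParserContext;

/* Innermost consumer: fetch the next int argument through the context. */
int next_int_arg(ParserContext *ctxt)
{
    assert(ctxt->ap);
    return va_arg(*ctxt->ap, int);
}

/* Entry point: stash the va_list in the context, then consume count ints. */
int sum_args(int count, ...)
{
    va_list ap;
    ParserContext ctxt = { .ap = &ap };
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        total += next_int_arg(&ctxt);
    }
    va_end(ap);
    return total;
}
```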

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-56-armbru@redhat.com>

json: Clean up headers · 86cdf9ec

Markus Armbruster authored 6 years ago


The JSON parser has three public headers, json-lexer.h, json-parser.h,
json-streamer.h.  They all contain stuff that is of no interest
outside qobject/json-*.c.

Collect the public interface in include/qapi/qmp/json-parser.h, and
everything else in qobject/json-parser-int.h.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-54-armbru@redhat.com>

qobject: Drop superfluous includes of qemu-common.h · 812ce33e

Markus Armbruster authored 6 years ago


Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-53-armbru@redhat.com>

json: Make JSONToken opaque outside json-parser.c · abe7c206

Markus Armbruster authored 6 years ago


Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-52-armbru@redhat.com>

json: Unbox tokens queue in JSONMessageParser · a2731e08

Markus Armbruster authored 6 years ago


Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-51-armbru@redhat.com>

json: Streamline json_message_process_token() · 8d3265b3

Markus Armbruster authored 6 years ago


Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-50-armbru@redhat.com>

json: Enforce token count and size limits more tightly · da09cfbf

Markus Armbruster authored 6 years ago


Token count and size limits exist to guard against excessive heap
usage.  We check them only after we created the token on the heap.
That's assigning a cowboy to the barn to lasso the horse after it has
bolted.  Close the barn door instead: check before we create the
token.
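
In outline, the ordering looks like this.  MAX_TOKEN_SIZE and
token_new() are hypothetical, and the limit of 64 is chosen only to
keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKEN_SIZE 64   /* hypothetical limit for illustration */

/*
 * Return a heap copy of the token text, or NULL if it exceeds the limit
 * or allocation fails.  The size check happens before malloc(), so an
 * oversized token never reaches the heap.
 */
char *token_new(const char *text, size_t len)
{
    char *copy;

    if (len > MAX_TOKEN_SIZE) {
        return NULL;            /* reject before allocating */
    }
    copy = malloc(len + 1);
    if (copy) {
        memcpy(copy, text, len);
        copy[len] = '\0';
    }
    return copy;
}
```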

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-49-armbru@redhat.com>

qjson: Have qobject_from_json() & friends reject empty and blank · dd98e848

Markus Armbruster authored 6 years ago


The last case where qobject_from_json() & friends return null without
setting an error is empty or blank input.  Callers:

* block.c's parse_json_protocol() reports "Could not parse the JSON
  options".  It's marked as a work-around, because it also covered
  actual bugs, but they got fixed in the previous few commits.

* qobject_input_visitor_new_str() reports "JSON parse error".  Also
  marked as work-around.  The recent fixes have made this unreachable,
  because it currently gets called only for input starting with '{'.

* check-qjson.c's empty_input() and blank_input() demonstrate the
  behavior.

* The other callers are not affected since they only pass input with
  exactly one JSON value or, in the case of negative tests, one error.

Fail with "Expecting a JSON value" instead of returning null, and
simplify callers.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-48-armbru@redhat.com>

json: Assert json_parser_parse() consumes all tokens on success · 5d50113c

Markus Armbruster authored 6 years ago


Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-47-armbru@redhat.com>

json: Fix streamer not to ignore trailing unterminated structures · f9277915

Markus Armbruster authored 6 years ago


json_message_process_token() accumulates tokens until it got the
sequence of tokens that comprise a single JSON value (it counts curly
braces and square brackets to decide).  It feeds those token sequences
to json_parser_parse().  If a non-empty sequence of tokens remains at
the end of the parse, it's silently ignored.  check-qjson.c cases
unterminated_array(), unterminated_array_comma(), unterminated_dict(),
unterminated_dict_comma() demonstrate this bug.

Fix as follows.  Introduce a JSON_END_OF_INPUT token.  When the
streamer receives it, it feeds the accumulated tokens to
json_parser_parse().
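
A simplified model of the fixed streamer, with invented names
throughout.  It tracks a single nesting depth instead of separate brace
and bracket counts, and it only counts how many token sequences would
be handed to the parser rather than calling one.

```c
#include <assert.h>

enum tok { TOK_OPEN, TOK_CLOSE, TOK_OTHER, TOK_END_OF_INPUT };

typedef struct Streamer {
    int depth;               /* open braces and brackets */
    int pending;             /* tokens accumulated for the current value */
    int flushed;             /* sequences handed to the parser */
    int flushed_incomplete;  /* of those, flushed only by end of input */
} Streamer;

void streamer_feed(Streamer *s, enum tok t)
{
    if (t == TOK_END_OF_INPUT) {
        if (s->pending > 0) {
            /* Hand over the leftovers so the parser can report the error. */
            s->flushed++;
            s->flushed_incomplete++;
            s->pending = 0;
            s->depth = 0;
        }
        return;
    }

    s->pending++;
    if (t == TOK_OPEN) {
        s->depth++;
    } else if (t == TOK_CLOSE) {
        s->depth--;
    }
    if (s->depth <= 0) {
        /* A complete value (or a stray closer): let the parser decide. */
        s->flushed++;
        s->pending = 0;
        s->depth = 0;
    }
}
```

Without the TOK_END_OF_INPUT branch, the tokens of an unterminated
array would stay in the streamer and never be reported.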

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-46-armbru@redhat.com>

json: Fix latent parser aborts at end of input · e06d008a

Markus Armbruster authored 6 years ago


json-parser.c carefully reports end of input like this:

    token = parser_context_pop_token(ctxt);
    if (token == NULL) {
        parse_error(ctxt, NULL, "premature EOI");
        goto out;
    }

Except parser_context_pop_token() can't return null; it fails its
assertion instead.  Same for parser_context_peek_token().  Broken in
commit 65c0f1e9, and faithfully preserved in commit 95385fe9.
Only a latent bug, because the streamer throws away any input that
could trigger it.

Drop the assertions, so we can fix the streamer in the next commit.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-45-armbru@redhat.com>

qjson: Fix qobject_from_json() & friends for multiple values · 2a4794ba

Markus Armbruster authored 6 years ago


qobject_from_json() & friends use the consume_json() callback to
receive either a value or an error from the parser.

When they are fed a string that contains more than either one JSON
value or one JSON syntax error, consume_json() gets called multiple
times.

When the last call receives a value, qobject_from_json() returns that
value.  Any other values are leaked.

When any call receives an error, qobject_from_json() sets the first
error received.  Any other errors are thrown away.

When values follow errors, qobject_from_json() returns both a value
and sets an error.  That's bad.  Impact:

* block.c's parse_json_protocol() ignores and leaks the value.  It's
  used to parse pseudo-filenames starting with "json:".  The
  pseudo-filenames can come from the user or from image meta-data such
  as a QCOW2 image's backing file name.

* vl.c's parse_display_qapi() ignores and leaks the error.  It's used
  to parse the argument of command line option -display.

* vl.c's main() case QEMU_OPTION_blockdev ignores the error and leaves
  it in @err.  main() will then pass a pointer to a non-null Error *
  to net_init_clients(), which is forbidden.  It can lead to assertion
  failure or other misbehavior.

* check-qjson.c's multiple_values() demonstrates the badness.

* The other callers are not affected since they only pass strings with
  exactly one JSON value or, in the case of negative tests, one
  error.

The impact on the _nofail() functions is relatively harmless.  They
abort when any call receives an error.  Else they return the last
value, and leak the others, if any.

Fix consume_json() as follows.  On the first call, save value and
error as before.  On subsequent calls, if any, don't save them.  If
the first call saved a value, the next call, if any, replaces the
value by an "Expecting at most one JSON value" error.  Take care not
to leak values or errors that aren't saved.
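
The rule can be modeled with plain strings standing in for QObject and
Error.  JSONResult and consume() are hypothetical names.  Because the
stand-ins are string literals, the freeing of unsaved values and errors
that the real code must do is not shown.

```c
#include <assert.h>
#include <stddef.h>

typedef struct JSONResult {
    const char *value;   /* stand-in for QObject * */
    const char *error;   /* stand-in for Error * */
    int calls;
} JSONResult;

/* Called once per value or error; exactly one of value, error is non-NULL. */
void consume(JSONResult *r, const char *value, const char *error)
{
    if (r->calls++ == 0) {
        /* First call: save whatever we got. */
        r->value = value;
        r->error = error;
        return;
    }
    if (r->value) {
        /* A value was saved and more input followed: turn it into an error. */
        r->value = NULL;
        r->error = "Expecting at most one JSON value";
    }
    /* Otherwise an error is already saved; later results are dropped. */
}
```

At most one of value and error is ever set afterwards, so callers no
longer see a value together with an error.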

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-44-armbru@redhat.com>

json: Improve names of lexer states related to numbers · 4d400661

Markus Armbruster authored 6 years ago


Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-43-armbru@redhat.com>

json: Replace %I64d, %I64u by %PRId64, %PRIu64 · 53a0d616

Markus Armbruster authored 6 years ago


Support for %I64d got added in commit 2c0d4b36 "json: fix PRId64 on
Win32".  We had to hard-code I64d because we used the lexer's finite
state machine to check interpolations.  No more, so clean this up.

Additional conversion specifications would be easy enough to implement
when needed.
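
For reference, this is how the portable macro is spelled with plain
snprintf().  The helper format_i64() is hypothetical.  A JSON template
would presumably splice PRId64 into its string literal the same way,
but that part is not shown.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Format value in decimal; returns the length snprintf() reports. */
int format_i64(char *buf, size_t size, int64_t value)
{
    /* PRId64 expands to the right length modifier for the platform. */
    return snprintf(buf, size, "%" PRId64, value);
}
```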

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-42-armbru@redhat.com>

json: Leave rejecting invalid interpolation to parser · f7617d45

Markus Armbruster authored 6 years ago


Both lexer and parser reject invalid interpolation specifications.
The parser's check is useless.

The lexer ends the token right after the first bad character.  This
tends to lead to suboptimal error reporting.  For instance, input

    [ %04d ]

produces the tokens

    JSON_LSQUARE  [
    JSON_ERROR    %0
    JSON_INTEGER  4
    JSON_KEYWORD  d
    JSON_RSQUARE  ]

The parser then yields an error, an object and two more errors:

    error: Invalid JSON syntax
    object: 4
    error: JSON parse error, invalid keyword
    error: JSON parse error, expecting value

Dumb down the lexer to accept [A-Za-z0-9]*.  The parser's check is now
used.  Emit a proper error there.

The lexer now produces

    JSON_LSQUARE  [
    JSON_INTERP   %04d
    JSON_RSQUARE  ]

and the parser reports just

    JSON parse error, invalid interpolation '%04d'
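
The division of labor might be sketched like this.  Both functions are
hypothetical, and the list of accepted specifications is an arbitrary
subset picked for illustration, not the set the parser really supports.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Lexer side: length of a token matching %[A-Za-z0-9]* at s, or 0. */
size_t lex_interp(const char *s)
{
    size_t n = 1;

    if (s[0] != '%') {
        return 0;
    }
    while (isalnum((unsigned char)s[n])) {
        n++;
    }
    return n;
}

/* Parser side: accept only known specifications (assumed subset). */
bool interp_is_valid(const char *tok, size_t len)
{
    static const char *const known[] = { "%d", "%i", "%s", "%p", "%f" };

    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (strlen(known[i]) == len && memcmp(known[i], tok, len) == 0) {
            return true;
        }
    }
    return false;
}
```

The lexer now hands "%04d" to the parser as one token, and the parser
can name the whole bad specification in a single error.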

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-41-armbru@redhat.com>

json: Pass lexical errors and limit violations to callback · 84a56f38

Markus Armbruster authored 6 years ago


The callback to consume JSON values takes QObject *json, Error *err.
If both are null, the callback is supposed to make up an error by
itself.  This sucks.

qjson.c's consume_json() neglects to do so, which makes
qobject_from_json() return null instead of failing.  I consider that a
bug.

The culprit is json_message_process_token(): it passes two null
pointers when it runs into a lexical error or a limit violation.  Fix
it to pass a proper Error object then.  Update the callbacks:

* monitor.c's handle_qmp_command(): the code to make up an error is
  now dead, drop it.

* qga/main.c's process_event(): lumps the "both null" case together
  with the "not a JSON object" case.  The former is now gone.  The
  error message "Invalid JSON syntax" is misleading for the latter.
  Improve it to "Input must be a JSON object".

* qobject/qjson.c's consume_json(): no update; check-qjson
  demonstrates qobject_from_json() now sets an error on lexical
  errors, but still doesn't on some other errors.

* tests/libqtest.c's qmp_response(): the Error object is now reliable,
  so use it to improve the error message.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-40-armbru@redhat.com>

json: Treat unwanted interpolation as lexical error · 2cbd15aa

Markus Armbruster authored 6 years ago


The JSON parser optionally supports interpolation.  The lexer
recognizes interpolation tokens unconditionally.  The parser rejects
them when interpolation is disabled, in parse_interpolation().
However, it neglects to set an error then, which can make
json_parser_parse() fail without setting an error.

Move the check for unwanted interpolation from the parser's
parse_interpolation() into the lexer's finite state machine.  When
interpolation is disabled, '%' is now handled like any other
unexpected character.

The next commit will improve how such lexical errors are handled.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-39-armbru@redhat.com>

json: Rename token JSON_ESCAPE & friends to JSON_INTERP · 61030280

Markus Armbruster authored 6 years ago


The JSON parser optionally supports interpolation.  The code calls it
"escape".  Awkward, because it uses the same term for escape sequences
within strings.  The latter usage is consistent with RFC 8259 "The
JavaScript Object Notation (JSON) Data Interchange Format" and ISO C.
Call the former "interpolation" instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-38-armbru@redhat.com>

json: Don't create JSON_ERROR tokens that won't be used · 269e57ae

Markus Armbruster authored 6 years ago


Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-37-armbru@redhat.com>

json: Don't pass null @tokens to json_parser_parse() · ff281a27

Markus Armbruster authored 6 years ago


json_parser_parse() normally returns the QObject on success.  Except
it returns null when its @tokens argument is null.

Its only caller json_message_process_token() passes null @tokens when
emitting a lexical error.  The call is a rather opaque way to say json
= NULL then.

Simplify matters by lifting the assignment to json out of the emit
path: initialize json to null, set it to the value of
json_parser_parse() when there's no lexical error.  Drop the special
case from json_parser_parse().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-36-armbru@redhat.com>

json: Redesign the callback to consume JSON values · 62815d85

Markus Armbruster authored 6 years ago


The classical way to structure parser and lexer is to have the client
call the parser to get an abstract syntax tree, the parser call the
lexer to get the next token, and the lexer call some function to get
input characters.

Another way to structure them would be to have the client feed
characters to the lexer, the lexer feed tokens to the parser, and the
parser feed abstract syntax trees to some callback provided by the
client.  This way is more easily integrated into an event loop that
dispatches input characters as they arrive.

Our JSON parser is kind of between the two.  The lexer feeds tokens to
a "streamer" instead of a real parser.  The streamer accumulates
tokens until it got the sequence of tokens that comprise a single JSON
value (it counts curly braces and square brackets to decide).  It
feeds those token sequences to a callback provided by the client.  The
callback passes each token sequence to the parser, and gets back an
abstract syntax tree.

I figure it was done that way to make a straightforward recursive
descent parser possible.  "Get next token" becomes "pop the first
token off the token sequence".  Drawback: we need to store a complete
token sequence.  Each token eats 13 + input characters + malloc
overhead bytes.

Observations:

1. This is not the only way to use recursive descent.  If we replaced
   "get next token" by a coroutine yield, we could do without a
   streamer.

2. The lexer reports errors by passing a JSON_ERROR token to the
   streamer.  This communicates the offending input characters and
   their location, but no more.

3. The streamer reports errors by passing a null token sequence to the
   callback.  The (already poor) lexical error information is thrown
   away.

4. Having the callback receive a token sequence duplicates the code to
   convert token sequence to abstract syntax tree in every callback.

5. Known bug: the streamer silently drops incomplete token sequences.

This commit rectifies 4. by lifting the call of the parser from the
callbacks into the streamer.  Later commits will address 3. and 5.

The lifting removes a bug from qjson.c's parse_json(): it passed a
pointer to a non-null Error * in certain cases, as demonstrated by
check-qjson.c.

json_parser_parse() is now unused.  It's a stupid wrapper around
json_parser_parse_err().  Drop it, and rename json_parser_parse_err()
to json_parser_parse().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-35-armbru@redhat.com>

json: Have lexer call streamer directly · 037f2440

Markus Armbruster authored 6 years ago


json_lexer_init() takes the function to process a token as an
argument.  It's always json_message_process_token().  Makes the code
harder to understand for no actual gain.  Drop the indirection.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-34-armbru@redhat.com>

json-parser: simplify and avoid JSONParserContext allocation · e8b19d7d

Marc-André Lureau authored 6 years ago


parser_context_new/free() are only used from json_parser_parse(). We
can fold the code there and avoid an allocation altogether.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180719184111.5129-9-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180823164025.12553-33-armbru@redhat.com>

json: remove useless return value from lexer/parser · 7c1e1d54

Marc-André Lureau authored 6 years ago


The lexer always returns 0 when fed a character. Furthermore, none of
the callers care about the return value.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180326150916.9602-10-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180823164025.12553-32-armbru@redhat.com>
