YATT::Lite::XHF::Syntax - Extended Header Fields (XHF) format.
    require YATT::Lite::XHF;
    use Data::Dumper;

    my $parser = YATT::Lite::XHF->new(string => <<'END');
    # Taken from http://docs.ansible.com/YAMLSyntax.html#yaml-basics
    name: Example Developer
    job: Developer
    skill: Elite
    employed: 1
    foods[
    - Apple
    - Orange
    - Strawberry
    - Mango
    ]
    languages{
    ruby: Elite
    python: Elite
    dotnet: Lame
    }

    name: hkoba
    languages{
    yatt: Elite?
    }
    END

    # read() returns one parsed result per paragraph; paragraphs are
    # separated by \n\n+.
    # In array context, you get a flattened list of the items in one paragraph.
    # (It is usually a list of key-value pairs, but you can write other types.)
    # In scalar context, you get a hash struct.
    while (my %hash = $parser->read) {
      print Dumper(\%hash), "\n";
    }
The Extended Header Fields (XHF) format, which I am defining here, is a data format based on email headers (and HTTP headers), extended to carry nested data structures. To load XHF files or strings, use YATT::Lite::XHF.
Note: Although there is a serializer for XHF (YATT::Lite::XHF::Dumper), XHF is specifically designed to help programmers write test data for unit tests, not to be a perfect serializer for Perl (i.e. XHF does not support self-referencing data structures; that is not my design goal). If you need such a full-featured serializer, use the YAML family, Storable or the like instead.
In the simplest cases, YAML and XHF look fairly similar. For example, the hash structure {foo => 1, bar => 2} can be written the same way in both YAML and XHF:

    foo: 1
    bar: 2
However, if you serialize the structure {x => [1, 2, "3, 4"], y => 5}, you will notice significant differences.

In XHF, it is written as:

    {
    x[
    - 1
    - 2
    - 3, 4
    ]
    y: 5
    }

In YAML, by contrast, the same structure is written as:

    ---
    x:
      - 1
      - 2
      - '3, 4'
    y: 5
The differences are:

* XHF delimits nested blocks with brackets {} []. YAML uses indents.

* XHF can carry 3, 4 as is. YAML needs to escape it like '3, 4'.
In XHF, you only need to escape \n (and leading/trailing SPACE and TAB, if you need them) in each value part. In other words, the value part has no syntax of its own, so you do not need to worry about which characters must be escaped.
To escape newlines, just substitute every "\n" with "\n ", as in s/\n/\n /g.
e.g. { foo => "1\n2\n\n3", bar => 4 } can be written as follows (the otherwise empty third line of the value holds a single space):

    foo: 1
     2
     
     3
    bar: 4
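The substitution rule above can be sketched as a small round trip. The helper names xhf_escape_value and xhf_unescape_value are hypothetical and are not part of YATT::Lite::XHF; the sketch assumes a value without leading or trailing whitespace.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; not part of YATT::Lite::XHF.

# Escape a value: every newline is followed by one continuation space.
sub xhf_escape_value {
  my ($value) = @_;
  $value =~ s/\n/\n /g;
  return $value;
}

# Reverse the escaping: drop the single space or tab after each newline.
sub xhf_unescape_value {
  my ($text) = @_;
  $text =~ s/\n[ \t]/\n/g;
  return $text;
}

my $value = "1\n2\n\n3";
my $field = "foo: " . xhf_escape_value($value) . "\n";
print $field;
```

Because only the character directly after each newline is removed, a value line that itself starts with a space survives the round trip.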
To preserve leading or trailing spaces and tabs, start the value with ":\n" and follow the same escaping rule for "\n".
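e.g. { foo => " x ", bar => "\n\ny\n\n" } can be written as follows (every value line begins with one continuation space, so the whitespace-only lines are significant):

    foo:
      x 
    bar:
     
     
     y
     
     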
In contrast to the value part, the name part has syntax restrictions. The name part of XHF can contain only [[:alnum:]], "-", ".", "/" and some additional characters (see the field-name definition in "BNF"). However, you can write two - items in place of a name-value pair. So again, whenever you are not sure whether a character is allowed, you can use the - notation and escape only \n.
    # For example, the following block:
    foo: 1
    bar: 2
    # can be written as follows:
    - foo
    - 1
    - bar
    - 2
e.g. { "foo bar" => "baz" } can be written as:

    {
    - foo bar
    - baz
    }
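And { "\n foo\nbar \n" => "baz" } can be written as follows (every line of the key begins with one continuation space, so the whitespace-only lines are significant):

    {
    -
     
      foo
     bar 
     
    - baz
    }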
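The choice between the two notations can be sketched as follows. The function xhf_pair is a hypothetical helper, not part of YATT::Lite::XHF. It takes the NAME character set from the "BNF" section, ignores field-subscript, and assumes that neither the key nor the value has leading or trailing whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not part of YATT::Lite::XHF.
# Assumes neither key nor value has leading or trailing whitespace.
sub xhf_pair {
  my ($key, $value) = @_;
  s/\n/\n /g for $key, $value;    # escape newlines in both parts

  # NAME characters taken from the BNF (field-subscript is ignored here).
  if ($key =~ m{\A[0-9A-Za-z_./~!-]+\z}) {
    return "$key: $value\n";
  }
  # Any other key falls back to two "-" items.
  return "- $key\n- $value\n";
}

print xhf_pair(foo => 1);
print xhf_pair("foo bar" => "baz");
```

Falling back to the "-" notation keeps the writer simple: it never has to escape anything in a key except newlines.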
The same applies to nested elements.

    foo{
    x: 1
    y: 2
    }
    baz[
    - z
    ]
    # can be written instead as follows:
    - foo
    {
    x: 1
    y: 2
    }
    - baz
    [
    - z
    ]
    # or even as follows:
    - foo
    {
    - x
    - 1
    - y
    - 2
    }
    - baz
    [
    - z
    ]
Also, you can use the key: value notation in arrays, as follows:

    [
    foo: 1
    bar: 2
    ]
    # the above is equal to the following
    [
    - foo
    - 1
    - bar
    - 2
    ]
Another important difference (which you may have noticed in the previous examples) lies in how the container type (array or dict) is selected. In XHF, the name-value separator determines the "type of value", not the "type of surrounding container".
In XHF, the following block

    foo: 1
    bar: 2

just represents ( foo => 1, bar => 2 ), which is a flattened list of 4 items. This by itself does not determine the surrounding container type. You can then choose the outermost container type, like

    my %dict = $parser->read;

or

    my @array = $parser->read;

When you call read() in scalar context, you get a dictionary (or an error when the block has an odd number of items).

    my $dict = $parser->read;
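The scalar-context behaviour described above can be pictured with a minimal sketch. The function items_to_dict is a hypothetical stand-in and not the actual YATT::Lite::XHF implementation; the error message is likewise made up for illustration.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical helper; not the actual YATT::Lite::XHF implementation.
# Turns a flattened item list into a hash reference, rejecting odd lists.
sub items_to_dict {
  my @items = @_;
  croak "Odd number of items in XHF block" if @items % 2;
  my %dict = @items;
  return \%dict;
}

my $dict = items_to_dict(foo => 1, bar => 2);
print "$_ => $dict->{$_}\n" for sort keys %$dict;

# A block with three items cannot become a dictionary.
my $ok = eval { items_to_dict('foo', 1, 'bar'); 1 };
print "odd list rejected\n" unless $ok;
```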
In YAML, by contrast, : always means a map (dictionary). So the above block is always +{ foo => 1, bar => 2 }.
Since the outermost xhf-block is a flattened list, you can use XHF to write down an ordered list of key-value pairs with duplicate keys, like the following:

    foo: 1
    foo: 2
    foo: 3
    bar: x
    bar: y

If you read the above with

    my @array = $parser->read;

you get exactly @array == (foo => 1, foo => 2, foo => 3, bar => 'x', bar => 'y').
This is important for some kinds of test data (e.g. HTTP query parameters and some email header fields such as "Received"). For example, the above is equivalent to valid output from the following HTML form over HTTP:
    <input type="checkbox" name="foo" value="1">
    <input type="checkbox" name="foo" value="2">
    <input type="checkbox" name="foo" value="3">
    <input type="checkbox" name="bar" value="x">
    <input type="checkbox" name="bar" value="y">
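A flattened list like the one above can be processed pairwise so that neither the order nor the duplicate keys are lost. The following is a minimal sketch in plain Perl; the variable names are illustrative and the list literal stands in for what the parser would return in array context.

```perl
use strict;
use warnings;

# The flattened list as read in array context (copied from the example).
my @array = (foo => 1, foo => 2, foo => 3, bar => 'x', bar => 'y');

# Collect all values per name while remembering first-seen order.
my (@order, %values_of);
for (my $i = 0; $i < @array; $i += 2) {
  my ($name, $value) = @array[$i, $i + 1];
  push @order, $name unless exists $values_of{$name};
  push @{ $values_of{$name} }, $value;
}

print "$_ = @{ $values_of{$_} }\n" for @order;
```

Assigning the same list directly to a hash would keep only the last value of each key, which is why the list form matters for this kind of test data.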
Note: currently, nested elements are deserialized as ordinary Perl hashes and arrays, so this preservation of order and duplicate keys only works for the outermost list.
An XHF input stream is delimited by consecutive empty lines "\n\n+" (like email headers and HTTP headers) and is designed to work well with the traditional "paragraph mode" multi-line record format. For more about paragraph mode, see perl -00 in perlrun and the description of setting $RS to "" in perlvar.
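Paragraph mode itself can be tried with core Perl alone. The sketch below reads two records from an in-memory file handle with $/ set to ""; it illustrates the record format only and makes no claim about how YATT::Lite::XHF reads its input internally.

```perl
use strict;
use warnings;

my $text = "foo: 1\nbar: 2\n\n\nbaz: 3\nqux: 4\n";

# Read from an in-memory file handle in paragraph mode ($/ = "").
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my @paragraphs = do { local $/ = ""; <$fh> };
close $fh;

# In paragraph mode, chomp removes all trailing newlines of a record.
{ local $/ = ""; chomp @paragraphs; }

print scalar(@paragraphs), " paragraphs\n";
print "[$_]\n" for @paragraphs;
```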
Note: in XHF, "comment-only" blocks are skipped silently. For example:

    foo: 1
    bar: 2

    # Hey, here is a comment only block!

    baz: 3
    qux: 4

Then this script:

    my @records;
    push @records, $_ while $_ = $parser->read;

results in @records == ({foo => 1, bar => 2}, {baz => 3, qux => 4}).
In rare cases, you may want to prepend an optional meta record to a single stream. If you really want to do this, you can use a "comment only" block to represent an empty record and read it with read(skip_comment => 0), like the following:

    # This is metainfo. To put test => 1, please remove leading "# " below:
    # test: 1

    # This is body1
    foo: 1
    bar: 2

    # This is body2
    foo: 3
    bar: 4

Then

    if (my @meta = $parser->read(skip_comment => 0)) {
      # process metainfo. You may get (test => 1).
    }
    while (my @content = $parser->read) {
      # process body1, body2, ...
    }
Here is a denser example in XHF:

    name: hkoba
    # (1) You can write a comment line here, starting with '#'.
    job: Programming Language Designer (self-described;-)
    skill: Random
    employed: 0
    foods[
    - Sushi
    #(2) here too. You don't need space after '#'. This will be good for '#!'
    - Tonkatsu
    - Curry and Rice
    [
    - More nested element
    ]
    ]
    favorites[
    # (3) here also.
    {
    title: Chaika - The Coffin Princess
    # (4) ditto.
    heroine: Chaika Trabant
    }
    {
    title: Witch Craft Works
    heroine: Ayaka Kagari
    # (5) You can use leading "-" for hash key/value too (so that it can include any chars)
    - Witch, Witch!
    - Tower and Workshop!
    }
    # (6) You can put NULL(undef) like below. (equal space sharp+keyword)
    = #null
    ]
The above is loaded as the following structure:

    $VAR1 = {
      'foods' => [
        'Sushi',
        'Tonkatsu',
        'Curry and Rice',
        [
          'More nested element'
        ]
      ],
      'job' => 'Programming Language Designer (self-described;-)',
      'name' => 'hkoba',
      'employed' => '0',
      'skill' => 'Random',
      'favorites' => [
        {
          'heroine' => 'Chaika Trabant',
          'title' => 'Chaika - The Coffin Princess'
        },
        {
          'title' => 'Witch Craft Works',
          'heroine' => 'Ayaka Kagari',
          'Witch, Witch!' => 'Tower and Workshop!'
        },
        undef
      ]
    };
In YAML, the above is written as below (note: inline comments are omitted):

    ---
    employed: 0
    favorites:
      - heroine: Chaika Trabant
        title: 'Chaika - The Coffin Princess'
      - 'Witch, Witch!': Tower and Workshop!
        heroine: Ayaka Kagari
        title: Witch Craft Works
      - ~
    foods:
      - Sushi
      - Tonkatsu
      - Curry and Rice
      - - More nested element
    job: Programming Language Designer (self-described;-)
    name: hkoba
    skill: Random
This YAML example clearly shows how unpredictably you need to escape strings; see, for example, the value of $VAR1->{favorites}[0]{title} above. The key of $VAR1->{favorites}[1]{'Witch, Witch!'} is a nightmare, too.
I don't want to be bothered by this kind of escaping. That's why I made XHF.
XHF is parsed one paragraph at a time. Each paragraph can contain a set of xhf-items. Every xhf-item starts on a fresh line, ends with a newline and basically takes one of the following forms:

    <name> <type-sigil> <sep> <value>         (name-value pair)
           <type-sigil> <sep> <value>         (standalone value)
type-sigil defines the type of value. sep is usually one of the logical whitespace characters: space, tab or newline (newline is used for verbatim text). For block items (dict/array), however, only newline is allowed.
Here are all the kinds of type-sigils:

"name:" then " " or "\n"
    ":" is for ordinary text with a name. It MUST be prefixed by a name. sep can be any of WS.

"-" then " " or "\n"
    "-" is for ordinary text without a name. It CANNOT be prefixed by a name.

    (Note: Currently, "," works the same as "-". This feature is arguable.)

"{" then "\n"
"name{" then "\n"
    "{" is for a dictionary block (a { %HASH } container). It can be prefixed by a name.

    It MUST be closed by "}\n". The number of elements MUST be even.

"[" then "\n"
"name[" then "\n"
    "[" is for an array block (an [ @ARRAY ] container). It can be prefixed by a name.

    It MUST be closed by "]\n".

"=" then " " or "\n"
"name=" then " " or "\n"
    "=" is for special values. It can be prefixed by a name.

    Currently only #undef and its synonym #null are defined.

"#"
    "#" is for an embedded comment line. It CANNOT be prefixed by a name.
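The sigil rules listed above can be illustrated with a deliberately simplified classifier. The function classify_item is hypothetical: it handles single-line items only, ignores continuation lines and field-subscript, and is not the tokenizer used by YATT::Lite::XHF.

```perl
use strict;
use warnings;

# Simplified, hypothetical classifier for single-line items only.
# It ignores continuation lines and field-subscript, and it is not the
# tokenizer used by YATT::Lite::XHF.
my $NAME = qr{[0-9A-Za-z_./~!-]};

sub classify_item {
  my ($line) = @_;
  return { sigil => '#', comment => $line } if $line =~ /\A#/;

  # Optional name, one sigil, then optionally a space or tab and a value.
  my ($name, $sigil, $value)
    = $line =~ /\A($NAME*)([:\-=\[\]{}])(?:[ \t](.*))?\z/
    or return;

  # ":" requires a name; "-" must not have one.
  return if $sigil eq ':' and $name eq '';
  return if $sigil eq '-' and $name ne '';

  return { name => $name, sigil => $sigil, value => $value };
}

for my $line ('foo: 1', '- Apple', 'foods[', '= #null', '# note') {
  my $item = classify_item($line) or next;
  printf "%-8s => sigil '%s'\n", $line, $item->{sigil};
}
```

Lines that fit none of the forms, such as a name containing a space, make the function return nothing, which a caller could treat as a syntax error.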
Here is a syntax definition of XHF in extended BNF (roughly following ABNF).

    xhf-block       = 1*xhf-item

    xhf-item        = field-pair / single-text
                    / dict-block / array-block / special-expr
                    / comment

    field-pair      = field-name field-value

    field-name      = 1*NAME *field-subscript

    field-subscript = "[" *NAME "]"

    field-value     = ":" text-payload / dict-block / array-block / special-expr

    text-payload    = ( trimmed-text / verbatim-text ) NL

    trimmed-text    = SPTAB *( 1*NON-NL / NL SPTAB )

    verbatim-text   = NL    *( 1*NON-NL / NL SPTAB )

    single-text     = "-" text-payload

    dict-block      = "{" NL *xhf-item "}" NL

    array-block     = "[" NL *xhf-item "]" NL

    special-expr    = "=" SPTAB known-specials NL

    known-specials  = "#" ("null" / "undef")

    comment         = "#" *NON-NL NL

    NL              = [\n]
    NON-NL          = [^\n]
    SPTAB           = [\ \t]
    NAME            = [0-9A-Za-z_.-/~!]
field-name can contain /, ., ~ and !. The former two are for file names (path separator and extension separator). The latter two (and field-subscript) are incorporated just to help write test input/output data for YATT::Lite, so they may be arguable for general use.
If a field-name is followed by ": ", leading and trailing spaces and tabs are trimmed from its field-value. This is useful for handling hand-written configuration files.

But for some software-testing purposes (e.g. a templating engine!), this space trimming makes it impossible to write exact input/output data.

So, when field-sep is NL, the field-value is not trimmed.
Currently, I am not strict enough to reject the use of CRLF. This ambiguity may, however, harm the use of XHF as a serialization format.

"," can be used in place of "-". This feature may also be arguable for general use.

":" without a name was valid, but is now deprecated. The previously valid

    : bar

which represents ( "" => "bar" ), is now invalid. Please use two "- " items like the following:

    - 
    - bar
XXX: Hmm, should I provide a deprecation cycle? Has anyone already used XHF to serialize important data, even before this manual existed? If so, please contact me. I will add an option to allow this.
Although line continuation is obsolete in HTTP headers, it will remain valid in the XHF spec. This is my preference.
"KOBAYASI, Hiroaki" <hkoba@cpan.org>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.