yoy.be "Why-o-Why"


2017-09-26 10:39  momoa  coding

We've had XML. We've had JSON. There's a thing called YAML. And then there's Protocol Buffers and Thrift and a number of others.

And still, with each of them something is not quite right. So here is yet another proposal, humbly offered for adoption:

Binary. Why binary? Parsing speed matters. There's a belief that binary is not human readable, but:

ASCII control codes. Why ASCII control codes? They're largely out of use, except 0x0A (and 0x0D) for new lines and 0x09 for tabs. I've come across 0x0C and 0x1B when talking to printers, but that's it. And most modern editors know what to do with them. In the best case they may even show them as something foreign, but either way they're visibly right there with the other text.

A list of keys and values. It's tempting to provide structure and clearly indicate which is which, but it's unnecessary. A parser is smart enough to know these come two by two, and to pair them up when handing them over for processing.
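The pairing step is small enough to sketch. This is only an illustration: the function name `pair_up` is made up here, and it assumes the parser has already produced a flat list of decoded values.

```python
def pair_up(values):
    """Group a flat sequence of decoded values into (key, value) pairs."""
    if len(values) % 2 != 0:
        raise ValueError("keys and values must come two by two")
    it = iter(values)
    # zip pulls from the same iterator twice per step, so items pair up in order
    return list(zip(it, it))
```

Rejecting an odd count is a choice made for this sketch; the format as described doesn't say what to do with a dangling key.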

Types of values. A value has a preceding byte denoting what it is, and what rule to follow for the succeeding bytes.

0x02 string: read the string up to the next type byte. If a type byte needs to be part of the string itself, escape it with 0x07.
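A rough sketch of the string rule, under a few assumptions that the description above leaves open: text is stored as UTF-8, the escape byte 0x07 is itself escaped when it occurs in the text, and the set of type bytes is the one listed in this post. The names `encode_string` and `read_string` are invented for the example.

```python
TYPE_BYTES = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x08}
ESCAPE = 0x07

def encode_string(text):
    """Emit 0x02 followed by the text, escaping any type or escape bytes."""
    out = bytearray([0x02])
    for b in text.encode("utf-8"):  # assumption: UTF-8
        if b in TYPE_BYTES or b == ESCAPE:
            out.append(ESCAPE)
        out.append(b)
    return bytes(out)

def read_string(data, pos):
    """Read string bytes starting just past the 0x02; return (text, next_pos)."""
    raw = bytearray()
    while pos < len(data) and data[pos] not in TYPE_BYTES:
        if data[pos] == ESCAPE:
            pos += 1  # skip the escape, take the next byte literally
            if pos >= len(data):
                raise ValueError("dangling escape byte")
        raw.append(data[pos])
        pos += 1
    return raw.decode("utf-8"), pos
```

UTF-8 fits well here because all bytes of a multi-byte character are 0x80 or above, so they can never be mistaken for a type byte.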

0x03 number: read a string up to the next type byte and convert it from text notation to something numeric. Depending on the context it may be a specific numeric type or a variant one. By using the text notation we retain some human readability, and also get an acceptable storage-to-information ratio (smaller numbers take fewer bytes).
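As a sketch of the number rule: the text notation is not pinned down above, so this example assumes plain decimal notation as Python prints it, and picks int when the text allows it and float otherwise. `encode_number` and `parse_number` are hypothetical names.

```python
def encode_number(value):
    """Emit 0x03 followed by the number's text notation in ASCII."""
    if isinstance(value, bool):
        raise TypeError("booleans have their own type bytes (0x05, 0x06)")
    return b"\x03" + repr(value).encode("ascii")

def parse_number(text):
    """Convert text notation to int when possible, otherwise to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)
```

A small number such as 7 takes two bytes in total (the type byte plus one digit), which is the storage benefit mentioned above.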

0x05 boolean true: with nothing more

0x06 boolean false: same as above but with opposite value

0x01 embedded key-value list: treat the following sequence of key-value pairs, delimited by a 0x04 closing type byte, as an embedded list

0x08 array: treat the following as a sequence of values only, delimited by a 0x04 closing type byte
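Put together, the type bytes above could drive an encoder along these lines. This is a sketch under assumptions, not a reference implementation: strings are UTF-8, numbers use Python's own text notation, and the top level of a document is taken to be a bare run of key-value pairs without a surrounding 0x01 ... 0x04. The names `encode_value` and `encode_pairs` are invented here.

```python
TYPE_BYTES = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x08}
ESCAPE = 0x07

def _escape(text):
    """UTF-8 encode text, prefixing type and escape bytes with 0x07."""
    out = bytearray()
    for b in text.encode("utf-8"):
        if b in TYPE_BYTES or b == ESCAPE:
            out.append(ESCAPE)
        out.append(b)
    return bytes(out)

def encode_value(value):
    """Encode one value as its type byte plus whatever follows it."""
    if value is True:
        return b"\x05"
    if value is False:
        return b"\x06"
    if isinstance(value, str):
        return b"\x02" + _escape(value)
    if isinstance(value, (int, float)):
        return b"\x03" + repr(value).encode("ascii")
    if isinstance(value, dict):
        return b"\x01" + encode_pairs(value) + b"\x04"
    if isinstance(value, (list, tuple)):
        return b"\x08" + b"".join(encode_value(v) for v in value) + b"\x04"
    raise TypeError("no type byte for " + type(value).__name__)

def encode_pairs(mapping):
    """Encode keys and values back to back, with no separators in between."""
    return b"".join(encode_value(k) + encode_value(v) for k, v in mapping.items())
```

For example, `encode_pairs({"name": "momoa", "ok": True})` yields the string `name`, the string `momoa`, the string `ok` and a lone 0x05, each introduced by its type byte. Opened in a text editor, the words are readable and the control codes show up in between.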

There's no specific type byte assigned to null or undefined, but either can be encoded as a single 0x03 with no data following it.
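In code, that convention could look like the following sketch. Mapping the empty number to Python's `None` is an assumption of this example, as are the function names.

```python
def encode_number_or_null(value):
    """A bare 0x03 stands for null; otherwise 0x03 plus the text notation."""
    if value is None:
        return b"\x03"
    return b"\x03" + repr(value).encode("ascii")

def parse_number_or_null(text):
    """Text read after a 0x03: empty means null, anything else is a number."""
    if text == "":
        return None
    try:
        return int(text)
    except ValueError:
        return float(text)
```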

Keys are themselves values, typically of type string (0x02). One permissible exception, in specific contexts, is to encode sparse arrays as an embedded list (0x01) where all keys are of type number (0x03).
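Because keys are ordinary values, a decoder can read keys and values with the same routine and only pair them up afterwards. The sketch below does that. It assumes UTF-8 strings, a top level made of bare key-value pairs, a bare 0x03 decoding to `None`, and embedded lists mapping to Python dicts (so keys need to be hashable). All names are hypothetical.

```python
STRING, NUMBER, TRUE, FALSE = 0x02, 0x03, 0x05, 0x06
LIST, ARRAY, CLOSE, ESCAPE = 0x01, 0x08, 0x04, 0x07
TYPE_BYTES = {STRING, NUMBER, TRUE, FALSE, LIST, ARRAY, CLOSE}

def _read_text(data, pos):
    """Collect bytes up to the next unescaped type byte."""
    raw = bytearray()
    while pos < len(data) and data[pos] not in TYPE_BYTES:
        if data[pos] == ESCAPE:
            pos += 1  # the byte after 0x07 is taken literally
            if pos >= len(data):
                raise ValueError("dangling escape byte")
        raw.append(data[pos])
        pos += 1
    return raw.decode("utf-8"), pos

def _pairs_to_dict(items):
    if len(items) % 2 != 0:
        raise ValueError("keys and values must come two by two")
    return dict(zip(items[0::2], items[1::2]))

def _read_until_close(data, pos):
    """Read values until the 0x04 closing byte; return (values, next_pos)."""
    items = []
    while True:
        if pos >= len(data):
            raise ValueError("missing 0x04 closing byte")
        if data[pos] == CLOSE:
            return items, pos + 1
        value, pos = _read_value(data, pos)
        items.append(value)

def _read_value(data, pos):
    """Read one value (key or value alike); return (value, next_pos)."""
    kind, pos = data[pos], pos + 1
    if kind == STRING:
        return _read_text(data, pos)
    if kind == NUMBER:
        text, pos = _read_text(data, pos)
        if text == "":
            return None, pos  # bare 0x03: null
        try:
            return int(text), pos
        except ValueError:
            return float(text), pos
    if kind == TRUE:
        return True, pos
    if kind == FALSE:
        return False, pos
    if kind == ARRAY:
        return _read_until_close(data, pos)
    if kind == LIST:
        items, pos = _read_until_close(data, pos)
        return _pairs_to_dict(items), pos
    raise ValueError("unexpected byte 0x%02x at offset %d" % (kind, pos - 1))

def decode(data):
    """Decode a whole document into a dict."""
    items, pos = [], 0
    while pos < len(data):
        value, pos = _read_value(data, pos)
        items.append(value)
    return _pairs_to_dict(items)
```

With this, a sparse array encoded as an embedded list with number keys simply comes out as a dict with integer keys, with no special case in the parser.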

And now for a name for it... I know, let's type Jason into IMDB... Sounds nice, and serves as a tribute to the artist. So let the file extension be ".momo" and the MIME type be "application/momoa"