This document assumes a familiarity with the CSV escaping rules as specified in RFC 4180 ,specifically noting that leading and traling spaces within a field should NOT be trimmed when parsing into a string. (Wikipedia)
The CSV and CSVX formats include quotes as a special character therfore when quoting text italicised text is used in their place. For example Hello is a string not containing quotes but "Hello" contains quotes. Where a literal linebreak is needed to be quoted a tilda is used e.g. Line 1~Line 2.
The act of reading a CSVX byte-stream and converting it to a representative in-memory object is referred to as "Parsing", the act of converting that object into a new CSVX byte-stream is referred to as "Rendering".
CSVX, Comma-Seperated-Values-eXtended is an extension to the de-facto CSV standard to include meta data and columnar data typing. As it is an extension to CSV the standard is designed to exhibit backwards and forwards compatibility with the following requirements:
A CSVX Stream is delimited into Blocks, each being a section of the Stream having a particular purpose and laid out and parsed according to strict criteria. This specification dryly explains each block and the layout and parsing rules for each block, with examples where neccesary. These blocks are marked within the stream with the following headers.
Each block is marked in the Stream by it's ASCII literal byte-sequence between two newlines - with the exception of the CSVX header which begins a stream. Parsers may localise newlines but this specification recommends that LF (0x0A) be used. In order to prevent a stream from containing character sequences which mimic the block headings from breaking the Stream the following rules are mandated:
Where NAME corresponds to any of CSVX,META,USER,HEAD and DATA. This allows strings to contain the headers whilst preserving the original CSV escape rules unchanged. It also allows nesting of headers, i.e. [HEAD] will be rendered as [[HEAD]].
The CSVX block contains no internal information save for a version number, this version number is mandatory and a parser encountering a CSVX stream lacking a version number MUST either reject the Stream or fall-back to a basic CSV mode of operation.
00 | CSVX 01 | Version Number
The version number is a string and must be parsed and compared as such (i.e. not as a decimal number), it is expected that future extensions of the specification introduce new version numbers and that CSVX parsers are able to respond to this.
As no other information is enclosed in the CSVX Block the following line in the Stream should be one of the other section headers.
Whilst all sections are optional each section, if included, must still respect the ordering of the above list.
The META section contains information about the Stream itself, it is represented as rows of key,value pairs - CSV escaped where needed. Each key MUST have at least one associated value, a conforming parser should show an error alert when encountering an orphaned key. Keys in the META section that are NULL or not needed can be exluded when rendering a Stream.
The data stored in the META section is specific to the CSVX specification, however a parser should store and reform any unknown key/value pairs to preserve forward-compatibility. To prevent unauthorised specification extension any parser storing such key/value pairs should store them silently and NOT expose them, either by appending them to the USER collection or via any other means. E.g. a parser encountering a 1.0 CSVX Stream with 1.1 Page directives.
CSVX 1.0 META Title,My [CSVX] Stream DateCreated,2008-01-01
A basic CSVX META block, note that the Title parses as My CSVX Stream
All META keys are optional and currently only the following are supported:
| Key | Data Type | Min / Max | Description |
|---|---|---|---|
| Title | string | Max: 64 chars | Short name of the Stream.* |
| Author | string | Max: 64 chars | Name of the Stream creator. |
| Description | string | Max: 256 chars | Descriptive name of the Stream |
| DateCreated | datetime | Date the CSVX Stream was originally formed | |
| DateModified | datetime | Date of most recent change to the stream | |
| UID | string | Max: 256 chars | Unique identifier for the stream - usually a URL pointing to the origin of the stream itself. |
| Session | string | Max: 256 chars | A UID specific to the current user session if used in a client-server system. |
* May only contain ASCII Chars a-z,A-Z,0-9 and _. May not start with a number. May also include a . which parsers may treat as a namespace seperator
The USER section allows extra data regarding the stream to be stored in key,value pairs. The format and rules are as for the META but a key may hold a NULL value by simply including nothing to the right of the first comma.
The USER key/values can be in any format, provided they are CSV escaped where needed. However parsers should store their contents in a string collection and not attempt to determine the datatypes of each value.
CSVX 1.0 META Title,UserData Description,A [CSVX] Stream containing [USER] entries USER Draft Version,2 Edited By,"John,Dave,Chris" Note To John,How do these numbers grab you?
USER Example. Note that the value of 2 shown here would be interpreted as a string.
The HEAD Block containins lines describing the nature of each field / column in the datatable. This expands on the key/value nature of the META and USER sections by allowing multiple rows for each "key". If a row is to be represented without an earlier one being present that row must be included but left blank. Trailing unused rows need not be included
00 | HEAD 01 | Field Names 02 | Field Types
The Field Names line is just a CSV formatted list of names, these act as the
Keys to identify each column, so must be unique in the row. Field names must
not begin with a digit or an underscore.
[Reserved Chars? . as object seperator, _ prefix for versioning?]
[Forbidden Chars? ,\/*-+]
CSVX 1.0 HEAD ID,Name,Age,Gender,Registered DATA 1,John,20,M,1 2,Jane,30,F,1 3,Dave,40,M,0An example CSVX stream with named columns
The Field Types line is a list of data types for each Field.
They are currently formatted as a single type-defining character followed by the number of bytes (NOT characters, in the case of UTF strings) to be used according to the following rules:
Field Types:
| Char | Default Bytecount | Description | Valid DATA Block Characters / Format |
|---|---|---|---|
| b | - | Bit (True / False) | [10] |
| c | - | Currency (Decimal) | [-0123456789.] |
| d | - | Date | ccyy-MM-dd |
| e | - | Epoch (Date + Time) | ccyy-MM-ddTHH:mm:ss.sss |
| f | - | Float | [-0123456789.E] |
| i | 4 | Integer | [-0123456789] |
| s | 32767 | String | CSV escaped |
| t | - | Time | HH:mm:ss.sss |
| u | 4 | Unsigned Integer | [0123456789] |
Field Type Examples
| Code | Description |
|---|---|
| s | A 32767 (max) character string |
| s0 | A 32767 (max) character string |
| s8 | An 8 (max) character string |
| u1 | A one byte unsigned integer (0 to 255) |
| i2 | A two byte signed integer (-32768 to 32767) |
Example:
CSVX 1.0 META Title,Customers HEAD ID,Name,Registered,Country u,s32,b,c2 DATA 1,John,1,GB 2,Jane,,DE 3,Dave,0,DENote that Jane's Registered flag is NULL.
The data section should be a straightforward CSV formatted table of data whose columns and data types are formatted as specified in the HEAD section if included.
Data should only use the "valid" characters as listed in that table and as such floating points should be expressed in the fashion 1.234E5 where the number of digits would be prohibitive to write. A CSVX stream may be parsed and re-rendered to show different encodings without preserving the original text provided no information is lost. The previous example may, therefore, be re-rendered as 123400 whilst still remaining conformant.
CSVX (in relation to computer data storage and transmission formats) and the contents of this document are copyright © 2008 Ric Hardacre BSc.
The author (Ric Hardacre) is not aware of any existing copyright or patents in any country which may be relevant in relation to this specification or the methods involved in the implementation of this specification. Therefore it is expected that this specification is freely implimentable by any developer for use in any software, commercial or otherwise. However the author does not offer protection against legal action arising due to implimentation of this specification nor does the author offer any warranties against data loss, damage or any other losses incurred by any means due to any implementation of this standard.
There are no licensing fees associated with using CSVX in any software. However the author reserves the right to require software that advertises an ability to process or handle CSVX data to comply with the specification as laid out here and may therefore require the removal of said advertising associated with said product if this is not demonstrably the case.
I don't believe that there are any copyrights or patents that cover CSVX, but I may be wrong, so please email me and let me know if you find any. In any case if you use CSVX and Bad Things happen to you, your data, property or pets then please don't blame me! You can write software that uses CSVX and yell it from the rooftops and plaster it all over your website, and it's OK if there's a bug in there that means it doesn't get things quite right - so long as you get around to fixing it. But if you attempt to Embrace and Extend CSVX I can tell you not to use the term "CSVX" anywhere in association with that software.