AQUA analysis tools
The DBAS file format
For its output Aqua uses an "intelligent" format, dubbed DBAS, which tries to
combine readability with ease of computer parsing.
The output uses both upper and lower case letters for readability. Keyword
searches, however, should be case-insensitive.
The output is mostly in fixed-width formats, but this is just to enhance
readability. Parsing should be based on extracting whitespace delimited words,
i.e. a field- rather than column-oriented interpretation should be used.
The 'qdbext' script can be used to extract individual data blocks or items,
with or without their header (see Using qdbext).
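As an illustration of the field-oriented, case-insensitive reading described above, a minimal Python sketch could look like the following. The function names split_fields and find_keyword are hypothetical and not part of Aqua, and the sample line is invented for the example.

```python
def split_fields(line):
    """Return the whitespace-delimited words of a line, ignoring column positions."""
    return line.split()


def find_keyword(fields, keyword):
    """Return the index of keyword in fields, compared case-insensitively, or -1."""
    wanted = keyword.upper()
    for index, word in enumerate(fields):
        if word.upper() == wanted:
            return index
    return -1


# Invented sample line; extra spacing and mixed case should not matter.
fields = split_fields("$ Table   #Records 12    #Items 3")
position = find_keyword(fields, "#RECORDS")
record_count = int(fields[position + 1])
```

Because the line is split on runs of whitespace, the result does not depend on the fixed-width layout used for readability.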
Basic introduction
Note: in the text below, literal examples are shown in bold or italic.
Italic text stands for a variable part; bold text should be given exactly as shown.
DBAS files have multiple data blocks, each beginning with
$ DATA Identification_string
and ending with
$ END
Most data blocks are formatted as a table. This is indicated by a line such as
$ TABLE #RECORDS i #ITEMS j
appearing as the second line of the block. The integers i and
j correspond to the number of records (lines) and items (columns) of
the table. The third line of the TABLE data block is a line giving column
headings; this line is not included in the record count.
A special data block between
and
explains the various column headings used in the file.
Other lines starting with $ can be treated as comments.
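The block structure outlined above can be read with a short loop. The sketch below is only one possible approach: it assumes that a block opens with a "$ DATA" line and closes with the "$ END" line named in the LIST and TABLE descriptions, and that "$" always appears as a separate word. The function name read_blocks and the sample content are hypothetical.

```python
def read_blocks(lines):
    """Map each DATA identification string to the lines inside that block."""
    blocks = {}
    current = None
    for raw in lines:
        words = raw.split()
        is_meta = len(words) >= 2 and words[0] == "$"
        if is_meta and words[1].upper() == "DATA":
            # Start of a block; the rest of the line is the identification string.
            current = " ".join(words[2:])
            blocks[current] = []
        elif is_meta and words[1].upper() == "END":
            current = None
        elif current is not None:
            blocks[current].append(raw.rstrip("\n"))
        # Lines outside any block (e.g. "$" comments) are ignored here.
    return blocks


# Invented sample file content, for illustration only.
sample = [
    "$ A comment line outside any block",
    "$ DATA Example_block",
    "$ TABLE #RECORDS 2 #ITEMS 3",
    "RES NUM SCORE",
    "ALA 1 0.85",
    "GLY 2 0.90",
    "$ END",
]
blocks = read_blocks(sample)
```

Each entry of the returned dictionary still contains the type line (here the "$ TABLE" line) as its first element, so the data type can be decided afterwards.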
Full description
Meta-info lines
- Every meta-info line begins with "$ ".
Block structure
- The output is structured into blocks, each block beginning with
and ending with
- The meaning of (non-obvious) column headings may be given in a block between
and
- In another application a special one-line statement has been introduced:
which defines an overall identifier, e.g. to uniquely identify a file.
The qdbext script reads and prints this identifier (cf.
Using qdbext).
- Remaining lines beginning with $ are to be treated as comments.
In the Aqua output this involves e.g. the file header, describing the file
and its origin (maybe I should put this in a block also).
- Data output is grouped in blocks between
$ DATA data_header
and
$ END
The string data_header can be used to extract only a specified block.
(At present this string can be rather long, to keep the output readable.
Maybe I should use shorter descriptors, and add a separate data header line).
The data type is defined on the second line of the DATA block.
- Remarks may be inserted into DATA blocks by a line formatted as
Data types
- The DATA blocks may contain different types of data. Each type is indicated
by a keyword, which is specified on the second line of the block.
- In the Aqua output two types are used: LIST and TABLE.
LIST gives a list of items, each on a separate line and identified by
an item name.
TABLE gives a table of data organized in rows and columns, each column
identified by an item name.
- In another application I have also introduced a TEXT type: a line of text.
- Currently it is assumed that every block contains only a single type of data.
LIST data type
- The LIST data block follows the $ DATA data_header line with
where i is the number of records (rows) in the block, not counting
the meta-info lines.
- The next lines, up to $ END, are the data. The first word of every line
is the item identifier; the remainder of the line is the item contents.
Often the contents will consist of a single word or number, which effectively
turns the LIST data type into a set of name-value pairs.
In another context, though, I have allowed the contents to be a line of text.
This freedom may be restricted in the future.
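A LIST block, read as described above, reduces to name-value pairs. The following Python sketch shows one way to do this, assuming the lines between "$ DATA" and "$ END" have already been collected. The function name parse_list and the sample items are hypothetical. Lines whose first word is "$" are skipped as meta-info, so the sketch does not depend on the exact form of the type line.

```python
def parse_list(block_lines):
    """Return (item_name, contents) pairs from the data lines of a LIST block."""
    items = []
    for line in block_lines:
        parts = line.split(None, 1)  # first word, then the rest of the line
        if not parts or parts[0] == "$":
            continue  # blank line or meta-info line
        name = parts[0]
        contents = parts[1].strip() if len(parts) > 1 else ""
        items.append((name, contents))
    return items


# Invented sample lines, for illustration only.
sample = [
    "$ (meta-info line, skipped by the parser)",
    "program   Aqua",
    "title     An example line of text",
]
items = parse_list(sample)
lookup = dict(items)
```

A list of pairs is returned rather than a dictionary so that the order of the items, and any repeated item name, is preserved; dict() can be applied when a simple lookup is enough.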
TABLE data type
- The TABLE data block follows the $ DATA data_header line with
$ TABLE #RECORDS i #ITEMS j
where i is the number of records (rows) and j is the number of
items (columns).
The row count should exclude the meta-info lines.
- The next line gives the column headings. These headings are thought of as
keywords, that can be used in a script to address a specific column.
If necessary, the headers are explained elsewhere in the file.
Currently it is not forbidden to repeat a column heading within a table,
although this is not ideal, obviously.
- The next lines, up to $ END, are the row-by-column formatted data.
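The TABLE layout above can be parsed along the following lines. This is a sketch under stated assumptions, not the method used by qdbext: the input is the list of lines between "$ DATA" and "$ END", the first of which is the "$ TABLE" line and the second the column headings. The names parse_table and column, and the sample data, are hypothetical.

```python
def parse_table(block_lines):
    """Return (column_headings, rows) for the lines of a TABLE block."""
    header = block_lines[0].split()
    if len(header) < 2 or header[0] != "$" or header[1].upper() != "TABLE":
        raise ValueError("not a TABLE block")
    # Read the "#NAME value" pairs that follow the TABLE keyword.
    counts = {}
    for key, value in zip(header[2::2], header[3::2]):
        counts[key.upper()] = int(value)
    columns = block_lines[1].split()
    rows = []
    for line in block_lines[2:]:
        fields = line.split()
        if not fields or fields[0] == "$":
            continue  # meta-info lines are excluded from the record count
        rows.append(fields)
    if counts.get("#RECORDS") != len(rows):
        raise ValueError("record count does not match #RECORDS")
    return columns, rows


def column(columns, rows, name):
    """Return the values of the first column whose heading matches name."""
    index = [heading.upper() for heading in columns].index(name.upper())
    return [row[index] for row in rows]


# Invented sample block, for illustration only.
sample = [
    "$ TABLE #RECORDS 2 #ITEMS 3",
    "RES NUM SCORE",
    "ALA 1 0.85",
    "GLY 2 0.90",
]
columns, rows = parse_table(sample)
scores = column(columns, rows, "score")
```

Rows are kept as lists rather than dictionaries because a column heading may be repeated within a table; the column helper simply takes the first match.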
TEXT data type
- The TEXT data block follows the $ DATA data_header line with
- More than one $ TEXT line can be included.