AQUA analysis tools

The DBAS file format

For its output Aqua uses an "intelligent" format, dubbed DBAS, which tries to combine readability with ease of computer parsing.

The output uses both upper and lower case letters for readability. Keyword searches, however, should be case-insensitive.

The output is mostly in fixed-width formats, but this is just to enhance readability. Parsing should be based on extracting whitespace-delimited words, i.e. a field-oriented rather than column-oriented interpretation should be used.
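
The two parsing rules above (field-oriented splitting and case-insensitive keywords) can be sketched in a few lines of Python. This is only an illustration, not part of Aqua or qdbext; the function names `fields` and `meta_keyword` and the sample lines are hypothetical.

```python
def fields(line):
    """Split a line into whitespace-delimited words, ignoring column positions."""
    return line.split()


def meta_keyword(line):
    """Return the upper-cased keyword of a '$ KEYWORD ...' line, else None.

    Upper-casing the keyword makes later comparisons case-insensitive.
    """
    words = line.split()
    if len(words) >= 2 and words[0] == "$":
        return words[1].upper()
    return None


# Hypothetical sample lines, not taken from a real Aqua file.
print(fields("  ALA   12    0.53 "))
print(meta_keyword("$ data Some_block"))
```

Because `str.split()` without arguments splits on any run of whitespace, the result does not depend on how the fixed-width columns happen to be aligned.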

The 'qdbext' script can be used to extract individual data blocks or items, with or without their header (see Using qdbext).

Basic introduction

Note: in the text below, examples are shown in bold or italic. Italic text is variable; bold text should be given as shown.

DBAS files have multiple data blocks, each beginning with

$ DATA Identification_string

and ending with

$ END

Most data blocks are formatted as a table. This is indicated by a line such as

$ TABLE #RECORDS #ITEMS

appearing as the second line of the block. The integers #RECORDS and #ITEMS (called i and j in the full description) give the number of records (lines) and items (columns) of the table. The third line of the TABLE data block gives the column headings; this line is not included in the record count.

A special data block between

$ EXPLAIN

and

$ END

explains the various column headings used in the file.

Other lines starting with $ can be treated as comments.
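
As a rough sketch of how a reader could use this structure, the Python function below collects the DATA blocks of a file into a dictionary keyed by the identification string. It is not the qdbext implementation; the name `read_blocks`, the choice to skip EXPLAIN blocks, and the sample lines are assumptions made for illustration.

```python
def read_blocks(lines):
    """Collect DATA blocks as {identification string: list of body lines}.

    The body keeps every non-empty line between '$ DATA ...' and '$ END',
    including the type line (e.g. '$ TABLE i j'). Lines outside DATA blocks,
    such as '$' comments and EXPLAIN blocks, are skipped.
    """
    blocks = {}
    current = None  # identification string of the block being read, if any
    for raw in lines:
        words = raw.split()
        is_meta = bool(words) and words[0] == "$"
        keyword = words[1].upper() if is_meta and len(words) > 1 else None
        if keyword == "DATA":
            # Assumption: the identification string is the rest of the line.
            current = " ".join(words[2:])
            blocks[current] = []
        elif keyword == "END":
            current = None
        elif current is not None and words:
            blocks[current].append(raw.rstrip("\n"))
    return blocks


# Hypothetical sample file contents.
sample = [
    "$ Hypothetical file header comment",
    "$ DATA Example_block",
    "$ TABLE 2 2",
    "RES VALUE",
    "ALA 1.0",
    "GLY 2.0",
    "$ END",
    "$ EXPLAIN",
    "RES residue name",
    "$ END",
]
blocks = read_blocks(sample)
print(list(blocks))
```

The same function accepts an open file object, since it only iterates over lines.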

Full description

Meta-info lines

Every meta-info line begins with "$ ".

Block structure

The output is structured into blocks, each block beginning with a keyword line such as $ DATA or $ EXPLAIN and ending with
$ END
The meaning of (non-obvious) column headings may be given in a block between
$ EXPLAIN
and
$ END
In another application a special one-line statement has been introduced:
which defines an overall identifier, e.g. to uniquely identify a file. The qdbext script reads and prints this identifier (cf. Using qdbext).
Remaining lines beginning with $ are to be treated as comments. In the Aqua output this involves e.g. the file header, describing the file and its origin (maybe I should put this in a block also).
Data output is grouped in blocks between
$ DATA data_header
and
$ END
The string data_header can be used to extract only a specified block. (At present this string can be rather long, to keep the output readable. Maybe I should use shorter descriptors, and add a separate data header line.) The data type is defined on the second line of the DATA block.
Remarks may be inserted into DATA blocks by a line formatted as

Data types

The DATA blocks may contain different types of data. Each type is indicated by a keyword, which is specified on the second line of the block.
In the Aqua output two types are used: LIST and TABLE. LIST gives a list of items, each on a separate line and identified by an item name. TABLE gives a table of data organized in rows and columns, each column identified by an item name.
In another application I have also introduced a TEXT type: a line of text.
Currently it is assumed that every block contains only a single type of data.

LIST data type

The LIST data block follows the $ DATA data_header line with
$ LIST i
where i is the number of records (rows) in the block, not counting the meta-info lines.
The next lines, up to $ END, are the data. The first word of every line is the item identifier; the remainder of the line is the item contents. Often the contents will consist of a single word or number, which effectively turns the LIST data type into a set of name-value pairs. In another context, though, I have allowed the contents to be a line of text. This freedom may be restricted in the future.
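
A minimal Python sketch of reading a LIST block into name-value pairs is given below. It assumes that the type line has the form `$ LIST i`, by analogy with the TABLE type line, and that the body lines have already been isolated from the file; the function name `parse_list` and the sample items are hypothetical.

```python
def parse_list(body):
    """Parse the lines of a LIST block (type line first) into a dict.

    Assumes the type line reads '$ LIST i'. Lines starting with '$' are
    treated as meta-info and are not counted as records.
    """
    type_words = body[0].split()
    if (len(type_words) < 3 or type_words[0] != "$"
            or type_words[1].upper() != "LIST"):
        raise ValueError("not a LIST block")
    expected = int(type_words[2])

    items = {}
    n_records = 0
    for line in body[1:]:
        if line.lstrip().startswith("$"):
            continue  # meta-info or comment line
        parts = line.split(None, 1)  # first word, then the rest of the line
        if not parts:
            continue  # blank line
        n_records += 1
        items[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    if n_records != expected:
        raise ValueError(f"expected {expected} records, found {n_records}")
    return items


# Hypothetical LIST block body.
body = ["$ LIST 2", "Program Aqua", "Title  some line of text"]
print(parse_list(body))
```

Splitting only on the first run of whitespace keeps contents that are a whole line of text intact. Repeated item names would overwrite each other in this sketch.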

TABLE data type

The TABLE data block follows the $ DATA data_header line with
$ TABLE i j
where i is the number of records (rows) and j is the number of items (columns). The row count should exclude the meta-info lines.
The next line gives the column headings. These headings are meant as keywords that can be used in a script to address a specific column. If necessary, the headings are explained elsewhere in the file. Currently it is not forbidden to repeat a column heading within a table, although this is obviously not ideal.
The next lines, up to $ END, are the row-by-column formatted data.
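
The sketch below shows one way a script could address a TABLE column by its heading, in Python. It assumes a type line of the form `$ TABLE i j`, cells without embedded whitespace, and body lines already isolated from the file; `parse_table`, `column`, and the sample data are hypothetical names and values.

```python
def parse_table(body):
    """Parse the lines of a TABLE block (type line first).

    Returns (headings, rows), where each row is a list of strings.
    Lines starting with '$' are skipped and not counted as records.
    """
    type_words = body[0].split()
    if (len(type_words) < 4 or type_words[0] != "$"
            or type_words[1].upper() != "TABLE"):
        raise ValueError("not a TABLE block")
    n_records, n_items = int(type_words[2]), int(type_words[3])

    data = [line.split() for line in body[1:]
            if line.split() and not line.lstrip().startswith("$")]
    if not data:
        raise ValueError("missing column headings")
    headings, rows = data[0], data[1:]  # heading line is not a record
    if len(headings) != n_items or len(rows) != n_records:
        raise ValueError("table size does not match the '$ TABLE i j' line")
    if any(len(row) != n_items for row in rows):
        raise ValueError("row with wrong number of items")
    return headings, rows


def column(headings, rows, name):
    """Return one column, matching the heading case-insensitively.

    If a heading is repeated, the first matching column is used.
    """
    index = [h.upper() for h in headings].index(name.upper())
    return [row[index] for row in rows]


# Hypothetical TABLE block body.
body = ["$ TABLE 2 3", "RES  NUM  VALUE", "ALA    1   0.50", "GLY    2   1.25"]
headings, rows = parse_table(body)
print(column(headings, rows, "value"))
```

Values are returned as strings; converting them to numbers is left to the caller, since the format does not state the type of each column.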

TEXT data type

The TEXT data block follows the $ DATA data_header line with
More than one $ TEXT line can be included.