Steve Pedler discovers just how all that data is stored on
your disks
Although knowledge of the structure of files
stored on disk is not necessary in order to use a disk drive, the
subject is an interesting one and information about it is essential
if you wish to carry out certain tasks such as repairing damaged
files or creating boot programs. The following article examines the
structure of various types of disk file, and in the second part of
the article I will present a sector editor enabling you to directly
read and write to disk sectors.
All references to DOS and disk drives in the
article relate to the current Atari standard of 1050 drive and DOS
2.5, unless stated otherwise.
THE DISK ITSELF
A floppy disk consists of a thin, circular piece
of plastic coated with metal oxides which store the data in magnetic
form. As initially supplied the disk is not usable, and the surface
must first be organised to store data, by a process known as
formatting. The surface of a formatted disk is divided into 40
concentric tracks. Each track is in turn divided into 18 (single
density, or 26 enhanced density) sectors, each of which holds 128
bytes of data. Data is therefore packed rather more closely onto an
enhanced density disk, which means that the disk surface must be
higher quality to ensure reliable storage. In fact, the only
difference between disks designated by the manufacturer as single or
double density is that one has been tested for higher quality. Prior
to formatting the drive cannot distinguish between them. It is
important to use a quality disk as formatting a disk designated as
single density with DOS option I will automatically result in an
enhanced density format, which might lead to unreliable data
storage. To specifically format a disk in single density, use DOS
option P.
Once the disk is formatted, the 1050 (but not the
810) drive can distinguish between single and enhanced density and
use the disk accordingly. The 810 drive can use a single density
disk formatted on a 1050, but not an enhanced density one. Note that
DOS 2.0S can read an enhanced density disk in a 1050 drive, but
sectors numbered 720 or greater are invisible to it and files using
these sectors will be unavailable.
SECTOR NUMBERS
From the figures above, you will see that
theoretically a single density disk contains 720 sectors (40 tracks
* 18 sectors per track = 720 sectors) and an enhanced density disk
contains 1040 sectors. Examination of a freshly formatted disk (not
containing DOS files) shows however that you only have 707 or 1010
free sectors respectively. What happened to all those missing
sectors?
On a single density disk, as part of the format
process, eight sectors (361-368) are reserved for the disk directory
and a further sector (360) for the Volume Table of Contents (VTOC).
The structure and use of these sectors is described below. Three
more sectors (1-3) are reserved for the DOS file manager boot file
(see below). Finally, one sector is lost due to a discrepancy between
the original version of DOS and the original disk drives. As far as
the drive is concerned, the 720 sectors on the disk are numbered
from 1 to 720, but DOS numbers them from 0-719. The result is that
sector 720 just does not exist as far as DOS is concerned. No doubt
this could have been corrected with later versions of DOS, but then
there would have been a loss of compatibility between the various
versions. Anyway, this makes a total of 13 unavailable sectors,
leaving 707 free for use. (Note that these sectors are only
unavailable within the confines of DOS - you can use any of them in
any way you like by bypassing DOS and doing direct sector-oriented
disk access.)
Although 1040 sectors are present on an enhanced
density disk, due to the file link structure DOS 2.5 cannot use
sector numbers greater than 1023. The reason for this will become
apparent when discussing linked sector files below. Of the 1023
sectors available, 12 are reserved for the directory, VTOC, and DOS
boot file as above. Although sectors numbered 720 or above can be
used by DOS 2.5, to ensure maximum compatibility with DOS 2.0S
sector 720 is marked as unavailable. This leaves 1010 sectors free
for use.
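The sector arithmetic above can be checked with a few lines of Python. This is only a sketch of the figures quoted in the text; the function name and constants are my own and are not part of DOS.

```python
TRACKS = 40            # tracks per disk side, as stated above
POINTER_LIMIT = 1023   # highest sector number a 10-bit forward pointer can hold

def sector_budget(sectors_per_track):
    """Return (physical sectors, sectors free on a freshly formatted disk)."""
    physical = TRACKS * sectors_per_track
    # DOS 2.5 cannot address anything above sector 1023
    addressable = min(physical, POINTER_LIMIT)
    # boot sectors 1-3, VTOC sector 360, directory sectors 361-368
    reserved = 3 + 1 + 8
    # sector 720: unreachable on single density, marked unavailable on enhanced
    lost = 1
    return physical, addressable - reserved - lost
```

Calling sector_budget(18) describes a single density disk and sector_budget(26) an enhanced density one.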
THE DIRECTORY
The directory consists of eight sectors starting
at sector 361. These were chosen because they are in the middle of
a single density disk and therefore give the shortest average disk
access time. Each directory entry is 16 bytes long, giving eight
entries per sector and a total of 64 entries. The 16 bytes of each
entry are used as follows:
Byte 1 Flag or status byte. The various bits in
this byte, if set, have the following meanings:
bit 0 - special meaning for DOS 2.5 - see below
bit 1 - file created by DOS 2 (if this bit is clear, it is a DOS 1 file)
bits 2-4 - spare
bit 5 - file is locked
bit 6 - entry in use (i.e. not that the file is OPEN, but that this directory entry is valid and cannot be used for a new file)
bit 7 - file has been deleted
In most publications the setting of bit 0 of the
status byte is said to indicate that the file is OPEN. However,
under DOS 2.5 if this bit is set it appears to indicate that the
file uses sectors numbered 721 or greater, this file therefore being
unavailable to DOS 2.0S. When doing a directory read, DOS 2.5 will
bracket these files to indicate this to the user. Such files have
the value 3 in the directory entry status byte. (Not 67 as you might
expect from the list of bit values above. If you deliberately change
the value from 3 to 67 using a sector editor, the file will no
longer appear when the directory is read.) The status byte can
therefore contain the following values:
value (decimal) | meaning
3 | DOS 2.5 file using sectors numbered 721 or more
35 | as above, but file locked
66 | DOS 2 file, entry in use
98 | as above, but file locked
128 | file deleted
When a file is deleted, bit 7 of
the flag byte is set (and all other bits cleared) but the filename
is not removed from the directory. The file data itself is not
erased, but the sectors used by the file are marked in the VTOC as
being available for use again (see below). Under certain conditions
it may be possible to recover a deleted file (e.g. using the DOS 2.5
utility DISKFIX.COM), but probably not if another file has been
written to the disk since the old one was deleted. The new file may
have used the directory space and sectors occupied by the deleted
file, making recovery impossible.
Bytes 2 and 3 | total number of sectors used by the file in low and high byte format.
Bytes 4 and 5 | sector number of the first sector in the file, again in low and high byte format.
Bytes 6 - 13 | primary filename. If this directory space has never been used, this area contains only zeroes.
Bytes 14 - 16 | filename extension (or zeroes).
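To make the 16-byte layout concrete, here is a minimal Python sketch that decodes one directory entry and works out where entry number n lives on the disk. The function names and dictionary keys are hypothetical, and I am assuming that short filenames are padded with spaces; note also that Python counts the bytes from 0 where the list above counts from 1.

```python
def entry_location(n):
    """Return (sector, offset) of directory entry n (0-63)."""
    if not 0 <= n <= 63:
        raise ValueError("directory holds entries 0-63 only")
    # eight 16-byte entries per sector, starting at sector 361
    return 361 + n // 8, (n % 8) * 16

def parse_directory_entry(entry):
    """Decode a raw 16-byte directory entry into a dictionary."""
    if len(entry) != 16:
        raise ValueError("a directory entry is exactly 16 bytes")
    status = entry[0]
    return {
        "status": status,
        "deleted": bool(status & 0x80),        # bit 7
        "in_use": bool(status & 0x40),         # bit 6
        "locked": bool(status & 0x20),         # bit 5
        "dos2": bool(status & 0x02),           # bit 1
        "dos25_flag": bool(status & 0x01),     # bit 0, DOS 2.5 special meaning
        "sector_count": entry[1] | (entry[2] << 8),
        "first_sector": entry[3] | (entry[4] << 8),
        # strip padding spaces (assumed) or the zeroes of an unused entry
        "name": entry[5:13].decode("ascii", "replace").rstrip(" \x00"),
        "ext": entry[13:16].decode("ascii", "replace").rstrip(" \x00"),
    }
```

Feeding it the 16 bytes read from a directory sector with a sector editor returns the start sector and flags that a normal directory listing does not show.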
Normally, when you do a directory read you only
get the filename and sector count, plus an asterisk marker if the
file is locked. To get the rest of the information in the directory
entry, you will need to use a sector reader which bypasses DOS and
reads in the entire sector. From BASIC the directory is usually read
using a statement such as: OPEN #1,6,0,"D:*.*". However, DOS 2.5 can
use sector numbers greater than 720, which would not be usable by DOS
2.0S. If you use the following statement: OPEN #1,7,0,"D:*.*", DOS
will bracket any file using sector numbers of 720 or more (e.g. as
<FILENAME.EXT>).
THE VTOC
This is located in sector 360 (single density) or
sectors 360 and 1024 (enhanced density). As indicated above, two VTOC sectors are necessary for an enhanced density disk as one
sector is insufficient to store information about all 1023 sectors.
Its purpose is to provide a map of which sectors are being used to
store files and which are currently free to be used in a new file.
The first five bytes of sector 360 contain miscellaneous
information:
Byte 0 directory type byte. According to the OS
User's Manual, this should always be zero, but appears to be set to
2 under DOS 2.5 and DOS 2.0S.
Bytes 1 and 2 total sector count (in
low and high byte format) on the disk available to DOS. Should equal
707 for single density and 1010 for enhanced density.
Bytes 3 and 4 free sector count. This is the number of currently available (free)
sectors up to a maximum of 707. It is therefore the same number that
appears at the end of a directory read as 'xxx FREE SECTORS' on a
single density (but not an enhanced density) disk. On an enhanced
density disk, the number of additional free sectors is stored in
bytes 122 and 123 of sector 1024.
Starting at byte 10 of sector 360 is the sector
use bitmap. Each byte in the map contains the in-use status of eight
sectors, one bit per sector. On a single density disk, the map
continues to byte 99 of sector 360, but one sector is insufficient
to map all the sectors on an enhanced density disk and so sector
1024 is used as well. Each byte is used as shown:
Byte 10 | bit | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
        | sector | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
Byte 11 | bit | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
        | sector | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
If a bit is clear, the sector is in use; if set, it is available for
a new file. Note that sector zero, although present in the map, does
not exist (see above). The map continues as shown above to byte 99
of sector 360, bit 0 (the rightmost bit) of which represents sector
719. It should be noted that even on an enhanced density disk the
map finishes here, and no more bytes of this sector are used. On
such a disk, the bitmap in sector 1024 starts at byte 0 (not byte 10
as in sector 360). Bit 7 (the leftmost bit) of byte 0 represents
sector 48. The bitmap continues to byte 121, bit 0 of this byte
representing sector 1023. Bytes 122 and 123 store the number of
currently available free sectors in addition to those stored in
bytes 3 and 4 of sector 360. In other words, a freshly formatted
enhanced density disk (without DOS files) will have a total of 1010
free sectors. This number is stored in bytes 1 and 2 of sector 360
and will remain unchanged. Bytes 3 and 4 of sector 360 will contain
the number 707, and bytes 122 and 123 of sector 1024 the number 303
(707 + 303 = 1010). These numbers will be updated as files are saved
and deleted.
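The mapping from a sector number to its bit in the VTOC can be sketched as follows. This is my own illustration of the layout described above, not code taken from DOS, and the function names are hypothetical. Because the two maps overlap, sectors 48 to 719 have a bit in both VTOC sectors.

```python
def vtoc_locations(sector):
    """Return a list of (vtoc_sector, byte_index, bit_number) for a sector."""
    locations = []
    if 0 <= sector <= 719:
        # sector 360: map starts at byte 10, bit 7 is the lowest sector
        locations.append((360, 10 + sector // 8, 7 - sector % 8))
    if 48 <= sector <= 1023:
        # sector 1024: map starts at byte 0 with sector 48
        rel = sector - 48
        locations.append((1024, rel // 8, 7 - rel % 8))
    return locations

def is_free(vtoc_data, byte_index, bit_number):
    """A set bit means the sector is available for a new file."""
    return bool(vtoc_data[byte_index] & (1 << bit_number))
```

For example, vtoc_locations(8) points at bit 7 of byte 11 in sector 360, matching the table above.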
Because the bitmap in sector 1024 starts at sector 48, there is a
considerable amount of overlap between the two VTOC sectors. Both
sectors will need to be examined to get the free sector count on a
directory read, and both may need to be updated when a file is
written to disk. This presumably accounts for the considerable
amount of drive head movement with this version of DOS, which did not
happen with DOS 2.0S or DOS 3.
DISK FILE STRUCTURE
After all the above (necessary) preliminaries, let us now look at
the structure of files stored on disk. Generally speaking, there are
two main types of file. These are, firstly, files created and
maintained by the disk file manager (linked or chained sector files)
and, secondly, boot program files.
CHAINED SECTOR FILES
These are the commonest type of file and examples
include those created by BASIC SAVE or LIST commands, the Binary
Save option from DOS, word processor text output, assembler object
files and so on. With this type of file, only the first 125 bytes
(bytes 0 - 124) of each sector contain file data. The remaining
three bytes contain the file link data, which is stored in the
following way:
Byte 125 the most significant six bits of this
byte contain the file number, which corresponds to the position of
the filename in the directory, and will be in the range 0 - 63. The
remaining two bits (bits 0 and 1) plus the whole of byte 126, make
up the 'forward pointer'.
Byte 126 this byte plus two bits from byte
125 is the forward pointer, and contains the sector number of the
next sector in the file. Bit 1 of byte 125 is therefore the most
significant bit of the pointer. 10 bits of pointer can only store a
maximum number of 1023 in binary form and this is why the sectors
numbered from 1024 to 1040 on an enhanced density disk are
unavailable to DOS 2.5. The same amount of pointer was also used on
DOS 3, but note that just one extra bit of pointer would have
allowed a true double density disk drive! Presumably Atari did not
do this when developing the 1050 and DOS 3 in order to maintain
compatibility with previous versions of DOS. However, DOS 3 when
produced was totally incompatible with DOS 2.0S for other reasons!
Byte 127 this byte contains the actual number of data bytes stored
in this sector. For all but the last sector in the file, this should
be 125. The last sector might contain 125 bytes, but this won't
happen unless the file length is an exact multiple of 125.
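The three link bytes can be unpacked as in the following Python sketch. It assumes you already have the raw 128 bytes of a sector (from a sector editor or a disk image); the function name is hypothetical.

```python
def decode_link(sector):
    """Return (file_number, next_sector, byte_count) from a 128-byte sector."""
    if len(sector) != 128:
        raise ValueError("expected a 128-byte sector")
    # top six bits of byte 125: position of the file in the directory (0-63)
    file_number = sector[125] >> 2
    # low two bits of byte 125 are the high bits of the 10-bit forward pointer
    next_sector = ((sector[125] & 0x03) << 8) | sector[126]
    # byte 127: how many of bytes 0-124 hold file data
    byte_count = sector[127]
    return file_number, next_sector, byte_count
```

Since the pointer is ten bits wide, next_sector can never exceed 1023, which is the limit discussed above.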
From this you can see that the disk file manager
finds the first sector of a file from the directory. 125 bytes of
data are loaded from that sector and loading continues from the
sector specified in the link data. This process is repeated until
the forward pointer reads zero, which indicates that this is the
last sector in the file. As each sector is loaded, DOS checks that
the file number (stored in byte 125) is the same as the file entry
position in the directory. If the numbers differ, loading stops and
error 164 (File Number Mismatch) is returned. Although this may seem
a complex process, it does have the advantage that files do not need
to be stored in a string of consecutive sectors, but can be
scattered around the disk if necessary, depending on the
availability of storage space.
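The loading procedure just described can be sketched in Python as below. Here read_sector is a hypothetical callable that returns the raw 128 bytes of a given sector (how you obtain them is up to you), and the loop guard against a chain that points back at itself is my own addition rather than something DOS is known to do.

```python
def read_chained_file(read_sector, first_sector, file_number):
    """Follow the forward pointers and return the file's data bytes."""
    data = bytearray()
    visited = set()
    sector_no = first_sector
    while sector_no != 0:                      # a zero pointer ends the file
        if sector_no in visited:
            raise IOError("link chain loops back on itself")
        visited.add(sector_no)
        raw = read_sector(sector_no)
        if raw[125] >> 2 != file_number:       # must match directory position
            raise IOError("error 164: file number mismatch")
        count = min(raw[127], 125)             # never read into the link bytes
        data += raw[:count]
        sector_no = ((raw[125] & 0x03) << 8) | raw[126]
    return bytes(data)
```

The first_sector and file_number arguments come from the directory: the start sector is held in bytes 4 and 5 of the entry, and the file number is the entry's position in the directory.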
There are two special cases of this kind of file we should consider.
Binary files are machine code programs created by the Binary Save
option of DOS (which saves a specified area of memory to disk) or
the object code output from an assembler. The first six bytes of any
such file are known as the file header, and have this format:
Bytes 0 and 1 - both set to 255 (hex $FF). This is
an identifier for a binary file.
Bytes 2 and 3 - the start address in low and high byte format.
Bytes 4 and 5 - the end address, again in low and high byte format.
When you select DOS option L (Binary Load) the start and end
addresses are obtained from the first six bytes of the first sector
of the file, and the program itself loaded into memory, beginning at
the load address and continuing until the end address is loaded. The
Binary Save option of DOS allows you to specify optional
initialization and run addresses. If present, these are appended to
the end of the file. On loading the file, the initialization address
will be loaded into locations 738 and 739 (INITAD) and the run
address into locations 736 and 737 (RUNAD). On completing the load,
control is passed back to the DOS menu if neither of these addresses
has been specified. If an initialization address is present, DOS
performs a machine language JSR instruction to the address contained
in INITAD. The code specified here should end with an RTS
instruction to return control to DOS. If a run address is specified,
DOS will then JSR to this. Either or both (or neither) of these
addresses may be used. Note that they do not need to point to code
within the loaded program - they could be used to call operating
system routines for example, or pass control to BASIC. An
AUTORUN.SYS file is simply a special case of a binary file. After
DOS is booted on powerup, it will look for a file named AUTORUN.SYS
on the disk and load and run it if present. To autorun, the file
must have either an initialization or run address appended.
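As an illustration of the six-byte header, the following Python sketch decodes the start and end addresses from the beginning of a binary file. The names are hypothetical, and the length calculation assumes the end address is the last byte loaded, which is how I read the description above.

```python
RUNAD = 736    # run address vector (locations 736 and 737)
INITAD = 738   # initialization address vector (locations 738 and 739)

def parse_binary_header(data):
    """Return (start, end, length) from the first six bytes of a binary file."""
    if len(data) < 6 or data[0] != 0xFF or data[1] != 0xFF:
        raise ValueError("missing the 255,255 binary file identifier")
    start = data[2] | (data[3] << 8)   # low byte first
    end = data[4] | (data[5] << 8)
    if end < start:
        raise ValueError("end address is below start address")
    # assumption: the end address is inclusive
    return start, end, end - start + 1
```

A header of 255,255,0,6,255,6 therefore describes a block loading at $0600 and ending at $06FF.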
The second 'special case' is that of a file created by the BASIC
SAVE command. A BASIC program is stored in memory in tokenised form,
whereby the BASIC keywords and variable names are represented by one
byte tokens rather than their full ATASCII form. This has the
advantage of saving considerable amounts of memory, but means that
BASIC must maintain lists of variable names and their current values
so that it knows which token represents which variable. Logically
enough, these are called the variable name and variable value
tables. When a BASIC SAVE is made, the program is saved in tokenised
form and the above tables must be saved with it. In fact, a series
of zero page pointers and several blocks of memory are also saved,
including the following:
1) zero page pointers:
locations | name | function
128,129 | LOMEM | pointer to the lowest memory location usable by BASIC
130,131 | VNTP | pointer to the beginning of the variable name table
132,133 | VNTD | pointer to the end of the variable name table
134,135 | VVTP | pointer to the beginning of the variable value table
136,137 | STMTAB | pointer to the beginning of the tokenised program
138,139 | STMCUR | pointer to the token in a program line currently being processed, either during input of a line or when the program is run
140,141 | STARP | pointer to the beginning of the string and array storage area, and therefore to the end of the program
These seven pointers are saved to disk in the order shown, but
before doing so one change is made - the value in LOMEM is
subtracted from each one and the resulting value saved. Since LOMEM
itself is saved first, this means that the first two bytes of the
file are always zero.
2) sections of the tokenised program:
This comprises the following blocks of memory in this order:
the variable name table
the variable value table
the tokenised program
the immediate mode line
Note that the string/array storage area is not saved, as all
strings and arrays are redimensioned each time the program is run.
When a BASIC LOAD is made, the seven pointers are read in first, and
the value in MEMLO (locations 743,744 - the operating system pointer
to the bottom of free memory) is added to each one. The values in
two more zero page pointers, RUNSTK (142,143 - pointer to a software
stack used by BASIC in processing GOSUB statements and FOR...NEXT
loops) and MEMTOP (144,145 - pointer to the top of memory used by
BASIC, including the string/array area) are set to the value in STARP. Next, 256 bytes directly above the value in LOMEM are
reserved as an output buffer used when BASIC is tokenising a line.
Finally, the variable tables and the tokenised program are read in
to memory immediately following the output buffer.
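The seven saved pointers occupy the first 14 bytes of a SAVEd file, and can be read back as in this Python sketch. The function names are hypothetical, and relocate() simply performs the addition of MEMLO described above; it makes no attempt to model the rest of the LOAD process.

```python
import struct

POINTER_NAMES = ("LOMEM", "VNTP", "VNTD", "VVTP", "STMTAB", "STMCUR", "STARP")

def parse_basic_header(data):
    """Return the seven LOMEM-relative pointers from a BASIC SAVE file."""
    if len(data) < 14:
        raise ValueError("a SAVEd file starts with 14 bytes of pointers")
    # seven 16-bit values, low byte first
    values = struct.unpack("<7H", data[:14])
    return dict(zip(POINTER_NAMES, values))

def relocate(pointers, memlo):
    """Add the current MEMLO value to each pointer, as LOAD does."""
    return {name: (value + memlo) & 0xFFFF for name, value in pointers.items()}
```

Because LOMEM is saved as LOMEM minus itself, the "LOMEM" entry returned by parse_basic_header should always be zero for a valid file, which makes a handy sanity check.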
BOOT PROGRAM FILES
These are machine code programs which are loaded into memory and run
(if desired) by the operating system at powerup. Unlike the binary
files discussed previously they do not require DOS to be present in
memory or on the disk in order to be loaded or run, nor do they need
the presence of BASIC or any other language. The file structure
therefore differs fundamentally from chained sector files. Because
DOS is not used, sector chaining is not needed and boot program
sectors contain 128 bytes of program data and no link data. The
operating system boot loader routine always attempts to load boot
files at powerup starting at sector 1 of drive 1, meaning that
generally speaking there can only be one boot file per disk and this
must consist of a consecutive string of sectors beginning at sector
1. These files do not require a directory entry, and sector usage
need not be indicated in the VTOC. There is an important exception
to these rules, discussed below. As with the binary files discussed
earlier, these files contain a six byte header. The six bytes are
used as follows:
Byte 0 - flags byte. This is not generally used and is usually zero.
Byte 1 - number of sectors to be loaded, including the first sector.
This can range from 1 - 255. If it is zero, 256 sectors will be
loaded. What if the file is longer than 256 sectors? See below for
the explanation.
Bytes 2 and 3 - the load address. The file is read
into memory starting at this address.
Bytes 4 and 5 - the initialization address.
What exactly happens during the boot process? The procedure is
described in considerable detail in De Re Atari or the Operating System User's Manual, but the following is a brief
outline. Cassette users should note that the cassette boot process is
essentially similar.
As part of the powerup routine, the operating system (OS) checks to
see if a cartridge is present (or built-in BASIC enabled). If so,
the cartridge's 'Allow disk boot' flag is checked, to determine if
the cartridge software permits the disk to be booted (as it would in
the case of BASIC or other languages, but not in most games).
Providing a disk boot is allowed, or if no cartridge is present and
BASIC is disabled, the boot process goes ahead.
Assuming drive 1 is switched on, the OS will attempt to read sector
1 into memory. If it cannot do so - if no disk is in the drive for
example - the boot process is aborted and the message 'BOOT ERROR'
written to the screen. If all is well, the 128 bytes in sector 1 are
read into a specified area of RAM (the cassette buffer in fact). The
first six bytes (the header) are described above. The values in
these bytes are then moved to the following locations:
Byte 0 to location 576 (DFLAGS)
Byte 1 to 577 (DBSECT)
Bytes 2 and 3 to 578,579 (BOOTAD)
Bytes 4 and 5 to 12,13 (DOSINI).
The entire sector (including the header) is then
moved to the area of memory beginning at the address now present in
BOOTAD. The remaining sectors are then read from disk directly into
the memory area following the first sector.
When the load is complete, the OS performs a JSR to the address
contained in BOOTAD + 6 (i.e. to the first byte of the actual
program). This part of the program need not do anything, but if the
file was longer than 256 sectors any remaining sectors should be
loaded by the part of the program contained here. This part of the
program should end by clearing the 6502 carry flag to indicate a
successful load (even if no further sectors were loaded) or by setting
the carry flag if the load was unsuccessful. It must terminate with an
RTS.
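The boot header and the locations it is copied to can be summarised in a short Python sketch. It only decodes the six header bytes from a copy of sector 1; the function name and the "entry" key are my own labels, not OS names.

```python
def parse_boot_header(sector1):
    """Decode the six-byte boot header at the start of sector 1."""
    if len(sector1) < 6:
        raise ValueError("need at least the six header bytes")
    load_address = sector1[2] | (sector1[3] << 8)
    return {
        "DFLAGS": sector1[0],                      # copied to location 576
        "DBSECT": sector1[1] or 256,               # location 577; zero means 256
        "BOOTAD": load_address,                    # locations 578,579
        "DOSINI": sector1[4] | (sector1[5] << 8),  # locations 12,13
        "entry": load_address + 6,                 # first JSR goes to BOOTAD + 6
    }
```

With the usual DOS boot sector values (0, 3, 0, 7, 64, 21) this gives three sectors loaded at $0700, a first entry point of $0706 and an initialization address of $1540.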
The OS will next JSR to the address in DOSINI for program
initialization. Again, this section need do nothing, if so desired.
It must end with an RTS. However, if the booted program is at some
stage to take control of the computer, this section of the program
should store the run (or 'restart') address of the program into
locations 10 and 11 (DOSVEC). If this is not to be the case, DOSVEC
should be left unchanged. On powerup, DOSVEC is set to point to the
memopad (400/800) or self-test (XL/XE) routines. If DOS is booted,
it will change DOSVEC to point to the routine to load the DOS menu.
BASIC will jump through DOSVEC when you type the keyword DOS, and
this explains why, if you call DOS when it has not been booted, you
go into the self-test/memopad routine.
Finally, the OS will pass control to the cartridge software (or
BASIC) if present. If both BASIC and cartridges are absent, the OS
passes control directly to the booted program by jumping through DOSVEC. Booting DOS without a cartridge or BASIC will therefore go
straight to the DOS menu; powering up the machine without cartridge
or disk boot and with BASIC disabled will proceed to the memopad/self-test
routine. Note that whenever the Reset button is pressed, at the end
of the warmstart process the OS will carry out the final two steps
described above.
One special case of booted software is that of DOS itself. Although
DOS is booted into memory on powerup, it actually consists of two
separate files - the three boot sectors (1-3) and the file DOS.SYS.
On powerup, the OS reads in the boot sectors and these will in turn
load DOS.SYS. This has the advantage that DOS.SYS can be located anywhere
on the disk,
and can be deleted if required. Otherwise, a string of 40
consecutive sectors would have to be permanently reserved for it,
even if you did not want DOS on a particular disk. However, this does
mean that sector 1 takes on a slightly different format. The six
byte header is the same as before, but the three bytes following the
header are a JMP instruction to the code which loads in DOS.SYS.
Following these three bytes, there are a series of data bytes needed
by DOS. The use of these bytes and their (usual) value is as follows
(bytes 0 - 5 are the file header):
byte | usual value | function
0 | 0 | flag byte
1 | 3 | number of sectors to load
2,3 | 0,7 | load address for the three boot sectors
4,5 | 64,21 | initialization address
6,7,8 | 76,20,7 | JMP instruction to bypass the data bytes (JMP $0714)
9 | 3 | maximum number of simultaneously open disk files (you can have open files to other devices as well). Each open file is allocated a 128 byte buffer. You can increase this number to a maximum of seven, but you will lose 128 bytes for every additional buffer.
10 | 3 | drive numbers supported - in this case drives 1 and 2. Up to four drives can be supported, and each drive is represented by one bit in this byte (bit 0 = drive 1, bit 1 = drive 2 and so on). Again, this byte can be altered to add more drives to your system.
11 | 0 | buffer allocation direction (no, I don't know what it means either, but apparently it should always be zero)
12,13 | 204,25 | boot image end address + 1
14 | 1 | if zero, it means that the file DOS.SYS is not present on the disk. A nonzero value means that it is.
15,16 | 4,0 | starting sector of the file DOS.SYS in low and high byte format.
17,18,19 | 125,203,4 | I am uncertain of the use of these bytes.
Note that the value of some of these bytes may vary from the above
depending on disk configuration and customisation of DOS. The Disk
File Manager (three boot sectors and the file DOS.SYS) forms an
exception to the usual rules for boot programs. Although DOS.SYS
acts to all intents and purposes as a boot file, it has a directory
entry, its sectors are marked as 'in use' in the VTOC and it has a
linked sector structure. The initial three boot sectors however are
a conventional boot file with the slight variation to sector 1
described above.
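For anyone wanting to inspect these settings on their own disks, the DOS data bytes in sector 1 can be decoded as in the Python sketch below. The dictionary keys are my own descriptive labels, not official names, and only the bytes whose purpose is described in the table are interpreted.

```python
def parse_dos_boot_config(sector1):
    """Decode the DOS configuration bytes that follow the boot header."""
    if len(sector1) < 20:
        raise ValueError("need at least the first 20 bytes of sector 1")
    drive_bits = sector1[10]
    return {
        # bytes 6-8: opcode 76 is JMP, followed by the target address
        "jmp_target": sector1[7] | (sector1[8] << 8) if sector1[6] == 76 else None,
        "max_open_files": sector1[9],
        # bit 0 = drive 1, bit 1 = drive 2 and so on (four drives per the text)
        "drives": [n + 1 for n in range(4) if drive_bits & (1 << n)],
        "boot_image_end_plus_1": sector1[12] | (sector1[13] << 8),
        "dos_sys_present": sector1[14] != 0,
        "dos_sys_first_sector": sector1[15] | (sector1[16] << 8),
    }
```

Run against the usual values in the table, this reports three file buffers, drives 1 and 2, and DOS.SYS starting at sector 4.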
And that just about completes our discussion of
Atari disk file structure! In order that you may learn a little more
about disk files, I have written a simple sector editor but that
will have to wait for the next issue. See you then!