Bubble Bubble

by Gordon Cameron

 

Issue 32

Mar/Apr 88

Next Article >>

<< Prev Article

 

 

 

Gordon Cameron presents a tutorial on bubble sorting with a unique program that allows you to sort lists within word processing files

 

There are many occasions when it would be useful to be able to sort data into either numerical or alphabetical order. Some examples are Telephone Directories, Book/Record lists, software lists etc. Many such files are held on disk and these are of particular interest as they can be easily manipulated. This article will deal with sorting a file into either alphabetical or reverse alphabetical order.


There are two main ways of performing sorts of this kind on disk files (which are usually text files). One is to do operations on the file itself, reading it in a couple of lines at a time say, performing an operation, writing the results to another file, and repeating the procedure over and over but this involves a lot of disk operations. One alternative would be to use a RAMDISK, where the text files could be stored. This increases the speed considerably and, obviously, eliminates disk operation but, despite this, the method is cumbersome and clumsy involving multiple opening and closing of files and only allowing sequential indexing of data.


SORT IN MEMORY FOR SPEED


The method I will use involves copying the source file into memory, in this case into an array, and sorting the data in the array. Finally, the resultant sorted array can be saved to an object file.


Before going on I had better explain some terms before I lose anyone. The source file is the original unsorted file which you wish to sort and the object file will be the file AFTER it has been sorted. A text file is a file which consists of characters usually produced by a word processor or text editor etc. and which can be loaded using either the INPUT or GET commands, depending on the format.


The source file need not be a word-processor file. If you are, for example, writing a software list program, you may save your file directly to disk using either PUT or PRINT. These files can just as easily be sorted using the methods I describe below.


SORTING WORD PROCESSED TEXT


Firstly, we must read the file into the array. The program required is shown in Listing 1. Line 10 dimensions the array FILE$ to hold 10000 characters. This may not be enough or it may be too much for your needs so alter it as you like. 10000 is, however, a good round number to be getting on with. The source text file may have been saved with either PRINT or PUT. The program deals with this, and reads the data into the string FILE$. This may take a while. The program is fairly unique in that it can sort just part of a text file so that lists forming part of a document can be sorted. This facility is often missing from many commercial word processors.


It is important that you mark the beginning and end of the data that you wish to be sorted. Mark the beginning with a `$' symbol followed by RETURN on a separate line. Mark the end of the data with the 'up arrow' symbol, again on a line by itself. The data in between these two markers will be the data that is manipulated and sorted. Anything before is stored in GARBST$, and anything after in GARBEND$. This facility is provided as many word processors (e.g. Superscript, Mini Office II) add their own control characters to the beginning and/or end of your actual text. Using this method we retain these special codes but they are obviously not to be sorted. They are rewritten to the object file at the end of the session, so that the object file will load back in to the word processor.
 

This feature is very valuable as it means you can use a word processor to enter data, the program here to sort the data, and the word processor thereafter to inspect and adjust again if necessary. The above feature is also useful if you only want to sort PART of a file. Simply position the markers round the part you want sorted, and the program will handle the rest. Again, you can alter the sizes of GARBST$ and GARBEND$ at will.


The text to be sorted is stored sequentially in the array, with each screen line of the file occupying 40 spaces in the array. Note this limitation which means that data longer than 40 characters cannot be sorted by this program. Once we have the data from the file in memory we can access any line at will but how do we sort this data ?


DIFFERENT TYPES OF SORT


There are several sorting algorithms but the most common are the Bubble sort, Shell sort and the Quick sort. For this article I will look at the easiest of the three, the Bubble sort.


The best way to explain the sort is by way of a simple example. Only 6 'lines' of data will be used but the method can obviously be extended to much larger files. Listing 2 will sort your data. This program should be used in conjunction with the loader given in Listing 1.


The data in our example is a list of software titles as follows:


Line 1 Kennedy Approach

Line 2 Pawn
Line 3 Star Raiders
Line 4 Phantom
Line 5 Summer Games

Line 6 Decathlon
 

The bubble sort works through the list checking if the first line of data is greater than (in this case, 'higher up' in the alphabet) the next line. If it is, it swaps them otherwise it goes on to the next item. If a reverse alphabetical sort is required then we simply check if the first item is LESS than the second and again swap them if so.


This swapping continues down the 'list' of lines and when the end is reached the last line will be correct (i.e. the last line will contain the 'greatest' or highest in the alphabet text sequence). The procedure is then repeated with each line being checked excluding the last line as it is correct. This is repeated continually until all the values are correctly placed. Note that the first line is contained in FILE$(1, 40), the second in FILE$(41, 80), the third in FILE$(81, 120) and so on so each line in the list must consist of a maximum of 40 characters only.


The Atari can work out with its inbuilt functions whether, for example, 'Phantom' > 'Summer Games' or not, so you don't have to bother with this. Your Atari does this part for you. Using the above algorithm the data transforms in the following way:


Pass 1.1
Is Kennedy Approach (line 1) > Pawn (line 2)? No, so don't swap

 

Pass 1.2
Is Pawn (line 2) > Star Raiders (line 3)? No, so continue


Pass 1.3
Is Star Raiders (line 3) > Phantom (line 4)?
YES! So swap. Phantom which was line 4 now becomes line 3 and Star Raiders becomes line 4
This continues with Star Raiders being compared with Summer Games no swap occurs. Summer Games is greater than Decathlon so a swap occurs and the list after the first pass is as follows:

 

BEFORE

  AFTER
Line 1 Kennedy Approach Line 1
Line 2 Pawn Line 2
Line 4 Phantom Line 3
Line 3 Star Raiders Line 4
Line 6 Decathlon Line 5
Line 5 Summer Games Line 6

 
Note that Summer Games has taken its proper position at the bottom of the list, as it is the 'highest in the alphabet' so in the next pass, we don't need to check this line with the previous one as we KNOW that it is correct. Whereas before we had to make 5 comparisons (1 with 2, 2 with 3, 3 with 4, 4 with 5 and 5 with 6), next time we need only make 4 comparisons as we don't need to compare with the last line.


In the next pass, only one swap occurs, that of Decathlon with Star Raiders. In pass 3 there is again only one swap Decathlon with Phantom and in pass 4 only one swap again Decathlon with Pawn. In pass 5, Decathlon is swapped with Kennedy Approach and this yields the final sorted sequence:

 

AT START

  END
Line 6 Decathlon Line 1
Line 1 Kennedy Approach Line 2
Line 2 Pawn Line 3
Line 4 Phantom Line 4
Line 3 Star Raiders Line 5
Line 5 Summer Games Line 6

 

And that's the bubble sort for you!


SIMPLE BUT EFFECTIVE


The Bubble sort is very simple, but it can be very effective, however there are drawbacks. Notice that, although Summer Games reached its location almost at once, Decathlon took 5 swaps and all 5 Passes and therefore reached its destination very slowly. This demonstrates a possible disadvantage of the Bubble Sort although only one item was out of place, it took many swaps to get it to its destination. Imagine the list being hundreds of lines long with, say, 'AAA' right at the bottom and all other lines in sorted order. What a waste! If there were 500 lines, it would take 499 passes and swaps just to get this line to its correct position!


Although an extreme example, this demonstrates the shortcomings of the bubble sort algorithm but there are alternatives. In a later article I will show you the Shell sort. Bye for now!

 

Listing 1

  AtariLister - requires Java

 

Listing 2

  AtariLister - requires Java

top