ZVR Compressed Text File Reader

Introduction

The Psion series 3a pocket organiser is a wonderful machine. I was particularly impressed when I came across a program by Ewan Paton called vr3a which allowed text files to be read “vertically” on it, i.e., with the machine turned through ninety degrees. A Psion 3a fits neatly into the hand when held this way, and you can read the text in much the same way as you would use a paper book. Your thumb on the space bar flips to the next page.

Unfortunately, the main memory on Psion 3a machines is limited: even the most capacious models only allow for 2MB of main memory. Although you can plug in Solid State Disks (SSDs) to extend this, they are quite expensive.

So, it seemed reasonable to extend the original vr3a program to allow reading of specially prepared compressed text files, thus getting more text into the same space and allowing me to carry around more reading material. The resulting program zvr (along with some documentation and an early compressor in source form) was made available in April 1995 to a few people who had expressed an interest. The file has since found its way onto a number of Psion archive sites.

Some people had problems compiling the compressor (it needs a 32-bit flat memory model system: any modern Unix, a DOS extender or Win32) and there were a couple of bugs to fix. The latest version of the compressor, now called zvrz, is distributed separately from the reader, and I’ve provided executables for people who would rather not compile it themselves or who simply don’t have a suitable compiler.

Download the Reader: ZVR V1.1

To get zvr working on your Psion 3a, follow these steps:

  • Download and install Ewan Paton’s original vr3a application from VERT.ZIP. Try it on some plain text files to make sure it is working.
  • Download the zvr archive and install zvr itself from ZVR110.ZIP. Try zvr on the included ALADDIN.ZVR file to confirm its operation.
  • Discard the compression and decompression program source code included in the zvr archive: the compressor is now distributed separately as zvrz (see below).

Download the Compressor: ZVRZ V1.2

If you just want to compress some files, and you have a system which I can build executables for, I’ll try and provide an executable of the compressor program here. No documentation is provided: just execute the program with an empty command line and it will tell you all you need to know.

If you’re interested in the compression program itself, or have a system I can’t support, you need to pick up the source archive instead. Included are the zvrz.c main file and a getopt.c and getopt.h in case you don’t have them on your system.

What Can I Read?

The best answer to this question is to find something you want to read that exists in electronic form and compress it yourself using the zvrz program; Project Gutenberg is a good place to start if your taste is in older books. A couple of my favourite classics, run through zvrz and then zipped, are The Time Machine (65KB) and Dr. Jekyll and Mr. Hyde (50KB).

About the Compression Scheme

It’s one of those “well known facts” that English text has about 1.3 bits of entropy per character: i.e., an 8-bit character in a text file carries perhaps 1.3 bits of information. From this, it’s a simple step to saying that we should be able to compress English text by a factor of 8/1.3, or about 6. My compressor manages to reduce most files to about 50% of their original size: a compression factor of only 2.
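
The arithmetic behind those two factors can be checked in a few lines of Python. The constant names are only illustrative, and 1.3 bits per character is the rough estimate quoted above rather than a measured value.

```python
BITS_PER_STORED_CHAR = 8      # one byte per character in a plain text file
ENTROPY_BITS_PER_CHAR = 1.3   # the commonly quoted estimate for English text

# Best case suggested by the entropy estimate.
ideal_factor = BITS_PER_STORED_CHAR / ENTROPY_BITS_PER_CHAR

# Output at about 50% of the input size means a factor of 2.
zvrz_factor = 1 / 0.5

print(f"ideal factor: {ideal_factor:.2f}")   # ideal factor: 6.15
print(f"zvrz factor:  {zvrz_factor:.2f}")    # zvrz factor:  2.00
```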

This looks pretty sloppy until you look at some other aspects of the problem:

  • The decompressor needed to be written in OPL.
  • The decompression process needed to be very fast.
  • Random seeks within the compressed file needed to be possible, to allow rapid movement within large compressed texts.

Now, most modern compression systems encode the input as a sequence of output symbols of varying length. This means that the more commonly occurring input sequences can be represented by short output symbols (only a few bits in length), with longer output symbols used for less common input sequences, a procedure which can greatly enhance compression. Another technique is to accumulate a buffer of text to which the compressed text can refer for symbol definitions.

These modern compression techniques can achieve compression factors of up to 4:1 on English text, but they were unsuitable for use in zvr:

  • OPL has no facilities for fast manipulation of variable-length bit-fields.
  • Large compression buffers are inappropriate on limited-memory machines.
  • Most importantly, all of these techniques introduce state into the compressed text, which means that it is extremely hard to quickly reposition within it.

The compression scheme understood by zvr is, therefore, well behind the state of the art: the compressed file starts with a dictionary of the 256 possible symbols which might appear in the compressed file. The dictionary simply contains a fixed replacement string for each of these symbols. Decompression is therefore very fast even in OPL, and can start at any point in the compressed file.
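
To illustrate why a fixed dictionary makes decompression both fast and seekable, here is a minimal sketch in Python (not OPL, and not the zvr source). It assumes each compressed symbol is a single byte and that the dictionary is already in memory as a list of 256 strings; the on-disk layout of the dictionary is not described here, so none is assumed. The function name expand and the sample dictionary entries are hypothetical.

```python
def expand(dictionary, compressed, start=0):
    """Expand compressed symbols from position `start` to the end."""
    # Each symbol is one byte and always expands to the same string,
    # so nothing before `start` is needed in order to decode from there.
    return b"".join(dictionary[symbol] for symbol in compressed[start:])


# Hypothetical dictionary: every byte value stands for itself, except
# that two otherwise unused codes are given longer replacement strings.
dictionary = [bytes([i]) for i in range(256)]
dictionary[0x80] = b"th"
dictionary[0x81] = b"the "

compressed = bytes([0x81, ord("c"), ord("a"), ord("t"),
                    ord(" "), 0x80, ord("e"), ord("n")])

print(expand(dictionary, compressed))      # b'the cat then'
print(expand(dictionary, compressed, 4))   # b' then'
```

The second call starts in the middle of the compressed data and still produces correct text, which is the property that lets a reader jump around a large file without decoding everything before the new position.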

The compression algorithm used by zvrz is simply to repeatedly scan the uncompressed file for the most common symbol pair, and define a compression symbol to represent that pair. The result of replacing that pair whenever it appears is then used as data for the next pass. This process repeats until no more compression can take place because no more unused compression symbols are available.
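
The pair-replacement loop described above can be sketched as follows. This is a Python illustration, not the zvrz.c source, and it rests on several assumptions: symbols are single bytes, the codes available for pairs are the byte values that never occur in the input, ties between equally common pairs are broken arbitrarily, and overlapping pairs (as in “aaa”) are counted naively. The names compress and expand are hypothetical.

```python
from collections import Counter


def compress(text):
    """Return (dictionary, compressed) using repeated pair replacement."""
    # Every byte value starts out standing for itself.
    dictionary = [bytes([i]) for i in range(256)]
    # Assumption: codes that never occur in the input are free for pairs.
    used = set(text)
    free_codes = [code for code in range(256) if code not in used]
    data = list(text)

    while free_codes:
        pairs = Counter(zip(data, data[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # replacing a pair that occurs once saves nothing
        code = free_codes.pop()
        # The new symbol expands to the full text of both halves.
        dictionary[code] = dictionary[a] + dictionary[b]
        # Rewrite the data, replacing each non-overlapping occurrence.
        out, i = [], 0
        while i < len(data):
            if i + 1 < len(data) and data[i] == a and data[i + 1] == b:
                out.append(code)
                i += 2
            else:
                out.append(data[i])
                i += 1
        data = out  # the result is the input for the next pass

    return dictionary, bytes(data)


def expand(dictionary, compressed):
    """Inverse of compress: look each symbol up in the dictionary."""
    return b"".join(dictionary[symbol] for symbol in compressed)


sample = b"the cat sat on the mat, the cat sat on the hat"
dictionary, packed = compress(sample)
assert expand(dictionary, packed) == sample  # round trip is lossless
print(len(sample), "bytes ->", len(packed), "symbols")
```

The symbol count printed at the end ignores the space taken by the dictionary itself, so on a text this short the real saving would be negative; the scheme only pays off on files large enough to amortise the dictionary.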

.TCR Files and the Reader Application

At the time I built the zvr program and its compressor, several people had vertical readers; my hope was that one of them might incorporate my ideas into their product and then I could just use that and not bother with zvr myself any more.

This eventually happened: after some discussion with me, Barry Childress took my primitive effort apart and built a much better (but incompatible) variant into his READER program: files with a .TCR extension are for use with READER. Barry’s application is so much better than zvr that it’s the application I use myself now, and as a result zvr is unlikely ever to be changed (by me) again. Steve Litchfield also reviewed READER very positively, and a copy of the reader application is linked to this page.

[Update 2018-01-17: the links in the above paragraph use the Internet Archive Wayback Machine, as the 3lib.ukonline.co.uk site is now defunct.]

The compressed file reading part of READER is only available in the registered version; contact Barry for details.