% PKtoPX.web
%
%  PKtoPX creates a regular pixel file from a packed pixel file.
%
% Preliminary 0.0 version:  May, 1985
% First release, 0.9 version:  8 May 1985
% Updated to reflect new pk format, 2.0 version: 25 July 1985
% Updated again for new pk format, 2.1 version: 15 August 1985
% Raster_pointer array bug fixed, 2.2 version: 19 October 1985
% Fixed documentation, 2.3 version: 17 November 1987
\def\versiondate{30 November 1987}
%
\font\ninerm=cmr9
\let\mc=\ninerm % medium caps for names like PASCAL
\font\logo=logo10 % font used for the METAFONT logo
\def\MF{{\logo META}\-{\logo FONT}}
\def\PASCAL{{\mc Pascal}}
\def\tamu{Texas A\char38 M}
\def\(#1){} % this is used to make section names sort themselves better
\def\9#1{} % this is used for sort keys in the index
\def\title{PKtoPX}
\def\contentspagenumber{0}
\def\topofcontents{\null
  \def\titlepage{F} % include headline on the contents page
  \def\rheader{\mainfont\hfil \contentspagenumber}
  \vfill
  \centerline{\titlefont The {\ttitlefont PKtoPX} processor}
  \vskip 15pt
  \centerline{(Version 2.3, \versiondate)}
  \vfill}
\def\botofcontents{\vfill
  \centerline{\hsize 5in\baselineskip9pt
    \vbox{\ninerm\noindent
    The preparation of this report
    was supported in part by the National Science
    Foundation under grants IST-8201926 and MCS-8300984,
    and by the System Development Foundation. `\TeX' is a
    trademark of the American Mathematical Society.}}}
\pageno=\contentspagenumber \advance\pageno by 1

@* Introduction.
The standard format for the distribution of font raster information for \TeX\
has been \.{PXL} files.  These files are loosely packed, based on a 32-bit
word, and use no forms of compression.  \TeX\ requires dozens of fonts in many
different sizes, with typical installations having hundreds of pixel files
using many megabytes of disk storage.
Distribution of the unwieldy pixel files is also a difficult problem for
microcomputer systems, on which \TeX\ is only just becoming available.
Many boxes of diskettes would be required just to store a basic set of fonts
in three sizes for a three-hundred dot per inch device.
A better format is called for.

This program takes a packed, or \.{PK} file, and converts it into the
standard \.{PXL} format.  This program is intended as a model of a \.{PK}
reading program, and it has been written in such a fashion that it can be
easily inserted into a driver.  The work that is necessary is to essentially
change it from a program to a procedure.  Each module of globals should be
changed to a module of locals, etc.

@ The |banner| string defined here should be changed whenever \.{PKtoPX}
gets modified.

@d banner=='This is PKtoPX, Version 2.3'
         {printed when the program starts}

@ This program is written in standard \PASCAL, except where it is necessary
to use extensions; for example, \.{PKtoPX} must read files whose names
are dynamically specified, and that would be impossible in pure \PASCAL.

@d othercases == others: {default for cases not listed explicitly}
@d endcases == @+end {follows the default case in an extended |case| statement}
@f othercases == else
@f endcases == end

@ Both the input and output come from binary files.  On line interaction
is handled through \PASCAL's standard |input| and |output| files.

@d print_ln(#)==write_ln(output,#)
@d print(#)==write(output,#)

@p program PKtoPX(input, output);
label @<Labels in the outer block@>@/
const @<Constants in the outer block@>@/
type @<Types in the outer block@>@/
var @<Globals in the outer block@>@/
procedure initialize; {this procedure gets things started properly}
  var i:integer; {loop index for initializations}
  begin print_ln(banner);@/
  @<Set initial values@>@/
  end;

@ If the program has to stop prematurely, it goes to the
`|final_end|'.  In addition, if, when unpacking an individual character,
we encounter a character number greater than 127, we go to |bad_char|.
Successful completion of a character goes to |good_char|.

@d final_end=9999 {label for the end of it all}
@d bad_char=9997 {in case of a bad character}
@d good_char=9998 {successful character read}

@<Labels...@>=final_end, bad_char, good_char;

@ These constants determine the maximum length of a file name and the length
of the terminal line, as well as the widest character that can be translated.
@^system dependancies@>

@<Constants...@>=
@!name_length=80; {maximum length of a file name}
@!terminal_line_length=132; {maximum length of an input line}
@!max_div_32=100; {maximum character width divided by 32}

@ Here are some macros for common programming idioms.

@d incr(#) == #:=#+1 {increase a variable by unity}
@d decr(#) == #:=#-1 {decrease a variable by unity}
@d do_nothing == {empty statement}

@ It is possible that a malformed packed file (heaven forbid!) or some other
error might be detected by this program.  Such errors might occur in a deeply
nested procedure, so the procedure called |jump_out| has been added to transfer
to the very end of the program with an error message.

@d abort(#)==begin print_ln(' ',#); jump_out; end

@p procedure jump_out;
begin goto final_end;
end;

@* Pixel file format.
A \.{PXL} file is an expanded raster description of a single font at a
particular resolution.  \.{PXL} files are used by many existing
device-driver programs for dot matrix devices.
All words in a \.{PXL} files are in 32-bit format, with the four lower
bits zero on 36-bit machines. The raster information is contained in a
sequence of binary words which record white pixels as zeros and black
pixels as ones.

The first word of the \.{PXL} file and the last word contain the \.{PXL
ID} which is currently equal to 1001 (decimal).
This first word is followed by a sequence of raster information words
where each line of pixels in the glyphs is represented by one or more
words of binary information. The number of words used to represent each
row of pixels for any particular glyph is fixed and it is set by the value
of |max_m-min_m+1| for that particular glyph. Each white pixel is represented
by a zero and each black pixel is represented by a one in the corresponding bit
positions (the first 32 only of each word on 36-bit machines).
The unused bit positions
toward the end of each set of words for each row of pixels are filled with
zeros.

The font directory follows, occupying a fixed position with respect to the
end of the file (in words 517 through 6 from this end), and assigns 4
words for each of the potential 128 different glyphs that could be
contained in this particular font in the order of their ascending ascii
values (not in the order that the glyphs appear in the raster section,
which may be entirely arbitrary). This means that the first four words are
for the ascii zero glyph.  All four words reserved for any missing glyphs
are set to zero.

The first word of each glyph's directory information contains the pixel
width in the left half-word (the leftmost 16 bits) and the pixel height in
the right half-word (the next 16 bits). These dimensions are those of the
smallest bounding-box, measured in pixels, and they have nothing
necessarily to do with the width and height figures that appear in the
\.{TFM} file.  The \.{TFM} width, measured in \.{FIXes}, where 1 \.{FIX}
is $1/2^{20}$ times the design size, is listed in the fourth word of the
glyph's directory information.

The second word of the glyph's directory information contains the offset
of the glyph's reference point from its upper-left-hand corner of the
bounding box, measured in pixels, with the x-offset in the left half-word
and the y-offset in the right half-word.  These numbers may be negative,
and two's complement representation is used.  Remember that the positive x
direction means `rightward' and positive y is `downward' on the page.

The third word of a glyph's directory information contains the number of the
word in this \.{PXL} file where the raster description for this particular
glyph begins, measured from the first word which is numbered zero.
As mentioned earlier, the fourth word of directory information for each
glyph contains the \.{TFM} width.

The final five words in the \.{PXL} file contain information relation to
the entire file.
The first of these five words is a checksum which should
match the checksum contained in the \.{TFM} file that \TeX\ used in
reference to this font, although, if this checksum is zero, no validity
checking will be done.
The second of these five words is an integer that is 1000 times the
magnification factor at which this font was produced.
The third word contains the design size of the font measured in \.{FIXes}
($2^{-20}$ unmagnified points).
The fourth word contains a pointer to the first word of the font directory.
The fifth and last word of the entire file contains a duplicate of the
\.{PXL} ID as contained in the first word of the file.

@d pxl_id=1001 {current version of \.{PXL} format}

@* Packed file format.
The packed file format is a compact representation of the data contained in a
\.{GF} file.  The information content is the same, but packed (\.{PK}) files
are almost always less than half the size of their \.{GF} counterparts.  They
are also easier to convert into a raster representation because they do not
have a profusion of \\{paint}, \\{skip}, and \\{new\_row} commands to be
separately interpreted.  In addition, the \.{PK} format expressedly forbids
\&{special} commands within a character.  The minimum bounding box for each
character is explicit in the format, and does not need to be scanned for as in
the \.{GF} format.  Finally, the width and escapement values are combined with
the raster information into character ``packets'', making it simpler in many
cases to process a character.

A \.{PK} file is organized as a stream of 8-bit bytes.  At times, these bytes
might be split into 4-bit nybbles or single bits, or combined into multiple
byte parameters.  When bytes are split into smaller pieces, the `first' piece
is always the most significant of the byte.  For instance, the first bit of
a byte is the bit with value 128; the first nybble can be found by dividing
a byte by 16.  Similarly, when bytes are combined into multiple byte
parameters, the first byte is the most significant of the parameter.  If the
parameter is signed, it is represented by two's-complement notation.

The set of possible eight-bit values are separated into two sets, those that
introduce a character definition, and those that do not.  The values that
introduce a character definition comprise the range from 0 to 239; byte values
above 239 are interpreted commands.  Bytes which introduce character
definitions are called flag bytes, and various fields within the byte indicate
various things about how the character definition is encoded.  Command bytes
have zero or more parameters, and can never appear within a character
definition or between parameters of another command, where they would be
interpeted as data.

A \.{PK} file consists of a preamble, followed by a sequence of one or more
character definitions, followed by a postamble.  The preamble command must
be the first byte in the file, followed immediately by its parameters.
Any number of character definitions may follow, and any command but the
preamble command and the postamble command may occur between character
definitions.  The very last command in the file must be the postamble.

@ The packed file format is intended to be easy to read and interpret by
device drivers.  The small size of the file reduces the input/output overhead
each time a font is defined.  For those drivers that load and save each font
file into memory, the small size also helps reduce the memory requirements.
The length of each character packet is specified, allowing the character raster
data to be loaded into memory by simply counting bytes, rather than
interpreting each command; then, each character can be interpreted on a demand
basis.  This also makes it possible for a driver to skip a particular
character quickly if it knows that the character is unused.

@ First, the command bytes shall be presented; then the format of the
Character definitions will be defined.  Eight of the possible sixteen
commands (values 240 through 255) are currently defined; the others are
reserved for future extensions.  The commands are listed below.  Each command
is specified by its symbolic name (e.g., \\{pk\_no\_op}), its opcode byte,
and any parameters.  The parameters are followed by a bracketed number
telling how many bytes they occupy, with the number preceded by a plus sign if
it is a signed quantity.  (Four byte quantities are always signed, however.)

\yskip\hang|pk_xxx1| 240 |k[1]| |x[k]|.  This command is undefined in general;
it functions as a $(k+2)$-byte \\{no\_op} unless special \.{PK}-reading
programs are being used.  \MF\ generates \\{xxx} commands when encountering
a \&{special} string.  It is recommended that |x| be a string having the form
of a keyword followed by possible parameters relevant to that keyword.

\yskip\hang\\{pk\_xxx2} 241 |k[2]| |x[k]|.  Like |pk_xxx1|, but |0<=k<65536|.

\yskip\hang\\{pk\_xxx3} 242 |k[3]| |x[k]|.  Like |pk_xxx1|, but
|0<=k<@t$2^{24}$@>|.  \MF\ uses this when sending a \&{special} string whose
length exceeds~255.

\yskip\hang\\{pk\_xxx4} 243 |k[4]| |x[k]|.  Like |pk_xxx1|, but |k| can be
ridiculously large; |k| musn't be negative.

\yskip\hang|pk_yyy| 244 |y[4]|.  This command is undefined in general; it
functions as a five-byte \\{no\_op} unless special \.{PK} reading programs
are being used.  \MF\ puts |scaled| numbers into |yyy|'s, as a result of
\&{numspecial} commands; the intent is to provide numeric parameters to
\\{xxx} commands that immediately precede.

\yskip\hang|pk_post| 245.  Beginning of the postamble.  This command is
followed by enough |pk_no_op| commands to make the file a multiple
of four bytes long.  Zero through three bytes are usual, but any number
is allowed.
This should make the file easy to read on machines which pack four bytes to
a word.

\yskip\hang|pk_no_op| 246.  No operation, do nothing.  Any number of
|pk_no_op|'s may appear between \.{PK} commands, but a |pk_no_op| cannot be
inserted between a command and its parameters, between two parameters, or
inside a character definition.

\yskip\hang|pk_pre| 247 |i[1]| |k[1]| |x[k]| |ds[4]| |cs[4]| |hppp[4]|
|vppp[4]|.  Preamble command.  Here, |i| is the identification byte of the
file, currently equal to 89.  The string |x| is merely a comment, usually
indicating the source of the \.{PK} file.  The parameters |ds| and |cs| are
the design size of the file in $1/2^{20}$ points, and the checksum of the
file, respectively.  The checksum should match the \.{TFM} file and the
\.{GF} files for this font.  Parameters |hppp| and |vppp| are the ratios
of pixels per point, horizontally and vertically, multiplied by $2^{16}$; they
can be used to correlate the font with specific device resolutions,
magnifications, and ``at sizes''.  Usually, the name of the \.{PK} file is
formed by concatenating the font name (e.g., amr10) with the resolution at
which the font is prepared in pixels per inch multiplied by the magnification
factor, and the letters \.{PK}.  For instance, amr10 at 300 dots per inch
should be named AMR10.300PK; at one thousand dots per inch and magstephalf,
it should be named AMR10.1095PK.

@ We put a few of the above opcodes into definitions for symbolic use by
this program.

@d pk_id = 89 {the version of \.{PK} file described}
@d pk_xxx1 = 240 {\&{special} commands}
@d pk_yyy = 244 {\&{numspecial} commands}
@d pk_post = 245 {postamble}
@d pk_no_op = 246 {no operation}
@d pk_pre = 247 {preamble}

@ The \.{PK} format has two conflicting goals; to pack character raster and
size information as compactly as possible, while retaining ease of translation
into raster and other forms.  A suitable compromise was found in the use of
run-encoding of the raster information.  Instead of packing the individual
bits of the character, we instead count the number of consecutive `black' or
`white' pixels in a horizontal raster row, and then encode this number.  Run
counts are found for each row, from the top of the character to the bottom.
This is essentially the way the \.{GF} format works.
Instead of presenting each row individually, however, let us concatenate all
of the horizontal raster rows into one long string of pixels, and encode this
row.  With knowledge of the width of the bit-map, the original character glyph
can be easily reconstructed.  In addition, we do not need special commands to
mark the end of one row and the beginning of the next.

Next, let us put the burden of finding the minimum bounding box on the part
of the font generator, since the characters will usually be used much more
often than they are generated.  The minimum bounding box is the smallest
rectangle which encloses all `black' pixels of a character.  Let us also
eliminate the need for a special end of character marker, by supplying
exactly as many bits as are required to fill the minimum bounding box, from
which the end of the character is implicit.

Let us next consider the distribution of the run counts.  Analysis of several
dozen pixel files at 300 dots per inch yields a distribution peaking at four,
falling off slowly until ten, then a bit more steeply until twenty, and then
asymptotically approaching the horizontal.  Thus, the great majority of our
run counts will fit in a four-bit nybble.  The eight-bit byte is attractive for
our run-counts, as it is the standard on many systems; however, the wasted four
bits in the majority of cases seems a high price to pay.  Another possibility
is to use a Huffman-type encoding scheme with a variable number of bits for
each run-count; this was rejected because of the overhead in fetching and
examining individual bits in the file.  Thus, the character raster definitions
in the \.{PK} file format are based on the four-bit nybble.

@ The analysis of the pixel files yielded another interesting statistic: fully
37\char`\%\
of the raster rows were duplicates of the previous row.  Thus, the \.{PK}
format allows the specification of repeat counts, which indicate how many times
a horizontal raster row is to be repeated.  These repeated rows are taken out
of the character glyph before individual rows are concatenated into the long
string of pixels.

For elegance, we disallow a run count of zero.  The case of a null raster
description should be gleaned from the character width and height being equal
to zero, and no raster data should be read.  No other zero counts are ever
necessary.  Also, in the absence of repeat counts, the repeat value is set to
be zero (only the original row is sent.)  If a repeat count is seen, it takes
effect on the current row.  The current row is defined as the row on which the
first pixel of the next run count will lie.  The repeat count is set back to
zero when the last pixel in the current row is seen, and the row is sent out.

This poses a problem for entirely black and entirely white rows, however.  Let
us say that the current row ends with four white pixels, and then we have five
entirely empty rows, followed by a black pixel at the beginning of the next
row, and the character width is ten pixels.  We would like to use a repeat
count, but there is no legal place to put it.  If we put it before the white
run count, it will apply to the current row.  If we put it after, it applies
to the row with the black pixel at the beginning.  Thus, entirely white or
entirely black repeated rows are always packed as large run counts (in this
case, a white run count of 54) rather than repeat counts.

@ Now let us turn our attention to the actual packing of the run counts and
repeat counts into nybbles.  There are only sixteen possible nybble values.
We need to indicate run counts and repeat counts.  Since the run counts are
much more common, we will devote the majority of the nybble values to them.
We therefore indicate a repeat count by a nybble of 14 followed by a packed
number, where a packed number will be explained later.  Since the repeat
count value of one is so common, we indicate a repeat one command by a single
nybble of 15.  A 14 followed by the packed number 1 is still legal for a
repeat one count, however.  The run counts are coded directly as packed
numbers.

For packed numbers, therefore, we have the nybble values 0 through 13.  We
need to represent the positive integers up to, say, $2^{31}-1$.  We would
like the more common smaller numbers to take only one or two nybbles, and
the infrequent large numbers to take three or more.  We could therefore
allocate one nybble value to indicate a large run count taking three or more
nybbles.  We do this with the value 0.

@ We are left with the values 1 through 13.  We can allocate some of these, say
|dyn_f|, to be one-nybble run counts.
These will work for the run counts |1..dyn_f|.  For subsequent run
counts, we will use a nybble greater than |dyn_f|, followed by a second nybble,
whose value can run from 0 through 15.  Thus, the two-byte nybble values will
run from |dyn_f+1..(13-dyn_f)*16+dyn_f|.  We have our definition of large run
count values now, being all counts greater than |(13-dyn_f)*16+dyn_f|.

We can analyze our several dozen pixel files and determine an optimal value of
|dyn_f|, and use this value for all of the characters.  Unfortunately, values
of |dyn_f| that pack small characters well tend to pack the large characters
poorly, and values that pack large characters well are not efficient for the
smaller characters.  Thus, we choose the optimal |dyn_f| on a character basis,
picking the value which will pack each individual character in the smallest
number of nybbles.  Legal values of |dyn_f| run from 0 (with no one-byte run
counts) to 13 (with no two-byte run counts).

@ Our only remaining task in the coding of packed numbers is the large run
counts.  We use a scheme suggested by D.~E.~Knuth
@^Knuth, D.~E.@>
which will simply and elegantly represent arbitrarily large values.  The
general scheme to represent an integer |i| is to write its hexadecimal
representation, with leading zeros removed.  Then we count the number of
digits, and prepend one less than that many zeros before the hexadecimal
representation.  Thus, the values from one to fifteen occupy one nybble;
the values sixteen through 255 occupy three, the values 256 through 4095
require five, etc.

For our purposes, however, we have already represented the numbers one
through |(13-dyn_f)*16+dyn_f|.  In addition, the one-nybble values have
already been taken by our other commands, which means that only the values
from sixteen up are available to us for long run counts.  Thus, we simply
normalize our long run counts, by subtracting |(13-dyn_f)*16+dyn_f+1| and
adding 16, and then representing the result according to the scheme above.

@ The final algorithm for decoding the run counts based on the above scheme
might look like this, assuming a procedure called \\{pk\_nyb} is available
to get the next nybble from the file, and assuming that the global
|repeat_count| indicates whether a row needs to be repeated.  Note that this
routine is recursive, but since a repeat count can never directly follow
another repeat count, it can only be recursive to one level.

@<Packed number procedure@>=
function pk_packed_num : integer ;
var i, j, k : integer ;
begin
   i := get_nyb ;
   if i = 0 then begin
      repeat j := get_nyb ; incr(i) ; until j <> 0 ;
      while i > 0 do begin j := j * 16 + get_nyb ; decr(i) ; end ;
      pk_packed_num := j - 15 + (13-dyn_f)*16 + dyn_f ;
   end else if i <= dyn_f then
      pk_packed_num := i
   else if i < 14 then
      pk_packed_num := (i-dyn_f-1)*16+get_nyb+dyn_f+1
   else begin
      if i = 14 then
         repeat_count := pk_packed_num
      else
         repeat_count := 1 ;
      pk_packed_num := pk_packed_num ;
   end ;
end ;

@ For low resolution fonts, or characters with `gray' areas, run encoding can
often make the character many times larger.  Therefore, for those characters
that cannot be encoded efficiently with run counts, the \.{PK} format allows
bit-mapping of the characters.  This is indicated by a |dyn_f| value of
14.  The bits are packed tightly, by concatenating all of the horizontal raster
rows into one long string, and then packing this string eight bits to a byte.
The number of bytes required can be calculated by |(width*height+7) div 8|.
This format should only be used when packing the character by run counts takes
more bytes than this, although, of course, it is legal for any character.
Any extra bits in the last byte should be set to zero.

@ At this point, we are ready to introduce the format for a character
descripter.  It consists of three parts: a flag byte, a character preamble,
and the raster data.  The most significant four nybbles of the flag byte
yield the |dyn_f| value for that character.  (Notice that only values of
0 through 14 are legal for |dyn_f|, with 14 indicating a bit mapped character;
thus, the flag bytes do not conflict with the command bytes, whose upper nybble
is always 15.)  The next bit (with weight 16) indicates whether the first run
count is a black count or a white count, with a one indicating a black count.
For bit-mapped characters, this bit should be set to a zero.  The next bit
(with weight 8) indicates whether certain later parameters (referred to as size
parameters) are given in one-byte or two-byte quantities, with a one indicating
that they are in two-byte quantities.  The last two bits are concatenated on to
the beginning of the length parameter in the character preamble, which will be
explained below.

However, if the last three bits of the flag byte are all set (normally
indicating that the size parameters are two-byte values and that a 3 should be
prepended to the length parameter), then a long format of the character
preamble should be used instead of one of the short forms.

Therefore, there are three formats for the character preamble, and which one
is used depends on the least significant three bits of the flag byte.  If the
least significant three bits are in the range zero through three, the short
format is used.  If they are in the range four through six, the extended short
format is used.  Otherwise, if the least significant bits are all set, then
the long form of the character preamble is used.  The preamble formats are
explained below.

\yskip\hang Short form: |flag[1]| |pl[1]| |cc[1]| |tfm[3]| |dm[1]| |w[1]|
|h[1]| |hoff[+1]| |voff[+1]|.
If this format of the character preamble is used, the above
parameters must all fit in the indicated number of bytes, signed or unsigned
as indicated.  Almost all of the standard \TeX\ font characters fit; the few
exceptions are fonts such as \.{aminch}.

\yskip\hang Extended short form: |flag[1]| |pl[2]| |cc[1]| |tfm[3]| |dm[2]|
|w[2]| |h[2]| |hoff[+2]| |voff[+2]|.  Larger characters use this extended
format.

\yskip\hang Long form: |flag[1]| |pl[4]| |cc[4]| |tfm[4]| |dx[4]| |dy[4]|
|w[4]| |h[4]| |hoff[4]| |voff[4]|.  This is the general format which
allows all of the
parameters of the \.{GF} file format, including vertical escapement.
\vskip\baselineskip
The |flag| parameter is the flag byte.  The parameter |pl| (packet length)
contains the offset
of the byte following this character descripter, with respect to the beginning
of the |tfm| width parameter.  This is given so a \.{PK} reading program can,
once it has read the flag byte, packet length, and character code (|cc|), skip
over the character by simply reading this many more bytes.  For the two short
forms of the character preamble, the last two bits of the flag byte should be
considered the two most-significant bits of the packet length.  For the short
format, the true packet length might be calculated as |(flag mod 4)*256+pl|;
for the extended format, it might be calculated as |(flag mod 4)*65536+pl|.

The |w| parameter is the width and the |h| parameter is the height in pixels
of the minimum bounding box.  The |dx| and |dy| parameters are the horizontal
and vertical escapements, respectively.  In the short formats, |dy| is assumed
to be zero and |dm| is |dy| but in pixels;
in the long format, |dx| and |dy| are both
in pixels multiplied by $2^{16}$.  The |hoff| is the horizontal offset from the
upper left pixel to the reference pixel; the |voff| is the vertical offset.
They are both given in pixels, with right and down being positive.  The
reference pixel is the pixel which occupies the unit square in \MF; the
\MF\ reference point is the lower left hand corner of this pixel.  (See the
example below.)

@ \TeX\ requires that all characters which have the same character codes
modulo 256 also have the same |tfm| widths, and escapement values.  The \.{PK}
format does not itself make this a requirement, but in order for the font to
work correctly with the \TeX\ software, this constraint should be observed.
The current version of \TeX\ (1.5) cannot output character codes greater
than 255 anyway.

Following the character preamble is the raster information for the
character, packed by run counts or by bits, as indicated by the flag byte.
If the character is packed by run counts and the required number of nybbles
is odd, then the last byte of the raster description should have a zero
for its least significant nybble.

@ As an illustration of the \.{PK} format, the character \char4\ from the font
amr10 at 300 dots per inch will be encoded.  This character was chosen
because it illustrates some
of the borderline cases.  The raster for the character looks like this (the
row numbers are chosen for convenience, and are not \MF's row numbers.)

\vskip\baselineskip
\centerline{\vbox{\baselineskip=10pt
\halign{\hfil#\quad&&\hfil#\hfil\cr
0& & &M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M\cr
1& & &M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M\cr
2& & &M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M\cr
3& & &M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M\cr
4& & &M&M& & & & & & & & & & & & & & & & &M&M\cr
5& & &M&M& & & & & & & & & & & & & & & & &M&M\cr
6& & &M&M& & & & & & & & & & & & & & & & &M&M\cr
7\cr
8\cr
9& & & & &M&M& & & & & & & & & & & & &M&M& & \cr
10& & & & &M&M& & & & & & & & & & & & &M&M& & \cr
11& & & & &M&M& & & & & & & & & & & & &M&M& & \cr
12& & & & &M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M& & \cr
13& & & & &M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M& & \cr
14& & & & &M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M& & \cr
15& & & & &M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M& & \cr
16& & & & &M&M& & & & & & & & & & & & &M&M& & \cr
17& & & & &M&M& & & & & & & & & & & & &M&M& & \cr
18& & & & &M&M& & & & & & & & & & & & &M&M& & \cr
19\cr
20\cr
21\cr
22& & &M&M& & & & & & & & & & & & & & & & &M&M\cr
23& & &M&M& & & & & & & & & & & & & & & & &M&M\cr
24& & &M&M& & & & & & & & & & & & & & & & &M&M\cr
25& & &M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M\cr
26& & &M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M\cr
27& & &M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M\cr
28&*& &M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M&M\cr
&\hphantom{M}&\hphantom{M}\cr
}}}
The width of the minimum bounding box for this character is 20; its height
is 29.  The `*' represents the reference pixel; notice how it lies outside the
minimum bounding box.  The |hoff| value is $-2$, and the |voff| is~28.

The first task is to calculate the run counts and repeat counts.  The repeat
counts are placed at the first transition (black to white or white to black)
in a row, and are enclosed in brackets.  White counts are enclosed in
parentheses.  It is relatively easy to generate the counts list:
\vskip\baselineskip
\centerline{82 [2] (16) 2 (42) [2] 2 (12) 2 (4) [3]}
\centerline{16 (4) [2] 2 (12) 2 (62) [2] 2 (16) 82}
\vskip\baselineskip
Note that any duplicated rows that are not all white or all black are removed
before the repeat counts are calculated.  The rows thus removed are rows 5, 6,
10, 11, 13, 14, 15, 17, 18, 23, and 24.

@ The next step in the encoding of this character is to calculate the optimal
value of |dyn_f|.  The details of how this calculation is done are not
important here; suffice it to say that there is a simple algorithm which in one
pass over the count list can determine the best value of |dyn_f|.  For this
character, the optimal value turns out to be 8 (atypically low).  Thus, all
count values less than or equal to 8 are packed in one nybble; those from
nine to $(13-8)*16+8$ or 88 are packed in two nybbles.  The run encoded values
now become (in hex, separated according to the above list):
\vskip\baselineskip
\centerline{\tt D9 E2 97 2 B1 E2 2 93 2 4 E3}
\centerline{\tt 97 4 E2 2 93 2 C5 E2 2 97 D9}
\vskip\baselineskip\noindent
which comes to 36 nybbles, or 18 bytes.  This is shorter than the 73 bytes
required for the bit map, so we use the run count packing.

@ The short form of the character preamble is used because all of the
parameters fit in their respective lengths.  The packet length is therefore
18 bytes for the raster, plus
eight bytes for the character preamble parameters following the character
code, or 26.  The |tfm| width for this character is 640796, or {\tt 9C71C} in
hexadecimal.  The horizontal escapement is 25 pixels.  The flag byte is
88 hex, indicating the short preamble, the black first count, and the
|dyn_f| value of 8.  The final total character packet, in hexadecimal, is:
\vskip\baselineskip
$$\vbox{\halign{\hfil #\quad&&{\tt #\ }\cr
Flag byte&88\cr
Packet length&1A\cr
Character code&04\cr
|tfm| width&09&C7&1C\cr
Horizontal escapement (pixels)&19\cr
Width of bit map&14\cr
Height of bit map&1D\cr
Horizontal offset (signed)&FE\cr
Vertical offset&1C\cr
Raster data&D9&E2&97\cr
&2B&1E&22\cr
&93&24&E3\cr
&97&4E&22\cr
&93&2C&5E\cr
&22&97&D9\cr}}$$

@ This format was written by Tomas Rokicki in August, 1985.

@* Input and output.
There are three types of files that this program must deal with---standard
text files, files of integers (pixel files), and files of bytes (packed files.)
For our purposes, we shall consider an eight-bit byte to consist of the
values |0..255|.  If your system does not pack these values to a byte, it is
no major difficulty; you must only insure that the input function
|pk_byte| can read packed bytes.

@<Types...@>=
@!eight_bits=0..255; {packed file byte}
@!word_file=packed file of integer; {for pixel file words}
@!byte_file=packed file of eight_bits ; {for packed file words}
@^system dependancies@>

@ @<Glob...@>=
@!pxl_file:word_file; {where the input comes from}
@!pk_file:byte_file;  {where the final output goes}
@^system dependencies@>

@ To prepare these files for input, we |reset| them. An extension of
\PASCAL\ is needed in the case of |pxl_file|, since we want to associate
it with external files whose names are specified dynamically (i.e., not
known at compile time). The following code assumes that `|reset(f,s)|'
does this, when |f| is a file variable and |s| is a string variable that
specifies the file name. If |eof(f)| is true immediately after
|reset(f,s)| has acted, we assume that no file named |s| is accessible.
@^system dependencies@>

@p procedure open_pxl_file; {prepares to write packed bytes in a |pxl_file|}
begin rewrite(pxl_file,pxl_name);
pxl_loc := 0 ;
end;
@#
procedure open_pk_file; {prepares the input for reading}
begin reset(pk_file,pk_name);
pk_loc := 0 ;
end;

@ We need a place to store the names of the input and output files, as well
as a byte counter for the output file.

@<Glob...@>=
@!pxl_name,@!pk_name:packed array[1..name_length] of char; {names of input
    and output files}
@!pxl_loc, @!pk_loc:integer; {how many bytes have we sent?}

@ We need a procedure that will write a word to the \.{PXL} file.  If the
particular system
@^system dependencies@>
requires buffering, here is the place to do it.

@p procedure pixel_integer (i : integer) ;
begin pxl_file^ := i ;
put(pxl_file) ;
incr(pxl_loc) ;
end;

@ We also need a function that will get a single byte from the \.{PK} file.
Again, buffering may be done in this procedure.

@p function pk_byte : eight_bits ;
var nybble, temp : eight_bits ;
begin
   temp := pk_file^ ;
   get(pk_file) ;
   pk_loc := pk_loc + 1 ;
   pk_byte := temp ;
end ;

@ Now we are ready to open the files and write the identification of the
pixel file.

@<Open files@>=
open_pk_file ;
open_pxl_file ;
pixel_integer(pxl_id)

@ As we are reading the packed file, we often need to fetch 16 and 32 bit
quantities.  Here we have two procedures to do this.

@p function get_16 : integer ;
var a : integer ;
begin a := pk_byte ; get_16 := a * 256 + pk_byte ; end ;
@#
function get_32 : integer ;
var a : integer ;
begin a := get_16 ; if a > 32767 then a := a - 65536 ;
get_32 := a * 65536 + get_16 ; end ;

@ Now we read and check the preamble of the \.{PK} file.  In the preamble, we
find the |hppp|, |design_size|, |checksum|.

@<Read preamble@>=
if pk_byte <> pk_pre then abort('Bad pk file!  pre command missing.') ;
if pk_byte <> pk_id then abort('Wrong version of packed file!.') ;
j := pk_byte ;
for i := 1 to j do hppp := pk_byte ;
design_size := get_32 ;
checksum := get_32 ;
hppp := get_32 ; vppp := get_32 ;
if hppp <> vppp then print_ln('Warning:  aspect ratio not 1:1!') ;
magnification := round(hppp * 72.27 * 5 / 65536)

@ Of course, we need to define the above variables.

@<Glob...@>=
@!magnification : integer ; {resolution at which pixel file is prepared}
@!design_size : integer ; {design size in \.{FIXes}}
@!checksum : integer ; {checksum of pixel file}
@!hppp, @!vppp : integer ; {horizontal and vertical points per inch}

@* Character unpacking.
This is the procedure where we read the character packet and decode it into
standard pixel raster format.  We create one row at a time, checking for
repeat commands, and then repeat the row as many times as is necessary.
The information
gleaned from the beginning of the character packet is to be placed in
the arrays |sizes|, |offsets|, |raster_pointer|, and |tfm_width|.
We also check that the length of the character packet is correct.

@<Unpack and write character@>=
dyn_f := flag_byte div 16 ;
flag_byte := flag_byte mod 16 ;
turn_on := flag_byte >= 8 ;
if turn_on then flag_byte := flag_byte - 8 ;
if flag_byte = 7 then
   @<Read long character preamble@>
else if flag_byte > 3 then
   @<Read extended short character preamble@>
else
   @<Read short character preamble@> ;
if raster_pointer[car] <> 0 then abort('Second time this character used!') ;
raster_pointer[car] := pxl_loc ;
@<Read and translate raster description@> ;
if end_of_packet <> pk_loc then abort('Bad pk file!  Bad packet length.')

@ We need a whole lot of globals used but not defined up there.

@<Glob...@>=
@!i, @!j : integer ; {index pointers}
@!end_of_packet : integer ; {where we expect the end of the packet to be}
@!raster_pointer : array [0..127] of integer ; {raster pointers}
@!sizes : array [0..127] of integer ; {character sizes}
@!offsets : array [0..127] of integer ; {character offsets}
@!tfm_width : array [0..127] of integer ; {character \.{tfm} widths}
@!dyn_f : integer ; {dynamic packing variable}
@!car : integer ; {the character we are reading}

@ We also need to initialize the |raster_pointer| array to zeros.

@<Set init...@>=
for i := 0 to 127 do raster_pointer[i] := 0 ;

@ Now, the preamble reading modules.  First, we have the general case: the
long character preamble format.  If the character number is too large, we go
to the label |bad_char|.

@<Read long character preamble@>=
begin
   packet_length := get_32 ; car := get_32 ;
   end_of_packet := packet_length + pk_loc ;
   if (car > 127) or (car < 0) then goto bad_char ;
   tfm_width[car] := get_32 ;
   hor_esc := get_32 ;
   i := get_32 ;
   c_width := get_32 ;
   c_height := get_32 ;
   if (c_width < 0) or (c_height < 0) or (c_width > 65535) or
   (c_height > 65535) then goto bad_char ;
   word_width := (c_width + 31) div 32 ;
   sizes[car] := c_width * 65536 + c_height ;
   i := get_32 ; j := get_32 ;
   if j < 0 then j := j + 65536 ;
   offsets[car] := i * 65536 + j ;
end

@ This module reads the character preamble with double byte parameters.

@<Read extended short character preamble@>=
begin
   packet_length := (flag_byte - 4) * 65536 + get_16 ;
   car := pk_byte ;
   end_of_packet := packet_length + pk_loc ;
   if car > 127 then goto bad_char ;
   i := pk_byte ;
   tfm_width[car] := i * 65536 + get_16 ;
   hor_esc := get_16 ;
   c_width := get_16 ;
   c_height := get_16 ;
   word_width := (c_width + 31) div 32 ;
   sizes[car] := c_width * 65536 + c_height ;
   i := get_16 ; j := get_16 ;
   if i > 32767 then i := i - 65536 ;
   offsets[car] := i * 65536 + j ;
end

@ Here we read the most common character preamble, that with single byte
parameters.

@<Read short character preamble@>=
begin
   packet_length := flag_byte * 256 + pk_byte ;
   car := pk_byte ;
   end_of_packet := packet_length + pk_loc ;
   if car > 127 then goto bad_char ;
   i := pk_byte ;
   tfm_width[car] := i * 65536 + get_16 ;
   hor_esc := pk_byte ;
   c_width := pk_byte ;
   c_height := pk_byte ;
   word_width := (c_width + 31) div 32 ;
   sizes[car] := c_width * 65536 + c_height ;
   i := pk_byte ; j := pk_byte ;
   if i > 127 then i := i - 256 ;
   if j > 127 then j := j + 255 * 256 ;
   offsets[car] := i * 65536 + j ;
end

@ Some more globals:

@<Glob...@>=
@!c_height, @!c_width : integer ; {sizes of the character glyphs}
@!c_offsets : integer ; {where to put the character offsets}
@!word_width : integer ; {width of character in raster words}
@!hor_esc : integer ; {the character escapement}
@!packet_length : integer ; {the length of the packet in bytes}

@ Now we have the most important part of the program, where we actually
interpret the commands in the raster description.  First of all, we need
a procedure to get a single nybble from the file, as well as one to get
a single bit.  We also use the |pk_packed_num| procedure defined in the
\.{PK} file description.

@p function get_nyb : integer ;
var temp : eight_bits ;
begin
   if bit_weight = 0 then begin
      input_byte := pk_byte ;
      bit_weight := 16 ;
   end ;
   temp := input_byte div bit_weight ;
   input_byte := input_byte - temp * bit_weight ;
   bit_weight := bit_weight div 16 ;
   get_nyb := temp ;
end ;
@#
function get_bit : boolean ;
var temp : boolean ;
begin
   bit_weight := bit_weight div 2 ;
   if bit_weight = 0 then begin
      input_byte := pk_byte ;
      bit_weight := 128 ;
   end ;
   temp := input_byte >= bit_weight ;
   if temp then
      input_byte := input_byte - bit_weight ;
   get_bit := temp ;
end ;
@<Packed number procedure@>

@ Now, the globals to help communication between these procedures, and a buffer
for the raster row.

@<Glob...@>=
@!input_byte : eight_bits ; {the byte we are currently decimating}
@!bit_weight : eight_bits ; {weight of the current bit}
@!nybble : eight_bits ; {the current nybble}
@!row : array [0..max_div_32] of integer ; {where the row is constructed}

@ And the main procedure.

@<Read and translate raster description@>=
bit_weight := 0 ;
if dyn_f = 14 then
   @<Get raster by bits@>
else @<Create normally packed raster@>

@ If |dyn_f=14|, then we need to get the raster representation
one bit at a time.

@<Get raster by bits@>=
begin
bit_weight := 0 ;
for i := 1 to c_height do begin
   word := 0 ;
   word_weight := 31 ;
   for j := 1 to c_width do begin
      if get_bit then word := word + power[word_weight] ;
      word_weight := word_weight - 1 ;
      if word_weight = -1 then begin
         pixel_integer(word) ;
         word := 0 ;
         word_weight := 31 ;
      end ;
   end ;
   if word_weight < 31 then
      pixel_integer(word) ;
end;
end

@ We need the power array, which contains the powers of two, as well as the
|word| and |word_weight| variables for the above procedure.

@<Glob...@>=
@!word : integer ; {contains partially created raster word}
@!word_weight : integer ; {weight of the current position}
@!power : array [0..31] of integer ; {contains the powers of two}
@!gpower : array [0..32] of integer ; {contains various rows of black}

@ @<Set init...@>=
power[0] := 1 ;
for i := 1 to 30 do power[i] := power[i-1] * 2 ;
power[31] := - one_fourth - one_fourth ;
gpower[0] := 0 ;
for i := 1 to 32 do gpower[i] := gpower[i-1] + power[i-1] ;

@ @<Const...@>=
one_fourth = 1073741824 ;

@ Otherwise, we translate the bit counts into the raster rows.  |count|
contains the number of bits of the current color, and |turn_on| indicates
whether or not they should be black.  |rows_left| contains the number of
rows to be sent.

@<Create normally packed raster@>=
begin
rows_left := c_height ;
h_bit := c_width ;
repeat_count := 0 ;
word_weight := 32 ;
word := 0 ;
r_p := 1 ;
while rows_left > 0 do begin
   count := pk_packed_num ;
   while count > 0 do begin
      if (count < word_weight) and (count < h_bit) then begin
        if turn_on then word := word + gpower[word_weight] -
                                       gpower[word_weight - count] ;
        h_bit := h_bit - count ;
        word_weight := word_weight - count ;
        count := 0 ;
      end else if (count >= h_bit) and (h_bit <= word_weight) then begin
        if turn_on then word := word + gpower[word_weight] -
                                       gpower[word_weight - h_bit] ;
        row[r_p] := word ;
        @<Send row@> ;
        rows_left := rows_left - repeat_count - 1 ;
        repeat_count := 0 ;
        r_p := 1 ;
        word := 0 ;
        word_weight := 32 ;
        count := count - h_bit ;
        h_bit := c_width ;
      end else begin
         if turn_on then word := word + gpower[word_weight] ;
         row[r_p] := word ;
         r_p := r_p + 1 ;
         word := 0 ;
         count := count - word_weight ;
         h_bit := h_bit - word_weight ;
         word_weight := 32 ;
      end ;
   end ;
   turn_on := not turn_on ;
end ;
if (rows_left <> 0) or (h_bit <> c_width) then
   abort('Bad pk file---more bits than required!');
end

@ We need to declare the repeat flag, bit counter, and color flag here.

@<Glob...@>=
@!repeat_count : integer ; {how many times to repeat the next row?}
@!rows_left : integer ; {how many rows left?}
@!turn_on : boolean ; {are we black here?}
@!h_bit : integer ; {what is our horizontal position?}
@!count : integer ; {how many bits of current color left?}
@!r_p : integer ; {where in the row are we?}

@ @<Send row@>=
for i := 0 to repeat_count do
   for j := 1 to word_width do
      pixel_integer(row[j])

@ To finish up the pixel file, we simply write out the data for each of the
characters, including the tfm widths, their raster locations, sizes, and
offsets.  Then, we finish up the pixel file with the |magnification|,
|design_size|, |checksum|, and another |pxl_id|.

@<Write pixel directory@>=
dir_ptr := pxl_loc ;
for car := 0 to 127 do begin
   pixel_integer(sizes[car]) ;
   pixel_integer(offsets[car]) ;
   pixel_integer(raster_pointer[car]) ;
   pixel_integer(tfm_width[car]) ;
end ;
pixel_integer(checksum) ;
pixel_integer(magnification) ;
pixel_integer(design_size) ;
pixel_integer(dir_ptr) ;
pixel_integer(pxl_id)

@ We need a global |dir_ptr| which holds the location of the pixel
directory, as well as the |flag_byte| variable.

@<Glob...@>=
@!dir_ptr : integer ; {where does the pixel directory begin?}
@!flag_byte : integer ; {command or character flag byte}

@ Another necessary procedure skips over any specials between characters
and before and after the postamble.

@p procedure skip_specials ;
var i, j, k : integer ;
begin
   repeat
      flag_byte := pk_byte ;
      if flag_byte >= 240 then
         case flag_byte of
            240, 241, 242, 243 :
begin
   i := 0 ;
   for j := 240 to flag_byte do i := 256 * i + pk_byte ;
   for j := 1 to i do k := pk_byte ;
end ;
            244 : i := get_32 ;
            245 : begin end ;
            246 : begin end ;
            247, 248, 249, 250, 251, 252, 253, 254, 255 :
              abort('Unexpected ', flag_byte:1,'!') ;
         endcases ;
   until (flag_byte < 240) or (flag_byte = pk_post) ;
end ;

@* Terminal communication.
We must get the file names and determine whether input is to be in
hexadecimal or binary.  To do this, we use the standard input path
name.  We need a procedure to flush the input buffer.  For most systems,
this will be an empty statement.  For other systems, a |print_ln| will
provide a quick fix.  We also need a routine to get a line of input from
the terminal.  On some systems, a simple |read_ln| will do.  Finally,
a macro to print a string to the first blank is required.

@d flush_buffer == begin end
@d get_line(#) == if eoln(input) then read_ln(input) ;
   i := 1 ;
   while not (eoln(input) or eof(input)) do begin
      #[i] := input^ ;
      incr(i) ;
      get(input) ;
   end ;
   #[i] := ' '

@ @p procedure dialog ;
var i : integer ; {index variable}
buffer : packed array [1..name_length] of char; {input buffer}
begin
   for i := 1 to name_length do begin
      pxl_name[i] := ' ' ;
      pk_name[i] := ' ' ;
   end;
   print('Input file name:  ') ;
   flush_buffer ;
   get_line(pk_name) ;
   print('Output file name:  ') ;
   flush_buffer ;
   get_line(pxl_name) ;
end ;

@* The main program.
Now that we have all the pieces written, let us put them together.

@p begin
initialize ;
dialog ;
@<Open files@> ;
@<Read preamble@> ;
skip_specials ;
while flag_byte <> pk_post do begin
   @<Unpack and write character@> ;
   goto good_char ;
   bad_char: while pk_loc <> end_of_packet do i := pk_byte ;
      print_ln('Character ',car:1,' out of range!') ;
   good_char: skip_specials ;
end ;
while not eof(pk_file) do i := pk_byte ;
@<Write pixel directory@> ;
print_ln(pk_loc:1,' bytes read from packed file.');
print_ln((pxl_loc*4):1,' bytes written to pixel file.');
final_end :
end .

@* System-dependent changes.
This section should be replaced, if necessary, by changes to the program
that are necessary to make \.{PKtoPX} work at a particular installation.
Any additional routines should be inserted here.
@^system dependencies@>

@* Index.
Pointers to error messages appear here together with the section numbers
where each ident\-i\-fier is used.