The J Programming Language, also known as Oh God My Eyes Are Bleeding

So, back towards the beginning of the semester, we had a simple assignment in Data Compression: compute pixel differences for an image file. The differences will be defined “going backwards” — the i’th difference will be the i’th value, minus the (i-1)th value (as opposed to the (i+1)th). So, including a “virtual” pixel value of 128 at the beginning, a list of numbers such as

0 1 4 4 3 10

would be transformed into the list

-128 1 3 0 -1 7. Because these are really 8-bit unsigned char values, this is equivalent to 128 1 3 0 255 7.

In a mainstream, Algol-derived language like C or Python, this problem has a simple, obvious solution that is intelligible even to programmers who don’t know the language. Something like this:

import sys

if len(sys.argv) != 2:
  print "Error! Must give an input filename."
  print "\tUsage: python imgdiff.py in_file"
  sys.exit(1)

infile_name = sys.argv[1]
outfile_name = infile_name + ".diff"


infile = open(infile_name, "rb")
data = infile.read() # Read all bytes from file
infile.close()

data = chr(128) + data # Prepend pseudo-pixel for diff. calculation

# Calculate the differences between each pixel in the data
diff_at = lambda i: ord(data[i]) - ord(data[i-1])
diff = [chr( diff_at(i) % 256 ) for i in range(1, len(data))]

outfile = open(outfile_name, "wb")
outfile.writelines(diff) # Write the difference values to the output file
outfile.close()

This Python program is short and simple. It breaks down into a handful of short blocks, several of them only a line or two long. For the non-programmers who might be interested in commentary (hi Joe!): the first “big” block checks that we were given an input file to read values from; the next works out the name of the file to write to; and the one after that just reads the values in. The line data = chr(128) + data just adds the value 128, our virtual pixel value, to the beginning of the list. The next line is a direct translation of our definition of a difference pixel, given above, into Python syntax. The trickiest line is the one defining diff, because it makes use of a list comprehension. From the inside out, chr( diff_at(i) % 256 ) computes the i’th pixel difference modulo 256 (the number of distinct values a character (chr) can hold, so negative differences wrap around into the range 0 to 255). That computation is repeated for every pixel in the file, and the resulting list is assigned to diff. Finally, the last block writes the differences out to the output file.
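The listing above is Python 2 (print statements, byte strings indexed with ord). For anyone following along on Python 3, here is a minimal sketch of the same calculation. The function name pixel_diffs and the virtual_pixel parameter are made up for illustration, and the file handling is left out so the arithmetic is easy to check against the example list from the start of the post.

```python
def pixel_diffs(data, virtual_pixel=128):
    """Return the backward differences of `data`, wrapped into 0..255.

    `data` is a bytes object; `virtual_pixel` is the pseudo-pixel placed
    in front of the first real value (128 in the assignment).
    """
    padded = bytes([virtual_pixel]) + data
    # i'th difference = i'th value minus the (i-1)th value, modulo 256
    return bytes((padded[i] - padded[i - 1]) % 256 for i in range(1, len(padded)))


if __name__ == "__main__":
    sample = bytes([0, 1, 4, 4, 3, 10])
    print(list(pixel_diffs(sample)))  # [128, 1, 3, 0, 255, 7]
```

In Python 3, indexing a bytes object already yields integers, so the ord and chr calls disappear.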

So that’s nice: simple, clean, easy, boring. I wanted something more… esoteric.

Usually, when a computer science class gets an assignment, the instructor picks the language the students use, but for this assignment, we were given free rein to pick any language we wanted to get the job done. Since this particular professor puts a lot of emphasis on program readability, I thought it would be amusing to see the expression on his face when presented with a valid solution in a completely unintelligible language. But which language to use?

Perhaps the most amusing might have been a language called Whitespace (in which a valid solution would be a blank sheet of paper… or maybe two blank sheets of paper), but that would have been pushing it, even for me. But then I remembered reading something about J, somewhere. I thought it had been cited in an exhortation from Steve Yegge for programmers to learn a wider variety of languages, but I can’t seem to find it now.

So, J.

Here’s the equivalent program in J:

((256|(2-~/\])128,a.i.1!:1(2}ARGV)){a.)1!:3(3}ARGV)

Fun, eh?

Here’s an easier-to-understand version:

infile  =: 2}ARGV            NB. grab command line args
outfile =: 3}ARGV
chars   =: 1!:1 infile       NB. 1!:1 means read contents of file
nums    =: a. i. chars       NB. convert chars to ascii indices
diff    =: 2 -~/\ ]          NB. uhh...
diffs   =: 256|diff 128,nums
(diffs {a.) 1!:3 outfile

Okay, so that’s not THAT bad for the first few lines. Sure, 1!:1 is a pretty atrocious syntax for reading in file contents, but we can look beyond that for now. a. i. chars is actually sort of cool. a. is a table of all 256 characters in ASCII order, chars is a list of the characters from the file, and i. is (in this context) the index-of operator. It works like this:


'abcde' i. 'bed'
1 4 3
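For comparison, a rough Python rendering of the same lookup. The helper name index_of is invented for illustration; returning the length of the table for an item that is not found mirrors what J's i. does, whereas Python's own index method would raise an exception.

```python
def index_of(table, items):
    """Position of each item in `table`; len(table) if the item is absent."""
    return [table.index(x) if x in table else len(table) for x in items]


print(index_of("abcde", "bed"))  # [1, 4, 3]
print(index_of("abcde", "z"))    # [5]

# With a 256-character table in code order, the lookup behaves like ord:
alphabet = [chr(n) for n in range(256)]
print(index_of(alphabet, "A"))   # [65]
```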

In essence, a. i. is the J equivalent of Python’s ord function, but built out of a more fundamental operator. Cool enough to forgive the, er, terse syntax. But what the heck is the next line?!? diff =: 2 -~/\ ] — are you kidding me?

Actually, it’s straight out of the J vocabulary reference. Sentences in J read right-to-left. The ] is an identity operator; it simply selects whatever comes to its right. In essence, here it stands in for the thing being diffed, much like the word “it” itself. The \ character means infix, which the reference writes as x u\ y; here x is 2, u is -~/, and y is ]. I’ll cover u in a second, but first, a quick illustration of \. Suppose you wanted to select every successive pair of elements from the list 1 2 3 4. Then u would simply be the identity function ], like this:

2 ] \ 1 2 3 4
1 2
2 3
3 4

So what does -~/ mean? - is the subtraction operator, simple enough. / is the insert operator, so +/ 1 2 3 is 1+2+3, and -/ 1 2 is 1 - 2. But note that we don’t want 1 - 2, we want 2 - 1. That’s what ~ does: it swaps arguments, so 1 -~ 2 is the same as 2 - 1. Phew!
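Put together, the three pieces (windows of two, insert, swapped subtraction) can be sketched in Python roughly as follows. The function names are invented for this illustration, and the J notation is kept in comments for cross-reference.

```python
from functools import reduce


def infixes(width, seq):
    # Like  x ]\ y  with positive x: every overlapping window of `width` items.
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]


def swapped_sub(a, b):
    # Like  a -~ b : subtraction with its arguments swapped, so this is b - a.
    return b - a


def diff(seq):
    # Like  2 -~/\ y : insert the swapped subtraction into each window of two.
    # With only two items per window the fold direction makes no difference.
    return [reduce(swapped_sub, window) for window in infixes(2, seq)]


print(infixes(2, [1, 2, 3, 4]))        # [[1, 2], [2, 3], [3, 4]]
print(diff([128, 0, 1, 4, 4, 3, 10]))  # [-128, 1, 3, 0, -1, 7]
```

The second call reproduces the example from the top of the post, virtual pixel included, before the mod-256 step.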

256| means compute values mod 256. { a. is the equivalent of the chr function in Python, converting integers back to characters. And, finally, 1!:3 writes the result out to the file (strictly speaking it appends; 1!:2 is the plain write).

Simple and intuitive, eh?

Here’s another example. The task is to take a binary file and figure out the Huffman codewords encoded therein. The file format is 256 words of 4 bytes, followed by 256 sets of 1-byte lengths. Each length gives how many of the low-order bits from the corresponding word are part of the Huffman codeword.

Python, I was pleased to see, has a module called struct that is built for doing exactly this kind of byte-level interpretation. Given a string of 4 chars, the value of those chars as an integer can be had with struct.unpack("l", chars). Cool! Unfortunately, the Python I was using didn’t have a built-in way of converting integers to binary strings (bin() only arrived in Python 2.6). The final program ended up being about 25 lines, not including trivial things like file input.
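For a sense of what that parsing involves, here is a minimal sketch in present-day Python. It is not the 25-line program described above. It assumes the words are little-endian unsigned 32-bit integers, which the file format description does not actually specify, and it relies on format(n, "b"), available from Python 2.6 onwards, for the binary conversion. The name read_codewords is made up for illustration.

```python
import struct


def read_codewords(blob):
    """Return 256 codewords as strings of '0'/'1' characters.

    Layout taken from the description: 256 four-byte words, then 256
    one-byte lengths.
    ASSUMPTION: words are little-endian and unsigned ("<256I").
    """
    words = struct.unpack("<256I", blob[:1024])
    lengths = blob[1024:1280]  # indexing bytes yields ints in Python 3
    codewords = []
    for word, length in zip(words, lengths):
        low_bits = word & ((1 << length) - 1)  # keep the `length` low-order bits
        # zfill restores leading zeros; a zero-length entry has no codeword
        codewords.append(format(low_bits, "b").zfill(length) if length else "")
    return codewords
```

Calling read_codewords on the raw contents of such a file gives one bit string per symbol, with an empty string wherever the stored length is zero.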

J fares rather better. Negative numbers passed to the infix operator give non-overlapping infixes, perfect for splitting our list of bytes into chunks of 4.

_3 ]\ 'abcdefghi'
abc
def
ghi
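The closest Python idiom is slicing in steps. The helper name chunks below is made up for illustration.

```python
def chunks(size, seq):
    """Split `seq` into consecutive, non-overlapping pieces of `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


print(chunks(3, "abcdefghi"))              # ['abc', 'def', 'ghi']
print(chunks(4, bytes(range(8))) == [bytes([0, 1, 2, 3]), bytes([4, 5, 6, 7])])  # True
```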

And J has built-in operators for converting to and from binary representation of integers. So, given a list of four integers, we can select the low, oh, four bits like this:

(-4) {. , #: 0 0 0 9
1 0 0 1
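Spelled out step by step in Python, that one-liner does roughly the following. The helper antibase2 is a hypothetical stand-in for J's monadic #:, which produces one row of bits per number, each row as wide as the largest number needs.

```python
def antibase2(nums):
    """One row of bits per number, padded to the width of the largest one."""
    width = max(1, max((n.bit_length() for n in nums), default=0))
    return [[(n >> shift) & 1 for shift in range(width - 1, -1, -1)]
            for n in nums]


rows = antibase2([0, 0, 0, 9])               # like  #: 0 0 0 9
flat = [bit for row in rows for bit in row]  # like  ,  (ravel)
print(flat[-4:])                             # like  (-4) {.   prints [1, 0, 0, 1]
```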

That really speaks to the conciseness of J, I think. From 25 lines of Python to one line of J.

J is like concentrated Perl, with all the sugar evaporated out. Honestly, it makes my brain hurt to look at J for too long.
