Archive for the ‘smalltalk’ Category

What’s a Binary Selector?

Sunday, May 6th, 2007

Intuitively, binary selectors are obvious—you string together a few funky characters and you have yourself a binary selector. And binary selectors can useful for making some things easier to read. At least that’s the theory, and reading the ANSI standard one might assume everything is exactly that clear. The practice in Smalltalk-80 descendants is different.

To begin with, Smalltalk-80 only allowed two characters in a binary selector, ruling out beauties such as <=>, <+> or >>=. Fortunately, the ANSI standard has dropped this limit, and Squeak also allows selectors of more than two characters. VisualWorks, however, still sticks to the old ways.

Then there is the wrench negative number literals throw in the works. Intuitively, 3 –– 4 is supposed to send the #– message to 3—and probably cause an MNU. Indeed, that is what would happen in an ANSI-compliant Smalltalk. Instead, Squeak will happily parse and evaluate the above to 7, while VisualWorks will throw a compiler error saying that an argument is expected following the first minus.

Here is why that is. A negative number literal in Smalltalk-80 is simply a minus token followed by a number token. Squeak still sticks to that interpretation, and so it sees no difference between "-4" and "-  4". Both are valid forms of writing a negative four. ANSI and VisualWorks have since moved to a less quirky behavior treating the minus as part of the literal number token. Interestingly, the cute syntax diagrams at the end of the Blue Book suggest Smalltalk-80 does the same, but evaluating "-  4" shows that it doesn’t.

But wait, before the second minus in "3 –– 4" got attached by Squeak to the number that follows, why did it get detached from the first one to begin with?

And here lurks another feature. This time it is in fact documented by the cute syntax diagrams. A binary selector is made of special characters, but a minus is a special case of a special character. It’s only allowed once, and only as the first character of a binary selector. So, -, -> or -+ are legal, +-, <- or -- are not. Clearly, the point of this feature was to have “3+-4” or “3––4” parse “sensibly” as arithmetic expressions rather than exotic message sends.

This also explains why VisualWorks treats “3 –– 4” as an error. It switched from the classical treatment of a minus before a literal to a more modern (or at least compliant with the Blue Book syntax diagrams) treatment, so it doesn’t see the second minus as belonging to the 4 that follows—but it still refuses to accept “––” as a valid binary selector, hence the error. Strange, that.

And that’s not the end of it. The problem with a minus is that the grammar uses it for two purposes—as part of a number literal and also as part of a selector. Another character just like that is a vertical bar. “foo || bar” doesn’t work in Squeak and VisualWorks. And it never did, going as far back as Smalltalk-80 again.

3 || 4

This time the scanner doesn’t even try to be nice. A vertical bar always parses as a separate token, so anything like || or |> is treated by the scanner as a sequence of two tokens and then rejected by the parser as invalid syntax. So in fact the only legal binary selector with a vertical bar is the vertical bar itself. (This again is contrary to what the Blue Book claims Smalltalk-80 recognizes as a binary selector).

All that was perhaps way too much nit-picking as far as any practical programming is concerned, but I find two things interesting about this situation.

One is that the Blue Book misrepresents the grammar of the actual implementation, and not once but twice. The version in the book is probably more of what the authors wanted the grammar to be, rather than an accurate documentation of the ad-hoc implementation in the image. Maybe this was why one of these quirks later got fixed in the ParcPlace branch of the family.

The second interesting point is why is this such a minor issue? How many people in the past 25 years have actually wanted to write “foo || bar” or “foo <- bar” and discovered they didn’t work? Do Smalltalkers use binary messages creatively—that is, outside (semi-)traditional math, concatenation and point creation? And if, as it seems, they do not—why?

NotNil, then what?

Sunday, April 22nd, 2007

This is about a fix of a conceptual bug in the current version of Squeak, though it’s the detailed analysis why this was a bug that I think is worth a blog post.

A while ago when ifNil: and ifNotNil: were not standard in the commercial Smalltalks of the day, there was an occasional argument on c.l.s now and then whether they were a good thing or superfluous sugar not providing any new functionality. The typical case against ifNil: is something like

foo ifNil: [self doSomething]

which can trivially be rewritten in “classic” Smalltalk. The case that is less trivial, though, is

^self foo ifNil: [self defaultFoo]

It’s important that the value being tested is computed and not just fetched from a variable. For a computed value, we cannot in the general case reduce the above to

^self foo isNil ifTrue: [self defaultFoo] ifFalse: [self foo]

because this would evaluate self foo twice—something we would want to avoid if self foo had side effects. Avoiding double evaluation would call for a variable to hold onto the result of self foo:

| foo |
foo := self foo.
foo isNil ifTrue: [self defaultFoo] ifFalse: [foo]

which is significantly more verbose than the ifNil: alternative. So, while ifNil: in this case can still be reduced to “classic” Smalltalk, doing so takes us down an abstraction level—from a single message doing exactly what it says to messing around with variables, tests and branches. While in some languages such exposed plumbing is a fact of life, in Smalltalk we like to hide it when we don’t need to deal with it directly.

Now, looking at ifNotNil:, it’s important to note that its typical use case is different. In fact, this is what’s interesting about ifNil: and ifNotNil: in general—they are asymmetrical. While ifNil: allows us to provide a useful value in the cases when the “main” branch of the computation returns nil, it’s unlikely that we’d ever use ifNotNil: in a mirrored pattern as

^self foo ifNotNil: [nil]

This shows that the cause of the asymmetry is the obvious fact that typically nil is used as a token indicating the absence of a value. A non-nil object is interesting in its own right, while nil isn’t.

So, ifNotNil: is primarily useful not as a value-producing expression, but rather as a control statement that triggers computation when another expression produces a useful result, in the simplest case going as something like

foo ifNotNil: [self doSomethingWith: foo]

This use of ifNotNil: could again be reduced to isNil ifFalse:. The case when the receiver of ifNotNil: is computed is more interesting because of the same problem of avoiding multiple evaluation, the obvious solution to which would again make the code much bulkier:

| foo |
foo := self computeFoo.
foo ifNotNil: [self doSomethingWith: foo]

The way to hide the plumbing here is to have ifNotNil: accept a one-argument block and feed the receiver into it, allowing us to fold the above back into a single expression

self computeFoo ifNotNil: [:foo | self doSomethingWith: foo]

This illustrates another asymmetry of ifNotNil: and ifNil:—while ifNil: block needs no arguments because nil is not an “interesting” object, it’s often helpful for ifNotNil: to take the non-nil receiver as argument.

A number of years ago Squeak had ifNil: and ifNotNil: implemented exactly this way. The former would take a niladic block as the argument and the latter (as far as I can remember) would only accept a monadic one. When a few years ago Eliot and I were adding the two to VisualWorks we kept almost the same pattern, extending the ifNotNil: case to also accept a niladic block.

In Squeak the two messages have since then been reimplemented as special messages, expanded by the compiler into the equivalent == nil ifTrue:/ifFalse: forms. Inexplicably, in the process ifNotNil: was changed to only accept a niladic block—precisely the case that is less valuable! Also interestingly, the fallback implementation in ProtoObject still allows a monadic block in a “pure” ifNotNil: but not as the ifNotNil argument of ifNil:ifNotNil: and ifNotNil:ifNil:! Their classification under the ‘testing’ protocol is a minor nit in comparison.

And now for the fix. It is available as a zip file MonadicIfNotNil.zip with two change sets. One modifies the handling of ifNotNil: and related messages to allow monadic blocks. The second contains the tests and should obviously be filed in after the compiler changes in the first set.

The change was fairly straightforward. Most of the work was dancing around the original error checking code that assumes only niladic blocks are legal as arguments of macroexpanded messages. The expansion simply promotes the block argument into a method temp, expanding

self foo ifNotNil: [:arg | ...]

into

(arg := self foo) ifNotNil: [...]

In VisualWorks such treatment would count as incorrect, but this promotion of a block argument into a method temp is in fact the classic Smalltalk-80 trick still surviving in Squeak in the similar treatment of the to:do: message.

Sections Wrap-Up

Friday, April 6th, 2007

To wrap up for now the thread of functional-like features, here is a very simple implementation of sections that doesn’t rely on currying:

Object>>~ aSymbol
    aSymbol numArgs = 1 ifFalse: [self error: 'Invalid selector'].
    ^[:arg | self perform: aSymbol with: arg]

Symbol>>~ anObject
    self numArgs = 1 ifFalse: [self error: 'Invalid selector'].
    ^[:arg | arg perform: self with: anObject]

First of all, note the ambiguity of the section operator that we didn’t discuss in the previous post. What does

#, ~ #,

mean? Is it a left section prepending a comma to the argument, or a right section appending it?

Considering the implementation, clearly it’s the latter, but this particular choice of behavior is only a side effect of the implementation. In principle though, the intended meaning is undecidable. There is no difference in Smalltalk between a symbol and a selector—so there is no way to tell which comma is the operator and which is the section argument. This creates two potential gotchas.

One, it is impossible to create a left section with a Symbol as the fixed argument. We can’t make a block to check whether the symbol #foobar: includes a given character by writing

#foobar: ~ #includes:

The second, related, problem is that the behavior of the section operator can vary if the first argument is a variable:

foo ~ #includes:

creates a left section if the current value of foo is anything but a Symbol, a right section if it is a one-argument selector, or fails otherwise.

In practice, the ambiguity can be avoided by “downgrading” the first argument to a string in situations where it might be a symbol we don’t want to be mistaken for a selector.

A few people asked for some examples how some of the things I talked about could be useful. As I wrote before, I don’t think currying of blocks can be very useful in practical Smalltalk, simply because the Smalltalk school of expression is different.

I have a slightly different opinion about sections. I use them (the simple currying-independent implementation above) in the framework I’m working on. I hope to write more about that one day, but for now I’ll adapt the example to my past project, VisualWorks Announcements.

Remember that it’s possible to subscribe for announcements by using either a block

anObject
    when: SomethingHappened
    do: [:ann | ...do something with ann...]

or a receiver-selector pair

anObject
    when: SomethingHappened
    send: #processAnnouncement:
    to: self

Given sections, we could handle the receiver-selector case using the same message we use for blocks:

anObject
    when: SomethingHappened
    do: self ~ #processAnnouncement:

Of course, we could also simply drop the receiver-selector option and require explicitly writing

[:ann | self processAnnouncement: ann]

but arguably the block isn’t quite as succinct as the equivalent section. So, what does this buy us?

Of course, we simplify the API. There is now only one subscription message instead of two. What was a separate case becomes part of the same mechanism.

The implementation is also simpler. In the original we needed to remember the receiver and the selector to send to it. (The block case is handled by remembering the block as the receiver and setting the selector to #value:—granted, I’m simplifying the Announcements picture a little, but as I said my example is only an adaptation). Once we throw out separate support for the receiver-selector case, all we do is remember the block and evaluate it when needed.

There even is a potential performance improvement. The receiver-selector implementation always sends #perform:with: to the receiver to deliver an announcement, even for those subscriptions that use a block. The simplified implementation instead evaluates the block to set things in motion, and #perform:with: only enters the picture together with sections in those cases where we specifically want a receiver-selector option.

Sections

Saturday, March 31st, 2007

Today we are bringing together selectors as blocks from the last post and currying discussed before, to produce something pretty neat: sections.

We begin with an example. Using asBlock from the last post, we can write

#> asBlock

to mean a block comparing two arguments. If we curry it with a number, say 42:

#> asBlock curried value: 42

we get a one-argument block which, when invoked, tells whether 42 is greater than the argument. We could use it as a regular one-argument block, for example, with select:

#(1 20 50 43 11) select: (#> asBlock curried value: 42)

Of course, this by itself isn’t particularly exciting. [:each | 42 > each] would have done the same, and without excessive mental acrobatics.

Without being too concerned about the form for now, let’s consider what we have just done by writing that expression. We took an operator (a binary selector) and produced a function (a block) which is an application of that selector to 42 on the left and the function argument on the right. In Haskell such a construct is called a left section of an operator. Similarly, a block applying #> with 42 on the right and the argument on the left would be a right section. This is an interesting concept—apart from the unwieldy shape it had in our code. Let’s fix that.

We add two methods to the system, one to the Symbol class, the other to Object.

Object>>~ mustBeSymbol
	| block |
	block := mustBeSymbol asBlock.
	block numArgs ~= 2 ifTrue: [self error: 'invalid selector'].
	^block curried value: self

Symbol>>~ anObject
	| block |
	block := self asBlock.
	block numArgs ~= 2 ifTrue: [self error: 'invalid selector'].
	^[:a :b | block value: b value: a] curried value: anObject

This defines ~ as a section operator. Sent to a symbol, it produces a right section, sent to anything else with a symbol as the argument—a left section. We can now rewrite the example as

#(1 20 50 43 11) select: #> ~ 42

or to select the opposite

#(1 20 50 43 11) select: 42 ~ #>

This is more interesting than just currying (and can if fact be rewritten to not rely on currying). It will work for any binary or one-argument keyword message:

Transcript ~ #show:

is a block that writes its argument to the transcript.

#print: ~ foo

produces a block writing the printString of whatever was in the variable foo to the argument, which must be a stream-like object.

#, ~ ', hello!'

is a block that appends ‘, hello!’ to the argument, and

'Hello ' ~ #,

prepends ‘Hello ‘ to it. In general, we can think of a tilda as “stick the selector and the object together, and the object missing for this to be a complete message send will be provided when the block we create is called”.

Selectors as Blocks

Tuesday, March 27th, 2007

In the previous two posts we implemented currying in Smalltalk, and in no fewer than two different ways. As Travis asked in a comment, what good is that other than being cool?

First of all, coolness is a virtue in itself. It says a lot about the Smalltalk system and the principles of its design that something like this can be added to it in a matter of minutes by changing a method or two (or not changing anything at all).

Coolness aside, I don’t expect currying to be as useful in Smalltalk as it is elsewhere. The means of expression Smalltalk and Smalltalkers rely on are different from those of functional languages. Higher-order functions are not nearly as pervasive. Blocks in Smalltalk have always been subservient to objects, so much so that they are not real closures in a few implementations (and some Smalltalkers even proposed to get rid of them). So from a pragmatic viewpoint Smalltalk blocks are good enough as they are. But I’m not interested in being pragmatic here. This is an exercise in looking at familar things in an unfamiliar perspective, or mixing a new ingredient into the standard mix just to see what happens. Smalltalk is a pretty good chemistry set.

Today we continue our exploration by adding this method to the Symbol class:

asBlock
	| arity |
	arity := self numArgs + 1.
	arity = 1 ifTrue: [^[:r | r perform: self]].
	arity = 2 ifTrue: [^[:r :a | r perform: self with: a]].
	arity = 3 ifTrue: [^[:r :a :b | r perform: self with: a with: b]].
	"... and to keep the example simple..."
	self error: 'too many arguments'

As advertised by the selector, it turns a symbol into a block. The block sends the symbol as a message to the first argument, passing the remaining arguments as the arguments of the message. Given this method, we can write:

#('Hello' 'world') collect: #size asBlock

to collect the sizes of the collection elements.

That’s right, this reminds the (in)famous Symbol>>value: hack, however it avoids the problems the hack has.

Consider the meaning of numArgs. This message can be sent to both blocks and selectors to determine how many arguments they take. Symbol>>value: pretends that symbols are the same as blocks. Unfortunately, considered as a selector #isLower has zero arguments, while considered as a block it has one. The same holds for any other selector: a Symbol’s numArgs doesn’t count the receiver as an argument, while the receiver does become an argument of the block the selector pretends it is. (The reason arity in the code above is numArgs + 1).

In practice this means that if we pass a Symbol such as #isLower to code that explicitly checks the arity of a block it receives by doing something like

aBlock numArgs = 1 ifFalse: [self error: 'Invalid block'].
^aBlock value: foo

the code will reject it, even though #isLower was supposed to pass for a one-argument block.

Furthermore, Symbol>>value: does nothing of value (sorry) for selectors of more than zero arguments. In contrast, asBlock is a uniform mechanism to cross over from the domain of selectors to the domain of blocks equivalent to those selectors sent to the block’s first argument. In particular, binary and one-argument keyword selectors can mix well with inject:into:, fold: and with:collect:.

#(1 2 3) with: #(4 5 6) collect: #@ asBlock
#(20 20 42 16) inject: 0 into: #max: asBlock
(1 to: 6) fold: #* asBlock  "6 factorial"
#('Hello ' 'world' '!') fold: #, asBlock

Currying in Smalltalk, part 2

Saturday, March 24th, 2007

In the first part we changed the behavior of the value: protocol of blocks to support currying. That changes the existing behavior of the system, making it possible for those messages to answer a block instead of a “real” value. Is that a problem?

One could argue that the new behavior comes into force only in the situations where the original system would signal an error anyway, so if it falls over because of getting a block it didn’t expect, it would still have fallen over in the old system because of the number of arguments mismatch. On the other hand, what if it doesn’t fall over immediately after it gets that unexpected block? It is possible for the new behavior to mask an error and make it difficult to find its cause later. Also, some might not like this kind of a change on the familiarity grounds. Gilad was the one who got me on the track of playing with currying, and for these reasons he suggested that it might be a good idea to keep value: and friends intact and introduce a different evaluation message that would allow currying–something like curry:curry:.

A similar in spirit approach, but one that gives us the best of both worlds, is to introduce a unary message curried used as a prefix to value: to indicate that currying is allowed. The example from the previous post then becomes

| add inc |
add := [:a :b | a + b].
inc := add curried value: 1.
(inc value: 2) + (inc value: 3)

This both preserves the familiar evaluation protocol and declares that currying is possible. To implement this, unlike in the Part 1 version, we leave valueWithArguments: unchanged from its standard implementation. Instead, we introduce a new class called Currier with an instance variable block and an initialization method block:, and add the following method to the block class (either BlockContext or BlockClosure, depending on the Smalltalk dialect):

curried
    ^Currier new block: self

This injects a Currier into the message stream as add curried value: 1 is evaluated, so that the value: message is received by the Currier. To process it, the Currier needs to invoke the original block if there are enough arguments to do so immediately, or create a CurriedBlock otherwise:

value: anObject
    block numArgs = 1
        ifTrue: [^block value: anObject].
    block numArgs > 1
        ifTrue: [^CurriedBlock new block: block arguments: {anObject}].
    self error: 'Incorrect number of arguments'

The above assumes that an expression such as {anObject} creates an Array with anObject, as in Squeak or in VisualWorks with my BraceConstructor extension.

Other value:... protocol messages are implemented similarly. Also, the CurriedBlock class should implement the same curried method as “real” blocks do–curried blocks are no different from regular ones, so they can be “re-curried”. (In Part 1 we would get such re-currying for free).

Currying in Smalltalk

Friday, March 23rd, 2007

Currying, unless you are into Thai cooking, is partial function application commonly used in functional languages. (Some argue it should be called Schönfinkelisation, but oddly so far the term hasn’t caught on). Given a function of N arguments, one can invoke it with K arguments, K < N. This produces a new “curried” function of N – K arguments, with the K arguments of the original call closed over (remembered).

In Smalltalk, currying would allow the example:

| add inc |
add := [:a :b  | a + b].
inc := add value: 1.
(inc value: 2) + (inc value: 3)

run without errors and produce 7. This feels like a significant change down in the foundations of the universe, but amazingly it is trivial to implement.

Both in VisualWorks and Squeak, when a Block is invoked with a wrong number of arguments, message processing ends with a primitive failure in the valueWithArguments: method. The standard Smalltalk response to that is to complain about the wrong number of arguments. All we need is change that response to curry the receiver when appropriate.

To make the example shorter and nicer (we’ll see why a little later), we define a class called CurriedBlock, with instance variables block and arguments and an obvious initialization method block:arguments:. We also change the standard valueWithArguments: as follows:

valueWithArguments: anArray
	<primitive: 500>
	(anArray isMemberOf: Array)
		ifTrue: [anArray size < self numArgs
			ifTrue: [^CurriedBlock new block: self arguments: anArray]
			ifFalse: [self error: 'Incorrect number of arguments']]
		ifFalse: [self error: 'valueWithArguments: requires an Array']

The method above is a simplification of the VisualWorks version, with the horrible UserMessages replaced by readable strings. The Squeak version is nearly identical.

CurriedBlock needs to reproduce the standard block protocol:

valueWithArguments: anArray
	^block valueWithArguments: arguments, anArray

value: anObject
	^block valueWithArguments: (arguments copyWith: anObject)

... other required value:... methods

numArgs
	^block numArgs - arguments size

This is all that is needed to make the example work.

In fact, we could do without the CurriedBlock class. Our implementation of valueWithArguments: could answer a real block by doing something like:

valueWithArguments: anArray
	...
		ifTrue: [anArray size < self numArgs
			ifTrue: [^self curriedWithArguments: anArray]
	...

curriedWithArguments: anArray
	| arity |
	arity := self numArgs - anArray size.
	arity = 1 ifTrue:
		[^[:arg | self valueWithArguments: (anArray copyWith: arg)]].
	arity = 2 ifTrue:
	...

However, since Smalltalk blocks don’t support the equivalent of Lisp “rest” arguments, the curriedWithArguments: method would have to explicitly enumerate and handle all arities we realistically expect to use in block invocations. Using CurriedBlock instead produces a nicer example.

To be continued.