Newspeak will be open sourced under the Apache 2.0 license. Note the future tense; there is much work still to be done. In the meantime get the scoop here.
I’m presenting what I’ve been busy with for the past year and a half (well, almost) at Smalltalk Solutions in June. The title of the talk is “Interfaces without Tools”.
Programming environments are commonly built as sets of tools, where a tool has the shape of a pre-composed window displaying and manipulating a set of domain objects. This talk takes a critical look at that approach and presents Hopscotch, an application framework and development environment based on different concepts. Hopscotch is the IDE and the application framework of Newspeak, a new language and development platform inspired by Smalltalk and Self.
The talk is on the first day of the conference, June 19 at 1:30 pm.
About a month ago Gilad gave a talk at the University of Potsdam about our work. The video of the talk is now available online. I have been quiet for quite a while, busy with our UI framework and environment (Brazil and Hopscotch), so this is a chance to see some of it in action. I hope to eventually find more time to write about it here, but for now feel free to leave a comment with any questions and I’ll do my best to answer them.
This is a follow-up to the post about message chains from a few days ago. Like any syntax-related issue, the choice of the operator is one of those delightful things that encourage a lively exchange of opinions by virtue of being easy to have an opinion about. I ended the post half-seriously considering “:)”, but here is an interesting thought experiment.
Suppose we abolish the comma as a binary message. A + could serve just as well as the generic “join these two things” message that the comma usually is. Instead, suppose that the comma becomes the new chain operator. Here is what it would look like.
something thingsAt: aKey, includes: aThing, ifTrue: [...] ifFalse: [...]
This fits nicely with the original Smalltalk idea of using natural language punctuation for message control, and continues the line-up of a period and a semicolon by being the weakest message separator of them all.
On the squeak-dev mailing list, Fabio Filasieno raised the issue of “pipes” in Smalltalk, which generated quite a discussion. Since I’ve provided an implementation, I’m also writing down my thoughts on the subject.
In a nutshell, this is what the idea is about. While the “classic” Smalltalk syntax allows us to write chains of unary messages such as
x foo bar
without explicit parentheses, this is not the case with keyword messages.
(x foo: 1) bar: 2
is a chain similar to the one above, yet its syntactic form is somewhat further removed from the idea of a “smooth” linear progression of operations, especially so when there are more than two consecutive message sends:
(((x foo: 1) bar: 2) baz: 3) zork: 4
Imagine now that we have an operator which I will for now designate as “:>” allowing us to rewrite the above as
x foo: 1 :> bar: 2 :> baz: 3 :> zork: 4
In other words, it makes the result of the preceding expression the receiver of the following message send, so that in fact
x foo :> bar :> baz
and
x foo bar baz
are two syntactic forms of the same chain of messages.
One obvious example of the use of this operator is rewriting something like
(aCollection select: [:some | some foo]) collect: [:each | each bar]
as
aCollection select: [:some | some foo] :> collect: [:each | each bar]
An even more likely case, which I don’t believe was mentioned on squeak-dev, is a complex condition:
something thingsAt: aKey :> includes: aThing :> ifTrue: [...] ifFalse: [...]
Writing the “standard” form often involves going back to the beginning of the expression to add parentheses once you’ve written enough of the expression to realize what it will look like. The chain operator exactly follows the “do this, then this, then that” structure of such expressions and does not require going back.
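To make the comparison concrete, here is a small workspace sketch in standard syntax only. The example data and the temporary names (nested, stepwise, evens, squares) are mine, chosen for illustration; the chained form appears only in a comment, because it parses only once the change set is filed in.

```smalltalk
"Standard Smalltalk: each keyword send wraps the previous one in parentheses."
| nested stepwise evens squares |
nested := (((1 to: 10)
    select: [:each | each even])
    collect: [:each | each * each])
    inject: 0 into: [:sum :each | sum + each].

"The same chain with explicit temporaries, one step per statement."
evens := (1 to: 10) select: [:each | each even].
squares := evens collect: [:each | each * each].
stepwise := squares inject: 0 into: [:sum :each | sum + each].

"With the proposed operator (only after filing in the change set) this would read:
   (1 to: 10) select: [...] :> collect: [...] :> inject: 0 into: [...]"
```

Both nested and stepwise end up as 220; the difference is purely in how much bookkeeping the writer has to do to get there.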
The change set linked at the beginning of the post implements the operator for Squeak. The inevitable question now is, Is This A Good Thing?
Clearly it adds nothing new to the language (chaining messages was always possible), only a new ability to express something in a different syntactic form. Such things generally invite (justified) suspicion and tossing around the various “less is more” quotes. Without questioning the value of minimalism and a structured approach, some prior “sugary” additions such as tuples (brace Array constructors) or ifNil:/ifNotNil: could be (and have been) criticized on the same grounds. Any Array constructor discussion usually involves someone pointing out how the ease of array creation is overrated because redesigning the code to replace arrays with structured objects or store them in self-documenting variables only improves the code. This is true, of course, and yet in some contexts, one of which is DSL-like Smalltalk expressions, syntactically lightweight array creation turns out to be quite useful. Perhaps there is a similar “killer use” out there waiting for lightweight message chains to become available. Or perhaps not. My take is that this operator might be a good thing, and only practice can show if there are use cases out there waiting for it.
As for less is more: perhaps the power of Smalltalk is not so much in having a small untouchable core (and if anything, the Object and UndefinedObject protocols in Squeak live to prove that more is more) as in having a core small and malleable enough to support this sort of extension and experimentation. I’d rather say the Squeak core is still not small and malleable enough if something like this can only be done by modifying the compiler rather than tweaking the meta-level.
Having said this, I should comment on Bert Freudenberg’s elegant #asPipe implementation. There are two reasons why I consider it very neat but still not enough of the real thing. One is the look of the code. The value proposition of pipes is improving readability. Getting rid of parentheses is only one step towards that; the other is having the links of the chain stand out enough for the reader to easily see where the distinct steps are. This is partly an issue of formatting, but proper “graphic design” of a visually distinct special operator is still unbeatable.
In fact, in terms of visibility I think an operator even better than “:>” (which already looks quite like a smiley) is “:)”:
something thingsAt: aKey :) includes: aThing :) ifTrue: [...] ifFalse: [...]
The other issue is the minor anomalies that are always hard to hide with DNU tricks because of the need for the DNU handler to have some rudimentary behavior of its own, and because some things are not handled by messages. For example:
2 asPipe between: 1 and: 3; == true
evaluates to false.
2 asPipe between: 1 and: 3; ifTrue: [#foo]
fails to compile, ruling out the use of pipes to simplify conditions.
foo := [#nil]. bar := [#notNil]. nil asPipe value; ifNil: foo ifNotNil: bar
does compile because the branches are not literal blocks, but then the result of the “pipe” is #notNil.
Intuitively, binary selectors are obvious: you string together a few funky characters and you have yourself a binary selector. And binary selectors can be useful for making some things easier to read. At least that’s the theory, and reading the ANSI standard one might assume everything is exactly that clear. The practice in Smalltalk-80 descendants is different.
To begin with, Smalltalk-80 only allowed two characters in a binary selector, ruling out beauties such as <=>, <+> or >>=. Fortunately, the ANSI standard has dropped this limit, and Squeak also allows selectors of more than two characters. VisualWorks, however, still sticks to the old ways.
Then there is the wrench negative number literals throw in the works. Intuitively,
3 -- 4 is supposed to send the #-- message to 3 and probably cause an MNU. Indeed, that is what would happen in an ANSI-compliant Smalltalk. Instead, Squeak will happily parse and evaluate the above to 7, while VisualWorks will throw a compiler error saying that an argument is expected following the first minus.
Here is why that is. A negative number literal in Smalltalk-80 is simply a minus token followed by a number token. Squeak still sticks to that interpretation, and so it sees no difference between
"-4" and
"- 4". Both are valid forms of writing a negative four. ANSI and VisualWorks have since moved to a less quirky behavior, treating the minus as part of the literal number token. Interestingly, the cute syntax diagrams at the end of the Blue Book suggest Smalltalk-80 does the same, but evaluating
"- 4" shows that it doesn’t.
But wait, before the second minus in
"3 -- 4" got attached by Squeak to the number that follows, why did it get detached from the first one to begin with?
And here lurks another feature. This time it is in fact documented by the cute syntax diagrams. A binary selector is made of special characters, but a minus is a special case of a special character. It’s only allowed once, and only as the first character of a binary selector. So,
-+ are legal,
-- are not. Clearly, the point of this feature was to have “3+-4” or “3--4” parse “sensibly” as arithmetic expressions rather than exotic message sends.
This also explains why VisualWorks treats “3 -- 4” as an error. It switched from the classical treatment of a minus before a literal to a more modern (or at least compliant with the Blue Book syntax diagrams) treatment, so it doesn’t see the second minus as belonging to the 4 that follows, but it still refuses to accept “--” as a valid binary selector, hence the error. Strange, that.
And that’s not the end of it. The problem with a minus is that the grammar uses it for two purposes—as part of a number literal and also as part of a selector. Another character just like that is a vertical bar. “foo || bar” doesn’t work in Squeak and VisualWorks. And it never did, going as far back as Smalltalk-80 again.
This time the scanner doesn’t even try to be nice. A vertical bar always parses as a separate token, so anything like || or |> is treated by the scanner as a sequence of two tokens and then rejected by the parser as invalid syntax. So in fact the only legal binary selector with a vertical bar is the vertical bar itself. (This again is contrary to what the Blue Book claims Smalltalk-80 recognizes as a binary selector).
All that was perhaps way too much nit-picking as far as any practical programming is concerned, but I find two things interesting about this situation.
One is that the Blue Book misrepresents the grammar of the actual implementation, and not once but twice. The version in the book is probably more of what the authors wanted the grammar to be, rather than an accurate documentation of the ad-hoc implementation in the image. Maybe this was why one of these quirks later got fixed in the ParcPlace branch of the family.
The second interesting point is why is this such a minor issue? How many people in the past 25 years have actually wanted to write “foo || bar” or “foo <- bar” and discovered they didn’t work? Do Smalltalkers use binary messages creatively—that is, outside (semi-)traditional math, concatenation and point creation? And if, as it seems, they do not—why?
This is about a fix of a conceptual bug in the current version of Squeak, though it’s the detailed analysis of why this was a bug that I think is worth a blog post.
A while ago when
ifNil: and ifNotNil: were not standard in the commercial Smalltalks of the day, there was an occasional argument on c.l.s about whether they were a good thing or superfluous sugar not providing any new functionality. The typical case against
ifNil: is something like
foo ifNil: [self doSomething]
which can trivially be rewritten in “classic” Smalltalk. The case that is less trivial, though, is
^self foo ifNil: [self defaultFoo]
It’s important that the value being tested is computed and not just fetched from a variable. For a computed value, we cannot in the general case reduce the above to
^self foo isNil ifTrue: [self defaultFoo] ifFalse: [self foo]
because this would evaluate
self foo twice—something we would want to avoid if
self foo had side effects. Avoiding double evaluation would call for a variable to hold onto the result of self foo:
| foo |
foo := self foo.
foo isNil ifTrue: [self defaultFoo] ifFalse: [foo]
which is significantly more verbose than the ifNil: alternative. So, while ifNil: in this case can still be reduced to “classic” Smalltalk, doing so takes us down an abstraction level—from a single message doing exactly what it says to messing around with variables, tests and branches. While in some languages such exposed plumbing is a fact of life, in Smalltalk we like to hide it when we don’t need to deal with it directly.
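The double-evaluation problem is easy to observe in a workspace. In this sketch the block foo is a stand-in I made up for a side-effecting self foo: it counts how many times it has been evaluated and answers that count. All the variable names are illustrative.

```smalltalk
| calls foo viaIsNil callsForIsNil viaIfNil callsForIfNil |
"foo stands in for a side-effecting 'self foo': it counts its own evaluations."
calls := 0.
foo := [calls := calls + 1. calls].

"isNil ifTrue:ifFalse: has to mention the expression twice, so it runs twice."
viaIsNil := foo value isNil ifTrue: [0] ifFalse: [foo value].
callsForIsNil := calls.

"ifNil: answers the non-nil receiver itself, so the expression runs once."
calls := 0.
viaIfNil := foo value ifNil: [0].
callsForIfNil := calls.
```

After running this, callsForIsNil is 2 and callsForIfNil is 1, and the two forms even answer different values (2 versus 1) because of the extra evaluation.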
Now, looking at
ifNotNil:, it’s important to note that its typical use case is different. In fact, this is what’s interesting about ifNil: and ifNotNil: in general—they are asymmetrical. While
ifNil: allows us to provide a useful value in the cases when the “main” branch of the computation returns nil, it’s unlikely that we’d ever use
ifNotNil: in a mirrored pattern as
^self foo ifNotNil: [nil]
This shows that the cause of the asymmetry is the obvious fact that typically nil is used as a token indicating the absence of a value. A non-nil object is interesting in its own right, while nil isn’t.
So, ifNotNil: is primarily useful not as a value-producing expression, but rather as a control statement that triggers computation when another expression produces a useful result, in the simplest case going as something like
foo ifNotNil: [self doSomethingWith: foo]
This use of ifNotNil: could again be reduced to
isNil ifFalse:. The case when the receiver of ifNotNil: is computed is more interesting because of the same problem of avoiding multiple evaluation, the obvious solution to which would again make the code much bulkier:
| foo |
foo := self computeFoo.
foo ifNotNil: [self doSomethingWith: foo]
The way to hide the plumbing here is to have ifNotNil: accept a one-argument block and feed the receiver into it, allowing us to fold the above back into a single expression
self computeFoo ifNotNil: [:foo | self doSomethingWith: foo]
This illustrates another asymmetry of ifNotNil: and ifNil:—while ifNil: block needs no arguments because nil is not an “interesting” object, it’s often helpful for ifNotNil: to take the non-nil receiver as argument.
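A small workspace sketch of the monadic form, assuming a dialect in which ifNotNil: accepts a one-argument block (as I understand it, current Squeak and Pharo do). The dictionary and its keys are invented for the example.

```smalltalk
| ages found missing |
ages := Dictionary new.
ages at: #alice put: 30.

"The computed receiver is evaluated once and handed to the block as age."
found := (ages at: #alice ifAbsent: [nil]) ifNotNil: [:age | age * 2].

"For a nil receiver the block is skipped and the whole expression answers nil."
missing := (ages at: #bob ifAbsent: [nil]) ifNotNil: [:age | age * 2].
```

Here found is 60 and missing is nil, with no temporary needed to hold the looked-up value.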
A number of years ago Squeak had
ifNil: and ifNotNil: implemented exactly this way. The former would take a niladic block as the argument and the latter (as far as I can remember) would only accept a monadic one. When a few years ago Eliot and I were adding the two to VisualWorks, we kept almost the same pattern, extending the ifNotNil: case to also accept a niladic block.
In Squeak the two messages have since then been reimplemented as special messages, expanded by the compiler into the equivalent
ifTrue:/ifFalse: forms. Inexplicably, in the process
ifNotNil: was changed to only accept a niladic block—precisely the case that is less valuable! Also interestingly, the fallback implementation in ProtoObject still allows a monadic block in a “pure”
ifNotNil: but not as the
ifNotNil: argument of
ifNotNil:ifNil:! Their classification under the ‘testing’ protocol is a minor nit in comparison.
And now for the fix. It is available as a zip file MonadicIfNotNil.zip with two change sets. One modifies the handling of ifNotNil: and related messages to allow monadic blocks. The second contains the tests and should obviously be filed in after the compiler changes in the first set.
The change was fairly straightforward. Most of the work was dancing around the original error checking code that assumes only niladic blocks are legal as arguments of macroexpanded messages. The expansion simply promotes the block argument into a method temp, expanding
self foo ifNotNil: [:arg | ...]
into
(arg := self foo) ifNotNil: [...]
In VisualWorks such treatment would count as incorrect, but this promotion of a block argument into a method temp is in fact the classic Smalltalk-80 trick still surviving in Squeak in the similar treatment of the
To wrap up for now the thread of functional-like features, here is a very simple implementation of sections that doesn’t rely on currying:
Object>>~ aSymbol
    aSymbol numArgs = 1 ifFalse: [self error: 'Invalid selector'].
    ^[:arg | self perform: aSymbol with: arg]

Symbol>>~ anObject
    self numArgs = 1 ifFalse: [self error: 'Invalid selector'].
    ^[:arg | arg perform: self with: anObject]
First of all, note the ambiguity of the section operator that we didn’t discuss in the previous post. What does
#, ~ #,
mean? Is it a left section prepending a comma to the argument, or a right section appending it?
Considering the implementation, clearly it’s the latter, but this particular choice of behavior is only a side effect of the implementation. In principle though, the intended meaning is undecidable. There is no difference in Smalltalk between a symbol and a selector—so there is no way to tell which comma is the operator and which is the section argument. This creates two potential gotchas.
One, it is impossible to create a left section with a Symbol as the fixed argument. We can’t make a block to check whether the symbol
#foobar: includes a given character by writing
#foobar: ~ #includes:
The second, related, problem is that the behavior of the section operator can vary if the first argument is a variable:
foo ~ #includes:
creates a left section if the current value of
foo is anything but a Symbol, a right section if it is a one-argument selector, or fails otherwise.
In practice, the ambiguity can be avoided by “downgrading” the first argument to a string in situations where it might be a symbol we don’t want to be mistaken for a selector.
A few people asked for some examples of how some of the things I talked about could be useful. As I wrote before, I don’t think currying of blocks can be very useful in practical Smalltalk, simply because the Smalltalk school of expression is different.
I have a slightly different opinion about sections. I use them (the simple currying-independent implementation above) in the framework I’m working on. I hope to write more about that one day, but for now I’ll adapt the example to my past project, VisualWorks Announcements.
Remember that it’s possible to subscribe for announcements by using either a block
anObject when: SomethingHappened do: [:ann | ...do something with ann...]
or a receiver-selector pair
anObject when: SomethingHappened send: #processAnnouncement: to: self
Given sections, we could handle the receiver-selector case using the same message we use for blocks:
anObject when: SomethingHappened do: self ~ #processAnnouncement:
Of course, we could also simply drop the receiver-selector option and require explicitly writing
[:ann | self processAnnouncement: ann]
but arguably the block isn’t quite as succinct as the equivalent section. So, what does this buy us?
Of course, we simplify the API. There is now only one subscription message instead of two. What was a separate case becomes part of the same mechanism.
The implementation is also simpler. In the original we needed to remember the receiver and the selector to send to it. (The block case is handled by remembering the block as the receiver and setting the selector to
#value:—granted, I’m simplifying the Announcements picture a little, but as I said my example is only an adaptation). Once we throw out separate support for the receiver-selector case, all we do is remember the block and evaluate it when needed.
There even is a potential performance improvement. The receiver-selector implementation always sends
#perform:with: to the receiver to deliver an announcement, even for those subscriptions that use a block. The simplified implementation instead evaluates the block to set things in motion, and
#perform:with: only enters the picture together with sections in those cases where we specifically want a receiver-selector option.
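As a rough illustration of the simplified scheme, here is a workspace sketch in which a subscription is nothing but a remembered block. It is not the VisualWorks Announcements implementation; the subscriptions and log collections and the announcement string are hypothetical stand-ins, and the second block is written out by hand as what log ~ #add: would build.

```smalltalk
| subscriptions log |
subscriptions := OrderedCollection new.
log := OrderedCollection new.

"A plain block subscription."
subscriptions add: [:ann | log add: 'block saw ', ann].
"A receiver-selector subscription, spelled as the block that log ~ #add: would build."
subscriptions add: [:ann | log perform: #add: with: ann].

"Delivery is the same for both: evaluate each remembered block."
subscriptions do: [:each | each value: 'SomethingHappened'].
```

Only the second subscription goes through perform:with:, which is the point made above: the cost of the indirection is paid only where a receiver-selector pair was actually asked for.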
Today we are bringing together selectors as blocks from the last post and currying discussed before, to produce something pretty neat: sections.
We begin with an example. Using
asBlock from the last post, we can write
#> asBlock
to mean a block comparing two arguments. If we curry it with a number, say 42:
#> asBlock curried value: 42
we get a one-argument block which, when invoked, tells whether 42 is greater than the argument. We could use it as a regular one-argument block, for example, with
#(1 20 50 43 11) select: (#> asBlock curried value: 42)
Of course, this by itself isn’t particularly exciting.
[:each | 42 > each] would have done the same, and without excessive mental acrobatics.
Without being too concerned about the form for now, let’s consider what we have just done by writing that expression. We took an operator (a binary selector) and produced a function (a block) which is an application of that selector to 42 on the left and the function argument on the right. In Haskell such a construct is called a left section of an operator. Similarly, a block applying #> with 42 on the right and the argument on the left would be a right section. This is an interesting concept—apart from the unwieldy shape it had in our code. Let’s fix that.
We add two methods to the system, one to the Symbol class, the other to Object.
Object>>~ mustBeSymbol
    | block |
    block := mustBeSymbol asBlock.
    block numArgs ~= 2 ifTrue: [self error: 'invalid selector'].
    ^block curried value: self

Symbol>>~ anObject
    | block |
    block := self asBlock.
    block numArgs ~= 2 ifTrue: [self error: 'invalid selector'].
    ^[:a :b | block value: b value: a] curried value: anObject
These two methods define ~ as a section operator. Sent to a symbol, it produces a right section; sent to anything else with a symbol as the argument, a left section. We can now rewrite the example as
#(1 20 50 43 11) select: #> ~ 42
or to select the opposite
#(1 20 50 43 11) select: 42 ~ #>
This is more interesting than just currying (and can in fact be rewritten to not rely on currying). It will work for any binary or one-argument keyword message:
Transcript ~ #show:
is a block that writes its argument to the transcript.
#print: ~ foo
produces a block writing the printString of whatever was in the variable foo to the argument, which must be a stream-like object.
#, ~ ', hello!'
is a block that appends ‘, hello!’ to the argument, and
'Hello ' ~ #,
prepends ‘Hello ’ to it. In general, we can think of a tilde as “stick the selector and the object together, and the object missing for this to be a complete message send will be provided when the block we create is called”.
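For readers who want to try the idea without installing the ~ methods, the sections can be spelled out as ordinary blocks. This workspace sketch uses only perform:with:; the variable names are mine, and each block is the hand-written equivalent of the section named in its comment.

```smalltalk
| rightSection leftSection greater smaller greet |
"What #> ~ 42 stands for: the argument goes on the left of the selector."
rightSection := [:arg | arg perform: #> with: 42].
"What 42 ~ #> stands for: the argument goes on the right."
leftSection := [:arg | 42 perform: #> with: arg].

greater := #(1 20 50 43 11) select: rightSection.
smaller := #(1 20 50 43 11) select: leftSection.

"What 'Hello ' ~ #, stands for."
greet := [:arg | 'Hello ' perform: #, with: arg].
greet value: 'world'.
```

The first select: keeps 50 and 43, the second keeps 1, 20 and 11, and the last expression answers 'Hello world'.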
In the previous two posts we implemented currying in Smalltalk, and in no fewer than two different ways. As Travis asked in a comment, what good is that other than being cool?
First of all, coolness is a virtue in itself. It says a lot about the Smalltalk system and the principles of its design that something like this can be added to it in a matter of minutes by changing a method or two (or not changing anything at all).
Coolness aside, I don’t expect currying to be as useful in Smalltalk as it is elsewhere. The means of expression Smalltalk and Smalltalkers rely on are different from those of functional languages. Higher-order functions are not nearly as pervasive. Blocks in Smalltalk have always been subservient to objects, so much so that they are not real closures in a few implementations (and some Smalltalkers even proposed to get rid of them). So from a pragmatic viewpoint Smalltalk blocks are good enough as they are. But I’m not interested in being pragmatic here. This is an exercise in looking at familiar things from an unfamiliar perspective, or mixing a new ingredient into the standard mix just to see what happens. Smalltalk is a pretty good chemistry set.
Today we continue our exploration by adding this method to the Symbol class:
asBlock
    | arity |
    arity := self numArgs + 1.
    arity = 1 ifTrue: [^[:r | r perform: self]].
    arity = 2 ifTrue: [^[:r :a | r perform: self with: a]].
    arity = 3 ifTrue: [^[:r :a :b | r perform: self with: a with: b]].
    "... and to keep the example simple..."
    self error: 'too many arguments'
As advertised by the selector, it turns a symbol into a block. The block sends the symbol as a message to the first argument, passing the remaining arguments as the arguments of the message. Given this method, we can write:
#('Hello' 'world') collect: #size asBlock
to collect the sizes of the collection elements.
That’s right, this is reminiscent of the (in)famous
Symbol>>value: hack, however it avoids the problems the hack has.
Consider the meaning of
numArgs. This message can be sent to both blocks and selectors to determine how many arguments they take. Symbol>>value: pretends that symbols are the same as blocks. Unfortunately, considered as a selector
#isLower has zero arguments, while considered as a block it has one. The same holds for any other selector: a Symbol’s
numArgs doesn’t count the receiver as an argument, while the receiver does become an argument of the block the selector pretends it is. (This is why
arity in the code above is
numArgs + 1.)
In practice this means that if we pass a Symbol such as
#isLower to code that explicitly checks the arity of a block it receives by doing something like
aBlock numArgs = 1 ifFalse: [self error: 'Invalid block']. ^aBlock value: foo
the code will reject it, even though
#isLower was supposed to pass for a one-argument block.
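The arity mismatch can be checked directly in a workspace. This sketch uses #size instead of #isLower only because it is available on strings in every dialect I know of; the block is written by hand as the one asBlock would build for a unary selector.

```smalltalk
| selector block |
selector := #size.
"The block that asBlock would build for a unary selector."
block := [:r | r perform: selector].
selector numArgs.   "0: the receiver is not counted"
block numArgs.      "1: the receiver has become the block argument"
block value: 'Hello'.
```

An arity check such as aBlock numArgs = 1 therefore accepts the block and rejects the bare symbol, even though both describe the same operation.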
Symbol>>value: does nothing of value (sorry) for selectors of more than zero arguments. In contrast,
asBlock is a uniform mechanism to cross over from the domain of selectors to the domain of blocks equivalent to those selectors sent to the block’s first argument. In particular, binary and one-argument keyword selectors can mix well with
#(1 2 3) with: #(4 5 6) collect: #@ asBlock
#(20 20 42 16) inject: 0 into: #max: asBlock
(1 to: 6) fold: #* asBlock "6 factorial"
#('Hello ' 'world' '!') fold: #, asBlock