Parsing words
If you’ve been paying close attention so far, you realize I’ve lied to you. I
said each word acts on the stack in order, but there a few words like [
, ]
,
:
and ;
that don’t seem to follow this rule.
These are parsing words and they behave differently from simpler words like
5
, [1,b]
or drop
. We will cover these in more detail when we talk about
metaprogramming, but for now it is enough to know that parsing words are special.
They are not defined using the :
word, but with the word SYNTAX:
instead.
When a parsing words is encountered, it can interact with the parser using a
well-defined API to influence how successive words are parsed. For instance :
asks for the next tokens from the parsers until ;
is found and tries to
compile that stream of tokens into a word definition.
A common use of parsing words is to define literals. For instance {
is a
parsing word that starts an array definition and is terminated by }
.
Everything in-between is part of the array. An example of array that we have
seen before is { 1 2 3 4 5 6 7 8 9 10 }
.
There are also literals for hashmaps,
H{ { "Perl" "Larry Wall" } { "Factor" "Slava Pestov" } { "Scala" "Martin Odersky" } }
,
and byte arrays, B{ 1 14 18 23 }
.
Other uses of parsing word include the module system, the object-oriented
features of Factor, enums, memoized functions, privacy modifiers and more. In
theory, even SYNTAX:
can be defined in terms of itself, although of course
the system has to be bootstrapped somehow.
Stack shuffling
Now that you know the basics of Factor, you may want to start assembling more complex words. This may sometimes require you to use variables that are not on top of the stack, or to use variables more than once. There are a few words that can be used to help with this. I mention them now since you need to be aware of them, but I warn you that using too many of these words to manipulate the stack will cause your code to quickly become harder to read and write. Stack shuffling requires mentally simulating moving values on a stack, which is not a natural way to program. In the next section we’ll see a much more effective way to handle most needs.
Here is a list of the most common shuffling words together with their effect on the stack. Try them in the listener to get a feel for how they manipulate the stack, and explore the online help to find out more.
1 |
|
Combinators
Although the words mentioned in the previous paragraph are occasionally useful
(especially the simpler dup
, drop
and swap
), you should write code that
does as little stack shuffling as possible. This requires practice getting the
function arguments in the right order. Nevertheless, there are certain common
patterns of needed stack manipulation that are better abstracted away into their
own words.
Suppose we want to define a word to determine whether a given number n
is
prime. A simple algorithm is to test each number from 2
to the square root of
n
and see whether it is a divisor of n
. In this case, n
is used in two
places: as an upper bound for the sequence, and as the number to test for
divisibility.
The word bi
applies two different quotations to the single element on the
stack above them, and this is precisely what we need. For instance
5 [ 2 * ] [ 3 + ] bi
yields
bi
applies the quotation [ 2 * ]
to the value 5
and then the quotation
[ 3 + ]
to the value 5
leaving us with 10
and then 8
on the stack.
Without bi
, we would have to first dup
5
, then multiply, and then swap
the result of the multiplication with the second 5
, so we could do the addition
You can see that bi
replaces a common pattern of dup
, then calculate,
then swap
and calculate again.
To continue our prime example, we need a way to make a range starting from 2
.
We can define our own word for this [2,b]
, using the [a,b]
range word we
discussed earlier
1 |
|
What’s up with that inline
word? This is one of the modifiers we can use after
defining a word, another one being recursive
. This will allow us to have the
definition of a short word inlined wherever it is used, rather than incurring
a function call.
Try our new [2,b]
word and see that it works
Using [2,b]
to produce the range of numbers from 2
to the square root of a
number n
that is already on the stack is easy: sqrt floor [2,b]
(technically
floor
isn’t necessary here, as [a,b]
works for non-integer bounds). Let’s
try that out
Now, we need a word to test for divisibility. A quick search in the online help
shows that divisor?
is the word we want. It will help to have the arguments
for testing divisibility in the other direction, so we define multiple?
1 |
|
Both of these return t
1 |
|
If we’re going to use bi
in our prime
definition, as we implied above, we
need a second quotation. Our second quotation needs to test for a value in the
range being a divisor of n
- in other words we need to partially apply the
word multiple?
. This can be done with the word curry
, like this:
[ multiple? ] curry
.
Finally, once we have the range of potential divisors and the test function on
the stack, we can test whether any element satisfied divisibility with any?
and then negate that answer with not
. Our full definition of prime
looks like
1 |
|
Altough the definition of prime
is complicated, the stack shuffling is minimal
and is only used in the small helper functions, which are simpler to reason
about than prime?
.
Notice that prime?
uses two levels of quotation nesting since bi
operates on
two quotations, and our second quotation contains the word curry
, which also
operates on a quotation. In general, Factor words tend to be rather shallow,
using one level of nesting for each higher-order function, unlike Lisps or more
generally languages based on the lambda calculus, which use one level of nesting
for each function, higher-order or not.
Many more combinators exists other than bi
(and its relative tri
), and you
should become acquainted at least with bi
, tri
, bi*
and bi@
by reading
about them in the online help and trying them out in the listener.
In the next post, we will see how to organize our words in modules and add unit tests.
Until then!