The goal for these notes is to introduce the Reader Monad, and present two linguistic applications: binding and intensionality. Along the way, we'll continue to think through issues related to order, and a related notion of flow of information.
At this point, we've seen monads in general, and three examples of monads: the Identity monad (invisible boxes), the Option/Maybe monad (option types), and the List monad.
We've also seen an application of the Option/Maybe monad to safe division.
The starting point was to allow the division function to return an int
option
instead of an int
. If we divide 6
by 2
, we get the answer Some
3
. But if we divide 6
by 0
, we get the answer None
.
The next step was to adjust the other arithmetic functions to teach
them what to do if they received None
instead of a boxed integer.
This meant changing the type of their input from int
s to int option
s.
But we didn't need to do this piecemeal; rather, we "lift"ed the
ordinary arithmetic operations into the monad using the various tools
provided by the monad.
We'll go over this lifting operation in detail in the next section.
Tracing the effect of safe_div
on a larger computation
So let's see how this works in terms of a specific computation.
(+ 1 (* (/ 6 2) 4)) in tree format: ___________ | | _|__ ___|___ | | | | + 1 __|___ 4 | | * __|___ | | _|__ 2 | | / 6
No matter what order we evaluate it in, this computation
should eventually reduce to 13
. Given a specific reduction strategy,
we can watch the order in which the computation proceeds. Following
on the evaluator developed during the previous homework, let's
adopt the following reduction strategy:
In order to reduce an expression of the form (head arg), do the following in order:
- Reduce
head
tohead'
- Reduce
arg
toarg'
.- If
(head' arg')
is a redex, reduce it.
There are many details left unspecified here, but this will be enough for today. The order in which the computation unfolds will be
- Reduce head
(+ 1)
to itself- Reduce arg
((* ((/ 6) 2)) 4)
- Reduce head
(* ((/ 6) 2))
- Reduce head
*
to itself- Reduce arg
((/ 6) 2)
- Reduce head
(/ 6)
to itself- Reduce arg
2
to itself- Reduce
((/ 6) 2)
to3
- Reduce
(* 3)
to itself- Reduce arg
4
to itself- Reduce
((* 3) 4)
to12
- Reduce
((+ 1) 12)
to13
This reduction pattern follows the structure of the original expression exactly, at each node moving first to the left branch, processing the left branch, then moving to the right branch, and finally processing the results of the two subcomputation. (This is called a depth-first postorder traversal of the tree.)
[diagram with arrows traversing the tree]
It will be helpful to see how the types change as we make adjustments.
type num = int
type contents = Num of num | Op2 of (num -> num -> num)
type tree = Leaf of contents | Branch of tree * tree
Never mind that these types will allow us to construct silly
arithmetric trees such as + *
or 2 3
. Note that during the
reduction sequence, the result of reduction was in every case a
well-formed subtree. So the process of reduction could be animated by
replacing subtrees with the result of reduction on that subtree, until
the entire tree is replaced by a single integer (namely, 13
).
Now we replace the number 2
with 0
:
___________ | | _|__ ___|___ | | | | + 1 __|___ 4 | | * __|___ | | _|__ 0 | | / 6
When we reduce, we get quite a ways into the computation before things break down:
- Reduce head
(+ 1)
to itself- Reduce arg
((* ((/ 6) 0)) 4)
- Reduce head
(* ((/ 6) 0))
- Reduce head
*
to itself- Reduce arg
((/ 6) 0)
- Reduce head
(/ 6)
to itself- Reduce arg
0
to itself- Reduce
((/ 6) 0)
to ACKKKK
This is where we replace /
with safe_div
. safe_div
returns not an int
, but an int option
. If the division goes
through, the result is Some n
, where n
is the integer result.
But if the division attempts to divide by zero, the result is None
.
We could try changing the type of the arithmetic operators from int
-> int -> int
to int -> int -> int option
; but since we now have to
anticipate the possibility that any argument might involve division by
zero inside of it, it would be better to prepare for the possibility
that any subcomputation might return None
here. So our operators should have
the type int option -> int option -> int option
. Let's bring that about
by just changing the type num
from int
to int option
, leaving everying else the same:
type num = int option
type contents = Num of num | Op2 of (num -> num -> num)
type tree = Leaf of contents | Branch of tree * tree
The only difference is that instead of defining our numbers to be
simple integers, they are now int option
s; and so Op
is an operator
over int option
s.
At this point, we bring in the monadic machinery. In particular, here
is the ⇧
and the map2
function from the notes on safe division:
⇧ (x : 'a) = Some x
map2 (f : 'a -> 'b -> 'c) (xx : 'a option) (yy : 'b option) =
match xx with
| None -> None
| Some x ->
(match yy with
| None -> None
| Some y -> Some (f x y))
Then we lift the entire computation into the monad by applying ⇧
to
the integers, and by applying map2
to the operators. Only, we will replace /
with safe_div
, defined as follows:
safe_div (xx : 'a option) (yy : 'b option) =
match xx with
| None -> None
| Some x ->
(match yy with
| None -> None
| Some 0 -> None
| Some y -> Some ((/) x y))
___________________ | | ___|____ ____|_____ | | | | map2 + ⇧1 _____|_____ ⇧4 | | map2 * ____|____ | | ___|____ ⇧0 | | safe_div ⇧6
With these adjustments, the faulty computation now completes smoothly:
- Reduce head
((map2 +) ⇧1)
- Reduce arg
(((map2 *) ((safe_div ⇧6) ⇧0)) ⇧4)
- Reduce head
((map2 *) ((safe_div ⇧6) ⇧0))
- Reduce head
(map2 *)
- Reduce arg
((safe_div ⇧6) ⇧0)
- Reduce head
(safe_div ⇧6)
- Reduce arg
⇧0
- Reduce
((safe_div ⇧6) ⇧0)
toNone
- Reduce
((map2 *) None)
- Reduce arg
⇧4
- Reduce
(((map2 *) None) ⇧4)
toNone
- Reduce
(((map2 +) ⇧1) None)
toNone
As soon as we try to divide by 0
, safe_div
returns None
.
Thanks to the details of map2
, the fact that None
has been returned
by one of the arguments of a map2
-ed operator guarantees that the
map2
-ed operator will pass on the None
as its result. So the
result of each enclosing computation will be None
, up to the root
of the tree.
It is unfortunate that we need to continue the computation after
encountering our first None
. We know immediately at the result of
the entire computation will be None
, yet we continue to compute
subresults and combinations. It would be more efficient to simply
jump to the top as soon as None
is encoutered. Let's call that
strategy Abort. We'll arrive at an Abort
operator later in the semester.
So at this point, we can see how the Option/Maybe monad provides
plumbing that allows subcomputations to send information from one part
of the computation to another. In this case, the safe_div
function
can send the information that division by zero has been attempted
upwards to the rest of the computation. If you think of the plumbing
as threaded through the tree in depth-first, postorder traversal, then
safe_div
drops None
into the plumbing half way through the
computation, and that None
travels through the rest of the plumbing
till it comes out of the result faucet at the top of the tree.
Information flowing in the other direction: top to bottom
We can think of this application as facilitating information flow.
In the safe_div
example, a subcomputation created a message that
propagated upwards to the larger computation:
message: Division by zero occurred! ^ ___________ | | | | _|__ ___|___ | | | | | | + 1 __|___ 4 | | | | * __|___ -----| | | _|__ 0 | | / 6
(The message was implemented by None
.)
We might want to reverse the direction of information flow, making information available at the top of the computation available to the subcomputations:
[λn] ___________ | | | | _|__ ___|___ | | | | | | + 1 __|___ 4 | | | | * __|___ | | | | _|__ n <----| | | / 6
We've seen exactly this sort of configuration before: it's exactly what we have when a lambda binds a variable that occurs in a deeply embedded position. Whatever the value of the argument that the lambda form combines with, that is what will be substituted in for free occurrences of that variable within the body of the lambda.
Binding
So our next step is to add a (primitive) version of binding to our computation. We'll allow for just one binding dependency for now, and then generalize later.
Binding is independent of the safe division, so we'll return to a situation in which the Option/Maybe monad hasn't been added. One of the nice properties of programming with monads is that it is possible to add or subtract layers according to the needs of the moment. Since we need simplicity, we'll set the Option/Maybe monad aside for now.
So we'll go back to the point where the num type is simple int
, not
int option
s.
type num = int
And we won't be using the map2
or ⇧
from the Option/Maybe monad anymore.
As you might guess, the technique we'll use to arrive at binding will
be to use the Reader monad, defined here in terms of mid
/⇧
and mbind
/>>=
:
type α = int -> α ⇧x = \n. x xx >>= k = \n. k (xx n) n map f xx = \n. f (xx n) ff ¢ xx = \n. (ff n) (xx n) map2 f xx yy = map f xx ¢ yy = \n. f (xx n) (yy n)
A boxed type in this monad will be a function from an integer to an
value in the original type. The mid
/⇧
function lifts a value x
to
a function that expects to receive an integer, but throws away the
integer and returns x
instead (most values in the computation don't
depend on the input integer).
The mbind
/>>=
function in this monad takes a monadic value xx
, a function
k
taking non-monadic values into the monad, and returns a function
that expects an integer n
. It feeds n
to xx
, which delivers a
result in the orginal type, which is fed in turn to k
. k
returns
a monadic value, which upon being fed an integer, also delivers a result
in the orginal type.
The map
, ¢
, and map2
functions corresponding to this mbind
are given
above. They may look familiar --- we'll comment on this in a moment.
Lifing the computation into the monad, we have the following adjusted types:
type num = int -> int
That is, num
is once again replaced with the type of a boxed int
.
When we were dealing with the Option/Maybe monad, a boxed int
had type int
option
. In this monad, a boxed int has type int -> int
.
__________________ | | ___|____ ____|_____ | | | | map2 + ⇧1 ____|_____ ⇧4 | | map2 * ___|____ | | ___|____ x | | map2 / ⇧6
It remains only to decide how the variable n
will access the input value supplied
at the top of the tree. The input value is supposed to be the
value that gets used for the variable n
. Like every leaf in the tree
in argument position, the code we want in order to represent the
variable will have the type of a boxed int
, namely, int -> int
. So
we have the following:
var_n = fun (n : int) -> n
That is, variables in this system denote the identity function!
The result of evaluating this tree will be a boxed integer: a function
from any integer n
to (+ 1 (* (/ 6 n)) 4)
.
Take a look at the definition of the Reader monad again. The
mid
takes some object x
and returns \n. x
. In other words,
⇧x = Kx
, for our familiar combinator K, so ⇧ = K
. Likewise, the map
operation
is just function composition, and the mapply
/¢
operation is our friend the S combinator. map2 f xx yy
for the Reader monad is (f ○ xx) ¢ yy
or S (f ○ xx) yy
.
We've seen this before as a strategy for translating a binding
construct into a set of combinators. To remind you, here is a part of
the general scheme for translating a lambda abstract into Combinatory
Logic. The translation function [.]
translates a lambda term into a
term in Combinatory Logic:
[(MN)] = ([M] [N])
[\a.a] = I
[\a.M] = K[M] (assuming a not free in M)
[\a.(MN)] = S[\a.M][\a.N]
The reason we can make do with this subset of the full translation
scheme is that we're making the simplifying assumption that there is
at most a single lambda involved. So once again we have the identity
function I as the translation of the bound variable, K as the function
governing expressions that don't contain an instance of the bound
variable, S as an operation that manages ¢
, that is, the applicative combination of complex expressions.
Jacobson's Variable Free Semantics as a Reader Monad
We've designed the presentation above to make it as easy as possible to show that Jacobson's Variable Free Semantics (e.g., Jacobson 1999, Towards a Variable-Free Semantics) implements a Reader monad.
More specifically, it will turn out that Jacobson's geach combinator
g is exactly our map
operator, and her binding combinator z is
exactly our mbind
(though with the arguments reversed).
Jacobson's system contains two main combinators, g and z. She calls g the Geach rule, and z performs binding. Here is a typical computation. This implementation is based closely on email from Simon Charlow, with beta reduction as performed by the on-line evaluator:
; Jacobson's analysis of "Everyone_i thinks he_i left" let g = \f xx. \n. f (xx n) in ; or `f ○ xx` let z = \k xx. \n. k (xx n) n in ; or `S (flip k) xx`, or `Reader.(>>=) xx k` let he = \n. n in let everyone = \P. FORALL i (P i) in everyone (z thinks (g left he)) ~~> FORALL i (thinks (left i) i)
Two things to notice: First, pronouns once again denote identity functions, just as we saw in the reader monad in the previous section.
Second, g plays the role of transmitting a binding dependency for an embedded constituent to a containing constituent.
The basic recipe in Jacobson's system is that you transmit the dependence of a pronoun upwards through the tree using g until just before you are about to combine with the binder, when you finish off with z. Here is an example with a longer chain of g's:
; "Everyone_i thinks that Bill said he_i left" everyone (z thinks (g (T bill) (g said (g left he)))) ; or `everyone (thinks =<< T bill ○ said ○ left ○ I)` ~~> FORALL i (thinks (said (left i) bill) i)
If you compare Jacobson's values for g and z to the functions in
the reader monad given above, you'll see that Jacobson's g
combinator is exactly our map
operator. Furthermore, Jacobson's z
combinator is identical to our Reader >>=
, except with the order of the
arguments reversed (i.e., (z k xx) == (xx >>= k)
).
(The T
combinator in the derivations above is given by T x <~~> \f. f x
;
it handles situations in which English word order reverses
the usual function/argument order. T
is what Curry and Steedman call this
combinator. Jacobson calls it "lift", but it shouldn't be confused with the
mid
and map
operations that lift values into the Reader monad we're focusing
on here. It does lift values into a different monad, that we'll consider in
a few weeks.)
In other words, Jacobson's variable-free semantics is essentially a Reader monad.
One of the peculiar aspects of Jacobson's system is that binding is accomplished not by applying z to the element that will (in some pre-theoretic sense) bind the pronoun, here, everyone, but rather by applying z instead to the predicate that will take everyone as an argument, here, thinks.
The Reader monad for intensionality
Now we'll look at using the Reader monad to do intensional function application.
In Shan (2001) Monads for natural language semantics, Ken shows that making expressions sensitive to the world of evaluation is conceptually the same thing as making use of the Reader monad. This technique was beautifully re-invented by Ben-Avi and Winter (2007) in their paper A modular approach to intensionality, though without explicitly using monads.
All of the code in the discussion below can be found here: intensionality-monad.ml. To run it, download the file, start OCaml, and say
# #use "intensionality-monad.ml";;
Note the extra #
attached to the directive use
.
First, the familiar linguistic problem:
Bill left.
Cam left.
Ann believes that Bill left.
Ann believes that Cam left.
We want an analysis on which the first three sentences can be true at the same time that the last sentence is false. If sentences denoted simple truth values or booleans, we have a problem: if the sentences Bill left and Cam left are both true, they denote the same object, and Ann's beliefs can't distinguish between them.
The traditional solution to the problem sketched above is to allow
sentences to denote a function from worlds to truth values, what
Montague called an intension. So if s
is the type of possible
worlds, we have the following situation:
Extensional types Intensional types Examples ------------------------------------------------------------------- S t s->t John left DP e s->e John VP e->t (s->e)->s->t left Vt e->e->t (s->e)->(s->e)->s->t saw Vs t->e->t (s->t)->(s->e)->s->t thought
This system is modeled on the way Montague arranged his grammar. There are significant simplifications compared to Montague: for instance, determiner phrases are thought of here as corresponding to individuals rather than to generalized quantifiers.
The main difference between the intensional types and the extensional
types is that in the intensional types, the arguments are functions
from worlds to extensions: intransitive verb phrases like left now
take so-called "individual concepts" as arguments (type s->e
) rather than plain
individuals (type e
), and attitude verbs like think now take
propositions (type s->t
) rather than truth values (type t
).
In addition, the result of each predicate is an intension.
This expresses the fact that the set of people who left in one world
may be different than the set of people who left in a different world.
Normally, the dependence of the extension of a predicate to the world of evaluation is hidden inside of an evaluation coordinate, or built into the the lexical meaning function, but we've made it explicit here in the way that the intensionality monad makes most natural.
The intensional types are more complicated than the extensional types. Wouldn't it be nice to make the complicated types available for expressions like attitude verbs that need to worry about intensions, and keep the rest of the grammar as extensional as possible? This desire is parallel to our earlier desire to limit the concern about division by zero to the division function, and let the other functions, like addition or multiplication, ignore division-by-zero problems as much as possible.
So here's what we do:
In OCaml, we'll use integers to model possible worlds. char
s (characters in the computational sense, i.e., letters like 'a'
and 'b'
, not Kaplanian characters) will model individuals, and OCaml bool
s will serve for truth values:
type s = int
type e = char
type t = bool
let ann = 'a'
let bill = 'b'
let cam = 'c'
let left1 (x : e) = true
let saw1 (y : e) (x : e) = x < y
left1 ann (* true *)
saw1 bill ann (* true *)
saw1 ann bill (* false *)
So here's our extensional system: everyone left, including Ann;
and Ann saw Bill (saw1 bill ann
), but Bill didn't see Ann. (Note that the word
order we're using is VOS, verb-object-subject.)
Now we add intensions. Because different people leave in different worlds, the meaning of leave must depend on the world in which it is being evaluated:
let left (x : e) (w : s) = match (x, w) with ('c', 2) -> false | _ -> true
left ann 1 (* true: Ann left in world 1 *)
left cam 2 (* false: Cam didn't leave in world 2 *)
This new definition says that everyone always left, except that in world 2, Cam didn't leave.
Note that although this general left is sensitive to world of
evaluation, it does not have the fully intensionalized type given in
the chart above, which was (s->e)->s->t
. This is because
left does not exploit the additional resolving power provided by
making the subject an individual concept. In semantics jargon, we say
that leave is extensional with respect to its first argument.
We will adopt the general strategy of defining predicates in a way that they take arguments of the lowest type that will allow us to make all the distinctions the predicate requires. When it comes time to combine this predicate with monadic arguments, we'll have to make use of various lifting predicates.
Likewise, although see depends on the world of evaluation, it is extensional in both of its syntactic arguments:
let saw (y : e) (x : e) (w : s) = (w < 2) && (x < y)
saw bill ann 1 (* true: Ann saw Bill in world 1 *)
saw bill ann 2 (* false: no one saw anyone in world 2 *)
This (again, partially) intensionalized version of see coincides
with the saw1
function we defined above for world 1; in world 2, no
one saw anyone.
Just to keep things straight, let's review the facts:
World 1: Everyone left. Ann saw Bill, Ann saw Cam, Bill saw Cam, no one else saw anyone. World 2: Ann left, Bill left, Cam didn't leave. No one saw anyone.
Now we are ready for the intensionality monad:
type 'a intension = s -> 'a let mid x = fun (w : s) -> x let (>>=) xx k = fun (w : s) -> let x = xx w in let yy = k x in yy w (* or just `fun w -> k (xx w) w` *)
Then the individual concept mid ann
is a rigid designator: a
constant function from worlds to individuals that returns 'a'
no
matter which world is used as an argument. This is a typical kind of
thing for a monadic mid
to do.
Then combining a predicate like left which is extensional in its
subject argument with an intensional subject like mid ann
is simply mbind
in action:
(mid ann >>= left) 1 (* true: Ann left in world 1 *)
(mid cam >>= left) 2 (* false: Cam didn't leave in world 2 *)
As usual, >>=
takes a monadic box containing Ann, extracts Ann, and
feeds her to the function left. In linguistic terms, we take the
individual concept mid ann
, apply it to the world of evaluation in
order to get hold of an individual ('a'
), then feed that individual
to the predicate left.
We can arrange for a transitive verb that is extensional in both of its arguments to take intensional arguments:
let map2' f xx yy = xx >>= (fun x -> yy >>= (fun y -> f x y))
This is almost the same map2
predicate we defined in order to allow
addition in our division monad example. The difference is that this
variant operates on verb meanings that take extensional arguments but
returns an intensional result. Thus the original map2
predicate
has mid (f x y)
where we have just f x y
here.
The use of >>=
here to combine left with an individual concept,
and the use of map2'
to combine see with two intensional
arguments closely parallels the two of Montague's meaning postulates
(in PTQ) that express the relationship between extensional verbs and
their uses in intensional contexts.
map2' saw (mid bill) (mid ann) 1 (* true *) map2' saw (mid bill) (mid ann) 2 (* false *)
Ann did see Bill in world 1, but Ann didn't see Bill in world 2.
Finally, we can define our intensional verb thinks. Think is
intensional with respect to its sentential complement (it takes complements of type s -> t
), though still extensional
with respect to its subject (type e
). (As Montague noticed, almost all verbs
in English are extensional with respect to their subject; a possible
exception is appear.)
let thinks (p : s->t) (x : e) (w : s) =
match (x, p 2) with ('a', false) -> false | _ -> p w
In every world, Ann fails to believe any proposition that is false in world 2. Apparently, she stably thinks we may be in world 2. Otherwise, everyone believes a proposition iff that proposition is true in the world of evaluation.
(mid ann >>= thinks (mid bill >>= left)) 1 (* true *)
So in world 1, Ann thinks that Bill left (because in worlds 1 and 2, Bill did leave).
But although Cam left in world 1:
(mid cam >>= left) 1 (* true *)
he didn't leave in world 2, so Ann doesn't in world 1 believe that Cam left:
(mid ann >>= thinks (mid cam >>= left)) 1 (* false *)
Small project: add intersective (red) and non-intersective
adjectives (good) to the fragment. The intersective adjectives
will be extensional with respect to the nominal they combine with
(using mbind
), and the non-intersective adjectives will take
intensional arguments.
Connections with variable binding: the rigid individual concepts generated by mid ann
and the like correspond to the numerical constants, that don't interact with the environment in any way, in the variable binding examples we considered earlier on the page. If we had any non-contingent predicates that were wholly insensitive to intensional effects, they would be modeled using map2
and would correspond to the operations like map2 (+)
in the earlier examples. As it is, our predicates lift and saw, though only sensitive to the extension of their arguments, nonetheless are sensitive to the world of evaluation for their bool
output. So these are somewhat akin, at the predicate level, to expressions like var_n
, at the singular term level, in the variable bindings examples. Our predicate thinks shows yet a further kind of interaction with the intensional structures we introduced: namely, its output can depend upon evaluating its complement relative to other possible worlds. We didn't discuss analogues of this in the variable binding case, but they exist: namely, they are expressions like let x = 2 in ...
and forall x ...
, that have the effect of supplying their arguments with an environment or assignment function that is shifted from the one they are themselves being evaluated with.
notes: cascade, general env