Recent years have seen the development of several foundational models for statically typed object-oriented programming. But despite their intuitive similarity, differences in the technical machinery used to formulate the various proposals have made them difficult to compare. Using the typed lambda-calculus $F^\omega_{<:}$ as a common basis, we now offer a detailed comparison of four models: (1) a recursive-record encoding similar to the ones used by Cardelli [Car84], Reddy [Red88, KR94], Cook [Coo89, CHC90], and others; (2) Hofmann, Pierce, and Turner's existential encoding [PT94, HP95]; (3) Bruce's model based on existential and recursive types [Bru94]; and (4) Abadi, Cardelli, and Viswanathan's type-theoretic encoding [ACV96] of a calculus of primitive objects.

Over the last half decade, several authors have proposed foundational models for statically typed object-oriented programming. Although their motivating intuitions and technical machinery are all strongly related to typed lambda-calculi with subtyping [Car84, CW85, CG92], stylistic differences have made rigorous comparisons difficult. For example, some models are presented as translations from high-level object syntax into the syntax of a typed lambda-calculus; others map high-level syntax directly into a denotational model; still others focus on the object syntax as a primitive calculus in its own right.

In this paper we compare four of these models. The first of these, based on recursively-defined records, was introduced by Cardelli [Car84] and studied in many variations by Kamin and Reddy [Red88, KR94], Cook and Palsberg [CP89], and Mitchell [Mit90]. In its untyped form, this model was used rather effectively for the denotational semantics of untyped object-oriented languages. In its typed form, it was used to encode individual object-oriented examples, but had difficulties with uniform interpretations of typed object-oriented languages. The most successful effort in this direction was carried out by Cook et al. [CHC90, CCH+89b].


In 1993, Pierce and Turner [PT94] introduced an encoding that relied only on a type system with existential types, but no recursive types. This led Hofmann and Pierce [HP95] to the first uniform, type-driven interpretation of objects in a functional calculus.

At the same conference in 1993, Bruce presented a paper [Bru94] on the semantics of a functional object-oriented language. This semantics was originally presented as a direct mapping into a denotational model of $F^\omega_{<:}$, but has recently been reformulated as an object encoding that depends on both existential and recursive types.

Meanwhile, frustrated by the difficulties of encoding objects in lambda calculi, Abadi and Cardelli introduced a calculus of primitive objects [AC96]. Later, however, Abadi, Cardelli, and Viswanathan [ACV96] discovered a faithful encoding of that object calculus in terms of bounded existentials and recursive types. (The encoding is simplified in this paper to facilitate comparisons with the other encodings; in particular, method update is only considered in Section 4.8).

In this paper we examine these object encodings and compare their strengths and weaknesses. Points of comparison include the expressiveness of the object-oriented constructs that can be encoded, the simplicity of the encoding, the uniformity of the encoding (e.g., independence of the encoding from the types of the objects and methods), and the power and proof-theoretic tractability of the underlying type theory used by the encoding.

We concentrate, throughout, on the lambda-calculus expressions that form the targets of the four encodings, eliding the associated "primitively object-oriented" source languages and the encoding functions mapping these into the lambda-calculus. (There are interesting comparisons to be made at this level too, but they are complicated by many inessential syntactic and stylistic differences between source languages.) Thus, the phrase "object encodings" in the title of the paper can be read as "object-oriented programming styles in typed lambda-calculus."

We also stop short of considering classes and subclassing mechanisms. These are of course supported (in interestingly different ways) by all four encodings, but a detailed comparison falls outside the scope of this study.

Chapter 18 of [AC96] describes and compares several object encodings with respect to the object-oriented constructions that they can express and the properties that they enjoy. A main difference of approach in this paper is in the use of type operators to represent different encodings more uniformly. A paper by Fisher and Mitchell [FM96] (see also [FM94]) gives a general tutorial on type systems for object-oriented languages. It describes the origins and evolution of the recursive and existential encodings, and compares them with an axiomatic presentation of objects.

2 Technical Preliminaries

The "ambient type theory" in which our four encodings are expressed is the omega-order polymorphic lambda-calculus with subtyping, System $F^\omega_{<:}$ [Car90, CL91, PT94, HP95, PS97, Com94], extended with existential types [MP88], recursively defined types [AC93], recursive functions, and records. In the interest of brevity, we assume that readers have some prior familiarity with $F^\omega_{<:}$, with recursive types, and with the use of existential types for information hiding à la Mitchell and Plotkin. (Prior familiarity with some of the encodings we discuss will also be helpful, but is not required.) In this section, we sketch the syntax of the language and briefly discuss a few technical points of particular relevance to what follows.

The sets of kinds, types, and terms are given by the following grammar:

    K ::= Type                        kind of types
        | K->K                        kind of type operators

    T ::= X                           type variable
        | Fun(X:K)T                   type operator
        | T T                         application of a type operator
        | Top(K)                      maximal type of kind K
        | T->T                        function type
        | All(X)T                     universally quantified type
        | All(X<:T)T                  bounded universal type
        | Some(X)T                    existentially quantified type
        | Some(X<:T)T                 bounded existential type
        | Rec(X)T                     recursive type
        | {l:T...l:T}                 record type

    e ::= x                           variable
        | fun(x:T)e                   abstraction
        | e e                         application
        | fun(X<:T)e                  type abstraction
        | e T                         type application
        | pack [X,e] as T             existential package construction
        | open e as [X,x] in e        existential package use
        | {l=e...l=e}                 record construction
        | e.l                         field selection
        | let x=e in e                local definition
        | letrec x(y:T):T = e in e    recursive local definition

We assume standard definitions of reduction and conversion, writing m =β n to indicate that m and n are convertible. Although we shall perform conversion steps in whatever order is convenient for the sake of examples, we could just as well impose a call-by-name reduction strategy. (Most of the examples would diverge under a call-by-value strategy. This can be repaired, at the cost of some extra lambda-abstractions and applications to delay evaluation at appropriate points.)

We are informal about kinding throughout the paper. In particular, we omit kind declarations on type abstractions, writing Fun(X)T instead of Fun(X:K)T.

In the definitions of the encodings, we use pairs in addition to records; these can, of course, be encoded straightforwardly. We write (m,n) for the pair of m and n and use the selectors fst and snd to destruct pairs. S*T is the type of pairs of S and T.

Our formulation of existential types is standard, following (for example) Mitchell and Plotkin's. If S is a type expression, then any element v with type of the form S[U/X] can be "packed" into an element (pack [U,v] as Some(X)S) of type Some(X)S.

The expression (open o as [X,x] in b) unpacks the existential value o, yielding bindings for the type variable X and the term variable x, whose scope is the expression b. X represents the hidden, abstracted type, while x represents the term before it was packed. In particular, the expression (open o as [X,x] in b) where o is (pack [U,v] as Some(X)S) will result in X being bound to the type expression U and x to the expression v. In order to preserve type-safety, one may only apply operations to x that do not depend on knowing the actual hidden type bound to X.

The rules for introduction and elimination of existentials are the usual ones. Informally:

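$$\frac{T =_\beta \mathtt{Some(X)S} \qquad \vdash v : [X \mapsto U]S}{\vdash (\mathtt{pack}\ [U,v]\ \mathtt{as}\ T) : T} \quad \text{(T-Pack)}$$

$$\frac{\vdash o : \mathtt{Some(X)S} \qquad x : S \vdash b : B \qquad X \notin FV(B)}{\vdash (\mathtt{open}\ o\ \mathtt{as}\ [X,x]\ \mathtt{in}\ b) : B} \quad \text{(T-Unpack)}$$

Note the important side condition on the rule T-Unpack of existentials. If this side condition were dropped, then the hidden state X could "escape," breaking the abstraction.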

In examples, we use the informal pattern-matching notation

    open o as [X,(s,m)] in b

to abbreviate

    open o as [X,x] in let s=fst(x) in let m=snd(x) in b.

For example, the following defines a simple abstraction containing a value of type X and a function mapping type X to integers:

    abstr ≝ pack [{x:String}, ({x="source"}, fun(s:{x:String}) length(s.x))]
            as Some(X) X * (X -> Int)

We can use abstr by "opening" it and applying the second component to the first component:

    open abstr as [X,(x,f)] in f(x)

Because the type of f(x) does not involve X, this is legal according to T-Unpack. However, replacing f(x) by x or f(concat(x,"more")) is illegal according to T-Unpack as these changes would break the abstraction.
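The pack/open discipline above can be pictured with a small untyped sketch. Python has no existential types, so nothing here enforces the hiding that Some(X) provides; the helper names `pack` and `open_` are hypothetical and only mirror the shape of the two constructs.

```python
# Untyped sketch of pack/open. Hiding of the representation type is by
# convention only: Python cannot enforce what Some(X) guarantees.

def pack(state, op):
    """Build a package: a function that hands (state, op) to a client body."""
    return lambda client: client(state, op)

def open_(package, client):
    """Open a package and run the client body on its two components."""
    return package(client)

# abstr = pack [{x:String}, ({x="source"}, fun(s) length(s.x))]
abstr = pack({"x": "source"}, lambda s: len(s["x"]))

# open abstr as [X,(x,f)] in f(x): the result is an Int, so X does not escape.
result = open_(abstr, lambda x, f: f(x))
print(result)  # 6
```

A client body that returned `x` itself would leak the representation; in the typed calculus that is exactly what the side condition on T-Unpack rules out.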

We can extend the subtyping relation to type functions (functions from types to types) by defining subtyping pointwise. Thus if F and G are type functions then F <: G iff for all types X, F(X) <: G(X).

$$\frac{I(X) <: J(X)}{\mathtt{Fun(X)}\,I(X) <: \mathtt{Fun(X)}\,J(X)} \quad \text{(S-Abs)}$$

Thus if G(X) = {bump:X, eq:X->Bool} and F(X) = {bump:X, eq:X->Bool, set:Int->X} then F <: G.

The following folding and unfolding rules allow us to make use of recursive types:

$$\mathtt{Rec(X)}\,I(X) <: I(\mathtt{Rec(X)}\,I(X)) \quad \text{(S-Unfold)}$$

$$I(\mathtt{Rec(X)}\,I(X)) <: \mathtt{Rec(X)}\,I(X) \quad \text{(S-Fold)}$$

We will use these rules implicitly as needed rather than clutter the presentation.

The "Amber rule" is used to determine when recursively-defined types are subtypes:

$$\frac{X <: Y \vdash I(X) <: J(Y)}{\mathtt{Rec(X)}\,I(X) <: \mathtt{Rec(Y)}\,J(Y)} \quad \text{(S-Rec)}$$

Note that this rule has a stronger premise than the pointwise subtyping rule for type operators above (S-Abs). Adopting a pointwise rule for recursive types (i.e., making Rec(X) I(X) a subtype of Rec(X) J(X) whenever I(X)<:J(X)) would render the type system unsound [AC93].

The letrec construct allows us to define terms using auxiliary functions (which may be defined recursively):

$$\frac{f : S \to T,\ x : S \vdash e : T \qquad f : S \to T \vdash b : B}{\vdash (\mathtt{letrec}\ f(x{:}S){:}T = e\ \mathtt{in}\ b) : B} \quad \text{(T-LetRec)}$$

For subtyping quantifiers, we have a choice of rules. Some of our encodings work fine with the Kernel Fun fragment; one needs the full Fsub rule. The following is the Kernel Fun rule for bounded polymorphic functions. Notice that the bounds on the parameters are identical for Kernel Fun. For the full Fsub rule they are allowed to vary:

$$\frac{X <: A \vdash D <: B}{\mathtt{All}(X <: A)D <: \mathtt{All}(X <: A)B} \quad \text{(S-All-KFun)}$$

$$\frac{A <: C \qquad X <: A \vdash D <: B}{\mathtt{All}(X <: C)D <: \mathtt{All}(X <: A)B} \quad \text{(S-All-Full-Fsub)}$$

Our running example throughout the paper will be (purely functional) integer reference cell objects (see footnote 4). The interface of cell objects is represented by the following type operator:

    CellI(X) ≝ {get:Int, set:Int->X, bump:X}

Operationally, a cell object has three methods: get, which returns its current contents; set, which returns a new cell object (we intend that the contents of the resulting object should be set to the integer provided as a parameter, although of course the interface type doesn't guarantee this); and bump, which returns a new cell (whose contents should be one greater than the current contents). The role of the parameter X varies between the encodings we consider, but it may be thought of intuitively as a placeholder for the "type of self." Given an interface I, we write O(I) for the type of "objects with interface I."

We are interested in the properties of O(I) for different values of O, i.e., for different ways of encoding objects with interface I. The four O's that we consider in detail are:

    OR(I)   ≝ Rec(X) I(X)
    OE(I)   ≝ Some(Y) Y * (Y->I(Y))
    ORE(I)  ≝ Rec(X) Some(Y) Y * (Y->I(X))
    ORBE(I) ≝ Rec(X) Some(Y<:X) Y * (Y->I(Y))

OR is a "classical" recursive-record encoding. OE is the "existential encoding" of Hofmann, Pierce, and Turner [PT94, HP95]. ORE is a type-theoretic analog of Bruce's denotational semantics for objects [Bru94]. ORBE is a variant of Abadi, Cardelli, and Viswanathan's type-theoretic encoding [ACV96]. The names are designed to remind the reader of the main features of the encodings: R stands for recursive types, E for existential types, and BE for bounded existentials.

The use of type operators (rather than just types) to represent object interfaces is a way of capturing, uniformly, two different points of view about the types of the object's methods: the "external view" of the object, in which the methods are abstract services that can only be invoked by an operation of "message sending," and the "internal view" of the object when it is being created, in which the methods are concrete values. The internal view of the methods' types varies from encoding to encoding (in two encodings I is applied to the recursively bound type variable X, while in the other two it is applied to the existentially bound variable Y). On the other hand, the external view will always be the same:

    CellMessages ≝ O(CellI) -> CellI(O(CellI))
                 = O(CellI) -> {get:Int, set:Int->O(CellI), bump:O(CellI)}

That is, the messages supported by cell objects can be viewed as a collection of functions whose first parameter (the "self parameter") is a cell object and whose results are described by CellI(O(CellI)). Of course, message sends will have to be interpreted differently in each of the object encodings in order to obtain this form.

Footnote 4: We concentrate here on the purely functional versions of each of the encodings. This choice aids both in formulating each of the systems (for example, it allows us to assume a call-by-name reduction strategy, avoiding some extra thunking for the corresponding call-by-value variants) and in later comparisons between systems.

It is technically convenient to write a single self parameter at the front of the whole collection of messages instead of abstracting each message individually on its self parameter. For example, for most of the paper we will assume that object interfaces are represented by covariant type operators, in which the bound variable appears only in positive positions. That is, each method of an object implicitly takes a single self parameter and can then return results of the self parameter type but not take any more arguments of this type. Section 4.6 discusses the implications of relaxing this restriction to allow "binary" methods with parameters of the same type as the receiver. See [BCC+96] for a more extended discussion.

Note that all of these encodings need to be combined with some kind of higher-order bounded quantification to provide satisfactory typings for functions manipulating objects. For example, a function that accepts a cell object and sends it the bump message twice is given the type

    bumpTwice : All(I<:CellI) O(I) -> O(I)

capturing the fact that, if it is applied to a colored cell object, the result will also be colored.

We now develop each of the encodings in detail, using the example of cells to illustrate each one.

3.1 OR: Recursive records

The encoding of recursive records is fairly straightforward:

    OR(I) ≝ Rec(X) I(X)

In this case an object is simply a recursive record in which each occurrence of X stands for the type of the entire record. Thus if T = OR(I) then T = I(T).

We can encode a cell object as follows:

    mycell ≝ letrec mkobj(s:{x:Int}) : OR(CellI) =
               { get  = s.x,
                 set  = fun(n:Int) mkobj({x=n}),
                 bump = mkobj({x=s.x+1}) }
             in mkobj({x=0})  :  OR(CellI)

The recursive function mkobj creates a new object of type OR(CellI), given a value for the internal state (see footnote 5).

Let us introduce the informal syntax o<=l for sending a message l to an object o. Because objects in this encoding are simply recursively defined records, message sending is represented by field selection (after unfolding the recursive type):

    o<=l ≝ o.l

It is easy to see that (mycell<=bump)<=get reduces to 1 as follows:

    (mycell<=bump)<=get
      =  (mkobj({x=0})<=bump)<=get
      =β ({ get  = {x=0}.x,
            set  = fun(n:Int) mkobj({x=n}),
            bump = mkobj({x={x=0}.x+1}) }.bump)<=get
      =β mkobj({x=({x=0}.x+1)})<=get
      =β mkobj({x=1})<=get
      =β {x=1}.x
      =β 1

Instead of implementing bump by manipulating the state directly, suppose we want to implement it in terms of the other methods. We can write:

    mycell ≝ letrec mkobj(s:{x:Int}) : OR(CellI) =
               let self = mkobj s in
               { get  = s.x,
                 set  = fun(n:Int) mkobj({x=n}),
                 bump = self<=set(self<=get + 1) }
             in mkobj({x=0})  :  OR(CellI)

It is easy to see by reducing the messages sent to self that this is equivalent to the original definition, above.

Footnote 5: Note that, if we wanted to enforce a call-by-value reduction scheme, it would be necessary to change the encoding of the bump field, as otherwise a call to mkobj would always diverge. One solution would be to convert the bump field to a function of no arguments returning an object.

3.2 OE: Existentials

In the next encoding, we treat objects as pairs of state (with type Y) and methods (with type Y->I(Y)), in which the state component is hidden from the outside and methods are functions that depend on the state. Thus

    OE(I) ≝ Some(Y) Y * (Y->I(Y))

where the bound type variable Y represents the hidden state. We can define a cell object as follows:

    mycell ≝ pack [{x:Int},
                   ({x=0},
                    fun(s:{x:Int})
                      {get  = s.x,
                       set  = fun(n:Int) {x=n},
                       bump = {x=s.x+1}})]
             as OE(CellI)

It is now slightly more complex to send messages as we must "unpack" elements of existential type before we can access their components. Simple message sends like get are encoded as:

    o<=get ≝ open o as [X,(s,m)] in m(s).get

That is, we open the existential, apply the method suite to the state, and then extract the appropriate method.

However, messages like bump that return new objects with updated internal state require a bit more, since the resulting object must be re-packed.

    o<=bump ≝ open o as [X,(s,m)] in
                pack [X, (m(s).bump, m)] as OE(CellI)

The extra pack in the translation follows from the fact that the return type of the method has type Y, rather than the object type. In order to yield a fresh object as result, the state returned by the method must be re-packaged (with the original methods and state type) as an existential value. With this abbreviation it is easy to see that (mycell<=bump)<=get evaluates to 1:

    (mycell<=bump)<=get
      =  (open mycell as [X,(s,m)] in
            pack [X, (m(s).bump, m)] as OE(CellI))<=get
      =β (pack [{x:Int}, ((methfun {x=0}).bump, methfun)] as OE(CellI))<=get
      =β (pack [{x:Int}, ({get={x=0}.x, set=fun(n:Int){x=n},
                           bump={x={x=0}.x+1}}.bump, methfun)] as OE(CellI))<=get
      =β (pack [{x:Int}, ({x={x=0}.x+1}, methfun)] as OE(CellI))<=get
      =β (pack [{x:Int}, ({x=1}, methfun)] as OE(CellI))<=get
      =  open (pack [{x:Int}, ({x=1}, methfun)] as OE(CellI)) as [X,(s,m)] in m(s).get
      =β (methfun({x=1})).get
      =β {get={x=1}.x, set=fun(n:Int){x=n}, bump={x={x=1}.x+1}}.get
      =β {x=1}.x
      =β 1

where

    methfun ≝ fun(s:{x:Int}) {get=s.x, set=fun(n:Int){x=n}, bump={x=s.x+1}}
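As an illustration, the OE cell and its two styles of message send can be sketched in Python. This is an untyped approximation: a package is just a (state, methods) tuple, so the hiding of the state type is by convention only, and the function names `send_get` and `send_bump` are hypothetical.

```python
# OE sketch: an object is a pair (state, methfun). Methods map a state to
# plain results or to new bare states, never to whole objects.

def methfun(s):
    return {
        "get": s["x"],
        "set": lambda n: {"x": n},   # returns a bare state
        "bump": {"x": s["x"] + 1},   # returns a bare state
    }

mycell = ({"x": 0}, methfun)

def send_get(o):
    s, m = o                  # open o as [X,(s,m)]
    return m(s)["get"]

def send_bump(o):
    s, m = o                  # open o as [X,(s,m)]
    return (m(s)["bump"], m)  # re-pack the new state with the same methods

print(send_get(send_bump(mycell)))  # 1
```

Note that no thunk is needed here even under eager evaluation, since the method bodies only build states and never call back into object construction.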

Because the message-sending code has to repack the object after the send in the case of bump, but not in the case of get, message-sending boilerplate must be generated from types, rather than being defined independently of types (as in the other encodings). On the other hand, the call to mkobj in the set method of the OR encoding of cells (which performs essentially the same "repackaging") is omitted in the OE encoding, so the method bodies themselves are more uniform than in OR (and the other two encodings to follow).
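One way to picture "boilerplate generated from types" is a single send function driven by a description of the interface. The sketch below is an assumption-laden illustration in untyped Python: the descriptor `CELL_I` (marking which methods have the self type as result) and the function `send` are hypothetical names, not part of any of the cited encodings.

```python
# Type-directed message sending for the OE encoding (illustrative sketch).

def methfun(s):
    return {
        "get": s["x"],
        "set": lambda n: {"x": n},
        "bump": {"x": s["x"] + 1},
    }

mycell = ({"x": 0}, methfun)

# Hypothetical stand-in for the interface CellI: True where the result
# of the method has the self type X and must therefore be re-packed.
CELL_I = {"get": False, "set": True, "bump": True}

def send(o, label, interface, *args):
    s, m = o                       # open o as [X,(s,m)]
    result = m(s)[label]
    if args:                       # methods such as set take arguments
        result = result(*args)
    if interface[label]:           # result is a bare state: re-pack it
        return (result, m)
    return result

c = send(send(mycell, "set", CELL_I, 41), "bump", CELL_I)
print(send(c, "get", CELL_I))  # 42
```

The branch on `interface[label]` is the part that depends on the method's type; in the other three encodings the corresponding work is done inside the method bodies instead.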

This encoding technique is closely related to semantic models of Abstract Data Types. See [MP88] for details. This encoding has also been adopted in [MMH96] in order to represent closures as objects in compilers.

In the simple encoding, the "bump" method has no access to the "set" and "get" methods; it is only passed the state as a parameter. But, as for OR, we can also build mycell in such a way that bump is defined in terms of get and set. This time, though, we have to do it a little differently. It doesn't help to send get and set to the whole object, since the result of set is then a whole object, while the bump method is supposed to return just an element of the state type. Instead, we build just the set of methods recursively: mycell
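The idea of tying the recursive knot at the level of the method suite, rather than the whole object, can be sketched as follows. This is an untyped Python illustration under stated assumptions, not the paper's own definition: the helper `self_meths` is hypothetical, and bump is delayed with a lambda because Python evaluates eagerly.

```python
# OE sketch with bump defined via get and set. The recursion is over the
# method suite only, so set and bump still return bare states.

def methfun(s):
    def self_meths():
        return methfun(s)          # the method suite applied to this state
    return {
        "get": s["x"],
        "set": lambda n: {"x": n},
        # Delayed so that building the record does not recurse forever.
        "bump": lambda: self_meths()["set"](self_meths()["get"] + 1),
    }

mycell = ({"x": 0}, methfun)

def send_get(o):
    s, m = o
    return m(s)["get"]

def send_bump(o):
    s, m = o
    return (m(s)["bump"](), m)     # force the thunk, then re-pack

print(send_get(send_bump(mycell)))  # 1
```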

Note that this encoding can be refined by using a bounded existential to expose some of the instance variables. (This idea will come back later!)

    OBE(I,R) ≝ Some(Y<:R) Y * (Y->I(Y))

In this encoding we are revealing that the state is a subtype of some "public instance variables interface" R, but are not specifying exactly what the type of the state is.

3.3 ORE: Recursion and Existentials

The intuition behind the ORE encoding is similar to OE except that any methods that return new objects do the repacking of internal state themselves, rather than requiring that the sender do it. This eliminates the need for different encodings of o<=m depending on the type of m.

    ORE(I) ≝ Rec(X) Some(Y) Y * (Y->I(X))

As with OE, Y represents the state of the object, while the methods are functions that depend on the current state. Notice that the types of methods now are expressed in terms of X, the type of the entire object, rather than just the type of the Y component. This will make it easier for us to encode message sends in a more uniform way. Thus a method returning a value of type X is returning an object, not just its state component. As we shall see in Section 4.6, this also provides support for "binary methods."

For convenience, define:

    close ≝ fun(internalObj: {x:Int} * ({x:Int}->CellI(ORE(CellI))))
              pack [{x:Int}, internalObj] as ORE(CellI)

The function close takes a pair representing the state and method definitions (in general, of type Y * (Y->I(X))) and creates an object of type ORE(I) by hiding the type of the state.

Now define mycell as:

    mycell ≝ letrec methfun(s:{x:Int}) : CellI(ORE(CellI)) =
               {get  = s.x,
                set  = fun(n:Int) close({x=n},methfun),
                bump = close({x=s.x+1},methfun)}
             in close({x=0},methfun)  :  ORE(CellI)

Message sending is now uniform, o<=l ≝ open o as [X,(s,m)] in (m(s)).l, but each method that returns an object must explicitly call close to repackage the internal state before it returns. (The call to close here corresponds to the call to mkobj in the OR encoding.)

The expression (mycell<=bump)<=get evaluates to 1 as before.

    (mycell<=bump)<=get
      =  ((close({x=0},methfun))<=bump)<=get
      =β ((pack [{x:Int}, ({x=0},methfun)] as ORE(CellI))<=bump)<=get
      =β ((methfun {x=0}).bump)<=get
      =β (close({x={x=0}.x+1},methfun))<=get
      =β (close({x=1},methfun))<=get
      =β (pack [{x:Int}, ({x=1},methfun)] as ORE(CellI))<=get
      =β (methfun {x=1}).get
      =β {x=1}.x
      =β 1

We can implement bump in terms of set as follows:

In this definition, s, with type {x:Int}, represents the state while self, with type ORE(CellI), represents an object with that state. As before, the methods set and bump both return values of type ORE(CellI).
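A rough operational picture of the simple ORE cell, in untyped Python, is given below. It is a sketch under assumptions: `close` is modelled as the identity on a (state, methods) tuple, since Python cannot hide the state type, and `send` is a hypothetical name for the uniform message send.

```python
# ORE sketch: methods themselves re-pack new states into whole objects,
# so a single send function works for every message.

def close(internal_obj):
    # pack [{x:Int}, internalObj] as ORE(CellI); hiding is by convention.
    return internal_obj

def methfun(s):
    return {
        "get": s["x"],
        "set": lambda n: close(({"x": n}, methfun)),
        "bump": close(({"x": s["x"] + 1}, methfun)),  # a whole object
    }

mycell = close(({"x": 0}, methfun))

def send(o, label):
    s, m = o            # open o as [X,(s,m)]
    return m(s)[label]

print(send(send(mycell, "bump"), "get"))     # 1
print(send(send(mycell, "set")(9), "get"))   # 9
```

Here bump needs no delaying even under eager evaluation, because close only pairs a new state with the existing method function and does not apply it.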

It is useful to compare this definition with the corresponding one for OR. The main difference is the splitting of the function mkobj of the earlier definition into two separate functions methfun and close. In essence, close allows the creation of new objects by simply packing a new state with an existing method suite rather than requiring the creation of a new recursively-defined record. Thus ORE makes an explicit distinction between the state component of the object (the part that changes in response to message-sends) and the methods themselves, which are constant. (Of course, OE makes the same distinction. In ORBE, on the other hand, it becomes somewhat blurred, especially in the variant with method update discussed in Section 4.8.)

3.4 ORBE: Recursion and Bounded Existentials

We can understand the ORBE encoding by starting with the OE encoding and working our way up to the more complex one.

The OE encoding makes no public commitment about the type of the state: we can choose the state to be a record of instance variables, as we have done so far, or an element of any other type, so long as we can write methods that operate on this state in the appropriate way. In particular, we can choose the state type to be the type of the object itself! This may seem a slightly strange thing to do, but note that it allows us to use the o<=l syntax in the definition of bump:

It would be nice if we could use the more uniform encoding of message sending in OR and ORE. We can do this if we add a recursive definition of X while revealing only some of the information about the actual type of the object. Define:

    ORBE(I) ≝ Rec(X) Some(Y<:X) Y * (Y->I(Y))

In the implementation of mycell, Y will be the actual type ORBE(I) of the entire object, but we do not reveal this publicly. We can now define an object as follows:

    letrec mkobj(s:{x:Int}) : ORBE(CellI) =
      let self = mkobj s in
      pack [ORBE(CellI),
            (self,
             fun(self':ORBE(CellI))
               {get  = s.x,
                set  = fun(n:Int) mkobj {x=n},
                bump = self'<=set(self'<=get + 1)})]
      as ORBE(CellI)
    in mkobj({x=0})  :  ORBE(CellI)
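The shape of this definition, where the "state" handed to the methods is the object itself, can be sketched in untyped Python. Several choices below are assumptions made to cope with eager evaluation: the self component is stored as a zero-argument function, every method field is a thunk, and the hypothetical `send` forces the selected field.

```python
# ORBE sketch: an object is a pair (delayed self, methods), and the
# methods receive that delayed self rather than the raw instance variables.

def send(o, label):
    s, m = o                  # open o as [X,(s,m)]
    return m(s)[label]()      # fields are thunks; force the selected one

def mkobj(s):
    def self_():              # let self = mkobj s, delayed
        return mkobj(s)

    def meths(self1):         # self1() yields the whole object
        return {
            "get": lambda: s["x"],
            "set": lambda: (lambda n: mkobj({"x": n})),
            "bump": lambda: send(self1(), "set")(send(self1(), "get") + 1),
        }

    return (self_, meths)

mycell = mkobj({"x": 0})

print(send(send(mycell, "bump"), "get"))    # 1
print(send(send(mycell, "set")(7), "get"))  # 7
```

As in the typed version, bump is written purely in terms of messages sent to self, and send has the same form for every method.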

With this more refined encoding we can now define message sends uniformly (with the same definition as in the ORE encoding):

    o<=l ≝ open o as [X,(s,m)] in (m(s)).l

As in ORE and OR, this external uniformity comes at the price of having to call mkobj at the end of each method that returns an updated object.

For example:

    mycell<=set
      =  open mycell as [X,(s,m)] in (m(s)).set
      =β fun(n:Int) mkobj {x=n}

Note that the assumption Y<:X is critically used in the typing of o<=bump: the body (m(s)).l has minimal type Y, but in order to satisfy the side condition on the open rule for existential types, this has to be promoted to a Y-free supertype, i.e., ORBE(CellI). This subsumption works as long as Y appears in only positive positions.

4

Having presented these four models as encodings in a common notational framework, we are now in a position to begin comparing them along a number of dimensions.

4.1 Treatment of the self parameter

The four encodings represent four strategies for encoding objects. In OR, methods do not take an explicit self argument on invocation. Instead, self is implicitly bound by a recursive declaration when the object is constructed. In the other three encodings, an argument representing self is explicitly passed to the methods. In OE and ORE, the argument is just the "internal state" of the object, while in ORBE the argument is the whole object. In OE, methods that return a modified version of self (such as bump) return just the state part, while in ORE and ORBE, such methods return a whole object. Summarizing, we can say that self-returning methods in OE map internal states to new internal states, while OR and ORE map internal states to whole objects. ORBE is like OE in that methods map states to states, but also like OR and ORE in that what gets returned is a whole object.

The four encodings "represent the same kind of objects," in the sense that an object in one of the encodings can be wrapped up into an object in any other encoding that reacts to messages in exactly the same way as the original. In two cases, the "wrapping procedure" is actually trivial:

    ORBE(I) <: OE(I)
    ORBE(I) <: ORE(I)

This shows that ORBE is the most revealing of the three encodings involving existential types, in the sense that OE and ORE can be viewed as variants of ORBE that make fewer public commitments about their implementation.

4.3 Full abstraction

A more subtle (and arguably less important) difference between the OR encoding and the encodings based on existentials is that, in the latter three, an "observing context" can perform operations on an object that do not correspond to sending messages and, in some cases, obtain some information about the internal implementation of the methods. With the OR encoding, the only test that an observing context can make of an object is to look at the results that are returned by its methods. In the existential encodings, the observer can also apply the methods to a divergent argument, giving it the power to discriminate between objects that cannot be told apart just by sending messages in the ordinary way. This represents a kind of failure of full abstraction for the existential encodings.

To see this, consider two very simple OE-objects

    a ≝ pack [Int, (0, {x=fun(s:Int) 5})] as OE(J)
    b ≝ pack [Int, (5, {x=fun(s:Int) s})] as OE(J)

where

J(X)

The x messages of both objects yield the result 5. But internally, the code for x in a is a constant function, while the code for x in b is an identity function. This fact can be detected by the observer

    obs ≝ fun(o:OE(J)) open o as [X,(s,m)] in m.x(bottom(X))

where bottom(X) is a divergent computation of type X, such as:

    bottom(X) ≝ letrec f(n:Int) : X = f(n)

in f (

Thus, OR has a claim to being the tightest encoding of the four, in the sense that the type OR(I) does not allow an observer to test the behavior of an object's methods directly by applying them to arguments other than the intended self parameter.
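The observer argument can be simulated in Python, with two loudly stated assumptions: call-by-name is mimicked by passing states as zero-argument functions, and divergence is modelled by raising a hypothetical `Diverges` exception instead of looping forever.

```python
# Sketch of the observer that separates two objects with identical
# message behavior. Divergence is simulated by an exception.

class Diverges(Exception):
    """Stands in for a non-terminating computation."""

def bottom():
    raise Diverges()

# Objects are (delayed state, method record) pairs.
a = (lambda: 0, {"x": lambda s: 5})     # constant method body
b = (lambda: 5, {"x": lambda s: s()})   # identity method body

def send_x(o):
    s, m = o
    return m["x"](s)                    # ordinary message send

def obs(o):
    s, m = o
    try:
        return m["x"](bottom)           # apply the method to a divergent state
    except Diverges:
        return "diverges"

print(send_x(a), send_x(b))  # 5 5
print(obs(a), obs(b))        # 5 diverges
```

Both objects answer the x message with 5, yet obs tells them apart, because only the identity method body ever forces its argument.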

Note that the failure of full abstraction described here applies only in the case of a call-by-name evaluation strategy, since, with call-by-value, applying the methods to bottom always diverges. Since all common object-oriented languages use call-by-value, the difference is probably not significant in practice.

4.4 Uniform methods vs. uniform message sending

Another difference between the encodings is whether they choose to impose the burden of repackaging states into whole objects on the code that sends messages to objects (OE) or on the bodies of methods inside objects (OR, ORE, and ORBE).

In ORE and ORBE, every message is sent by opening the packed object, applying the second (method) component to the first (state) component, and then extracting the appropriate field of the result. It is even easier in OR, since no existential unpacking is needed.

In OE, the encoding of message sending depends on the type of the method. If there is no occurrence of the "self type variable" (the bound variable of the type operator representing the interface signature) in the result type, then message sends are encoded as for ORE and ORBE. However, if the return type is the self type variable, then the result of the method must be repackaged as a new existential value (of type OE(I)).

Similarly, in OR, ORE, and ORBE, methods that yield updated objects must return in a different way than methods returning simple values such as numbers

In either case, this repackaging introduces some non-uniformity in the encoding, since the results of methods that return objects must be treated differently from those that do not. For all of the encodings, it appears that the required packaging code can be generated automatically, based on the type of the method [HP95]. For the extension of the ORBE encoding discussed in Section 4.8, a more uniform treatment is possible, in which the repackaging code is identical in all methods [ACV96].

4.5 Strength of underlying type theory

OE works in the "most elementary" type theory: $F^\omega_{<:}$ with the Kernel Fun subtyping rule. If classes and inheritance are omitted, the underlying calculus is even strongly normalizing.

All the other models require recursive types, which entail recursion and loss of strong normalization. All the models (including 0E) use recursive values when adapted to allow method invocation through self--or, more generally, when extended with classes. In the presence of recursive values, the semantics of the type system becomes more challenging; recursive types also complicate the metatheory.

OR, OE, and ORE work fine with the Kernel Fun subtyping rule for quantifiers. ORBE requires the full F<: rule, leading to a substantial increase in the theoretical complexity of the calculus [Ghe95, Ghe93] and the loss of some pragmatically desirable properties such as decidability [Pie94]. See [PS97] for more discussion of variants of this rule.

The stronger rule is needed in ORBE to validate the usual subtyping rule for object types. Recall that, in F^ω_<:, bounded existentials are encoded in terms of bounded universals. When comparing two ORBE object types, the Amber rule must be used first on the recursive types, followed by a comparison of existential types whose bounds are different type variables. Therefore, a general rule for subtyping existential types with different bounds is needed. This rule is derivable from the full F^ω_<: rule for universals, but not from weaker rules.
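To see where the stronger rule enters, it may help to spell out the standard encoding of bounded existentials and the subtyping obligation it generates. The following is a sketch in LaTeX notation; the metavariables S_1, S_2, T_1, T_2 and the result variable Z are ours, introduced only for illustration.

```latex
% Standard encoding of a bounded existential by bounded universals
\mathsf{Some}(X{<:}S)\,T \;\triangleq\;
  \mathsf{All}(Z)\,\Bigl(\bigl(\mathsf{All}(X{<:}S)\,(T \to Z)\bigr) \to Z\Bigr)

% Goal: Some(X<:S_1) T_1 <: Some(X<:S_2) T_2,
% given S_1 <: S_2 and X<:S_1 |- T_1 <: T_2.
% Because the arrow is contravariant on the left, the goal reduces to
\mathsf{All}(X{<:}S_2)\,(T_2 \to Z) \;<:\; \mathsf{All}(X{<:}S_1)\,(T_1 \to Z)

% Full F<: rule for universals (the bounds may differ, contravariantly):
\frac{\Gamma \vdash S_1 <: S_2
      \qquad
      \Gamma,\, X{<:}S_1 \vdash (T_2 \to Z) <: (T_1 \to Z)}
     {\Gamma \vdash \mathsf{All}(X{<:}S_2)\,(T_2 \to Z) \;<:\;
                    \mathsf{All}(X{<:}S_1)\,(T_1 \to Z)}
```

The Kernel Fun rule requires the two bounds to be identical, so it cannot discharge this obligation when S_1 and S_2 differ.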

Even if existentials are taken as primitive, with a strong subtyping rule, the resulting system has undecidable typing. Karl Crary has observed [personal communication] that it may be possible to ameliorate this deficiency in ORBE by introducing a single type constructor combining the behaviors of Rec and Some.

4.6 Binary Methods

Another difference between the encodings concerns the treatment of binary methods: methods taking an argument of the same type as the receiver object. Consider the following object interfaces:

  CellI(X)       =def  {get:Int, set:Int->X, bump:X}
  EqCellI(X)     =def  {get:Int, set:Int->X, bump:X, eq:X->Bool}
  EqClrCellI(X)  =def  {get:Int, set:Int->X, bump:X, eq:X->Bool, color:Color}

CellI is our running example of cells; EqCellI adds a method eq that takes a cell and compares its contents with the contents of the cell to which the eq message is sent; EqClrCellI adds one more method (whose behavior is unimportant). The crucial difference between CellI and the other two operators is that CellI is covariant (that is, S<:T implies CellI(S)<:CellI(T)), which is not the case for EqCellI or EqClrCellI, both of which contain occurrences of the bound variable X in contravariant positions. This section and the next explore the consequences of non-covariant operators as object signatures.

Unfortunately, neither the OE nor the ORBE encoding handles non-covariant interfaces satisfactorily. For example, consider the object type OE(EqCellI), simple existential cell objects with equality methods:

  EqCell  =def  OE(EqCellI)
          =     Some(Y) Y * (Y -> {get:Int, set:Int->Y, bump:Y, eq:Y->Bool})

We can create objects with this type exactly as we did in Section 3.2. However, it is not possible to send eq messages to such objects in the way we would expect. Having unpacked the existential, applied the methods to the state, and projected out the eq field of the resulting record, we are left with a function that expects a parameter of the same state type. But the second cell object that we want to pass as argument has its own, possibly different, internal state type, so its internal state is not an appropriate argument. The same observation applies to ORBE(EqCellI) (even though the state type is partially known).

This defect can be repaired to some extent by manually introducing a recursion in the interface signatures, binding the contravariant occurrences of the "self variable," and adding explicit object constructors:

  REqCellI
  REqClrCellI

This step allows binary messages to be sent to objects, but involves a nontrivial extension to the ambient type theory, since it relies on a recursively defined type operator. Moreover, it destroys the important property of pointwise subtyping between interfaces: REqClrCellI is not a subtype of REqCellI (whereas EqClrCellI is a subtype of EqCellI).

By contrast, with OR and ORE we can create objects with interfaces like EqCellI and EqClrCellI and send them messages exactly as before: support for binary methods is "built in." We illustrate with the ORE encoding:

Thus the message send myeqcell <= eq(othereqcell) will be well typed as long as othereqcell has type EqCell. No changes are required to the definition of message sending in either OR or ORE in order to support these binary methods.

Thus the recursively-bound type variable in ORE (and OR) enables the definition and use of messages whose types involve both covariant and contravariant occurrences of the object type being defined. Because the ORBE encoding does not use the recursively-bound type variable in method types, it has the same difficulties as OE with binary methods.
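The difference can be made concrete with a small untyped Python sketch. It is an illustration under assumptions, not the encodings themselves: packages are modelled as (state, methods) pairs, and the names send, int_cell, boxed_cell, and int_cell_oe are hypothetical. In the ORE-style cells, eq receives a whole object and observes it only through messages, so two cells with different hidden representations can be compared. In the OE-style cell, eq receives a bare state, which is meaningful only if the argument happens to share the receiver's representation; a type checker for the existential encoding would reject the mismatched call that Python merely evaluates to a wrong answer.

```python
def send(obj, label):
    state, methods = obj               # open the package
    return methods(state)[label]

# ORE style: eq takes a whole object (the recursively-bound type).
def int_cell(n):                       # state is a bare integer
    def methods(s):
        return {"get": s,
                "set": lambda m: (m, methods),
                "eq": lambda other: s == send(other, "get")}
    return (n, methods)

def boxed_cell(n):                     # state is a one-field record
    def methods(s):
        return {"get": s["contents"],
                "set": lambda m: ({"contents": m}, methods),
                "eq": lambda other: s["contents"] == send(other, "get")}
    return ({"contents": n}, methods)

# OE style: eq takes a bare state of the receiver's own representation.
def int_cell_oe(n):
    def methods(s):
        return {"get": s,
                "eq": lambda other_state: s == other_state}
    return (n, methods)

# ORE-style comparison across representations works as intended.
same = send(int_cell(5), "eq")(boxed_cell(5))

# OE-style comparison is handed the other cell's raw state, which has the
# wrong shape, so the result is meaningless even though both cells hold 5.
other_state, _ = boxed_cell(5)
mismatch = send(int_cell_oe(5), "eq")(other_state)
```

Here same is True and mismatch is False, although all three cells contain 5.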

Furthermore, since EqClrCellI is (pointwise) a subtype of EqCellI, we can write functions that manipulate both cells and colored cells with equality, by abstracting over subtypes of EqCellI:

  test5  =def  fun(I<EqCellI) fun(o:ORE(I))
                 if o.eq <= (o.set<=5) then "o contains 5" else "doesn't"

4.7 Subtyping of Object Types

Unfortunately, the previous example would not work if we simply wrote

  test5'  =def  fun(C<OR(EqCellI)) fun(o:C)
                  if o.eq <= (o.set<=5) then "o contains 5" else "doesn't"

using an abstraction over types bounded by the object type OR(EqCellI) instead of the abstraction over type operators I bounded by EqCellI. While this simpler version is well typed as it stands, it is not very useful because OR(EqCellI) does not have any nontrivial subtypes!

In general, the pointwise subtyping relation I<:J between object interfaces does not imply that the corresponding object types OR(I) and OR(J) are in the subtype relation. (Nor, similarly, does it follow that ORE(I)<:ORE(J).) On the other hand, it does always follow that OE(I)<:OE(J) and ORBE(I)<:ORBE(J). The built-in support for binary methods in OR and ORE comes at the price of losing subtyping between object types in some cases. In particular, I<:J implies OR(I)<:OR(J) and ORE(I)<:ORE(J) only when J is covariant.
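The role of covariance can be traced to the Amber rule. The following LaTeX sketch assumes that OR(I) unfolds to the recursive type Rec(X) I(X); the variable names X and Y are ours.

```latex
% Amber rule for recursive types
\frac{\Gamma,\, X <: Y \vdash S <: T}
     {\Gamma \vdash \mathsf{Rec}(X)\,S \;<:\; \mathsf{Rec}(Y)\,T}

% To derive OR(I) <: OR(J) we must show, under the assumption X <: Y,
I(X) \;<:\; J(X) \;<:\; J(Y)
% first step:  pointwise subtyping I <: J
% second step: covariance of J, applied to X <: Y
```

When J is not covariant, as with EqCellI, the second step is unavailable and the derivation does not go through.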

This may or may not be viewed as a serious problem, since we can always write functions in the form of test5 instead of test5'. Indeed, variations on this style of "polymorphic programming by bounded abstraction over interfaces" have been proposed in several languages under the names matching, F-bounded quantification, and where clauses [BHJ+87, CCH+89a, BSvG95, AC95, BPF95, DGLM95].

4.8 Method Update

Method update can be added to encodings of the ORBE flavor by extending the encoding with a collection of method updaters. These updaters take a sufficiently polymorphic new method and return an object with the new method in it [ACV96]. Forms of method update can also be added to encodings of the OR flavor; see [AC96, p. 268] and [San96].

These techniques work for certain presentations of the encodings, but do not adapt trivially to our presentation. However, there is hope of finding a systematic treatment of method update for all of our encodings. We leave this topic for further work.

5 Conclusions

Table 1 summarizes the major points of comparison between the four encodings we have considered. Interestingly, none of the columns completely dominates all of the others. However, we can make some broad comparisons.

There are two basic encoding techniques and two hybrids. The principal advantage of the basic techniques is straightforward intuition: OR represents the most naive view of objects as data values that can be interrogated by named messages; OE gives a lower-level picture, showing explicitly that objects consist of state and methods, with the state inaccessible except via the methods. The hybrid encodings, both of which can be viewed as deriving from OE, are more powerful, each offering a useful refinement: ORE adds support for binary methods, while a variation of ORBE was the first to support method update.

This paper is the beginning of a uniform treatment of most known encodings, but more work needs to be done. In particular, we intend to extend this treatment to method update and classes. It would also be useful to develop a simple object-oriented language supporting the constructs treated here and present its translation using each of these encodings.

Table 1

                           OR             OE                ORE            ORBE
responsibility for         method         message sender    method         method
  repackaging results
internal access to         built-in       can be added      built-in       built-in
  "self methods"
"fully abstract"           yes            no                no             no
ambient type theory        F^ω_<: + Rec   pure F^ω_<:       F^ω_<: + Rec   F^ω_<: + Rec
quantifier subtyping       Kernel Fun     Kernel Fun        Kernel Fun     full F<:
I<:J => O(I)<:O(J)         if J is        yes               if J is        yes
                             covariant                        covariant
binary methods             yes            limited, using    yes            limited, using
                                            extra Rec                        extra Rec
method update              no             no                no             in a variant

Peter O'Hearn and Ramesh Viswanathan joined us in early discussions of the material presented here. O'Hearn was particularly helpful in understanding the circumstances under which "full abstraction" of the encodings will fail (cf. Section 4.3). Viswanathan contributed valuable insight into method update. Comments from Paul Steckler, Martín Abadi, and four anonymous referees helped us improve our presentation from an earlier draft.

Bruce was partially supported by NSF grant CCR-9424123. Pierce was partially supported by EPSRC grant GR/K 38403.

[AC93] Roberto Amadio and Luca Cardelli. Subtyping recursive types. ACM
Transactions on Programming Languages and Systems, 15(