Sophie: parrot-doc-1.6.0-1mdv2010.0 i586

parrot-doc-1.6.0-1mdv2010.0.i586.rpm

# Copyright (C) 2008-2009, Parrot Foundation.
# $Id: pdd31_hll_interop.pod 37220 2009-03-09 01:25:49Z allison $

=head1 [DRAFT] PDD 31: Inter-Language Calling

=head2 Version

$Revision: 37220 $

=head2 Abstract

This PDD describes Parrot's conventions and support for communication between
high-level languages (HLLs).  It is focused mostly on what implementors should
do in order to provide this capability to their users.

=head2 Description

The ability to mix different high-level languages at runtime has always been
an important design goal of Parrot.  Another important goal, that of
supporting all dynamic languages, makes language interoperability especially
interesting -- where "interesting" means the same as it does in the Chinese
curse, "May you live in interesting times."  It is expected that language
implementers, package authors, and package users will have to be aware of
language boundaries when writing their code.  It is hoped that this will not
become too burdensome.

None of what follows is binding on language implementors, who may do whatever
they please.  Nevertheless, we hope they will at least follow the spirit of
this document so that the code they produce can be used by the rest of the
Parrot community, and save the fancy footwork for intra-language calling.
However, this PDD B<is> binding on Parrot implementors, who must provide a
stable platform for language interoperability to the language implementors.

=head3 Ground rules

In order to avoid N**2 complexity and the resulting coordination headaches,
each language compiler provides an interface as a target for other languages
that should be designed to require a minimum of translation.  In the general
case, some translation may be required by both the calling language and the
called language:

{{ There seems to be an implied basic assumption here that language
interoperability is the responsibility of the language implementor. It is not.
We cannot require that language implementors design and implement their
languages according to some global specification. Any interoperability
infrastructure must be provide by Parrot, and must work for all languages.
--allison }}

        |
        |
        |                        Calling sub
        |                             |
        |   Language X                |
        |                             V
        |                        Calling stub
        +================             |
                                      |
          "plain Parrot"              |
                                      |
        +================             |
        |                              V
        |                        Called wrapper
        |                             |
        |                             |
        |   Language Y                V
        |                         Called sub
        |

Where necessary, a language may need to provide a "wrapper" sub to interface
external calls to the language's internal calling and data representation
requirements.  Such wrappers are free to do whatever translation is required.

Similarly, the caller may need to emit a stub that converts an internal call
into something more generic.

{{ Of course, "stub" is really too close to "sub", so we should find a better
word.  Doesn't the C community call these "bounce routines"?  Or something?
-- rgr, 31-Jul-08.

The language will never provide a wrapper for its subs. For the most part,
wrappers will be unnecessary. Where a wrapper is desired to make a library
from some other language act more like a "native" library, the person who
desires the native behavior can implement the wrapper and make it publicly
available.  --allison }}

{{ I am discovering that there are five different viewpoints here,
corresponding to the five layers (including "plain Parrot") of the diagram
above.  I need to make these viewpoints clearer, and describe the
responsibilities of each of these parties to each other.  -- rgr, 31-Jul-08.
}}

Languages are free to implement the stub and wrapper layers (collectively
called "glue") as they see fit.  In particular, they may be inlined in the
caller, or integral to the callee.

Ideally, of course, the "plain Parrot" layer will be close enough to the
semantics of both languages that glue code is unnecessary, and the call can be
made directly.  Language implementors are encouraged to dispense with glue
whenever possible, even if glue is sometimes required for the general case.

In summary:

=over 4

=item *

Each HLL gets its own namespace subtree, within which C<get_hll_global> and
C<set_hll_global> operate.  In order to make external calls, the HLL must
provide a means of identifying the language, the function, and enough
information about the arguments and return values for the calling language to
generate the call correctly.  This is necessarily language-dependent, and is
beyond the scope of this document.

=item *

When calling across languages, both the caller and the callee should try to
use "plain Parrot semantics" to the extent possible.  This is explained in
more detail below, but essentially means to use the simplest calling
conventions and PMC classes possible.  Ideally, if an API uses only PMCs that
are provided by a "bare Parrot" (i.e. one without any HLL runtime code), then
it should be possible to use this API from any other language.

{{ This is unnecessarily restrictive --allison }}

=item *

It is acceptable for languages to define subs for internal calling that are
not suitable for external calling.  Such subs should be marked as such, and
other languages should respect those distinctions.  (Or, if they choose to
call intra-language subs, they should be very sure they understand that
language's calling conventions.

{{ It's not possible to define a sub that can't be called externally
--allison }}

=back

=head2 Half-Baked Ideas

{{ Every draft PDD should have one of these.  ;-}  -- rgr, 28-Jul-08.  }}

=head3 Common syntax for declaring exported functions?

I assume we will need some additional namespace support.  Not clear yet
whether it's better to mark the ones that or OK for external calling, or the
ones that are not.

(As you can guess, I don't have a strong suggestion for what to call these
functions yet.  Do we call them "external"?  Would that get confused with
intra-language public interfaces?)

Beyond that, we probably need additional metainformation on the external subs
so that calling compilers will know what code to emit.  Putting them on the
subs means that the calling compiler just needs to load the PBC in order to
access the module API (though it may need additional hints).  Of course, that
also requires a PIR API for accessing this metainformation . . .

{{ Exporting is very much a Perl idea, not much applicability for exporting
outside of Perl. --allison}}

Crazy idea:  This is more or less the same information (typing) required for
multimethods.  If we encourage the export of multisubs, then the exporting
language could provide multiple interfaces, and the calling compiler could
query the set of methods for the one most suitable.

{{ Proposal rejected, because we aren't going with "external" and "internal"
subroutine variants, so it's not needed. --allison }}

=head3 More namespace complexity?

{{ Proposal rejected, because we aren't going with "external" and "internal"
subroutine variants, so it's not needed. --allison }}

It might be good to have some way for HLLs to define a separate external
definition for a given sub (i.e. one that provides the wrapper) that can be
done without too much namespace hair.  I.e.

        .sub foo :extern

defines the version that is used by interlanguage calling, and

        .sub foo

defines the version that is seen by other code written in that language
(i.e. via C<get_hll_global>).  If there is no plain C<foo>, the C<:extern>
version is used for internal calls.  That way, the compiler can emit both
wrapper code and internal code without having to do anything special (much),
even if different calling conventions and/or data conversions are required.

{{ Of course, this wouldn't be necessary if all external subs were multisubs.
-- rgr, 31-Jul-08. }}

=head3 Multiple type hierarchies?

Different languages will have to "dress up" the Parrot type/class hierarchy
differently.  For example, Common Lisp specifies that C<STRING> is a subtype
of C<VECTOR>, which in turn is a subtype of C<ARRAY>.  This is not likely to
be acceptable to other languages, so Lisp needs its own view of type
relationships, which must affect multimethod dispatch for Lisp generic
functions, i.e. a method defined for C<VECTOR> must be considered when passed
a string as a parameter.

{{ Common Lisp (for example) will have its own set of type relationships,
because it will have its own set of types. There will be no "remapping" of
core types --allison }}

The language that owns the multisub gets to define the type hierarchy and
dispatch rules used when it gets called.  In order to handle objects from
foreign languages, the "owning" language must decide where to graft the
foreign class inheritance graph into its own graph.  {{ It would be nice if
some Parrot class, e.g. C<Object>, could be defined as the conventional place
to root language-specific object class hierarchies; that way, a language would
only have to include C<Object> in order to incorporate objects from all other
conforming languages.  -- rgr, 26-Aug-08. }}

{{ The language that owns the multisub does get to define the dispatch rules
for the multisub. But, it doesn't get to alter the type hierarchy of objects
from other languages. --allison }}

Note that common Parrot classes will in general appear in different places in
different languages' dispatch hierarchies, so it is important to bear in mind
which language "owns" the dispatch.

{{ Absolutely not true. --allison }}

=head2 Definitions

{{ Collect definitions of new jargon words here, once we figure out what they
should be.  -- rgr, 29-Jul-08. }}

=head2 Implementation

=head3 Plain Parrot Semantics

Fortunately, "plain Parrot" is pretty powerful, so the "common denominator" is
not in fact the lowest possible.  For example, not all Parrot languages
support named, optional, or repeated arguments.  For the called language, this
is never a problem; calling module can only use the subset API anyway.
Implementers of subset calling languages are encouraged to provide their users
with an extended API for the interlanguage call; typically, this is only
required for named arguments.

{{ This needs more?  -- rgr, 28-Jul-08. }}

=head3 Strings

    {{ I am probably not competent to write this section.  At the very least,
    it requires discussion of languages that expect strings to be mutable
    versus . . . Java.  -- rgr, 28-Jul-08. }}

=head3 Other scalar data types

All Parrot language implementations should stick to native Parrot PMC types
for scalar data, except in case of dire need.  To see with this is so, take
the particular case of integer division, which differs significantly between
languages.

{{ No, this is completely backwards. Languages are heartily encouraged to
create their own PMCs for any and all common variable types found in the
language. --allison }}

In Tcl, "the integer three divided by the integer five" produces the integer
value 0.

In Perl 5 and Lua, this division produces the floating-point value 0.6.  (This
happens to be Parrot's native behavior as well.)

In Common Lisp, this division produces "3/5", a number of type C<RATIO> with
numerator 3 and denominator 5 that represents the mathematically-exact result.

Furthermore, no Perl 5 code, when given two integers to divide, will expect a
Common Lisp ratio as a result.  Any Perl 5 implementation that does this has a
bug, even if both those integers happen to come from Common Lisp.  Ditto for a
floating-point result from Common Lisp code that happens to get two integers
from Perl or Lua (or both!).

{{ Not a bug, it's the expected result. Divide operations are
multi-dispatched.  If you pass two Common Lisp integers into a divide
operation in Perl 5, it'll search for the best matching multi, and if it finds
one for Common Lisp integers (an exact match), it'll run that and return a
Common Lisp ratio.  --allison }}

Even though these languages all use "/" to represent division, they do not all
mean the same thing by it, and similarly for most (if not all) other built-in
arithmetic operators.  However, they pretty clearly B<do> mean the same thing
by (e.g.) "the integer with value five," so there is no need to represent the
inputs to these operations differently; they can all be represented by the
same C<Integer> PMC class.

{{ The whole point of having sets of PMCs in different languages is to handle
the case where "it's an integer, but has a different division operation than
other languages" --allison}}

{{ Must also discuss morphing:  If some languages do it and other do not, then
care must be taken at the boundaries.  -- rgr, 31-Jul-08. }}

=head4 Defining new scalar data types

There will be cases where existing Parrot PMC classes cannot represent a
primitive HLL scalar type, and so a new PMC class is required.  In this case,
interoperability cannot be guaranteed, since it may not be possible to define
behavior for such objects in other languages.  But the choice of a new PMC is
forced, so we must make the best of it.

{{ Yes, except this is the common case, and interoperability will still work
--allison }}

A good case in point is that of complex rational numbers in Common Lisp.  The
C<Complex> type provided by Parrot assumes that its components are
floating-point numbers.  This is a suitable representation type for C<(COMPLEX
REAL)>, but CL partitions "COMPLEX" into C<(COMPLEX REAL)> and C<(COMPLEX
RATIONAL)>, with the latter being further divided into C<(COMPLEX RATIO)>,
C<(COMPLEX INTEGER)>, etc.  The straightforward way to provide this
functionality is to define a C<ComplexRational> PMC that is built on
C<Complex> and has real and imaginary PMC components that are constrained to
be Integer, Bigint, or Ratio PMCs.

So how do we make C<(COMPLEX RATIONAL)> arithmetic work as broadly as
possible?

The first aspect is defining how the new type actually works within its own
language.  The Lisp arithmetic operators will usually return a ComplexRational
if given one, but need to return a RATIONAL subtype if the imaginary part is
zero, and that may not be suitable for other languages, so Lisp needs its own
set of basic arithmetic operators.  We must therefore define methods on these
multis that specialize ComplexRational (and probably the generic arithmetic to
redispatch on the type of the real and imaginary parts; you know the drill).
But, in case we are also passed another operand that is another language's
exotic type, we should take care to use the most general possible class to
specialize the other operands, in the hope that other exotics are subclasses
of these.

{{ It is perfectly fine for a Lisp arithmetic operator to return a RATIONAL
subtype. Please don't define methods for a pile of operations that already
have vtable functions --allison }}

The other aspect is extending other languages' arithmetic to do something
reasonable with our exotic types.  If we're lucky, Parrot will provide a basic
multisub that takes care of most cases, and we just need to add method(s) to
that.  If not, we will have to add specialized methods on the other language's
multisub, trying to redispatch to the other language's arithmetic ops passing
the (hopefully more generic) component PMCs.  Doing so is still the
responsibility of the language that defines the exotic class, since it is in
charge of its internal representation.

{{ The default multi for a common operation like division will call the PMC's
C<get_number> vtable function, perform a standard division operation, and
return a standard Integer/Number/BigNum. --allison }}

{{ We can define multimethods on another language without loading it, can't
we?  If not, then making this work may require negotiation between language
implementors, if it is feasible at all.  -- rgr, 31-Jul-08. }}

{{ I'm not sure what you mean by defining multimethods on another language.
Perhaps you're asking if it's possible to declare a multisub for a type that
doesn't exist yet?  --allison }}

This brings us to a number of guidelines for defining language-specific
arithmetic so as to maximize interoperability:

=over 4

=item 1.

Define language-specific operations using multimethods (to avoid conflict with
other languages).

{{ Clarify? How would non-multi's conflict?  --allison }}

=item 2.

Define them on the highest (most general) possible PMC classes (in order that
they continue to work if passed a subclass by a call from a different
language).

{{ Define them on the class that makes sense. There's no point in targeting
any particular level of the inheritance hierarchy. --allison }}

=item 3.

Don't define a language-specific PMC class unless there is clear need for a
different internal representation.  (And even then, you might consider
donating it to become part of the Parrot core.)

{{ This is definitely not true. --allison }}

=back

{{ The fundamental rule is to implement your language in the way that makes
the most sense for your language. Language implementors don't have to think
about interoperability. --allison }}

The rest of this section details exceptions and caveats in dealing with scalar
data types.

=head4 "Fuzzy" scalars

Some languages are willing to coerce strings to numbers and vice versa without
any special action on the part of the programmer and others are not.  The
problem arises when such "fuzzy" scalars are passed (or returned) to languages
that do not support "fuzzy" coercion . . .

{{ This section is meant to answer Geoffrey's "What does Lisp do with a Perl 5
Scalar?" question.  I gotta think about this more.  -- rgr, 29-Jul-08.  }}

{{ The scalar decides when to morph, not the language. All the languages that
have morphing scalars implement them in such a way that they know how to
handle, for example, morphing when a string value is assigned to an integer
scalar, and what to do if that value is later used as an integer again.
--allison }}

=head4 C<Complex> numbers

Not all languages support complex numbers, so if an exported function requires
a complex argument, it should either throw a suitable error, or coerce an
acceptable numeric argument.  In the latter case, be sure to advertise this in
the documentation, so that callers without complex numbers can tell their
compiler that acceptable numeric type.

{{ All documentation for a library should state what argument types it accepts
and what results it returns, there's nothing unique about complex numbers.
--allison }}

=head4 C<Ratio> numbers

Not all languages support ratios (rather few, actually), so if an exported
function requires a ratio as an argument, it should either throw a suitable
error, or convert an acceptable numeric value.

However, since ratios are rare (and it is rather eccentric for a program to
insist on a ratio as a parameter), it is strongly advised to accept a floating
point or integer value, and convert it in the wrapper.

{{ All documentation for a library should state what argument types it accepts
and what results it returns, there's nothing unique about ratios. --allison }}


    {{ Parrot does not support these yet, so this is not a current issue.  --
    rgr, 28-Jul-08. }}

=head3 Aggregate data types

{{ I probably haven't done these issues justice; I don't know enough Java or
Tcl to grok this part of the list discussion.  -- rgr, 28-Jul-08. }}

Aggregates (hashes, arrays, and struct-like thingies) can either be passed
directly, or mapped by wrapper or caller code into something different.  The
problem with mapping, besides being slow, is that if I<either> the caller or
the callee does this, the aggregate is effectively read-only.  (It is possible
for the wrapper to stuff the changes back in the original structure by side
effect, but this has its own set of problems.)

{{ Mapping is generally discouraged, but I don't see any reason it would make
the aggregate read-only. You can certainly convert a Python dictionary to a
Perl hash, use it in your Perl code, and then either return it as a Perl hash,
or convert it back to a Python dictionary. --allison }}

In other words, if the mapping is not straightforward, it may not be possible.
If the mapping C<is> straightforward it may not be necessary -- and an
unnecessary mapping may limit use of the called module's API.

Struct-like objects are problematic.  They are normally considered as
low-level and language-specific, and handled by emitting special code for slot
accessor/setter function, which other language compilers won't necessarily
know how to do.  The choices are therefore to (a) treat them like black boxes
in the other language, or (b) provide a separate functional or OO API (or
both) for calling from other languages.

{{ (a) is generally preferred --allison }}

Several questions arise for languages with multiple representations for
aggregate types.  Typically, this is because these types are more restricted
in some fashion.  [finish.  -- rgr, 29-Jul-08.]

{{ Not clear where you're going with this --allison }}

=head3 Functional data types

In a sense, functional types (i.e. callable objects) are the easiest things to
pass across languages, since they require no mapping at all.  On the other
hand, if a language doesn't support functional arguments, then there is no
hope of using an API written in another language that requires them.

{{ Hmmm? They're just subs, how would they not be callable from another
language? --allison }}

=head3 Datum vs. object

Some languages present everything to the programmer as an object; in such
languages, code only exists in methods.  A few languages have no methods, only
functions (and/or subroutines) and "passive" data.  The remainder have both,
and pose no problem calling into the others.

But how does an obligate OO language call a non-OO language, or vice versa?
An extreme case would be Ruby (which has only objects) and Scheme (which (as
far as Ruby is concerned) has none).  What good is a Ruby object as a datum to
a Scheme program if Scheme can't access any of the methods?  Similarly, what
could Ruby do with a Scheme list when it can't even get to the Scheme C<car>
function?

{{ Except that Ruby would never even get a Scheme list in the first place if
it hadn't loaded a Scheme library of some sort. And, being a list, the Scheme
list would still support the standard vtable functions for lists. --allison }}

{{ Methinks the right thing would be to define a common introspection API (a
good thing in its own right).  Scheme and Ruby should each define their own
implementation of the same in "plain Parrot semantics" terms, independently.
The caller can then use his/her language's binding of the introspection API to
poke around in the other module, and find the necessary tools to call the
other.  For Scheme, this would mean functions for finding Ruby classes and
providing functional wrappers around methods.  For Ruby, I admit this would
probably be even weirder.  In any case, it is important that the calling user
not need anything out of the ordinary, from either language or the called
module author.  -- rgr, 29-Jul-08. }}

{{ There is a common introspection API, the 'inspect' vtable function. But
what you're describing here isn't introspection, it's actually the standard
vtable functions.  --allison }}

=head4 Defining methods across language boundaries

{{ Is the term "unimethod" acceptable here?  -- rgr, 29-Jul-08.  They're just
methods or subroutines, and it's just "single dispatch". --allison}}

There will be cases where a module user wants to extend that module by
defining a new method on an externally-defined class, or add a multimethod to
an externally-defined multisub.  Since a class with unimethod dispatch belongs
wholly to the external language, the calling language (i.e. the one adding the
method) must use the semantics of the external language.  If the external
language uses a significantly different metamodel, simply adding the
C<:method> pragma may not cut it.

{{ No, the C<:method> flag is always all you need to define a method. The
class object you add the method to determines what it does with that method.
--allison }}

There are two cases:  (1) The calling language is adding a new method, which
cannot therefore interfere with existing usage in the called language; and (2)
the calling language is attempting to extend an existing interface provided by
the called language.  In the first case, the calling compiler has the option
of treating the new method as part of the calling language, and dispensing
with the glue altogether.  In the second case, the compiler must treat the new
method as part of the foreign language, and provide B<both> glue layers (as
necessary) around it.  It is therefore not expected that all compilers will
provide a way to define methods on all foreign classes for all language pairs.

{{ These should generally be handled by subclassing the parent language class,
and adding your method to the subclass. Monkeypatching is certainly possible,
but not encouraged. And, there really isn't any distinction between "treating
the new method as part of the calling language" and "treat[ing] the new method
as part of the foreign language". It's a method, you call it on an object, the
class of the object determines how it's found and invoked. --allison }}

Multimethods are easier; although the multisub does belong conceptually to one
language (from whose namespace the caller must find the multisub), multis are
more loosely coupled to their original language.

{{ Well, the semantics of the language that defined the multisub also
determine how it is found and invoked. --allison }}

The cases for multimethods are similar, though:  (1) If the calling language
method is specialized to classes that appear only in the calling module, then
other uses of the multisub will never call the new method, and the calling
language can choose to treat as internal.  (2) If the calling method is
specialized only on Parrot or called-language classes, then the compiler
should take care to make it generally usable.

{{ Not sure what you mean here. --allison }}

=head4 Subclassing across language boundaries

{{ This is an important feature, but requires compatible metamodels.  -- rgr,
29-Jul-08.

Or Proxy PMCs, which is how we're currently handling inheritance across
metamodel boundaries. --allison
}}

=head4 Method vs. multimethod

{{ This is the issue where some languages (e.g. Common Lisp) use only
multimethods, where others (e.g. Ruby) use only unimethods.  (S04 says
something about MMD "falling back" to unimethods, but so far this is not
described in Parrot.)  Calling is easy; multimethods look like functions, so
the MM language just has to create a function (or MM) wrapper for the UM
language, and a UM language can similarly treat a MM call as a normal function
call.  (Which will require the normal "make the function look like a method"
hack for obligate OO languages like Ruby.)  Defining methods across the
boundary is harder, and may not be worth the trouble.  -- rgr, 29-Jul-08. }}

{{ That's "multiple dispatch" and "single dispatch". In general, defining code
in one language and injecting it into the namespace of another language isn't
the primary focus of language interoperability. Using libraries from other
languages is. --allison }}

=head2 References

None.

=cut

__END__
Local Variables:
  fill-column:78
End:
vim: expandtab shiftwidth=4: