Adding Behavior to VRML

The Graphics and Visualization Center
an NSF Science and Technology Center

Tom Meyer and D. Brookshire Conner

Abstract

We propose a general method for specifying new types of VRML nodes and for describing their associated behavior. As part of this proposal, we suggest some additional syntax and standard nodes for VRML 2.0.

Revisions

15 August, translated to HTML.

17 August, modified the section on other behavior systems to reflect recent VRML proposals, and added sections on time-critical behaviors, why a scripting language is a good idea, and API considerations.

30 August, modified the syntax and restructured the document to clarify the ideas.

Introduction

VRML, based on the Open Inventor ASCII file format, has rapidly become a standard 3D modeling language for the Internet. However, it has two major limitations:

The user may manipulate the camera and follow links, but no other types of interaction are supported.
There is only a very simple extension mechanism, and it does not allow one to specify the behavior of a new node type.

We would like to provide for objects that can change over time and in response to external stimuli. In David Zeltzer's taxonomy of graphical simulation systems (Autonomy, Interaction, and Presence), this corresponds to adding autonomy (animation, physics) and interaction (user manipulation) to the system. For now, presence is beyond the scope of VRML, since it concerns such things as the type and quality of the hardware used to interact with the virtual world.

Any proposed extension to VRML faces a unique set of challenges. It must be flexible enough to provide for the needs of an extremely diverse group of users, on platforms ranging from PCs to supercomputers, and must be able to work over an untrusted wide-area network. Additionally, it should fit the spirit of both VRML and the WWW, be defined through a process of group decision-making, and should use public-domain tools where possible.

Existing Systems

A robust virtual-environment description language should be able to support time-varying models, user interaction, and multi-user sessions. In addition, it should be flexible, allowing for extensions in all aspects of functionality. There are a number of research and commercial systems which support some or all of these:

Inventor, from SGI, is a 3D graphics system written in C++ and is one of the most commonly used interactive 3D systems. It provides for nodes in a scene graph, and for adding node types using dynamically loaded C++ code. Internally, Inventor uses multimethods (also used in Cecil and CLOS) to provide a scene graph that can perform several different kinds of behaviors in an extensible way. For example, Inventor lets the programmer define both new nodes and new actions that nodes can perform (e.g., alternative rendering techniques).

TBAG, developed at Sun Microsystems, is a functional, stateless 3D system. Function objects in TBAG can be related to each other by multi-way constraints and then evaluated with different parameters to create time-varying or user-controlled geometries. TBAG also makes use of multimethods internally, even more pervasively than Inventor.

The UGA system, which was developed at Brown University, uses an interpreted language, Flesh. This is a prototype-delegation object-oriented language which was specifically developed to rapidly prototype complex interactive 3D scenes. It supports a variety of animation techniques, including keyframing, inverse kinematics, physically-based modeling, and the evaluation of arbitrary functions.

Alice, from the University of Virginia, is specifically designed to support rapid prototyping of 3D immersive environments, and uses Python as an interpreted extension language.

ANIM3D was developed at Digital to visualize algorithms, and uses a custom prototype-delegation language, Obliq, which provides for concurrent behaviors.

Worlds, a commercial product developed by Worlds, Inc., allows multiple participants to take part in a dynamic environment with animations, spatialized audio, and texture-mapped video. VRML+, which has been proposed by Worlds, includes a behavior extension protocol intended to be language-independent, and also defines an API for modifying the scene graph. Worlds also proposes a networking protocol for connecting VRML servers and clients.

The BE system, developed by the BE Software Company, provides an excellent system of Inventor-based node types for describing behaviors. It defines a Pascal-based syntax for embedding new behaviors inside nodes, and demonstrates some nodes for physically based modeling.

Language Extension Mechanisms

There are many extension mechanisms available in contemporary programming languages, but they generally involve some combination of macro, function, or class definition. C++, for example, combines all three types of extension mechanisms and adds some very complex subclassing rules.

The graphics systems described above, however, tend to use different mechanisms that are more powerful in many ways than the mechanisms provided by C++. Multimethods in a graphics system allow for the extension of both the set of primitives and the set of renderers. Most graphics systems use some form of dynamic object model, such as the prototype/delegation model used in the Self language. This model is especially prevalent in MOOs.

Finally, most of the systems above make some use of concurrency. While not usually thought of as an extension mechanism, concurrency permits flexible extension of the system while it is running: adding new behavior is simply a matter of adding a new thread. (Timer callbacks and similar mechanisms are essentially highly constrained forms of simulated concurrency.)

Why define a scripting language?

Some have implied that all one needs to express behaviors in VRML is a good API (Application Programming Interface). Strictly speaking, an API is sufficient. However, if VRML lacks simple techniques for composing node behaviors and for defining nodes based on other nodes (inheritance), then an external programming language will always be needed, even for fairly simple behaviors.

We think that, given a suitable library of existing behaviors, most new behaviors should be simple to implement as compositions of existing ones. Only where entirely new functionality is being added (for example, a node that opens a socket and listens to it) should it be necessary to use an external programming environment.

Additionally, a simple scripting language specifically designed for the requirements of VRML (flexible inheritance, concurrency, VRML-embedded syntax) should be easier for non-programmers to learn.

Prototyping new nodes

One of the more annoying parts of VRML right now is the impossibility of specifying a node separately from using it. We suggest a new type of separator, Prototypes, which doesn't render any of its children, and which also provides a unified location for definitions.

This scheme has several benefits. It allows information about the contents of a scene (e.g., what kinds of nodes it contains) to be factored out into a separate world consisting of prototypes. Prototypes also provide the ability to DEF something without actually using it, a capability the current semantics of DEF doesn't support.

Consider an example VRML file that provides some sample geometries. Another world can include it via a WWWInline and use those models in a plug-and-play fashion without duplicating them:

Prototype {
   DEF Refrigerator {
       fields [tempSetting SFFloat]
       tempSetting 40
       # Children for geometry here
   }
}

Existing extension mechanisms

VRML 1.0 provides a limited mechanism for adding new node types, using the fields and isA fields. Following the VRML 1.0 specification, the Cube node could be described as follows:

Cube {
  fields [ SFFloat width, SFFloat height, 
           SFFloat depth ]
}

However, this extension mechanism is not powerful enough to describe the behavior of these new node types.

We propose an extension mechanism that permits new node types, browser extensions, and interaction between the two. Our aim is a mechanism that is simple to understand and to implement. We wish to build as much as possible on VRML 1.0 for backwards compatibility, while adding just enough new syntax to get the job done.

Modifications to isA

The model we propose supports the prototype-delegation model, which has been found useful in a variety of interactive graphics systems.

As in VRML 1.0, new nodes have both fields and isA fields. We suggest the minor extension that the isA field be an MFString, rather than an SFString, allowing multiple inheritance while maintaining backwards compatibility with old syntax.

In VRML 1.0, the isA field is defined only to provide an alternate implementation of a new node if the VRML browser is not able to resolve the node. We would like to modify the semantics of isA to make it more like inheritance in a traditional programming language.

The first change is that any node which references another node using isA inherits all its field definitions. Also, since we may want to inherit from a specific instance of that node, we can use the following syntax:

  DEF RedCube Cube {
    color 1 0 0
  }
  AnotherRedCube {
    isA USE RedCube
  }

In this case, AnotherRedCube not only inherits the fields of Cube, but also the color value of RedCube. By default, this value is forwarded to/from the parent. This means that if we change the color of AnotherRedCube, that actually changes the color of RedCube.

However, a new node may instead use the new reserved word COPY before the name of a prototype, thereby creating a copy of all of the prototype's fields. Any field values copied using COPY can be changed in the copy independently of the original prototype's value. Thus, a copy is "dead": it has no further ties to the object it was copied from. A copy may therefore be located on a different machine from its prototype, with contact occurring only during the initial copying.

For example, we could specify that AnotherRedCube is a real copy of RedCube:

  AnotherRedCube {
    isA COPY USE RedCube
  }

Alternatively, a new node may use COPYALL, indicating that this should be a "deep copy", and all the referenced nodes should be copied as well (e.g., we want to ship an object and its references over the network to a remote machine).
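The three ways of referring to a prototype (plain isA USE, COPY, and COPYALL) differ only in what is shared. The following minimal Python sketch models them as delegation, a shallow copy, and a deep copy respectively. The Node class and the helpers isa_use, isa_copy, and isa_copyall are hypothetical; they illustrate the intended semantics under the assumption that a write to a shared field is forwarded to whichever node stores it, and they are not part of any browser API.

```python
import copy


class Node:
    """Hypothetical model of a node under the proposed isA semantics."""

    def __init__(self, type_name, fields=None, isA=None):
        self.type_name = type_name
        self.fields = dict(fields or {})  # values stored on this node itself
        self.parents = list(isA or [])    # prototypes shared by delegation, in isA order

    def _owner(self, name):
        # The node that actually stores the field: self first, then parents in isA order.
        if name in self.fields:
            return self
        for parent in self.parents:
            owner = parent._owner(name)
            if owner is not None:
                return owner
        return None

    def get(self, name):
        owner = self._owner(name)
        if owner is None:
            raise KeyError(name)
        return owner.fields[name]

    def set(self, name, value):
        # Shared semantics: a write goes to whichever node stores the field.
        (self._owner(name) or self).fields[name] = value

    def visible_fields(self):
        # All reachable fields; earlier parents win over later ones, own fields win over all.
        merged = {}
        for parent in reversed(self.parents):
            merged.update(parent.visible_fields())
        merged.update(self.fields)
        return merged


def isa_use(type_name, proto):
    """isA USE proto: values are shared with the prototype."""
    return Node(type_name, isA=[proto])


def isa_copy(type_name, proto):
    """isA COPY USE proto: field values copied once; referenced nodes are still shared."""
    return Node(type_name, fields=dict(proto.visible_fields()))


def isa_copyall(type_name, proto):
    """isA COPYALL USE proto: deep copy, including every referenced node."""
    return Node(type_name, fields=copy.deepcopy(proto.visible_fields()))
```

In this sketch, setting color on a node created with isa_use changes RedCube itself, while a node created with isa_copy keeps its own value. Only isa_copyall duplicates referenced nodes, which is what shipping an object to a remote machine would require.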

If COPY is not used, values are shared, even remotely. Thus, a simple way to provide networked shadows of remote objects is by using isA:

# Remote source
DEF Remote {
    fields [ someValue SFBool ]
    someValue TRUE
}
# Local shadow of Remote
DEF ShadowOfRemote {
    isA Remote
}

Combined with Prototypes nodes, this scheme provides some support for distributing functionality, since the prototypes need not be in the same world as the nodes that inherit from or copy them. Of course, overuse of this mechanism could make worlds exceedingly slow. We recommend using it only when participants must see an object in exactly the same state: whether a door is locked matters, but the exact positions of fish in a bowl may not.

Resolving ambiguity

Since isA allows multiple strings, a node may inherit or copy several fields with the same name. We follow the convention that the first occurrence of the field is copied as is, and subsequent occurrences are renamed by appending the type name to the field name.
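The renaming convention can be stated as a short function. In this Python sketch, each parent is given as a (type name, {field name: field type}) pair in isA order. The exact form of a renamed field is an assumption: the text says only that the type name is added to the end, so the sketch simply concatenates the two (a color field from a Light parent becomes colorLight).

```python
def merge_inherited_fields(parents):
    """Merge field declarations from several isA parents, renaming later duplicates.

    parents: list of (type_name, {field_name: field_type}) pairs, in isA order.
    """
    merged = {}
    for type_name, fields in parents:
        for name, field_type in fields.items():
            if name not in merged:
                merged[name] = field_type              # first occurrence kept as is
            else:
                merged[name + type_name] = field_type  # assumed: append the type name
    return merged
```

A full implementation would also need a policy for the case where a renamed field collides with an existing one; the proposal does not say what should happen then.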

The code field

We suggest one additional optional field that allows a node to be an executable thread of control. This field, code, is of type MFString. Each string in this field may be either a URI (pointing to a possible implementation of the object) or a list of messages (described below).

Messages

A message is given in a very simple syntax. A message is either:

the name of a field of the current node. This indicates that, if the field refers to a node, that node's code should begin executing. As a special case, the keyword SELF refers to the current node. If a field refers to a node, it may be followed by a period and the syntax for a message.
the name of a field, an equals sign, and then a value (such as another field name, a USE of a DEF, or a literal), to indicate assignment.
the new reserved word WAIT followed by a field name. This indicates that execution of this code (and hence of any following messages) should wait until the node referred to by the field finishes its current execution.
A USE followed by the name of a node previously named using DEF, followed by a period, then followed by any of the previous syntaxes.

The following fragment of a YACC grammar describes the message syntax. Note that the semicolons are part of the YACC syntax, not part of the message syntax. Successive messages are separated by commas.

MessageList : AnyMessage
            | MessageList ',' AnyMessage;
AnyMessage  : Message 
            | UseMessage;
Message     : Execute 
            | SetValue 
            | Wait;
Execute     : fieldName
            | SELF
            | fieldName '.' Message;
SetValue    : fieldName '=' Value;
Wait        : WAIT fieldName;
UseMessage  : USE nodeName '.' Message;
UseClause   : USE nodeName;
Value       : literal | fieldName | UseClause;
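The grammar above is small enough to parse with a hand-written recursive-descent parser. The following Python sketch turns a code string into nested tuples. The tuple tags ("exec", "send", "set", "wait", "use") and the token rules for literals (numbers, double-quoted strings, TRUE and FALSE) are assumptions made for illustration, since the proposal does not define them.

```python
import re

# Numbers, quoted strings, identifiers, or a single punctuation character.
TOKEN_RE = re.compile(r'\s*(?:(-?\d+(?:\.\d+)?)|"([^"]*)"|([A-Za-z_]\w*)|(\S))')
KEYWORDS = {"WAIT", "USE", "SELF", "TRUE", "FALSE"}


def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        number, string, ident, punct = m.groups()
        if number is not None:
            tokens.append(("num", float(number)))
        elif string is not None:
            tokens.append(("str", string))
        elif ident is not None:
            tokens.append(("kw" if ident in KEYWORDS else "id", ident))
        else:
            tokens.append(("punct", punct))
        pos = m.end()
    return tokens


class MessageParser:
    def __init__(self, text):
        self.tokens, self.i = tokenize(text), 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self, kind=None, value=None):
        tok = self.peek()
        if tok[0] is None or (kind and tok[0] != kind) or (value is not None and tok[1] != value):
            raise SyntaxError(f"unexpected token {tok[1]!r} at position {self.i}")
        self.i += 1
        return tok[1]

    def message_list(self):
        # MessageList : AnyMessage | MessageList ',' AnyMessage
        messages = [self.any_message()]
        while self.peek() == ("punct", ","):
            self.take()
            messages.append(self.any_message())
        if self.i != len(self.tokens):
            raise SyntaxError(f"unexpected token {self.peek()[1]!r} at position {self.i}")
        return messages

    def any_message(self):
        # UseMessage : USE nodeName '.' Message
        if self.peek() == ("kw", "USE"):
            self.take()
            node = self.take("id")
            self.take("punct", ".")
            return ("use", node, self.message())
        return self.message()

    def message(self):
        if self.peek() == ("kw", "WAIT"):      # Wait : WAIT fieldName
            self.take()
            return ("wait", self.take("id"))
        if self.peek() == ("kw", "SELF"):      # Execute : SELF
            self.take()
            return ("exec", "SELF")
        name = self.take("id")
        if self.peek() == ("punct", "."):      # Execute : fieldName '.' Message
            self.take()
            return ("send", name, self.message())
        if self.peek() == ("punct", "="):      # SetValue : fieldName '=' Value
            self.take()
            return ("set", name, self.value())
        return ("exec", name)                  # Execute : fieldName

    def value(self):
        kind, val = self.peek()
        if kind in ("num", "str"):
            self.take()
            return ("literal", val)
        if kind == "kw" and val in ("TRUE", "FALSE"):
            self.take()
            return ("literal", val == "TRUE")
        if (kind, val) == ("kw", "USE"):       # UseClause : USE nodeName
            self.take()
            return ("use", self.take("id"))
        return ("field", self.take("id"))


def parse_messages(text):
    return MessageParser(text).message_list()
```

For instance, the Serial node's code string "one, WAIT one, two, WAIT two" parses to four messages alternating between ("exec", ...) and ("wait", ...).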

This syntax is quite simple and can easily be added to a parser. It supports a simple form of concurrency. As observed before, concurrency allows extension of behavior while the system is running. In addition, this form of concurrency is asynchronous, allowing for the possibility that the objects receiving messages may be widely distributed.

Setting and getting values are atomic operations (i.e., nothing else can change a value while it is being used), but in general all operations should be considered to happen in parallel. It is extremely important that these operations be atomic even when used with shared distributed objects. Several commercially available distributed databases might serve as a general lock server in such environments.

Note that, while conceptually every node on a single machine may be a thread, it need not be implemented that way. A simple scheme would cycle round-robin through every node currently executing code, evaluating one message for each node in turn. A WAIT can be treated specially, taking the waiting node out of the round-robin until the node it is waiting on finishes. This provides conceptual concurrency without operating-system threads.
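The round-robin scheme can be sketched in a few lines of Python. This sketch assumes each node's code has already been parsed and its field names resolved to node names; messages are ("exec", node) and ("wait", node) pairs, plus a hypothetical ("emit", label) message that stands in for externally implemented primitives (such as the Java code referenced by URL) so that the interleaving is observable. Assignment messages are omitted for brevity.

```python
from collections import deque


def run_round_robin(programs, start):
    """Interpret node code one message per turn; return the sequence of 'emit' labels.

    programs maps a node name to its list of (op, arg) messages.
    """
    ready = deque([start])   # nodes currently in the round-robin
    pc = {start: 0}          # next message index of every running node
    waiters = {}             # node name -> nodes blocked in a WAIT on it
    trace = []
    while ready:
        node = ready.popleft()
        code = programs[node]
        if pc[node] >= len(code):
            # Finished: stop running and wake every node waiting on this one.
            del pc[node]
            ready.extend(waiters.pop(node, []))
            continue
        op, arg = code[pc[node]]
        pc[node] += 1
        if op == "exec" and arg not in pc:
            pc[arg] = 0                      # start the target if it is not already running
            ready.append(arg)
        elif op == "wait" and arg in pc:
            waiters.setdefault(arg, []).append(node)
            continue                         # blocked: leave the round-robin for now
        elif op == "emit":
            trace.append(arg)                # stand-in for a primitive's real work
        ready.append(node)
    return trace
```

Encoding the video-initialization example that follows in this form, the trace always starts with VideoInitialize's output, because Serial waits for it before starting Parallel; the video and audio labels then interleave in whatever order the round-robin visits them.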

An example: Video initialization

Consider a small example: two nodes that each evaluate two other nodes. One evaluates them in sequence; the other evaluates them in parallel and finishes when both have completed.

# This just takes two nodes and does nothing with them
Sequencer {
   fields [ one SFNode, two SFNode]
}

# Execute one, wait for it to finish, then execute the next
Serial {
   isA Sequencer
   code [ "one, WAIT one, two, WAIT two" ]
}

# Execute two nodes together, and wait for them both to finish
Parallel {
   isA Sequencer
   code [ "one, two, WAIT one, WAIT two" ]
}

Note that in order to specify a blocking operation, it is necessary first to start execution of a node and then block on it.

Using the two nodes we just defined, we can create some interesting behaviors. For example, assume that we are writing a program which connects to a video server, and then plays an audio and a video stream simultaneously.

VideoInitialize { 
  code "http://vrml.org/VideoInitialize.java" 
}
VideoPlay {
  code "http://vrml.org/VideoPlay.java"
}
AudioPlay {
  code "http://vrml.org/AudioPlay.java"
}
TVDisplay {
  fields [ serial SFNode, parallel SFNode ]
  parallel Parallel { }
  serial Serial { }
  code [ "serial.one = VideoInitialize,
          parallel.one = VideoPlay,
          parallel.two = AudioPlay,
          serial.two = parallel,
          serial, WAIT serial" ]
}

Since the assignment syntax just assigns an initial value to the fields of the sequencing nodes, the previous example is equivalent to the following:

TVDisplay {
  fields [ serial SFNode, parallel SFNode ]
  parallel Parallel { 
     one VideoPlay 
     two AudioPlay
  }
  serial Serial { 
     one VideoInitialize
     two parallel
  }
  code [ "serial, WAIT serial" ]
}

A convention for multimethods

Many object-oriented languages, such as C++, use single dispatching to determine which function to execute when a method is called on an object. In single dispatching, method lookup searches for an appropriate method on a single object, then searches up the inheritance tree for an appropriate method on any of its parents. Multimethods extend this technique to multiple dispatching, where several objects together determine the appropriate method to invoke.

Consider the common case of implementing a device-independent graphics library. Graphics hardware is not at all standardized, let alone consistent: some graphics boards support fast sphere rendering, others support triangle strips, some provide fast texture mapping, and so on. In such cases, the code is liberally sprinkled with switch statements and #ifdef's. With multimethods, we can define a general render method for the generic object/hardware combination, and then use inheritance to reduce the amount of code needed to specify which method to invoke in which case. Multimethods have been widely used in systems such as CLOS and Cecil.
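As a concrete illustration of multiple dispatch, here is a minimal Python sketch. The MultiMethod class and the Shape and Device class hierarchies are hypothetical. The lookup rule (try argument classes from most to least specific, giving the leftmost argument priority) is one simple policy, loosely modeled on CLOS's left-to-right precedence, and is not taken from this proposal.

```python
import itertools


class MultiMethod:
    """Minimal multiple dispatch: the classes of all arguments select the implementation."""

    def __init__(self, name):
        self.name = name
        self.methods = {}  # tuple of argument classes -> implementation

    def register(self, *types):
        def decorator(fn):
            self.methods[types] = fn
            return fn
        return decorator

    def __call__(self, *args):
        mros = [type(arg).__mro__ for arg in args]
        # Most specific signatures first; the leftmost argument has priority.
        for signature in itertools.product(*mros):
            fn = self.methods.get(signature)
            if fn is not None:
                return fn(*args)
        raise TypeError(f"no {self.name} method for {[type(a).__name__ for a in args]}")


# Hypothetical geometry and hardware hierarchies.
class Shape: pass
class Cube(Shape): pass
class Sphere(Shape): pass
class Device: pass
class GLDevice(Device): pass

render = MultiMethod("render")


@render.register(Shape, Device)
def render_generic(shape, device):
    return "generic tessellation"


@render.register(Cube, Device)
def render_cube(shape, device):
    return "six quads"


@render.register(Sphere, GLDevice)
def render_gl_sphere(shape, device):
    return "hardware sphere"
```

With these registrations, a Sphere on a GLDevice uses the hardware path, a Sphere on any other Device falls back to the generic method, and a Cube uses its own method on every device, with no switch statements in the calling code.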

Generic, Specific, and GenericFields

We propose three node types, Generic, Specific, and GenericFields (a generic function, a method, and its parameters). A Generic represents all possible interactions, with the individual interactions described by many Specific prototypes. A GenericFields node provides fields for the parameters. When a Generic executes, it chooses a Specific suitable for the actual parameters (each Specific describes which kinds of parameters are suitable for it using fields of its own).

For example, here is a Generic representing rendering, with several specifics representing particular ways to render.

Render {
  isA [ COPY Generic ]

  # This field defines the GenericFields node containing the parameters
  genericFields DEF RenderFields GenericFields {
    fields [ viewer SFNode, node SFNode ]
  }

  # One implementation
  GLRenderCube {
      isA [ 
          COPY USE RenderFields,
          COPY Specific ]
      generic USE Render
      viewer  USE GLViewer
      node    USE Cube
      code    BuiltinGLRenderCube
  }
  
  # Another implementation
  GLRenderSphere {
      isA [ 
          COPY USE RenderFields,
          COPY Specific ]
      generic USE Render
      viewer  USE GLViewer
      node    USE Sphere
      code    BuiltinGLRenderSphere
  }
}
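Because Specifics name prototypes rather than classes, a Generic must decide applicability by walking isA chains. The following Python sketch shows one possible policy: a Specific applies if every actual parameter inherits (possibly indirectly) from the prototype the Specific names for it, and among applicable Specifics the one with the smallest total inheritance distance wins, with ties going to the earliest registered. The Proto and Generic classes and this scoring rule are assumptions for illustration; the proposal does not fix a selection rule.

```python
class Proto:
    """Hypothetical prototype object: a name plus an ordered isA list."""

    def __init__(self, name, *isA):
        self.name, self.isA = name, list(isA)


def distance(node, proto):
    """Number of isA links from node up to proto, or None if node does not inherit from it."""
    if node is proto:
        return 0
    best = None
    for parent in node.isA:
        d = distance(parent, proto)
        if d is not None and (best is None or d + 1 < best):
            best = d + 1
    return best


class Generic:
    def __init__(self):
        self.specifics = []  # (parameter name -> required prototype, code) pairs

    def add_specific(self, params, code):
        self.specifics.append((params, code))

    def execute(self, **actual):
        """Run the applicable Specific whose parameter prototypes are closest to the actual ones."""
        best, best_score = None, None
        for params, code in self.specifics:
            dists = [distance(actual[name], proto) for name, proto in params.items()]
            if None in dists:
                continue  # some parameter does not inherit from the required prototype
            score = sum(dists)
            if best_score is None or score < best_score:
                best, best_score = code, score
        if best is None:
            raise LookupError("no applicable Specific")
        return best(**actual)
```

Mirroring the Render example, one could register a generic Specific for (Viewer, Node) and a GL-specific one for (GLViewer, Cube); a node that inherits from Cube rendered by a GLViewer would then select the GL Specific, while the same node with a plain Viewer would fall back to the generic one.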

Time-critical behaviors

The time-critical behaviors we would like to support are often classified as "soft real-time." In such systems, responsiveness is not guaranteed; the system merely tries to meet deadlines (e.g., an MPEG decoder that occasionally misses a frame). Under some interpretations, the LOD node in VRML already provides such time-critical behavior, since the browser may switch to a simpler model to increase scene update rates.

As a behavioral example of such a system, consider a physically based model of a pendulum. There are several different algorithms one could use to simulate the motion of a pendulum, ranging from extremely inaccurate but fast (linear extrapolation of velocity), to accurate but slow (fourth-order Runge-Kutta with a small stepsize).
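The difference between the two ends of that range is easy to see numerically. This Python sketch integrates a frictionless pendulum with explicit Euler steps (the linear-extrapolation end) and with classical fourth-order Runge-Kutta, and reports the worst relative energy error along the way. The pendulum length, time step, and starting angle are arbitrary values chosen for illustration.

```python
import math

G_OVER_L = 9.81 / 1.0  # gravity over pendulum length, for an assumed 1 m pendulum


def deriv(state):
    theta, omega = state  # angle from vertical (rad), angular velocity (rad/s)
    return (omega, -G_OVER_L * math.sin(theta))


def euler_step(state, dt):
    # Extrapolate linearly from the current derivative.
    d = deriv(state)
    return (state[0] + dt * d[0], state[1] + dt * d[1])


def rk4_step(state, dt):
    # Classical fourth-order Runge-Kutta: four derivative evaluations per step.
    def shifted(k, h):
        return (state[0] + h * k[0], state[1] + h * k[1])
    k1 = deriv(state)
    k2 = deriv(shifted(k1, dt / 2))
    k3 = deriv(shifted(k2, dt / 2))
    k4 = deriv(shifted(k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def energy(state):
    # Total energy divided by m * L^2; it should stay constant without friction.
    theta, omega = state
    return 0.5 * omega ** 2 + G_OVER_L * (1.0 - math.cos(theta))


def max_energy_drift(step, state, dt, steps):
    """Largest relative energy error seen over `steps` integration steps."""
    e0 = energy(state)
    worst = 0.0
    for _ in range(steps):
        state = step(state, dt)
        worst = max(worst, abs(energy(state) - e0) / e0)
    return worst
```

With a 0.05 s step and a 0.5 rad initial swing, 200 Euler steps let the energy error exceed 100%, while RK4 stays far below 0.1%, at the price of four derivative evaluations per step instead of one.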

In previous work, we have developed scheduling algorithms that attempt to balance computational demands against rendering demands across multiple-processor systems, in order to attain a constant frame rate in computationally demanding scientific-visualization environments. Such a complex system is not necessary in VRML, however, since such schedulers have high overhead and simple reactive scheduling can be used to good effect, as SGI's WebSpace browser demonstrates.

Since a Specific is just another node type, we can embed several versions of the same algorithm inside an LOD node, as follows:

 
ComputePendulum {
  isA [ COPY SimpleBehavior, COPY Generic ]
  LOD {
    range [1 8]
    DEF ComputePendulumRK4 Prototype {
      isA [ COPY SimpleBehavior, COPY Specific ]
      generic USE ComputePendulum
      code "http://www.physics.com/rk4.java"
    }
    DEF ComputePendulumEuler Prototype {
      isA [ COPY SimpleBehavior, COPY Specific ]
      generic USE ComputePendulum
      code "http://www.physics.com/euler.java"
    }
  }
}

Note that this assumes a slightly different interpretation of LOD than the traditional one, but it is one we would like to encourage. Under this interpretation, LOD does not imply a set of representations of an object to be interchanged based on distance from the viewer; instead, it defines a set of relative benefit values for alternate representations of an object or algorithm. The browser is responsible for choosing among the alternate representations based on the actual cost of each representation relative to its benefit and on the current load on the system. This interpretation allows VRML scenes to contain geometry and behaviors that run fast regardless of the speed of the host machine, possibly at some cost in accuracy.
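One simple reactive policy a browser might use under this interpretation is sketched below in Python: among the alternatives whose measured cost fits the current per-frame budget, pick the one with the highest benefit, and if nothing fits, fall back to the cheapest. The Alternative fields, the budget parameter, and any numbers used with it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alternative:
    name: str
    benefit: float   # relative value supplied by the world author
    cost_ms: float   # cost measured by the browser, e.g. a running average per frame


def choose_alternative(alternatives, budget_ms):
    """Pick the most beneficial alternative that fits the current per-frame budget."""
    affordable = [a for a in alternatives if a.cost_ms <= budget_ms]
    if not affordable:
        # Overloaded: degrade to the cheapest alternative rather than skip the object.
        return min(alternatives, key=lambda a: a.cost_ms)
    return max(affordable, key=lambda a: a.benefit)
```

For the pendulum, a browser with time to spare would pick ComputePendulumRK4, and one under load would drop to ComputePendulumEuler; the budget itself could be whatever remains of the frame time after rendering.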

Primitives

So far in this paper we have described high-level mechanisms. We also feel that the ability to provide "primitives" in an external scripting language will be useful, allowing programmers to code more complex behaviors, such as socket communication and protocols.

API Considerations

Some have assumed that defining a scripting language implies opposition to the development of an API for VRML. This is not the case; it merely postpones consideration of the proper API. Mitra from Worlds proposes a possible API that provides for manipulating the scene graph and individual fields, and for inserting new VRML code into a scene. Although his approach does not provide for data hiding and abstraction, it is a useful base to begin from.

Any proposed API should be describable both in terms of local invocations (direct function calls) and remote invocations (RPC, OLE, HTTP, etc.). We would prefer a specification that respects the object-oriented nature of the scene graph and provides some ability to specify different levels of security (interface hiding), both on the local machine and remotely.

It is also necessary to provide an additional intermediate, device-independent API for managing such devices as the renderer, audio tools, video cameras, networking, etc. This is in itself a very difficult problem.

Extension languages

One important question to answer is: Why do we need a language at all? After all, there exist a large number of standards for executing (possibly external) code, such as CORBA, OLE, OpenDOC, and even HTTP. If all code could be efficiently executed remotely, or if we could trust the remote server to give us safe code, such protocols would be perfect. However, we would like to be able to download code to execute locally, for efficiency. Because we cannot support every possible language, it makes sense to require a standard extension language.

There are several important criteria for a 3D behavior-specification language. It must provide the following:

support on the majority of Internet-based machines
ability to download safe code to execute on a local machine
protocols for invoking remote procedures
time-critical behaviors
support for object-orientation, with flexible inheritance semantics

Unfortunately, there is no existing language which satisfies all of these criteria. Python, Self, Tcl, and Java, among others, individually satisfy many of the requirements, but fall short in others.

Java is advertised as a "safe" language; it is well defined and documented, its source code has been released for Sun Solaris, and Netscape has recently licensed it for integration into its browser. For these reasons, Java seems a good enough choice for a common language for the World Wide Web. Although we present the following examples in terms of Java, the API and protocols could be reworked in any other object-oriented language.

Mitra from Worlds describes a process by which an interpreter for a new extension language could be downloaded dynamically. If this interpreter is able to generate code in the platform's native extension language, then it is no longer necessary to choose a single language. Note that this may cause confusion and make it difficult to create new nodes based on existing code, but it may be a viable solution for the polyglot web.

Due to time limitations, we have not attempted to describe how one would implement a new VRML node type in Java. This should happen relatively infrequently, since the language we have described in this paper is quite powerful; however, adding new functionality such as the ability to open sockets, etc., will require developing new libraries, probably in Java.

Example primitives

Using Java, we can define primitive nodes that represent, not geometry, but time-varying behavior. We will make some suggestions for useful kinds of nodes, but keep in mind that different Web sites can define their own.

These nodes use fields to describe interesting values. These values would be more useful if VRML supported the ability to refer to only a particular field in a node, as can be done in Inventor. We suggest adapting that simple syntax (the node, followed by a period, followed by the field name), a syntax we are already making use of in message sends.

Browser

The Browser node represents the capabilities of a generic VRML browser. Some agreement would have to be reached concerning the minimum functionality this object would support, but it seems that it could be expected to support a few fields:

location, an SFVec2f, representing the screen-space location of a mouse,
button, an SFBool, TRUE if the mouse button is down and FALSE otherwise, and
cameraPosition, cameraAt, and cameraUp, three SFVec3f fields representing the position and orientation of the camera.

Other prototypes could then represent special capabilities of other browsers. For example, this provides an alternative mechanism to WebSpace's use of DEF with Info nodes to set special parameters. Thus, a prototype of WebSpace might support fields like backgroundColor and viewer.

Location2

This kind of node has one field, position of SFVec2f, corresponding to the location of a pointer, such as a mouse. Other kinds of nodes can have a similar structure to correspond to other input devices. For example, a Location6 could correspond to a 6 degree-of-freedom input device with two fields, position of SFVec3f and rotation of SFRotation.

VirtualSphere

This kind of node has one field, rotation, which contains the rotation resulting from a virtual-sphere interaction with the user.
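A browser could compute this field from two successive pointer positions. The Python sketch below uses a simple virtual-trackball mapping, a close relative of the virtual sphere rather than necessarily the exact formulation a given browser would use: each pointer position, in window coordinates normalized to [-1, 1], is projected onto a unit sphere, and the result is the rotation carrying the first point to the second. It is expressed like an SFRotation, as an axis followed by an angle in radians; the function names are hypothetical.

```python
import math


def to_sphere(x, y):
    """Project normalized window coordinates onto the unit hemisphere facing the viewer."""
    d2 = x * x + y * y
    if d2 <= 1.0:
        return (x, y, math.sqrt(1.0 - d2))
    d = math.sqrt(d2)
    return (x / d, y / d, 0.0)  # outside the sphere: clamp to its silhouette


def drag_rotation(x0, y0, x1, y1):
    """Return an SFRotation-style (ax, ay, az, angle) for a drag from (x0, y0) to (x1, y1)."""
    a, b = to_sphere(x0, y0), to_sphere(x1, y1)
    axis = (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])
    n = math.sqrt(sum(c * c for c in axis))
    if n < 1e-12:
        return (0.0, 0.0, 1.0, 0.0)  # no movement: identity rotation
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    return (axis[0] / n, axis[1] / n, axis[2] / n, math.acos(dot))
```

Dragging from the center of the window halfway to the right edge yields a rotation of pi/6 about the vertical (+y) axis.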

Toggle

A Toggle node could have one value field containing either ON or OFF. An additional field could describe what device the field should get its value from, such as MOUSE_BTN_1 or a key on the keyboard.

CurrentSelection

A node implementing the current selection can have two fields, one for the geometry determining the pick (such as a ray or 3D box) and one for the geometry that was picked.
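For the common case of a ray pick, the picked geometry is the nearest object the ray hits. This Python sketch approximates each object by a bounding sphere, an assumption made to keep the intersection test short; a real browser would test the actual geometry. The function names are hypothetical.

```python
import math


def ray_sphere_t(origin, direction, center, radius):
    """Smallest t >= 0 where origin + t * direction hits the sphere, or None if it misses."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / (2 * a), (-b + root) / (2 * a)):
        if t >= 0:
            return t
    return None


def pick(origin, direction, objects):
    """objects: name -> (center, radius). Return the name of the nearest object hit, or None."""
    best_name, best_t = None, math.inf
    for name, (center, radius) in objects.items():
        t = ray_sphere_t(origin, direction, center, radius)
        if t is not None and t < best_t:
            best_name, best_t = name, t
    return best_name
```

A CurrentSelection node built this way would store the ray in its first field and the result of pick in its second.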

Time

This node has a single field, value, tied to the current wall clock time. This can be used to produce models that are intrinsically animated.
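As a small example of an intrinsically animated model, a rotation angle can be derived directly from the Time node's value. This Python sketch (the function name and the choice of a constant-speed spin are hypothetical) maps wall-clock seconds to an angle that completes one full turn per period.

```python
import math


def spin_angle(now, start, period_s):
    """Angle in radians of an object that turns once every period_s seconds of wall-clock time."""
    phase = ((now - start) / period_s) % 1.0
    return 2.0 * math.pi * phase
```

Calling it each frame with the current time (for instance from Python's time.time()) gives the same angle on every machine at the same moment, regardless of frame rate.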

Some Sample Behavior Extensions

In previous sections, we have described how to implement a number of common browser functions with our proposed additions to VRML. Here we provide examples of more complex interactive behavior and describe how one might be able to approach implementing them in this system.

Magic lenses

Magic Lens filters are simulated filters that one can superimpose over a scene to hide or show various types of information. For example, a magic lens that shows dependencies between objects could be composited with a magic lens that renders all objects semi-transparently, making it easier to see the lines illustrating the dependencies. Such tools for hiding and displaying information could become very useful in complex, information-laden virtual environments.

Since some of these filters could be purely geometric while others will be based on the semantics of the scene, it is difficult to generate useful composite behaviors that work for all scenes. However, multimethods can be used to determine how to render particular nodes for particular kinds of lenses.

Avatars

Avatars, or 3D objects representing live participants in a shared virtual world, are a common feature of various real or imagined collaborative 3D environments. In simple scenes, these avatars could be implemented through VRML nodes which represent a 3D body with the possible input modalities. In the MERLE system, some participants are fully mobile in the 3D scenes; others, for whom only the head and arms are tracked, are represented as seated around a table.

Based on a combination of the possible input devices and the desired 3D representation of the person, we would keep a live connection (possibly using mechanisms as simple as Java's existing networking code) between the local browser and the remote participant, so that users could experience real-time shared behavior.

Worlds has developed a set of documents on how one might develop shared, multiuser VRML worlds. We are also currently writing a separate white paper on some of the important issues for multi-user worlds and how one might address them.

Collision detection

Collision detection is another computationally demanding problem which is best dealt with on a scene-by-scene basis. This is absolutely necessary for many DOOM-like games, and for collaborative systems such as Worlds Chat. However, most CAD and drawing systems do not incorporate collision detection, and it is not clear that it would be useful when creating complex objects.

This is an ideal situation for multimethods: browsers capable of collision detection could subclass scene-navigation methods to detect collisions with standard objects, or a world creator could create a new class of collidable objects, all of which would register themselves with a collision-detection manager.
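A minimal Python sketch of such a registry: collidable objects register an axis-aligned bounding box, and navigation code asks which registered objects overlap a query box (for example, one around the camera). The CollisionManager and Box names are hypothetical, and the brute-force loop over all objects is chosen for clarity rather than speed.

```python
from dataclasses import dataclass


@dataclass
class Box:
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner

    def overlaps(self, other):
        # Axis-aligned boxes overlap iff their extents overlap on every axis.
        return all(self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i] for i in range(3))


class CollisionManager:
    """Hypothetical registry that collidable objects add themselves to."""

    def __init__(self):
        self.objects = {}  # name -> Box

    def register(self, name, box):
        self.objects[name] = box

    def collisions(self, query):
        # Brute force: compare the query box against every registered object.
        return sorted(name for name, box in self.objects.items() if query.overlaps(box))
```

Only objects that chose to register are ever tested, which matches the idea that collision detection should be opted into scene by scene rather than imposed on every world.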

Future Work

We need to clarify the fields available for Browser prototypes and Method prototypes. This is not intrinsically difficult but will require a great deal of agreement on browser capabilities. In particular, the mechanism for creating a new type of primitive node in Java must be described and its possibilities explored.

We plan to continue developing this proposal in collaboration with the VRML community and to create a reference implementation as a testbed.

Acknowledgements

Thanks to the Brown Graphics Group, especially Andries van Dam, John F. Hughes, Robert C. Zeleznik, Jeff White, and Steven C. Dollins. Mark Pesce encouraged the further development of these ideas. Gavin Nichol and Bill Smith served as useful sounding boards during the development of these ideas, and Mitra provided useful comments on the rewriting of this paper.

This work was supported in part by grants from NSF, ARPA, NASA, Taco, Sun, HP, and ONR grant N00014-91-J-4052, ARPA order 8225.

Unlinked Bibliography

These are references which are mentioned in the text but for which we could find no WWW-based equivalent. Please let us know if we have missed an electronic version of one of these.

Conal Elliott, Greg Schechter, Ricky Yeung and Salim Abi-Ezzi. TBAG: A High Level Framework for Interactive, Animated 3D Graphics Applications. In Andrew S. Glassner, editor, Computer Graphics (SIGGRAPH '94 Proceedings), volume 28, pages 421-434, July 1994.

David Zeltzer. Autonomy, Interaction and Presence. Presence: Teleoperators and Virtual Environments, volume 1, number 1, pages 127-132, January 1992.