LCM
LCM Type Specification Language

The usage and features of the LCM type language.

Introduction

In addition to providing a set of communications primitives, LCM includes utilities for generating platform-independent marshalling and unmarshalling functions for user-defined data types. It is similar to XDR, though it is written with greater type safety in mind, and with the goal of first-class support for a variety of languages including C, Java, and Python. This document describes the data marshalling facility; the communications facility is described elsewhere. Note that it is possible to use the data marshalling features of LCM independently of LCM's communication facilities.

Design Goals

The primary design goals of the LCM marshalling facility are:

The current version of LCM achieves these goals with only a few compromises. In some cases, a least-common-denominator approach was used to ensure that all platforms supported the features provided by LCM.

Type Specifications

Type specifications are contained in files with an ".lcm" file type. They are conventionally named in lower case with underscores between words: e.g., the type "wind_speed_t" is defined in the file "wind_speed_t.lcm". The utility lcm-gen converts an LCM type specification into a language-dependent implementation.

Structs

LCM structs are compound types consisting of other types. We begin with a simple struct named "temperature_t" that contains a 64 bit integer named "utime" and a 64 bit floating point number named "degCelsius". Two types of comments are also illustrated.

1 struct temperature_t
2 {
3  int64_t utime; // Timestamp, in microseconds
4 
5  /* Temperature in degrees Celsius. A "float" would probably
6  * be good enough, unless we're measuring temperatures during
7  * the big bang. Note that the asterisk on the beginning of this
8  * line is not syntactically necessary, it's just pretty.
9  */
10  double degCelsius;
11 }

This declaration must appear in a file named temperature_t.lcm.

LCM types do not contain pointers (but arrays are supported, see below): this eliminates the possibility of circular references.

Before we go further, let's take a look at the various primitive types available.

Primitive Types

LCM supports a number of primitive types:

typeDescription
int8_t8-bit signed integer
int16_t16-bit signed integer
int32_t32-bit signed integer
int64_t64-bit signed integer
float32-bit IEEE floating point value
double64-bit IEEE floating point value
stringUTF-8 string
booleantrue/false logical value
byte8-bit value

The integer types are all signed (as is necessary to ensure easy inter-operation with Java, which lacks unsigned types) and are encoded in network byte order.

The type byte is encoded the same way as int8_t. However, there are instances in which buffers of data (such as a JPEG image) must be sent across the network. These buffers of data must be decoded by some additional software. The type byte can be used by a type designer to help convey the notion that the data is opaque, and should not be interpreted (literally) as an array of 8 bit integers.

Floating point types are encoded using the IEEE 32 and 64 bit formats. An LCM implementation may not use any other encoding. The 32 and 64 bit quantities are transmitted in network byte order.

The boolean type is encoded as a single byte whose value is either 0 or 1. An array of N booleans will require N bytes.

The string type encodes a NULL-terminated UTF-8 string. The string is sent as a 32 bit integer comprising the total length of string in bytes (including terminating NULL character) followed by the bytes of the string (again, including the NULL character).

Arrays

LCM supports multi-dimensional arrays consisting of primitives, structs, or constant declarations. The number of dimensions in the array are declared by the LCM type declaration: you cannot encode an LCM type that consists of a variable-dimension array. In contrast, variable-sized arrays are fine. Consider this example:

1 struct point2d_list_t
2 {
3  int32_t npoints;
4  double points[npoints][2];
5 }

This example shows a two-dimensional array declaration consisting of both variable-length and fixed-length components. In a variable-length declaration, the variable that contains the length must be declared prior to its use as an array length. Also note that the length variable (npoints, in the example above) must be an integer type, and must always have a value greater than or equal to zero.

When arrays are encoded and decoded, each dimension's size is already known: it is either a constant (given by the LCM type declaration), or it was a previously encoded/decoded variable. Thus, an array is encoded simply by recursively encoding each element of the array, with inner-most dimensions being encoded together. In other words, the array above would be encoded in the order points[0][0], points[0][1], points[1][0], points[1][1], points[2][0], points[2][1], etc.

Constants

LCM provides a simple way of declaring constants that can subsequently be used to populate other data fields. Users are free to use these constants in any way they choose: as magic numbers, enumerations, or bitfields.

Constants can be declared by using the const keyword.

1 struct my_constants_t
2 {
3  const int32_t YELLOW=1, GOLDENROD=2, CANARY=3;
4  const double E=2.8718;
5 }

Note that types must be declared for constants. All integer and floating point types are supported. String constants are not supported.

Namespaces

LCM allows types to be defined in a namespace, making it easier for users to use types from other organizations even if those types have the same name. The namespace mechanism is closely modeled after that of Java. In languages that support namespaces (such as Java and Python), the LCM namespace mechanism is mapped onto the native mechanism. In languages like C, namespaces are approximated by prepending the package name to the type name. See below for an example of namespaces. Note that the package keyword identifies the namespace of the structs defined in that file, and that fully-qualified types are formed by concatenating the package and type name, with a period between.

1 package mycorp;
2 
3 struct camera_image_t {
4  int64_t utime;
5  string camera_name;
6  jpeg.image_t jpeg_image;
7  mit.pose_t pose;
8 }

LCM users are encouraged to put their types into a unique namespace and to fully-qualify the types of all the member fields.

Performance Considerations

The runtime costs of encoding and decoding with LCM are generally not a system bottleneck. The marshalling functions are dramatically faster than an XML implementation, but since each member must be individually processed (in order to ensure correct byte ordering, for example), LCM is more expensive than using raw C structs. That said, LCM's first application used over 40MB/s.

Fingerprint Computation

Fingerprints ensure that the encoding and decoding methods agree on the format of a data type. The fingerprints are a function, recursively, of all of the types that a type contains. This creates a potential problem when types could be mutually recursive: we must avoid an infinite recursion.

The basic idea is for each type to have a "base" fingerprint, which we'll denote for type "A" as "K_A". K_A is a constant derived from the lcm type description (and it's stored as lcm_struct->hash). We wish to compute the actual fingerprint (or hash), A(), which is a function of all of A's contained types.

In addition, so that we can recognize a recursion, the A() function takes an argument, which is a list of the types already visited. E.g., C([A,B]) indicates that we wish to compute the hash of type C, given that C is a member of type B, which is a member of type A. We avoid recursions by setting C([list]) = 0 if [list] contains C.

The contribution of primitive types is handled via the K_A; there is no recursion for them.

A small wrinkle arises from the above definitions: if types A, B, and C are mutually recursive, we can have two types with the same hash. This is clearly undesirable. We fix this by making the order of recursion relevant: at each node in the tree, we rotate the value (bitwise) 1 bit to the left. A type that is included at recursion depth N has its contribution rotated by N bits.

Note that this mechanism is entirely unnecessary for enumerations (they cannot contain other types); for enumerations, we just use the hash in lcmenum->hash.

PSEUDO-CODE

1 v = compute_hash(type, parents)
2 
3 if type is member of parents
4  return 0
5 
6 v = K_type;
7 
8 for each members m of type
9  v += compute_hash(m, [parents, type])
10 
11 return rot_left(v);

When encoding/decode a type T, we would use compute_hash(T, []) as the hash function.

EXAMPLE

1 struct A
2 {
3  B b;
4  C c;
5 }
6 
7 struct B
8 {
9  A a;
10 }
11 
12 struct C
13 {
14  B b;
15 }

Diagrammatically, we can compute their hashes by showing the children of each branch. We use lower case to indicate a terminal leaf (where the leaf is the same class as one of its parents).

         A                B                  C
       /   \              |                  |
      B     C             A                  B
      |     |            / \                 |
      a     B           b   C                A
            |               |               / \
            a               b              b   c

A() = R{K_A + R{K_B}} + R{K_C + R{K_B}}}

B() = R{K_B + R{K_A + R{K_C}}}

C() = R{K_C + R{K_B + R{K_A}}}

Note that without the rotations, B() == C().

Related Work

LCM is most similar to XDR, which is used in RPC and is described by RFC4506. Both use a C-like syntax (and even C keywords like "struct"). LCM differs in that its language is smaller: rarely-used features like unions are not supported. LCM does not support pointers: this eliminates the pointer-chasing problems that can arise in XDR. Variable-length arrays are supported in a more natural way in LCM, and LCM includes a type "signature" in the encoded data. This type signature allows run-time error detection.

Data encoding representations are often compared to XML. XML and LCM serve very different functions. The verbosity and generic structure of XML are aids for agents to use information that they understand while safely skipping over properties that are alien to them. In contrast, LCM is designed for agents that are tightly coupled but that may not be in the same memory space. A more rigid type definition, along with space-efficient and computationally-efficient encodings, are better fits for these types of applications.

Development History

LCM's marshalling facilities were created for use on MIT's DARPA Urban Challenge vehicle, with development starting during the summer of 2006. Early versions supported many features that have since been deprecated: reducing the number of extraneous features has simplified the code base significantly, since most features typically impact several language back-ends (currently C, Java, and Python).