# LCM Type Specification Language The usage and features of the LCM type language. ## Introduction In addition to providing a set of communications primitives, LCM includes utilities for generating platform-independent marshalling and unmarshalling functions for user-defined data types. It is similar to XDR, though it is written with greater type safety in mind, and with the goal of first-class support for a variety of languages including C, Java, and Python. This document describes the data marshalling facility; the communications facility is described elsewhere. Note that it is possible to use the data marshalling features of LCM independently of LCM's communication facilities. ### Design Goals The primary design goals of the LCM marshalling facility are: - Provide a simple mechanism to define complex types that would be immediately comfortable to users of C and Java - Provide first class support for a wide variety of client languages - Abstract away platform-specific details such as byte ordering - Maximize the amount of compile-time and run-time type safety - Be able to detect message type incompatibilities, such as when two applications have different versions of the same datatype - Produce space-efficient encoded messages - Minimize encoding and decoding computational costs The current version of LCM achieves these goals with only a few compromises. In some cases, a least-common-denominator approach was used to ensure that all platforms supported the features provided by LCM. ## Type Specifications Type specifications are contained in files with an ".lcm" file type. They are conventionally named in lower case with underscores between words: e.g., the type "wind_speed_t" is defined in the file "wind_speed_t.lcm". The utility lcm-gen converts an LCM type specification into a language-dependent implementation. ### Structs LCM structs are compound types consisting of other types. We begin with a simple struct named "temperature_t" that contains a 64 bit integer named "utime" and a 64 bit floating point number named "degCelsius". Two types of comments are also illustrated. ``` C struct temperature_t { int64_t utime; // Timestamp, in microseconds /* Temperature in degrees Celsius. A "float" would probably * be good enough, unless we're measuring temperatures during * the big bang. Note that the asterisk on the beginning of this * line is not syntactically necessary, it's just pretty. */ double degCelsius; } ``` This declaration must appear in a file named temperature_t.lcm. LCM types do not contain pointers (but arrays are supported, see below): this eliminates the possibility of circular references. Before we go further, let's take a look at the various primitive types available. ### Primitive Types LCM supports a number of primitive types: | type | Description | | ---- | ----------- | | int8_t | 8-bit signed integer | | int16_t | 16-bit signed integer | | int32_t | 32-bit signed integer | | int64_t | 64-bit signed integer | | float | 32-bit IEEE floating point value | | double | 64-bit IEEE floating point value | | string | UTF-8 string | | boolean | true/false logical value | | byte | 8-bit value | The integer types are all signed (as is necessary to ensure easy inter-operation with Java, which lacks unsigned types) and are encoded in network byte order. The type `byte` is represented in C/C++ as `uint8_t`. Languages with a native `byte` representation use their respective native byte representations (e.g., type `byte` in Java). Floating point types are encoded using the IEEE 32 and 64 bit formats. An LCM implementation may not use any other encoding. The 32 and 64 bit quantities are transmitted in network byte order. The `boolean` type is encoded as a single byte whose value is either 0 or 1. An array of N booleans will require N bytes. The `string` type encodes a NULL-terminated UTF-8 string. The string is sent as a 32 bit integer comprising the total length of string in bytes (including terminating NULL character) followed by the bytes of the string (again, including the NULL character). ### Arrays LCM supports multi-dimensional arrays consisting of primitives, structs, or constant declarations. The number of dimensions in the array are declared by the LCM type declaration: you cannot encode an LCM type that consists of a variable-dimension array. In contrast, variable-sized arrays are fine. Consider this example: ``` C struct point2d_list_t { int32_t npoints; double points[npoints][2]; } ``` This example shows a two-dimensional array declaration consisting of both variable-length and fixed-length components. In a variable-length declaration, the variable that contains the length must be declared prior to its use as an array length. Also note that the length variable (npoints, in the example above) must be an integer type, and must always have a value greater than or equal to zero. When arrays are encoded and decoded, each dimension's size is already known: it is either a constant (given by the LCM type declaration), or it was a previously encoded/decoded variable. Thus, an array is encoded simply by recursively encoding each element of the array, with inner-most dimensions being encoded together. In other words, the array above would be encoded in the order points[0][0], points[0][1], points[1][0], points[1][1], points[2][0], points[2][1], etc. ### Constants LCM provides a simple way of declaring constants that can subsequently be used to populate other data fields. Users are free to use these constants in any way they choose: as magic numbers, enumerations, or bitfields. Constants can be declared by using the const keyword. ``` C struct my_constants_t { const int32_t YELLOW=1, GOLDENROD=2, CANARY=3; const double E=2.8718; } ``` Note that types must be declared for constants. All integer and floating point types are supported. String constants are not supported. ## Namespaces LCM allows types to be defined in a namespace, making it easier for users to use types from other organizations even if those types have the same name. The namespace mechanism is closely modeled after that of Java. In languages that support namespaces (such as Java and Python), the LCM namespace mechanism is mapped onto the native mechanism. In languages like C, namespaces are approximated by prepending the package name to the type name. See below for an example of namespaces. Note that the package keyword identifies the namespace of the structs defined in that file, and that fully-qualified types are formed by concatenating the package and type name, with a period between. ``` C package mycorp; struct camera_image_t { int64_t utime; string camera_name; jpeg.image_t jpeg_image; mit.pose_t pose; } ``` LCM users are encouraged to put their types into a unique namespace and to fully-qualify the types of all the member fields. ## Performance Considerations The runtime costs of encoding and decoding with LCM are generally not a system bottleneck. The marshalling functions are dramatically faster than an XML implementation, but since each member must be individually processed (in order to ensure correct byte ordering, for example), LCM is more expensive than using raw C structs. That said, LCM's first application used over 40MB/s. ## Fingerprint Computation Fingerprints ensure that the encoding and decoding methods agree on the format of a data type. The fingerprints are a function, recursively, of all of the types that a type contains. This creates a potential problem when types could be mutually recursive: we must avoid an infinite recursion. The basic idea is for each type to have a "base" fingerprint, which we'll denote for type "A" as "K_A". K_A is a constant derived from the lcm type description (and it's stored as lcm_struct->hash). We wish to compute the actual fingerprint (or hash), A(), which is a function of all of A's contained types. In addition, so that we can recognize a recursion, the A() function takes an argument, which is a list of the types already visited. E.g., C([A,B]) indicates that we wish to compute the hash of type C, given that C is a member of type B, which is a member of type A. We avoid recursions by setting C([list]) = 0 if [list] contains C. The contribution of primitive types is handled via the K_A; there is no recursion for them. A small wrinkle arises from the above definitions: if types A, B, and C are mutually recursive, we can have two types with the same hash. This is clearly undesirable. We fix this by making the order of recursion relevant: at each node in the tree, we rotate the value (bitwise) 1 bit to the left. A type that is included at recursion depth N has its contribution rotated by N bits. Note that this mechanism is entirely unnecessary for enumerations (they cannot contain other types); for enumerations, we just use the hash in lcmenum->hash. PSEUDO-CODE ``` v = compute_hash(type, parents) if type is member of parents return 0 v = K_type; for each members m of type v += compute_hash(m, [parents, type]) return rot_left(v); ``` When encoding/decode a type T, we would use compute_hash(T, []) as the hash function. EXAMPLE ``` C struct A { B b; C c; } struct B { A a; } struct C { B b; } ``` Diagrammatically, we can compute their hashes by showing the children of each branch. We use lower case to indicate a terminal leaf (where the leaf is the same class as one of its parents). ``` A B C / \ | | B C A B | | / \ | a B b C A | | / \ a b b c A() = R{K_A + R{K_B}} + R{K_C + R{K_B}}} B() = R{K_B + R{K_A + R{K_C}}} C() = R{K_C + R{K_B + R{K_A}}} ``` Note that without the rotations, B() == C(). ### Implementation For computing the fingerprint, the algorithm is simply: * Compute the **base hash** for the struct * Includes the fieldname in the base hash. * If the type is a primitive, include its typename in the base hash computation. * If the type is *not* a primitive, do not include in the base hash computation (as noted above). It will be included in the recursive computation. * Recursively compute the **fingerprint** for the struct. * This will recurse into non-primitive struct types, using their **base hash**. The following are pinned reference implementations for C on the parsing side, generating local struct hashes (non-recursive) as well as for the C binding type (where computation is then done recursively): * * Of course, this should reflect implementations for other languages. ### Related Work LCM is most similar to XDR, which is used in RPC and is described by RFC4506. Both use a C-like syntax (and even C keywords like "struct"). LCM differs in that its language is smaller: rarely-used features like unions are not supported. LCM does not support pointers: this eliminates the pointer-chasing problems that can arise in XDR. Variable-length arrays are supported in a more natural way in LCM, and LCM includes a type "signature" in the encoded data. This type signature allows run-time error detection. Data encoding representations are often compared to XML. XML and LCM serve very different functions. The verbosity and generic structure of XML are aids for agents to use information that they understand while safely skipping over properties that are alien to them. In contrast, LCM is designed for agents that are tightly coupled but that may not be in the same memory space. A more rigid type definition, along with space-efficient and computationally-efficient encodings, are better fits for these types of applications. ### Development History LCM's marshalling facilities were created for use on MIT's DARPA Urban Challenge vehicle, with development starting during the summer of 2006. Early versions supported many features that have since been deprecated: reducing the number of extraneous features has simplified the code base significantly, since most features typically impact several language back-ends (currently C, Java, and Python).