A high-level type system for the Free Desktops
==============================================
Desktop environments are not just for starting applications anymore.
Data is flowing freely between well-integrated components, and the
easier the data flows, the better the integration of the components.
Not all components are written in the same programming language, of
course, and when letting data flow between them, it needs to be
represented in many different ways. For example, GConf stores values
differently than they travel over D-Bus, which is different again from
how they are passed as GValues to signal handlers, which is different
from how Perl wants to store it.
The desktop environment is heading towards a cooperative, dynamic
environment, and it needs a rich and strong type system to tie its
components together. Sending lines of text over pipes and matching
them against ad-hoc regular expressions just doesn't cut it.
In an attempt to define such a common type system, this document
collects many different systems for representing values, and unifies
them by mapping the common dynamic type system into them.
The common type system defined here is rich enough to represent any
reasonable value; it's roughly equivalent to what dynamic languages
like Perl and Python have.
But it goes one crucial step further: it allows the definition of new
abstract, intentional types. Intentional types give additional
information about a value that is not available from the
representation alone.
For example, a integer can be used to denote a point in time by saying
that it is the number of seconds since a certain epoch. All places
that interact with such a value need to agree on this intention.
This agreement can happen informally, via documentation or just plain
common sense. Nothing wrong with that. It is, however, also helpful
to formalize this so that documentation can be precise without much
extra effort, up to a level where the machine itself is able to check
whether everybody agrees on the intentional types.
The age old battle between static and dynamic types also matters here:
how much type information should be associated with the values
themselves? The boundary is exactly between intentional and
representational types. Intentional types are those that only the
programmer or compiler know about, representational types are those
that are only known at run-time.
In a completely statically typed language, we only have raw bytes at
run-time without any representational type information. All parts of
the program need to agree that the intention is for those four bytes
over there to be used as a 32-bit integer. Statically typed programs
are littered with declarations of intentional types, and language
processors use them to (more or less) check program consistency and to
select the right division instruction based on whether the four bytes
over there are intended to be a signed number or an unsigned one.
In a dynamically typed language, values carry a lot of
representational type information. Code can easily be polymorphic and
do different things depending on whether a value is an integer or a
string. It can also perform consistency checks at run-time, which is
more robust than doing it at compile time, but doesn't go as far since
intentional types are not available.
Dynamic languages often don't have any means to declare intentional
types for the benefit of the compiler; they only exist in the head of
the programmer who expresses them by selecting the right operation
manually. For example, if a string is intended to be a URL, you need
to use 'utils.net.web.url.get_scheme (url)' explicitly. If the
intentional type could have been declared in the language, it could
have selected the right function automatically from just 'url.scheme()'.
Thus, and coming back to the ground now, we define a concrete type
system here with a rich representational part and a lightweight and
flexible intentional part.
For the representational part, we define how it is implemented for a
number of existing value systems. For the intentional part, we define
how it can be translated into a number of languages, both those with
static type declaration and those where intent is mainly expressed by
manually selecting the right operations.
Intentional types are not optional; they are generally needed to make
sense of values. A programmer learns about them by reading
documentation; if a debugging tool needs to find out a intentional
type at run-time, there must be some way to find it.
This means that declaration languages like D-Bus introspection
facilities and GConf schemas need to be extended to support our
intentional types. Thus, purely declarative languages like these are
also included in our list of supported languages.
[C]
----
/* Witty example here. */
----
We also give a list of common intentional types, of course.
This document then has three dimensions of extensibility:
- A new value system can be added by defining how the representational
part of the common type system maps to it.
- A new language can be added by defining how intentional types are
implemented in it, and by implementing all common intentional types.
- A new common intentional type can be added by defining it and
implementing it in all languages.
The representational part of the common type system is not supposed to
change frequently, but adding a new intentional type should be
considered routine.
The representation part of the common type system is restricted by the
lowest common denominator of all the value system implementations that
we want to include. We don't want to distort the existing value
systems too much, and force people to write code that feels unnatural
for them.
For example, not all value systems can directly represent complex
numbers or multiple precision integers, but any grown up type system
should include them. We solve this conflict by relying on the
intentional types: Instead of grafting complex numbers onto every
value system, we only define a intentional type for them.
Currently supported value systems: QVariant, D-Bus messages, GValue,
GConfValue, GVariant, Python values, Perl values, JavaScript values,
GKeyFile, JSON, YAML, Nepomuk ontologies, SQL, SparQL, Common Lisp
values.
Currently supported languages: Python, Perl, JavaScript, Java, C#, C++
with QVariant, plain C++, C with D-Bus/GValue/GConfValue/GVariant,
plain C, Vala, D-Bus introspection, D-Bus IDL (didl), GConf schema,
our own XML schema.
Representational types
----------------------
Converting a value from one representation to another is not
guaranteed to be loss-less: if you convert the value back, it might be
different and even have a different type. Intentional types are used
to make sense of the value anyway. [ XXX - so maybe we shouldn't
bother with representational types at all... ]
Whenever there is a choice of representation in the following table,
it should be taken to mean: Represent the value with the first
alternative in the list that is possible, even if that loses
precision.
For example, a 64 bit signed integer is represented in GConf as a
"int" if it fits, and as a "double" if not. It will always fit into a
double, but it might mean chopping off some low bits.
What we are defining then is nothing more than the rules for
converting values between different representations.
- null
+
The null value.
+
QVariant: QVariant::Null
D-Bus: '()'
GValue: G_TYPE_NONE
GConf: empty GCONF_VALUE_LIST with type GCONF_VALUE_BOOL
GVariant: '()'
Perl: undef
Python 2: None
CL: nil
- bool
+
A boolean
+
QVariant: QVariant::Bool
D-Bus: 'b'
GValue: G_TYPE_BOOLEAN
GConf: GCONF_VALUE_BOOL
GVariant: 'b'
Perl: number, 0 or 1.
Python 2: number, 0 or 1.
CL: nil or t
- int32
+
Signed 32 bit integer
+
QVariant: QVariant::Int
D-Bus: 'i'
GValue: G_TYPE_INT
GConf: GCONF_VALUE_INT
GVariant: 'i'
Perl: number
Python 2: int
CL: number
- int64
+
Signed 64 bit integer
+
QVariant: QVariant::LongLong
D-Bus: 'x'
GValue: G_TYPE_INT64
GConf: GCONF_VALUE_INT or GCONF_VALUE_DOUBLE
GVariant: 'x'
Perl: number
Python 2: int or long
CL: number
- uint32
+
Unsigned 32 bit integer
+
QVariant: QVariant::UInt
D-Bus: 'u'
GValue: G_TYPE_UINT
GConf: GCONF_VALUE_INT or GCONF_VALUE_DOUBLE
GVariant: 'u'
Perl: number
Python 2: int or long
CL: number
- uint64
+
Unsigned 64 bit integer
+
QVariant: QVariant::ULongLong
D-Bus: 't'
GValue: G_TYPE_UINT64
GConf: GCONF_VALUE_INT or GCONF_VALUE_DOUBLE
GVariant: 't'
Perl: number
Python 2: int or long
CL: number
- double
+
Double precision floating point number
+
QVariant: QVariant::Double
D-Bus: 'd'
GValue: G_TYPE_DOUBLE
GConf: GCONF_VALUE_DOUBLE
GVariant: 'd'
Perl: number
Python 2: double
CL: number
- string
+
String of Unicode code points
+
QVariant: QVariant::QString
D-Bus: 's'
GValue: G_TYPE_STRING
GConf: GCONF_VALUE_STRING, UTF-8.
GVariant: 's'
Perl: string
Python 2: unicode
CL: string
- list
+
List of values
+
QVariant: QVariant::List
D-Bus: 'av'
GValue: G_TYPE_POINTER pointing to a GSList of GValues.
(XXX - find something better, must be somewhere.)
GConf: GCONF_VALUE_LIST or chained GCONF_VALUE_PAIRs
GVariant: 'av'
Perl: array
Python 2: list
CL: list
- map
+
Mapping from strings to values, with no duplicated keys.
+
QVariant: QVariant::Map
D-Bus: 'a{sv}'
GValue: G_TYPE_HASH_TABLE (?)
GConf: Chain of GCONF_VALUE_PAIRs,
with the cars being a pair of GCONF_VALUE_STRING and an
arbitrary value.
GVariant: 'a{sv}'
Perl: hash
Python: dict
CL: alist
Association trees
-----------------
A map (or dictionary, or hashtable) is a useful data structure, and by
nesting them you can express a lot of interesting things. Two things
are missing however: maps do not preserve the order of their entries,
and they don't allow duplicate keys. These two things are often
necessary.
So, we like to use lists insteads of maps and define a couple of
conventions how to express associations between things with them. The
resulting values are called "association trees".
An association tree always has a name (which is always a string), and
has either a single value, or a list of subordinate association trees.
An association tree is represented as a list. The first element is
the name. If the association tree is of the first form, then the list
always has a second element, which is the value. In the second form,
the remaining elements of the list after the name are the subordinate
association trees.
A couple of example will hopefully show that this is all very simple:
- The first form, essentially a key/value pair:
[ 'foo', 12 ]
- The second form, with two sub-trees:
[ 'foo', [ 'bar', 42 ], [ 'baz', 'quux' ]]
The second example can be interpreted as mapping the "key path"
foo/bar to 12, and the path foo/baz to 'quux'.
As you can see, the two forms of a assocation tree can not be reliably
distinguished from each other just by looking at them. This tree
[ 'foo', [ 'bar', 12 ]]
can be interpreted both as mapping foo to the value [ 'bar', 12 ], and
as mapping foo/bar to 12. You have to know what is intended and use
the right interpretation.
As a special case, a association tree with an empty list of sub-trees
can be expressed just with a string, which is the name of the tree.
Thus, the following two lines are equivalent:
[ 'foo ' ]
'foo'
Language bindings for association trees usually offer the following
operations:
name (tree): returns the name of the tree (which is either the first
element of tree if it is a list, or tree itself if
it isn't)
value (tree): returns the value of the tree (which is the second
element of tree, which must be a list)
node (tree, name): returns the first sub-tree with the given name of tree.
Usually, convenience functions are provided as well that make
accessing values and nodes at the end of a key path less verbose:
value (tree, name1) == value (node (tree, name1))
value (tree, name1, name2) == value (node (node (tree, name1), name2))
..etc..
node (tree, name1, name2) == node (node (tree, name1), name2))
..etc..
Of course, association trees are one of the pre-defined intentional
types.
Association trees are very useful. This Desktop Types system uses
them for type definitions, for example, and also for type references.
A Nano-DOM
----------
Association trees (see above) can be used to represent a subset of
XML. This is useful when the small subset suffices but you still want
to be enterprise ready. Intentional type definitions use this subset,
for example, and are thus easily handled at run-time.
Association trees tht are used to represent XML are called Nano-DOM
here, since they can fulfill the role of a document object model.
Converting a piece of XML into its Nano-DOM representation proceeds
according to simple rules:
- First, all attributes of elements are converted to child elements,
in order and at the beginning. Thus, the following XML fragments
are equivalent:
/doc (string) The documentation for parameter
. - type/parms/
/type (type) The type for parameter
.
XXX - this doesn't really work since XML can only express association trees.
As an example, consider a hypothetical XML schema for describing
key-value pairs. Let's also assume that this schema follows our
Nano-DOM rules. It has a "key" element which needs name, doc and type
attributes. The "type" attribute should refer to an intentional type
of course. We can describe a key for the current temperature,
expressed as one of "low", "medium", "high", in the following ways.
First, we can refer to the predefined "three-level-enum" type, if
there would be such a type. Documentation of the possible values is
left to the definition of "three-level-enum", which presumably would
tell us that they are the strings "low", "medium", and "high".