DataShape Types¶
In addition to defining the grammar, datashape specifies
a standard set of types and some properties those types should have.
Type constructors can be classified as dimension or dtype, and a
datashape is always composed of zero or more dimensions followed by
a dtype.
Dimension Types¶
Fixed Dimension¶
fixed[4]
A dimension whose size is specified. This is the most common
dimension type used in Blaze, and 4 * int32 is syntactic sugar for
fixed[4] * int32 in datashape syntax.
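The desugaring described above can be sketched in a few lines of Python. The helper name desugar_fixed is hypothetical and is not part of Blaze or any datashape library; it assumes a flat datashape whose parts are separated by *.

```python
def desugar_fixed(datashape: str) -> str:
    """Rewrite bare integer dimensions such as 4 into fixed[4]."""
    parts = [p.strip() for p in datashape.split("*")]
    # Every part except the last is a dimension; the last part is the dtype.
    dims, dtype = parts[:-1], parts[-1]
    expanded = [f"fixed[{d}]" if d.isdigit() else d for d in dims]
    return " * ".join(expanded + [dtype])


print(desugar_fixed("4 * int32"))        # fixed[4] * int32
print(desugar_fixed("4 * var * int32"))  # fixed[4] * var * int32
```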
Var Dimension¶
var
A dimension whose size may be different across instances.
A common use of this is a ragged array like 4 * var * int32.
Type Variables¶
typevar['DimName']
Constructs a type variable. DimName is syntactic sugar for
typevar['DimName']. This is used for pattern matching types,
particularly for function prototypes. For example, the
datashape (M * N * int32) -> N * int32 accepts an input
with two dimensions that are type variables, and returns a
one-dimensional array using one of those dimension types.
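To illustrate how dimension type variables might be bound during matching, here is a minimal sketch. The function match_dims is hypothetical, dimensions are represented as plain strings, and the sketch assumes that a name starting with an uppercase letter is a type variable and that a repeated variable must bind to the same dimension each time.

```python
def match_dims(pattern, concrete):
    """Bind type variables in pattern to concrete dims; None on mismatch."""
    if len(pattern) != len(concrete):
        return None
    bindings = {}
    for p, c in zip(pattern, concrete):
        if p[:1].isupper():
            # Type variable: bind it, or check it against the earlier binding.
            if bindings.setdefault(p, c) != c:
                return None
        elif p != c:
            # Concrete dimension in the pattern must match exactly.
            return None
    return bindings


# Input side of (M * N * int32) -> N * int32, applied to a 3 * 4 * int32 array.
bindings = match_dims(["M", "N"], ["3", "4"])
result_dims = [bindings["N"]]
print(bindings, result_dims)  # {'M': '3', 'N': '4'} ['4']
```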
Ellipsis¶
ellipsis
Constructs an ellipsis for matching multiple broadcast dimensions.
... is syntactic sugar for ellipsis.
ellipsis['DimVar']
Constructs a named ellipsis for matching multiple broadcast dimensions.
Dim... is syntactic sugar for ellipsis['Dim'].
DTypes¶
Boolean Type¶
bool
A boolean type which may take on two values, True and False.
In Blaze and DyND, this is stored as a single byte which may take on
the values 1 and 0.
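The one-byte layout can be illustrated with Python's struct module, used here only as a stand-in for the Blaze and DyND storage, which this page does not show.

```python
import struct

# "<?" packs a boolean using struct's standard size, which is one byte.
packed_true = struct.pack("<?", True)
packed_false = struct.pack("<?", False)

print(len(packed_true), packed_true, packed_false)  # 1 b'\x01' b'\x00'
```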
Arbitrary-Precision Integer¶
bignum or bigint
An integer type which has no minimum or maximum value. This is not implemented in Blaze or DyND presently and the final name for it hasn’t been locked down.
Signed Integer Types¶
int8
int16
int32
int64
int128
Integer types whose behavior follows that of two’s-complement integers of the given size.
Unsigned Integer Types¶
uint8
uint16
uint32
uint64
uint128
Integer types whose behavior follows that of unsigned integers of the given size.
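As a quick illustration of what the size suffix implies, the representable range of each signed and unsigned type follows directly from its bit width. The helper int_range is a hypothetical name used only for this sketch.

```python
def int_range(bits, signed):
    """Return (min, max) for an integer type of the given bit width."""
    if signed:
        # Two's complement: one more negative value than positive values.
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


print(int_range(8, signed=True))    # (-128, 127)
print(int_range(16, signed=False))  # (0, 65535)
```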
Platform-Specific Integer Aliases¶
intptr
uintptr
Aliases for int## and uint## where ## is the size in bits of a pointer
type on the platform.
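One way to see which concrete types these aliases would resolve to on the current machine is to ask Python for the native pointer size. This is only a sketch; the variable names are illustrative.

```python
import struct

# "P" is struct's format code for a native pointer (void *).
POINTER_BITS = struct.calcsize("P") * 8

INTPTR = f"int{POINTER_BITS}"
UINTPTR = f"uint{POINTER_BITS}"

print(INTPTR, UINTPTR)
```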
Binary Floating Point¶
float16
float32
float64
float128
Binary floating point types as defined by IEEE 754-2008. Each type
corresponds to the binary## type defined in the standard.
Note that float128 is not a C/C++ long double, except on
platforms where they coincide. NumPy defines a float128 on
some platforms which is not IEEE binary128, and is thus different
from DataShape’s type of the same name on those platforms.
TODO: Support for C/C++ long double. This is tricky given that
DataShape intends to be cross-platform, and maybe some inspiration can be taken from HDF5 for specifying them.
Decimal Floating Point¶
decimal32
decimal64
decimal128
Decimal floating point types as defined by IEEE 754-2008. These are not implemented in Blaze or DyND presently.
Void¶
void
A type which can store no data. It is not intended to be constructed
in concrete arrays, but to allow for things like function prototypes
with void return type.
String¶
string
A unicode string that can be arbitrarily sized. In Blaze and DyND, this is a UTF-8 encoded string.
string[16]
A unicode string in a UTF-8 fixed-size buffer. The string is zero-terminated, but as in NumPy, all bytes may be filled with character data so the buffer is not valid as a C-style string.
string['utf16']
A unicode string that can be arbitrarily sized, using the specified
encoding. Valid values for the encoding are 'ascii', 'utf8',
'utf16', 'utf32', 'ucs2', and 'cp###' for valid
code pages.
string[16, 'utf16']
A unicode string in a fixed-size buffer of the specified number of bytes, encoded as the requested encoding. The string is zero-terminated, but as in NumPy, all bytes may be filled with character data so the buffer is not valid as a C-style string.
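The fixed-size buffer behavior can be sketched as follows. The helper to_fixed_buffer is hypothetical, and the encoding names are Python codec names ("utf-8", "utf-16-le") rather than the datashape spellings listed above. Note how a string that fills the buffer leaves no room for a terminating zero.

```python
def to_fixed_buffer(text, size, encoding="utf-8"):
    """Encode text and zero-pad it to exactly size bytes."""
    data = text.encode(encoding)
    if len(data) > size:
        raise ValueError(f"{len(data)} encoded bytes do not fit in {size}")
    return data.ljust(size, b"\x00")


short = to_fixed_buffer("abc", 8)              # padded with zero bytes
full = to_fixed_buffer("a" * 16, 16)           # every byte is character data
wide = to_fixed_buffer("hi", 16, "utf-16-le")  # two bytes per character here

print(short)               # b'abc\x00\x00\x00\x00\x00'
print(b"\x00" in full)     # False
```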
Character¶
char
A value which contains a single unicode code point. Typically stored as a 32-bit integer.
Bytes¶
bytes
An arbitrarily sized blob of bytes. This is like bytes in Python 3.
bytes[16]
A fixed-size blob of bytes. Unlike the string case, this is not
zero-terminated; it is always exactly the specified number of bytes.
Categorical¶
categorical[['low', 'medium', 'high'], type=string, ordered=True]
Constructs a type whose values are constrained to a particular set.
The type parameter is optional and is inferred from the first argument.
The ordered parameter is a boolean indicating whether the values in the
set are ordered, so certain functions like min and max work.
Note: The categorical type assumes that the input categories are unique.
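As a rough illustration of why the ordered flag matters, the sketch below ranks values by their position in the category list so that min and max become meaningful. It assumes the listed order runs from smallest to largest, and the names used are illustrative rather than part of any datashape API.

```python
categories = ["low", "medium", "high"]

# The categorical type assumes unique categories, so check that up front.
if len(set(categories)) != len(categories):
    raise ValueError("categories must be unique")

# Position in the list defines the ordering of an ordered categorical.
rank = {name: index for index, name in enumerate(categories)}

values = ["medium", "low", "high", "medium"]
smallest = min(values, key=rank.__getitem__)
largest = max(values, key=rank.__getitem__)

print(smallest, largest)  # low high
```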
Records¶
struct[['name', 'age', 'height'], [string, int, real]]
Constructs a record type with the given field names and types.
{name: string, age: int} is syntactic sugar for
struct[['name', 'age'], [string, int]].
Tuples¶
tuple[[string, int, real]]
Constructs a tuple type with the given types. (string, int)
is syntactic sugar for tuple[[string, int]].
Function Prototype¶
funcproto[[string, int], bool]
Constructs a function prototype with the given argument and return types.
(string, int) -> bool is syntactic sugar for
funcproto[[string, int], bool].
Type Variables¶
typevar['DTypeName']
Constructs a type variable. DTypeName is syntactic sugar for
typevar['DTypeName']. This is used for pattern matching types,
particularly for function prototypes. For example, the
datashape (T, T) -> T accepts any type as input, but requires
that both arguments have the same type.
Option/Missing Data¶
option[float32]
Constructs a type based on the provided type which may have missing
values. ?float32 is syntactic sugar for option[float32].
The type inside the option parameter may also have its own dimensions,
for example ?3 * float32 is syntactic sugar for option[3 * float32].
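The two examples above suggest that a leading ? wraps everything that follows it. A minimal sketch of that rewriting, using a hypothetical helper desugar_option that only handles a ? at the start of the string:

```python
def desugar_option(datashape: str) -> str:
    """Rewrite a leading ? into an explicit option[...] constructor."""
    text = datashape.strip()
    if text.startswith("?"):
        # The option wraps the whole remainder, dimensions included.
        return f"option[{text[1:].strip()}]"
    return text


print(desugar_option("?float32"))      # option[float32]
print(desugar_option("?3 * float32"))  # option[3 * float32]
```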
Pointer¶
pointer[target=2 * 3 * int32]
Constructs a type whose value is a pointer to values of the target type.
Maps¶
Represents the type of key-value pairs. This is used for discovering foreign key relationships in relational databases, but is meant to be useful outside of that context as well. For example the type of a column of Python dictionaries whose keys are strings and values are 64-bit integers would be written as:
var * map[string, int64]
Date, Time, and DateTime¶
date
A type which represents a single date in the Gregorian calendar.
In DyND and Blaze, it is represented as a 32-bit signed integer offset
from the date 1970-01-01.
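The offset representation can be sketched with Python's datetime module. The text above does not state the unit of the offset; this sketch assumes whole days, and the helper names are hypothetical.

```python
import struct
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def date_to_offset(d):
    """Days from 1970-01-01 to d (assumed unit: days)."""
    return (d - EPOCH).days


def date_from_offset(offset):
    """Inverse of date_to_offset."""
    return EPOCH + timedelta(days=offset)


offset = date_to_offset(date(2000, 1, 1))
stored = struct.pack("<i", offset)  # 32-bit signed integer, little-endian

print(offset, len(stored))                # 10957 4
print(date_to_offset(date(1969, 12, 31)))  # -1
```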
time
time[tz='UTC']
Represents a time in an abstract day (no time zone), or a day with the specified time zone.
Stored as a 64-bit integer offset from midnight, in ticks (100 ns units).
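For a concrete sense of the tick count, the sketch below converts a Python time value into 100 ns ticks since midnight. The helper name time_to_ticks is hypothetical, and time zones are ignored.

```python
from datetime import time

TICKS_PER_SECOND = 10_000_000  # one tick is 100 ns
TICKS_PER_MICROSECOND = 10


def time_to_ticks(t):
    """Ticks elapsed since midnight for a naive time value."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds * TICKS_PER_SECOND + t.microsecond * TICKS_PER_MICROSECOND


print(time_to_ticks(time(12, 30)))  # 450000000000
```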
datetime
datetime[tz='UTC']
Represents a moment in time in an abstract time zone if no time zone is provided, otherwise stored as UTC but representing time in the specified time zone.
Stored as a 64-bit signed integer offset from 0001-01-01T00:00:00
in ticks (100 ns units), the “universal time scale” from the ICU
library. Follows the POSIX convention of ignoring leap seconds.
http://userguide.icu-project.org/datetime/universaltimescale
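The storage described above can be approximated in Python, whose datetime type also uses a proleptic Gregorian calendar starting at year 1 and ignores leap seconds. This is a sketch under that assumption; datetime_to_ticks is a hypothetical helper and handles only naive (time zone free) values.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1, 1, 1)    # 0001-01-01T00:00:00
TICKS_PER_MICROSECOND = 10   # one tick is 100 ns


def datetime_to_ticks(dt):
    """Ticks from 0001-01-01T00:00:00 to a naive datetime."""
    microseconds = (dt - EPOCH) // timedelta(microseconds=1)
    return microseconds * TICKS_PER_MICROSECOND


unix_epoch_ticks = datetime_to_ticks(datetime(1970, 1, 1))
print(unix_epoch_ticks)  # 621355968000000000
```

The result stays well inside the signed 64-bit range, whose maximum is 2**63 - 1.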
units['second', int64]
A type which represents a value with the units and type specified. Initially, only time units are supported, so that the datetime functionality works without adding a special “timedelta” type.
Initial valid units are: ‘100*nanosecond’ (ticks as in the datetime storage), ‘microsecond’, ‘millisecond’, ‘second’, ‘minute’, ‘hour’, ‘day’. Valid shortcuts for a context with more physical units still need to be decided, probably by adopting conventions from a good physical units library.
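Since every unit listed is a whole multiple of the 100 ns tick, conversions between them reduce to integer ratios. The table and helper names below are illustrative only and are not an API from Blaze or DyND.

```python
from fractions import Fraction

# Number of 100 ns ticks in one of each listed unit.
TICKS_PER_UNIT = {
    "100*nanosecond": 1,
    "microsecond": 10,
    "millisecond": 10_000,
    "second": 10_000_000,
    "minute": 600_000_000,
    "hour": 36_000_000_000,
    "day": 864_000_000_000,
}


def to_ticks(value, unit):
    """Express value, given in unit, as a count of ticks."""
    return value * TICKS_PER_UNIT[unit]


def convert(value, src, dst):
    """Convert between units exactly, returning a Fraction."""
    return Fraction(to_ticks(value, src), TICKS_PER_UNIT[dst])


print(to_ticks(1, "day"))              # 864000000000
print(convert(2, "hour", "minute"))    # 120
print(convert(90, "second", "minute")) # 3/2
```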
timetz
datetimetz
Represents a time/datetime with the time zone attached to the data. Not implemented in Blaze/DyND.