
Data formats

Hierarchical Data Format

Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data.

HDF5 simplifies the file structure to include only two major types of object:

  • Datasets, which are multidimensional arrays of a homogeneous type
  • Groups, which are container structures which can hold datasets and other groups

In addition to these advances in the file format, HDF5 includes an improved type system, and dataspace objects which represent selections over dataset regions. The API is also object-oriented with respect to datasets, groups, attributes, types, dataspaces and property lists.
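The group/dataset model can be sketched with the third-party h5py and NumPy packages (assumed to be installed; neither is mentioned in the source). The file, group, dataset, and attribute names below are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with h5py.File(path, "w") as f:
    # A group is a container, comparable to a directory.
    station = f.create_group("station_01")
    # A dataset is a multidimensional array with a single element type.
    temps = station.create_dataset(
        "temperature",
        data=np.arange(12, dtype="float64").reshape(3, 4),
    )
    # Attributes attach small pieces of metadata to an object.
    temps.attrs["units"] = "celsius"

with h5py.File(path, "r") as f:
    dset = f["station_01/temperature"]  # path-style lookup through the group
    shape = dset.shape
    units = dset.attrs["units"]
    first_row = dset[0, :]  # reads only the selected region of the dataset
```

The slice on the last line is the Python-level counterpart of a selection over a dataset region: only the requested row is read rather than the whole array.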

Because it uses B-trees to index table objects, HDF5 works well for time series data such as stock price series, network monitoring data, and 3D meteorological data. The bulk of the data goes into straightforward arrays (the table objects) that can be accessed much more quickly than the rows of an SQL database, but B-tree access is available for non-array data. The HDF5 data storage mechanism can be simpler and faster than an SQL star schema.

https://en.wikipedia.org/wiki/Hierarchical_Data_Format

Cap'n Proto

Cap'n Proto is an insanely fast data interchange format and capability-based RPC system. Think JSON, except binary. Or think Protocol Buffers, except faster.

Cap'n Proto gets a perfect score because there is no encoding/decoding step. The Cap'n Proto encoding is appropriate both as a data interchange format and an in-memory representation, so once your structure is built, you can simply write the bytes straight out to disk!
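The idea of an in-memory layout that doubles as the interchange format can be illustrated with Python's standard `ctypes` module. This is only a sketch of the general principle, not Cap'n Proto's actual wire format or API; the `Sample` structure and its fields are hypothetical.

```python
import ctypes


class Sample(ctypes.LittleEndianStructure):
    # Hypothetical fixed-layout record: two naturally aligned 8-byte fields.
    _fields_ = [
        ("timestamp", ctypes.c_uint64),
        ("value", ctypes.c_double),
    ]


sample = Sample(timestamp=1700000000, value=21.5)

# "Serializing" is just taking the bytes the structure already occupies.
wire = bytes(sample)

# "Deserializing" overlays the same layout on a received buffer.
# from_buffer shares memory with buf, so no fields are parsed or copied.
buf = bytearray(wire)
view = Sample.from_buffer(buf)
read_value = view.value
```

Because every field sits at a fixed offset, a reader can access `view.value` directly without walking or decoding the rest of the message.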

https://capnproto.org

https://github.com/capnproto/capnproto

Apache Thrift

Thrift is an interface definition language and binary communication protocol used for defining and creating services for numerous languages. It forms a remote procedure call (RPC) framework and was developed at Facebook for "scalable cross-language services development". It combines a software stack with a code generation engine to build cross-platform services that can connect applications written in a variety of languages and frameworks, including ActionScript, C, C++, C#, Cappuccino, Cocoa, Delphi, Erlang, Go, Haskell, Java, Node.js, Objective-C, OCaml, Perl, PHP, Python, Ruby and Smalltalk.

https://en.wikipedia.org/wiki/Apache_Thrift