Monday, February 01, 2010

Google AppEngine and BigTable

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. 

  • A Bigtable is a sparse, distributed, persistent multidimensional sorted map.
  • The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
  • Unlike most map implementations, Bigtable keeps its key/value pairs in strict lexicographic order by row key. That is, the row with key "aaa" sits right next to the row with key "aab" and very far from the row with key "zzz".
  • Bigtable is built on a distributed file system, so the underlying file storage can be spread across an array of independent machines.
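
As a rough illustration of the sorted-map idea above, here is a minimal in-memory sketch in Python. It is not Bigtable's API: the class name TinySortedMap and its put/get/scan methods are made up for this example, and everything lives in one process rather than on a distributed file system. Timestamps are negated inside the key so that the newest version of a cell sorts first.

```python
import bisect


class TinySortedMap:
    """Toy, single-process stand-in for a Bigtable-style sorted map (hypothetical)."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp) tuples
        self._values = {}  # key tuple -> uninterpreted bytes

    def put(self, row, column, timestamp, value):
        # Negate the timestamp so newer versions sort before older ones.
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the most recent value stored for (row, column), or None."""
        i = bisect.bisect_left(self._keys, (row, column))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


table = TinySortedMap()
table.put("zzz", "contents", 1, b"far away")
table.put("aab", "contents", 1, b"neighbor")
table.put("aaa", "contents", 1, b"old")
table.put("aaa", "contents", 2, b"new")

# Rows come back in key order no matter the insertion order.
scanned_rows = [row for row, _, _, _ in table.scan("aaa", "aac")]
latest = table.get("aaa", "contents")
```

Because neighboring keys are stored next to each other, a range scan such as scan("aaa", "aac") only walks the adjacent entries and never touches "zzz".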
Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. The Bigtable paper describes the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, as well as the design and implementation of Bigtable.



The AppEngine Datastore is:
  • Transactional
  • Natively Partitioned
  • Hierarchical
  • Schema-less
  • Based on Bigtable
  • Not a relational database
  • Not a SQL Engine

The basic unit of the Datastore storage model is the entity, which consists of the following:
  1. Kind (a table or, more correctly, a class)
  2. Key (primary key)
  3. Entity Group (partition)
  4. 0..N typed properties (name/value pairs, similar to columns in a relational table)
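
To make the four parts above concrete, here is a small sketch using plain Python dataclasses. This is not the App Engine SDK: the Key and Entity classes and their fields are hypothetical names chosen for illustration. It assumes that an entity's group is identified by the root of its key's ancestor chain, which is also where the "hierarchical" part of the Datastore shows up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Key:
    """Hypothetical key: a kind, a name, and an optional parent key."""
    kind: str
    name: str
    parent: Optional["Key"] = None

    def root(self) -> "Key":
        # Walk up the ancestor chain; the root key stands for the entity group.
        key = self
        while key.parent is not None:
            key = key.parent
        return key


@dataclass
class Entity:
    """Hypothetical entity: a key plus 0..N schema-less properties."""
    key: Key
    properties: Dict[str, Any] = field(default_factory=dict)

    @property
    def kind(self) -> str:
        return self.key.kind

    @property
    def entity_group(self) -> Key:
        return self.key.root()


author_key = Key("Author", "alice")
post_key = Key("Post", "hello-world", parent=author_key)

author = Entity(author_key, {"email": "alice@example.com"})
post = Entity(post_key, {"title": "Hello", "views": 3})

# Both entities share the same root key, so they fall in the same entity group.
same_group = author.entity_group == post.entity_group
```

Note that the two entities carry different property names and types without any schema being declared up front, which is what "schema-less" means in the list above.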