.. _storage_layout:

==============
Storage Layout
==============

plateau structures your data using these concepts:

- One whole unit of data that plateau manages is called a *dataset*.
- A dataset consists of one or more *tables*, each of which has a *schema*.
- Table rows are partitioned by any number of columns: rows that share the same
  combination of values in these columns are grouped together.
- A partition consists of one or more Parquet files, each holding a chunk of
  rows that were written together.
- plateau can also build an index on any number of columns, which speeds up
  finding the Parquet files that contain a given value of an indexed column.

A general plateau storage layout thus looks as follows::

    ─ .by-dataset-metadata.json
    ─ /
    ├── /
    │   ├── _common_metadata
    │   ├── =value/
    │   │   ├── =value/
    │   │   │   ├── ...
    │   │   │   ├── =value/
    │   │   │   │   ├── df1.parquet
    │   │   │   │   ├── df2.parquet
    │   │   │   │   └── ...
    │   │   │   ├── =value/
    │   │   │   │   ├── df1.parquet
    │   │   │   │   ├── df2.parquet
    │   │   │   │   └── ...
    │   │   │   └── ...
    │   │   ├── =value/
    │   │   │   ├── ...
    │   │   │   ├── =value/
    │   │   │   │   ├── df1.parquet
    │   │   │   │   ├── df2.parquet
    │   │   │   │   └── ...
    │   │   │   ├── =value/
    │   │   │   │   ├── df1.parquet
    │   │   │   │   ├── df2.parquet
    │   │   │   │   └── ...
    │   │   │   └── ...
    │   │   └── =value/ ...
    │   ├── =value/ ...
    │   └── =value/ ...
    ├── / ...
    ├── / ...
    └── indices/
        ├── /
        │   └── .by-dataset-index.parquet
        ├── / ...
        └── / ...

Where:

- ``.by-dataset-metadata.json`` contains the ``DatasetMetadata`` you have seen
  above.
- Each table directory contains the data for one table of the dataset,
  partitioned by N >= 0 columns. Its directory structure is N levels deep.
- ``_common_metadata`` contains the table schema. It is always identical for
  all Parquet files of a table (``df1.parquet``, ``df2.parquet``, ...).
- ``indices`` contains an index for each indexed column, for quickly looking up
  the rows whose column value matches a given value.
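
The partitioning and indexing rules described above can be illustrated with a
small, standard-library-only sketch. It is not plateau's implementation and
uses no plateau API: the helpers ``partition_dir``, ``group_by_partition`` and
``build_index``, the table name ``table`` and the sample columns are all
hypothetical. Values are rendered with ``str()`` without the escaping a real
writer would likely need, and the index is a plain in-memory ``dict`` rather
than a ``.by-dataset-index.parquet`` file.

.. code-block:: python

   from collections import defaultdict
   from typing import Any, Dict, List, Sequence, Set

   Row = Dict[str, Any]


   def partition_dir(table: str, row: Row, partition_on: Sequence[str]) -> str:
       """Hypothetical helper: one ``column=value`` level per partition column.

       Values are formatted with str() and NOT escaped (an assumption of this sketch).
       """
       parts = [table] + [f"{col}={row[col]}" for col in partition_on]
       return "/".join(parts) + "/"


   def group_by_partition(
       table: str, rows: List[Row], partition_on: Sequence[str]
   ) -> Dict[str, List[Row]]:
       """Group rows that share the same combination of partition values."""
       groups: Dict[str, List[Row]] = defaultdict(list)
       for row in rows:
           groups[partition_dir(table, row, partition_on)].append(row)
       return dict(groups)


   def build_index(groups: Dict[str, List[Row]], column: str) -> Dict[Any, Set[str]]:
       """Map each value of ``column`` to the partition directories containing it."""
       index: Dict[Any, Set[str]] = defaultdict(set)
       for directory, members in groups.items():
           for row in members:
               index[row[column]].add(directory)
       return dict(index)


   # Hypothetical sample data, partitioned on two columns (N = 2).
   rows = [
       {"country": "DE", "year": 2020, "product": "a"},
       {"country": "DE", "year": 2020, "product": "b"},
       {"country": "DE", "year": 2021, "product": "a"},
       {"country": "US", "year": 2021, "product": "c"},
   ]

   groups = group_by_partition("table", rows, ["country", "year"])
   index = build_index(groups, "product")

   print(sorted(groups))
   # ['table/country=DE/year=2020/', 'table/country=DE/year=2021/', 'table/country=US/year=2021/']
   print(sorted(index["a"]))
   # ['table/country=DE/year=2020/', 'table/country=DE/year=2021/']

With two partition columns every group lands two directory levels below the
table directory, matching the N-levels-deep structure, and a lookup for
``product == "a"`` only needs to visit two of the three partitions.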