plateau.io_components.utils module

This module is a collection of helper functions.

class plateau.io_components.utils.InvalidObject[source]

Bases: object

Sentinel to mark keys for removal.

plateau.io_components.utils.align_categories(dfs, categoricals)[source]

Takes a list of dataframes with categorical columns and determines the superset of categories. All specified columns are then cast to the same pd.CategoricalDtype.

Parameters:
  • dfs (List[pd.DataFrame]) – A list of dataframes for which the categoricals should be aligned

  • categoricals (List[str]) – Columns holding categoricals which should be aligned

Returns:

A list with aligned dataframes

Return type:

List[pd.DataFrame]
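
To illustrate the alignment described above, here is a minimal pandas sketch. It is not plateau's implementation: the name align_categories_sketch is hypothetical, and the ordering of the combined categories (first appearance across the inputs) is an assumption, since the description only states that a superset is determined.

```python
import pandas as pd


def align_categories_sketch(dfs, categoricals):
    """Illustrative only: cast each listed column to one shared CategoricalDtype."""
    # Work on copies so the caller's dataframes are left untouched.
    aligned = [df.copy() for df in dfs]
    for column in categoricals:
        # Collect the union of observed values in order of first appearance
        # (assumption: the real function may order categories differently).
        categories = []
        seen = set()
        for df in dfs:
            for value in df[column].dropna().unique():
                if value not in seen:
                    seen.add(value)
                    categories.append(value)
        shared_dtype = pd.CategoricalDtype(categories, ordered=False)
        for df in aligned:
            df[column] = df[column].astype(shared_dtype)
    return aligned


left = pd.DataFrame({"color": pd.Categorical(["red", "blue"])})
right = pd.DataFrame({"color": pd.Categorical(["green", "red"])})
aligned_left, aligned_right = align_categories_sketch([left, right], ["color"])
```

After the call, both color columns share one dtype whose categories cover every value seen in either input, which is what makes the frames safe to concatenate without falling back to object dtype.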

plateau.io_components.utils.combine_metadata(dataset_metadata: list[dict], append_to_list: bool = True) → dict[source]

Merge a list of dictionaries.

The merge keeps only the keys that are present in all dictionaries.

If lists are encountered, the resulting value is the concatenation of all list values, in the order of the supplied dictionaries. This behaviour can be changed with append_to_list.

Parameters:
  • dataset_metadata – The list of dictionaries (usually metadata) to be combined.

  • append_to_list – If True, all list values are concatenated. If False, only unique values are kept.
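
The merge rules above can be sketched in plain Python for flat dictionaries. This is a reading of the description, not plateau's code: combine_metadata_sketch is a hypothetical name, nested dictionaries are not handled, and dropping keys whose non-list values disagree is an assumption, since the description does not say how conflicts are resolved.

```python
def combine_metadata_sketch(dataset_metadata, append_to_list=True):
    """Illustrative only: merge flat dictionaries as the description reads."""
    if not dataset_metadata:
        return {}
    # Keep only keys that every dictionary contains, in first-dict order.
    shared_keys = [
        key
        for key in dataset_metadata[0]
        if all(key in meta for meta in dataset_metadata[1:])
    ]
    result = {}
    for key in shared_keys:
        values = [meta[key] for meta in dataset_metadata]
        if all(isinstance(value, list) for value in values):
            # Concatenate in the order of the supplied dictionaries.
            merged = [item for value in values for item in value]
            if not append_to_list:
                # Drop repeats while keeping first-seen order
                # (requires hashable list items).
                merged = list(dict.fromkeys(merged))
            result[key] = merged
        elif all(value == values[0] for value in values[1:]):
            result[key] = values[0]
        # Assumption: keys with conflicting non-list values are dropped.
    return result


metadata = [
    {"owner": "team-a", "tags": ["raw", "daily"], "source": "api"},
    {"owner": "team-a", "tags": ["daily", "clean"]},
]
combined = combine_metadata_sketch(metadata)
deduplicated = combine_metadata_sketch(metadata, append_to_list=False)
```

In this sketch, source is dropped because it appears in only one dictionary, and tags contains "daily" twice in combined but once in deduplicated.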

plateau.io_components.utils.extract_duplicates(lst)[source]

Return all items of a list that occur more than once.

Parameters:

lst (List[Any])

Returns:

The items of lst that occur more than once

Return type:

List[Any]
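
One simple way to obtain such a result is to count occurrences with collections.Counter. The sketch below uses the hypothetical name extract_duplicates_sketch and assumes hashable items; whether plateau returns each duplicate once and in which order is not stated above, so treat those details as assumptions.

```python
from collections import Counter


def extract_duplicates_sketch(lst):
    """Illustrative only: return each item that occurs more than once."""
    counts = Counter(lst)
    # Counter keeps insertion order, so duplicates come back in the
    # order in which they first appeared in the input.
    return [item for item, count in counts.items() if count > 1]


duplicates = extract_duplicates_sketch(["a", "b", "a", "c", "b", "a"])
```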

plateau.io_components.utils.raise_if_indices_overlap(partition_on, secondary_indices)[source]

plateau.io_components.utils.sort_values_categorical(df: DataFrame, columns: list[str] | str) → DataFrame[source]

Sort a dataframe lexicographically by the categories of the given column or columns.
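
By default, pandas sorts a categorical column by the position of each category rather than by its label, so a lexicographic sort needs the categories reordered first. The sketch below shows one way to do that. The name sort_values_categorical_sketch is hypothetical, and resetting the index after sorting is an assumption rather than documented behaviour.

```python
import pandas as pd


def sort_values_categorical_sketch(df, columns):
    """Illustrative only: sort by category labels instead of category position."""
    if isinstance(columns, str):
        columns = [columns]
    result = df.copy()
    for column in columns:
        if isinstance(result[column].dtype, pd.CategoricalDtype):
            accessor = result[column].cat
            # Put the categories in sorted order so that sort_values
            # follows the labels lexicographically.
            result[column] = accessor.reorder_categories(
                sorted(accessor.categories), ordered=True
            )
    # Assumption: the sorted frame gets a fresh 0..n-1 index.
    return result.sort_values(by=columns).reset_index(drop=True)


frame = pd.DataFrame(
    {
        "size": pd.Categorical(
            ["medium", "small", "large"],
            categories=["small", "medium", "large"],
        ),
        "count": [2, 1, 3],
    }
)
sorted_frame = sort_values_categorical_sketch(frame, "size")
```

Here a plain frame.sort_values("size") would give small, medium, large (category position), whereas the sketch gives large, medium, small (label order).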

plateau.io_components.utils.validate_partition_keys(dataset_uuid, store, ds_factory, default_metadata_version, partition_on)[source]