disdrodb.l0 package

Subpackages

Submodules

disdrodb.l0.check_configs module

class disdrodb.l0.check_configs.NetcdfEncodingSchema(*, contiguous: bool, dtype: str, zlib: bool, complevel: int, shuffle: bool, fletcher32: bool, chunksizes: Optional[Union[int, List[int]]] = None)[source]

Bases: BaseModel

classmethod check_chunksizes(v, values)[source]
classmethod check_fletcher32(v, values)[source]
classmethod check_zlib(v, values)[source]
chunksizes: Optional[Union[int, List[int]]]
complevel: int
contiguous: bool
dtype: str
fletcher32: bool
shuffle: bool
zlib: bool
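The three validators above suggest cross-field constraints between the encoding options. The sketch below is not the disdrodb implementation: it uses a plain dataclass instead of pydantic, and it assumes the checks mirror the netCDF4/HDF5 rule that contiguous storage cannot be combined with compression, checksums or chunking. The class name EncodingSketch and the error messages are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class EncodingSketch:
    """Hypothetical stand-in for NetcdfEncodingSchema (assumed rules)."""

    contiguous: bool
    dtype: str
    zlib: bool
    complevel: int
    shuffle: bool
    fletcher32: bool
    chunksizes: Optional[Union[int, List[int]]] = None

    def __post_init__(self):
        # Assumption: a contiguous variable cannot be compressed,
        # checksummed or chunked (HDF5/netCDF4 storage constraint).
        if self.contiguous and self.zlib:
            raise ValueError("'zlib' must be False when 'contiguous' is True")
        if self.contiguous and self.fletcher32:
            raise ValueError("'fletcher32' must be False when 'contiguous' is True")
        if self.contiguous and self.chunksizes:
            raise ValueError("'chunksizes' must be empty when 'contiguous' is True")


# A chunked, compressed encoding passes the assumed checks.
ok = EncodingSketch(contiguous=False, dtype="float32", zlib=True, complevel=3,
                    shuffle=True, fletcher32=False, chunksizes=[5000])

# A contiguous variable with zlib enabled is rejected.
try:
    EncodingSketch(contiguous=True, dtype="float32", zlib=True, complevel=3,
                   shuffle=True, fletcher32=False)
    rejected = False
except ValueError:
    rejected = True
```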
class disdrodb.l0.check_configs.RawDataFormatSchema(*, n_digits: Optional[int] = None, n_characters: Optional[int] = None, n_decimals: Optional[int] = None, n_naturals: Optional[int] = None, data_range: Optional[List[float]] = None, nan_flags: Optional[str] = None, valid_values: Optional[List[float]] = None, dimension_order: Optional[List[str]] = None, n_values: Optional[int] = None)[source]

Bases: BaseModel

classmethod check_list_length(value)[source]
data_range: Optional[List[float]]
dimension_order: Optional[List[str]]
n_characters: Optional[int]
n_decimals: Optional[int]
n_digits: Optional[int]
n_naturals: Optional[int]
n_values: Optional[int]
nan_flags: Optional[str]
valid_values: Optional[List[float]]
exception disdrodb.l0.check_configs.SchemaValidationException[source]

Bases: Exception

Exception raised when schema validation fails.

disdrodb.l0.check_configs.check_all_sensors_configs() None[source]

Check all sensors configs.

disdrodb.l0.check_configs.check_bin_consistency(sensor_name: str) None[source]

Check bin consistency from config file.

The first and last bins are not checked.

Parameters

sensor_name (str) – Name of the sensor.
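As an illustration of what a bin consistency check could look like, the sketch below compares each inner bin's center and width against its bounds and skips the first and last bins. The function name, the tolerance and the exact rules are assumptions; the actual checks in disdrodb may differ.

```python
def check_bins_sketch(centers, bounds, widths, tol=1e-6):
    """Return the indices of inner bins whose center or width disagrees with the bounds."""
    inconsistent = []
    # The first and last bins are skipped, as stated in the documentation.
    for i in range(1, len(centers) - 1):
        lower, upper = bounds[i]
        center_ok = abs((lower + upper) / 2 - centers[i]) <= tol
        width_ok = abs((upper - lower) - widths[i]) <= tol
        if not (center_ok and width_ok):
            inconsistent.append(i)
    return inconsistent


# Made-up bin definitions, for illustration only.
centers = [0.05, 0.15, 0.30, 0.50]
bounds = [[0.0, 0.1], [0.1, 0.2], [0.2, 0.4], [0.4, 0.6]]
widths = [0.1, 0.1, 0.2, 0.2]
inconsistent_bins = check_bins_sketch(centers, bounds, widths)
```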

disdrodb.l0.check_configs.check_cf_attributes(sensor_name: str) None[source]

Check that variable_description, variable_long_name, variable_units dict values are strings.

Parameters

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_l0a_encoding(sensor_name: str) None[source]

Check l0a_encodings.yml file.

Parameters

sensor_name (str) – Name of the sensor.

Raises

ValueError – Error raised if the value of a key is not in the list of accepted values.

disdrodb.l0.check_configs.check_l0b_encoding(sensor_name: str) None[source]

Check l0b_encodings.yml file based on the schema defined in the class NetcdfEncodingSchema.

Parameters

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_raw_array(sensor_name: str) None[source]

Check raw array consistency from config file.

Parameters

sensor_name (str) – Name of the sensor.

Raises

ValueError – Error if the chunksizes are not consistent.

disdrodb.l0.check_configs.check_raw_data_format(sensor_name: str) None[source]

Check raw_data_format.yml file based on the schema defined in the class RawDataFormatSchema.

Parameters

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_sensor_configs(sensor_name: str) None[source]

Check sensor configs.

Parameters

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_variable_consistency(sensor_name: str) None[source]

Check variable consistency across config files.

The variables specified within l0b_encoding.yml must be defined also in the other config files.

Parameters

sensor_name (str) – Name of the sensor.

Raises

ValueError – If the keys are not consistent.
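A minimal sketch of such a key comparison is shown below. It assumes each config file has already been loaded into a dictionary keyed by variable name; the function name and the example file names are hypothetical.

```python
def check_keys_sketch(reference_keys, other_configs):
    """Raise ValueError if a config lacks a variable listed in the reference keys."""
    reference = set(reference_keys)
    for file_name, keys in other_configs.items():
        missing = reference - set(keys)
        if missing:
            raise ValueError(f"{file_name} does not define: {sorted(missing)}")


# Variables taken from the (assumed) L0B encoding config.
encoding_variables = ["rainfall_rate", "raw_drop_number"]

# Hypothetical contents of two other config files.
consistent = {
    "variable_units.yml": {"rainfall_rate": "mm/h", "raw_drop_number": ""},
    "variable_long_name.yml": {"rainfall_rate": "Rain rate", "raw_drop_number": "Raw counts"},
}
check_keys_sketch(encoding_variables, consistent)  # passes silently
```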

disdrodb.l0.check_configs.check_yaml_files_exists(sensor_name: str) None[source]

Check if all config YAML files exist.

Parameters

sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.get_bins_measurement(sensor_name: str, file_name: str) list[source]

Get bins measurement from config file.

Parameters
  • sensor_name (str) – Name of the sensor.

  • file_name (str) – File name (bins_velocity.yml or bins_diameter.yml)

Returns

List of bin measurements (center, bounds, width)

Return type

list

disdrodb.l0.check_configs.schema_error(object_to_validate: Union[str, list], schema: BaseModel, message) bool[source]

Validate a given object against a given schema.

Parameters
  • object_to_validate (Union[str,list]) – Object to validate

  • schema (BaseModel) – Base model

disdrodb.l0.check_metadata module

disdrodb.l0.check_metadata.check_archive_metadata_campaign_name(disdrodb_dir) bool[source]

Check metadata campaign_name.

Parameters

disdrodb_dir (str) – Path to the disdrodb directory.

Returns

If the check succeeds, the result is True, and if it fails, the result is False.

Return type

bool

disdrodb.l0.check_metadata.check_archive_metadata_compliance(disdrodb_dir)[source]
disdrodb.l0.check_metadata.check_archive_metadata_data_source(disdrodb_dir) bool[source]

Check metadata data_source.

Parameters

disdrodb_dir (str) – Path to the disdrodb directory.

Returns

If the check succeeds, the result is True, and if it fails, the result is False.

Return type

bool

disdrodb.l0.check_metadata.check_archive_metadata_geolocation(disdrodb_dir)[source]

Check whether the metadata files have missing or wrong geolocation.

Parameters

disdrodb_dir (str) – Path to the disdrodb directory.

Returns

If the check succeeds, the result is True, and if it fails, the result is False.

Return type

bool

disdrodb.l0.check_metadata.check_archive_metadata_keys(disdrodb_dir: str) bool[source]

Check that all metadata files have valid keys.

Parameters

disdrodb_dir (str) – Path to the disdrodb directory.

Returns

If the check succeeds, the result is True, and if it fails, the result is False.

Return type

bool

disdrodb.l0.check_metadata.check_archive_metadata_reader(disdrodb_dir: str) bool[source]

Check if the reader key is available and there is the associated reader.

Parameters

disdrodb_dir (str) – Path to the disdrodb directory.

Returns

If the check succeeds, the result is True, and if it fails, the result is False.

Return type

bool

disdrodb.l0.check_metadata.check_archive_metadata_sensor_name(disdrodb_dir) bool[source]

Check metadata sensor name.

Parameters

disdrodb_dir (str) – Path to the disdrodb directory.

Returns

If the check succeeds, the result is True, and if it fails, the result is False.

Return type

bool

disdrodb.l0.check_metadata.check_archive_metadata_station_name(disdrodb_dir) bool[source]

Check metadata station name.

Parameters

disdrodb_dir (str) – Path to the disdrodb directory.

Returns

If the check succeeds, the result is True, and if it fails, the result is False.

Return type

bool

disdrodb.l0.check_metadata.check_metadata_geolocation(metadata) None[source]

Identify metadata with missing or wrong geolocation.

disdrodb.l0.check_metadata.get_archive_metadata_key_value(disdrodb_dir: str, key: str, return_tuple: bool = True)[source]

Return the values of a metadata key for all the archive.

Parameters
  • disdrodb_dir (str) – Path to the disdrodb directory.

  • key (str) – Metadata key.

  • return_tuple (bool, optional) – If True, returns a tuple of values with station, campaign and data source name. If False, returns a list of values without station, campaign and data source name. The default is True.

Returns

List or tuple of values of the metadata key.

Return type

list or tuple

disdrodb.l0.check_metadata.identify_empty_metadata_keys(metadata_fpaths: list, keys: Union[str, list]) None[source]

Identify empty metadata keys.

Parameters
  • metadata_fpaths (list) – Input YAML file paths.

  • keys (Union[str,list]) – Attributes whose presence is to be verified.

disdrodb.l0.check_metadata.identify_missing_metadata_coords(metadata_fpaths: str) None[source]

Identify missing coordinates.

Parameters

metadata_fpaths (str) – Input YAML file path.

Raises

TypeError – Error if latitude or longitude coordinates are not present or are wrongly formatted.
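The sketch below illustrates one way a coordinate check of this kind could work on an already loaded metadata dictionary. It only covers the TypeError case described above; the function name and the exact notion of "wrongly formatted" (here: not a number) are assumptions.

```python
def check_coords_sketch(metadata):
    """Raise TypeError if latitude or longitude is missing or not numeric."""
    for key in ("latitude", "longitude"):
        value = metadata.get(key)
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"'{key}' is missing or not a number: {value!r}")


check_coords_sketch({"latitude": 46.52, "longitude": 6.57})  # passes silently
```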

disdrodb.l0.check_metadata.read_yaml(fpath: str) dict[source]

Read YAML file.

Parameters

fpath (str) – Input YAML file path.

Returns

Attributes read from the YAML file.

Return type

dict

disdrodb.l0.check_readers module

disdrodb.l0.check_readers.check_all_readers() None[source]

Test all readers that have data samples and ground truth.

Raises

Exception

If the reader validation has failed.

disdrodb.l0.check_readers.get_list_test_campaigns(data_source: str) list[source]

Get list of test campaigns for a given data source.

Parameters

data_source (str) – Data source.

Returns

List of test campaigns.

Return type

list

disdrodb.l0.check_readers.get_list_test_data_sources() list[source]

Get list of test data sources.

Returns

List of test data sources.

Return type

list

disdrodb.l0.check_readers.get_list_test_stations(data_source: str, campaign_name: str) list[source]

Get list of test stations for a given data source and campaign.

Parameters
  • data_source (str) – Data source.

  • campaign_name (str) – Name of the campaign.

Returns

List of test stations.

Return type

list

disdrodb.l0.check_readers.is_parquet_files_identical(file1: str, file2: str) bool[source]

Check if two parquet files are identical.

Parameters
  • file1 (str) – Path to the first file.

  • file2 (str) – Path to the second file.

Returns

True if the two files are identical, False otherwise.

Return type

bool

disdrodb.l0.check_readers.run_reader_on_test_data(data_source: str, campaign_name: str) None[source]

Run reader over the data sample.

Parameters
  • data_source (str) – Data source.

  • campaign_name (str) – Campaign name.

disdrodb.l0.check_standards module

disdrodb.l0.check_standards.check_l0a_column_names(df: DataFrame, sensor_name: str) None[source]

Checks that the dataframe columns respect the DISDRODB standards.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

Raises

ValueError – Error if some columns do not meet the DISDRODB standards or if the ‘time’ column is missing in the dataframe.

disdrodb.l0.check_standards.check_l0a_standards(df: DataFrame, sensor_name: str, verbose: bool = True) None[source]

Checks that a file respects the DISDRODB L0A standards.

Parameters
  • df (pd.DataFrame) – L0A dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool, optional) – Whether to print detailed processing information. The default is True.

Raises

ValueError – Error if some columns have inconsistent values.

disdrodb.l0.check_standards.check_l0b_standards(x: str) None[source]
disdrodb.l0.check_standards.check_sensor_name(sensor_name: str) None[source]

Check sensor name.

Parameters

sensor_name (str) – Name of the sensor.

Raises
  • TypeError – Error if sensor_name is not a string.

  • ValueError – Error if the input sensor name has not been found in the list of available sensors.
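The two error cases above can be sketched as follows. The list of available sensors is a placeholder: in disdrodb it is presumably derived from the sensor configuration directories, which this example does not attempt to reproduce.

```python
# Placeholder sensor names, for illustration only.
AVAILABLE_SENSORS_SKETCH = ["SENSOR_A", "SENSOR_B"]


def check_sensor_name_sketch(sensor_name, available=AVAILABLE_SENSORS_SKETCH):
    """Raise TypeError for non-strings and ValueError for unknown sensors."""
    if not isinstance(sensor_name, str):
        raise TypeError("'sensor_name' must be a string.")
    if sensor_name not in available:
        raise ValueError(f"{sensor_name!r} not valid. Valid values are {available}.")


check_sensor_name_sketch("SENSOR_A")  # passes silently
```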

disdrodb.l0.io module

disdrodb.l0.io.check_glob_pattern(pattern: str) None[source]

Check if the input parameter is a string and if it can be used as a pattern.

Parameters

pattern (str) – String to be checked.

Raises
  • TypeError – The input parameter is not a string.

  • ValueError – The input parameter can not be used as pattern.
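The documentation does not state what makes a string unusable as a pattern. The sketch below assumes that patterns are relative to the station data directory, so an empty or absolute pattern is rejected; this rule and the function name are assumptions.

```python
def check_glob_pattern_sketch(pattern):
    """Raise TypeError for non-strings and ValueError for unusable patterns."""
    if not isinstance(pattern, str):
        raise TypeError("Expecting the glob pattern as a string.")
    # Assumption: the pattern is joined to <raw_dir>/data/<station_name>,
    # so it must be non-empty and must not start with a path separator.
    if not pattern or pattern.startswith("/"):
        raise ValueError(f"{pattern!r} cannot be used as a relative glob pattern.")


check_glob_pattern_sketch("*.txt")  # passes silently
```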

disdrodb.l0.io.check_glob_patterns(patterns: Union[str, list]) list[source]

Check if glob patterns are valid.

disdrodb.l0.io.check_processed_dir(processed_dir)[source]

Check input, format and validity of the directory path.

Parameters

processed_dir (str) – Path of the processed directory

Returns

Path of the processed directory

Return type

str

disdrodb.l0.io.check_raw_dir(raw_dir: str, verbose: bool = False) None[source]

Check validity of raw_dir.

Steps:

  1. Check that ‘raw_dir’ is a valid directory path.

  2. Check that ‘raw_dir’ follows the expected directory structure.

  3. Check that each station_name directory contains data.

  4. Check that for each station_name the mandatory metadata.yml is specified.

  5. Check that for each station_name the mandatory issue.yml is specified.

Parameters
  • raw_dir (str) – Input raw directory

  • verbose (bool, optional) – Whether to print detailed processing information. The default is False.
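The checks listed above can be sketched with pathlib as follows. The layout used here (data/<station_name>/, metadata/<station_name>.yml, issue/<station_name>.yml) is an assumption about the expected directory structure, and the function returns a list of problems instead of raising, which may differ from the actual behaviour.

```python
import tempfile
from pathlib import Path


def check_raw_dir_sketch(raw_dir):
    """Return a list of problems found in a campaign raw directory."""
    raw_dir = Path(raw_dir)
    if not raw_dir.is_dir():
        return [f"{raw_dir} is not a directory"]
    data_dir = raw_dir / "data"
    if not data_dir.is_dir():
        return ["missing 'data' subdirectory"]
    problems = []
    for station_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        name = station_dir.name
        # Each station directory must contain at least one entry.
        if not any(station_dir.iterdir()):
            problems.append(f"{name}: no data files")
        # Each station needs a metadata and an issue YAML file (assumed layout).
        for kind in ("metadata", "issue"):
            if not (raw_dir / kind / f"{name}.yml").is_file():
                problems.append(f"{name}: missing {kind} YAML file")
    return problems


# Build a small campaign directory that lacks the issue file.
with tempfile.TemporaryDirectory() as tmp:
    campaign = Path(tmp) / "DISDRODB" / "Raw" / "DATA_SOURCE" / "CAMPAIGN"
    (campaign / "data" / "station_1").mkdir(parents=True)
    (campaign / "data" / "station_1" / "file.txt").write_text("raw")
    (campaign / "metadata").mkdir()
    (campaign / "metadata" / "station_1.yml").write_text("station_name: station_1\n")
    problems = check_raw_dir_sketch(campaign)
```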

disdrodb.l0.io.create_directory_structure(processed_dir, product_level, station_name, force, verbose=False)[source]

Create directory structure for L0B and higher DISDRODB products.

disdrodb.l0.io.create_initial_directory_structure(raw_dir, processed_dir, station_name, force, verbose=False, product_level='L0A')[source]

Create directory structure for the first L0 DISDRODB product.

If the input data are raw text files, product_level = “L0A” (run_l0a).

If the input data are raw netCDF files, product_level = “L0B” (run_l0b_nc).

disdrodb.l0.io.get_L0A_dir(processed_dir: str, station_name: str) str[source]

Define L0A directory.

Parameters
  • processed_dir (str) – Path of the processed directory

  • station_name (str) – Name of the station

Returns

L0A directory path.

Return type

str

disdrodb.l0.io.get_L0A_fname(df, processed_dir, station_name: str) str[source]

Define L0A file name.

Parameters
  • df (pd.DataFrame) – L0A DataFrame

  • processed_dir (str) – Path of the processed directory

  • station_name (str) – Name of the station

Returns

L0A file name.

Return type

str

disdrodb.l0.io.get_L0A_fpath(df: DataFrame, processed_dir: str, station_name: str) str[source]

Define L0A file path.

Parameters
  • df (pd.DataFrame) – L0A DataFrame.

  • processed_dir (str) – Path of the processed directory.

  • station_name (str) – Name of the station.

Returns

L0A file path.

Return type

str

disdrodb.l0.io.get_L0B_dir(processed_dir: str, station_name: str) str[source]

Define L0B directory.

Parameters
  • processed_dir (str) – Path of the processed directory

  • station_name (str) – Name of the station

Returns

Path of the L0B directory

Return type

str

disdrodb.l0.io.get_L0B_fname(ds, processed_dir, station_name: str) str[source]

Define L0B file name.

Parameters
  • ds (xr.Dataset) – L0B xarray Dataset

  • processed_dir (str) – Path of the processed directory

  • station_name (str) – Name of the station

Returns

L0B file name.

Return type

str

disdrodb.l0.io.get_L0B_fpath(ds: Dataset, processed_dir: str, station_name: str, l0b_concat=False) str[source]

Define L0B file path.

Parameters
  • ds (xr.Dataset) – L0B xarray Dataset.

  • processed_dir (str) – Path of the processed directory.

  • station_name (str) – ID of the station

  • l0b_concat (bool) – If False, the file is specified inside the station directory. If True, the file is specified outside the station directory.

Returns

L0B file path.

Return type

str

disdrodb.l0.io.get_campaign_name(path: str) str[source]

Return the campaign name from a file or directory path.

Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB.

Parameters

path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.

Returns

Name of the campaign.

Return type

str

disdrodb.l0.io.get_data_source(path: str) str[source]

Return the data_source from a file or directory path.

Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB.

Parameters

path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.

Returns

Name of the data source.

Return type

str

disdrodb.l0.io.get_dataframe_min_max_time(df: DataFrame)[source]

Retrieves dataframe starting and ending time.

Parameters

df (pd.DataFrame) – Input dataframe

Returns

(starting_time, ending_time)

Return type

tuple

disdrodb.l0.io.get_dataset_min_max_time(ds: Dataset)[source]

Retrieves dataset starting and ending time.

Parameters

ds (xr.Dataset) – Input dataset

Returns

(starting_time, ending_time)

Return type

tuple

disdrodb.l0.io.get_disdrodb_dir(path: str) str[source]

Return the disdrodb base directory from a file or directory path.

Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB.

Parameters

path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.

Returns

Path of the DISDRODB directory.

Return type

str

disdrodb.l0.io.get_disdrodb_path(path: str) str[source]

Return the path from the disdrodb_dir directory.

Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB.

Parameters

path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.

Returns

Path inside the DISDRODB archive. Format: DISDRODB/<Raw or Processed>/<data_source>/…

Return type

str
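The path helpers above all rely on the archive layout DISDRODB/<Raw or Processed>/<data_source>/<campaign_name>/…. The sketch below shows how the four pieces could be derived from a single path under that layout. It is an illustration, not the library code: the function name and the returned dictionary are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def split_disdrodb_path_sketch(path):
    """Split a DISDRODB path into base directory, inner path, data source and campaign."""
    parts = PurePosixPath(path).parts
    if "DISDRODB" not in parts:
        raise ValueError("The path does not contain a 'DISDRODB' component.")
    # Relies on the documented assumption that 'DISDRODB' appears only once.
    idx = parts.index("DISDRODB")
    if len(parts) < idx + 4:
        raise ValueError("Expected DISDRODB/<Raw or Processed>/<data_source>/<campaign_name>.")
    return {
        "disdrodb_dir": str(PurePosixPath(*parts[: idx + 1])),
        "disdrodb_path": str(PurePosixPath(*parts[idx:])),
        "data_source": parts[idx + 2],
        "campaign_name": parts[idx + 3],
    }


info = split_disdrodb_path_sketch(
    "/home/user/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN/data/station_1"
)
```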

disdrodb.l0.io.get_l0a_file_list(processed_dir, station_name, debugging_mode)[source]

Retrieve L0A files for a given station.

Parameters
  • processed_dir (str) – Directory of the campaign where to search for the L0A files. Format <..>/DISDRODB/Processed/<data_source>/<campaign_name>

  • station_name (str) – ID of the station

  • debugging_mode (bool, optional) – If True, it selects at most 3 files for debugging purposes. The default is False.

Returns

list_fpaths – List of L0A file paths.

Return type

list

disdrodb.l0.io.get_raw_file_list(raw_dir, station_name, glob_patterns, verbose=False, debugging_mode=False)[source]

Get the list of files from a directory based on input parameters.

Currently concatenates all files provided by the glob patterns. In future, this might be modified to enable DISDRODB processing when raw data are separated in multiple files.

Parameters
  • raw_dir (str) – Directory of the campaign where to search for files. Format <..>/DISDRODB/Raw/<data_source>/<campaign_name>

  • station_name (str) – ID of the station

  • verbose (bool, optional) – Whether to print detailed processing information. The default is False.

  • debugging_mode (bool, optional) – If True, it selects at most 3 files for debugging purposes. The default is False.

Returns

list_fpaths – List of file paths.

Return type

list
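A minimal sketch of this kind of file listing is given below, assuming the raw files live in <raw_dir>/data/<station_name> and that debugging mode keeps the first 3 files after sorting. The function name, the sorting and the error raised when nothing is found are assumptions.

```python
import glob
import os
import tempfile


def list_raw_files_sketch(raw_dir, station_name, glob_patterns, debugging_mode=False):
    """Collect the files matching one or more glob patterns for a station."""
    if isinstance(glob_patterns, str):
        glob_patterns = [glob_patterns]
    data_dir = os.path.join(raw_dir, "data", station_name)
    fpaths = []
    for pattern in glob_patterns:
        fpaths.extend(glob.glob(os.path.join(data_dir, pattern)))
    # Remove duplicates matched by several patterns and sort for reproducibility.
    fpaths = sorted(set(fpaths))
    if not fpaths:
        raise ValueError(f"No file found in {data_dir} with patterns {glob_patterns}.")
    if debugging_mode:
        fpaths = fpaths[:3]
    return fpaths


# Create five dummy files and list them with and without debugging mode.
with tempfile.TemporaryDirectory() as raw_dir:
    station_dir = os.path.join(raw_dir, "data", "station_1")
    os.makedirs(station_dir)
    for i in range(5):
        with open(os.path.join(station_dir, f"file_{i}.txt"), "w") as f:
            f.write("raw")
    n_all = len(list_raw_files_sketch(raw_dir, "station_1", "*.txt"))
    n_debug = len(list_raw_files_sketch(raw_dir, "station_1", ["*.txt", "file_0*"], debugging_mode=True))
```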

disdrodb.l0.io.read_L0A_dataframe(fpaths: Union[str, list], verbose: bool = False, debugging_mode: bool = False) DataFrame[source]

Read DISDRODB L0A Apache Parquet file(s).

Parameters
  • fpaths (str or list) – Either a list of file paths or a single file path.

  • verbose (bool) – Whether to print detailed processing information into terminal. The default is False.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. If fpaths is a list, it reads only the first 3 files. For each file, it selects only the first 100 rows. The default is False.

Returns

L0A Dataframe.

Return type

pd.DataFrame

disdrodb.l0.issue module

class disdrodb.l0.issue.NoDatesSafeLoader(stream)[source]

Bases: SafeLoader

classmethod remove_implicit_resolver(tag_to_remove)[source]

Remove implicit resolvers for a particular tag

Takes care not to modify resolvers in super classes.

We want to load datetimes as strings, not dates, because we go on to serialise as json which doesn’t have the advanced types of yaml, and leads to incompatibilities down the track.

disdrodb.l0.issue.check_issue_dict(issue_dict)[source]

Check validity of the issue dictionary.

disdrodb.l0.issue.check_issue_file(fpath: str) None[source]

Check issue YAML file validity.

Parameters

fpath (str) – Issue YAML file path.

disdrodb.l0.issue.check_time_periods(time_periods)[source]

Check time_periods validity.

disdrodb.l0.issue.check_timesteps(timesteps)[source]

Check timesteps validity.

It expects timesteps string in YYYY-mm-dd HH:MM:SS format with second accuracy. If timesteps is None, return None.
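The expected timestep format can be validated with the standard library, as in the sketch below. The function name and the choice to return a list of strings are assumptions; the actual function may return a different type.

```python
from datetime import datetime


def check_timesteps_sketch(timesteps):
    """Validate timesteps given as 'YYYY-mm-dd HH:MM:SS' strings."""
    if timesteps is None:
        return None
    if isinstance(timesteps, str):
        timesteps = [timesteps]
    for timestep in timesteps:
        # strptime raises ValueError if the string does not match the format.
        datetime.strptime(timestep, "%Y-%m-%d %H:%M:%S")
    return list(timesteps)


valid = check_timesteps_sketch(["2018-12-07 14:15:00", "2018-12-07 14:17:00"])
```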

disdrodb.l0.issue.is_numpy_array_datetime(arr)[source]

Check if the numpy array contains datetime64.

Parameters

arr (numpy array) – Numpy array to check.

Returns

True if the array has a datetime64 dtype, False otherwise.

Return type

bool

disdrodb.l0.issue.is_numpy_array_string(arr)[source]

Check if the numpy array contains strings.

Parameters

arr (numpy array) – Numpy array to check.

disdrodb.l0.issue.load_yaml_without_date_parsing(filepath)[source]

Read a YAML file without converting automatically date string to datetime.

disdrodb.l0.issue.read_issue(raw_dir: str, station_name: str) dict[source]

Read YAML issue file.

Parameters
  • raw_dir (str) – Path of the campaign raw directory.

  • station_name (str) – Station name.

Returns

Issue dictionary.

Return type

dict

disdrodb.l0.issue.read_issue_file(fpath: str) dict[source]

Read YAML issue file.

Parameters

fpath (str) – Filepath of the issue YAML.

Returns

Issue dictionary.

Return type

dict

disdrodb.l0.issue.write_default_issue(fpath: str) None[source]

Write an empty issue YAML file.

Parameters

fpath (str) – Filepath of the issue YAML to write.

disdrodb.l0.issue.write_issue_dict(fpath: str, issue_dict: dict) None[source]

Write the issue YAML file.

Parameters
  • fpath (str) – Filepath of the issue YAML to write.

  • issue_dict (dict) – Issue dictionary

disdrodb.l0.l0_processing module

disdrodb.l0.l0_processing.click_l0_archive_options(function: object)[source]

Click command line arguments for L0 processing archiving of a station.

Parameters

function (object) – Function.

disdrodb.l0.l0_processing.click_l0_processing_options(function: object)[source]

Click command line default parameters for L0 processing options.

Parameters

function (object) – Function.

disdrodb.l0.l0_processing.click_l0_station_arguments(function: object)[source]

Click command line arguments for L0 processing of a station.

Parameters

function (object) – Function.

disdrodb.l0.l0_processing.click_l0_stations_options(function: object)[source]

Click command line options for DISDRODB archive L0 processing.

Parameters

function (object) – Function.

disdrodb.l0.l0_processing.click_l0b_concat_options(function: object)[source]

Click command line default parameters for L0B concatenation.

Parameters

function (object) – Function.

disdrodb.l0.l0_processing.run_disdrodb_l0(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = False, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]

Run the L0 processing of DISDRODB stations.

This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.

Parameters
  • disdrodb_dir (str) – Base directory of DISDRODB. Format: <…>/DISDRODB

  • data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default is None

  • campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None

  • station_names (list) – Station names to process. The default is None

  • l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.

  • l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.

  • l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is False.

  • remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.

  • remove_l0b (bool) –

    Whether to remove the L0B files after having concatenated all L0B netCDF files.

    It takes place only if l0b_concat=True.

    The default is False.

  • force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.

  • verbose (bool) – Whether to print detailed processing information into terminal. The default is False.

  • parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. For L0B, it processes just the first 100 rows of 3 L0A files. The default is False.
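The station selection described above (keep the stations matching the provided data_sources, campaign_names and station_names, with None meaning "no filter") can be sketched as follows. The function name and the representation of stations as (data_source, campaign_name, station_name) tuples are assumptions made for illustration.

```python
def select_stations_sketch(available, data_sources=None, campaign_names=None, station_names=None):
    """Filter (data_source, campaign_name, station_name) tuples; None disables a filter."""

    def keep(value, allowed):
        return allowed is None or value in allowed

    return [
        station
        for station in available
        if keep(station[0], data_sources)
        and keep(station[1], campaign_names)
        and keep(station[2], station_names)
    ]


# Made-up archive content.
available = [
    ("SOURCE_A", "CAMPAIGN_1", "station_1"),
    ("SOURCE_A", "CAMPAIGN_2", "station_2"),
    ("SOURCE_B", "CAMPAIGN_3", "station_1"),
]
selected = select_stations_sketch(available, data_sources=["SOURCE_A"], station_names=["station_1"])
```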

disdrodb.l0.l0_processing.run_disdrodb_l0_station(disdrodb_dir, data_source, campaign_name, station_name, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = True, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]

Run the L0 processing of a specific DISDRODB station from the terminal.

Parameters
  • disdrodb_dir (str) – Base directory of DISDRODB. Format: <…>/DISDRODB

  • data_source (str) – Institution name (when campaign data spans more than 1 country), or country (when all campaigns (or sensor networks) are inside a given country). Must be UPPER CASE.

  • campaign_name (str) – Campaign name. Must be UPPER CASE.

  • station_name (str) – Station name

  • l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.

  • l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.

  • l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is True.

  • remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.

  • remove_l0b (bool) –

    Whether to remove the L0B files after having concatenated all L0B netCDF files.

    It takes place only if l0b_concat=True.

    The default is False.

  • force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.

  • verbose (bool) – Whether to print detailed processing information into terminal. The default is False.

  • parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files for each station. For L0B, it processes just the first 100 rows of 3 L0A files for each station. The default is False.

disdrodb.l0.l0_processing.run_disdrodb_l0a(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
disdrodb.l0.l0_processing.run_disdrodb_l0a_station(disdrodb_dir, data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]

Run the L0A processing of a station calling run_disdrodb_l0a_station in the terminal.

disdrodb.l0.l0_processing.run_disdrodb_l0b(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
disdrodb.l0.l0_processing.run_disdrodb_l0b_station(disdrodb_dir, data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]

Run the L0B processing of a station calling run_disdrodb_l0b_station in the terminal.

disdrodb.l0.l0_processing.run_l0a(raw_dir, processed_dir, station_name, glob_patterns, column_names, reader_kwargs, df_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]

Run the L0A processing for a specific DISDRODB station.

Parameters
  • raw_dir (str) –

    The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:

    <…>/DISDRODB/Raw/<data_source>/<campaign_name>.

    Inside the raw_dir directory, it is required to adopt the following structure:

    • /data/<station_name>/<raw_files>

    • /metadata/<station_name>.yaml

    Important points:

    • For each <station_name> there must be a corresponding YAML file in the metadata subfolder.

    • The <campaign_name> must semantically match between the raw_dir and processed_dir directory paths, and with the key ‘campaign_name’ within the metadata YAML files.

    • The campaign_name is expected to be UPPER CASE.

  • processed_dir (str) –

    The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:

    <…>/DISDRODB/Processed/<data_source>/<campaign_name>

    For testing purposes, this function exceptionally also accepts a directory path simply ending with <campaign_name> (i.e. /tmp/<campaign_name>).

  • station_name (str) – Station name

  • glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>

  • column_names (list) – Column names of the raw text file.

  • reader_kwargs (dict) – Pandas read_csv arguments to open the text file.

  • df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe into DISDRODB L0A standard.

  • parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.

  • verbose (bool) – Whether to print detailed processing information into terminal. The default is False.

  • force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 100 rows of 3 raw data files. The default is False.
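The requirement that the campaign name matches between raw_dir and processed_dir can be illustrated with the sketch below. It interprets "semantically match" as plain equality of the last path component and also checks the upper-case convention; both the interpretation and the function name are assumptions.

```python
import os


def check_campaign_names_sketch(raw_dir, processed_dir):
    """Return the campaign name if raw_dir and processed_dir agree on it."""
    raw_name = os.path.basename(os.path.normpath(raw_dir))
    processed_name = os.path.basename(os.path.normpath(processed_dir))
    if raw_name != raw_name.upper():
        raise ValueError(f"The campaign name {raw_name!r} must be UPPER CASE.")
    if raw_name != processed_name:
        raise ValueError(
            f"Campaign names differ: {raw_name!r} (raw) vs {processed_name!r} (processed)."
        )
    return raw_name


# The processed directory may simply end with <campaign_name>, e.g. for testing.
campaign_name = check_campaign_names_sketch(
    "/data/DISDRODB/Raw/DATA_SOURCE/CAMPAIGN", "/tmp/CAMPAIGN"
)
```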

disdrodb.l0.l0_processing.run_l0b(processed_dir, station_name, parallel, force, verbose, debugging_mode)[source]

Run the L0B processing for a specific DISDRODB station.

Parameters
  • raw_dir (str) –

    The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:

    <…>/DISDRODB/Raw/<data_source>/<campaign_name>.

    Inside the raw_dir directory, it is required to adopt the following structure:

    • /data/<station_name>/<raw_files>

    • /metadata/<station_name>.yaml

    Important points:

    • For each <station_name> there must be a corresponding YAML file in the metadata subfolder.

    • The <campaign_name> must semantically match between the raw_dir and processed_dir directory paths, and with the key ‘campaign_name’ within the metadata YAML files.

    • The campaign_name is expected to be UPPER CASE.

  • processed_dir (str) –

    The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:

    <…>/DISDRODB/Processed/<data_source>/<campaign_name>

    For testing purposes, this function exceptionally also accepts a directory path simply ending with <campaign_name> (i.e. /tmp/<campaign_name>).

  • station_name (str) – Station name

  • force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.

  • verbose (bool) – Whether to print detailed processing information to the terminal. The default is True.

  • parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. Ensure that threads_per_worker (the number of threads per process) is set to 1 to avoid HDF errors, and set the HDF5_USE_FILE_LOCKING environment variable to False. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just 3 raw data files. The default is False.
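The parallel settings described above can be prepared before calling run_l0b. What follows is a minimal sketch under stated assumptions, not the package's own setup code: n_workers=4 is an arbitrary value, start_client is a hypothetical helper, and the string "FALSE" is assumed to be the spelling the HDF5 library expects for disabling file locking.

```python
import os

# Disable HDF5 file locking before any HDF5/netCDF library is imported.
# The value "FALSE" is an assumption about the spelling HDF5 expects.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

# Keyword arguments for dask.distributed.LocalCluster.
cluster_kwargs = {
    "n_workers": 4,            # arbitrary illustrative value
    "threads_per_worker": 1,   # one thread per process, as advised above
    "processes": True,         # separate processes rather than threads
}


def start_client(kwargs=None):
    """Start a LocalCluster and return a Client (needs dask.distributed)."""
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(**(kwargs or cluster_kwargs))
    return Client(cluster)
```

Calling start_client() before run_l0b(..., parallel=True) would register the cluster as the active dask scheduler; whether run_l0b picks it up in exactly this way is an assumption of this sketch.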

disdrodb.l0.l0_processing.run_l0b_from_nc(raw_dir, processed_dir, station_name, glob_patterns, dict_names, ds_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]

Run the L0B processing for a specific DISDRODB station with raw netCDFs.

Parameters
  • raw_dir (str) –

    The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:

    <…>/DISDRODB/Raw/<data_source>/<campaign_name>

    Inside the raw_dir directory, the following structure is required:

    • /data/<station_name>/<raw_files>

    • /metadata/<station_name>.yaml

    Important points:

    • For each <station_name> there must be a corresponding YAML file in the metadata subfolder.

    • The <campaign_name> must semantically match between:

      • the raw_dir and processed_dir directory paths;

      • the key ‘campaign_name’ within the metadata YAML files.

    • The campaign_name is expected to be UPPER CASE.

  • processed_dir (str) –

    The desired directory path for the processed DISDRODB L0B products. The path should have the following structure:

    <…>/DISDRODB/Processed/<data_source>/<campaign_name>

    For testing purposes, this function exceptionally also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).

  • station_name (str) – Station name

  • glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>. Example: glob_patterns = “*.nc”

  • dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.

  • ds_sanitizer_fun (object, optional) – Sanitizer function to format the raw netCDF into DISDRODB L0B standard.

  • force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.

  • verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.

  • parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 3 raw netCDF files. The default is False.
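The raw_dir layout rules listed above can be checked before launching any processing. The following sketch uses only the standard library; find_layout_problems is a hypothetical helper (not part of disdrodb), and the source, campaign and station names in the demo are invented.

```python
import tempfile
from pathlib import Path


def find_layout_problems(raw_dir):
    """Return a list of problems found in a raw campaign directory."""
    raw_dir = Path(raw_dir)
    problems = []

    # The last path component is taken to be <campaign_name>.
    campaign_name = raw_dir.name
    if campaign_name != campaign_name.upper():
        problems.append(f"campaign name {campaign_name!r} is not upper case")

    data_dir = raw_dir / "data"
    if not data_dir.is_dir():
        problems.append("missing 'data' directory")
        return problems

    # Every station folder needs a matching metadata/<station_name>.yaml file.
    for station_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        yaml_path = raw_dir / "metadata" / f"{station_dir.name}.yaml"
        if not yaml_path.is_file():
            problems.append(f"no metadata YAML for station {station_dir.name!r}")
    return problems


# Demo on a throwaway tree with one documented and one undocumented station.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "DISDRODB" / "Raw" / "SOURCE" / "CAMPAIGN"
    (raw_dir / "data" / "STATION_A").mkdir(parents=True)
    (raw_dir / "data" / "STATION_B").mkdir(parents=True)
    (raw_dir / "metadata").mkdir()
    (raw_dir / "metadata" / "STATION_A.yaml").write_text("station_name: STATION_A\n")
    problems = find_layout_problems(raw_dir)

print(problems)
```

The sketch does not open the YAML files, so the match between the directory name and the ‘campaign_name’ metadata key is left unchecked here.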

disdrodb.l0.l0_reader module

disdrodb.l0.l0_reader.available_readers(data_sources=None, reader_path=False)[source]

Retrieve available readers information.

disdrodb.l0.l0_reader.check_available_readers()[source]

Check the reader arguments of all readers in the package.

disdrodb.l0.l0_reader.check_reader_arguments(reader)[source]

Check that the reader has the expected input arguments.

disdrodb.l0.l0_reader.check_reader_exists(reader_data_source: str, reader_name: str) str[source]

Check that the provided data source and reader name exist within the available readers.

Please run get_available_readers_dict() to get the list of all available readers.

Parameters
  • reader_data_source (str) – The directory within which the reader_name is located in the disdrodb.l0.readers directory.

  • reader_name (str) – Campaign name

Returns

If True: returns the reader name. If False: error, returns None.

Return type

str

Raises

ValueError – Error if the reader name provided for the campaign has not been found.

disdrodb.l0.l0_reader.get_available_readers_dict() dict[source]

Return the description of the readers included in the current release of DISDRODB.

Returns

The dictionary has the following schema {“data_source”: {“reader_name”: “reader_file_path”}}

Return type

dict
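To make the {“data_source”: {“reader_name”: “reader_file_path”}} schema concrete, here is a small sketch that walks such a dictionary. The dictionary is a hand-written stand-in with invented entries, and lookup_reader_path is a hypothetical helper, not a disdrodb function.

```python
def lookup_reader_path(readers, data_source, reader_name):
    """Return the reader file path or raise ValueError if it is unknown."""
    if data_source not in readers:
        raise ValueError(f"Unknown data source: {data_source!r}")
    if reader_name not in readers[data_source]:
        raise ValueError(f"Unknown reader {reader_name!r} for {data_source!r}")
    return readers[data_source][reader_name]


# Stand-in for the returned dictionary; every entry is invented.
readers = {
    "SOURCE_A": {"CAMPAIGN_1": "/path/to/readers/SOURCE_A/CAMPAIGN_1.py"},
    "SOURCE_B": {"CAMPAIGN_2": "/path/to/readers/SOURCE_B/CAMPAIGN_2.py"},
}

path = lookup_reader_path(readers, "SOURCE_A", "CAMPAIGN_1")
print(path)
```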

disdrodb.l0.l0_reader.get_reader(reader_data_source: str, reader_name: str) object[source]

Returns the reader function based on input parameters.

Parameters
  • reader_data_source (str) – The directory within which the reader_name is located in the disdrodb.l0.readers directory.

  • reader_name (str) – The reader name.

Returns

The reader() function

Return type

object

disdrodb.l0.l0_reader.get_reader_from_metadata_reader_key(reader_data_source_name)[source]

Retrieve the reader from the reader metadata value.

The convention for metadata reader key: <data_source/reader_name> in disdrodb.l0.readers

disdrodb.l0.l0_reader.get_station_reader(disdrodb_dir, data_source, campaign_name, station_name)[source]

Retrieve the reader from the station metadata information.

disdrodb.l0.l0_reader.is_documented_by(original)[source]

Wrapper function to apply generic docstring to the decorated function.

Parameters

original (function) – Function to take the docstring from.
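A decorator that shares one docstring across several functions is typically written as below. This is a sketch of the general pattern, assuming the docstring is copied verbatim; it is not taken from the disdrodb source, and generic_docstring and reader are placeholder functions.

```python
def is_documented_by(original):
    """Return a decorator that copies the docstring of `original`."""

    def wrapper(target):
        target.__doc__ = original.__doc__
        return target

    return wrapper


def generic_docstring():
    """Shared description of the reader arguments."""


@is_documented_by(generic_docstring)
def reader(raw_dir, processed_dir, station_name):
    pass


print(reader.__doc__)
```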

disdrodb.l0.l0_reader.reader_generic_docstring()[source]

Script to convert the raw data to L0A format.

Parameters
  • raw_dir (str) –

    The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:

    <…>/DISDRODB/Raw/<data_source>/<campaign_name>

    Inside the raw_dir directory, the following structure is required:

    • /data/<station_name>/<raw_files>

    • /metadata/<station_name>.yaml

    Important points:

    • For each <station_name> there must be a corresponding YAML file in the metadata subfolder.

    • The <campaign_name> must semantically match between:

      • the raw_dir and processed_dir directory paths;

      • the key ‘campaign_name’ within the metadata YAML files.

    • The campaign_name is expected to be UPPER CASE.

  • processed_dir (str) –

    The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:

    <…>/DISDRODB/Processed/<data_source>/<campaign_name>

    For testing purposes, this function exceptionally also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).

  • station_name (str) – Station name

  • force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.

  • verbose (bool) – Whether to print detailed processing information to the terminal. The default is True.

  • parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 3 raw data files. The default is False.

disdrodb.l0.l0a_processing module

Functions to process raw text files into DISDRODB L0A Apache Parquet.

disdrodb.l0.l0a_processing.cast_column_dtypes(df: DataFrame, sensor_name: str, verbose: bool = False) DataFrame[source]

Convert ‘object’ dataframe columns into DISDRODB L0A dtype standards.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns

Dataframe with corrected columns types.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.coerce_corrupted_values_to_nan(df: DataFrame, sensor_name: str, verbose: bool = False) DataFrame[source]

Coerce corrupted values in dataframe numeric columns to np.nan.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns

Dataframe with corrupted values in numeric columns set to np.nan.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.concatenate_dataframe(list_df: list, verbose: bool = False) DataFrame[source]

Concatenate a list of dataframes.

Parameters
  • list_df (list) – List of dataframes.

  • verbose (bool, optional) – If True, print messages. If False, no print.

Returns

Concatenated dataframe.

Return type

pd.DataFrame

Raises

ValueError – The dataframes cannot be concatenated.

disdrodb.l0.l0a_processing.drop_time_periods(df, time_periods)[source]

Drop problematic time periods.

disdrodb.l0.l0a_processing.drop_timesteps(df, timesteps)[source]

Drop problematic time steps.

disdrodb.l0.l0a_processing.preprocess_reader_kwargs(reader_kwargs: dict) dict[source]

Preprocess arguments required to read raw text file into Pandas.

Parameters

reader_kwargs (dict) – Initial parameter dictionary.

Returns

Parameter dictionary that matches either Pandas or Dask.

Return type

dict

disdrodb.l0.l0a_processing.process_raw_file(filepath, column_names, reader_kwargs, df_sanitizer_fun, sensor_name, verbose=True, issue_dict={})[source]

Read and parse a raw text file into an L0A dataframe.

Parameters
  • filepath (str) – File path

  • column_names (list) – Column names.

  • reader_kwargs (dict) – Pandas read_csv arguments.

  • df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information. The default is True.

  • issue_dict (dict) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are ‘timesteps’ and ‘time_periods’. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

Returns

Dataframe

Return type

pd.DataFrame
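The role of issue_dict can be illustrated without pandas. In this sketch, standard-library datetime objects stand in for the datetime64 values mentioned above, each time period is assumed to be an inclusive (start, end) pair, and is_flagged is a hypothetical helper; the actual layout expected by the package should be checked with disdrodb.l0.issue.check_issue_dict.

```python
from datetime import datetime

# Illustrative issue dictionary; the (start, end) pair layout is an assumption.
issue_dict = {
    "timesteps": [datetime(2020, 1, 1, 0, 1, 0)],
    "time_periods": [
        (datetime(2020, 1, 1, 0, 3, 0), datetime(2020, 1, 1, 0, 4, 0)),
    ],
}


def is_flagged(time, issue_dict):
    """Return True if `time` is a listed timestep or falls in a listed period."""
    if time in issue_dict.get("timesteps", []):
        return True
    periods = issue_dict.get("time_periods", [])
    return any(start <= time <= end for start, end in periods)


# Six one-minute timesteps starting at midnight.
times = [datetime(2020, 1, 1, 0, minute, 0) for minute in range(6)]
kept = [t for t in times if not is_flagged(t, issue_dict)]
print([t.minute for t in kept])
```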

disdrodb.l0.l0a_processing.read_raw_data(filepath: str, column_names: list, reader_kwargs: dict) DataFrame[source]

Read raw data into a dataframe.

Parameters
  • filepath (str) – Raw file path.

  • column_names (list) – Column names.

  • reader_kwargs (dict) – Pandas pd.read_csv arguments.

Returns

Pandas dataframe.

Return type

pandas.DataFrame

disdrodb.l0.l0a_processing.read_raw_file_list(file_list: Union[list, str], column_names: list, reader_kwargs: dict, sensor_name: str, verbose: bool, df_sanitizer_fun: Optional[object] = None) DataFrame[source]

Read and parse a list of raw files into a dataframe.

Parameters
  • file_list (Union[list,str]) – File(s) path(s)

  • column_names (list) – Column names.

  • reader_kwargs (dict) – Pandas read_csv arguments.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

  • df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe.

Returns

Dataframe

Return type

pd.DataFrame

Raises

ValueError – The input parameters cannot be used or the raw file cannot be processed.

disdrodb.l0.l0a_processing.remove_corrupted_rows(df)[source]

Remove corrupted rows by checking conversion of raw fields to numeric.

Note: delimiters at the start and end of the raw array must be stripped beforehand.

disdrodb.l0.l0a_processing.remove_duplicated_timesteps(df: DataFrame, verbose: bool = False)[source]

Remove duplicated timesteps.

Only the first occurrence of each timestep is kept.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • verbose (bool) – Whether to print detailed processing information.

Returns

Dataframe with valid unique timesteps.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.remove_issue_timesteps(df, issue_dict, verbose=False)[source]

Drop dataframe rows with timesteps listed in the issue dictionary.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • issue_dict (dict) – Issue dictionary

Returns

Dataframe with problematic timesteps removed.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.remove_rows_with_missing_time(df: DataFrame, verbose: bool = False)[source]

Remove dataframe rows where the “time” is NaT.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • verbose (bool) – Whether to print detailed processing information.

Returns

Dataframe with valid timesteps.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.replace_nan_flags(df, sensor_name, verbose)[source]

Set values corresponding to nan_flags to np.nan.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns

Dataframe without nan_flags values.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.set_nan_outside_data_range(df, sensor_name, verbose)[source]

Set values outside the data range as np.nan.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns

Dataframe without values outside the expected data range.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.set_nan_unvalid_values(df, sensor_name, verbose)[source]

Set invalid (class) values to np.nan.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns

Dataframe without invalid values.

Return type

pd.DataFrame
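The three cleaning rules documented above (nan_flags, data_range, valid_values) can be summarised on a single value. This is a plain-Python sketch rather than the pandas implementation: clean_value is a hypothetical helper, the range bounds are assumed to be inclusive, and the order in which the rules are applied is an assumption.

```python
import math


def clean_value(value, nan_flags=(), data_range=None, valid_values=None):
    """Return `value`, or NaN if it is flagged, out of range, or not allowed."""
    if value in nan_flags:
        return math.nan
    if data_range is not None:
        lower, upper = data_range
        if not (lower <= value <= upper):  # inclusive bounds (assumption)
            return math.nan
    if valid_values is not None and value not in valid_values:
        return math.nan
    return value


# A numeric field with a sentinel flag and an expected range.
raw = [12.5, -9999, 250.0, 3.0]
cleaned = [clean_value(v, nan_flags=(-9999,), data_range=(0, 100)) for v in raw]

# A class-like field restricted to a set of allowed codes.
codes = [0, 1, 7, 2]
cleaned_codes = [clean_value(c, valid_values=(0, 1, 2)) for c in codes]

print(cleaned)
print(cleaned_codes)
```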

disdrodb.l0.l0a_processing.strip_delimiter_from_raw_arrays(df)[source]

Remove the first and last delimiter occurrence from the raw array fields.

disdrodb.l0.l0a_processing.strip_string_spaces(df: DataFrame, sensor_name: str, verbose: bool = False) DataFrame[source]

Strip leading/trailing spaces from dataframe string columns.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns

Dataframe with string columns without leading/trailing spaces.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.write_l0a(df: DataFrame, fpath: str, force: bool = False, verbose: bool = False)[source]

Save the dataframe into an Apache Parquet file.

Parameters
  • df (pd.DataFrame) – Input dataframe.

  • fpath (str) – Output file path.

  • force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.

  • verbose (bool, optional) – Whether to print detailed processing information. The default is False.

Raises
  • ValueError – The input dataframe cannot be written as an Apache Parquet file.

  • NotImplementedError – The input dataframe cannot be processed.

disdrodb.l0.l0b_concat module

disdrodb.l0.l0b_concat.run_disdrodb_l0b_concat(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, remove_l0b=False, verbose=False)[source]

Concatenate the L0B files of the DISDRODB archive.

This function is called by the run_disdrodb_l0b_concat script.

disdrodb.l0.l0b_concat.run_disdrodb_l0b_concat_station(disdrodb_dir, data_source, campaign_name, station_name, remove_l0b=False, verbose=False)[source]

Concatenate the L0B files of a single DISDRODB station.

This function runs the run_disdrodb_l0b_concat_station script in the terminal.

disdrodb.l0.l0b_processing module

Functions to process DISDRODB L0A files into DISDRODB L0B netCDF files.

disdrodb.l0.l0b_processing.add_dataset_crs_coords(ds)[source]

Add the CRS coordinate to the xr.Dataset.

disdrodb.l0.l0b_processing.add_dataset_missing_variables(ds, missing_vars, sensor_name)[source]

Add missing Dataset variables as nan DataArrays.

disdrodb.l0.l0b_processing.convert_object_variables_to_string(ds: Dataset) Dataset[source]

Convert variables with object dtype to string.

Parameters

ds (xr.Dataset) – Input dataset.

Returns

Output dataset.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.create_l0b_from_l0a(df: DataFrame, attrs: dict, verbose: bool = False) Dataset[source]

Transform the L0A dataframe to the L0B xr.Dataset.

Parameters
  • df (pd.DataFrame) – DISDRODB L0A dataframe.

  • attrs (dict) – Station metadata.

  • verbose (bool, optional) – Whether to print detailed processing information. The default is False.

Returns

DISDRODB L0B dataset.

Return type

xr.Dataset

Raises

ValueError – Error if the DISDRODB L0B xarray dataset can not be created.

disdrodb.l0.l0b_processing.format_string_array(string: str, n_values: int) array[source]

Split a string with multiple numbers separated by a delimiter into a 1D array.

e.g. format_string_array("2,44,22,33", 4) will return [ 2. 44. 22. 33.]

If the string is empty (""), an array of zeros is returned. If the number of values differs from n_values, an array of np.nan is returned.

The function strips potential delimiters at the start and end before splitting.

Parameters
  • string (str) – Input string

  • n_values (int) – Expected length of the output array.

Returns

array of float

Return type

np.array
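The behaviour described for format_string_array can be mirrored with plain lists. The sketch below is not the package's implementation: format_values is a hypothetical stand-in that returns a list of floats instead of a NumPy array and assumes the delimiter is already known (a comma by default).

```python
import math


def format_values(string, n_values, delimiter=","):
    """List-based sketch of the behaviour described for format_string_array."""
    # Empty string: return zeros.
    if string == "":
        return [0.0] * n_values
    # Strip delimiters at the start and end before splitting.
    parts = string.strip(delimiter).split(delimiter)
    # Unexpected number of values: return NaNs.
    if len(parts) != n_values:
        return [math.nan] * n_values
    return [float(part) for part in parts]


print(format_values("2,44,22,33", 4))
print(format_values(",1,2,3,", 3))
print(format_values("", 3))
```

Fields that cannot be converted to float would raise a ValueError in this sketch; how the real function handles them is not stated above.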

disdrodb.l0.l0b_processing.get_bin_coords(sensor_name: str) dict[source]

Retrieve diameter (and velocity) bin coordinates.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Dictionary with coordinate arrays.

Return type

dict

disdrodb.l0.l0b_processing.infer_split_str(string: str) str[source]

Infer the delimiter inside a string.

Parameters

string (str) – Input string.

Returns

Inferred delimiter.

Return type

str

disdrodb.l0.l0b_processing.preprocess_raw_netcdf(ds, dict_names, sensor_name)[source]

Preprocess a raw netCDF to improve compatibility with DISDRODB standards.

This function checks the validity of dict_names, then renames and subsets the data accordingly. If some variables specified in dict_names are missing, they are added as NaN DataArrays.

Parameters
  • ds (xr.Dataset) – Raw netCDF to be converted to DISDRODB standards.

  • dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.

  • sensor_name (str) – Sensor name.

Returns

ds – xarray Dataset with DISDRODB-compliant variable naming conventions.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.process_raw_nc(filepath, dict_names, ds_sanitizer_fun, sensor_name, verbose, attrs)[source]

Read and convert a raw netCDF into a DISDRODB L0B netCDF.

Parameters
  • filepath (str) – netCDF file path.

  • dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.

  • ds_sanitizer_fun (function) – Sanitizer function to do ad-hoc processing of the xr.Dataset.

  • attrs (dict) – Global metadata to attach as global attributes to the xr.Dataset.

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns

L0B xr.Dataset

Return type

xr.Dataset

disdrodb.l0.l0b_processing.rechunk_dataset(ds: Dataset, encoding_dict: dict) Dataset[source]

Coerce the dataset arrays to have the chunk size specified in the encoding dictionary.

Parameters
  • ds (xr.Dataset) – Input xarray dataset

  • encoding_dict (dict) – Dictionary containing the encoding to write the xarray dataset as a netCDF.

Returns

Output xarray dataset

Return type

xr.Dataset

disdrodb.l0.l0b_processing.rename_dataset(ds, dict_names)[source]

Rename Dataset variables, coordinates and dimensions.

disdrodb.l0.l0b_processing.replace_custom_nan_flags(ds, dict_nan_flags)[source]

Set values corresponding to nan_flags to np.nan.

Parameters
  • ds (xr.Dataset) – Input xarray dataset

  • dict_nan_flags (dict) – Dictionary with nan flags value to set as np.nan

Returns

Dataset without nan_flags values.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.replace_nan_flags(ds, sensor_name, verbose)[source]

Set values corresponding to nan_flags to np.nan.

Parameters
  • ds (xr.Dataset) – Input xarray dataset

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns

Dataset without nan_flags values.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.reshape_raw_spectrum(arr: array, dims_order: list, dims_size_dict: dict, n_timesteps: int) array[source]

Reshape the raw spectrum to a 2D+time array.

The array has dimensions [“time”] + dims_order

Parameters
  • arr (np.array) – Input array.

  • dims_order (list) –

    The order of dimension in the raw spectrum.

    Examples:

    • OTT Parsivel spectrum [v1d1 … v1d32, v2d1, …, v2d32] –> dims_order = [“diameter_bin_center”, “velocity_bin_center”]

    • Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] –> dims_order = [“velocity_bin_center”, “diameter_bin_center”]

  • dims_size_dict (dict) – Dictionary with the number of bins for each dimension. For OTT_Parsivel: {“diameter_bin_center”: 32, “velocity_bin_center”: 32}. For Thies_LPM: {“diameter_bin_center”: 22, “velocity_bin_center”: 20}.

  • n_timesteps (int) – Number of timesteps.

Returns

Output array.

Return type

np.array

Raises

ValueError – Impossible to reshape the raw_spectrum matrix
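The reshaping into a [“time”] + dims_order layout can be sketched with nested lists. This example assumes a row-major (C-order) reshape, in which the last dimension listed in dims_order varies fastest, and a flat input covering all timesteps. Both are assumptions rather than statements about the package's code; reshape_spectrum is a hypothetical helper and the bin counts are toy values.

```python
def reshape_spectrum(flat, dims_order, dims_size_dict, n_timesteps):
    """Reshape a flat list into nested lists indexed [time][dim0][dim1]."""
    n_outer = dims_size_dict[dims_order[0]]
    n_inner = dims_size_dict[dims_order[1]]
    step = n_outer * n_inner  # number of values per timestep
    if len(flat) != n_timesteps * step:
        raise ValueError("Impossible to reshape the raw spectrum.")
    return [
        [
            flat[t * step + i * n_inner : t * step + (i + 1) * n_inner]
            for i in range(n_outer)
        ]
        for t in range(n_timesteps)
    ]


# Toy sizes: 2 velocity bins and 3 diameter bins, 2 timesteps.
sizes = {"velocity_bin_center": 2, "diameter_bin_center": 3}
order = ["velocity_bin_center", "diameter_bin_center"]
cube = reshape_spectrum(list(range(12)), order, sizes, n_timesteps=2)
print(cube)
```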

disdrodb.l0.l0b_processing.retrieve_l0b_arrays(df: DataFrame, sensor_name: str, verbose: bool = False) dict[source]

Retrieves the L0B data matrix.

Parameters
  • df (pd.DataFrame) – Input dataframe

  • sensor_name (str) – Name of the sensor

Returns

Dictionary with data arrays.

Return type

dict

disdrodb.l0.l0b_processing.sanitize_encodings_dict(encoding_dict: dict, ds: Dataset) dict[source]

Ensure chunk size to be smaller than the array shape.

Parameters
  • encoding_dict (dict) – Dictionary containing the encoding to write DISDRODB L0B netCDFs.

  • ds (xr.Dataset) – Input dataset.

Returns

Encoding dictionary.

Return type

dict
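Keeping chunk sizes no larger than the array shape amounts to an element-wise minimum. The sketch below works on a plain dictionary of shapes instead of an xr.Dataset; clip_chunksizes is a hypothetical helper and the shape and chunk values are invented.

```python
import copy


def clip_chunksizes(encoding_dict, shapes):
    """Return a copy of encoding_dict with chunksizes clipped to each shape."""
    sanitized = copy.deepcopy(encoding_dict)
    for name, encoding in sanitized.items():
        chunksizes = encoding.get("chunksizes")
        if chunksizes is None or name not in shapes:
            continue
        if isinstance(chunksizes, int):
            chunksizes = [chunksizes]
        # A chunk may not exceed the array length along any dimension.
        encoding["chunksizes"] = [
            min(chunk, size) for chunk, size in zip(chunksizes, shapes[name])
        ]
    return sanitized


encoding_dict = {"raw_drop_number": {"zlib": True, "chunksizes": [5000, 32, 32]}}
shapes = {"raw_drop_number": (120, 32, 32)}  # only 120 timesteps available
sanitized = clip_chunksizes(encoding_dict, shapes)
print(sanitized["raw_drop_number"]["chunksizes"])
```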

disdrodb.l0.l0b_processing.set_coordinate_attributes(ds)[source]
disdrodb.l0.l0b_processing.set_dataset_attrs(ds, sensor_name)[source]

Set variable and coordinates attributes.

disdrodb.l0.l0b_processing.set_encodings(ds: Dataset, sensor_name: str) Dataset[source]

Apply the encodings to the xarray Dataset.

Parameters
  • ds (xr.Dataset) – Input xarray dataset.

  • sensor_name (str) – Name of the sensor.

Returns

Output xarray dataset.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.set_nan_outside_data_range(ds, sensor_name, verbose)[source]

Set values outside the data range as np.nan.

Parameters
  • ds (xr.Dataset) – Input xarray dataset

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns

Dataset without values outside the expected data range.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.set_nan_unvalid_values(ds, sensor_name, verbose)[source]

Set invalid (class) values to np.nan.

Parameters
  • ds (xr.Dataset) – Input xarray dataset

  • sensor_name (str) – Name of the sensor.

  • verbose (bool) – Whether to print detailed processing information.

Returns

Dataset without invalid values.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.set_variable_attributes(ds: Dataset, sensor_name: str) Dataset[source]

Set attributes to each xr.Dataset variable.

Parameters
  • ds (xr.Dataset) – Input dataset.

  • sensor_name (str) – Name of the sensor.

Returns

Output dataset.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.subset_dataset(ds, dict_names, sensor_name)[source]
disdrodb.l0.l0b_processing.write_l0b(ds: Dataset, fpath: str, force=False) None[source]

Save the xarray dataset into a NetCDF file.

Parameters
  • ds (xr.Dataset) – Input xarray dataset.

  • fpath (str) – Output file path.

  • sensor_name (str) – Name of the sensor.

  • force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.

disdrodb.l0.metadata module

disdrodb.l0.metadata.add_missing_metadata_keys(metadata)[source]

Add missing keys to the metadata dictionary.

disdrodb.l0.metadata.check_metadata_compliance(disdrodb_dir, data_source, campaign_name, station_name)[source]

Check DISDRODB metadata compliance.

disdrodb.l0.metadata.create_campaign_default_metadata(disdrodb_dir, campaign_name, data_source)[source]

Create default YAML metadata files for all stations within a campaign.

Use this function with caution to avoid overwriting existing YAML files.

disdrodb.l0.metadata.get_default_metadata_dict() dict[source]

Get DISDRODB metadata default values.

Returns

Dictionary of standard attributes.

Return type

dict

disdrodb.l0.metadata.get_metadata_missing_keys(metadata)[source]

Return the DISDRODB metadata keys which are missing.

disdrodb.l0.metadata.get_metadata_unvalid_keys(metadata)[source]

Return the DISDRODB metadata keys which are not valid.

disdrodb.l0.metadata.get_valid_metadata_keys() list[source]

Get DISDRODB valid metadata list.

Returns

List of valid metadata keys

Return type

list

disdrodb.l0.metadata.read_metadata(campaign_dir: str, station_name: str) dict[source]

Read YAML metadata file.

Parameters
  • raw_dir (str) – Path of the raw directory

  • station_name (str) – Id of the station.

Returns

Dictionary of the metadata.

Return type

dict

disdrodb.l0.metadata.remove_unvalid_metadata_keys(metadata)[source]

Remove invalid keys from the metadata dictionary.

disdrodb.l0.metadata.sort_metadata_dictionary(metadata)[source]

Sort the keys of the metadata dictionary by valid_metadata_keys list order.
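The key-handling helpers above (missing keys, invalid keys, sorting) can be pictured with plain dictionaries. In this sketch the valid_keys list is a hypothetical short subset, the metadata values are invented, and the empty-string default for added keys is an assumption.

```python
# Hypothetical subset of valid metadata keys, in the desired output order.
valid_keys = ["title", "station_name", "sensor_name", "latitude", "longitude"]

metadata = {"longitude": 6.6, "station_name": "STATION_A", "colour": "blue"}

# Keys that should be present but are not, and keys that should not be there.
missing_keys = [key for key in valid_keys if key not in metadata]
invalid_keys = [key for key in metadata if key not in valid_keys]

# Drop invalid keys, add missing ones with a placeholder, then sort.
cleaned = {key: value for key, value in metadata.items() if key in valid_keys}
for key in missing_keys:
    cleaned[key] = ""  # placeholder default (assumption)
sorted_metadata = {key: cleaned[key] for key in valid_keys}

print(missing_keys)
print(invalid_keys)
print(list(sorted_metadata))
```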

disdrodb.l0.metadata.write_default_metadata(fpath: str) None[source]

Create default YAML metadata file at the specified filepath.

Parameters

fpath (str) – File path

disdrodb.l0.metadata.write_metadata(metadata, fpath)[source]

Write dictionary to YAML file.

disdrodb.l0.standards module

disdrodb.l0.standards.available_sensor_name() sorted[source]

Get available names of sensors.

Returns

Sorted list of the available sensors

Return type

sorted

disdrodb.l0.standards.get_L0A_encodings_dict(sensor_name: str) dict[source]

Get a dictionary containing the L0A encodings

Parameters

sensor_name (str) – Name of the sensor.

Returns

L0A encodings

Return type

dict

disdrodb.l0.standards.get_L0B_encodings_dict(sensor_name: str) dict[source]

Get a dictionary containing the encoding to write L0B netCDFs.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Encoding to write L0B netCDFs

Return type

dict

disdrodb.l0.standards.get_configs_dir(sensor_name: str) str[source]

Retrieve configs directory.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Config directory.

Return type

str

Raises

ValueError – Error if the config directory does not exist.

disdrodb.l0.standards.get_coords_attrs_dict(ds)[source]

Return dictionary with DISDRODB coordinates attributes.

disdrodb.l0.standards.get_data_format_dict(sensor_name: str) dict[source]

Get a dictionary containing the data format of each sensor variable.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Data format of each sensor variable

Return type

dict

disdrodb.l0.standards.get_data_range_dict(sensor_name: str) dict[source]

Get the variable data range.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Dictionary with the expected data value range for each data field. It excludes variables without specified data_range key.

Return type

dict

disdrodb.l0.standards.get_description_dict(sensor_name: str) dict[source]

Get a dictionary containing the description of each sensor variable.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Description of each sensor variable.

Return type

dict

disdrodb.l0.standards.get_diameter_bin_center(sensor_name: str) list[source]

Get diameter bin center.

Parameters

sensor_name (str) – Name of the sensor

Returns

Diameter bin center

Return type

list

disdrodb.l0.standards.get_diameter_bin_lower(sensor_name: str) list[source]

Get diameter bin lower bound.

Parameters

sensor_name (str) – Name of the sensor

Returns

Diameter bin lower bound

Return type

list

disdrodb.l0.standards.get_diameter_bin_upper(sensor_name: str) list[source]

Get diameter bin upper bound.

Parameters

sensor_name (str) – Name of the sensor

Returns

Diameter bin upper bound

Return type

list

disdrodb.l0.standards.get_diameter_bin_width(sensor_name: str) list[source]

Get diameter bin width.

Parameters

sensor_name (str) – Name of the sensor

Returns

Diameter bin width

Return type

list

disdrodb.l0.standards.get_diameter_bins_dict(sensor_name: str) dict[source]

Get dictionary with sensor_name diameter bins information.

Parameters

sensor_name (str) – Name of the sensor.

Returns

sensor_name diameter bins information

Return type

dict

disdrodb.l0.standards.get_dims_size_dict(sensor_name: str) dict[source]

Get the number of bins for each dimension.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Dictionary with the number of bins for each dimension.

Return type

dict

disdrodb.l0.standards.get_field_nchar_dict(sensor_name: str) dict[source]

Get the total number of characters from the instrument default string standards.

Important note: it also accounts for the comma and the minus sign.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Dictionary with the expected number of characters for each data field.

Return type

dict

disdrodb.l0.standards.get_field_ndigits_decimals_dict(sensor_name: dict) dict[source]

Get number of digits on the right side of the comma from the instrument default string standards.

Example: 123,45 -> 45 –> 2 decimal digits

Parameters

sensor_name (dict) – Name of the sensor.

Returns

Dictionary with the expected number of decimal digits for each data field.

Return type

dict

disdrodb.l0.standards.get_field_ndigits_dict(sensor_name: str) dict[source]

Get number of digits from the instrument default string standards.

Important note: it excludes the comma but counts the minus sign.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Dictionary with the expected number of digits for each data field.

Return type

dict

disdrodb.l0.standards.get_field_ndigits_natural_dict(sensor_name: str) dict[source]

Get number of digits on the left side of the comma from the instrument default string standards.

Example: 123,45 -> 123 –> 3 natural digits

Parameters

sensor_name (str) – Name of the sensor.

Returns

Dictionary with the expected number of natural digits for each data field.

Return type

dict
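The four counts documented above (characters, digits, natural digits, decimal digits) can be illustrated on a sample field string. The package functions return expected values per data field, so this sketch only illustrates the definitions: describe_field is a hypothetical helper, and leaving the minus sign out of the natural-digit count is an assumption.

```python
def describe_field(text, separator=","):
    """Count characters and digits of a field string such as '123,45'."""
    natural, _, decimals = text.partition(separator)
    return {
        "n_characters": len(text),                     # comma and minus sign included
        "n_digits": len(text.replace(separator, "")),  # comma excluded, minus sign counted
        "n_naturals": len(natural.lstrip("-")),        # sign not counted (assumption)
        "n_decimals": len(decimals),
    }


print(describe_field("123,45"))
print(describe_field("-123,45"))
```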

disdrodb.l0.standards.get_l0a_dtype(sensor_name: str) dict[source]

Get a dictionary containing the L0A dtype.

Parameters

sensor_name (str) – Name of the sensor.

Returns

L0A dtype

Return type

dict

disdrodb.l0.standards.get_long_name_dict(sensor_name: str) dict[source]

Get a dictionary containing the long name of each sensor variable.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Long name of each sensor variable.

Return type

dict

disdrodb.l0.standards.get_n_diameter_bins(sensor_name)[source]

Get the number of diameter bins.

disdrodb.l0.standards.get_n_velocity_bins(sensor_name)[source]

Get the number of velocity bins.

disdrodb.l0.standards.get_nan_flags_dict(sensor_name: str) dict[source]

Get the variable nan_flags.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Dictionary with the expected nan_flags list for each data field. It excludes variables without specified nan_flags key.

Return type

dict

disdrodb.l0.standards.get_raw_array_dims_order(sensor_name: str) dict[source]

Get the dimension order of the raw fields.

The order of dimension specified for raw_drop_number controls the reshaping of the precipitation raw spectrum.

Examples

OTT Parsivel spectrum [v1d1 … v1d32, v2d1, …, v2d32] –> dimension_order = [“velocity_bin_center”, “diameter_bin_center”]

Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] –> dimension_order = [“diameter_bin_center”, “velocity_bin_center”]

Parameters

sensor_name (str) – Name of the sensor

Returns

Dimension order dictionary

Return type

dict
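
The effect of the dimension order on the reshaping can be sketched with plain Python lists. This is a minimal illustration and not the library's implementation: the helper `reshape_spectrum` and the toy bin counts (2 velocity bins, 3 diameter bins) are hypothetical, and the last dimension listed is assumed to be the one that varies fastest in the flat raw spectrum.

```python
def reshape_spectrum(flat_values, dimension_order, sizes):
    """Reshape a flat spectrum into nested lists following dimension_order.

    The last dimension in dimension_order varies fastest in the flat input.
    """
    n_outer = sizes[dimension_order[0]]
    n_inner = sizes[dimension_order[1]]
    if len(flat_values) != n_outer * n_inner:
        raise ValueError("Unexpected number of values in the raw spectrum.")
    return [flat_values[i * n_inner:(i + 1) * n_inner] for i in range(n_outer)]


# Toy sizes (real sensors have many more bins).
sizes = {"velocity_bin_center": 2, "diameter_bin_center": 3}

# Diameter varies fastest: [v1d1, v1d2, v1d3, v2d1, v2d2, v2d3]
parsivel_like = ["v1d1", "v1d2", "v1d3", "v2d1", "v2d2", "v2d3"]
by_velocity = reshape_spectrum(
    parsivel_like, ["velocity_bin_center", "diameter_bin_center"], sizes
)

# Velocity varies fastest: [v1d1, v2d1, v1d2, v2d2, v1d3, v2d3]
lpm_like = ["v1d1", "v2d1", "v1d2", "v2d2", "v1d3", "v2d3"]
by_diameter = reshape_spectrum(
    lpm_like, ["diameter_bin_center", "velocity_bin_center"], sizes
)

print(by_velocity[1][0])  # second velocity bin, first diameter bin
print(by_diameter[0][1])  # first diameter bin, second velocity bin
```

Both lookups address the same spectrum cell (v2d1), which shows why the declared order must match the way the sensor serializes its spectrum.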

disdrodb.l0.standards.get_raw_array_nvalues(sensor_name: str) dict[source]

Get a dictionary with the number of values expected for each raw array.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Dictionary with the number of values expected for each raw array.

Return type

dict

disdrodb.l0.standards.get_sensor_variables(sensor_name: str) list[source]

Get sensor variable names list.

Parameters

sensor_name (str) – Name of the sensor.

Returns

List of the sensor variable names.

Return type

list

disdrodb.l0.standards.get_time_encoding() dict[source]

Create time encoding

Returns

Time encoding

Return type

dict

disdrodb.l0.standards.get_units_dict(sensor_name: str) dict[source]

Get a dictionary containing the unit of each sensor variable.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Unit of each sensor variable

Return type

dict

disdrodb.l0.standards.get_valid_coordinates_names(sensor_name)[source]

Get list of valid coordinates.

disdrodb.l0.standards.get_valid_dimension_names(sensor_name)[source]

Get list of valid dimension names.

disdrodb.l0.standards.get_valid_names(sensor_name)[source]

disdrodb.l0.standards.get_valid_values_dict(sensor_name: str) dict[source]

Get the list of valid values for a variable.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Dictionary with the expected values for specific variables. It excludes variables without a specified valid_values key.

Return type

dict

disdrodb.l0.standards.get_valid_variable_names(sensor_name)[source]

Get list of valid variables.

disdrodb.l0.standards.get_variables_dict(sensor_name: str) dict[source]

Get a dictionary containing the variable name associated with each sensor field number.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Variables names

Return type

dict

disdrodb.l0.standards.get_variables_dimension(sensor_name: str)[source]

Return a dictionary with the variable dimensions of an L0B product.

disdrodb.l0.standards.get_velocity_bin_center(sensor_name: str) list[source]

Get velocity bin center.

Parameters

sensor_name (str) – Name of the sensor

Returns

Velocity bin center

Return type

list

disdrodb.l0.standards.get_velocity_bin_lower(sensor_name: str) list[source]

Get velocity bin lower bound.

Parameters

sensor_name (str) – Name of the sensor

Returns

Velocity bin lower bound.

Return type

list

disdrodb.l0.standards.get_velocity_bin_upper(sensor_name: str) list[source]

Get velocity bin upper bound.

Parameters

sensor_name (str) – Name of the sensor

Returns

Velocity bin upper bound

Return type

list

disdrodb.l0.standards.get_velocity_bin_width(sensor_name: str) list[source]

Get velocity bin width.

Parameters

sensor_name (str) – Name of the sensor

Returns

Velocity bin width

Return type

list

disdrodb.l0.standards.get_velocity_bins_dict(sensor_name: str) dict[source]

Get the velocity bins information of the sensor.

Parameters

sensor_name (str) – Name of the sensor.

Returns

Velocity bins information of the sensor.

Return type

dict
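
The relationship between the four bin quantities (lower bound, upper bound, center, width) can be sketched as follows. The bin values are hypothetical and are not those of any real sensor. The sketch also assumes that the center is the midpoint of the bounds, which may not hold for the values stored in the sensor configuration files.

```python
def derive_bin_center_and_width(lower, upper):
    """Derive bin centers and widths from the bin bounds."""
    # Assumption: the center is the midpoint between lower and upper bound.
    centers = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    widths = [up - lo for lo, up in zip(lower, upper)]
    return centers, widths


def bins_are_contiguous(lower, upper):
    """Check that each bin starts where the previous one ends."""
    return all(up == next_lo for up, next_lo in zip(upper[:-1], lower[1:]))


# Hypothetical velocity bins in m/s.
velocity_lower = [0.0, 0.5, 1.0]
velocity_upper = [0.5, 1.0, 2.0]

centers, widths = derive_bin_center_and_width(velocity_lower, velocity_upper)
print(centers)
print(widths)
print(bins_are_contiguous(velocity_lower, velocity_upper))
```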

disdrodb.l0.standards.read_config_yml(sensor_name: str, filename: str) dict[source]

Read a config yaml file and return the dictionary.

Parameters
  • sensor_name (str) – Name of the sensor.

  • filename (str) – Name of the file.

Returns

Content of the config file.

Return type

dict

Raises

ValueError – Error if file does not exist.

disdrodb.l0.standards.set_disdrodb_attrs(ds, product_level: str)[source]

Add DISDRODB processing information to the netCDF global attributes.

It assumes the station metadata have already been added to the dataset.

Parameters
  • ds (xarray dataset) – Dataset

  • product_level (str) – DISDRODB product_level

Returns

Dataset

Return type

xarray dataset

disdrodb.l0.summary module

disdrodb.l0.template_tools module

disdrodb.l0.template_tools.arr_has_constant_nchar(arr: array) bool[source]

Check if the content of an array has a constant number of characters

Parameters

arr (numpy.ndarray) – The array to analyse

Returns

True if the number of characters is constant

Return type

bool

disdrodb.l0.template_tools.check_column_names(column_names: list, sensor_name: str) None[source]

Check that the column names respect the DISDRODB standards.

Parameters
  • column_names (list) – List of columns names.

  • sensor_name (str) – Name of the sensor.

Raises

TypeError – Error if some columns do not meet the DISDRODB standards.

disdrodb.l0.template_tools.get_decimal_ndigits(string: str) int[source]

Get the number of decimal digits.

Parameters

string (str) – Input string

Returns

The number of decimal digits.

Return type

int

disdrodb.l0.template_tools.get_df_columns_unique_values_dict(df: DataFrame, column_indices: Optional[Union[int, slice, list]] = None, column_names: bool = True)[source]

Create a dictionary {column: unique values}

Parameters
  • df (pd.DataFrame) – Input dataframe

  • column_indices (Union[int,slice,list], optional) – column indices

  • column_names (bool, optional) – If true, print the column name, by default True

disdrodb.l0.template_tools.get_natural_ndigits(string: str) int[source]

Get the number of natural digits.

Parameters

string (str) – Input string

Returns

The number of natural digits.

Return type

int

disdrodb.l0.template_tools.get_nchar(string: str) int[source]

Get the number of characters.

Parameters

string (str) – Input string

Returns

Number of characters

Return type

int

disdrodb.l0.template_tools.get_ndigits(string: str) int[source]

Get the number of digits.

Parameters

string (str) – Input string

Returns

Number of digits

Return type

int
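
The three counts (characters, digits, decimal digits) can be illustrated with a small sketch. The helper names below are hypothetical, the dot is assumed to be the decimal separator, and only numeral characters are counted as digits. How the library treats signs and separators is not specified in this section, so the results may differ from those of the functions documented here.

```python
def count_characters(string: str) -> int:
    """Count all characters, including sign and separator."""
    return len(string)


def count_digits(string: str) -> int:
    """Count the numeral characters only (assumption of this sketch)."""
    return sum(ch.isdigit() for ch in string)


def count_decimal_digits(string: str, separator: str = ".") -> int:
    """Count the numeral characters on the right side of the separator."""
    if separator not in string:
        return 0
    return sum(ch.isdigit() for ch in string.split(separator, 1)[1])


for value in ("-12.345", "0042"):
    print(
        value,
        count_characters(value),
        count_digits(value),
        count_decimal_digits(value),
    )
```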

disdrodb.l0.template_tools.get_possible_keys(dict_options: dict, desired_value: str) set[source]

Get the possible keys matching the input value.

Parameters
  • dict_options (dict) – Input dictionary

  • desired_value (str) – Input value

Returns

Keys whose value matches the desired input value.

Return type

set
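
A reverse lookup of this kind can be sketched in a single set comprehension. The helper name `find_keys_with_value` and the dictionary content are hypothetical, and strict equality is assumed as the matching rule.

```python
def find_keys_with_value(options: dict, desired_value) -> set:
    """Return the keys whose value equals desired_value."""
    return {key for key, value in options.items() if value == desired_value}


# Hypothetical expected number of digits per variable.
expected_ndigits = {"var_a": 4, "var_b": 2, "var_c": 4}

print(find_keys_with_value(expected_ndigits, 4))
print(find_keys_with_value(expected_ndigits, 3))
```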

disdrodb.l0.template_tools.infer_column_names(df: DataFrame, sensor_name: str, row_idx: int = 1)[source]

Try to guess the dataframe columns names based on string characteristics.

Parameters
  • df (pd.DataFrame) – The dataframe to analyse

  • sensor_name (str) – Name of the sensor

  • row_idx (int, optional) – The row index of the dataframe, by default 1

Returns

Dictionary with the keys being the column id and the values being the guessed column names

Return type

dict

disdrodb.l0.template_tools.print_df_column_names(df: DataFrame) None[source]

Print dataframe columns names

Parameters

df (dataframe) – The dataframe

Returns

Nothing

Return type

None

disdrodb.l0.template_tools.print_df_columns_unique_values(df: DataFrame, column_indices: Optional[Union[int, slice, list]] = None, column_names: bool = True) None[source]

Print columns’ unique values

Parameters
  • df (pd.DataFrame) – Input dataframe

  • column_indices (Union[int,slice,list], optional) – column indices

  • column_names (bool, optional) – If true, print the column name, by default True

disdrodb.l0.template_tools.print_df_first_n_rows(df: DataFrame, n: int = 5, column_names: bool = True) None[source]

Print the first n rows of the dataframe by column.

Parameters
  • df (pd.DataFrame) – Input dataframe

  • n (int, optional) – Number of rows, by default 5

  • column_names (bool, optional) – If true, the column names are printed, by default True

disdrodb.l0.template_tools.print_df_random_n_rows(df: DataFrame, n: int = 5, with_column_names: bool = True) None[source]

Print the content of n randomly chosen rows of the dataframe, by column.

Parameters
  • df (dataframe) – The dataframe

  • n (int, optional) – The number of rows to print, by default 5

  • with_column_names (bool, optional) – If true, print the column name, by default True

Returns

Nothing

Return type

None

disdrodb.l0.template_tools.print_df_summary_stats(df: DataFrame, column_indices: Optional[Union[int, slice, list]] = None, column_names: bool = True)[source]

Create a columns statistics summary.

Parameters
  • df (pd.DataFrame) – Input dataframe

  • column_indices (Union[int,slice,list], optional) – column indices

  • column_names (bool, optional) – If true, print the column name, by default True

Raises

ValueError – Error if the column types are not numeric.

disdrodb.l0.template_tools.print_df_with_any_nan_rows(df: DataFrame) None[source]

Print the rows containing any NaN value.

Parameters

df (pd.DataFrame) – Input dataframe.

disdrodb.l0.template_tools.print_valid_L0_column_names(sensor_name: str) None[source]

Print valid columns names from the standard.

Parameters

sensor_name (str) – Name of the sensor.

disdrodb.l0.template_tools.search_possible_columns(string: str, sensor_name: str) list[source]

Define the possible columns for an input string.

Parameters
  • string (str) – Input string

  • sensor_name (str) – Name of the sensor

Returns

List of possible columns

Return type

list
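
One plausible way to narrow down the candidate columns is to compare several characteristics of the string with the expected ones and to keep only the variables that match all of them. The sketch below illustrates this idea under stated assumptions: the helper names, the two criteria used (number of characters and number of digits), and the expected values are all hypothetical, and the library's actual criteria are not shown in this section.

```python
def find_keys_with_value(options: dict, desired_value) -> set:
    """Return the keys whose value equals desired_value."""
    return {key for key, value in options.items() if value == desired_value}


def candidate_columns(string, expected_nchar, expected_ndigits):
    """Return the variables matching every characteristic of the string."""
    by_nchar = find_keys_with_value(expected_nchar, len(string))
    n_digits = sum(ch.isdigit() for ch in string)
    by_ndigits = find_keys_with_value(expected_ndigits, n_digits)
    # Keep only the variables compatible with all criteria.
    return sorted(by_nchar & by_ndigits)


# Hypothetical expectations per variable.
expected_nchar = {"var_a": 7, "var_b": 4, "var_c": 7}
expected_ndigits = {"var_a": 6, "var_b": 4, "var_c": 5}

print(candidate_columns("0012.34", expected_nchar, expected_ndigits))
print(candidate_columns("0042", expected_nchar, expected_ndigits))
```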

disdrodb.l0.template_tools.str_has_decimal_digits(string: str) bool[source]

Check if a string has decimals

Parameters

string (str) – Input string

Returns

True if the string has decimal digits.

Return type

bool

disdrodb.l0.template_tools.str_is_integer(string: str) bool[source]

Check if a string is an integer

Parameters

string (str) – Input string

Returns

True if integer.

Return type

bool

disdrodb.l0.template_tools.str_is_not_number(string: str) bool[source]

Check if a string is not numeric

Parameters

string (str) – Input string

Returns

True if not float.

Return type

bool

disdrodb.l0.template_tools.str_is_number(string: str) bool[source]

Check if a string is numeric

Parameters

string (str) – Input string

Returns

True if float.

Return type

bool
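
String predicates of this kind are commonly built on the built-in `float` and `int` conversions. The sketch below shows that approach with hypothetical helper names. It is an assumption about how such checks can be written and not a copy of the library code, so edge cases (for example "nan" or surrounding whitespace) may be handled differently by the functions documented above.

```python
def looks_like_number(string: str) -> bool:
    """Return True if the string can be converted to a float."""
    try:
        float(string)
    except ValueError:
        return False
    return True


def looks_like_integer(string: str) -> bool:
    """Return True if the string can be converted to an int."""
    try:
        int(string)
    except ValueError:
        return False
    return True


for value in ("12.5", "-7", "abc"):
    print(value, looks_like_number(value), looks_like_integer(value))
```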

disdrodb.l0.utils_nc module

Module contents

disdrodb.l0.available_readers(data_sources=None, reader_path=False)[source]

Retrieve available readers information.

disdrodb.l0.check_archive_metadata_compliance(disdrodb_dir)[source]

disdrodb.l0.check_archive_metadata_geolocation(disdrodb_dir)[source]

Check whether the metadata files have missing or wrong geolocation.

Parameters

disdrodb_dir (str) – Path to the disdrodb directory.

Returns

If the check succeeds, the result is True, and if it fails, the result is False.

Return type

bool
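
What "missing or wrong geolocation" could mean for a single metadata record can be sketched as below. The keys `longitude` and `latitude`, the accepted value ranges, and the helper name are assumptions made for illustration. The actual checks performed by the library are not described in this section.

```python
def has_valid_geolocation(metadata: dict) -> bool:
    """Return True if longitude and latitude are present and within range."""

    def is_real_number(value):
        # Booleans are instances of int in Python, so they are excluded.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    longitude = metadata.get("longitude")
    latitude = metadata.get("latitude")
    if not is_real_number(longitude) or not is_real_number(latitude):
        return False
    return -180 <= longitude <= 180 and -90 <= latitude <= 90


# Hypothetical metadata records.
records = [
    {"longitude": 6.57, "latitude": 46.52},
    {"longitude": 200.0, "latitude": 46.52},  # longitude out of range
    {"latitude": 46.52},  # longitude missing
]

results = [has_valid_geolocation(record) for record in records]
print(results)
print(all(results))  # a single overall flag, as returned by an archive-wide check
```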

disdrodb.l0.run_disdrodb_l0(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = False, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]

Run the L0 processing of DISDRODB stations.

This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.

Parameters
  • disdrodb_dir (str) – Base directory of DISDRODB. Format: <…>/DISDRODB

  • data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, process all stations. The default is None

  • campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None

  • station_names (list) – Station names to process. The default is None

  • l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.

  • l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.

  • l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is False.

  • remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.

  • remove_l0b (bool) –

    Whether to remove the L0B files after having concatenated all L0B netCDF files.

    It takes place only if l0b_concat=True.

    The default is False.

  • force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.

  • verbose (bool) – Whether to print detailed processing information into terminal. The default is False.

  • parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. For L0B, it processes just the first 100 rows of 3 L0A files. The default is False.
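
The station selection described above can be sketched as a simple filter. This is an illustration only: the helper `select_stations` and the station tuples are hypothetical, and a filter left at None is assumed to match every station.

```python
def select_stations(available, data_sources=None, campaign_names=None, station_names=None):
    """Keep the (data_source, campaign_name, station_name) tuples matching the filters."""

    def matches(value, allowed):
        # A filter set to None places no restriction.
        return allowed is None or value in allowed

    return [
        (data_source, campaign_name, station_name)
        for data_source, campaign_name, station_name in available
        if matches(data_source, data_sources)
        and matches(campaign_name, campaign_names)
        and matches(station_name, station_names)
    ]


# Hypothetical archive content.
available_stations = [
    ("SOURCE_A", "CAMPAIGN_1", "station_1"),
    ("SOURCE_A", "CAMPAIGN_2", "station_2"),
    ("SOURCE_B", "CAMPAIGN_3", "station_1"),
]

print(select_stations(available_stations, data_sources=["SOURCE_A"]))
print(select_stations(available_stations, station_names=["station_1"]))
print(len(select_stations(available_stations)))
```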

disdrodb.l0.run_disdrodb_l0_station(disdrodb_dir, data_source, campaign_name, station_name, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = True, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]

Run the L0 processing of a specific DISDRODB station from the terminal.

Parameters
  • disdrodb_dir (str) – Base directory of DISDRODB. Format: <…>/DISDRODB

  • data_source (str) – Institution name (when campaign data spans more than 1 country), or country (when all campaigns (or sensor networks) are inside a given country). Must be UPPER CASE.

  • campaign_name (str) – Campaign name. Must be UPPER CASE.

  • station_name (str) – Station name

  • l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.

  • l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.

  • l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is True.

  • remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.

  • remove_l0b (bool) –

    Whether to remove the L0B files after having concatenated all L0B netCDF files.

    It takes place only if l0b_concat=True.

    The default is False.

  • force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.

  • verbose (bool) – Whether to print detailed processing information into terminal. The default is False.

  • parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process will use a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files for each station. For L0B, it processes just the first 100 rows of 3 L0A files for each station. The default is False.
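
How the processing flags interact can be summarized with a small planning sketch. The function `plan_l0_steps` and the step labels are hypothetical and do not exist in the library. The sketch follows the parameter descriptions above, with one extra assumption: L0A files are removed only when the L0B processing is actually run.

```python
def plan_l0_steps(
    l0a_processing=True,
    l0b_processing=True,
    l0b_concat=True,
    remove_l0a=False,
    remove_l0b=False,
):
    """Return the ordered list of steps implied by the flags."""
    steps = []
    if l0a_processing:
        steps.append("generate L0A Parquet files")
    if l0b_processing:
        steps.append("generate L0B netCDF files")
        # Assumption: L0A files are removed only once L0B products exist.
        if remove_l0a:
            steps.append("remove L0A files")
    if l0b_concat:
        steps.append("concatenate L0B netCDF files")
        # As documented, remove_l0b takes effect only if l0b_concat=True.
        if remove_l0b:
            steps.append("remove per-file L0B netCDF files")
    return steps


print(plan_l0_steps())
print(plan_l0_steps(l0b_concat=False, remove_l0b=True))
```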

disdrodb.l0.run_l0a(raw_dir, processed_dir, station_name, glob_patterns, column_names, reader_kwargs, df_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]

Run the L0A processing for a specific DISDRODB station.

Parameters
  • raw_dir (str) –

    The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:

    <…>/DISDRODB/Raw/<data_source>/<campaign_name>.

    Inside the raw_dir directory, it is required to adopt the following structure:

    • /data/<station_name>/<raw_files>

    • /metadata/<station_name>.yaml

    Important points:

    • For each <station_name> there must be a corresponding YAML file in the metadata subfolder.

    • The <campaign_name> must semantically match between the raw_dir and processed_dir directory paths and the key ‘campaign_name’ within the metadata YAML files.

    • The campaign_name is expected to be UPPER CASE.

  • processed_dir (str) –

    The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:

    <…>/DISDRODB/Processed/<data_source>/<campaign_name>

    For testing purposes, this function exceptionally also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).

  • station_name (str) – Station name

  • glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>

  • column_names (list) – Column names of the raw text file.

  • reader_kwargs (dict) – Pandas read_csv arguments to open the text file.

  • df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe into DISDRODB L0A standard.

  • parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.

  • verbose (bool) – Whether to print detailed processing information into terminal. The default is False.

  • force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 100 rows of 3 raw data files. The default is False.
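
The directory requirements listed for raw_dir can be checked with a few lines of `pathlib`. The sketch below is a hypothetical pre-flight check and not part of the library: the helper names, the source, campaign and station names, and the temporary directory are made up for illustration.

```python
import tempfile
from pathlib import Path


def find_stations_without_metadata(raw_dir):
    """Return the station names in data/ that lack metadata/<station_name>.yaml."""
    raw_dir = Path(raw_dir)
    station_names = sorted(p.name for p in (raw_dir / "data").iterdir() if p.is_dir())
    return [
        name
        for name in station_names
        if not (raw_dir / "metadata" / f"{name}.yaml").is_file()
    ]


def campaign_names_match(raw_dir, processed_dir):
    """Check that both paths end with the same UPPER CASE campaign name."""
    raw_name = Path(raw_dir).name
    return raw_name == Path(processed_dir).name and raw_name == raw_name.upper()


with tempfile.TemporaryDirectory() as tmp:
    # Build a toy archive with two stations and a single metadata file.
    raw_dir = Path(tmp, "DISDRODB", "Raw", "SOURCE_A", "CAMPAIGN_1")
    processed_dir = Path(tmp, "DISDRODB", "Processed", "SOURCE_A", "CAMPAIGN_1")
    for station_name in ("station_1", "station_2"):
        (raw_dir / "data" / station_name).mkdir(parents=True)
    (raw_dir / "metadata").mkdir()
    (raw_dir / "metadata" / "station_1.yaml").write_text("campaign_name: CAMPAIGN_1\n")

    missing = find_stations_without_metadata(raw_dir)
    names_ok = campaign_names_match(raw_dir, processed_dir)

print(missing)   # stations lacking a metadata YAML file
print(names_ok)  # whether the campaign names agree and are upper case
```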

disdrodb.l0.run_l0b_from_nc(raw_dir, processed_dir, station_name, glob_patterns, dict_names, ds_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]

Run the L0B processing for a specific DISDRODB station with raw netCDFs.

Parameters
  • raw_dir (str) –

    The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:

    <…>/DISDRODB/Raw/<data_source>/<campaign_name>.

    Inside the raw_dir directory, it is required to adopt the following structure:

    • /data/<station_name>/<raw_files>

    • /metadata/<station_name>.yaml

    Important points:

    • For each <station_name> there must be a corresponding YAML file in the metadata subfolder.

    • The <campaign_name> must semantically match between the raw_dir and processed_dir directory paths and the key ‘campaign_name’ within the metadata YAML files.

    • The campaign_name is expected to be UPPER CASE.

  • processed_dir (str) –

    The desired directory path for the processed DISDRODB L0B products. The path should have the following structure:

    <…>/DISDRODB/Processed/<data_source>/<campaign_name>

    For testing purposes, this function exceptionally also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).

  • station_name (str) – Station name

  • glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>. Example: glob_patterns = “*.nc”

  • dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.

  • ds_sanitizer_fun (object, optional) – Sanitizer function to format the raw netCDF into DISDRODB L0B standard.

  • force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.

  • verbose (bool) – Whether to print detailed processing information into terminal. The default is False.

  • parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.

  • debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 3 raw netCDF files. The default is False.
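
The combination of glob_patterns and debugging_mode can be sketched with `pathlib`. The helper `list_raw_files`, the file names, and the alphabetical ordering of the matches are assumptions made for illustration. The library may select and order the files differently.

```python
import tempfile
from pathlib import Path


def list_raw_files(raw_dir, station_name, glob_pattern, debugging_mode=False):
    """List the files matching glob_pattern in <raw_dir>/data/<station_name>."""
    station_dir = Path(raw_dir) / "data" / station_name
    # Assumption: matches are sorted by name to get a reproducible order.
    filepaths = sorted(station_dir.glob(glob_pattern))
    if debugging_mode:
        # Keep just the first 3 files, as described for debugging_mode.
        filepaths = filepaths[:3]
    return filepaths


with tempfile.TemporaryDirectory() as tmp:
    station_dir = Path(tmp) / "data" / "station_1"
    station_dir.mkdir(parents=True)
    for name in ("a.nc", "b.nc", "c.nc", "d.nc", "notes.txt"):
        (station_dir / name).touch()

    all_names = [p.name for p in list_raw_files(tmp, "station_1", "*.nc")]
    debug_names = [
        p.name for p in list_raw_files(tmp, "station_1", "*.nc", debugging_mode=True)
    ]

print(all_names)
print(debug_names)
```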