disdrodb.l0 package

Subpackages

disdrodb.l0.readers package

Submodules

disdrodb.l0.check_configs module

class disdrodb.l0.check_configs.NetcdfEncodingSchema(*, contiguous: bool, dtype: str, zlib: bool, complevel: int, shuffle: bool, fletcher32: bool, chunksizes: Optional[Union[int, List[int]]] = None)[source]

Bases: BaseModel

classmethod check_chunksizes(v, values)[source]

classmethod check_fletcher32(v, values)[source]

classmethod check_zlib(v, values)[source]

chunksizes: Optional[Union[int, List[int]]]

complevel: int

contiguous: bool

dtype: str

fletcher32: bool

shuffle: bool

zlib: bool

class disdrodb.l0.check_configs.RawDataFormatSchema(*, n_digits: Optional[int] = None, n_characters: Optional[int] = None, n_decimals: Optional[int] = None, n_naturals: Optional[int] = None, data_range: Optional[List[float]] = None, nan_flags: Optional[str] = None, valid_values: Optional[List[float]] = None, dimension_order: Optional[List[str]] = None, n_values: Optional[int] = None)[source]

Bases: BaseModel

classmethod check_list_length(value)[source]

data_range: Optional[List[float]]

dimension_order: Optional[List[str]]

n_characters: Optional[int]

n_decimals: Optional[int]

n_digits: Optional[int]

n_naturals: Optional[int]

n_values: Optional[int]

nan_flags: Optional[str]

valid_values: Optional[List[float]]

exception disdrodb.l0.check_configs.SchemaValidationException[source]

Bases: Exception

Exception raised when schema validation fails.

disdrodb.l0.check_configs.check_all_sensors_configs() → None[source]: Check the configs of all sensors.

disdrodb.l0.check_configs.check_bin_consistency(sensor_name: str) → None[source]

Check bin consistency from config file.

The first and last bins are not checked.

Parameters: sensor_name (str) – Name of the sensor.
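A minimal sketch of this kind of check follows. It assumes, for illustration only, that each bin is described by a center, a [lower, upper] bound pair and a width, and that consistency means center = (lower + upper) / 2 and width = upper - lower. The function name check_bins_sketch and the input layout are hypothetical and do not reproduce the DISDRODB implementation or the format of the bins YAML files.

```python
import math


def check_bins_sketch(center, bounds, width, rel_tol=1e-6):
    """Hypothetical consistency check for bin centers, bounds and widths.

    Assumes bounds[i] == [lower_i, upper_i]. The first and last bins are
    skipped, mirroring the note in the check_bin_consistency docstring.
    """
    if not (len(center) == len(bounds) == len(width)):
        raise ValueError("center, bounds and width must have the same length.")
    for i in range(1, len(center) - 1):  # skip the first and last bin
        lower, upper = bounds[i]
        if not math.isclose(center[i], (lower + upper) / 2, rel_tol=rel_tol):
            raise ValueError(f"Bin {i}: center {center[i]} is not the midpoint of {bounds[i]}.")
        if not math.isclose(width[i], upper - lower, rel_tol=rel_tol):
            raise ValueError(f"Bin {i}: width {width[i]} differs from upper - lower.")


# The first bin is inconsistent (center 0.0 vs midpoint 0.05) but is ignored.
check_bins_sketch(
    center=[0.0, 0.25, 0.75, 1.5],
    bounds=[[0.0, 0.1], [0.0, 0.5], [0.5, 1.0], [1.0, 2.0]],
    width=[0.1, 0.5, 0.5, 1.0],
)
```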

disdrodb.l0.check_configs.check_cf_attributes(sensor_name: str) → None[source]

Check that variable_description, variable_long_name, variable_units dict values are strings.

Parameters: sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_l0a_encoding(sensor_name: str) → None[source]

Check l0a_encodings.yml file.

Parameters: sensor_name (str) – Name of the sensor.
Raises: ValueError – Error raised if the value of a key is not in the list of accepted values.

disdrodb.l0.check_configs.check_l0b_encoding(sensor_name: str) → None[source]

Check l0b_encodings.yml file based on the schema defined in the class NetcdfEncodingSchema.

Parameters: sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_raw_array(sensor_name: str) → None[source]

Check raw array consistency from config file.

Parameters: sensor_name (str) – Name of the sensor.
Raises: ValueError – Error if the chunksizes are not consistent.

disdrodb.l0.check_configs.check_raw_data_format(sensor_name: str) → None[source]

Check the raw_data_format.yml file based on the schema defined in the class RawDataFormatSchema.

Parameters: sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_sensor_configs(sensor_name: str) → None[source]

Check the sensor configs.

Parameters: sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.check_variable_consistency(sensor_name: str) → None[source]

Check variable consistency across config files.

The variables specified within l0b_encoding.yml must also be defined in the other config files.

Parameters: sensor_name (str) – Name of the sensor.
Raises: ValueError – If the keys are not consistent.
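The rule above can be sketched as a set difference. This is an illustration under the assumption that each config file has already been loaded into a dictionary keyed by variable name; check_variable_consistency_sketch and the variable names used below are hypothetical.

```python
def check_variable_consistency_sketch(l0b_encoding, other_configs):
    """Hypothetical check: every variable of l0b_encoding must appear in each other config.

    l0b_encoding: dict keyed by variable name.
    other_configs: mapping from config file name to a dict keyed by variable name.
    """
    reference = set(l0b_encoding)
    for config_name, config in other_configs.items():
        missing = reference - set(config)
        if missing:
            raise ValueError(f"{config_name} is missing variables: {sorted(missing)}")


encoding = {"rainfall_rate": {}, "raw_drop_number": {}}
configs = {"raw_data_format.yml": {"rainfall_rate": {}, "raw_drop_number": {}}}
check_variable_consistency_sketch(encoding, configs)  # passes silently
```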

disdrodb.l0.check_configs.check_yaml_files_exists(sensor_name: str) → None[source]

Check if all config YAML files exist.

Parameters: sensor_name (str) – Name of the sensor.

disdrodb.l0.check_configs.get_bins_measurement(sensor_name: str, file_name: str) → list[source]

Get bin measurements from the config file.

Parameters

sensor_name (str) – Name of the sensor.
file_name (str) – File name (bins_velocity.yml or bins_diameter.yml)

Returns

List of chunksizes (center, bounds, width)

Return type

list

disdrodb.l0.check_configs.schema_error(object_to_validate: Union[str, list], schema: BaseModel, message) → bool[source]

Validate a given object against a given schema.

Parameters

object_to_validate (Union[str,list]) – Object to validate
schema (BaseModel) – Base model

disdrodb.l0.check_metadata module

disdrodb.l0.check_metadata.check_archive_metadata_campaign_name(disdrodb_dir) → bool[source]

Check metadata campaign_name.

Parameters: disdrodb_dir (str) – Path to the disdrodb directory.
Returns: If the check succeeds, the result is True, and if it fails, the result is False.
Return type: bool

disdrodb.l0.check_metadata.check_archive_metadata_compliance(disdrodb_dir)[source]

disdrodb.l0.check_metadata.check_archive_metadata_data_source(disdrodb_dir) → bool[source]

Check metadata data_source.

Parameters: disdrodb_dir (str) – Path to the disdrodb directory.
Returns: If the check succeeds, the result is True, and if it fails, the result is False.
Return type: bool

disdrodb.l0.check_metadata.check_archive_metadata_geolocation(disdrodb_dir)[source]

Check whether the metadata files have missing or wrong geolocation.

Parameters: disdrodb_dir (str) – Path to the disdrodb directory.
Returns: If the check succeeds, the result is True, and if it fails, the result is False.
Return type: bool

disdrodb.l0.check_metadata.check_archive_metadata_keys(disdrodb_dir: str) → bool[source]

Check that all metadata files have valid keys.

Parameters: disdrodb_dir (str) – Path to the disdrodb directory.
Returns: If the check succeeds, the result is True, and if it fails, the result is False.
Return type: bool

disdrodb.l0.check_metadata.check_archive_metadata_reader(disdrodb_dir: str) → bool[source]

Check that the reader key is specified and that the associated reader exists.

Parameters: disdrodb_dir (str) – Path to the disdrodb directory.
Returns: If the check succeeds, the result is True, and if it fails, the result is False.
Return type: bool

disdrodb.l0.check_metadata.check_archive_metadata_sensor_name(disdrodb_dir) → bool[source]

Check metadata sensor name.

Parameters: disdrodb_dir (str) – Path to the disdrodb directory.
Returns: If the check succeeds, the result is True, and if it fails, the result is False.
Return type: bool

disdrodb.l0.check_metadata.check_archive_metadata_station_name(disdrodb_dir) → bool[source]

Check metadata station name.

Parameters: disdrodb_dir (str) – Path to the disdrodb directory.
Returns: If the check succeeds, the result is True, and if it fails, the result is False.
Return type: bool

disdrodb.l0.check_metadata.check_metadata_geolocation(metadata) → None[source]: Identify metadata with missing or wrong geolocation.

disdrodb.l0.check_metadata.get_archive_metadata_key_value(disdrodb_dir: str, key: str, return_tuple: bool = True)[source]

Return the values of a metadata key for the whole archive.

Parameters

disdrodb_dir (str) – Path to the disdrodb directory.
key (str) – Metadata key.
return_tuple (bool, optional) – If True, return the values together with the station, campaign and data source names. If False, return a list of values without the station, campaign and data source names. The default is True.

Returns: List or tuple of values of the metadata key.
Return type: list or tuple
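To illustrate the effect of return_tuple, here is a sketch operating on in-memory records instead of YAML files. The record layout and the ordering (data_source, campaign_name, station_name, value) of the returned tuples are assumptions; get_key_values_sketch is hypothetical.

```python
def get_key_values_sketch(records, key, return_tuple=True):
    """Hypothetical illustration of the return_tuple option.

    records: iterable of (data_source, campaign_name, station_name, metadata_dict).
    Missing keys yield None.
    """
    if return_tuple:
        # Keep track of where each value comes from.
        return [
            (data_source, campaign_name, station_name, metadata.get(key))
            for data_source, campaign_name, station_name, metadata in records
        ]
    # Values only, without station, campaign and data source names.
    return [metadata.get(key) for _, _, _, metadata in records]
```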

disdrodb.l0.check_metadata.identify_empty_metadata_keys(metadata_fpaths: list, keys: Union[str, list]) → None[source]

Identify empty metadata keys.

Parameters

metadata_fpaths (list) – Input YAML file paths.
keys (Union[str,list]) – Metadata key(s) whose values are checked.
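A sketch of the same idea on already-loaded metadata dictionaries, assuming a key counts as empty when it is missing, None or an empty string. Unlike the documented function, which returns None, identify_empty_keys_sketch (hypothetical) returns its findings so they can be inspected.

```python
def identify_empty_keys_sketch(metadata_by_file, keys):
    """Hypothetical helper: map each file path to its empty or missing keys."""
    if isinstance(keys, str):
        keys = [keys]
    empty = {}
    for fpath, metadata in metadata_by_file.items():
        # Assumption: None, "" and absent keys are all treated as empty.
        missing = [k for k in keys if metadata.get(k) in (None, "")]
        if missing:
            empty[fpath] = missing
    return empty
```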

disdrodb.l0.check_metadata.identify_missing_metadata_coords(metadata_fpaths: str) → None[source]

Identify missing coordinates.

Parameters: metadata_fpaths (str) – Input YAML file path.
Raises: TypeError – Error if latitude or longitude coordinates are not present or are wrongly formatted.

disdrodb.l0.check_metadata.read_yaml(fpath: str) → dict[source]

Read YAML file.

Parameters: fpath (str) – Input YAML file path.
Returns: Attributes read from the YAML file.
Return type: dict

disdrodb.l0.check_readers module

disdrodb.l0.check_readers.check_all_readers() → None[source]

Test all readers that have data samples and ground truth.

Raises

Exception: If the reader validation has failed.

disdrodb.l0.check_readers.get_list_test_campaigns(data_source: str) → list[source]

Get list of test campaigns for a given data source.

Parameters: data_source (str) – Data source.
Returns: List of test campaigns.
Return type: list

disdrodb.l0.check_readers.get_list_test_data_sources() → list[source]

Get list of test data sources.

Returns: List of test data sources.
Return type: list

disdrodb.l0.check_readers.get_list_test_stations(data_source: str, campaign_name: str) → list[source]

Get list of test stations for a given data source and campaign.

Parameters

data_source (str) – Data source.
campaign_name (str) – Name of the campaign.

Returns

List of test stations.

Return type

list

disdrodb.l0.check_readers.is_parquet_files_identical(file1: str, file2: str) → bool[source]

Check if two parquet files are identical.

Parameters

file1 (str) – Path to the first file.
file2 (str) – Path to the second file.

Returns

True if the two files are identical, False otherwise.

Return type

bool

disdrodb.l0.check_readers.run_reader_on_test_data(data_source: str, campaign_name: str) → None[source]

Run the reader on the data sample.

Parameters

data_source (str) – Data source.
campaign_name (str) – Campaign name.

disdrodb.l0.check_standards module

disdrodb.l0.check_standards.check_l0a_column_names(df: DataFrame, sensor_name: str) → None[source]

Check that the dataframe columns respect the DISDRODB standards.

Parameters

df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.

Raises

ValueError – Error if some columns do not meet the DISDRODB standards or if the ‘time’ column is missing in the dataframe.

disdrodb.l0.check_standards.check_l0a_standards(df: DataFrame, sensor_name: str, verbose: bool = True) → None[source]

Checks that a file respects the DISDRODB L0A standards.

Parameters

df (pd.DataFrame) – L0A dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool, optional) – Whether to print detailed processing information. The default is True.

Raises

ValueError – Error if some columns have inconsistent values.

disdrodb.l0.check_standards.check_l0b_standards(x: str) → None[source]

disdrodb.l0.check_standards.check_sensor_name(sensor_name: str) → None[source]

Check sensor name.

Parameters

sensor_name (str) – Name of the sensor.

Raises

TypeError – Error if sensor_name is not a string.
ValueError – Error if the input sensor name has not been found in the list of available sensors.

disdrodb.l0.io module

disdrodb.l0.io.check_glob_pattern(pattern: str) → None[source]

Check that the input parameter is a string and that it can be used as a glob pattern.

Parameters

pattern (str) – String to be checked.

Raises

TypeError – The input parameter is not a string.
ValueError – The input parameter can not be used as pattern.

disdrodb.l0.io.check_glob_patterns(patterns: Union[str, list]) → list[source]: Check that the glob patterns are valid.
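The docstrings do not state which strings cannot be used as patterns. The sketch below assumes, for illustration, that a pattern must be non-empty and relative (not starting with "/"), because it is searched inside <raw_dir>/data/<station_name>. Both function names are hypothetical.

```python
def check_glob_pattern_sketch(pattern):
    """Hypothetical validation of a single glob pattern."""
    if not isinstance(pattern, str):
        raise TypeError("Expect pattern as a string.")
    # Assumption: patterns are relative to <raw_dir>/data/<station_name>.
    if pattern == "" or pattern.startswith("/"):
        raise ValueError(f"'{pattern}' cannot be used as a relative glob pattern.")


def check_glob_patterns_sketch(patterns):
    """Hypothetical validation of one or more glob patterns; always returns a list."""
    if isinstance(patterns, str):
        patterns = [patterns]
    if not isinstance(patterns, list):
        raise TypeError("Expect patterns as a string or a list of strings.")
    for pattern in patterns:
        check_glob_pattern_sketch(pattern)
    return patterns
```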

disdrodb.l0.io.check_processed_dir(processed_dir)[source]

Check the input type, format and validity of the directory path.

Parameters: processed_dir (str) – Path of the processed directory
Returns: Path of the processed directory
Return type: str

disdrodb.l0.io.check_raw_dir(raw_dir: str, verbose: bool = False) → None[source]

Check validity of raw_dir.

Steps:
1. Check that ‘raw_dir’ is a valid directory path.
2. Check that ‘raw_dir’ follows the expected directory structure.
3. Check that each station_name directory contains data.
4. Check that for each station_name the mandatory metadata.yml is specified.
5. Check that for each station_name the mandatory issue.yml is specified.

Parameters

raw_dir (str) – Input raw directory
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
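The steps listed above can be sketched with pathlib. The location of the issue files (an issue/ subfolder containing <station_name>.yml) is an assumption, and both .yml and .yaml suffixes are accepted because the documentation mentions both; check_raw_dir_sketch is hypothetical.

```python
from pathlib import Path


def check_raw_dir_sketch(raw_dir):
    """Hypothetical re-implementation of the steps listed for check_raw_dir."""
    raw_dir = Path(raw_dir)
    # 1. raw_dir must be an existing directory.
    if not raw_dir.is_dir():
        raise ValueError(f"{raw_dir} is not a directory.")
    # 2. Expected top-level structure (the 'issue' folder name is an assumption).
    for sub in ("data", "metadata", "issue"):
        if not (raw_dir / sub).is_dir():
            raise ValueError(f"Missing '{sub}' directory in {raw_dir}.")
    station_dirs = [p for p in (raw_dir / "data").iterdir() if p.is_dir()]
    for station_dir in station_dirs:
        station_name = station_dir.name
        # 3. Each station directory must contain data.
        if not any(station_dir.iterdir()):
            raise ValueError(f"No data for station {station_name}.")
        # 4-5. Each station needs a metadata and an issue YAML file.
        for sub in ("metadata", "issue"):
            candidates = [raw_dir / sub / f"{station_name}{ext}" for ext in (".yml", ".yaml")]
            if not any(c.is_file() for c in candidates):
                raise ValueError(f"Missing {sub} YAML file for station {station_name}.")
    return sorted(p.name for p in station_dirs)
```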

disdrodb.l0.io.create_directory_structure(processed_dir, product_level, station_name, force, verbose=False)[source]: Create directory structure for L0B and higher DISDRODB products.

disdrodb.l0.io.create_initial_directory_structure(raw_dir, processed_dir, station_name, force, verbose=False, product_level='L0A')[source]

Create directory structure for the first L0 DISDRODB product.

If the input data are raw text files, product_level = “L0A” (run_l0a). If the input data are raw netCDF files, product_level = “L0B” (run_l0b_nc).

disdrodb.l0.io.get_L0A_dir(processed_dir: str, station_name: str) → str[source]

Define L0A directory.

Parameters

processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station

Returns

L0A directory path.

Return type

str

disdrodb.l0.io.get_L0A_fname(df, processed_dir, station_name: str) → str[source]

Define L0A file name.

Parameters

df (pd.DataFrame) – L0A DataFrame
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station

Returns

L0A file name.

Return type

str

disdrodb.l0.io.get_L0A_fpath(df: DataFrame, processed_dir: str, station_name: str) → str[source]

Define L0A file path.

Parameters

df (pd.DataFrame) – L0A DataFrame.
processed_dir (str) – Path of the processed directory.
station_name (str) – Name of the station.

Returns

L0A file path.

Return type

str

disdrodb.l0.io.get_L0B_dir(processed_dir: str, station_name: str) → str[source]

Define L0B directory.

Parameters

processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station

Returns

Path of the L0B directory

Return type

str

disdrodb.l0.io.get_L0B_fname(ds, processed_dir, station_name: str) → str[source]

Define L0B file name.

Parameters

ds (xr.Dataset) – L0B xarray Dataset
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station

Returns

L0B file name.

Return type

str

disdrodb.l0.io.get_L0B_fpath(ds: Dataset, processed_dir: str, station_name: str, l0b_concat=False) → str[source]

Define L0B file path.

Parameters

ds (xr.Dataset) – L0B xarray Dataset.
processed_dir (str) – Path of the processed directory.
station_name (str) – ID of the station
l0b_concat (bool) – If False, the file is specified inside the station directory. If True, the file is specified outside the station directory.

Returns

L0B file path.

Return type

str

disdrodb.l0.io.get_campaign_name(path: str) → str[source]

Return the campaign name from a file or directory path.

It is currently assumed that no data_source, campaign_name, station_name or file name contains the word DISDRODB.

Parameters: path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
Returns: Name of the campaign.
Return type: str

disdrodb.l0.io.get_data_source(path: str) → str[source]

Return the data_source from a file or directory path.

It is currently assumed that no data_source, campaign_name, station_name or file name contains the word DISDRODB.

Parameters: path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
Returns: Name of the data source.
Return type: str

disdrodb.l0.io.get_dataframe_min_max_time(df: DataFrame)[source]

Retrieve the starting and ending time of the dataframe.

Parameters: df (pd.DataFrame) – Input dataframe
Returns: (starting_time, ending_time)
Return type: tuple

disdrodb.l0.io.get_dataset_min_max_time(ds: Dataset)[source]

Retrieve the starting and ending time of the dataset.

Parameters: ds (xr.Dataset) – Input dataset
Returns: (starting_time, ending_time)
Return type: tuple

disdrodb.l0.io.get_disdrodb_dir(path: str) → str[source]

Return the disdrodb base directory from a file or directory path.

It is currently assumed that no data_source, campaign_name, station_name or file name contains the word DISDRODB.

Parameters: path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
Returns: Path of the DISDRODB directory.
Return type: str

disdrodb.l0.io.get_disdrodb_path(path: str) → str[source]

Return the path from the disdrodb_dir directory.

It is currently assumed that no data_source, campaign_name, station_name or file name contains the word DISDRODB.

Parameters: path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
Returns: Path inside the DISDRODB archive. Format: DISDRODB/<Raw or Processed>/<data_source>/…
Return type: str
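Under the stated assumption that DISDRODB appears exactly once in the path, the four path helpers above reduce to splitting the path around that component. The sketch below matches whole path components (slightly stricter than a substring search) and uses hypothetical function names and example paths.

```python
from pathlib import PurePosixPath


def _split_at_disdrodb(path):
    """Split a path into the parts up to and from the 'DISDRODB' component."""
    parts = PurePosixPath(path).parts
    idx = parts.index("DISDRODB")  # raises ValueError if DISDRODB is absent
    return parts[: idx + 1], parts[idx:]


def get_disdrodb_dir_sketch(path):
    head, _ = _split_at_disdrodb(path)
    return str(PurePosixPath(*head))


def get_disdrodb_path_sketch(path):
    _, tail = _split_at_disdrodb(path)
    return str(PurePosixPath(*tail))


def get_data_source_sketch(path):
    _, tail = _split_at_disdrodb(path)
    return tail[2]  # DISDRODB / <Raw|Processed> / <data_source>


def get_campaign_name_sketch(path):
    _, tail = _split_at_disdrodb(path)
    return tail[3]  # DISDRODB / <Raw|Processed> / <data_source> / <campaign_name>
```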

disdrodb.l0.io.get_l0a_file_list(processed_dir, station_name, debugging_mode)[source]

Retrieve L0A files for a given station.

Parameters

processed_dir (str) – Directory of the campaign where to search for the L0A files. Format <..>/DISDRODB/Processed/<data_source>/<campaign_name>
station_name (str) – ID of the station
debugging_mode (bool, optional) – If True, it selects at most 3 files for debugging purposes. The default is False.

Returns

list_fpaths – List of L0A file paths.

Return type

list

disdrodb.l0.io.get_raw_file_list(raw_dir, station_name, glob_patterns, verbose=False, debugging_mode=False)[source]

Get the list of files from a directory based on input parameters.

All files matched by the glob patterns are currently concatenated. In the future, this might be modified to enable DISDRODB processing when raw data are separated into multiple files.

Parameters

raw_dir (str) – Directory of the campaign where to search for files. Format <..>/DISDRODB/Raw/<data_source>/<campaign_name>
station_name (str) – ID of the station
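glob_patterns (str or list) – Glob pattern(s) used to search data files in <raw_dir>/data/<station_name>.
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
debugging_mode (bool, optional) – If True, it selects at most 3 files for debugging purposes. The default is False.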

Returns

list_fpaths – List of file paths.

Return type

list
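A minimal sketch of the file search: results from all glob patterns are concatenated, sorted and de-duplicated, and debugging_mode keeps at most three files. The data/<station_name> layout follows the raw_dir structure documented for run_l0a; raising an error when no file matches is an assumption, and get_raw_file_list_sketch is hypothetical.

```python
import glob
import os


def get_raw_file_list_sketch(raw_dir, station_name, glob_patterns, debugging_mode=False):
    """Hypothetical file search over <raw_dir>/data/<station_name>."""
    if isinstance(glob_patterns, str):
        glob_patterns = [glob_patterns]
    station_dir = os.path.join(raw_dir, "data", station_name)
    list_fpaths = []
    for pattern in glob_patterns:
        list_fpaths.extend(glob.glob(os.path.join(station_dir, pattern)))
    # Concatenate the results of all patterns and drop duplicates.
    list_fpaths = sorted(set(list_fpaths))
    if len(list_fpaths) == 0:  # assumption: an empty result is an error
        raise ValueError(f"No files found in {station_dir} for patterns {glob_patterns}.")
    if debugging_mode:
        list_fpaths = list_fpaths[:3]
    return list_fpaths
```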

disdrodb.l0.io.read_L0A_dataframe(fpaths: Union[str, list], verbose: bool = False, debugging_mode: bool = False) → DataFrame[source]

Read DISDRODB L0A Apache Parquet file(s).

Parameters

fpaths (str or list) – Either a single file path or a list of file paths.
verbose (bool) – Whether to print detailed processing information into terminal. The default is False.
debugging_mode (bool) – If True, it reduces the amount of data to process: if fpaths is a list, only the first 3 files are read, and only the first 100 rows of each file are selected. The default is False.

Returns

L0A Dataframe.

Return type

pd.DataFrame

disdrodb.l0.issue module

class disdrodb.l0.issue.NoDatesSafeLoader(stream)[source]

Bases: SafeLoader

classmethod remove_implicit_resolver(tag_to_remove)[source]

Remove implicit resolvers for a particular tag.

Takes care not to modify resolvers in super classes.

We want to load datetimes as strings, not dates, because we go on to serialise as json which doesn’t have the advanced types of yaml, and leads to incompatibilities down the track.

disdrodb.l0.issue.check_issue_dict(issue_dict)[source]: Check the validity of the issue dictionary.

disdrodb.l0.issue.check_issue_file(fpath: str) → None[source]

Check issue YAML file validity.

Parameters: fpath (str) – Issue YAML file path.

disdrodb.l0.issue.check_time_periods(time_periods)[source]: Check time_periods validity.

disdrodb.l0.issue.check_timesteps(timesteps)[source]

Check timesteps validity.

It expects timesteps string in YYYY-mm-dd HH:MM:SS format with second accuracy. If timesteps is None, return None.
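The expected format corresponds to the strptime directives %Y-%m-%d %H:%M:%S. The sketch below returns a list of datetime objects; the return type of the real function is not documented here, and check_timesteps_sketch is hypothetical.

```python
from datetime import datetime


def check_timesteps_sketch(timesteps):
    """Hypothetical check: every timestep must be 'YYYY-mm-dd HH:MM:SS'."""
    if timesteps is None:
        return None
    if isinstance(timesteps, str):
        timesteps = [timesteps]
    parsed = []
    for ts in timesteps:
        try:
            parsed.append(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))
        except (TypeError, ValueError) as err:
            raise ValueError(f"Invalid timestep {ts!r}: expected YYYY-mm-dd HH:MM:SS.") from err
    return parsed
```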

disdrodb.l0.issue.is_numpy_array_datetime(arr)[source]

Check if the numpy array contains datetime64.

Parameters: arr (numpy array) – Numpy array to check.
Returns: Numpy array checked.
Return type: numpy array

disdrodb.l0.issue.is_numpy_array_string(arr)[source]

Check if the numpy array contains strings.

Parameters: arr (numpy array) – Numpy array to check.

disdrodb.l0.issue.load_yaml_without_date_parsing(filepath)[source]: Read a YAML file without automatically converting date strings to datetime objects.

disdrodb.l0.issue.read_issue(raw_dir: str, station_name: str) → dict[source]

Read YAML issue file.

Parameters

raw_dir (str) – Path of the campaign raw directory.
station_name (str) – Station name.

Returns

Issue dictionary.

Return type

dict

disdrodb.l0.issue.read_issue_file(fpath: str) → dict[source]

Read YAML issue file.

Parameters: fpath (str) – Filepath of the issue YAML.
Returns: Issue dictionary.
Return type: dict

disdrodb.l0.issue.write_default_issue(fpath: str) → None[source]

Write an empty issue YAML file.

Parameters: fpath (str) – Filepath of the issue YAML to write.

disdrodb.l0.issue.write_issue_dict(fpath: str, issue_dict: dict) → None[source]

Write the issue YAML file.

Parameters

fpath (str) – Filepath of the issue YAML to write.
issue_dict (dict) – Issue dictionary

disdrodb.l0.l0_processing module

disdrodb.l0.l0_processing.click_l0_archive_options(function: object)[source]

Click command line arguments for L0 processing archiving of a station.

Parameters: function (object) – Function.

disdrodb.l0.l0_processing.click_l0_processing_options(function: object)[source]

Click command line default parameters for L0 processing options.

Parameters: function (object) – Function.

disdrodb.l0.l0_processing.click_l0_station_arguments(function: object)[source]

Click command line arguments for L0 processing of a station.

Parameters: function (object) – Function.

disdrodb.l0.l0_processing.click_l0_stations_options(function: object)[source]

Click command line options for DISDRODB archive L0 processing.

Parameters: function (object) – Function.

disdrodb.l0.l0_processing.click_l0b_concat_options(function: object)[source]

Click command line default parameters for L0B concatenation.

Parameters: function (object) – Function.

disdrodb.l0.l0_processing.run_disdrodb_l0(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = False, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]

Run the L0 processing of DISDRODB stations.

This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it processes the stations matching the provided data_sources, campaign_names and station_names.

Parameters

disdrodb_dir (str) – Base directory of DISDRODB Format: <…>/DISDRODB
data_sources (list) – Name of the data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, all stations are processed. The default is None
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None
station_names (list) – Station names to process. The default is None
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is False.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) –

Whether to remove the L0B files after having concatenated all L0B netCDF files.
It takes place only if l0b_concat = True.

The default is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. For L0B, it processes just the first 100 rows of 3 L0A files. The default is False.
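The station selection described above can be sketched as a filter over (data_source, campaign_name, station_name) tuples, where a None filter matches everything. Accepting a single string as well as a list is an assumption; select_stations_sketch and the example names are hypothetical.

```python
def select_stations_sketch(stations, data_sources=None, campaign_names=None, station_names=None):
    """Hypothetical filter over (data_source, campaign_name, station_name) tuples.

    A station is kept if it matches every filter that is not None.
    """
    def _as_set(value):
        if value is None:
            return None
        return {value} if isinstance(value, str) else set(value)

    filters = (_as_set(data_sources), _as_set(campaign_names), _as_set(station_names))
    return [
        station
        for station in stations
        if all(allowed is None or field in allowed for field, allowed in zip(station, filters))
    ]
```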

disdrodb.l0.l0_processing.run_disdrodb_l0_station(disdrodb_dir, data_source, campaign_name, station_name, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = True, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]

Run the L0 processing of a specific DISDRODB station from the terminal.

Parameters

disdrodb_dir (str) – Base directory of DISDRODB Format: <…>/DISDRODB
data_source (str) – Institution name (when campaign data spans more than 1 country), or country (when all campaigns (or sensor networks) are inside a given country). Must be UPPER CASE.
campaign_name (str) – Campaign name. Must be UPPER CASE.
station_name (str) – Station name
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is True.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) –

Whether to remove the L0B files after having concatenated all L0B netCDF files.
It takes place only if l0b_concat=True.

The default is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files for each station. For L0B, it processes just the first 100 rows of 3 L0A files for each station. The default is False.

disdrodb.l0.l0_processing.run_disdrodb_l0a(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]

disdrodb.l0.l0_processing.run_disdrodb_l0a_station(disdrodb_dir, data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]: Run the L0A processing of a station calling run_disdrodb_l0a_station in the terminal.

disdrodb.l0.l0_processing.run_disdrodb_l0b(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]

disdrodb.l0.l0_processing.run_disdrodb_l0b_station(disdrodb_dir, data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]: Run the L0B processing of a station calling run_disdrodb_l0b_station in the terminal.

disdrodb.l0.l0_processing.run_l0a(raw_dir, processed_dir, station_name, glob_patterns, column_names, reader_kwargs, df_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]

Run the L0A processing for a specific DISDRODB station.

Parameters

raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:

<…>/DISDRODB/Raw/<data_source>/<campaign_name>

Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml

Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between:
  - the raw_dir and processed_dir directory paths;
  - the key ‘campaign_name’ within the metadata YAML files.
- The campaign_name is expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:

<…>/DISDRODB/Processed/<data_source>/<campaign_name>

For testing purposes, this function also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>
column_names (list) – Columns names of the raw text file.
reader_kwargs (dict) – Pandas read_csv arguments to open the text file.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe into the DISDRODB L0A standard.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
verbose (bool) – Whether to print detailed processing information into terminal. The default is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 100 rows of 3 raw data files. The default is False.

disdrodb.l0.l0_processing.run_l0b(processed_dir, station_name, parallel, force, verbose, debugging_mode)[source]

Run the L0B processing for a specific DISDRODB station.

Parameters

raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:

<…>/DISDRODB/Raw/<data_source>/<campaign_name>

Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml

Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between:
  - the raw_dir and processed_dir directory paths;
  - the key ‘campaign_name’ within the metadata YAML files.
- The campaign_name is expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:

<…>/DISDRODB/Processed/<data_source>/<campaign_name>

For testing purposes, this function also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default is True.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. Ensure that threads_per_worker (the number of threads per process) is set to 1 to avoid HDF errors, and set the HDF5_USE_FILE_LOCKING environment variable to False. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just 3 raw data files. The default is False.

disdrodb.l0.l0_processing.run_l0b_from_nc(raw_dir, processed_dir, station_name, glob_patterns, dict_names, ds_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]

Run the L0B processing for a specific DISDRODB station with raw netCDFs.

Parameters

raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:

<…>/DISDRODB/Raw/<data_source>/<campaign_name>

Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between:
  - the raw_dir and processed_dir directory paths;
  - the ‘campaign_name’ key within the metadata YAML files.
- The campaign_name is expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0B products. The path should have the following structure:

<…>/DISDRODB/Processed/<data_source>/<campaign_name>

For testing purposes, this function exceptionally also accepts a directory path that simply ends with <campaign_name> (e.g., /tmp/<campaign_name>).
station_name (str) – Station name
glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>. Example: glob_patterns = “*.nc”
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
ds_sanitizer_fun (object, optional) – Sanitizer function to format the raw netCDF into DISDRODB L0B standard.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, reduce the amount of data to process: only the first 3 raw netCDF files are processed. The default is False.
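The file selection driven by glob_patterns and debugging_mode can be sketched as follows. This is a minimal illustration, not DISDRODB's implementation: list_station_files is a hypothetical helper, and sorting the matched paths is an assumption made so that "the first 3 files" is well defined.

```python
from pathlib import Path


def list_station_files(raw_dir, station_name, glob_patterns, debugging_mode=False):
    """Hypothetical helper: list the raw files of a station matching a glob pattern."""
    station_dir = Path(raw_dir) / "data" / station_name
    # Sort the matches so that the selection order is deterministic (assumption)
    filepaths = sorted(str(p) for p in station_dir.glob(glob_patterns))
    # In debugging mode, keep only the first 3 files
    if debugging_mode:
        filepaths = filepaths[:3]
    return filepaths
```

For example, list_station_files(raw_dir, station_name, "*.nc", debugging_mode=True) returns at most three netCDF paths from <raw_dir>/data/<station_name>.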

disdrodb.l0.l0_reader module

disdrodb.l0.l0_reader.available_readers(data_sources=None, reader_path=False)[source]: Retrieve available readers information.

disdrodb.l0.l0_reader.check_available_readers()[source]: Check the reader arguments of all readers in the package.

disdrodb.l0.l0_reader.check_reader_arguments(reader)[source]: Check that the reader has the expected input arguments.

disdrodb.l0.l0_reader.check_reader_exists(reader_data_source: str, reader_name: str) → str[source]

Check that the provided data source exists and that the reader name exists within the available readers.

Please run get_available_readers_dict() to get the list of all available readers.

Parameters

reader_data_source (str) – The directory within which the reader_name is located in the disdrodb.l0.readers directory.
reader_name (str) – Campaign name

Returns

The reader name, if the reader exists. Otherwise, an error is raised.

Return type

str

Raises

ValueError – Error if the reader name provided for the campaign has not been found.

disdrodb.l0.l0_reader.get_available_readers_dict() → dict[source]

Return a description of the readers included in the current release of DISDRODB.

Returns: The dictionary has the following schema {“data_source”: {“reader_name”: “reader_file_path”}}
Return type: dict

disdrodb.l0.l0_reader.get_reader(reader_data_source: str, reader_name: str) → object[source]

Returns the reader function based on input parameters.

Parameters

reader_data_source (str) – The directory within which the reader_name is located in the disdrodb.l0.readers directory.
reader_name (str) – The reader name.

Returns

The reader() function

Return type

object

disdrodb.l0.l0_reader.get_reader_from_metadata_reader_key(reader_data_source_name)[source]

Retrieve the reader from the reader metadata value.

The metadata reader key follows the convention <data_source>/<reader_name>, relative to disdrodb.l0.readers.
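Given the readers dictionary schema returned by get_available_readers_dict ({“data_source”: {“reader_name”: “reader_file_path”}}), resolving a metadata reader key can be sketched as below. The function name and the dictionary entries are hypothetical, and, unlike get_reader_from_metadata_reader_key, this sketch returns the reader file path rather than the reader function.

```python
def resolve_reader_key(reader_key, readers_dict):
    """Hypothetical sketch: resolve a '<data_source>/<reader_name>' key to a reader file path."""
    parts = reader_key.split("/")
    if len(parts) != 2:
        raise ValueError(f"Expected '<data_source>/<reader_name>', got {reader_key!r}.")
    data_source, reader_name = parts
    if data_source not in readers_dict:
        raise ValueError(f"Data source {data_source!r} is not available.")
    if reader_name not in readers_dict[data_source]:
        raise ValueError(f"Reader {reader_name!r} is not available for {data_source!r}.")
    return readers_dict[data_source][reader_name]


# Hypothetical readers dictionary following the documented schema
readers_dict = {"SOURCE_A": {"READER_X": "readers/SOURCE_A/READER_X.py"}}
print(resolve_reader_key("SOURCE_A/READER_X", readers_dict))  # readers/SOURCE_A/READER_X.py
```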

disdrodb.l0.l0_reader.get_station_reader(disdrodb_dir, data_source, campaign_name, station_name)[source]: Retrieve the reader from the station metadata information.

disdrodb.l0.l0_reader.is_documented_by(original)[source]

Decorator that applies a generic docstring to the decorated function.

Parameters: original (function) – Function to take the docstring from.
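A decorator of this kind can be written in a few lines; the sketch below shows how a shared docstring (such as reader_generic_docstring) could be attached to every reader. The names generic_docstring and reader are hypothetical, and DISDRODB's actual implementation may differ.

```python
def is_documented_by(original):
    """Return a decorator that copies the docstring of `original` to the decorated function."""

    def wrapper(target):
        target.__doc__ = original.__doc__
        return target

    return wrapper


def generic_docstring():
    """Script to convert the raw data to L0A format."""


@is_documented_by(generic_docstring)
def reader(raw_dir, processed_dir, station_name):
    # Hypothetical reader body; only the docstring matters here
    pass


print(reader.__doc__)  # Script to convert the raw data to L0A format.
```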

disdrodb.l0.l0_reader.reader_generic_docstring()[source]

Script to convert the raw data to L0A format.

Parameters

raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:

<…>/DISDRODB/Raw/<data_source>/<campaign_name>

Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between:
  - the raw_dir and processed_dir directory paths;
  - the ‘campaign_name’ key within the metadata YAML files.
- The campaign_name is expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:

<…>/DISDRODB/Processed/<data_source>/<campaign_name>

For testing purposes, this function exceptionally also accepts a directory path that simply ends with <campaign_name> (e.g., /tmp/<campaign_name>).
station_name (str) – Station name
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is True.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, reduce the amount of data to process: only the first 3 raw data files are processed. The default is False.

disdrodb.l0.l0a_processing module

Functions to process raw text files into DISDRODB L0A Apache Parquet.

disdrodb.l0.l0a_processing.cast_column_dtypes(df: DataFrame, sensor_name: str, verbose: bool = False) → DataFrame[source]

Convert ‘object’ dataframe columns into DISDRODB L0A dtype standards.

Parameters

df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print processing information.

Returns

Dataframe with corrected columns types.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.coerce_corrupted_values_to_nan(df: DataFrame, sensor_name: str, verbose: bool = False) → DataFrame[source]

Coerce corrupted values in dataframe numeric columns to np.nan.

Parameters

df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print processing information.

Returns

Dataframe with corrupted values in numeric columns set to np.nan.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.concatenate_dataframe(list_df: list, verbose: bool = False) → DataFrame[source]

Concatenate a list of dataframes.

Parameters

list_df (list) – List of dataframes.
verbose (bool, optional) – If True, print messages. If False, no print.

Returns

Concatenated dataframe.

Return type

pd.DataFrame

Raises

ValueError – The dataframes cannot be concatenated.

disdrodb.l0.l0a_processing.drop_time_periods(df, time_periods)[source]: Drop problematic time periods.

disdrodb.l0.l0a_processing.drop_timesteps(df, timesteps)[source]: Drop problematic time steps.

disdrodb.l0.l0a_processing.preprocess_reader_kwargs(reader_kwargs: dict) → dict[source]

Preprocess the arguments required to read a raw text file into Pandas.

Parameters: reader_kwargs (dict) – Initial parameter dictionary.
Returns: Parameter dictionary that matches either Pandas or Dask.
Return type: dict

disdrodb.l0.l0a_processing.process_raw_file(filepath, column_names, reader_kwargs, df_sanitizer_fun, sensor_name, verbose=True, issue_dict={})[source]

Read and parse a raw text file into an L0A dataframe.

Parameters

filepath (str) – File path
column_names (list) – Column names.
reader_kwargs (dict) – Pandas read_csv arguments.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print processing information. The default is True.
issue_dict (dict) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are ‘timesteps’ and ‘time_periods’. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.

Returns

Dataframe

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.read_raw_data(filepath: str, column_names: list, reader_kwargs: dict) → DataFrame[source]

Read raw data into a dataframe.

Parameters

filepath (str) – Raw file path.
column_names (list) – Column names.
reader_kwargs (dict) – Pandas pd.read_csv arguments.

Returns

Pandas dataframe.

Return type

pandas.DataFrame

disdrodb.l0.l0a_processing.read_raw_file_list(file_list: Union[list, str], column_names: list, reader_kwargs: dict, sensor_name: str, verbose: bool, df_sanitizer_fun: Optional[object] = None) → DataFrame[source]

Read and parse a list of raw files into a dataframe.

Parameters

file_list (Union[list,str]) – File path(s).
column_names (list) – Column names.
reader_kwargs (dict) – Pandas read_csv arguments.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print processing information.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe.

Returns

Dataframe

Return type

pd.DataFrame

Raises

ValueError – The input parameters cannot be used or the raw file cannot be processed.

disdrodb.l0.l0a_processing.remove_corrupted_rows(df)[source]

Remove corrupted rows by checking conversion of raw fields to numeric.

Note: the raw array fields must be stripped of their leading and trailing delimiters beforehand.

disdrodb.l0.l0a_processing.remove_duplicated_timesteps(df: DataFrame, verbose: bool = False)[source]

Remove duplicated timesteps.

Only the first occurrence of each timestep is kept.

Parameters

df (pd.DataFrame) – Input dataframe.
verbose (bool) – Whether to print processing information.

Returns

Dataframe with valid unique timesteps.

Return type

pd.DataFrame
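In pandas, keeping only the first occurrence of each timestep maps naturally onto DataFrame.drop_duplicates. The sketch below re-implements the documented behaviour under the assumption that the time column is named “time” (as in remove_rows_with_missing_time); it is an illustration, not DISDRODB's source code.

```python
import pandas as pd


def remove_duplicated_timesteps(df: pd.DataFrame, verbose: bool = False) -> pd.DataFrame:
    """Sketch: keep only the first row for each duplicated 'time' value."""
    n_before = len(df)
    df = df.drop_duplicates(subset="time", keep="first")
    if verbose:
        print(f"{n_before - len(df)} duplicated timesteps removed.")
    return df


df = pd.DataFrame(
    {
        "time": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:00", "2020-01-01 00:01"]),
        "value": [1, 2, 3],
    }
)
print(remove_duplicated_timesteps(df)["value"].tolist())  # [1, 3]
```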

disdrodb.l0.l0a_processing.remove_issue_timesteps(df, issue_dict, verbose=False)[source]

Drop dataframe rows with timesteps listed in the issue dictionary.

Parameters

df (pd.DataFrame) – Input dataframe.
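issue_dict (dict) – Issue dictionary. Valid keys are ‘timesteps’ and ‘time_periods’.
verbose (bool) – Whether to print processing information. The default is False.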

Returns

Dataframe with problematic timesteps removed.

Return type

pd.DataFrame
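The removal of issue timesteps can be sketched with pandas as below. This is an illustration under stated assumptions, not DISDRODB's code: each entry of ‘time_periods’ is assumed to be a [start, end] pair with both ends included, and the time column is assumed to be named “time”.

```python
import numpy as np
import pandas as pd


def remove_issue_timesteps(df, issue_dict, verbose=False):
    """Sketch: drop rows whose 'time' is listed in the issue dictionary."""
    n_before = len(df)
    # Convert to nanosecond resolution to match the dataframe time column
    timesteps = np.array(issue_dict.get("timesteps", []), dtype="datetime64[ns]")
    if timesteps.size > 0:
        df = df[~df["time"].isin(timesteps)]
    # Assumption: each time period is a [start, end] pair, both ends included
    for start, end in issue_dict.get("time_periods", []):
        df = df[~df["time"].between(pd.Timestamp(start), pd.Timestamp(end))]
    if verbose:
        print(f"{n_before - len(df)} timesteps removed.")
    return df


df = pd.DataFrame({"time": pd.date_range("2020-01-01", periods=6, freq="min"), "value": range(6)})
issue_dict = {
    "timesteps": [np.datetime64("2020-01-01T00:01:00")],
    "time_periods": [[np.datetime64("2020-01-01T00:03:00"), np.datetime64("2020-01-01T00:04:00")]],
}
print(remove_issue_timesteps(df, issue_dict)["value"].tolist())  # [0, 2, 5]
```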

disdrodb.l0.l0a_processing.remove_rows_with_missing_time(df: DataFrame, verbose: bool = False)[source]

Remove dataframe rows where the “time” is NaT.

Parameters

df (pd.DataFrame) – Input dataframe.
verbose (bool) – Whether to print processing information.

Returns

Dataframe with valid timesteps.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.replace_nan_flags(df, sensor_name, verbose)[source]

Set values corresponding to nan_flags to np.nan.

Parameters

df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print processing information.

Returns

Dataframe without nan_flags values.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.set_nan_outside_data_range(df, sensor_name, verbose)[source]

Set values outside the data range as np.nan.

Parameters

df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print processing information.

Returns

Dataframe without values outside the expected data range.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.set_nan_unvalid_values(df, sensor_name, verbose)[source]

Set invalid (class) values to np.nan.

Parameters

df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print processing information.

Returns

Dataframe without invalid values.

Return type

pd.DataFrame
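The three sanitizing steps above (replace_nan_flags, set_nan_outside_data_range and set_nan_unvalid_values) all set values to NaN according to per-variable rules. The sketch below applies them with pandas using hypothetical dictionaries shaped like the outputs of get_nan_flags_dict, get_data_range_dict and get_valid_values_dict; the variable names, the [min, max] range format and the inclusive bounds are assumptions.

```python
import pandas as pd


def sanitize_values(df, nan_flags_dict, data_range_dict, valid_values_dict):
    """Sketch: set flagged, out-of-range and invalid values to NaN."""
    df = df.copy()
    for var, flags in nan_flags_dict.items():
        if var in df.columns:
            # Values equal to a nan flag become NaN
            df[var] = df[var].where(~df[var].isin(flags))
    for var, (vmin, vmax) in data_range_dict.items():
        if var in df.columns:
            # Values outside [vmin, vmax] (inclusive, assumed) become NaN
            df[var] = df[var].where(df[var].between(vmin, vmax))
    for var, valid_values in valid_values_dict.items():
        if var in df.columns:
            # Values not in the list of valid (class) values become NaN
            df[var] = df[var].where(df[var].isin(valid_values))
    return df


# Hypothetical variables and rules
df = pd.DataFrame({"var_a": [0.5, -9.999, 2000.0, 1.0], "var_b": [0, 1, 99, 2]})
out = sanitize_values(df, {"var_a": [-9.999]}, {"var_a": [0, 1000]}, {"var_b": [0, 1, 2]})
print(out)
```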

disdrodb.l0.l0a_processing.strip_delimiter_from_raw_arrays(df)[source]: Remove the first and last delimiter occurrences from the raw array fields.

disdrodb.l0.l0a_processing.strip_string_spaces(df: DataFrame, sensor_name: str, verbose: bool = False) → DataFrame[source]

Strip leading/trailing spaces from dataframe string columns.

Parameters

df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print processing information.

Returns

Dataframe with string columns without leading/trailing spaces.

Return type

pd.DataFrame

disdrodb.l0.l0a_processing.write_l0a(df: DataFrame, fpath: str, force: bool = False, verbose: bool = False)[source]

Save the dataframe into an Apache Parquet file.

Parameters

df (pd.DataFrame) – Input dataframe.
fpath (str) – Output file path.
force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False (the default), raise an error if data already exist in the destination directories.
verbose (bool, optional) – Whether to print processing information. The default is False.

Raises

ValueError – The input dataframe cannot be written as an Apache Parquet file.
NotImplementedError – The input dataframe can not be processed.

disdrodb.l0.l0b_concat module

disdrodb.l0.l0b_concat.run_disdrodb_l0b_concat(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, remove_l0b=False, verbose=False)[source]

Concatenate the L0B files of the DISDRODB archive.

This function is called by the run_disdrodb_l0b_concat script.

disdrodb.l0.l0b_concat.run_disdrodb_l0b_concat_station(disdrodb_dir, data_source, campaign_name, station_name, remove_l0b=False, verbose=False)[source]

Concatenate the L0B files of a single DISDRODB station.

This function runs the run_disdrodb_l0b_concat_station script in the terminal.

disdrodb.l0.l0b_processing module

Functions to process DISDRODB L0A files into DISDRODB L0B netCDF files.

disdrodb.l0.l0b_processing.add_dataset_crs_coords(ds)[source]: Add the CRS coordinate to the xr.Dataset

disdrodb.l0.l0b_processing.add_dataset_missing_variables(ds, missing_vars, sensor_name)[source]: Add missing Dataset variables as nan DataArrays.

disdrodb.l0.l0b_processing.convert_object_variables_to_string(ds: Dataset) → Dataset[source]

Convert variables with object dtype to string.

Parameters: ds (xr.Dataset) – Input dataset.
Returns: Output dataset.
Return type: xr.Dataset

disdrodb.l0.l0b_processing.create_l0b_from_l0a(df: DataFrame, attrs: dict, verbose: bool = False) → Dataset[source]

Transform the L0A dataframe to the L0B xr.Dataset.

Parameters

df (pd.DataFrame) – DISDRODB L0A dataframe.
attrs (dict) – Station metadata.
verbose (bool, optional) – Whether to print processing information. The default is False.

Returns

DISDRODB L0B dataset.

Return type

xr.Dataset

Raises

ValueError – Error if the DISDRODB L0B xarray dataset cannot be created.

disdrodb.l0.l0b_processing.format_string_array(string: str, n_values: int) → array[source]

Split a string with multiple numbers separated by a delimiter into a 1D array.

e.g., format_string_array("2,44,22,33", 4) returns [ 2. 44. 22. 33.]

- If the string is empty (""), an array of zeros is returned.
- If the number of values differs from n_values, an array of np.nan is returned.

The function strips potential delimiters at the start and end of the string before splitting.

Parameters

string (str) – Input string
n_values (int) – Expected length of the output array.

Returns

array of float

Return type

np.array
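The documented behaviour of format_string_array can be reproduced with NumPy as sketched below. This is a re-implementation for illustration only; the explicit delimiter argument (defaulting to a comma) is an assumption, since DISDRODB may infer the delimiter instead (see infer_split_str).

```python
import numpy as np


def format_string_array(string: str, n_values: int, delimiter: str = ",") -> np.ndarray:
    """Sketch of the documented behaviour of format_string_array."""
    # Strip potential delimiters at the start and end before splitting
    string = string.strip(delimiter)
    if string == "":
        return np.zeros(n_values)
    values = string.split(delimiter)
    if len(values) != n_values:
        return np.full(n_values, np.nan)
    return np.asarray(values, dtype=float)


print(format_string_array("2,44,22,33", 4))  # [ 2. 44. 22. 33.]
```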

disdrodb.l0.l0b_processing.get_bin_coords(sensor_name: str) → dict[source]

Retrieve diameter (and velocity) bin coordinates.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Dictionary with coordinate arrays.
Return type: dict

disdrodb.l0.l0b_processing.infer_split_str(string: str) → str[source]

Infer the delimiter inside a string.

Parameters: string (str) – Input string.
Returns: Inferred delimiter.
Return type: str
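A simple way to infer a delimiter is to count candidate characters and keep the most frequent one. The sketch below does this; the candidate list (semicolon, comma, space), the tie-breaking by candidate order and the None return value are assumptions, not necessarily what DISDRODB does.

```python
def infer_split_str(string: str, candidates=(";", ",", " ")):
    """Sketch: return the most frequent candidate delimiter, or None if none occurs."""
    string = string.strip()
    counts = {delimiter: string.count(delimiter) for delimiter in candidates}
    # max() returns the first candidate in case of ties
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


print(infer_split_str("1;2;3"))  # ;
```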

disdrodb.l0.l0b_processing.preprocess_raw_netcdf(ds, dict_names, sensor_name)[source]

Preprocess a raw netCDF to improve compatibility with DISDRODB standards.

This function checks the validity of dict_names, then renames and subsets the data accordingly. If some variables specified in dict_names are missing, NaN DataArrays are added in their place.

Parameters

ds (xr.Dataset) – Raw netCDF to be converted to DISDRODB standards.
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
sensor_name (str) – Sensor name.

Returns

ds – xarray Dataset with DISDRODB-compliant variable naming conventions.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.process_raw_nc(filepath, dict_names, ds_sanitizer_fun, sensor_name, verbose, attrs)[source]

Read and convert a raw netCDF into a DISDRODB L0B netCDF.

Parameters

filepath (str) – netCDF file path.
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
ds_sanitizer_fun (function) – Sanitizer function to do ad-hoc processing of the xr.Dataset.
attrs (dict) – Global metadata to attach as global attributes to the xr.Dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print processing information.

Returns

L0B xr.Dataset

Return type

xr.Dataset

disdrodb.l0.l0b_processing.rechunk_dataset(ds: Dataset, encoding_dict: dict) → Dataset[source]

Coerce the dataset arrays to have the chunk size specified in the encoding dictionary.

Parameters

ds (xr.Dataset) – Input xarray dataset
encoding_dict (dict) – Dictionary containing the encoding to write the xarray dataset as a netCDF.

Returns

Output xarray dataset

Return type

xr.Dataset

disdrodb.l0.l0b_processing.rename_dataset(ds, dict_names)[source]: Rename Dataset variables, coordinates and dimensions.

disdrodb.l0.l0b_processing.replace_custom_nan_flags(ds, dict_nan_flags)[source]

Set values corresponding to nan_flags to np.nan.

Parameters

ds (xr.Dataset) – Input xarray dataset.
dict_nan_flags (dict) – Dictionary with nan flags value to set as np.nan

Returns

Dataset without nan_flags values.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.replace_nan_flags(ds, sensor_name, verbose)[source]

Set values corresponding to nan_flags to np.nan.

Parameters

ds (xr.Dataset) – Input xarray dataset
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print processing information.

Returns

Dataset without nan_flags values.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.reshape_raw_spectrum(arr: array, dims_order: list, dims_size_dict: dict, n_timesteps: int) → array[source]

Reshape the raw spectrum to a 2D+time array.

The array has dimensions [“time”] + dims_order

Parameters

arr (np.array) – Input array.
dims_order (list) –
The order of the dimensions in the raw spectrum.

Examples:
- OTT Parsivel spectrum [v1d1 … v1d32, v2d1, …, v2d32] –> dims_order = [“diameter_bin_center”, “velocity_bin_center”]
- Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] –> dims_order = [“velocity_bin_center”, “diameter_bin_center”]
dims_size_dict (dict) – Dictionary with the number of bins for each dimension. For OTT_Parsivel: {“diameter_bin_center”: 32, “velocity_bin_center”: 32}. For Thies_LPM: {“diameter_bin_center”: 22, “velocity_bin_center”: 20}.
n_timesteps (int) – Number of timesteps.

Returns

Output array.

Return type

np.array

Raises

ValueError – Impossible to reshape the raw_spectrum matrix
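One way to obtain a ["time"] + dims_order array from flat spectra is sketched below with NumPy. It assumes a Fortran-order reshape, so that the first dimension listed in dims_order varies fastest along each flat spectrum; this matches the OTT Parsivel and Thies LPM dims_order examples given for this function, but it is an assumption and not necessarily how DISDRODB implements reshape_raw_spectrum.

```python
import numpy as np


def reshape_raw_spectrum(arr, dims_order, dims_size_dict, n_timesteps):
    """Sketch: reshape flat spectra of shape (n_timesteps, n_values) to ["time"] + dims_order."""
    sizes = [dims_size_dict[dim] for dim in dims_order]
    arr = np.asarray(arr).reshape(n_timesteps, -1)
    if arr.shape[1] != int(np.prod(sizes)):
        raise ValueError("Impossible to reshape the raw_spectrum matrix")
    # Fortran order: the first dimension in dims_order varies fastest along each flat
    # spectrum, e.g. v1d1 ... v1d32, v2d1, ... for ["diameter_bin_center", "velocity_bin_center"]
    return arr.reshape([n_timesteps] + sizes, order="F")


# Toy spectrum with 3 diameter bins and 2 velocity bins: value = 10 * velocity_index + diameter_index
flat = np.array([[0, 1, 2, 10, 11, 12]])
out = reshape_raw_spectrum(
    flat,
    dims_order=["diameter_bin_center", "velocity_bin_center"],
    dims_size_dict={"diameter_bin_center": 3, "velocity_bin_center": 2},
    n_timesteps=1,
)
print(out.shape)  # (1, 3, 2)
```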

disdrodb.l0.l0b_processing.retrieve_l0b_arrays(df: DataFrame, sensor_name: str, verbose: bool = False) → dict[source]

Retrieve the L0B data arrays.

Parameters

df (pd.DataFrame) – Input dataframe
sensor_name (str) – Name of the sensor

Returns

Dictionary with data arrays.

Return type

dict

disdrodb.l0.l0b_processing.sanitize_encodings_dict(encoding_dict: dict, ds: Dataset) → dict[source]

Ensure that the chunk sizes are not larger than the array shape.

Parameters

encoding_dict (dict) – Dictionary containing the encoding to write DISDRODB L0B netCDFs.
ds (xr.Dataset) – Input dataset.

Returns

Encoding dictionary.

Return type

dict
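Clipping chunk sizes to the array shape can be sketched as below. To keep the example self-contained, array shapes are passed as a plain dictionary instead of an xr.Dataset (with xarray they would come from ds[var].shape); the function name is hypothetical, and converting an integer chunksizes value to a list is an assumption.

```python
def sanitize_chunksizes(encoding_dict: dict, shapes: dict) -> dict:
    """Sketch: clip each variable's chunksizes to the corresponding array shape."""
    sanitized = {}
    for var, encoding in encoding_dict.items():
        encoding = dict(encoding)  # copy, so the input dictionary is not modified
        chunksizes = encoding.get("chunksizes")
        if chunksizes is not None and var in shapes:
            if isinstance(chunksizes, int):
                chunksizes = [chunksizes]
            encoding["chunksizes"] = [min(c, s) for c, s in zip(chunksizes, shapes[var])]
        sanitized[var] = encoding
    return sanitized


encoding_dict = {"raw_drop_number": {"zlib": True, "chunksizes": [5000, 32, 32]}}
shapes = {"raw_drop_number": (100, 32, 32)}
print(sanitize_chunksizes(encoding_dict, shapes)["raw_drop_number"]["chunksizes"])  # [100, 32, 32]
```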

disdrodb.l0.l0b_processing.set_coordinate_attributes(ds)[source]

disdrodb.l0.l0b_processing.set_dataset_attrs(ds, sensor_name)[source]: Set variable and coordinates attributes.

disdrodb.l0.l0b_processing.set_encodings(ds: Dataset, sensor_name: str) → Dataset[source]

Apply the encodings to the xarray Dataset.

Parameters

ds (xr.Dataset) – Input xarray dataset.
sensor_name (str) – Name of the sensor.

Returns

Output xarray dataset.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.set_nan_outside_data_range(ds, sensor_name, verbose)[source]

Set values outside the data range as np.nan.

Parameters

ds (xr.Dataset) – Input xarray dataset
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print processing information.

Returns

Dataset without values outside the expected data range.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.set_nan_unvalid_values(ds, sensor_name, verbose)[source]

Set invalid (class) values to np.nan.

Parameters

ds (xr.Dataset) – Input xarray dataset
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print processing information.

Returns

Dataset without invalid values.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.set_variable_attributes(ds: Dataset, sensor_name: str) → Dataset[source]

Set attributes to each xr.Dataset variable.

Parameters

ds (xr.Dataset) – Input dataset.
sensor_name (str) – Name of the sensor.

Returns

Dataset with the variable attributes set.

Return type

xr.Dataset

disdrodb.l0.l0b_processing.subset_dataset(ds, dict_names, sensor_name)[source]

disdrodb.l0.l0b_processing.write_l0b(ds: Dataset, fpath: str, force=False) → None[source]

Save the xarray dataset into a NetCDF file.

Parameters

ds (xr.Dataset) – Input xarray dataset.
fpath (str) – Output file path.
force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False (the default), raise an error if data already exist in the destination directories.

disdrodb.l0.metadata module

disdrodb.l0.metadata.add_missing_metadata_keys(metadata)[source]: Add missing keys to the metadata dictionary.

disdrodb.l0.metadata.check_metadata_compliance(disdrodb_dir, data_source, campaign_name, station_name)[source]: Check DISDRODB metadata compliance.

disdrodb.l0.metadata.create_campaign_default_metadata(disdrodb_dir, campaign_name, data_source)[source]

Create default YAML metadata files for all stations within a campaign.

Use this function with caution to avoid overwriting existing YAML files.

disdrodb.l0.metadata.get_default_metadata_dict() → dict[source]

Get DISDRODB metadata default values.

Returns: Dictionary of the default metadata attributes.
Return type: dict

disdrodb.l0.metadata.get_metadata_missing_keys(metadata)[source]: Return the DISDRODB metadata keys which are missing.

disdrodb.l0.metadata.get_metadata_unvalid_keys(metadata)[source]: Return the DISDRODB metadata keys which are not valid.

disdrodb.l0.metadata.get_valid_metadata_keys() → list[source]

Get DISDRODB valid metadata list.

Returns: List of valid metadata keys
Return type: list

disdrodb.l0.metadata.read_metadata(campaign_dir: str, station_name: str) → dict[source]

Read YAML metadata file.

Parameters

campaign_dir (str) – Path of the campaign directory.
station_name (str) – Name of the station.

Returns

Dictionary of the metadata.

Return type

dict

disdrodb.l0.metadata.remove_unvalid_metadata_keys(metadata)[source]: Remove invalid keys from the metadata dictionary.

disdrodb.l0.metadata.sort_metadata_dictionary(metadata)[source]: Sort the keys of the metadata dictionary by valid_metadata_keys list order.
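Removing invalid keys, adding missing keys and sorting by the valid key order can be chained as in the sketch below. The function name and the list of valid keys are hypothetical, and filling missing keys with an empty string is an assumption (DISDRODB's defaults come from get_default_metadata_dict).

```python
def sanitize_metadata(metadata: dict, valid_keys: list) -> dict:
    """Sketch: drop invalid keys, add missing keys and sort by the valid_keys order."""
    # Remove keys that are not valid metadata keys
    metadata = {key: value for key, value in metadata.items() if key in valid_keys}
    # Add missing keys with an empty default value (assumption)
    for key in valid_keys:
        metadata.setdefault(key, "")
    # Sort the keys following the order of valid_keys
    return {key: metadata[key] for key in valid_keys}


valid_keys = ["data_source", "campaign_name", "station_name"]  # hypothetical subset
metadata = {"station_name": "STATION_1", "unexpected_key": 1, "campaign_name": "CAMPAIGN"}
print(sanitize_metadata(metadata, valid_keys))
# {'data_source': '', 'campaign_name': 'CAMPAIGN', 'station_name': 'STATION_1'}
```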

disdrodb.l0.metadata.write_default_metadata(fpath: str) → None[source]

Create default YAML metadata file at the specified filepath.

Parameters: fpath (str) – File path

disdrodb.l0.metadata.write_metadata(metadata, fpath)[source]: Write dictionary to YAML file.

disdrodb.l0.standards module

disdrodb.l0.standards.available_sensor_name() → sorted[source]

Get available names of sensors.

Returns: Sorted list of the available sensor names.
Return type: list

disdrodb.l0.standards.get_L0A_encodings_dict(sensor_name: str) → dict[source]

Get a dictionary containing the L0A encodings

Parameters: sensor_name (str) – Name of the sensor.
Returns: L0A encodings
Return type: dict

disdrodb.l0.standards.get_L0B_encodings_dict(sensor_name: str) → dict[source]

Get a dictionary containing the encoding to write L0B netCDFs.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Encoding to write L0B netCDFs
Return type: dict

disdrodb.l0.standards.get_configs_dir(sensor_name: str) → str[source]

Retrieve configs directory.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Config directory.
Return type: str
Raises: ValueError – Error if the config directory does not exist.

disdrodb.l0.standards.get_coords_attrs_dict(ds)[source]: Return dictionary with DISDRODB coordinates attributes.

disdrodb.l0.standards.get_data_format_dict(sensor_name: str) → dict[source]

Get a dictionary containing the data format of each sensor variable.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Data format of each sensor variable
Return type: dict

disdrodb.l0.standards.get_data_range_dict(sensor_name: str) → dict[source]

Get the variable data range.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Dictionary with the expected data value range for each data field. It excludes variables without specified data_range key.
Return type: dict

disdrodb.l0.standards.get_description_dict(sensor_name: str) → dict[source]

Get a dictionary containing the description of each sensor variable.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Description of each sensor variable.
Return type: dict

disdrodb.l0.standards.get_diameter_bin_center(sensor_name: str) → list[source]

Get diameter bin center.

Parameters: sensor_name (str) – Name of the sensor
Returns: Diameter bin center
Return type: list

disdrodb.l0.standards.get_diameter_bin_lower(sensor_name: str) → list[source]

Get diameter bin lower bound.

Parameters: sensor_name (str) – Name of the sensor
Returns: Diameter bin lower bound
Return type: list

disdrodb.l0.standards.get_diameter_bin_upper(sensor_name: str) → list[source]

Get diameter bin upper bound.

Parameters: sensor_name (str) – Name of the sensor
Returns: Diameter bin upper bound
Return type: list

disdrodb.l0.standards.get_diameter_bin_width(sensor_name: str) → list[source]

Get diameter bin width.

Parameters: sensor_name (str) – Name of the sensor
Returns: Diameter bin width
Return type: list
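The bin center and width can be related to the lower and upper bounds as sketched below, assuming the usual convention that the center is the midpoint of the bounds and the width their difference. The sensor configuration files remain the reference; the helper name and the bin bounds are hypothetical.

```python
def bin_center_and_width(lower: list, upper: list):
    """Sketch: derive bin centers and widths from the bin bounds."""
    # Assumption: the bin center is the midpoint of the bin bounds
    center = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    width = [up - lo for lo, up in zip(lower, upper)]
    return center, width


# Hypothetical diameter bin bounds
lower = [0.0, 0.125, 0.25]
upper = [0.125, 0.25, 0.5]
center, width = bin_center_and_width(lower, upper)
print(center)  # [0.0625, 0.1875, 0.375]
print(width)  # [0.125, 0.125, 0.25]
```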

disdrodb.l0.standards.get_diameter_bins_dict(sensor_name: str) → dict[source]

Get dictionary with sensor_name diameter bins information.

Parameters: sensor_name (str) – Name of the sensor.
Returns: sensor_name diameter bins information
Return type: dict

disdrodb.l0.standards.get_dims_size_dict(sensor_name: str) → dict[source]

Get the number of bins for each dimension.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Dictionary with the number of bins for each dimension.
Return type: dict

disdrodb.l0.standards.get_field_nchar_dict(sensor_name: str) → dict[source]

Get the total number of characters from the instrument default string standards.

Important note: this count includes the comma and the minus sign.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Dictionary with the expected number of characters for each data field.
Return type: dict

disdrodb.l0.standards.get_field_ndigits_decimals_dict(sensor_name: dict) → dict[source]

Get number of digits on the right side of the comma from the instrument default string standards.

Example: 123,45 -> 45 -> 2 decimal digits

Parameters: sensor_name (str) – Name of the sensor.

Returns: Dictionary with the expected number of decimal digits for each data field.
Return type: dict

disdrodb.l0.standards.get_field_ndigits_dict(sensor_name: str) → dict[source]

Get number of digits from the instrument default string standards.

Important note: this count excludes the comma but includes the minus sign.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Dictionary with the expected number of digits for each data field.
Return type: dict

disdrodb.l0.standards.get_field_ndigits_natural_dict(sensor_name: str) → dict[source]

Get number of digits on the left side of the comma from the instrument default string standards.

Example: 123,45 -> 123 –> 3 natural digits

Parameters: sensor_name (str) – Name of the sensor.
Returns: Dictionary with the expected number of natural digits for each data field.
Return type: dict
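The counting conventions of get_field_nchar_dict, get_field_ndigits_dict, get_field_ndigits_natural_dict and get_field_ndigits_decimals_dict can be illustrated on a single value string. The helper below is hypothetical (the DISDRODB functions read these counts from the sensor configuration); whether the minus sign counts as a natural digit is not documented, so excluding it is an assumption.

```python
def count_field_digits(value: str, decimal_separator: str = ",") -> dict:
    """Sketch: character and digit counts following the documented conventions."""
    natural, _, decimal = value.partition(decimal_separator)
    return {
        # Total number of characters, including the separator and the minus sign
        "n_characters": len(value),
        # Digits excluding the separator, but counting the minus sign
        "n_digits": len(value.replace(decimal_separator, "")),
        # Digits left of the separator (assumption: the minus sign is not counted)
        "n_naturals": len(natural.lstrip("-")),
        # Digits right of the separator
        "n_decimals": len(decimal),
    }


print(count_field_digits("123,45"))
# {'n_characters': 6, 'n_digits': 5, 'n_naturals': 3, 'n_decimals': 2}
```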

disdrodb.l0.standards.get_l0a_dtype(sensor_name: str) → dict[source]

Get a dictionary containing the L0A dtype.

Parameters: sensor_name (str) – Name of the sensor.
Returns: L0A dtype
Return type: dict

disdrodb.l0.standards.get_long_name_dict(sensor_name: str) → dict[source]

Get a dictionary containing the long name of each sensor variable.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Long name of each sensor variable.
Return type: dict

disdrodb.l0.standards.get_n_diameter_bins(sensor_name)[source]: Get the number of diameter bins.

disdrodb.l0.standards.get_n_velocity_bins(sensor_name)[source]: Get the number of velocity bins.

disdrodb.l0.standards.get_nan_flags_dict(sensor_name: str) → dict[source]

Get the variable nan_flags.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Dictionary with the expected nan_flags list for each data field. It excludes variables without specified nan_flags key.
Return type: dict

disdrodb.l0.standards.get_raw_array_dims_order(sensor_name: str) → dict[source]

Get the dimension order of the raw fields.

The order of dimension specified for raw_drop_number controls the reshaping of the precipitation raw spectrum.

Examples

- OTT Parsivel spectrum [v1d1 … v1d32, v2d1, …, v2d32] –> dimension_order = [“velocity_bin_center”, “diameter_bin_center”]
- Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] –> dimension_order = [“diameter_bin_center”, “velocity_bin_center”]

Parameters: sensor_name (str) – Name of the sensor
Returns: Dimension order dictionary
Return type: dict

disdrodb.l0.standards.get_raw_array_nvalues(sensor_name: str) → dict[source]

Get a dictionary with the number of values expected for each raw array.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Dictionary with the expected number of values for each raw array.
Return type: dict

disdrodb.l0.standards.get_sensor_variables(sensor_name: str) → list[source]

Get sensor variable names list.

Parameters: sensor_name (str) – Name of the sensor.
Returns: List of the sensor variable names.
Return type: list

disdrodb.l0.standards.get_time_encoding() → dict[source]

Create time encoding

Returns: Time encoding
Return type: dict

disdrodb.l0.standards.get_units_dict(sensor_name: str) → dict[source]

Get a dictionary containing the unit of each sensor variable.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Unit of each sensor variable
Return type: dict

disdrodb.l0.standards.get_valid_coordinates_names(sensor_name)[source]: Get list of valid coordinates.

disdrodb.l0.standards.get_valid_dimension_names(sensor_name)[source]: Get list of valid dimension names.

disdrodb.l0.standards.get_valid_names(sensor_name)[source]

disdrodb.l0.standards.get_valid_values_dict(sensor_name: str) → dict[source]

Get the valid values of the sensor variables.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Dictionary with the list of valid values for each variable. Variables without a valid_values key are excluded.
Return type: dict
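The dictionaries returned by get_nan_flags_dict and get_valid_values_dict map a variable name to a list of values. The sketch below shows one way such dictionaries could be applied to raw values: flagged values become NaN and values outside the valid list are rejected. The helper sanitize_variable and the variable names and values in the example dictionaries are made up for illustration; they are not read from the DISDRODB configuration files.

```python
import math

# Illustrative dictionaries shaped like the outputs of get_nan_flags_dict and
# get_valid_values_dict; the entries are made up.
nan_flags_dict = {"sensor_temperature": [-9999]}
valid_values_dict = {"sensor_status": [0, 1, 2, 3]}


def sanitize_variable(variable, values, nan_flags_dict, valid_values_dict):
    """Replace nan_flags with NaN and reject values outside valid_values."""
    flags = set(nan_flags_dict.get(variable, []))
    valid = valid_values_dict.get(variable)  # None means "no constraint"
    cleaned = []
    for value in values:
        if value in flags:
            cleaned.append(math.nan)
        elif valid is not None and value not in valid:
            raise ValueError(f"{value!r} is not a valid value for {variable}.")
        else:
            cleaned.append(value)
    return cleaned


print(sanitize_variable("sensor_temperature", [21.5, -9999], nan_flags_dict, valid_values_dict))
# [21.5, nan]
```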

disdrodb.l0.standards.get_valid_variable_names(sensor_name)[source]: Get list of valid variables.

disdrodb.l0.standards.get_variables_dict(sensor_name: str) → dict[source]

Get a dictionary mapping the sensor field numbers to the variable names.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Variable name of each sensor field number.
Return type: dict

disdrodb.l0.standards.get_variables_dimension(sensor_name: str)[source]: Get a dictionary with the dimensions of each variable of an L0B product.

disdrodb.l0.standards.get_velocity_bin_center(sensor_name: str) → list[source]

Get velocity bin center.

Parameters: sensor_name (str) – Name of the sensor
Returns: Velocity bin center
Return type: list

disdrodb.l0.standards.get_velocity_bin_lower(sensor_name: str) → list[source]

Get velocity bin lower bound.

Parameters: sensor_name (str) – Name of the sensor
Returns: Velocity bin lower bound.
Return type: list

disdrodb.l0.standards.get_velocity_bin_upper(sensor_name: str) → list[source]

Get velocity bin upper bound.

Parameters: sensor_name (str) – Name of the sensor
Returns: Velocity bin upper bound
Return type: list

disdrodb.l0.standards.get_velocity_bin_width(sensor_name: str) → list[source]

Get velocity bin width.

Parameters: sensor_name (str) – Name of the sensor
Returns: Velocity bin width
Return type: list
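The four velocity bin getters above describe the same bins from different angles. Assuming each bin is contiguous, its center is the midpoint of its bounds and its width is the difference between them. The sketch below illustrates that relation with made-up bounds; bin_center_and_width is a hypothetical helper, and the values actually returned by disdrodb come from the sensor configuration files, which may not follow this relation exactly.

```python
def bin_center_and_width(lower, upper):
    """Compute bin centers (midpoints) and widths from lower/upper bounds.

    Assumption: the center is the midpoint of the bounds.
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length.")
    center = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    width = [up - lo for lo, up in zip(lower, upper)]
    return center, width


# Illustrative bounds (m/s), not taken from a sensor configuration file
center, width = bin_center_and_width([0.0, 0.2, 0.4], [0.2, 0.4, 0.6])
print(center, width)
```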

disdrodb.l0.standards.get_velocity_bins_dict(sensor_name: str) → dict[source]

Get the velocity bins information of the sensor.

Parameters: sensor_name (str) – Name of the sensor.
Returns: Velocity bins information of the sensor.
Return type: dict

disdrodb.l0.standards.read_config_yml(sensor_name: str, filename: str) → dict[source]

Read a config yaml file and return the dictionary.

Parameters

sensor_name (str) – Name of the sensor.
filename (str) – Name of the file.

Returns

Content of the config file.

Return type

dict

Raises

ValueError – Error if file does not exist.

disdrodb.l0.standards.set_disdrodb_attrs(ds, product_level: str)[source]

Add DISDRODB processing information to the netCDF global attributes.

It assumes that the station metadata have already been added to the dataset.

Parameters

ds (xarray dataset) – Dataset
product_level (str) – DISDRODB product_level

Returns

Dataset

Return type

xarray dataset

disdrodb.l0.summary module

disdrodb.l0.template_tools module

disdrodb.l0.template_tools.arr_has_constant_nchar(arr: array) → bool[source]

Check whether the content of an array has a constant number of characters.

Parameters: arr (numpy.ndarray) – The array to analyse.
Returns: True if the number of characters is constant.
Return type: bool
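A minimal sketch of the check described above, assuming every element is first converted to its string representation before its length is measured. has_constant_nchar is a hypothetical name and not necessarily how disdrodb implements arr_has_constant_nchar.

```python
import numpy as np


def has_constant_nchar(arr):
    """Return True if all elements of arr have the same string length."""
    # Convert to strings first so numeric arrays are measured as text
    lengths = np.char.str_len(np.asarray(arr).astype(str))
    return bool(np.unique(lengths).size <= 1)


print(has_constant_nchar(["0.123", "1.456"]))  # True
print(has_constant_nchar(["1.2", "10.25"]))    # False
```

A constant number of characters is a useful hint when guessing which raw field a column holds, since many sensor fields are written with fixed-width formatting.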

disdrodb.l0.template_tools.check_column_names(column_names: list, sensor_name: str) → None[source]

Check that the column names respect the DISDRODB standards.

Parameters

column_names (list) – List of column names.
sensor_name (str) – Name of the sensor.

Raises

TypeError – Error if some columns do not meet the DISDRODB standards.

disdrodb.l0.template_tools.get_decimal_ndigits(string: str) → int[source]

Get the number of decimal digits.

Parameters: string (str) – Input string
Returns: The number of decimal digits.
Return type: int

disdrodb.l0.template_tools.get_df_columns_unique_values_dict(df: DataFrame, column_indices: Optional[Union[int, slice, list]] = None, column_names: bool = True)[source]

Create a dictionary {column: unique values}

Parameters

df (pd.DataFrame) – Input dataframe
column_indices (Union[int,slice,list], optional) – column indices
column_names (bool, optional) – If true, print the column name, by default True
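The sketch below shows one plausible reading of get_df_columns_unique_values_dict with pandas, accepting None, an int, a slice or a list of positional indices. Two points are assumptions: that column_names=False switches the dictionary keys to positional indices, and that unique values are kept in order of first appearance. unique_values_by_column is a hypothetical name.

```python
import pandas as pd


def unique_values_by_column(df, column_indices=None, column_names=True):
    """Map each selected column to the list of its unique values.

    column_indices may be None (all columns), an int, a slice or a list of
    positional indices. If column_names is False, the keys are the
    positional indices instead of the column names (an assumption).
    """
    positions = list(range(df.shape[1]))
    if column_indices is None:
        selected = positions
    elif isinstance(column_indices, int):
        selected = [positions[column_indices]]
    elif isinstance(column_indices, slice):
        selected = positions[column_indices]
    else:
        selected = [positions[i] for i in column_indices]
    result = {}
    for pos in selected:
        key = df.columns[pos] if column_names else pos
        # drop_duplicates keeps the values in order of first appearance
        result[key] = df.iloc[:, pos].drop_duplicates().tolist()
    return result


df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "x"]})
print(unique_values_by_column(df))  # {'a': [1, 2], 'b': ['x', 'y']}
```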

disdrodb.l0.template_tools.get_natural_ndigits(string: str) → int[source]

Get the number of digits of the natural (integer) part.

Parameters: string (str) – Input string
Returns: The number of natural digits.
Return type: int

disdrodb.l0.template_tools.get_nchar(string: str) → int[source]

Get the number of characters.

Parameters: string (str) – Input string
Returns: Number of characters.
Return type: int

disdrodb.l0.template_tools.get_ndigits(string: str) → int[source]

Get the number of digits.

Parameters: string (str) – Input string
Returns: Number of digits.
Return type: int
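The four counting helpers above (get_nchar, get_ndigits, get_natural_ndigits, get_decimal_ndigits) characterize a raw string. The sketch below shows how they relate on a sample value. It assumes "." is the decimal separator and that signs are counted as characters but not as digits. The function names are hypothetical, and the library may handle edge cases differently.

```python
def count_nchar(string):
    # Every character counts, including sign and decimal separator
    return len(string)


def count_digits(string):
    return sum(ch.isdigit() for ch in string)


def count_natural_digits(string):
    # Digits before the decimal separator (assumed to be ".")
    return count_digits(string.split(".", 1)[0])


def count_decimal_digits(string):
    # Digits after the decimal separator, 0 if there is none
    parts = string.split(".", 1)
    return count_digits(parts[1]) if len(parts) == 2 else 0


sample = "-12.345"
print(count_nchar(sample), count_natural_digits(sample), count_decimal_digits(sample), count_digits(sample))
# 7 2 3 5
```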

disdrodb.l0.template_tools.get_possible_keys(dict_options: dict, desired_value: str) → set[source]

Get the keys of dict_options whose value matches the desired value.

Parameters

dict_options (dict) – Input dictionary
desired_value (str) – Input value

Returns

Keys whose value matches the desired input value.

Return type

set
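As a minimal sketch, assuming "matches" means plain equality, the lookup reduces to a set comprehension. get_matching_keys and the example mapping are hypothetical.

```python
def get_matching_keys(dict_options, desired_value):
    """Return the set of keys whose value equals desired_value."""
    return {key for key, value in dict_options.items() if value == desired_value}


# Illustrative mapping from column names to a made-up format label
options = {"column_a": "float", "column_b": "int", "column_c": "float"}
print(sorted(get_matching_keys(options, "float")))  # ['column_a', 'column_c']
```

Returning a set suits the task: the order of the matching keys carries no meaning, and set operations make it easy to intersect the candidates produced by several criteria.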

disdrodb.l0.template_tools.infer_column_names(df: DataFrame, sensor_name: str, row_idx: int = 1)[source]

Try to guess the dataframe column names based on string characteristics.

Parameters

df (pandas.DataFrame) – The dataframe to analyse
sensor_name (str) – Name of the sensor
row_idx (int, optional) – Index of the row to analyse, by default 1

Returns

Dictionary with the keys being the column id and the values being the guessed column names

Return type

dict

disdrodb.l0.template_tools.print_df_column_names(df: DataFrame) → None[source]

Print the dataframe column names.

Parameters: df (dataframe) – The dataframe
Returns: Nothing
Return type: None

disdrodb.l0.template_tools.print_df_columns_unique_values(df: DataFrame, column_indices: Optional[Union[int, slice, list]] = None, column_names: bool = True) → None[source]

Print the unique values of each column.

Parameters

df (pd.DataFrame) – Input dataframe
column_indices (Union[int,slice,list], optional) – column indices
column_names (bool, optional) – If true, print the column name, by default True

disdrodb.l0.template_tools.print_df_first_n_rows(df: DataFrame, n: int = 5, column_names: bool = True) → None[source]

Print the first n rows of the dataframe, column by column.

Parameters

df (pd.DataFrame) – Input dataframe
n (int, optional) – Number of rows, by default 5
column_names (bool, optional) – If True, the column names are printed, by default True

disdrodb.l0.template_tools.print_df_random_n_rows(df: DataFrame, n: int = 5, with_column_names: bool = True) → None[source]

Print n randomly chosen rows of the dataframe, column by column.

Parameters

df (dataframe) – The dataframe
n (int, optional) – The number of rows to print, by default 5
with_column_names (bool, optional) – If true, print the column name, by default True

Returns

Nothing

Return type

None

disdrodb.l0.template_tools.print_df_summary_stats(df: DataFrame, column_indices: Optional[Union[int, slice, list]] = None, column_names: bool = True)[source]

Create a summary of the column statistics.

Parameters

df (pd.DataFrame) – Input dataframe
column_indices (Union[int,slice,list], optional) – column indices
column_names (bool, optional) – If true, print the column name, by default True

Raises

ValueError – Raised if the column types are not numeric.

disdrodb.l0.template_tools.print_df_with_any_nan_rows(df: DataFrame) → None[source]

Print the rows containing at least one NaN value.

Parameters: df (pd.DataFrame) – Input dataframe.

disdrodb.l0.template_tools.print_valid_L0_column_names(sensor_name: str) → None[source]

Print the valid column names defined by the DISDRODB standards.

Parameters: sensor_name (str) – Name of the sensor.

disdrodb.l0.template_tools.search_possible_columns(string: str, sensor_name: str) → list[source]

Find the possible columns matching the input string.

Parameters

string (str) – Input string
sensor_name (str) – Name of the sensor

Returns

List of possible columns

Return type

list
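One plausible way to search for candidate columns is to compare the characteristics of a raw string with per-column format specifications. The field names n_characters, n_naturals and n_decimals are borrowed from RawDataFormatSchema in disdrodb.l0.check_configs. The matching rule is an assumption: a column matches when every characteristic it defines (not None) is equal to the observed one. search_candidate_columns and the format entries are hypothetical, not the library's logic.

```python
def search_candidate_columns(string, formats):
    """Return the names of the columns whose expected format matches string."""
    naturals, _, decimals = string.partition(".")
    observed = {
        "n_characters": len(string),
        "n_naturals": sum(ch.isdigit() for ch in naturals),
        "n_decimals": sum(ch.isdigit() for ch in decimals),
    }
    candidates = []
    for column, spec in formats.items():
        # Compare only the characteristics the specification defines (not None)
        defined = [key for key in observed if spec.get(key) is not None]
        if all(spec[key] == observed[key] for key in defined):
            candidates.append(column)
    return candidates


# Made-up format specifications, for illustration only
formats = {
    "column_a": {"n_characters": 6, "n_naturals": 3, "n_decimals": 2},
    "column_b": {"n_characters": 4, "n_naturals": 4, "n_decimals": 0},
}
print(search_candidate_columns("123.45", formats))  # ['column_a']
```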

disdrodb.l0.template_tools.str_has_decimal_digits(string: str) → bool[source]

Check if a string has decimal digits.

Parameters: string (str) – Input string
Returns: True if the string has decimal digits.
Return type: bool

disdrodb.l0.template_tools.str_is_integer(string: str) → bool[source]

Check if a string is an integer

Parameters: string (str) – Input string
Returns: True if the string is an integer.
Return type: bool

disdrodb.l0.template_tools.str_is_not_number(string: str) → bool[source]

Check if a string is not numeric

Parameters: string (str) – Input string
Returns: True if the string cannot be converted to a float.
Return type: bool

disdrodb.l0.template_tools.str_is_number(string: str) → bool[source]

Check if a string is numeric

Parameters: string (str) – Input string
Returns: True if the string can be converted to a float.
Return type: bool
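The string predicates above can be sketched with Python's built-in conversions. This assumes "numeric" means accepted by float() and "integer" means accepted by int(), and that "." is the decimal separator. With that reading, "nan" and "inf" count as numbers. The function names are hypothetical and not the library's own.

```python
def is_number(string):
    """True if float() accepts the string (note: "nan" and "inf" also pass)."""
    try:
        float(string)
    except (TypeError, ValueError):
        return False
    return True


def is_not_number(string):
    return not is_number(string)


def is_integer(string):
    """True if int() accepts the string, so "12" passes but "12.0" does not."""
    try:
        int(string)
    except (TypeError, ValueError):
        return False
    return True


def has_decimal_digits(string):
    """True if at least one digit follows the decimal separator "."."""
    _, separator, decimals = string.partition(".")
    return bool(separator) and any(ch.isdigit() for ch in decimals)


for s in ["12", "12.50", "N/A"]:
    print(s, is_number(s), is_integer(s), has_decimal_digits(s))
```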

disdrodb.l0.utils_nc module

Module contents

disdrodb.l0.available_readers(data_sources=None, reader_path=False)[source]: Retrieve available readers information.

disdrodb.l0.check_archive_metadata_compliance(disdrodb_dir)[source]

disdrodb.l0.check_archive_metadata_geolocation(disdrodb_dir)[source]

Check whether the metadata files have missing or wrong geolocation.

Parameters: disdrodb_dir (str) – Path to the disdrodb directory.
Returns: True if the check succeeds, False otherwise.
Return type: bool
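A minimal sketch of what a geolocation check could look like, applied to metadata already loaded into dictionaries. The "latitude" and "longitude" key names are assumptions for this sketch. A geolocation is treated as wrong when it is missing, non-numeric, NaN, or outside the valid coordinate ranges. has_valid_geolocation and check_all_geolocations are hypothetical helpers.

```python
import math


def has_valid_geolocation(metadata):
    """Return True if latitude/longitude are present, numeric and in range.

    The "latitude" and "longitude" keys are assumed names for this sketch.
    """
    try:
        lat = float(metadata["latitude"])
        lon = float(metadata["longitude"])
    except (KeyError, TypeError, ValueError):
        return False  # missing or non-numeric geolocation
    if math.isnan(lat) or math.isnan(lon):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def check_all_geolocations(metadata_by_station):
    """Return True only if every station has a valid geolocation."""
    return all(has_valid_geolocation(md) for md in metadata_by_station.values())
```

Like the documented function, the aggregate check returns a single boolean, so it can gate a processing run without reporting which station failed.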

disdrodb.l0.run_disdrodb_l0(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = False, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]

Run the L0 processing of DISDRODB stations.

This function enables launching the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it runs the processing of the stations matching the provided data_sources, campaign_names and station_names.

Parameters

disdrodb_dir (str) – Base directory of DISDRODB. Format: <…>/DISDRODB
data_sources (list) – Name of the data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, all stations are processed. The default is None
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None
station_names (list) – Station names to process. The default is None
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is False.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information in the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. For L0B, it processes just the first 100 rows of 3 L0A files. The default is False.
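The station selection described above (process the available stations matching data_sources, campaign_names and station_names) can be sketched as a filter over (data_source, campaign_name, station_name) tuples. The sketch assumes a filter left to None does not restrict the selection, and it enforces the UPPER CASE requirement stated for data sources and campaign names. select_stations and the station identifiers are hypothetical, not part of disdrodb.

```python
def select_stations(stations, data_sources=None, campaign_names=None, station_names=None):
    """Return the (data_source, campaign_name, station_name) tuples to process.

    A filter left to None does not restrict the selection.
    """
    # The documentation requires UPPER CASE data source and campaign names
    for names in (data_sources, campaign_names):
        if names is not None and any(name != name.upper() for name in names):
            raise ValueError("data_sources and campaign_names must be UPPER CASE.")

    def matches(value, allowed):
        return allowed is None or value in allowed

    return [
        (data_source, campaign_name, station_name)
        for data_source, campaign_name, station_name in stations
        if matches(data_source, data_sources)
        and matches(campaign_name, campaign_names)
        and matches(station_name, station_names)
    ]


# Made-up station identifiers, for illustration only
stations = [
    ("SOURCE_A", "CAMPAIGN_1", "STATION_10"),
    ("SOURCE_A", "CAMPAIGN_2", "STATION_20"),
    ("SOURCE_B", "CAMPAIGN_3", "STATION_30"),
]
print(select_stations(stations, data_sources=["SOURCE_A"], campaign_names=["CAMPAIGN_2"]))
# [('SOURCE_A', 'CAMPAIGN_2', 'STATION_20')]
```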

disdrodb.l0.run_disdrodb_l0_station(disdrodb_dir, data_source, campaign_name, station_name, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = True, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]

Run the L0 processing of a specific DISDRODB station from the terminal.

Parameters

disdrodb_dir (str) – Base directory of DISDRODB. Format: <…>/DISDRODB
data_source (str) – Institution name (when campaign data spans more than 1 country), or country (when all campaigns (or sensor networks) are inside a given country). Must be UPPER CASE.
campaign_name (str) – Campaign name. Must be UPPER CASE.
station_name (str) – Station name
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is True.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information in the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined by os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files for each station. For L0B, it processes just the first 100 rows of 3 L0A files for each station. The default is False.

disdrodb.l0.run_l0a(raw_dir, processed_dir, station_name, glob_patterns, column_names, reader_kwargs, df_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]

Run the L0A processing for a specific DISDRODB station.

Parameters

raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:

<…>/DISDRODB/Raw/<data_source>/<campaign_name>

Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between:
  - the raw_dir and processed_dir directory paths;
  - the key ‘campaign_name’ within the metadata YAML files.
- Campaign names are expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:

<…>/DISDRODB/Processed/<data_source>/<campaign_name>

For testing purposes, this function also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>
column_names (list) – Columns names of the raw text file.
reader_kwargs (dict) – Pandas read_csv arguments to open the text file.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe into the DISDRODB L0A standard.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
verbose (bool) – Whether to print detailed processing information in the terminal. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 100 rows of 3 raw data files. The default is False.
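The raw_dir layout rules listed above can be checked before any processing starts. The sketch below verifies the following with pathlib:
- the campaign name is UPPER CASE;
- raw_dir and processed_dir end with the same <campaign_name>;
- the data and metadata subfolders exist;
- every station folder has a matching <station_name>.yaml file.

It does not read the ‘campaign_name’ key inside the YAML files. check_raw_dir_structure is a hypothetical helper, not part of disdrodb.

```python
from pathlib import Path


def check_raw_dir_structure(raw_dir, processed_dir):
    """Return the station names found in <raw_dir>/data after layout checks.

    Only the directory layout is checked: the 'campaign_name' key inside the
    metadata YAML files is not read here.
    """
    raw_dir, processed_dir = Path(raw_dir), Path(processed_dir)
    campaign_name = raw_dir.name
    if campaign_name != campaign_name.upper():
        raise ValueError(f"Campaign name {campaign_name!r} must be UPPER CASE.")
    if processed_dir.name != campaign_name:
        raise ValueError("raw_dir and processed_dir must end with the same <campaign_name>.")
    data_dir, metadata_dir = raw_dir / "data", raw_dir / "metadata"
    if not data_dir.is_dir() or not metadata_dir.is_dir():
        raise ValueError(f"{raw_dir} must contain 'data' and 'metadata' subdirectories.")
    station_names = sorted(p.name for p in data_dir.iterdir() if p.is_dir())
    # Every station directory needs a matching <station_name>.yaml file
    missing = [s for s in station_names if not (metadata_dir / f"{s}.yaml").is_file()]
    if missing:
        raise ValueError(f"Missing metadata YAML file for stations: {missing}")
    return station_names
```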

disdrodb.l0.run_l0b_from_nc(raw_dir, processed_dir, station_name, glob_patterns, dict_names, ds_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]

Run the L0B processing for a specific DISDRODB station with raw netCDFs.

Parameters

raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:

<…>/DISDRODB/Raw/<data_source>/<campaign_name>

Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between:
  - the raw_dir and processed_dir directory paths;
  - the key ‘campaign_name’ within the metadata YAML files.
- Campaign names are expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0B products. The path should have the following structure:

<…>/DISDRODB/Processed/<data_source>/<campaign_name>

For testing purposes, this function also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>. Example: glob_patterns = "*.nc"
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
ds_sanitizer_fun (object, optional) – Sanitizer function to format the raw netCDF into DISDRODB L0B standard.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information in the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 3 raw netCDF files. The default is False.
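The glob_patterns and debugging_mode parameters combine as follows: files are searched in <raw_dir>/data/<station_name>, and in debugging mode only the first 3 are kept. The sketch below assumes that "first" refers to sorted file paths, which is an assumption. list_station_files is a hypothetical helper built on the standard glob module.

```python
import glob
import os


def list_station_files(raw_dir, station_name, glob_patterns, debugging_mode=False):
    """List the raw files of a station, keeping only 3 files in debugging mode."""
    station_dir = os.path.join(raw_dir, "data", station_name)
    # Sort so that "the first 3 files" is deterministic (an assumption)
    filepaths = sorted(glob.glob(os.path.join(station_dir, glob_patterns)))
    if not filepaths:
        raise ValueError(f"No files matching {glob_patterns!r} in {station_dir}.")
    if debugging_mode:
        filepaths = filepaths[:3]
    return filepaths
```

Usage: list_station_files(raw_dir, "STATION_1", "*.nc", debugging_mode=True) returns at most three netCDF paths, which keeps a test run short.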