disdrodb.l0 package
Subpackages
Submodules
disdrodb.l0.check_configs module
- class disdrodb.l0.check_configs.NetcdfEncodingSchema(*, contiguous: bool, dtype: str, zlib: bool, complevel: int, shuffle: bool, fletcher32: bool, chunksizes: Optional[Union[int, List[int]]] = None)[source]
Bases: BaseModel
- chunksizes: Optional[Union[int, List[int]]]
- complevel: int
- contiguous: bool
- dtype: str
- fletcher32: bool
- shuffle: bool
- zlib: bool
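The fields above act as a type contract for each variable entry in l0b_encodings.yml. The sketch below restates that contract with the standard library instead of pydantic. The helper name `validate_netcdf_encoding` is hypothetical. Pydantic may coerce compatible values, but this sketch checks types strictly.

```python
from typing import Any

# Expected type of each required field, mirroring NetcdfEncodingSchema.
REQUIRED_FIELDS = {
    "contiguous": bool,
    "dtype": str,
    "zlib": bool,
    "complevel": int,
    "shuffle": bool,
    "fletcher32": bool,
}


def validate_netcdf_encoding(encoding: dict[str, Any]) -> None:
    """Raise ValueError if an encoding entry does not match the field types (hypothetical helper)."""
    for key, expected in REQUIRED_FIELDS.items():
        if key not in encoding:
            raise ValueError(f"Missing required field '{key}'.")
        value = encoding[key]
        # bool is a subclass of int, so reject booleans where an int is expected.
        if expected is int and isinstance(value, bool):
            raise ValueError(f"Field '{key}' must be int, got bool.")
        if not isinstance(value, expected):
            raise ValueError(f"Field '{key}' must be {expected.__name__}.")
    # chunksizes is optional: None, a single int, or a list of ints.
    chunks = encoding.get("chunksizes")
    if chunks is None:
        return
    items = chunks if isinstance(chunks, list) else [chunks]
    if not all(isinstance(c, int) and not isinstance(c, bool) for c in items):
        raise ValueError("Field 'chunksizes' must be an int or a list of ints.")
```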
- class disdrodb.l0.check_configs.RawDataFormatSchema(*, n_digits: Optional[int] = None, n_characters: Optional[int] = None, n_decimals: Optional[int] = None, n_naturals: Optional[int] = None, data_range: Optional[List[float]] = None, nan_flags: Optional[str] = None, valid_values: Optional[List[float]] = None, dimension_order: Optional[List[str]] = None, n_values: Optional[int] = None)[source]
Bases: BaseModel
- data_range: Optional[List[float]]
- dimension_order: Optional[List[str]]
- n_characters: Optional[int]
- n_decimals: Optional[int]
- n_digits: Optional[int]
- n_naturals: Optional[int]
- n_values: Optional[int]
- nan_flags: Optional[str]
- valid_values: Optional[List[float]]
- exception disdrodb.l0.check_configs.SchemaValidationException[source]
Bases: Exception
Exception raised when schema validation fails.
- disdrodb.l0.check_configs.check_bin_consistency(sensor_name: str) None[source]
Check bin consistency from the config file.
The first and last bins are not checked.
- Parameters
sensor_name (str) – Name of the sensor.
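A minimal sketch of one possible bin consistency rule. It assumes each bin center should be the midpoint of its [lower, upper] bounds and each width should equal the bound span. The helper `check_bins` and the plain-list inputs are hypothetical, since the real function reads the sensor's bins_*.yml files. As described above, the first and last bins are skipped.

```python
import math


def check_bins(centers, bounds, widths, rel_tol=1e-6):
    """Check interior bins only: center at midpoint of bounds, width equal to bound span."""
    if not (len(centers) == len(bounds) == len(widths)):
        raise ValueError("centers, bounds and widths must have the same length.")
    # Skip the first and last bins, as check_bin_consistency does.
    for i in range(1, len(centers) - 1):
        lower, upper = bounds[i]
        if not math.isclose(centers[i], (lower + upper) / 2, rel_tol=rel_tol):
            raise ValueError(f"Bin {i}: center is not the midpoint of its bounds.")
        if not math.isclose(widths[i], upper - lower, rel_tol=rel_tol):
            raise ValueError(f"Bin {i}: width does not match its bounds.")
```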
- disdrodb.l0.check_configs.check_cf_attributes(sensor_name: str) None[source]
Check that the values of the variable_description, variable_long_name and variable_units dictionaries are strings.
- Parameters
sensor_name (str) – Name of the sensor.
- disdrodb.l0.check_configs.check_l0a_encoding(sensor_name: str) None[source]
Check l0a_encodings.yml file.
- Parameters
sensor_name (str) – Name of the sensor.
- Raises
ValueError – Error raised if the value of a key is not in the list of accepted values.
- disdrodb.l0.check_configs.check_l0b_encoding(sensor_name: str) None[source]
Check l0b_encodings.yml file based on the schema defined in the class NetcdfEncodingSchema.
- Parameters
sensor_name (str) – Name of the sensor.
- disdrodb.l0.check_configs.check_raw_array(sensor_name: str) None[source]
Check raw array consistency from config file.
- Parameters
sensor_name (str) – Name of the sensor.
- Raises
ValueError – Error if the chunksizes are not consistent.
- disdrodb.l0.check_configs.check_raw_data_format(sensor_name: str) None[source]
Check the raw_data_format.yml file against the schema defined in the class RawDataFormatSchema.
- Parameters
sensor_name (str) – Name of the sensor.
- disdrodb.l0.check_configs.check_sensor_configs(sensor_name: str) None[source]
Check the sensor configs.
- Parameters
sensor_name (str) – Name of the sensor.
- disdrodb.l0.check_configs.check_variable_consistency(sensor_name: str) None[source]
Check variable consistency across config files.
The variables specified within l0b_encoding.yml must also be defined in the other config files.
- Parameters
sensor_name (str) – Name of the sensor.
- Raises
ValueError – If the keys are not consistent.
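This consistency rule is a set comparison. Every variable key in l0b_encoding.yml must also be a key of each other config. The sketch below works on already-loaded dictionaries. `check_keys_consistency` and the variable names in the usage are hypothetical.

```python
def check_keys_consistency(l0b_encoding: dict, other_configs: dict) -> None:
    """Raise ValueError if a variable of l0b_encoding is missing from another config."""
    reference = set(l0b_encoding)
    for config_name, config in other_configs.items():
        missing = reference - set(config)
        if missing:
            raise ValueError(f"{config_name} is missing variables: {sorted(missing)}")
```

For example, `check_keys_consistency({"var_a": {}, "var_b": {}}, {"variable_units": {"var_a": "mm/h"}})` raises a ValueError that names `var_b`.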
- disdrodb.l0.check_configs.check_yaml_files_exists(sensor_name: str) None[source]
Check if all config YAML files exist.
- Parameters
sensor_name (str) – Name of the sensor.
- disdrodb.l0.check_configs.get_bins_measurement(sensor_name: str, file_name: str) list[source]
Get the bin measurements from the config file.
- Parameters
sensor_name (str) – Name of the sensor.
file_name (str) – File name (bins_velocity.yml or bins_diameter.yml)
- Returns
List of bin measurements (center, bounds, width).
- Return type
list
disdrodb.l0.check_metadata module
- disdrodb.l0.check_metadata.check_archive_metadata_campaign_name(disdrodb_dir) bool[source]
Check metadata campaign_name.
- Parameters
disdrodb_dir (str) – Path to the disdrodb directory.
- Returns
If the check succeeds, the result is True, and if it fails, the result is False.
- Return type
bool
- disdrodb.l0.check_metadata.check_archive_metadata_data_source(disdrodb_dir) bool[source]
Check metadata data_source.
- Parameters
disdrodb_dir (str) – Path to the disdrodb directory.
- Returns
If the check succeeds, the result is True, and if it fails, the result is False.
- Return type
bool
- disdrodb.l0.check_metadata.check_archive_metadata_geolocation(disdrodb_dir)[source]
Check whether the metadata files have missing or wrong geolocation.
- Parameters
disdrodb_dir (str) – Path to the disdrodb directory.
- Returns
If the check succeeds, the result is True, and if it fails, the result is False.
- Return type
bool
- disdrodb.l0.check_metadata.check_archive_metadata_keys(disdrodb_dir: str) bool[source]
Check that all metadata files have valid keys.
- Parameters
disdrodb_dir (str) – Path to the disdrodb directory.
- Returns
If the check succeeds, the result is True, and if it fails, the result is False.
- Return type
bool
- disdrodb.l0.check_metadata.check_archive_metadata_reader(disdrodb_dir: str) bool[source]
Check that the reader key is available and that the associated reader exists.
- Parameters
disdrodb_dir (str) – Path to the disdrodb directory.
- Returns
If the check succeeds, the result is True, and if it fails, the result is False.
- Return type
bool
- disdrodb.l0.check_metadata.check_archive_metadata_sensor_name(disdrodb_dir) bool[source]
Check metadata sensor name.
- Parameters
disdrodb_dir (str) – Path to the disdrodb directory.
- Returns
If the check succeeds, the result is True, and if it fails, the result is False.
- Return type
bool
- disdrodb.l0.check_metadata.check_archive_metadata_station_name(disdrodb_dir) bool[source]
Check metadata station name.
- Parameters
disdrodb_dir (str) – Path to the disdrodb directory.
- Returns
If the check succeeds, the result is True, and if it fails, the result is False.
- Return type
bool
- disdrodb.l0.check_metadata.check_metadata_geolocation(metadata) None[source]
Identify metadata with missing or wrong geolocation.
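A sketch of what missing or wrong geolocation could mean. It assumes the metadata stores decimal-degree `latitude` and `longitude` keys, and that None or an empty string marks a missing value. The helper `find_geolocation_problems` is hypothetical. It returns a list of problems instead of raising, so results are easy to aggregate over an archive.

```python
def find_geolocation_problems(metadata: dict) -> list:
    """Return human-readable problems with the latitude/longitude entries."""
    problems = []
    limits = {"latitude": (-90.0, 90.0), "longitude": (-180.0, 180.0)}
    for key, (low, high) in limits.items():
        value = metadata.get(key)
        if value is None or value == "":
            problems.append(f"{key} is missing")
            continue
        try:
            value = float(value)
        except (TypeError, ValueError):
            problems.append(f"{key} is not numeric")
            continue
        # NaN fails this comparison too, so it is reported as out of range.
        if not low <= value <= high:
            problems.append(f"{key}={value} is out of range [{low}, {high}]")
    return problems
```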
- disdrodb.l0.check_metadata.get_archive_metadata_key_value(disdrodb_dir: str, key: str, return_tuple: bool = True)[source]
Return the values of a metadata key for the whole archive.
- Parameters
disdrodb_dir (str) – Path to the disdrodb directory.
key (str) – Metadata key.
return_tuple (bool) – If True, return a tuple of values with the station, campaign and data source names. If False, return a list of values without the station, campaign and data source names. The default is True.
- Returns
List or tuple of values of the metadata key.
- Return type
list or tuple
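The return_tuple option can be sketched as follows. The sketch assumes each metadata file lives at <…>/<data_source>/<campaign_name>/metadata/<station_name>.yml. It also assumes a `load` callable, such as a YAML reader, returns a file's content as a dictionary. The helper name, the loader argument and the order of the tuple items are all assumptions.

```python
from pathlib import PurePosixPath
from typing import Callable


def collect_key_values(metadata_fpaths: list, key: str,
                       load: Callable[[str], dict], return_tuple: bool = True) -> list:
    """Collect the value of `key` from each metadata file (hypothetical helper)."""
    results = []
    for fpath in metadata_fpaths:
        path = PurePosixPath(fpath)
        station_name = path.stem                       # <station_name>.yml
        campaign_name = path.parent.parent.name        # <campaign_name>/metadata/
        data_source = path.parent.parent.parent.name   # <data_source>/<campaign_name>/
        value = load(fpath).get(key)
        if return_tuple:
            results.append((data_source, campaign_name, station_name, value))
        else:
            results.append(value)
    return results
```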
- disdrodb.l0.check_metadata.identify_empty_metadata_keys(metadata_fpaths: list, keys: Union[str, list]) None[source]
Identify empty metadata keys.
- Parameters
metadata_fpaths (list) – Input YAML file paths.
keys (Union[str,list]) – Metadata keys whose presence is checked.
disdrodb.l0.check_readers module
- disdrodb.l0.check_readers.check_all_readers() None[source]
Test all readers that have data samples and ground truth.
- Raises
Exception – If the reader validation has failed.
- disdrodb.l0.check_readers.get_list_test_campaigns(data_source: str) list[source]
Get list of test campaigns for a given data source.
- Parameters
data_source (str) – Data source.
- Returns
List of test campaigns.
- Return type
list
- disdrodb.l0.check_readers.get_list_test_data_sources() list[source]
Get list of test data sources.
- Returns
List of test data sources.
- Return type
list
- disdrodb.l0.check_readers.get_list_test_stations(data_source: str, campaign_name: str) list[source]
Get list of test stations for a given data source and campaign.
- Parameters
data_source (str) – Data source.
campaign_name (str) – Name of the campaign.
- Returns
List of test stations.
- Return type
list
disdrodb.l0.check_standards module
- disdrodb.l0.check_standards.check_l0a_column_names(df: DataFrame, sensor_name: str) None[source]
Check that the dataframe columns respect the DISDRODB standards.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
- Raises
ValueError – Error if some columns do not meet the DISDRODB standards or if the ‘time’ column is missing in the dataframe.
- disdrodb.l0.check_standards.check_l0a_standards(df: DataFrame, sensor_name: str, verbose: bool = True) None[source]
Check that an L0A dataframe respects the DISDRODB L0A standards.
- Parameters
df (pd.DataFrame) – L0A dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool, optional) – Whether to print detailed processing information. The default is True.
- Raises
ValueError – Error if some columns have inconsistent values.
- disdrodb.l0.check_standards.check_sensor_name(sensor_name: str) None[source]
Check sensor name.
- Parameters
sensor_name (str) – Name of the sensor.
- Raises
TypeError – Error if sensor_name is not a string.
ValueError – Error if the input sensor name has not been found in the list of available sensors.
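A minimal sketch of the two documented checks. It uses a hypothetical list of sensor names, while the real list comes from the sensor configurations shipped with the package.

```python
# Hypothetical sensor names, for illustration only.
AVAILABLE_SENSORS = ["SENSOR_A", "SENSOR_B"]


def check_sensor_name_sketch(sensor_name, available=AVAILABLE_SENSORS) -> None:
    """Raise TypeError for non-strings and ValueError for unknown sensor names."""
    if not isinstance(sensor_name, str):
        raise TypeError(f"'sensor_name' must be a string, got {type(sensor_name).__name__}.")
    if sensor_name not in available:
        raise ValueError(f"Unknown sensor '{sensor_name}'. Available sensors: {available}.")
```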
disdrodb.l0.io module
- disdrodb.l0.io.check_glob_pattern(pattern: str) None[source]
Check that the input parameter is a string and can be used as a glob pattern.
- Parameters
pattern (str) – String to be checked.
- Raises
TypeError – The input parameter is not a string.
ValueError – The input parameter cannot be used as a pattern.
- disdrodb.l0.io.check_glob_patterns(patterns: Union[str, list]) list[source]
Check that the glob patterns are valid.
- disdrodb.l0.io.check_processed_dir(processed_dir)[source]
Check the input, format and validity of the directory path.
- Parameters
processed_dir (str) – Path of the processed directory
- Returns
Path of the processed directory
- Return type
str
- disdrodb.l0.io.check_raw_dir(raw_dir: str, verbose: bool = False) None[source]
Check validity of raw_dir.
Steps:
1. Check that ‘raw_dir’ is a valid directory path.
2. Check that ‘raw_dir’ follows the expected directory structure.
3. Check that each station_name directory contains data.
4. Check that for each station_name the mandatory metadata.yml is specified.
5. Check that for each station_name the mandatory issue.yml is specified.
- Parameters
raw_dir (str) – Input raw directory
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
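The five steps above can be sketched with pathlib. The sketch assumes the layout data/<station_name>/, metadata/<station_name>.yml and issue/<station_name>.yml inside raw_dir. The issue/ directory and the .yml extension are assumptions for illustration, and `check_raw_dir_sketch` is a hypothetical name.

```python
from pathlib import Path


def check_raw_dir_sketch(raw_dir: str) -> list:
    """Validate an assumed raw campaign layout and return the station names."""
    root = Path(raw_dir)
    # 1. raw_dir must be an existing directory.
    if not root.is_dir():
        raise ValueError(f"{raw_dir} is not a directory.")
    # 2. Expected sub-directories (layout assumed for illustration).
    for sub in ("data", "metadata", "issue"):
        if not (root / sub).is_dir():
            raise ValueError(f"Missing '{sub}' directory in {raw_dir}.")
    station_names = sorted(p.name for p in (root / "data").iterdir() if p.is_dir())
    for station in station_names:
        # 3. Each station directory must contain data.
        if not any((root / "data" / station).iterdir()):
            raise ValueError(f"No data for station {station}.")
        # 4. and 5. Mandatory metadata and issue YAML files.
        for sub in ("metadata", "issue"):
            if not (root / sub / f"{station}.yml").is_file():
                raise ValueError(f"Missing {sub}/{station}.yml.")
    return station_names
```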
- disdrodb.l0.io.create_directory_structure(processed_dir, product_level, station_name, force, verbose=False)[source]
Create directory structure for L0B and higher DISDRODB products.
- disdrodb.l0.io.create_initial_directory_structure(raw_dir, processed_dir, station_name, force, verbose=False, product_level='L0A')[source]
Create directory structure for the first L0 DISDRODB product.
If the input data are raw text files, product_level = “L0A” (run_l0a).
If the input data are raw netCDF files, product_level = “L0B” (run_l0b_nc).
- disdrodb.l0.io.get_L0A_dir(processed_dir: str, station_name: str) str[source]
Define L0A directory.
- Parameters
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station
- Returns
L0A directory path.
- Return type
str
- disdrodb.l0.io.get_L0A_fname(df, processed_dir, station_name: str) str[source]
Define L0A file name.
- Parameters
df (pd.DataFrame) – L0A DataFrame
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station
- Returns
L0A file name.
- Return type
str
- disdrodb.l0.io.get_L0A_fpath(df: DataFrame, processed_dir: str, station_name: str) str[source]
Define L0A file path.
- Parameters
df (pd.DataFrame) – L0A DataFrame.
processed_dir (str) – Path of the processed directory.
station_name (str) – Name of the station.
- Returns
L0A file path.
- Return type
str
- disdrodb.l0.io.get_L0B_dir(processed_dir: str, station_name: str) str[source]
Define L0B directory.
- Parameters
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station
- Returns
Path of the L0B directory
- Return type
str
- disdrodb.l0.io.get_L0B_fname(ds, processed_dir, station_name: str) str[source]
Define L0B file name.
- Parameters
ds (xr.Dataset) – L0B xarray Dataset
processed_dir (str) – Path of the processed directory
station_name (str) – Name of the station
- Returns
L0B file name.
- Return type
str
- disdrodb.l0.io.get_L0B_fpath(ds: Dataset, processed_dir: str, station_name: str, l0b_concat=False) str[source]
Define L0B file path.
- Parameters
ds (xr.Dataset) – L0B xarray Dataset.
processed_dir (str) – Path of the processed directory.
station_name (str) – ID of the station
l0b_concat (bool) – If False, the file is specified inside the station directory. If True, the file is specified outside the station directory.
- Returns
L0B file path.
- Return type
str
- disdrodb.l0.io.get_campaign_name(path: str) str[source]
Return the campaign name from a file or directory path.
Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB.
- Parameters
path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
- Returns
Name of the campaign.
- Return type
str
- disdrodb.l0.io.get_data_source(path: str) str[source]
Return the data_source from a file or directory path.
Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB.
- Parameters
path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
- Returns
Name of the data source.
- Return type
str
- disdrodb.l0.io.get_dataframe_min_max_time(df: DataFrame)[source]
Retrieves dataframe starting and ending time.
- Parameters
df (pd.DataFrame) – Input dataframe
- Returns
(starting_time, ending_time)
- Return type
tuple
- disdrodb.l0.io.get_dataset_min_max_time(ds: Dataset)[source]
Retrieves dataset starting and ending time.
- Parameters
ds (xr.Dataset) – Input dataset
- Returns
(starting_time, ending_time)
- Return type
tuple
- disdrodb.l0.io.get_disdrodb_dir(path: str) str[source]
Return the disdrodb base directory from a file or directory path.
Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB.
- Parameters
path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
- Returns
Path of the DISDRODB directory.
- Return type
str
- disdrodb.l0.io.get_disdrodb_path(path: str) str[source]
Return the path starting from the disdrodb_dir directory.
Current assumption: no data_source, campaign_name, station_name or file name contains the word DISDRODB.
- Parameters
path (str) – path can be a campaign_dir (‘raw_dir’ or ‘processed_dir’), or a DISDRODB file path.
- Returns
Path inside the DISDRODB archive. Format: DISDRODB/<Raw or Processed>/<data_source>/…
- Return type
str
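Under the stated assumption that DISDRODB appears only once in a path, the path helpers above reduce to indexing the components of a path laid out as <…>/DISDRODB/<Raw or Processed>/<data_source>/<campaign_name>. The sketch below uses the hypothetical helper `split_disdrodb_path`. Unlike the documented assumption, which concerns the word DISDRODB anywhere in a path, this sketch matches a whole path component.

```python
from pathlib import PurePosixPath


def split_disdrodb_path(path: str) -> dict:
    """Split a DISDRODB path into its base directory, archive path and names."""
    parts = PurePosixPath(path).parts
    if parts.count("DISDRODB") != 1:
        raise ValueError("The path must contain exactly one 'DISDRODB' component.")
    idx = parts.index("DISDRODB")
    # Need DISDRODB, <Raw or Processed>, <data_source> and <campaign_name>.
    if len(parts) < idx + 4:
        raise ValueError("The path is too short to contain data_source and campaign_name.")
    return {
        "disdrodb_dir": str(PurePosixPath(*parts[: idx + 1])),
        "disdrodb_path": str(PurePosixPath(*parts[idx:])),
        "data_source": parts[idx + 2],
        "campaign_name": parts[idx + 3],
    }
```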
- disdrodb.l0.io.get_l0a_file_list(processed_dir, station_name, debugging_mode)[source]
Retrieve the L0A files for a given station.
- Parameters
processed_dir (str) – Directory of the campaign where to search for the L0A files. Format <..>/DISDRODB/Processed/<data_source>/<campaign_name>
station_name (str) – ID of the station
debugging_mode (bool, optional) – If True, select at most 3 files for debugging purposes. The default is False.
- Returns
list_fpaths – List of L0A file paths.
- Return type
list
- disdrodb.l0.io.get_raw_file_list(raw_dir, station_name, glob_patterns, verbose=False, debugging_mode=False)[source]
Get the list of files from a directory based on input parameters.
Currently concatenates all files provided by the glob patterns. In the future, this might be modified to enable DISDRODB processing when raw data are split across multiple files.
- Parameters
raw_dir (str) – Directory of the campaign where to search for files. Format <..>/DISDRODB/Raw/<data_source>/<campaign_name>
station_name (str) – ID of the station
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
debugging_mode (bool, optional) – If True, select at most 3 files for debugging purposes. The default is False.
- Returns
list_fpaths – List of file paths.
- Return type
list
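A sketch of the file search, assuming the glob patterns are applied inside <raw_dir>/data/<station_name> as documented for run_l0a. The hypothetical `list_raw_files` helper deduplicates and sorts the matches and raises an error when nothing is found. These are choices of the sketch, not documented behavior.

```python
import glob
import os


def list_raw_files(raw_dir: str, station_name: str, glob_patterns,
                   debugging_mode: bool = False) -> list:
    """Return the sorted raw file paths matching the glob pattern(s)."""
    if isinstance(glob_patterns, str):
        glob_patterns = [glob_patterns]
    station_dir = os.path.join(raw_dir, "data", station_name)
    fpaths = set()
    for pattern in glob_patterns:
        # Escape the directory so only the pattern itself is interpreted by glob.
        fpaths.update(glob.glob(os.path.join(glob.escape(station_dir), pattern)))
    fpaths = sorted(fpaths)
    if not fpaths:
        raise ValueError(f"No files found in {station_dir} for patterns {glob_patterns}.")
    if debugging_mode:
        fpaths = fpaths[:3]  # keep at most 3 files
    return fpaths
```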
- disdrodb.l0.io.read_L0A_dataframe(fpaths: Union[str, list], verbose: bool = False, debugging_mode: bool = False) DataFrame[source]
Read DISDRODB L0A Apache Parquet file(s).
- Parameters
fpaths (str or list) – Either a list of file paths or a single file path.
verbose (bool) – Whether to print detailed processing information into terminal. The default is False.
debugging_mode (bool) – If True, reduce the amount of data to process: if fpaths is a list, only the first 3 files are read, and only the first 100 rows of each file are kept. The default is False.
- Returns
L0A Dataframe.
- Return type
pd.DataFrame
disdrodb.l0.issue module
- class disdrodb.l0.issue.NoDatesSafeLoader(stream)[source]
Bases: SafeLoader
- classmethod remove_implicit_resolver(tag_to_remove)[source]
Remove implicit resolvers for a particular tag.
Takes care not to modify resolvers in super classes.
Datetimes are loaded as strings, not dates, because the data are later serialized to JSON, which lacks the richer types of YAML. Loading them as dates would cause incompatibilities further downstream.
- disdrodb.l0.issue.check_issue_file(fpath: str) None[source]
Check issue YAML file validity.
- Parameters
fpath (str) – Issue YAML file path.
- disdrodb.l0.issue.check_timesteps(timesteps)[source]
Check timesteps validity.
It expects timestep strings in YYYY-mm-dd HH:MM:SS format with second accuracy. If timesteps is None, it returns None.
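A sketch of the timestep validation using datetime.strptime with the documented YYYY-mm-dd HH:MM:SS format. The hypothetical `check_timesteps_sketch` accepts a single string or a list and returns datetime objects. Both behaviors are assumptions, and the real function may return a different type.

```python
from datetime import datetime

TIMESTEP_FORMAT = "%Y-%m-%d %H:%M:%S"


def check_timesteps_sketch(timesteps):
    """Parse timestep strings with second accuracy; return None if timesteps is None."""
    if timesteps is None:
        return None
    if isinstance(timesteps, str):
        timesteps = [timesteps]
    parsed = []
    for timestep in timesteps:
        try:
            parsed.append(datetime.strptime(timestep, TIMESTEP_FORMAT))
        except (TypeError, ValueError) as err:
            raise ValueError(
                f"Invalid timestep {timestep!r}: expected 'YYYY-mm-dd HH:MM:SS'."
            ) from err
    return parsed
```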
- disdrodb.l0.issue.is_numpy_array_datetime(arr)[source]
Check if the numpy array contains datetime64
- Parameters
arr (numpy array) – Numpy array to check.
- Returns
Numpy array checked.
- Return type
numpy array
- disdrodb.l0.issue.is_numpy_array_string(arr)[source]
Check if the numpy array contains strings
- Parameters
arr (numpy array) – Numpy array to check.
- disdrodb.l0.issue.load_yaml_without_date_parsing(filepath)[source]
Read a YAML file without automatically converting date strings to datetime objects.
- disdrodb.l0.issue.read_issue(raw_dir: str, station_name: str) dict[source]
Read YAML issue file.
- Parameters
raw_dir (str) – Path of the campaign raw directory.
station_name (str) – Station name.
- Returns
Issue dictionary.
- Return type
dict
- disdrodb.l0.issue.read_issue_file(fpath: str) dict[source]
Read YAML issue file.
- Parameters
fpath (str) – Filepath of the issue YAML.
- Returns
Issue dictionary.
- Return type
dict
disdrodb.l0.l0_processing module
- disdrodb.l0.l0_processing.click_l0_archive_options(function: object)[source]
Click command line arguments for L0 processing archiving of a station.
- Parameters
function (object) – Function.
- disdrodb.l0.l0_processing.click_l0_processing_options(function: object)[source]
Click command line default parameters for L0 processing options.
- Parameters
function (object) – Function.
- disdrodb.l0.l0_processing.click_l0_station_arguments(function: object)[source]
Click command line arguments for L0 processing of a station.
- Parameters
function (object) – Function.
- disdrodb.l0.l0_processing.click_l0_stations_options(function: object)[source]
Click command line options for DISDRODB archive L0 processing.
- Parameters
function (object) – Function.
- disdrodb.l0.l0_processing.click_l0b_concat_options(function: object)[source]
Click command line default parameters for L0B concatenation.
- Parameters
function (object) – Function.
- disdrodb.l0.l0_processing.run_disdrodb_l0(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = False, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
Run the L0 processing of DISDRODB stations.
This function launches the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it processes the stations matching the provided data_sources, campaign_names and station_names.
- Parameters
disdrodb_dir (str) – Base directory of DISDRODB Format: <…>/DISDRODB
data_sources (list) – Name of data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, all stations of the specified data source(s) are processed. The default is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None.
station_names (list) – Station names to process. The default is None.
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is False.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. For L0B, it processes just the first 100 rows of 3 L0A files. The default is False.
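The station selection described above can be sketched as a filter over (data_source, campaign_name, station_name) tuples, where None means no filtering at that level. The helper `select_stations` is hypothetical. It operates on an in-memory list instead of scanning disdrodb_dir.

```python
def select_stations(stations, data_sources=None, campaign_names=None, station_names=None):
    """Filter (data_source, campaign_name, station_name) tuples; None means no filter."""
    def to_set(values):
        if values is None:
            return None
        return {values} if isinstance(values, str) else set(values)

    allowed_per_level = [to_set(data_sources), to_set(campaign_names), to_set(station_names)]
    return [
        station for station in stations
        if all(allowed is None or field in allowed
               for field, allowed in zip(station, allowed_per_level))
    ]
```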
- disdrodb.l0.l0_processing.run_disdrodb_l0_station(disdrodb_dir, data_source, campaign_name, station_name, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = True, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
Run the L0 processing of a specific DISDRODB station from the terminal.
- Parameters
disdrodb_dir (str) – Base directory of DISDRODB Format: <…>/DISDRODB
data_source (str) – Institution name (when campaign data spans more than 1 country), or country (when all campaigns (or sensor networks) are inside a given country). Must be UPPER CASE.
campaign_name (str) – Campaign name. Must be UPPER CASE.
station_name (str) – Station name
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is True.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files for each station. For L0B, it processes just the first 100 rows of 3 L0A files for each station. The default is False.
- disdrodb.l0.l0_processing.run_disdrodb_l0a(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
- disdrodb.l0.l0_processing.run_disdrodb_l0a_station(disdrodb_dir, data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
Run the L0A processing of a station calling run_disdrodb_l0a_station in the terminal.
- disdrodb.l0.l0_processing.run_disdrodb_l0b(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
- disdrodb.l0.l0_processing.run_disdrodb_l0b_station(disdrodb_dir, data_source, campaign_name, station_name, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
Run the L0B processing of a station calling run_disdrodb_l0b_station in the terminal.
- disdrodb.l0.l0_processing.run_l0a(raw_dir, processed_dir, station_name, glob_patterns, column_names, reader_kwargs, df_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]
Run the L0A processing for a specific DISDRODB station.
- Parameters
raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:
<…>/DISDRODB/Raw/<data_source>/<campaign_name>
Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between:
the raw_dir and processed_dir directory paths;
the key ‘campaign_name’ within the metadata YAML files.
The campaign_name is expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:
<…>/DISDRODB/Processed/<data_source>/<campaign_name>
For testing purposes, this function also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>
column_names (list) – Columns names of the raw text file.
reader_kwargs (dict) – Pandas read_csv arguments to open the text file.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe into the DISDRODB L0A standard.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
verbose (bool) – Whether to print detailed processing information into terminal. The default is False.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 100 rows of 3 raw data files. The default is False.
- disdrodb.l0.l0_processing.run_l0b(processed_dir, station_name, parallel, force, verbose, debugging_mode)[source]
Run the L0B processing for a specific DISDRODB station.
- Parameters
raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:
<…>/DISDRODB/Raw/<data_source>/<campaign_name>
Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between:
the raw_dir and processed_dir directory paths;
the key ‘campaign_name’ within the metadata YAML files.
The campaign_name is expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:
<…>/DISDRODB/Processed/<data_source>/<campaign_name>
For testing purposes, this function also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default is True.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. Ensure that threads_per_worker (the number of threads per process) is set to 1 to avoid HDF errors. Also set the HDF5_USE_FILE_LOCKING environment variable to “FALSE”. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just 3 raw data files. The default is False.
- disdrodb.l0.l0_processing.run_l0b_from_nc(raw_dir, processed_dir, station_name, glob_patterns, dict_names, ds_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]
Run the L0B processing for a specific DISDRODB station with raw netCDFs.
- Parameters
raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:
<…>/DISDRODB/Raw/<data_source>/<campaign_name>
Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between:
the raw_dir and processed_dir directory paths;
the key ‘campaign_name’ within the metadata YAML files.
The campaign_name is expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0B products. The path should have the following structure:
<…>/DISDRODB/Processed/<data_source>/<campaign_name>
For testing purposes, this function also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>. Example: glob_patterns = “*.nc”
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
ds_sanitizer_fun (object, optional) – Sanitizer function to format the raw netCDF into DISDRODB L0B standard.
force (bool) – If True, overwrite existing data into destination directories. If False, raise an error if there are already data into destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information into terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 3 raw netCDF files. The default is False.
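The raw_dir layout requirements above can be checked before launching a run. The sketch below uses hypothetical helpers (check_raw_dir_layout and campaign_name_is_valid are not DISDRODB functions); it assumes that every sub-directory of data/ is a station and relies only on pathlib.

```python
from pathlib import Path


def check_raw_dir_layout(raw_dir):
    """Hypothetical helper: return the station names lacking a metadata YAML file.

    Assumes the layout <raw_dir>/data/<station_name>/ and
    <raw_dir>/metadata/<station_name>.yaml.
    """
    raw_dir = Path(raw_dir)
    data_dir = raw_dir / "data"
    metadata_dir = raw_dir / "metadata"
    # Every sub-directory of data/ is treated as a station (assumption)
    station_names = sorted(p.name for p in data_dir.iterdir() if p.is_dir())
    # A station is valid only if metadata/<station_name>.yaml exists
    return [
        name
        for name in station_names
        if not (metadata_dir / f"{name}.yaml").is_file()
    ]


def campaign_name_is_valid(raw_dir):
    """Hypothetical helper: the last path component must be UPPER CASE."""
    name = Path(raw_dir).name
    return name == name.upper()
```

An empty list from check_raw_dir_layout means every station folder has its YAML counterpart.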
disdrodb.l0.l0_reader module
- disdrodb.l0.l0_reader.available_readers(data_sources=None, reader_path=False)[source]
Retrieve available readers information.
- disdrodb.l0.l0_reader.check_available_readers()[source]
Check the arguments of all readers in the package.
- disdrodb.l0.l0_reader.check_reader_arguments(reader)[source]
Check that the reader has the expected input arguments.
- disdrodb.l0.l0_reader.check_reader_exists(reader_data_source: str, reader_name: str) str[source]
Check that the provided data source exists and that the reader name exists within the available readers.
Run get_available_readers_dict() to get the list of all available readers.
- Parameters
reader_data_source (str) – The directory within which the reader_name is located in the disdrodb.l0.readers directory.
reader_name (str) – Name of the reader.
- Returns
The reader name, if the reader exists. Otherwise, an error is raised.
- Return type
str
- Raises
ValueError – Error if the reader name provided for the campaign has not been found.
- disdrodb.l0.l0_reader.get_available_readers_dict() dict[source]
Return the description of the readers included in the current release of DISDRODB.
- Returns
The dictionary has the following schema {“data_source”: {“reader_name”: “reader_file_path”}}
- Return type
dict
- disdrodb.l0.l0_reader.get_reader(reader_data_source: str, reader_name: str) object[source]
Returns the reader function based on input parameters.
- Parameters
reader_data_source (str) – The directory within which the reader_name is located in the disdrodb.l0.readers directory.
reader_name (str) – The reader name.
- Returns
The reader() function
- Return type
object
- disdrodb.l0.l0_reader.get_reader_from_metadata_reader_key(reader_data_source_name)[source]
Retrieve the reader from the reader metadata value.
The metadata reader key follows the convention <data_source>/<reader_name>, referring to a reader within disdrodb.l0.readers.
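A minimal sketch of how a "<data_source>/<reader_name>" key could be resolved against a dictionary following the schema returned by get_available_readers_dict(). The helper resolve_reader_key and the sample names in the usage are hypothetical, and this sketch only returns the reader file path, whereas the documented function retrieves the reader itself.

```python
def resolve_reader_key(reader_key, available_readers):
    """Hypothetical sketch: look up '<data_source>/<reader_name>'.

    available_readers follows the schema
    {"data_source": {"reader_name": "reader_file_path"}}.
    """
    parts = reader_key.split("/")
    if len(parts) != 2:
        raise ValueError(f"Expected '<data_source>/<reader_name>', got {reader_key!r}.")
    data_source, reader_name = parts
    if data_source not in available_readers:
        raise ValueError(f"Unknown data source {data_source!r}.")
    if reader_name not in available_readers[data_source]:
        raise ValueError(f"Unknown reader {reader_name!r} for data source {data_source!r}.")
    return available_readers[data_source][reader_name]
```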
- disdrodb.l0.l0_reader.get_station_reader(disdrodb_dir, data_source, campaign_name, station_name)[source]
Retrieve the reader from the station metadata information.
- disdrodb.l0.l0_reader.is_documented_by(original)[source]
Decorator that applies a generic docstring to the decorated function.
- Parameters
original (function) – Function to take the docstring from.
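The docstring-copying behaviour described above corresponds to a common decorator pattern. The following is a sketch of that pattern, not the package source; my_reader is a hypothetical reader used only for illustration.

```python
def is_documented_by(original):
    """Return a decorator that copies original's docstring onto the target."""
    def wrapper(target):
        target.__doc__ = original.__doc__
        return target
    return wrapper


def reader_generic_docstring():
    """Script to convert the raw data to L0A format."""


@is_documented_by(reader_generic_docstring)
def my_reader(raw_dir, processed_dir, station_name):  # hypothetical reader
    pass
```

Every reader decorated this way shares one docstring, so the parameter documentation only has to be maintained in one place.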
- disdrodb.l0.l0_reader.reader_generic_docstring()[source]
Script to convert the raw data to L0A format.
- Parameters
raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:
<…>/DISDRODB/Raw/<data_source>/<campaign_name>.
Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between:
the raw_dir and processed_dir directory paths;
the key ‘campaign_name’ within the metadata YAML files.
Campaign names are expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:
<…>/DISDRODB/Processed/<data_source>/<campaign_name>
For testing purposes, this function also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is True.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, reduce the amount of data to process: only the first 3 raw data files are processed. The default is False.
disdrodb.l0.l0a_processing module
Functions to process raw text files into DISDRODB L0A Apache Parquet.
- disdrodb.l0.l0a_processing.cast_column_dtypes(df: DataFrame, sensor_name: str, verbose: bool = False) DataFrame[source]
Convert ‘object’ dataframe columns into DISDRODB L0A dtype standards.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataframe with corrected columns types.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.coerce_corrupted_values_to_nan(df: DataFrame, sensor_name: str, verbose: bool = False) DataFrame[source]
Coerce corrupted values in dataframe numeric columns to np.nan.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataframe with numeric columns without corrupted values.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.concatenate_dataframe(list_df: list, verbose: bool = False) DataFrame[source]
Concatenate a list of dataframes.
- Parameters
list_df (list) – List of dataframes.
verbose (bool, optional) – If True, print messages. If False, no print.
- Returns
Concatenated dataframe.
- Return type
pd.DataFrame
- Raises
ValueError – Concatenation cannot be done.
- disdrodb.l0.l0a_processing.drop_time_periods(df, time_periods)[source]
Drop dataframe rows within problematic time periods.
- disdrodb.l0.l0a_processing.preprocess_reader_kwargs(reader_kwargs: dict) dict[source]
Preprocess the arguments required to read a raw text file into Pandas.
- Parameters
reader_kwargs (dict) – Initial parameter dictionary.
- Returns
Parameter dictionary that matches either Pandas or Dask.
- Return type
dict
- disdrodb.l0.l0a_processing.process_raw_file(filepath, column_names, reader_kwargs, df_sanitizer_fun, sensor_name, verbose=True, issue_dict={})[source]
Read and parse a raw text file into an L0A dataframe.
- Parameters
filepath (str) – File path
column_names (list) – Column names.
reader_kwargs (dict) – Pandas read_csv arguments.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information. The default is True.
issue_dict (dict) – Issue dictionary providing information on timesteps to remove. The default is an empty dictionary {}. Valid issue_dict keys are ‘timesteps’ and ‘time_periods’. Valid issue_dict values are lists of datetime64 values (with second accuracy). To correctly format and check the validity of the issue_dict, use the disdrodb.l0.issue.check_issue_dict function.
- Returns
Dataframe
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.read_raw_data(filepath: str, column_names: list, reader_kwargs: dict) DataFrame[source]
Read raw data into a dataframe.
- Parameters
filepath (str) – Raw file path.
column_names (list) – Column names.
reader_kwargs (dict) – Pandas pd.read_csv arguments.
- Returns
Pandas dataframe.
- Return type
pandas.DataFrame
- disdrodb.l0.l0a_processing.read_raw_file_list(file_list: Union[list, str], column_names: list, reader_kwargs: dict, sensor_name: str, verbose: bool, df_sanitizer_fun: Optional[object] = None) DataFrame[source]
Read and parse a list of raw files into a dataframe.
- Parameters
file_list (Union[list,str]) – File path(s).
column_names (list) – Column names.
reader_kwargs (dict) – Pandas read_csv arguments.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe.
- Returns
Dataframe
- Return type
pd.DataFrame
- Raises
ValueError – Input parameters cannot be used or the raw file cannot be processed.
- disdrodb.l0.l0a_processing.remove_corrupted_rows(df)[source]
Remove corrupted rows by checking conversion of raw fields to numeric.
Note: the delimiters at the start and end of the raw arrays must be stripped beforehand.
- disdrodb.l0.l0a_processing.remove_duplicated_timesteps(df: DataFrame, verbose: bool = False)[source]
Remove duplicated timesteps.
Only the first occurrence of each timestep is kept.
- Parameters
df (pd.DataFrame) – Input dataframe.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataframe with valid unique timesteps.
- Return type
pd.DataFrame
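Assuming the timestamps live in a "time" column (as in remove_rows_with_missing_time), the keep-first behaviour can be sketched with pandas drop_duplicates. This is an illustration of the documented behaviour, not the package source.

```python
import pandas as pd


def remove_duplicated_timesteps(df, verbose=False):
    # Keep only the first row for each distinct "time" value
    n_before = len(df)
    df = df.drop_duplicates(subset="time", keep="first")
    if verbose:
        print(f"Removed {n_before - len(df)} duplicated timesteps.")
    return df
```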
- disdrodb.l0.l0a_processing.remove_issue_timesteps(df, issue_dict, verbose=False)[source]
Drop dataframe rows with timesteps listed in the issue dictionary.
- Parameters
df (pd.DataFrame) – Input dataframe.
issue_dict (dict) – Issue dictionary
- Returns
Dataframe with problematic timesteps removed.
- Return type
pd.DataFrame
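A sketch of the timestep filtering, assuming issue_dict has the shape {"timesteps": [...], "time_periods": [[start, end], ...]} with inclusive period bounds. The pair layout of time_periods and the bound inclusivity are assumptions; real dictionaries should be validated with disdrodb.l0.issue.check_issue_dict.

```python
import numpy as np
import pandas as pd


def remove_issue_timesteps(df, issue_dict, verbose=False):
    # Compare everything at nanosecond precision to avoid unit mismatches
    time = df["time"].to_numpy(dtype="datetime64[ns]")
    to_drop = np.zeros(len(df), dtype=bool)
    # Drop exact timesteps listed under "timesteps"
    timesteps = np.asarray(issue_dict.get("timesteps", []), dtype="datetime64[ns]")
    to_drop |= np.isin(time, timesteps)
    # Drop rows inside each [start, end] period (inclusive bounds, an assumption)
    for start, end in issue_dict.get("time_periods", []):
        start = np.datetime64(start, "ns")
        end = np.datetime64(end, "ns")
        to_drop |= (time >= start) & (time <= end)
    if verbose:
        print(f"Removed {int(to_drop.sum())} problematic timesteps.")
    return df.loc[~to_drop]
```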
- disdrodb.l0.l0a_processing.remove_rows_with_missing_time(df: DataFrame, verbose: bool = False)[source]
Remove dataframe rows where the “time” is NaT.
- Parameters
df (pd.DataFrame) – Input dataframe.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataframe with valid timesteps.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.replace_nan_flags(df, sensor_name, verbose)[source]
Set values corresponding to nan_flags to np.nan.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataframe without nan_flags values.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.set_nan_outside_data_range(df, sensor_name, verbose)[source]
Set values outside the data range as np.nan.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataframe without values outside the expected data range.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.set_nan_unvalid_values(df, sensor_name, verbose)[source]
Set invalid (class) values to np.nan.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataframe without invalid values.
- Return type
pd.DataFrame
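The three masking steps above (nan_flags, data range, valid values) can be sketched with pandas mask/where. The helpers below are hypothetical: they take per-column dictionaries directly (of the kind returned by get_nan_flags_dict, get_data_range_dict and get_valid_values_dict) instead of a sensor_name, and the data range is assumed to be inclusive.

```python
import pandas as pd


def mask_nan_flags(df, nan_flags_dict):
    """Set values listed as nan_flags (e.g. {"col": [-9999]}) to NaN."""
    df = df.copy()
    for column, flags in nan_flags_dict.items():
        if column in df.columns:
            df[column] = df[column].mask(df[column].isin(flags))
    return df


def mask_outside_data_range(df, data_range_dict):
    """Set values outside [vmin, vmax] (e.g. {"col": [0, 999]}) to NaN."""
    df = df.copy()
    for column, (vmin, vmax) in data_range_dict.items():
        if column in df.columns:
            # between() includes both bounds by default; NaN stays NaN
            df[column] = df[column].where(df[column].between(vmin, vmax))
    return df


def mask_invalid_values(df, valid_values_dict):
    """Set values not in the list of valid (class) values to NaN."""
    df = df.copy()
    for column, valid_values in valid_values_dict.items():
        if column in df.columns:
            df[column] = df[column].where(df[column].isin(valid_values))
    return df
```

Columns absent from the dataframe are skipped, mirroring the fact that the standards dictionaries exclude variables without the corresponding key.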
- disdrodb.l0.l0a_processing.strip_delimiter_from_raw_arrays(df)[source]
Remove the first and last delimiter occurrence from the raw array fields.
- disdrodb.l0.l0a_processing.strip_string_spaces(df: DataFrame, sensor_name: str, verbose: bool = False) DataFrame[source]
Strip leading/trailing spaces from dataframe string columns.
- Parameters
df (pd.DataFrame) – Input dataframe.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataframe with string columns without leading/trailing spaces.
- Return type
pd.DataFrame
- disdrodb.l0.l0a_processing.write_l0a(df: DataFrame, fpath: str, force: bool = False, verbose: bool = False)[source]
Save the dataframe into an Apache Parquet file.
- Parameters
df (pd.DataFrame) – Input dataframe.
fpath (str) – Output file path.
force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False (the default), raise an error if data already exist in the destination directories.
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
- Raises
ValueError – The input dataframe cannot be written as an Apache Parquet file.
NotImplementedError – The input dataframe cannot be processed.
disdrodb.l0.l0b_concat module
disdrodb.l0.l0b_processing module
Functions to process DISDRODB L0A files into DISDRODB L0B netCDF files.
- disdrodb.l0.l0b_processing.add_dataset_crs_coords(ds)[source]
Add the CRS coordinate to the xr.Dataset
- disdrodb.l0.l0b_processing.add_dataset_missing_variables(ds, missing_vars, sensor_name)[source]
Add missing Dataset variables as NaN-filled DataArrays.
- disdrodb.l0.l0b_processing.convert_object_variables_to_string(ds: Dataset) Dataset[source]
Convert variables with object dtype to string.
- Parameters
ds (xr.Dataset) – Input dataset.
- Returns
Output dataset.
- Return type
xr.Dataset
- disdrodb.l0.l0b_processing.create_l0b_from_l0a(df: DataFrame, attrs: dict, verbose: bool = False) Dataset[source]
Transform the L0A dataframe to the L0B xr.Dataset.
- Parameters
df (pd.DataFrame) – DISDRODB L0A dataframe.
attrs (dict) – Station metadata.
verbose (bool, optional) – Whether to print detailed processing information. The default is False.
- Returns
DISDRODB L0B dataset.
- Return type
xr.Dataset
- Raises
ValueError – Error if the DISDRODB L0B xarray dataset cannot be created.
- disdrodb.l0.l0b_processing.format_string_array(string: str, n_values: int) array[source]
Split a string with multiple numbers separated by a delimiter into a 1D array.
For example, format_string_array(“2,44,22,33”, 4) returns [ 2. 44. 22. 33.]
If the string is empty (“”), an array of zeros is returned. If the number of values is not n_values, an array of np.nan is returned.
The function strips potential delimiters at the start and end before splitting.
- Parameters
string (str) – Input string
n_values (int) – Expected length of the output array.
- Returns
array of float
- Return type
np.array
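A minimal sketch of the format_string_array behaviour described above. Unlike the documented function, it assumes the delimiter is passed explicitly (defaulting to a comma) rather than inferred, and it performs the empty-string check after stripping delimiters, which is also an assumption.

```python
import numpy as np


def format_string_array(string, n_values, delimiter=","):
    # Strip potential delimiters at the start and end before splitting
    string = string.strip(delimiter)
    # Empty string -> array of zeros
    if string == "":
        return np.zeros(n_values)
    values = string.split(delimiter)
    # Unexpected number of values -> array of NaN
    if len(values) != n_values:
        return np.full(n_values, np.nan)
    # NumPy parses the numeric strings when dtype=float is requested
    return np.asarray(values, dtype=float)
```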
- disdrodb.l0.l0b_processing.get_bin_coords(sensor_name: str) dict[source]
Retrieve diameter (and velocity) bin coordinates.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with coordinate arrays.
- Return type
dict
- disdrodb.l0.l0b_processing.infer_split_str(string: str) str[source]
Infer the delimiter inside a string.
- Parameters
string (str) – Input string.
- Returns
Inferred delimiter.
- Return type
str
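The inference heuristic is not specified here. One plausible sketch, with an assumed set of candidate delimiters and a "most frequent candidate wins" rule, is shown below; both the candidates and the rule are assumptions, not the package's documented behaviour.

```python
def infer_split_str(string):
    # Candidate delimiters, in priority order (assumption)
    candidates = [",", ";", " "]
    counts = {delimiter: string.count(delimiter) for delimiter in candidates}
    # max() returns the first candidate among ties, so priority order breaks ties
    best = max(candidates, key=lambda delimiter: counts[delimiter])
    # No candidate found: nothing to infer
    return best if counts[best] > 0 else None
```

Ties are resolved by candidate order, so "1, 2, 3" yields ",".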
- disdrodb.l0.l0b_processing.preprocess_raw_netcdf(ds, dict_names, sensor_name)[source]
Preprocess a raw netCDF to improve compatibility with DISDRODB standards.
This function checks the validity of dict_names, then renames and subsets the data accordingly. If some variables specified in dict_names are missing, NaN DataArrays are added in their place.
- Parameters
ds (xr.Dataset) – Raw netCDF to be converted to DISDRODB standards.
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
sensor_name (str) – Sensor name.
- Returns
ds – xarray Dataset with DISDRODB-compliant variable naming conventions.
- Return type
xr.Dataset
- disdrodb.l0.l0b_processing.process_raw_nc(filepath, dict_names, ds_sanitizer_fun, sensor_name, verbose, attrs)[source]
Read and convert a raw netCDF into a DISDRODB L0B netCDF.
- Parameters
filepath (str) – netCDF file path.
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
ds_sanitizer_fun (function) – Sanitizer function to do ad-hoc processing of the xr.Dataset.
attrs (dict) – Global metadata to attach as global attributes to the xr.Dataset.
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
L0B xr.Dataset
- Return type
xr.Dataset
- disdrodb.l0.l0b_processing.rechunk_dataset(ds: Dataset, encoding_dict: dict) Dataset[source]
Coerce the dataset arrays to have the chunk size specified in the encoding dictionary.
- Parameters
ds (xr.Dataset) – Input xarray dataset
encoding_dict (dict) – Dictionary containing the encoding to write the xarray dataset as a netCDF.
- Returns
Output xarray dataset
- Return type
xr.Dataset
- disdrodb.l0.l0b_processing.rename_dataset(ds, dict_names)[source]
Rename Dataset variables, coordinates and dimensions.
- disdrodb.l0.l0b_processing.replace_custom_nan_flags(ds, dict_nan_flags)[source]
Set values corresponding to nan_flags to np.nan.
- Parameters
ds (xr.Dataset) – Input xarray dataset.
dict_nan_flags (dict) – Dictionary with nan flags value to set as np.nan
- Returns
Dataset without nan_flags values.
- Return type
xr.Dataset
- disdrodb.l0.l0b_processing.replace_nan_flags(ds, sensor_name, verbose)[source]
Set values corresponding to nan_flags to np.nan.
- Parameters
ds (xr.Dataset) – Input xarray dataset
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataset without nan_flags values.
- Return type
xr.Dataset
- disdrodb.l0.l0b_processing.reshape_raw_spectrum(arr: array, dims_order: list, dims_size_dict: dict, n_timesteps: int) array[source]
Reshape the raw spectrum to a 2D+time array.
The array has dimensions [“time”] + dims_order
- Parameters
arr (np.array) – Input array.
dims_order (list) –
The order of dimensions in the raw spectrum.
Examples:
- OTT Parsivel spectrum [v1d1 … v1d32, v2d1, …, v2d32] -> dims_order = [“velocity_bin_center”, “diameter_bin_center”]
- Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] -> dims_order = [“diameter_bin_center”, “velocity_bin_center”]
dims_size_dict (dict) – Dictionary with the number of bins for each dimension. For OTT_Parsivel: {“diameter_bin_center”: 32, “velocity_bin_center”: 32}. For Thies_LPM: {“diameter_bin_center”: 22, “velocity_bin_center”: 20}.
n_timesteps (int) – Number of timesteps.
- Returns
Output array.
- Return type
np.array
- Raises
ValueError – The raw_spectrum matrix cannot be reshaped.
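A sketch of the reshaping logic, assuming NumPy's default C order, in which the last entry of dims_order is the dimension that varies fastest in the flat spectrum. The toy bin sizes in the usage are hypothetical; real sizes would come from get_dims_size_dict(sensor_name).

```python
import numpy as np


def reshape_raw_spectrum(arr, dims_order, dims_size_dict, n_timesteps):
    # Target shape follows ["time"] + dims_order
    shape = [n_timesteps] + [dims_size_dict[dim] for dim in dims_order]
    try:
        # C order: the last dimension in dims_order varies fastest
        return np.asarray(arr).reshape(shape)
    except ValueError as err:
        raise ValueError(f"Impossible to reshape the raw_spectrum matrix: {err}") from err
```

For a flat spectrum [v1d1, v1d2, v1d3, v2d1, v2d2, v2d3] with 2 velocity and 3 diameter bins, dims_order = ["velocity_bin_center", "diameter_bin_center"] places v2d3 at index [0, 1, 2].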
- disdrodb.l0.l0b_processing.retrieve_l0b_arrays(df: DataFrame, sensor_name: str, verbose: bool = False) dict[source]
Retrieve the L0B data arrays.
- Parameters
df (pd.DataFrame) – Input dataframe
sensor_name (str) – Name of the sensor
- Returns
Dictionary with data arrays.
- Return type
dict
- disdrodb.l0.l0b_processing.sanitize_encodings_dict(encoding_dict: dict, ds: Dataset) dict[source]
Ensure that the chunk sizes do not exceed the array shape.
- Parameters
encoding_dict (dict) – Dictionary containing the encoding to write DISDRODB L0B netCDFs.
ds (xr.Dataset) – Input dataset.
- Returns
Encoding dictionary.
- Return type
dict
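A sketch of the chunk-size clipping, assuming each variable's encoding may carry a chunksizes entry (an int or a list, as in NetcdfEncodingSchema) and passing the variable shapes as a plain dictionary instead of an xr.Dataset. The helper name and the var_shapes argument are hypothetical.

```python
def sanitize_chunksizes(encoding_dict, var_shapes):
    """Hypothetical sketch: var_shapes maps variable names to array shapes."""
    sanitized = {}
    for var, encoding in encoding_dict.items():
        encoding = dict(encoding)  # shallow copy, the input is left untouched
        chunks = encoding.get("chunksizes")
        if var in var_shapes and chunks is not None:
            if isinstance(chunks, int):
                chunks = [chunks]
            # Clip each chunk size to the array length along that axis
            encoding["chunksizes"] = [min(c, n) for c, n in zip(chunks, var_shapes[var])]
        sanitized[var] = encoding
    return sanitized
```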
- disdrodb.l0.l0b_processing.set_dataset_attrs(ds, sensor_name)[source]
Set variable and coordinates attributes.
- disdrodb.l0.l0b_processing.set_encodings(ds: Dataset, sensor_name: str) Dataset[source]
Apply the encodings to the xarray Dataset.
- Parameters
ds (xr.Dataset) – Input xarray dataset.
sensor_name (str) – Name of the sensor.
- Returns
Output xarray dataset.
- Return type
xr.Dataset
- disdrodb.l0.l0b_processing.set_nan_outside_data_range(ds, sensor_name, verbose)[source]
Set values outside the data range as np.nan.
- Parameters
ds (xr.Dataset) – Input xarray dataset
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataset without values outside the expected data range.
- Return type
xr.Dataset
- disdrodb.l0.l0b_processing.set_nan_unvalid_values(ds, sensor_name, verbose)[source]
Set invalid (class) values to np.nan.
- Parameters
ds (xr.Dataset) – Input xarray dataset
sensor_name (str) – Name of the sensor.
verbose (bool) – Whether to print detailed processing information.
- Returns
Dataset without invalid values.
- Return type
xr.Dataset
- disdrodb.l0.l0b_processing.set_variable_attributes(ds: Dataset, sensor_name: str) Dataset[source]
Set attributes to each xr.Dataset variable.
- Parameters
ds (xr.Dataset) – Input dataset.
sensor_name (str) – Name of the sensor.
- Returns
Dataset with the variable attributes set.
- Return type
xr.Dataset
- disdrodb.l0.l0b_processing.write_l0b(ds: Dataset, fpath: str, force=False) None[source]
Save the xarray dataset into a NetCDF file.
- Parameters
ds (xr.Dataset) – Input xarray dataset.
fpath (str) – Output file path.
force (bool, optional) – Whether to overwrite existing data. If True, overwrite existing data in the destination directories. If False (the default), raise an error if data already exist in the destination directories.
disdrodb.l0.metadata module
- disdrodb.l0.metadata.add_missing_metadata_keys(metadata)[source]
Add missing keys to the metadata dictionary.
- disdrodb.l0.metadata.check_metadata_compliance(disdrodb_dir, data_source, campaign_name, station_name)[source]
Check DISDRODB metadata compliance.
- disdrodb.l0.metadata.create_campaign_default_metadata(disdrodb_dir, campaign_name, data_source)[source]
Create default YAML metadata files for all stations within a campaign.
Use this function with caution to avoid overwriting existing YAML files.
- disdrodb.l0.metadata.get_default_metadata_dict() dict[source]
Get DISDRODB metadata default values.
- Returns
Dictionary of standard attributes.
- Return type
dict
- disdrodb.l0.metadata.get_metadata_missing_keys(metadata)[source]
Return the DISDRODB metadata keys which are missing.
- disdrodb.l0.metadata.get_metadata_unvalid_keys(metadata)[source]
Return the DISDRODB metadata keys which are not valid.
- disdrodb.l0.metadata.get_valid_metadata_keys() list[source]
Get the list of valid DISDRODB metadata keys.
- Returns
List of valid metadata keys
- Return type
list
- disdrodb.l0.metadata.read_metadata(campaign_dir: str, station_name: str) dict[source]
Read YAML metadata file.
- Parameters
campaign_dir (str) – Path of the campaign directory.
station_name (str) – Name of the station.
- Returns
Dictionary of the metadata.
- Return type
dict
- disdrodb.l0.metadata.remove_unvalid_metadata_keys(metadata)[source]
Remove invalid keys from the metadata dictionary.
- disdrodb.l0.metadata.sort_metadata_dictionary(metadata)[source]
Sort the keys of the metadata dictionary by valid_metadata_keys list order.
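Finding missing keys, removing invalid keys and sorting the remaining ones reduce to list and dictionary comprehensions, since Python dictionaries preserve insertion order. These helpers are hypothetical and take the valid key list explicitly (the documented functions obtain it from get_valid_metadata_keys()); the sample keys in the usage are for illustration only.

```python
def find_missing_metadata_keys(metadata, valid_keys):
    # Valid keys not present in the metadata dictionary
    return [key for key in valid_keys if key not in metadata]


def filter_metadata_keys(metadata, valid_keys):
    # Drop keys that are not in the valid key list
    return {key: value for key, value in metadata.items() if key in valid_keys}


def sort_metadata_keys(metadata, valid_keys):
    # Rebuild the dictionary following the valid_keys order
    return {key: metadata[key] for key in valid_keys if key in metadata}
```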
disdrodb.l0.standards module
- disdrodb.l0.standards.available_sensor_name() sorted[source]
Get available names of sensors.
- Returns
Sorted list of the available sensors
- Return type
list
- disdrodb.l0.standards.get_L0A_encodings_dict(sensor_name: str) dict[source]
Get a dictionary containing the L0A encodings
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
L0A encodings
- Return type
dict
- disdrodb.l0.standards.get_L0B_encodings_dict(sensor_name: str) dict[source]
Get a dictionary containing the encoding to write L0B netCDFs.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Encoding to write L0B netCDFs
- Return type
dict
- disdrodb.l0.standards.get_configs_dir(sensor_name: str) str[source]
Retrieve the config directory of the sensor.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Config directory.
- Return type
str
- Raises
ValueError – Error if the config directory does not exist.
- disdrodb.l0.standards.get_coords_attrs_dict(ds)[source]
Return dictionary with DISDRODB coordinates attributes.
- disdrodb.l0.standards.get_data_format_dict(sensor_name: str) dict[source]
Get a dictionary containing the data format of each sensor variable.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Data format of each sensor variable
- Return type
dict
- disdrodb.l0.standards.get_data_range_dict(sensor_name: str) dict[source]
Get the variable data range.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected data value range for each data field. It excludes variables without specified data_range key.
- Return type
dict
- disdrodb.l0.standards.get_description_dict(sensor_name: str) dict[source]
Get a dictionary containing the description of each sensor variable.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Description of each sensor variable.
- Return type
dict
- disdrodb.l0.standards.get_diameter_bin_center(sensor_name: str) list[source]
Get diameter bin center.
- Parameters
sensor_name (str) – Name of the sensor
- Returns
Diameter bin center
- Return type
list
- disdrodb.l0.standards.get_diameter_bin_lower(sensor_name: str) list[source]
Get diameter bin lower bound.
- Parameters
sensor_name (str) – Name of the sensor
- Returns
Diameter bin lower bound
- Return type
list
- disdrodb.l0.standards.get_diameter_bin_upper(sensor_name: str) list[source]
Get diameter bin upper bound.
- Parameters
sensor_name (str) – Name of the sensor
- Returns
Diameter bin upper bound
- Return type
list
- disdrodb.l0.standards.get_diameter_bin_width(sensor_name: str) list[source]
Get diameter bin width.
- Parameters
sensor_name (str) – Name of the sensor
- Returns
Diameter bin width
- Return type
list
- disdrodb.l0.standards.get_diameter_bins_dict(sensor_name: str) dict[source]
Get dictionary with sensor_name diameter bins information.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
sensor_name diameter bins information
- Return type
dict
- disdrodb.l0.standards.get_dims_size_dict(sensor_name: str) dict[source]
Get the number of bins for each dimension.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the number of bins for each dimension.
- Return type
dict
- disdrodb.l0.standards.get_field_nchar_dict(sensor_name: str) dict[source]
Get the total number of characters from the instrument default string standards.
Important note: it also counts the comma and the minus sign.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected number of characters for each data field.
- Return type
dict
- disdrodb.l0.standards.get_field_ndigits_decimals_dict(sensor_name: dict) dict[source]
Get number of digits on the right side of the comma from the instrument default string standards.
Example: 123,45 -> 45 -> 2 decimal digits
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected number of decimal digits for each data field.
- Return type
dict
- disdrodb.l0.standards.get_field_ndigits_dict(sensor_name: str) dict[source]
Get number of digits from the instrument default string standards.
Important note: it excludes the comma but counts the minus sign.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected number of digits for each data field.
- Return type
dict
- disdrodb.l0.standards.get_field_ndigits_natural_dict(sensor_name: str) dict[source]
Get number of digits on the left side of the comma from the instrument default string standards.
Example: 123,45 -> 123 -> 3 natural digits
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected number of natural digits for each data field.
- Return type
dict
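The counting conventions described above (comma and minus sign included in the character count; comma excluded but minus sign included in the digit count) can be expressed in a few lines. This hypothetical helper assumes a comma as decimal separator, as in the docstring examples, and counts a leading minus sign among the natural digits, which the documentation does not specify. Its keys mirror the RawDataFormatSchema fields.

```python
def count_string_format(string, decimal_separator=","):
    """Hypothetical helper mirroring the character and digit counting conventions."""
    natural, _, decimals = string.partition(decimal_separator)
    return {
        # Counts the separator and the minus sign
        "n_characters": len(string),
        # Excludes the separator, keeps the minus sign
        "n_digits": len(string.replace(decimal_separator, "")),
        # Left of the separator (a minus sign is counted here, an assumption)
        "n_naturals": len(natural),
        # Right of the separator
        "n_decimals": len(decimals),
    }
```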
- disdrodb.l0.standards.get_l0a_dtype(sensor_name: str) dict[source]
Get a dictionary containing the L0A dtype.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
L0A dtype
- Return type
dict
- disdrodb.l0.standards.get_long_name_dict(sensor_name: str) dict[source]
Get a dictionary containing the long name of each sensor variable.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Long name of each sensor variable.
- Return type
dict
- disdrodb.l0.standards.get_nan_flags_dict(sensor_name: str) dict[source]
Get the variable nan_flags.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected nan_flags list for each data field. It excludes variables without specified nan_flags key.
- Return type
dict
- disdrodb.l0.standards.get_raw_array_dims_order(sensor_name: str) dict[source]
Get the dimension order of the raw fields.
The order of dimension specified for raw_drop_number controls the reshaping of the precipitation raw spectrum.
Examples
- OTT Parsivel spectrum [v1d1 … v1d32, v2d1, …, v2d32] -> dimension_order = [“velocity_bin_center”, “diameter_bin_center”]
- Thies LPM spectrum [v1d1 … v20d1, v1d2, …, v20d2] -> dimension_order = [“diameter_bin_center”, “velocity_bin_center”]
- Parameters
sensor_name (str) – Name of the sensor
- Returns
Dimension order dictionary
- Return type
dict
- disdrodb.l0.standards.get_raw_array_nvalues(sensor_name: str) dict[source]
Get a dictionary with the number of values expected for each raw array.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected number of values for each raw array.
- Return type
dict
- disdrodb.l0.standards.get_sensor_variables(sensor_name: str) list[source]
Get the list of sensor variable names.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
List of the sensor variable names.
- Return type
list
- disdrodb.l0.standards.get_time_encoding() dict[source]
Create time encoding
- Returns
Time encoding
- Return type
dict
- disdrodb.l0.standards.get_units_dict(sensor_name: str) dict[source]
Get a dictionary containing the unit of each sensor variable.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Unit of each sensor variable
- Return type
dict
- disdrodb.l0.standards.get_valid_coordinates_names(sensor_name)[source]
Get list of valid coordinates.
- disdrodb.l0.standards.get_valid_dimension_names(sensor_name)[source]
Get list of valid dimension names.
- disdrodb.l0.standards.get_valid_values_dict(sensor_name: str) dict[source]
Get the list of valid values for a variable.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Dictionary with the expected values for specific variables. It excludes variables without specified valid_values key.
- Return type
dict
- disdrodb.l0.standards.get_variables_dict(sensor_name: str) dict[source]
Get a dictionary mapping the sensor field numbers to variable names.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
Variables names
- Return type
dict
- disdrodb.l0.standards.get_variables_dimension(sensor_name: str)[source]
Return a dictionary with the variable dimensions of an L0B product.
- disdrodb.l0.standards.get_velocity_bin_center(sensor_name: str) list[source]
Get velocity bin center.
- Parameters
sensor_name (str) – Name of the sensor
- Returns
Velocity bin center
- Return type
list
- disdrodb.l0.standards.get_velocity_bin_lower(sensor_name: str) list[source]
Get velocity bin lower bound.
- Parameters
sensor_name (str) – Name of the sensor
- Returns
Velocity bin lower bound.
- Return type
list
- disdrodb.l0.standards.get_velocity_bin_upper(sensor_name: str) list[source]
Get velocity bin upper bound.
- Parameters
sensor_name (str) – Name of the sensor
- Returns
Velocity bin upper bound
- Return type
list
- disdrodb.l0.standards.get_velocity_bin_width(sensor_name: str) list[source]
Get velocity bin width.
- Parameters
sensor_name (str) – Name of the sensor
- Returns
Velocity bin width
- Return type
list
- disdrodb.l0.standards.get_velocity_bins_dict(sensor_name: str) dict[source]
Get dictionary with sensor_name velocity bins information.
- Parameters
sensor_name (str) – Name of the sensor.
- Returns
sensor_name velocity bins information
- Return type
dict
- disdrodb.l0.standards.read_config_yml(sensor_name: str, filename: str) dict[source]
Read a config yaml file and return the dictionary.
- Parameters
sensor_name (str) – Name of the sensor.
filename (str) – Name of the file.
- Returns
Content of the config file.
- Return type
dict
- Raises
ValueError – Error if file does not exist.
- disdrodb.l0.standards.set_disdrodb_attrs(ds, product_level: str)[source]
Add DISDRODB processing information to the netCDF global attributes.
It assumes the station metadata have already been added to the dataset.
- Parameters
ds (xarray dataset) – Dataset
product_level (str) – DISDRODB product_level
- Returns
Dataset
- Return type
xarray dataset
disdrodb.l0.summary module
disdrodb.l0.template_tools module
- disdrodb.l0.template_tools.arr_has_constant_nchar(arr: array) bool[source]
Check if the content of an array has a constant number of characters
- Parameters
arr (numpy.ndarray) – The array to analyse
- Returns
True if the number of characters is constant.
- Return type
bool
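A minimal sketch of the constant-width check, assuming each element is compared through its str() representation:

```python
import numpy as np


def arr_has_constant_nchar(arr):
    # Collect the string length of every element; one distinct length means constant width
    lengths = {len(str(value)) for value in np.asarray(arr).ravel()}
    return len(lengths) == 1
```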
- disdrodb.l0.template_tools.check_column_names(column_names: list, sensor_name: str) None[source]
Check that the column names respect DISDRODB standards.
- Parameters
column_names (list) – List of columns names.
sensor_name (str) – Name of the sensor.
- Raises
TypeError – Error if some columns do not meet the DISDRODB standards.
- disdrodb.l0.template_tools.get_decimal_ndigits(string: str) int[source]
Get the number of decimal digits (right of the decimal separator).
- Parameters
string (str) – Input string
- Returns
The number of decimal digits.
- Return type
int
- disdrodb.l0.template_tools.get_df_columns_unique_values_dict(df: DataFrame, column_indices: Optional[Union[int, slice, list]] = None, column_names: bool = True)[source]
Create a dictionary {column: unique values}.
- Parameters
df (pd.DataFrame) – Input dataframe.
column_indices (Union[int,slice,list], optional) – Column indices.
column_names (bool, optional) – If True, print the column name, by default True.
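A possible pandas sketch of the {column: unique values} mapping, accepting column_indices as an int, slice or list as in the signature above. The name unique_values_dict is hypothetical, and the sketch always keys the dictionary by column name (it does not reproduce the column_names option).

```python
import pandas as pd


def unique_values_dict(df: pd.DataFrame, column_indices=None) -> dict:
    """Map each selected column name to the list of its unique values."""
    if column_indices is None:
        column_indices = slice(None)  # select all columns
    elif isinstance(column_indices, int):
        column_indices = [column_indices]  # keep a DataFrame, not a Series
    subset = df.iloc[:, column_indices]
    # drop_duplicates() keeps the order of first appearance
    return {col: subset[col].drop_duplicates().tolist() for col in subset.columns}


df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "x"]})
print(unique_values_dict(df))  # {'a': [1, 2], 'b': ['x', 'y']}
print(unique_values_dict(df, 1))  # {'b': ['x', 'y']}
```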
- disdrodb.l0.template_tools.get_natural_ndigits(string: str) int[source]
Get the number of natural digits (i.e. the digits of the integer part).
- Parameters
string (str) – Input string.
- Returns
The number of natural digits.
- Return type
int
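The natural and decimal digit counts (the n_naturals and n_decimals fields of RawDataFormatSchema) can be sketched by splitting the string on its first decimal point. This assumes "." is the decimal separator and that signs are ignored; both function names are hypothetical.

```python
def count_natural_digits(string: str) -> int:
    """Count the digits before the decimal point."""
    integer_part = string.split(".", 1)[0]
    return sum(ch.isdigit() for ch in integer_part)


def count_decimal_digits(string: str) -> int:
    """Count the digits after the decimal point (0 if there is none)."""
    parts = string.split(".", 1)
    return sum(ch.isdigit() for ch in parts[1]) if len(parts) == 2 else 0


print(count_natural_digits("-12.345"), count_decimal_digits("-12.345"))  # 2 3
print(count_natural_digits("007"), count_decimal_digits("007"))  # 3 0
```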
- disdrodb.l0.template_tools.get_nchar(string: str) int[source]
Get the number of characters.
- Parameters
string (str) – Input string.
- Returns
Number of characters.
- Return type
int
- disdrodb.l0.template_tools.get_ndigits(string: str) int[source]
Get the number of digits.
- Parameters
string (str) – Input string.
- Returns
Number of digits.
- Return type
int
- disdrodb.l0.template_tools.get_possible_keys(dict_options: dict, desired_value: str) set[source]
Get the keys of dict_options whose value matches desired_value.
- Parameters
dict_options (dict) – Input dictionary.
desired_value (str) – Input value.
- Returns
Keys whose value matches the desired input value.
- Return type
set
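A minimal sketch of the key lookup, assuming values are compared to desired_value by plain equality. The name possible_keys and the options mapping are hypothetical illustrations.

```python
def possible_keys(dict_options: dict, desired_value: str) -> set:
    """Return the keys of `dict_options` whose value equals `desired_value`."""
    return {key for key, value in dict_options.items() if value == desired_value}


# Hypothetical mapping from column names to inferred value types
options = {"col_a": "int", "col_b": "float", "col_c": "int"}
print(sorted(possible_keys(options, "int")))  # ['col_a', 'col_c']
```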
- disdrodb.l0.template_tools.infer_column_names(df: DataFrame, sensor_name: str, row_idx: int = 1)[source]
Try to guess the dataframe column names based on the characteristics of the strings.
- Parameters
df (pd.DataFrame) – The dataframe to analyse.
sensor_name (str) – Name of the sensor.
row_idx (int, optional) – Index of the dataframe row to analyse, by default 1.
- Returns
Dictionary with the keys being the column id and the values being the guessed column names
- Return type
dict
- disdrodb.l0.template_tools.print_df_column_names(df: DataFrame) None[source]
Print the dataframe column names.
- Parameters
df (pd.DataFrame) – The dataframe.
- Returns
Nothing
- Return type
None
- disdrodb.l0.template_tools.print_df_columns_unique_values(df: DataFrame, column_indices: Optional[Union[int, slice, list]] = None, column_names: bool = True) None[source]
Print the unique values of the columns.
- Parameters
df (pd.DataFrame) – Input dataframe.
column_indices (Union[int,slice,list], optional) – Column indices.
column_names (bool, optional) – If True, print the column name, by default True.
- disdrodb.l0.template_tools.print_df_first_n_rows(df: DataFrame, n: int = 5, column_names: bool = True) None[source]
Print the first n rows of the dataframe, column by column.
- Parameters
df (pd.DataFrame) – Input dataframe.
n (int, optional) – Number of rows, by default 5.
column_names (bool, optional) – If True, the column names are printed, by default True.
- disdrodb.l0.template_tools.print_df_random_n_rows(df: DataFrame, n: int = 5, with_column_names: bool = True) None[source]
Print n randomly chosen rows of the dataframe, column by column.
- Parameters
df (pd.DataFrame) – The dataframe.
n (int, optional) – The number of rows to print, by default 5.
with_column_names (bool, optional) – If True, print the column names, by default True.
- Returns
Nothing
- Return type
None
- disdrodb.l0.template_tools.print_df_summary_stats(df: DataFrame, column_indices: Optional[Union[int, slice, list]] = None, column_names: bool = True)[source]
Print summary statistics of the columns.
- Parameters
df (pd.DataFrame) – Input dataframe.
column_indices (Union[int,slice,list], optional) – Column indices.
column_names (bool, optional) – If True, print the column name, by default True.
- Raises
ValueError – Error if the column types are not numeric.
- disdrodb.l0.template_tools.print_df_with_any_nan_rows(df: DataFrame) None[source]
Print the rows containing at least one NaN value.
- Parameters
df (pd.DataFrame) – Input dataframe.
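Selecting the rows with at least one missing value is a boolean mask built with DataFrame.isna and any(axis=1). The sketch below returns the rows instead of printing them; rows_with_any_nan is a hypothetical name, not the disdrodb function.

```python
import numpy as np
import pandas as pd


def rows_with_any_nan(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `df` that contain at least one missing value."""
    return df[df.isna().any(axis=1)]


df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})
print(rows_with_any_nan(df))  # rows with index 1 and 2
```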
- disdrodb.l0.template_tools.print_valid_L0_column_names(sensor_name: str) None[source]
Print the valid L0 column names defined by the DISDRODB standards for the sensor.
- Parameters
sensor_name (str) – Name of the sensor.
- disdrodb.l0.template_tools.search_possible_columns(string: str, sensor_name: str) list[source]
Search the possible columns matching the input string.
- Parameters
string (str) – Input string.
sensor_name (str) – Name of the sensor.
- Returns
List of possible columns.
- Return type
list
- disdrodb.l0.template_tools.str_has_decimal_digits(string: str) bool[source]
Check if a string has decimal digits.
- Parameters
string (str) – Input string.
- Returns
True if the string has decimal digits.
- Return type
bool
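A minimal sketch of the decimal check, assuming "." is the decimal separator and that at least one digit must follow it (so "1." counts as having no decimals). The name has_decimal_digits is hypothetical, and the real function may treat such edge cases differently.

```python
def has_decimal_digits(string: str) -> bool:
    """Return True if at least one digit follows the decimal point."""
    parts = string.split(".", 1)
    return len(parts) == 2 and any(ch.isdigit() for ch in parts[1])


print(has_decimal_digits("1.25"), has_decimal_digits("125"), has_decimal_digits("1."))  # True False False
```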
- disdrodb.l0.template_tools.str_is_integer(string: str) bool[source]
Check if a string is an integer.
- Parameters
string (str) – Input string.
- Returns
True if the string is an integer.
- Return type
bool
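One common way to test whether a string is an integer is to attempt int() and catch ValueError. This is a hypothetical sketch (is_integer_string is not a disdrodb name). Note that int() also accepts surrounding whitespace and underscores such as "1_000", which a stricter check might reject.

```python
def is_integer_string(string: str) -> bool:
    """Return True if `string` can be parsed as a Python int."""
    try:
        int(string)
    except ValueError:
        return False
    return True


print(is_integer_string("-7"), is_integer_string("4.2"), is_integer_string("abc"))  # True False False
```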
disdrodb.l0.utils_nc module
Module contents
- disdrodb.l0.available_readers(data_sources=None, reader_path=False)[source]
Retrieve information about the available readers.
- disdrodb.l0.check_archive_metadata_geolocation(disdrodb_dir)[source]
Check whether the metadata files have missing or wrong geolocation.
- Parameters
disdrodb_dir (str) – Path to the disdrodb directory.
- Returns
If the check succeeds, the result is True, and if it fails, the result is False.
- Return type
bool
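A sketch of an archive-wide geolocation check, assuming each station metadata is a dict with "latitude" and "longitude" keys and that a valid geolocation means numeric values within the WGS84 ranges. The function name, the key names and the -9999 placeholder in the sample data are assumptions for illustration; the actual disdrodb criteria may differ.

```python
def has_valid_geolocation(metadata: dict) -> bool:
    """Return True if latitude and longitude are numbers within valid ranges."""
    lat = metadata.get("latitude")
    lon = metadata.get("longitude")
    # Missing (None) or non-numeric values are treated as invalid
    if not isinstance(lat, (int, float)) or not isinstance(lon, (int, float)):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


# Hypothetical metadata content of three stations
stations = [
    {"station_name": "S1", "latitude": 46.52, "longitude": 6.57},
    {"station_name": "S2", "latitude": -9999, "longitude": -9999},
    {"station_name": "S3", "longitude": 6.57},
]
print(all(has_valid_geolocation(md) for md in stations))  # False
print([md["station_name"] for md in stations if not has_valid_geolocation(md)])  # ['S2', 'S3']
```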
- disdrodb.l0.run_disdrodb_l0(disdrodb_dir, data_sources=None, campaign_names=None, station_names=None, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = False, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
Run the L0 processing of DISDRODB stations.
This function enables launching the processing of many DISDRODB stations with a single command. From the list of all available DISDRODB stations, it processes the stations matching the provided data_sources, campaign_names and station_names.
- Parameters
disdrodb_dir (str) – Base directory of DISDRODB. Format: <…>/DISDRODB
data_sources (list) – Name of the data source(s) to process. The name(s) must be UPPER CASE. If campaign_names and station_names are not specified, all stations of the specified data source(s) are processed. The default is None.
campaign_names (list) – Name of the campaign(s) to process. The name(s) must be UPPER CASE. The default is None
station_names (list) – Station names to process. The default is None
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is False.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files. For L0B, it processes just the first 100 rows of 3 L0A files. The default is False.
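The station selection described above can be sketched as a filter in which each argument left to None imposes no constraint. This is an illustration under the assumption that stations are identified by (data_source, campaign_name, station_name) tuples; select_stations and the archive content are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (data_source, campaign_name, station_name)
Station = Tuple[str, str, str]


def select_stations(
    stations: Sequence[Station],
    data_sources: Optional[List[str]] = None,
    campaign_names: Optional[List[str]] = None,
    station_names: Optional[List[str]] = None,
) -> List[Station]:
    """Keep the stations matching all the provided filters (None = no filter)."""

    def matches(value: str, allowed: Optional[List[str]]) -> bool:
        return allowed is None or value in allowed

    return [
        (source, campaign, station)
        for source, campaign, station in stations
        if matches(source, data_sources)
        and matches(campaign, campaign_names)
        and matches(station, station_names)
    ]


# Hypothetical archive content
archive = [
    ("SOURCE_1", "CAMPAIGN_A", "STATION_1"),
    ("SOURCE_1", "CAMPAIGN_B", "STATION_2"),
    ("SOURCE_2", "CAMPAIGN_C", "STATION_3"),
]
print(select_stations(archive, data_sources=["SOURCE_1"]))
print(select_stations(archive, campaign_names=["CAMPAIGN_C"]))
```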
- disdrodb.l0.run_disdrodb_l0_station(disdrodb_dir, data_source, campaign_name, station_name, l0a_processing: bool = True, l0b_processing: bool = True, l0b_concat: bool = True, remove_l0a: bool = False, remove_l0b: bool = False, force: bool = False, verbose: bool = False, debugging_mode: bool = False, parallel: bool = True)[source]
Run the L0 processing of a specific DISDRODB station from the terminal.
- Parameters
disdrodb_dir (str) – Base directory of DISDRODB. Format: <…>/DISDRODB
data_source (str) – Institution name (when campaign data spans more than 1 country), or country (when all campaigns (or sensor networks) are inside a given country). Must be UPPER CASE.
campaign_name (str) – Campaign name. Must be UPPER CASE.
station_name (str) – Station name
l0a_processing (bool) – Whether to launch processing to generate L0A Apache Parquet file(s) from raw data. The default is True.
l0b_processing (bool) – Whether to launch processing to generate L0B netCDF4 file(s) from L0A data. The default is True.
l0b_concat (bool) – Whether to concatenate all raw files into a single L0B netCDF file. If l0b_concat=True, all raw files will be saved into a single L0B netCDF file. If l0b_concat=False, each raw file will be converted into the corresponding L0B netCDF file. The default is True.
remove_l0a (bool) – Whether to remove the L0A files after having generated the L0B netCDF products. The default is False.
remove_l0b (bool) – Whether to remove the L0B files after having concatenated all L0B netCDF files. It takes place only if l0b_concat=True. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. Each process uses a single thread to avoid issues with the HDF/netCDF library. By default, the number of processes is defined with os.cpu_count(). If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. For L0A, it processes just the first 3 raw data files for each station. For L0B, it processes just the first 100 rows of 3 L0A files for each station. The default is False.
- disdrodb.l0.run_l0a(raw_dir, processed_dir, station_name, glob_patterns, column_names, reader_kwargs, df_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]
Run the L0A processing for a specific DISDRODB station.
- Parameters
raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:
<…>/DISDRODB/Raw/<data_source>/<campaign_name>
Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between:
  - the raw_dir and processed_dir directory paths;
  - the key ‘campaign_name’ within the metadata YAML files.
- The campaign_name is expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0A and L0B products. The path should have the following structure:
<…>/DISDRODB/Processed/<data_source>/<campaign_name>
For testing purposes, this function also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>
column_names (list) – Column names of the raw text file.
reader_kwargs (dict) – pandas.read_csv arguments used to open the text file.
df_sanitizer_fun (object, optional) – Sanitizer function to format the dataframe into the DISDRODB L0A standard.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 100 rows of 3 raw data files. The default is False.
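The raw_dir layout requirement (a metadata YAML file for every station folder under data/) can be checked with pathlib, as in the sketch below. check_raw_dir_structure is a hypothetical helper, not part of disdrodb, and it does not validate the campaign_name consistency rules; the demo builds a throwaway directory tree.

```python
import tempfile
from pathlib import Path
from typing import List


def check_raw_dir_structure(raw_dir: str) -> List[str]:
    """Return the station names of raw_dir, checking each has a metadata YAML file."""
    raw_path = Path(raw_dir)
    station_names = sorted(p.name for p in (raw_path / "data").iterdir() if p.is_dir())
    missing = [
        name for name in station_names
        if not (raw_path / "metadata" / f"{name}.yaml").is_file()
    ]
    if missing:
        raise ValueError(f"Missing metadata YAML file for stations: {missing}")
    return station_names


# Build a throwaway <campaign_name> directory to exercise the check
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "CAMPAIGN_NAME"
    for station in ["STATION_1", "STATION_2"]:
        (raw_dir / "data" / station).mkdir(parents=True)
        (raw_dir / "metadata").mkdir(exist_ok=True)
        (raw_dir / "metadata" / f"{station}.yaml").write_text("campaign_name: CAMPAIGN_NAME\n")
    found = check_raw_dir_structure(str(raw_dir))
    print(found)  # ['STATION_1', 'STATION_2']
```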
- disdrodb.l0.run_l0b_from_nc(raw_dir, processed_dir, station_name, glob_patterns, dict_names, ds_sanitizer_fun, parallel, verbose, force, debugging_mode)[source]
Run the L0B processing for a specific DISDRODB station with raw netCDFs.
- Parameters
raw_dir (str) –
The directory path where all the raw content of a specific campaign is stored. The path must have the following structure:
<…>/DISDRODB/Raw/<data_source>/<campaign_name>
Inside the raw_dir directory, the following structure is required:
- /data/<station_name>/<raw_files>
- /metadata/<station_name>.yaml
Important points:
- For each <station_name> there must be a corresponding YAML file in the metadata subfolder.
- The <campaign_name> must semantically match between:
  - the raw_dir and processed_dir directory paths;
  - the key ‘campaign_name’ within the metadata YAML files.
- The campaign_name is expected to be UPPER CASE.
processed_dir (str) –
The desired directory path for the processed DISDRODB L0B products. The path should have the following structure:
<…>/DISDRODB/Processed/<data_source>/<campaign_name>
For testing purposes, this function also accepts a directory path simply ending with <campaign_name> (e.g. /tmp/<campaign_name>).
station_name (str) – Station name
glob_patterns (str) – Glob pattern to search data files in <raw_dir>/data/<station_name>. Example: glob_patterns = “*.nc”
dict_names (dict) – Dictionary mapping raw netCDF variables/coordinates/dimension names to DISDRODB standards.
ds_sanitizer_fun (object, optional) – Sanitizer function to format the raw netCDF into the DISDRODB L0B standard.
force (bool) – If True, overwrite existing data in the destination directories. If False, raise an error if data already exist in the destination directories. The default is False.
verbose (bool) – Whether to print detailed processing information to the terminal. The default is False.
parallel (bool) – If True, the files are processed simultaneously in multiple processes. The number of simultaneous processes can be customized using the dask.distributed LocalCluster. If False, the files are processed sequentially in a single process, and multi-threading is automatically exploited to speed up I/O tasks.
debugging_mode (bool) – If True, it reduces the amount of data to process. It processes just the first 3 raw netCDF files. The default is False.
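The file search in <raw_dir>/data/<station_name> with glob_patterns, combined with the debugging_mode limit of 3 files, can be sketched with pathlib. The helper list_station_files is hypothetical, and it assumes that "the first 3 files" means the first 3 in sorted path order.

```python
import tempfile
from pathlib import Path
from typing import List


def list_station_files(raw_dir: str, station_name: str, glob_patterns: str,
                       debugging_mode: bool = False) -> List[str]:
    """List the files of <raw_dir>/data/<station_name> matching glob_patterns."""
    station_dir = Path(raw_dir) / "data" / station_name
    files = sorted(str(p) for p in station_dir.glob(glob_patterns) if p.is_file())
    # In debugging mode, keep only the first 3 files
    return files[:3] if debugging_mode else files


with tempfile.TemporaryDirectory() as raw_dir:
    station_dir = Path(raw_dir) / "data" / "STATION_1"
    station_dir.mkdir(parents=True)
    for i in range(5):
        (station_dir / f"file_{i}.nc").touch()
    (station_dir / "notes.txt").touch()
    all_files = list_station_files(raw_dir, "STATION_1", "*.nc")
    debug_files = list_station_files(raw_dir, "STATION_1", "*.nc", debugging_mode=True)
    print(len(all_files), len(debug_files))  # 5 3
```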