Audio
- korus.audio.collect_audiofile_metadata(path: str, ext: str | list[str] = 'WAV', timestamp_parser: callable | None = None, earliest_start_utc: datetime | date | None = None, latest_start_utc: datetime | date | None = None, subset: str | list[str] | None = None, subset_filename: str | list[str] | None = None, tar_path: str = '', progress_bar: bool = False, by_date: bool = False, inspect_files: bool = True, tmp_path: str = './korus-tmp')
Collect metadata records for all audio files in a specified directory.
In order to extract timestamps embedded in the filenames, you must specify a parser function using the @timestamp_parser argument. This function must take the relative path to the audio file as input (as a string) and return the UTC start time of the file (as a datetime.datetime object).
- Args:
- path: str
Path to the directory or tar archive where the audio files are stored.
- ext: str | list[str]
Audio file extension(s). Default is WAV.
- timestamp_parser: callable
Function that takes a string as input and returns a datetime.datetime object.
- earliest_start_utc: datetime.datetime | datetime.date
Only consider files starting at or after this UTC time.
- latest_start_utc: datetime.datetime | datetime.date
Only consider files starting at or before this UTC time.
- subset: str | list(str)
Paths relative to the top directory given by the path argument. Use this argument to restrict attention to a subset of the files.
- subset_filename: str | list(str)
Same as subset, except that only the filename(s) need to be specified.
- tar_path: str
Path within tar archive. Only relevant if @path points to a tar archive.
- progress_bar: bool
Display progress bar. Default is False.
- by_date: bool
If audio files are organized in date-stamped subfolders with format yyyymmdd, and both the earliest and latest start time have been specified, this argument can be used to restrict the search space to only the relevant subfolders. Default is False.
- inspect_files: bool
Inspect files to obtain the number of samples and the sampling rate. If False, the returned metadata table does not have the columns num_samples, sample_rate, and end_utc. Default is True.
- tmp_path: str
If the audio files are stored in a tar archive, and @inspect_files is True, audio files will be extracted to this folder temporarily to allow the number of samples and sampling rate to be determined.
- Returns:
- df: pandas DataFrame
Metadata table
Examples:
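The parser contract described above (relative path in, UTC start time out) can be illustrated with a minimal sketch. It assumes a hypothetical filename convention such as `station1/20230615/rec_20230615T083000Z.wav`. The name `parse_start_utc` and the regular expression are illustrative and not part of korus. The sketch returns a naive datetime interpreted as UTC; whether korus expects naive or timezone-aware values is not stated in this reference.

```python
import re
from datetime import datetime

# Hypothetical convention: a UTC timestamp such as 20230615T083000Z in the filename.
_TIMESTAMP = re.compile(r"(\d{8}T\d{6})Z")


def parse_start_utc(rel_path: str) -> datetime:
    """Return the UTC start time encoded in the relative path of an audio file."""
    match = _TIMESTAMP.search(rel_path)
    if match is None:
        raise ValueError(f"no timestamp found in {rel_path!r}")
    # Naive datetime, to be interpreted as UTC (an assumption of this sketch).
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")


start = parse_start_utc("station1/20230615/rec_20230615T083000Z.wav")
print(start.isoformat())  # 2023-06-15T08:30:00
```

A function of this shape could then be passed as `timestamp_parser=parse_start_utc` when calling `collect_audiofile_metadata`.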
- korus.audio.extract_num_samples_and_samplerate(path: str | list[str], base_path: str = '', tmp_path: str = './korus-tmp', progress_bar: bool = False)
Obtain the number of samples and the sampling rate of a set of audio files.
- TODO: implement error handling; return args should include which files were successfully read and which could not be read
- Args:
- path: str | list[str]
Relative paths including filename to the audio files
- base_path: str
Top directory
- tmp_path: str
If the audio files are stored in a tar archive, they will be extracted to this folder temporarily to allow the number of samples and sampling rate to be determined.
- progress_bar: bool
Display progress bar. Default is False.
- Returns:
- num_samples: list
Number of samples per file
- sample_rate: list
Samplerate in samples/s.
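Because the function returns sample counts rather than durations, the duration of each file follows from dividing the number of samples by the sampling rate. A minimal sketch, using made-up values in place of the actual return values; the helper name `durations_in_seconds` is hypothetical and not part of korus.

```python
def durations_in_seconds(num_samples, sample_rate):
    """Compute per-file durations (s) from sample counts and sampling rates."""
    return [n / sr for n, sr in zip(num_samples, sample_rate)]


# Made-up values standing in for the two returned lists.
num_samples = [480000, 1323000]
sample_rate = [48000, 44100]

print(durations_in_seconds(num_samples, sample_rate))  # [10.0, 30.0]
```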
- korus.audio.find_files(path, substr=None, subdirs=False, tar_path='', progress_bar=False)
Search a directory or tar archive for files with a specified sequence of characters in their path.
- Args:
- path: str
Path to directory or tar archive file
- substr: str | list(str)
Search for files that have this string/these strings in their path.
- subdirs: bool
If True, also search all subdirectories.
- tar_path: str
Path within tar archive. Only relevant if path points to a tar archive.
- progress_bar: bool
Display progress bar. Default is False.
- Returns:
- files: list (str)
Alphabetically sorted list of relative file paths
Examples:
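The documented behaviour for the plain-directory case can be approximated with the standard library. The following is only a sketch of that behaviour, not the korus implementation: `find_files_sketch` is a hypothetical name, tar archives are not handled, a single substring is accepted, and matching against the relative path (rather than the full path) is an assumption.

```python
import os
import tempfile


def find_files_sketch(path, substr=None, subdirs=False):
    """Illustrative stand-in for the directory case only (no tar support)."""
    matches = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), path)
            # Assumption: the substring is matched against the relative path.
            if substr is None or substr in rel:
                matches.append(rel)
        if not subdirs:
            break  # os.walk yields the top directory first; stop there
    return sorted(matches)


# Build a small throwaway directory tree to search.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "20230615"))
    for rel in ["a.wav", "notes.txt", os.path.join("20230615", "b.wav")]:
        open(os.path.join(top, rel), "w").close()

    print(find_files_sketch(top, substr=".wav"))  # ['a.wav']
    print(find_files_sketch(top, substr=".wav", subdirs=True))
```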
- korus.audio.group_by_date(filenames: list[str], timestamp_parser: callable)
Helper function for grouping audio files by their start date.
- Args:
- filenames: list[str]
Filenames or paths
- timestamp_parser: callable
Function that takes a string as input and returns a datetime.datetime object.
- Returns:
- grouped: dict[datetime.date, list[str]]
Dictionary mapping dates (in the form %Y%m%d) to lists of filenames. Note: if timestamp parsing fails for ANY of the files, ALL files are grouped together under a null key.
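The all-or-nothing fallback described above can be illustrated with a short sketch. This is not the korus implementation: `group_by_date_sketch` and `parser` are hypothetical names, the keys are taken to be `datetime.date` objects as in the type annotation, the null key is taken to be `None`, and a parsing failure is assumed to surface as an exception.

```python
from collections import defaultdict
from datetime import datetime


def group_by_date_sketch(filenames, timestamp_parser):
    """Group filenames by start date; fall back to a single None key on any failure."""
    grouped = defaultdict(list)
    try:
        for name in filenames:
            grouped[timestamp_parser(name).date()].append(name)
    except Exception:
        # A single parsing failure places every file under the null key.
        return {None: list(filenames)}
    return dict(grouped)


def parser(name):
    # Hypothetical convention: filenames start with yyyymmdd.
    return datetime.strptime(name[:8], "%Y%m%d")


print(group_by_date_sketch(["20230615_a.wav", "20230615_b.wav", "20230616_a.wav"], parser))
print(group_by_date_sketch(["20230615_a.wav", "bad.wav"], parser))
# {None: ['20230615_a.wav', 'bad.wav']}
```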
- korus.audio.parse_timestamp(x, timestamp_parser, progress_bar=False)
Parses timestamps from a list of strings using a user-specified function.
- Args:
- x: list(str)
Strings to be parsed
- timestamp_parser: function
Function that takes a single str as input and returns a datetime object
- progress_bar: bool
Display progress bar. Default is False.
- Returns:
- indices: list(int)
Indices of the strings that were successfully parsed
- timestamps: list(datetime)
Parsed datetime values
Examples:
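The relationship between the two return values (indices of the successfully parsed strings, and the corresponding datetimes) can be sketched as follows. This is an illustration, not the korus implementation: `parse_timestamp_sketch` and `parser` are hypothetical names, and the sketch assumes that a failed parse is signalled by the parser raising an exception.

```python
from datetime import datetime


def parse_timestamp_sketch(x, timestamp_parser):
    """Return (indices, timestamps) for the strings that could be parsed."""
    indices, timestamps = [], []
    for i, s in enumerate(x):
        try:
            t = timestamp_parser(s)
        except Exception:
            continue  # assumption: failures raise and are skipped
        indices.append(i)
        timestamps.append(t)
    return indices, timestamps


def parser(s):
    # Hypothetical convention: strings start with yyyymmddTHHMMSS.
    return datetime.strptime(s[:15], "%Y%m%dT%H%M%S")


names = ["20230615T083000.wav", "readme.txt", "20230616T120000.wav"]
indices, timestamps = parse_timestamp_sketch(names, parser)
print(indices)  # [0, 2]
```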