Utilities
- korus.util.collect_audiofile_metadata(path, ext='WAV', timestamp_parser=None, earliest_start_utc=None, latest_start_utc=None, subset=None, tar_path='', progress_bar=False, date_subfolder=False, inspect_files=True, tmp_path='./korus-tmp')
Collect metadata records for all audio files in a specified directory.
To extract timestamps embedded in the filenames, you must specify a parser function using the @timestamp_parser argument. This function must take the relative path to the audio file as input (as a string) and return the UTC start time of the file (as a datetime.datetime object).
- Args:
- path: str
Path to the directory or tar archive where the audio files are stored.
- ext: str
Audio file extension. Default is WAV.
- timestamp_parser: callable
Function that takes a string as input and returns a datetime.datetime object.
- earliest_start_utc: datetime.datetime
Only consider files starting at or after this UTC time.
- latest_start_utc: datetime.datetime
Only consider files starting at or before this UTC time.
- subset: str or list(str)
File paths relative to the top directory given by the @path argument. Use this argument to restrict attention to a subset of the files.
- tar_path: str
Path within tar archive. Only relevant if @path points to a tar archive.
- progress_bar: bool
Display progress bar. Default is False.
- date_subfolder: bool
If audio files are organized in date-stamped subfolders with format yyyymmdd, and both the earliest and latest start time have been specified, this argument can be used to restrict the search space to only the relevant subfolders. Default is False.
- inspect_files: bool
Inspect files to obtain the number of samples and the sampling rate. If False, the returned metadata table does not have the columns num_samples, sample_rate, and end_utc. Default is True.
- tmp_path: str
If the audio files are stored in a tar archive and @inspect_files is True, the audio files are temporarily extracted to this folder so that the number of samples and the sampling rate can be determined.
- Returns:
- df: pandas DataFrame
Metadata table
Examples:
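The following is a minimal sketch of a parser function of the kind the @timestamp_parser argument expects. The filename pattern (a yyyymmddTHHMMSS stamp somewhere in the path), the function name parse_start_time, and the example path are all hypothetical; returning a naive datetime that is understood to be in UTC is also an assumption.

```python
import re
from datetime import datetime

# Hypothetical filename convention: a yyyymmddTHHMMSS stamp somewhere in the path
_STAMP = re.compile(r"(\d{8}T\d{6})")

def parse_start_time(rel_path):
    """Return the start time encoded in the relative file path as a datetime."""
    match = _STAMP.search(rel_path)
    if match is None:
        raise ValueError(f"no timestamp found in {rel_path!r}")
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")

# Example with a made-up relative path
start = parse_start_time("20210315/station1_20210315T120000.wav")

# The parser could then be passed on, e.g. (directory path is a placeholder):
# df = korus.util.collect_audiofile_metadata("/path/to/audio", timestamp_parser=parse_start_time)
```

If the files use a different naming scheme, only the regular expression and the strptime format string need to change.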
- korus.util.find_files(path, substr=None, subdirs=False, tar_path='', progress_bar=False)
Search a directory or tar archive for files with a specified sequence of characters in their path.
- Args:
- path: str
Path to directory or tar archive file
- substr: str or list(str)
Search for files that have this string/these strings in their path.
- subdirs: bool
If True, also search all subdirectories.
- tar_path: str
Path within tar archive. Only relevant if @path points to a tar archive.
- progress_bar: bool
Display progress bar. Default is False.
- Returns:
- files: list (str)
Alphabetically sorted list of relative file paths
Examples:
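To illustrate the behaviour described above for a plain directory, here is a rough pure-Python sketch. It is not the library's implementation: find_files_sketch is a hypothetical name, tar archives are not handled, and treating a list of substrings as "match any of them" is an assumption.

```python
import os

def find_files_sketch(path, substr=None, subdirs=False):
    """Return sorted relative paths of files under `path` whose path contains `substr`."""
    if isinstance(substr, str):
        substr = [substr]
    matches = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), path)
            # Assumption: a file matches if ANY of the substrings occurs in its path
            if substr is None or any(s in rel for s in substr):
                matches.append(rel)
        if not subdirs:
            break  # os.walk yields the top directory first; stop there
    return sorted(matches)
```

With subdirs=False only the top-level directory is searched; with subdirs=True the returned paths include the subdirectory prefix.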
- korus.util.get_num_samples_and_rate(path)
Determine the number of samples and sampling rate of a given audio file.
- Args:
- path: str
Full path to the audio file
- Returns:
- : int, int
Number of samples and sampling rate in Hz
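For uncompressed WAV files, the same two quantities can be read with Python's standard wave module. The sketch below is only an illustration of the idea, not the backend korus uses (which is not stated here); num_samples_and_rate_wav is a hypothetical name, and the demo file is generated on the fly.

```python
import os
import tempfile
import wave

def num_samples_and_rate_wav(path):
    """Return (number of samples per channel, sampling rate in Hz) of a PCM WAV file."""
    with wave.open(path, "rb") as f:
        return f.getnframes(), f.getframerate()

# Write a short silent mono 16-bit file to demonstrate
tmp_dir = tempfile.mkdtemp()
wav_path = os.path.join(tmp_dir, "demo.wav")
with wave.open(wav_path, "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)       # 2 bytes per sample
    f.setframerate(8000)    # 8 kHz
    f.writeframes(b"\x00\x00" * 16000)

num_samples, rate = num_samples_and_rate_wav(wav_path)
duration_s = num_samples / rate  # file duration in seconds
```

Dividing the number of samples by the sampling rate gives the file duration, which is what allows an end time to be derived from a known start time.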
- korus.util.list_to_str(l)
Transform a list to a string, suitably formatted for forming SQLite queries.
Example query: SELECT * FROM y WHERE z IN {list_to_str(x)}
- Args:
- l: list or numpy array
List of values
- Returns:
- : str
String
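The exact output format is not specified above. As a sketch, assuming the string is a parenthesized, comma-separated list that can follow IN directly (as the example query suggests), a stand-in could look like this. list_to_str_sketch is a hypothetical function, and the table y and column z are taken from the example query.

```python
import sqlite3

def list_to_str_sketch(values):
    """Hypothetical stand-in: format values as '(v1,v2,...)' for an SQL IN clause."""
    def fmt(v):
        if isinstance(v, str):
            # Quote strings and escape embedded single quotes
            return "'" + v.replace("'", "''") + "'"
        return str(v)
    return "(" + ",".join(fmt(v) for v in values) + ")"

# Try the formatted string in a query against an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE y (z INTEGER)")
conn.executemany("INSERT INTO y VALUES (?)", [(1,), (2,), (3,), (4,)])

x = [2, 4]
query = f"SELECT * FROM y WHERE z IN {list_to_str_sketch(x)} ORDER BY z"
rows = conn.execute(query).fetchall()
conn.close()
```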
- korus.util.parse_timestamp(x, timestamp_parser, progress_bar=False)
Parse timestamps from a list of strings using a user-specified function.
- Args:
- x: list(str)
Strings to be parsed
- timestamp_parser: function
Function that takes a single str as input and returns a datetime object
- progress_bar: bool
Display progress bar. Default is False.
- Returns:
- indices: list(int)
Indices of the strings that were successfully parsed
- timestamps: list(datetime)
Parsed datetime values
Examples:
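The sketch below mimics the described behaviour in plain Python: the parser is applied to each string, and only the indices and values of the strings that parse are kept. It is not the library's code; parse_timestamp_sketch and simple_parser are hypothetical names, and treating a raised exception as "not successfully parsed" is an assumption.

```python
from datetime import datetime

def parse_timestamp_sketch(x, timestamp_parser):
    """Apply `timestamp_parser` to each string; keep indices and values that parse."""
    indices, timestamps = [], []
    for i, s in enumerate(x):
        try:
            t = timestamp_parser(s)
        except Exception:
            continue  # assumption: an exception means the string could not be parsed
        indices.append(i)
        timestamps.append(t)
    return indices, timestamps

def simple_parser(s):
    """Hypothetical parser for names that start with yyyymmdd_HHMMSS."""
    return datetime.strptime(s[:15], "%Y%m%d_%H%M%S")

names = ["20220101_000000.wav", "readme.txt", "20220101_003000.wav"]
indices, timestamps = parse_timestamp_sketch(names, simple_parser)
```

Returning the indices alongside the timestamps makes it possible to match each parsed value back to its input string when some inputs fail.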