Utilities

korus.util.collect_audiofile_metadata(path, ext='WAV', timestamp_parser=None, earliest_start_utc=None, latest_start_utc=None, subset=None, tar_path='', progress_bar=False, date_subfolder=False, inspect_files=True, tmp_path='./korus-tmp')

Collect metadata records for all audio files in a specified directory.

In order to extract timestamps embedded in the filenames, you must specify a parser function using the @timestamp_parser argument. This function must take the relative path to the audio file as input (as a string) and return the UTC start time of the file (as a datetime.datetime object).

Args:
path: str

Path to the directory or tar archive where the audio files are stored.

ext: str

Audio file extension. Default is WAV.

timestamp_parser: callable

Function that takes a string as input and returns a datetime.datetime object.

earliest_start_utc: datetime.datetime

Only consider files starting at or after this UTC time.

latest_start_utc: datetime.datetime

Only consider files starting at or before this UTC time.

subset: str or list(str)

File paths relative to the top directory given by the @path argument. Use this argument to restrict attention to a subset of the files.

tar_path: str

Path within tar archive. Only relevant if @path points to a tar archive.

progress_bar: bool

Display progress bar. Default is False.

date_subfolder: bool

If audio files are organized in date-stamped subfolders with format yyyymmdd, and both the earliest and latest start time have been specified, this argument can be used to restrict the search space to only the relevant subfolders. Default is False.

inspect_files: bool

Inspect files to obtain the number of samples and the sampling rate. If False, the returned metadata table does not contain the columns num_samples, sample_rate, and end_utc. Default is True.

tmp_path: str

If the audio files are stored in a tar archive and @inspect_files is True, the audio files are temporarily extracted to this folder so that the file size and sampling rate can be determined. Default is './korus-tmp'.

Returns:
df: pandas DataFrame

Metadata table with one record per audio file.

Examples:
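The following is a minimal sketch of a function that could serve as the @timestamp_parser argument. It assumes a hypothetical filename convention, `<prefix>_YYYYMMDD_HHMMSS.wav`, which is not taken from korus. The function name `parse_start_utc` is also made up for illustration. It returns a naive datetime.datetime that is meant to be read as UTC. Whether korus expects naive or timezone-aware values is not stated here.

```python
import os
import re
from datetime import datetime

# Hypothetical filename convention: <prefix>_YYYYMMDD_HHMMSS.wav
_STAMP = re.compile(r"(\d{8})_(\d{6})")


def parse_start_utc(relative_path):
    """Return the UTC start time encoded in the file name.

    Takes the relative path to the audio file (as a string) and returns
    a naive datetime.datetime, interpreted as UTC.
    """
    name = os.path.basename(relative_path)
    match = _STAMP.search(name)
    if match is None:
        raise ValueError(f"no timestamp found in {name!r}")
    return datetime.strptime("".join(match.groups()), "%Y%m%d%H%M%S")
```

A function like this would be passed as `timestamp_parser=parse_start_utc`, alongside the @path and @ext arguments.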

korus.util.find_files(path, substr=None, subdirs=False, tar_path='', progress_bar=False)

Search a directory or tar archive for files with a specified sequence of characters in their path.

Args:
path: str

Path to directory or tar archive file

substr: str or list(str)

Search for files that have this string/these strings in their path.

subdirs: bool

If True, also search all subdirectories.

tar_path: str

Path within tar archive. Only relevant if @path points to a tar archive.

progress_bar: bool

Display progress bar. Default is False.

Returns:
files: list(str)

Alphabetically sorted list of relative file paths

Examples:
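To illustrate the behaviour described above for the plain-directory case, here is a rough standard-library sketch. It is not the korus implementation. The name `find_files_sketch` is hypothetical, tar archives are not handled, and a list of substrings is assumed to match when any one of them occurs in the relative path.

```python
import os


def find_files_sketch(path, substr=None, subdirs=False):
    """Return sorted relative paths under `path` containing any of `substr`."""
    if isinstance(substr, str):
        substr = [substr]
    matches = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), path)
            # Assumption: with several substrings, one match is enough
            if substr is None or any(s in rel for s in substr):
                matches.append(rel)
        if not subdirs:
            break  # os.walk yields the top directory first; stop there
    return sorted(matches)
```

With `subdirs=False` only the top-level directory is scanned. With `subdirs=True` the returned paths include the subdirectory component, relative to @path.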

korus.util.get_num_samples_and_rate(path)

Determine the number of samples and sampling rate of a given audio file.

Args:
path: str

Full path to the audio file

Returns:
: int, int

Number of samples and sampling rate in Hz
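
Examples:

For uncompressed PCM WAV files, the same two quantities can be read with Python's built-in `wave` module. The sketch below is only an illustration of what is being measured. It is not how korus reads files, and the name `num_samples_and_rate_wav` is hypothetical. The count is taken to be the number of frames, that is, samples per channel.

```python
import wave


def num_samples_and_rate_wav(path):
    """Return (number of frames, sampling rate in Hz) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        # One frame holds one sample for each channel
        return wav.getnframes(), wav.getframerate()
```

Dividing the number of samples by the sampling rate gives the file duration in seconds.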

korus.util.list_to_str(l)

Transform a list to a string, suitably formatted for forming SQLite queries.

Example query: SELECT * FROM y WHERE z IN {list_to_str(x)}

Args:
l: list or numpy array

List of values

Returns:
: str

String representation of the list, for use in an SQLite query
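
Examples:

The example query above can be tried end to end with a stand-in helper. The function `format_in_list` below is hypothetical and only assumes that the result is a parenthesised, comma-separated list. The exact output format of list_to_str is not documented here and may differ. The table `y` and column `z` follow the example query.

```python
import sqlite3


def format_in_list(values):
    """Format values as '(v1, v2, ...)' for use after SQL IN (assumed format)."""
    def fmt(v):
        if isinstance(v, str):
            # Quote strings and escape embedded single quotes
            return "'" + v.replace("'", "''") + "'"
        return str(v)
    return "(" + ", ".join(fmt(v) for v in values) + ")"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE y (z INTEGER)")
conn.executemany("INSERT INTO y VALUES (?)", [(1,), (2,), (3,), (4,)])

x = [2, 4]
query = f"SELECT z FROM y WHERE z IN {format_in_list(x)} ORDER BY z"
rows = [row[0] for row in conn.execute(query)]
conn.close()
```

For values that come from untrusted input, parameterised queries with `?` placeholders are generally the safer choice than string formatting.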

korus.util.parse_timestamp(x, timestamp_parser, progress_bar=False)

Parse timestamps from a list of strings using a user-specified function.

Args:
x: list(str)

Strings to be parsed

timestamp_parser: function

Function that takes a single str as input and returns a datetime object

progress_bar: bool

Display progress bar. Default is False.

Returns:
indices: list(int)

Indices of the strings that were successfully parsed

timestamps: list(datetime)

Parsed datetime values

Examples:
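The return values described above can be illustrated with a small stand-alone sketch. It is not the korus implementation. The names `parse_all` and `parser` are hypothetical, and a string is assumed to count as unsuccessfully parsed when the parser raises an exception or returns None.

```python
from datetime import datetime


def parse_all(strings, timestamp_parser):
    """Return (indices, timestamps) for the strings that could be parsed."""
    indices, timestamps = [], []
    for i, s in enumerate(strings):
        try:
            parsed = timestamp_parser(s)
        except Exception:
            continue  # assumption: a raised exception means "not parsed"
        if parsed is None:
            continue  # assumption: None also means "not parsed"
        indices.append(i)
        timestamps.append(parsed)
    return indices, timestamps


def parser(s):
    """Parse strings of the form YYYYMMDD_HHMMSS."""
    return datetime.strptime(s, "%Y%m%d_%H%M%S")


indices, timestamps = parse_all(
    ["20130623_080000", "not-a-timestamp", "20130624_120000"], parser
)
```

Here `indices` identifies which input strings produced a timestamp, so the two returned lists always have the same length.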