Audio

korus.audio.collect_audiofile_metadata(path: str, ext: str | list[str] = 'WAV', timestamp_parser: callable | None = None, earliest_start_utc: datetime | date | None = None, latest_start_utc: datetime | date | None = None, subset: str | list[str] | None = None, subset_filename: str | list[str] | None = None, tar_path: str = '', progress_bar: bool = False, by_date: bool = False, inspect_files: bool = True, tmp_path: str = './korus-tmp')

Collect metadata records for all audio files in a specified directory.

In order to extract timestamps embedded in the filenames, you must specify a parser function using the @timestamp_parser argument. This function must take the relative path to the audio file as input (as a string) and return the UTC start time of the file (as a datetime.datetime object).
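The parser is user-supplied, so its shape depends on your naming convention. The following is a minimal sketch assuming a hypothetical convention in which each filename contains a UTC timestamp of the form YYYYmmddTHHMMSSZ (e.g. station1_20130623T080422Z.wav). The function name my_timestamp_parser is illustrative. It returns a naive datetime that is understood to be UTC; whether korus expects timezone-aware values is not stated here.

```python
import os
import re
from datetime import datetime

# Hypothetical naming convention: "<anything>_YYYYmmddTHHMMSSZ.<ext>",
# e.g. "20130623/station1_20130623T080422Z.wav"
_TIMESTAMP_RE = re.compile(r"(\d{8}T\d{6})Z")


def my_timestamp_parser(rel_path: str) -> datetime:
    """Return the UTC start time encoded in the filename of rel_path."""
    filename = os.path.basename(rel_path)  # ignore any subfolder names
    match = _TIMESTAMP_RE.search(filename)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    # Naive datetime, interpreted as UTC by convention
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
```

A function like this would then be passed as timestamp_parser=my_timestamp_parser.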

Args:
path: str

Path to the directory or tar archive where the audio files are stored.

ext: str | list[str]

Audio file extension(s). Default is WAV.

timestamp_parser: callable

Function that takes a string as input and returns a datetime.datetime object.

earliest_start_utc: datetime.datetime | datetime.date

Only consider files starting at or after this UTC time.

latest_start_utc: datetime.datetime | datetime.date

Only consider files starting at or before this UTC time.

subset: str | list[str]

Paths relative to the top directory given by the path argument. Use this argument to restrict attention to a subset of the files.

subset_filename: str | list[str]

Same as @subset, except that only the filename(s) need to be specified.

tar_path: str

Path within the tar archive. Only relevant if @path points to a tar archive.

progress_bar: bool

Display progress bar. Default is False.

by_date: bool

If audio files are organized in date-stamped subfolders with format yyyymmdd, and both the earliest and latest start time have been specified, this argument can be used to restrict the search space to only the relevant subfolders. Default is False.

inspect_files: bool

Inspect files to obtain the number of samples and the sampling rate. If False, the returned metadata table does not have the columns num_samples, sample_rate, and end_utc. Default is True.

tmp_path: str

If the audio files are stored in a tar archive and @inspect_files is True, the audio files are temporarily extracted to this folder so that the number of samples and the sampling rate can be determined.
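The subfolder restriction enabled by @by_date can be pictured as a simple date-range filter on folder names. The sketch below assumes folders named exactly yyyymmdd and compares at day resolution; relevant_date_folders is a hypothetical helper, not part of korus, and the library's actual selection logic may differ.

```python
from datetime import date, datetime


def relevant_date_folders(folders, earliest, latest):
    """Keep yyyymmdd folder names whose date lies in [earliest, latest].

    Hypothetical helper: folder names that are not 8-digit dates are skipped.
    """
    # Compare at day resolution; datetime bounds are truncated to their date.
    # (datetime is a subclass of date, so it must be checked explicitly.)
    if isinstance(earliest, datetime):
        earliest = earliest.date()
    if isinstance(latest, datetime):
        latest = latest.date()

    selected = []
    for name in folders:
        if len(name) != 8 or not name.isdigit():
            continue  # not a date-stamped folder
        try:
            day = datetime.strptime(name, "%Y%m%d").date()
        except ValueError:
            continue  # digits, but not a valid calendar date
        if earliest <= day <= latest:
            selected.append(name)
    return sorted(selected)
```

Files inside the selected folders would still need to be filtered individually against the exact start-time bounds.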

Returns:
df: pandas DataFrame

Metadata table
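The end_utc column is only available when files are inspected, which suggests it is derived from the start time and the file duration. A minimal sketch of that relation, assuming end_utc = start_utc + num_samples / sample_rate (end_time_utc is a hypothetical helper):

```python
from datetime import datetime, timedelta


def end_time_utc(start_utc: datetime, num_samples: int, sample_rate: float) -> datetime:
    """Assumed relation: end time = start time + duration (num_samples / sample_rate)."""
    return start_utc + timedelta(seconds=num_samples / sample_rate)
```

For example, a one-minute file sampled at 96 kHz has 5,760,000 samples, so its end time is exactly 60 s after its start time.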

Examples:

korus.audio.extract_num_samples_and_samplerate(path: str | list[str], base_path: str = '', tmp_path: str = './korus-tmp', progress_bar: bool = False)

Obtain the number of samples and the sampling rate of a set of audio files.

TODO: implement error handling; the return values should indicate which files were successfully read and which could not be read.

Args:
path: str | list[str]

Relative paths to the audio files, including the filenames

base_path: str

Top directory

tmp_path: str

If the audio files are stored in a tar archive, they are temporarily extracted to this folder so that the number of samples and the sampling rate can be determined.

progress_bar: bool

Display progress bar. Default is False.

Returns:
num_samples: list

Number of samples per file

sample_rate: list

Sampling rate in samples/s.
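The TODO above asks for the return values to report which files could not be read. One way this could look is sketched below. It is a hypothetical variant (extract_with_failures is not a korus function). It uses Python's standard-library wave module, so it only reads uncompressed WAV files and does not handle tar archives. The two lists of results cover only the files that were read successfully, in input order.

```python
import os
import wave


def extract_with_failures(paths, base_path=""):
    """Hypothetical sketch of the TODO: also return the files that could not be read.

    Only uncompressed WAV files are supported (stdlib wave module).
    """
    if isinstance(paths, str):
        paths = [paths]
    num_samples, sample_rate, failed = [], [], []
    for rel_path in paths:
        try:
            with wave.open(os.path.join(base_path, rel_path), "rb") as wav:
                # getnframes() counts samples per channel
                n, rate = wav.getnframes(), wav.getframerate()
        except (OSError, EOFError, wave.Error):
            failed.append(rel_path)  # missing, truncated, or not a WAV file
            continue
        num_samples.append(n)
        sample_rate.append(rate)
    return num_samples, sample_rate, failed
```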

korus.audio.find_files(path, substr=None, subdirs=False, tar_path='', progress_bar=False)

Search a directory or tar archive for files with a specified sequence of characters in their path.

Args:
path: str

Path to directory or tar archive file

substr: str | list[str]

Search for files that have this string/these strings in their path.

subdirs: bool

If True, also search all subdirectories.

tar_path: str

Path within the tar archive. Only relevant if @path points to a tar archive.

progress_bar: bool

Display progress bar. Default is False.

Returns:
files: list[str]

Alphabetically sorted list of relative file paths
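For a plain directory, the documented behaviour can be approximated with os.walk. This sketch rests on two assumptions that are not confirmed above. First, substr=None matches every file. Second, when a list is given, a file is kept if any of the strings occurs in its path. The function find_files_sketch is hypothetical and does not handle tar archives.

```python
import os


def find_files_sketch(path, substr=None, subdirs=False):
    """Hypothetical approximation of find_files for a plain directory."""
    if isinstance(substr, str):
        substr = [substr]
    matches = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            rel_path = os.path.relpath(os.path.join(root, name), path)
            # Assumption: keep the file if ANY of the substrings occurs in its path
            if substr is None or any(s in rel_path for s in substr):
                matches.append(rel_path)
        if not subdirs:
            break  # os.walk yields the top directory first; stop there
    return sorted(matches)
```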

Examples:

korus.audio.group_by_date(filenames: list[str], timestamp_parser: callable)

Helper function for grouping audio files by their start date.

Args:
filenames: list[str]

Filenames or paths

timestamp_parser: callable

Function that takes a string as input and returns a datetime.datetime object.

Returns:
grouped: dict[datetime.date, list[str]]

Dictionary mapping dates (in the form %Y%m%d) to filenames. Note: if timestamp parsing fails for any of the files, all files are grouped together under a null key.
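The grouping and its all-or-nothing fallback can be sketched as follows. This assumes that a parsing failure shows up as an exception (a parser that returns None also fails here, because None has no date() method). It also assumes that the keys are datetime.date objects, as the type hint suggests. group_by_date_sketch is a hypothetical name, not the korus implementation.

```python
from collections import defaultdict


def group_by_date_sketch(filenames, timestamp_parser):
    """Hypothetical sketch: group filenames by the date of their start time.

    If parsing fails for ANY file, all files are returned under a single None key.
    """
    grouped = defaultdict(list)
    try:
        for name in filenames:
            grouped[timestamp_parser(name).date()].append(name)
    except Exception:
        return {None: list(filenames)}
    return dict(grouped)
```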

korus.audio.parse_timestamp(x, timestamp_parser, progress_bar=False)

Parses timestamps from a list of strings using a user-specified function.

Args:
x: list[str]

Strings to be parsed

timestamp_parser: callable

Function that takes a single str as input and returns a datetime object.

progress_bar: bool

Display progress bar. Default is False.

Returns:
indices: list[int]

Indices of the strings that were successfully parsed

timestamps: list[datetime]

Parsed datetime values
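Because only successfully parsed strings are returned, the two lists stay aligned and the indices map each timestamp back to its input. A minimal sketch, assuming a string counts as unparsable if the parser raises an exception or returns None (parse_timestamp_sketch is hypothetical):

```python
def parse_timestamp_sketch(x, timestamp_parser):
    """Hypothetical sketch: indices of parsable strings and their timestamps."""
    indices, timestamps = [], []
    for i, s in enumerate(x):
        try:
            ts = timestamp_parser(s)
        except Exception:
            continue  # skip strings the parser cannot handle
        if ts is None:
            continue  # assumption: None also signals a failed parse
        indices.append(i)
        timestamps.append(ts)
    return indices, timestamps
```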

Examples:

korus.audio.read_num_samples_and_samplerate(path)

Determine the number of samples and sampling rate of a given audio file.

Args:
path: str

Full path to the audio file

Returns:
int, int

Number of samples and sampling rate in Hz
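For uncompressed WAV files, both quantities are available from the WAV header through Python's standard-library wave module. The sketch below assumes that "number of samples" means samples per channel, which is what wave's getnframes() reports. The library may rely on a different audio backend, and read_wav_samples_and_rate is a hypothetical name.

```python
import wave


def read_wav_samples_and_rate(path):
    """Sketch for uncompressed WAV only: (samples per channel, sampling rate in Hz)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes(), wav.getframerate()
```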