Database

korus.db.add_annotations(conn, annot_tbl, job_id, progress_bar=False, error='replace')

Add a set of annotations to the database.

The annotations must be provided as a pandas DataFrame or a Python dictionary with the following structure:

Annotation table
Name	Type	Default Value	Description
job_id	int		Annotation job index
deployment_id	int		Hydrophone deployment index
file_id	int		Audio file index
sound_source	str		Sound-source assignment (confident)
sound_type	str		Sound-type assignment (confident)
tentative_sound_source	str	None	Tentative sound-source assignment
tentative_sound_type	str	None	Tentative sound-type assignment
ambiguous_sound_source	str or list(str)	None	A range of possible sound-source assignments; if specified as a str, use commas to separate the individual assignments
ambiguous_sound_type	str or list(str)	None	A range of possible sound-type assignments; if specified as a str, use commas to separate the individual assignments
tag	str or list(str)	None	Keywords that can be used in database queries; if specified as a str, use commas to separate individual tags
start_utc	datetime or str	Deduced from the audio file timestamp (if available) and the within-file offset (@start_ms)	UTC start time; if specified as a str, use an ISO 8601-compatible format
duration_ms	int	Computed as the audio file duration minus the within-file offset (@start_ms)	Duration in milliseconds
start_ms	int	0	Within-file offset in milliseconds
freq_min_hz	int	0	Minimum frequency in Hz
freq_max_hz	int	Nyquist frequency deduced from the audio file sampling rate	Maximum frequency in Hz
channel	int	0	Channel no.
granularity	str	window	Possible values are: call, window, batch, file, encounter
machine_prediction	dict	None	Reserved for predictions made by algorithms and machine-learning models
comments	str	None	Any other relevant observation

Columns without a default value are mandatory; columns with a default value are optional.
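The default-value rules in the table can be illustrated with a small sketch that operates on a single annotation row held in a plain dict. It is not the Korus implementation: the helper names fill_defaults and split_field are hypothetical, and the audio-file metadata (start time, duration, sampling rate), which Korus presumably looks up via file_id, is passed in explicitly here. ISO 8601 strings are parsed with datetime.fromisoformat, which on Python versions before 3.11 accepts only a subset of ISO 8601.

```python
from datetime import datetime, timedelta


def split_field(value):
    """Turn a comma-separated string into a list; lists pass through, None stays None."""
    if value is None:
        return None
    if isinstance(value, str):
        return [item.strip() for item in value.split(",") if item.strip()]
    return list(value)


def fill_defaults(row, file_start_utc=None, file_duration_ms=None, sample_rate=None):
    """Return a copy of one annotation row with the documented defaults filled in.

    Hypothetical helper: the file metadata arguments stand in for what Korus
    would deduce from file_id.
    """
    out = dict(row)
    # Constant defaults from the annotation table
    for key, default in (("start_ms", 0), ("freq_min_hz", 0),
                         ("channel", 0), ("granularity", "window")):
        if out.get(key) is None:
            out[key] = default
    start_ms = out["start_ms"]

    # start_utc: parse ISO 8601 strings, else deduce from file timestamp + offset
    if isinstance(out.get("start_utc"), str):
        out["start_utc"] = datetime.fromisoformat(out["start_utc"])
    elif out.get("start_utc") is None and file_start_utc is not None:
        out["start_utc"] = file_start_utc + timedelta(milliseconds=start_ms)

    # duration_ms: file duration minus the within-file offset
    if out.get("duration_ms") is None and file_duration_ms is not None:
        out["duration_ms"] = file_duration_ms - start_ms

    # freq_max_hz: Nyquist frequency, i.e. half the sampling rate
    if out.get("freq_max_hz") is None and sample_rate is not None:
        out["freq_max_hz"] = sample_rate // 2

    # Comma-separated string fields become lists
    for key in ("ambiguous_sound_source", "ambiguous_sound_type", "tag"):
        out[key] = split_field(out.get(key))
    return out


row = {"job_id": 1, "deployment_id": 1, "file_id": 7,
       "sound_source": "SRKW", "sound_type": "S1",
       "start_ms": 1500, "tag": "loud, overlapping"}
filled = fill_defaults(row, file_start_utc=datetime(2020, 6, 1, 12, 0, 0),
                       file_duration_ms=60000, sample_rate=32000)
print(filled["start_utc"], filled["duration_ms"], filled["freq_max_hz"], filled["tag"])
```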

deployment_id and start_utc are normally not required because they can be inferred from file_id. They must, however, be specified if the file_id column is missing or if some rows have invalid or missing file IDs.

Annotations without a file ID are inserted into the database with file_id set to 0 (zero).

TODO: check that tentative (source,type) assignments are more specific cases of confident assignments
TODO: check that there are no conflicts with existing annotations in the database

Args:

conn: sqlite3.Connection

Database connection

annot_tbl: pandas DataFrame or dict

Table of annotations.

deployment_id: int

Deployment unique identifier

job_id: int

Annotation job unique identifier

progress_bar: bool

Display progress bar. Default is False.

error: str

Error handling. NOT YET IMPLEMENTED. Options are:

a/abort: If any of the annotations have invalid data, abort the entire submission

i/ignore: Ignore any annotations with invalid data, but proceed with submitting all other annotations to the database

r/replace: Automatically replace invalid data fields with default values (where possible) and flag the affected annotations for review; if replacement is not possible, switch to manual mode.

m/manual: Manually review and fix every annotation with invalid data

Returns:

annot_ids: list(int): Unique identifiers assigned to the annotations

Raises:

ValueError: If the input table contains annotations with invalid (source,type) assignments.

Note: this consistency check is only performed for confident and tentative assignments, not for ambiguous assignments.

AssertionError: If the annotation table does not have the required columns.

korus.db.add_negatives(conn, job_id)

Auto-generate ‘negative’ annotations for a specified annotation job.

Note: This function should only be called once the annotation job is complete and all annotations have been submitted to the database.

TODO: consider renaming this function to generate_negatives
TODO: add option to remove existing, auto-generated negatives for this job, e.g., with a boolean argument @replace with default value True

Args:

conn: sqlite3.Connection: Database connection
job_id: int: Annotation job index. Note that the job must be ‘exhaustive’.

Returns:

annot_ids: list(int): Unique identifiers assigned to the annotations

Raises:

AssertionError: if the job is not ‘exhaustive’

korus.db.assign_files_to_job(conn, job_id, file_id, channel=0, extendable=True)

Associates a set of audio files with an annotation job.

The association is made by adding (file_id, job_id, channel) tuples to the ‘file_job_relation’ table in the database.

Args:

conn: sqlite3.Connection: Database connection
job_id: int: Annotation job unique identifier
file_id: int, list(int): Audio file unique identifier(s)
channel: int, list(int), list(list(int)): For multi-channel recordings, specifies which channels were inspected as part of the annotation job. Can be a single int, a list of ints with len(channel) = number of channels, or a nested list of ints with len(channel) = number of files and len(channel[i]) = number of channels for file i.
extendable: bool: Allow this function to be called multiple times for the same annotation job. True by default. Set to False to help ensure that the database only contains completed annotation jobs.

Returns:

counter: int: Number of entries successfully added to the file_job_relation table.
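A minimal sketch of how the three accepted forms of the channel argument could be expanded into one (file_id, job_id, channel) row per inspected channel. It assumes one row per channel and uses a simplified, hypothetical file_job_relation table in an in-memory SQLite database; the real Korus schema and the way channels are stored may differ, and relate_files_to_job and normalize_channels are illustrative names, not part of korus.db.

```python
import sqlite3


def normalize_channels(file_ids, channel=0):
    """Expand the channel argument into one list of channels per file.

    Handles the three documented forms: a single int, a flat list of ints
    (same channels for every file), or a nested list (one list per file).
    """
    if isinstance(file_ids, int):
        file_ids = [file_ids]
    if isinstance(channel, int):
        return file_ids, [[channel] for _ in file_ids]
    if all(isinstance(c, int) for c in channel):
        return file_ids, [list(channel) for _ in file_ids]
    if len(channel) != len(file_ids):
        raise ValueError("a nested channel list needs one entry per file")
    return file_ids, [list(c) for c in channel]


def relate_files_to_job(conn, job_id, file_ids, channel=0):
    """Hypothetical: insert (file_id, job_id, channel) rows and count them."""
    file_ids, channels = normalize_channels(file_ids, channel)
    counter = 0
    for file_id, chans in zip(file_ids, channels):
        for ch in chans:
            conn.execute(
                "INSERT INTO file_job_relation (file_id, job_id, channel) VALUES (?, ?, ?)",
                (file_id, job_id, ch),
            )
            counter += 1
    conn.commit()
    return counter


# Simplified stand-in for the real table; the actual Korus schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_job_relation (file_id INTEGER, job_id INTEGER, channel INTEGER)")
print(relate_files_to_job(conn, job_id=3, file_ids=[10, 11], channel=[[0], [0, 1]]))
```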

korus.db.build_file_table(conn, job_id, top=False)

Returns a table with the audio files that were inspected as part of a given annotation job or set of jobs.

The table has the following columns:

file_id (int): audio file unique identifier

deployment_id (int): deployment unique identifier

filename (str): audio filename

relative_path (str): relative path to audio file

sample_rate (int): sampling rate in samples/s

start_utc (datetime): file UTC start time

end_utc (datetime): file UTC end time

channel (str): the channels that were inspected (0;1;…)

If @top is set to True, the following column is also included:

top_path (str): path to the top directory, relative to which the audio file relative paths are specified.

Args:

conn: sqlite3.Connection: Database connection
job_id: int, list(int): Annotation job unique identifier(s)
top: bool: Whether to include the path to the top directory

Returns:

file_tbl: pandas.DataFrame: File table
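The file table's columns can be combined to locate each audio file on disk. The sketch below assumes that relative_path is the directory containing the file (relative to top_path) and that the channel column is a semicolon-separated string such as '0;1'; full_path and parse_channels are hypothetical helpers. PurePosixPath keeps the joined path platform-independent for the example; use pathlib.Path to actually open files.

```python
from pathlib import PurePosixPath


def full_path(row):
    """Join top_path (if present), relative_path and filename into one path."""
    path = PurePosixPath(row["relative_path"]) / row["filename"]
    top = row.get("top_path")
    if top:
        path = PurePosixPath(top) / path
    return str(path)


def parse_channels(value):
    """Split a semicolon-separated channel string such as '0;1' into ints."""
    return [int(c) for c in str(value).split(";") if c.strip()]


# Hypothetical row mirroring the columns of a table built with top=True
row = {"filename": "audio_0001.wav", "relative_path": "2020/06",
       "top_path": "/data/hydrophone", "channel": "0;1"}
print(full_path(row), parse_channels(row["channel"]))
```

Because pandas.DataFrame.apply with axis=1 passes each row as a Series, which supports the same ["key"] and .get() lookups, helpers like these could be applied row-wise, e.g. file_tbl.apply(full_path, axis=1).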

korus.db.create_db(path)

Create an SQLite database with the Korus schema

Args:

path: str: Full path to the database file (.sqlite)

Returns:

conn: sqlite3.Connection: Database connection

korus.db.filter_annotation(conn, source_type=None, exclude=None, tag=None, granularity=None, invert=False, strict=False, tentative=False, ambiguous=False, file=False, valid=False, taxonomy_id=None, job_id=None, deployment_id=None)

Query the annotation table by filtering on sound source and sound type.

TODO: implement strict
TODO: implement file
TODO: implement valid
TODO: consider renaming source_type to select

Args:

conn: sqlite3.Connection: Database connection
source_type: tuple, list(tuple): Select annotations with this (source,type) label. The character ‘%’ can be used as a wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the @strict argument to change this behaviour.
exclude: tuple, list(tuple): Select annotations with this (source,type) exclusion label while also excluding annotations with this (source,type) label. The character ‘%’ can be used as a wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the @strict argument to change this behaviour.
tag: str,list(str): Select annotations with this tag.
granularity: str, list(str): Annotation granularity. Options are ‘unit’, ‘window’, ‘file’, ‘batch’, ‘encounter’.
invert: bool: Invert the label filtering criteria so that annotations with the (source,type) specified by the @source_type argument are excluded rather than selected. The character ‘%’ can be used as a wildcard. Accepts both a single tuple and a list of tuples. By default both ancestral and descendant nodes in the taxonomy tree are considered when performing an inverted search. Use the @strict argument to change this behaviour.
strict: bool: Whether to interpret labels ‘strictly’, meaning that ancestral/descendant nodes in the taxonomy tree are not considered. For example, when filtering on ‘KW’, annotations labelled as ‘SRKW’ will not be selected if @strict is set to True. Default is False.
tentative: bool: Whether to filter on tentative label assignments, when available. Default is False.
ambiguous: bool: Whether to also filter on ambiguous label assignments. Default is False.
file: bool: If True, exclude annotations pertaining to audio files not present in the database. Default is False. NOT YET IMPLEMENTED.
valid: bool: If True, exclude annotations with invalid data or flagged as requiring review. Default is False. NOT YET IMPLEMENTED.
taxonomy_id: int: Acoustic taxonomy that the (source,type) label arguments refer to. If not specified, the latest taxonomy will be used.
job_id: int, list(int): Restrict search to the specified annotation job(s).
deployment_id: int, list(int): Restrict search to the specified deployment(s).

Returns:

indices: list(int): Annotation indices
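To make the interplay of @strict and @invert concrete, here is a toy model that works on sound sources only and represents a hypothetical taxonomy as a child-to-parent dict. Following the description above, a normal search for a label also matches its descendants, an inverted search excludes the label together with its ancestors and descendants, and @strict disables the tree expansion in both cases. This mimics the documented behaviour; it is not the SQL that filter_annotation actually runs, and select, ancestors and descendants are illustrative names.

```python
# Hypothetical toy taxonomy of sound sources, stored as child -> parent.
PARENT = {"Bio": "Unknown", "Mammal": "Bio", "Cetacean": "Mammal",
          "Toothed": "Cetacean", "Baleen": "Cetacean", "KW": "Toothed",
          "SRKW": "KW", "J": "SRKW", "K": "SRKW", "L": "SRKW"}


def ancestors(node):
    """All nodes between `node` and the root, nearest first."""
    out = []
    while node in PARENT:
        node = PARENT[node]
        out.append(node)
    return out


def descendants(node):
    """All nodes below `node` in the tree."""
    out = []
    for child, parent in PARENT.items():
        if parent == node:
            out.append(child)
            out.extend(descendants(child))
    return out


def select(annotations, source, strict=False, invert=False):
    """Return ids of annotations (id -> sound source) kept by the filter."""
    if strict:
        labels = {source}
    elif invert:
        labels = {source, *ancestors(source), *descendants(source)}
    else:
        labels = {source, *descendants(source)}
    if invert:
        return sorted(i for i, s in annotations.items() if s not in labels)
    return sorted(i for i, s in annotations.items() if s in labels)


annotations = {1: "KW", 2: "SRKW", 3: "Cetacean", 4: "Baleen"}
print(select(annotations, "KW"))               # KW plus its descendant SRKW
print(select(annotations, "KW", strict=True))  # exact label only
print(select(annotations, "KW", invert=True))  # 'Cetacean' could be a KW, so it is excluded too
```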

korus.db.filter_files(conn, deployment_id=None, start_utc=None, end_utc=None, job_id=None)

Search for files in the database by deployment, time range, and annotation job.

Args:

conn: sqlite3.Connection: Database connection
deployment_id: int: Deployment identifier
start_utc: datetime.datetime: UTC start time
end_utc: datetime.datetime: UTC end time
job_id: int: Job identifier

Returns:

ids: list(int): File identifiers matching the search criteria
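The text does not say whether filter_files selects files that overlap the requested time range or only files that lie entirely within it. The sketch below assumes the overlap rule and works on a hypothetical list of file records rather than on the database; files_in_range is an illustrative name, and the job_id filter is left out.

```python
from datetime import datetime

# Hypothetical file records; in Korus these would come from the database.
files = [
    {"file_id": 1, "deployment_id": 1,
     "start_utc": datetime(2020, 6, 1, 0, 0), "end_utc": datetime(2020, 6, 1, 0, 30)},
    {"file_id": 2, "deployment_id": 1,
     "start_utc": datetime(2020, 6, 1, 0, 30), "end_utc": datetime(2020, 6, 1, 1, 0)},
    {"file_id": 3, "deployment_id": 2,
     "start_utc": datetime(2020, 6, 1, 0, 0), "end_utc": datetime(2020, 6, 1, 0, 30)},
]


def files_in_range(files, deployment_id=None, start_utc=None, end_utc=None):
    """Return IDs of files that overlap [start_utc, end_utc] (assumed overlap rule)."""
    ids = []
    for f in files:
        if deployment_id is not None and f["deployment_id"] != deployment_id:
            continue
        # file ends before (or exactly at) the requested start
        if start_utc is not None and f["end_utc"] <= start_utc:
            continue
        # file starts after (or exactly at) the requested end
        if end_utc is not None and f["start_utc"] >= end_utc:
            continue
        ids.append(f["file_id"])
    return ids


print(files_in_range(files, deployment_id=1,
                     start_utc=datetime(2020, 6, 1, 0, 15),
                     end_utc=datetime(2020, 6, 1, 0, 45)))
```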

korus.db.filter_negative(conn, source_type=None, strict=False, taxonomy_id=None)

Query the annotation table for auto-generated negatives.

Args:

conn: sqlite3.Connection: Database connection
source_type: tuple, list(tuple): Select auto-generated annotations guaranteed not to contain any sounds of the class (source,type) or its descendant classes. The character ‘%’ can be used as a wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the @strict argument to change this behaviour.
strict: bool: Whether to interpret labels ‘strictly’, meaning that descendant nodes in the taxonomy tree are not considered. For example, when filtering on ‘KW’, annotations labelled as ‘SRKW’ will not be selected if @strict is set to True.
taxonomy_id: int: Acoustic taxonomy that the (source,type) label arguments refer to. If not specified, the latest taxonomy will be used.

Returns:

indices: list(int): Annotation indices

korus.db.find_negatives(file_tbl, annot_tbl, max_gap_ms=100, tag_id=1)

Find time periods without annotations, also referred to as ‘negatives’.

Args:

file_tbl: pandas.DataFrame: Table of audio files generated by build_file_table().
annot_tbl: pandas.DataFrame: Table of annotations.
max_gap_ms: int: Negatives are allowed to span multiple audio files (from the same deployment) provided the temporal gap between the files is below this value.
tag_id: int: Tag index assigned to negatives

Returns:

neg_tbl: pandas.DataFrame: Negatives annotation table
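The gap-finding idea behind find_negatives can be sketched with plain (start_ms, end_ms) intervals: first merge audio files whose separation is shorter than max_gap_ms into continuous segments, then subtract the annotated intervals from each segment. This is a simplified illustration, not the Korus implementation; it ignores deployments, channels and labels, which the real function presumably handles, and merge_files and find_gaps are hypothetical names.

```python
def merge_files(files, max_gap_ms=100):
    """Merge (start_ms, end_ms) file intervals whose separation is below max_gap_ms."""
    segments = []
    for start, end in sorted(files):
        if segments and start - segments[-1][1] < max_gap_ms:
            segments[-1][1] = max(segments[-1][1], end)  # extend current segment
        else:
            segments.append([start, end])  # gap too large: start a new segment
    return [tuple(s) for s in segments]


def find_gaps(files, annotations, max_gap_ms=100):
    """Return (start_ms, end_ms) periods covered by audio but by no annotation."""
    negatives = []
    annots = sorted(annotations)
    for seg_start, seg_end in merge_files(files, max_gap_ms):
        cursor = seg_start  # earliest time not yet covered by an annotation
        for a_start, a_end in annots:
            if a_end <= cursor or a_start >= seg_end:
                continue  # annotation ends before the cursor or starts after the segment
            if a_start > cursor:
                negatives.append((cursor, a_start))
            cursor = max(cursor, a_end)
        if cursor < seg_end:
            negatives.append((cursor, seg_end))
    return negatives


files = [(0, 60_000), (60_050, 120_000), (200_000, 260_000)]
annotations = [(10_000, 15_000), (50_000, 70_000)]
print(find_gaps(files, annotations, max_gap_ms=100))
```

With max_gap_ms=100 the first two files (50 ms apart) form one segment, so the negative from 70 000 ms to 120 000 ms spans both files, as the @max_gap_ms description allows.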

korus.db.get_annotations(conn, indices=None, format='korus', label=None)

Extract annotation data from the database.

TODO: create tests for the case format='raven'

Args:

conn: sqlite3.Connection: Database connection
indices: list(int): Indices in the annotation table. Optional.
format: str: Output format. Currently supported formats are: korus, ketos, raven. Default is korus.
label: int,str: Label assigned to all annotations in the ketos-formatted table. Optional.

Returns:

annot_tbl: Pandas DataFrame: Annotation table

korus.db.get_label_id(conn, source_type=None, taxonomy_id=None, ascend=False, descend=False, always_list=False)

Returns the label identifier corresponding to a sound-source, sound-type tag tuple.

If @ascend is set to True, the function will also return the label ids of all the ancestral nodes in the taxonomy tree. For example, if the sound source is specified as SRKW, it will return labels corresponding not only to SRKW, but also KW, Toothed, Cetacean, Mammal, Bio, and Unknown.

If @descend is set to True, the function will also return the label ids of all the descendant nodes in the taxonomy tree. For example, if the sound source is specified as SRKW, it will return labels corresponding not only to SRKW, but also to the J, K, and L pods.
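The @ascend and @descend expansions amount to walking up and down the taxonomy tree. The sketch below encodes the SRKW example from the text as a hypothetical nested dict and returns sound-source names rather than label ids; the real function works on a kx.AcousticTaxonomy stored in the database, and related_sources is an illustrative name.

```python
# Hypothetical toy taxonomy (node -> children), following the SRKW example.
TREE = {"Unknown": {"Bio": {"Mammal": {"Cetacean": {"Toothed": {"KW": {
    "SRKW": {"J": {}, "K": {}, "L": {}}}}}}}}}


def find_path(tree, target, path=()):
    """Depth-first search returning (root-to-target path, target's subtree), or None."""
    for name, children in tree.items():
        if name == target:
            return path + (name,), children
        found = find_path(children, target, path + (name,))
        if found is not None:
            return found
    return None


def flatten(tree):
    """All node names in a subtree, in depth-first order."""
    names = []
    for name, children in tree.items():
        names.append(name)
        names.extend(flatten(children))
    return names


def related_sources(source, ascend=False, descend=False):
    found = find_path(TREE, source)
    if found is None:
        raise ValueError(f"unknown sound source: {source}")
    path, subtree = found
    result = [source]
    if ascend:
        result += list(reversed(path[:-1]))  # nearest ancestor first
    if descend:
        result += flatten(subtree)
    return result


print(related_sources("SRKW", ascend=True))
print(related_sources("SRKW", descend=True))
```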

Args:

conn: sqlite3.Connection: Database connection
source_type: tuple(str, str) or list(tuple): Sound source and sound type tags. The character ‘%’ can be used as a wildcard. For example, use (‘SRKW’, ‘%’) to retrieve all labels associated with the sound source ‘SRKW’, irrespective of sound type. Multiple source-type pairs can be specified as a list of tuples.
taxonomy_id: int: Acoustic taxonomy unique identifier. If not specified, the latest taxonomy will be used.
ascend: bool: Also return the labels of ancestral nodes.
descend: bool: Also return the labels of descendant nodes.
always_list: bool: Whether to always return a list. Default is False.

Returns:

id: int, list(int): Label identifier(s)

Raises:

ValueError: if a label with the specified @source_type does not exist
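The ‘%’ wildcard matches the syntax of SQL's LIKE operator, so one plausible reading is that each (source,type) pair is matched with LIKE. Assuming that, the lookup can be sketched against a simplified, hypothetical label table in an in-memory SQLite database; label_ids is an illustrative name. Two properties of SQLite's LIKE would carry over if this reading is correct: it is case-insensitive for ASCII characters by default, and ‘_’ matches any single character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified label table; the real Korus schema may differ.
conn.execute("CREATE TABLE label (id INTEGER PRIMARY KEY, sound_source TEXT, sound_type TEXT)")
conn.executemany(
    "INSERT INTO label (sound_source, sound_type) VALUES (?, ?)",
    [("SRKW", "S1"), ("SRKW", "S2"), ("KW", "Whistle"), ("HW", "Song")],
)


def label_ids(conn, source_type):
    """Return label ids matching one (source, type) pair; '%' acts as a wildcard."""
    source, stype = source_type
    rows = conn.execute(
        "SELECT id FROM label WHERE sound_source LIKE ? AND sound_type LIKE ? ORDER BY id",
        (source, stype),
    ).fetchall()
    if not rows:
        # mirrors the documented ValueError for unknown labels
        raise ValueError(f"no label matches {source_type}")
    return [r[0] for r in rows]


print(label_ids(conn, ("SRKW", "%")))
```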

korus.db.get_taxonomy(conn, taxonomy_id=None, return_id=False)

Loads the specified acoustic taxonomy from the database.

Args:

conn: sqlite3.Connection: Database connection
taxonomy_id: int: Acoustic taxonomy unique identifier. If not specified, the latest added taxonomy will be loaded.
return_id: bool: Whether to also return the taxonomy identifier. Default is False.

Returns:

tax: kx.AcousticTaxonomy: Acoustic taxonomy
id: int: Taxonomy identifier. Only returned if @return_id has been set to True.

korus.db.import_taxonomy(conn, src, name, new_name=None)

Import an acoustic taxonomy.

Args:

conn: sqlite3.Connection: Connection to the database into which the taxonomy will be imported.
src: str: Path to the database file (.sqlite) from which the taxonomy is being imported.
name: str: Name of the taxonomy.
new_name: str: Optional field for renaming the taxonomy.

Returns:

c: sqlite3.Cursor: Database cursor

Raises:

sqlite3.IntegrityError: if the database already contains a taxonomy with the same name.

korus.db.insert_job(conn, values)

Insert an annotation job into the database.

Args:

conn: sqlite3.Connection: Database connection
values: dict: Values to be inserted

Returns:

c: sqlite3.Cursor: Database cursor

korus.db.insert_row(conn, table_name, values)

Insert a row of values into a table in the database.

Args:

conn: sqlite3.Connection: Database connection
table_name: str: Table name
values: dict: Values to be inserted

Returns:

c: sqlite3.Cursor: Database cursor

Raises:

sqlite3.IntegrityError: if the table already contains an entry with these data
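A common way to insert a row given as a {column: value} dict is to build a parameterized INSERT statement. The sketch below is hypothetical: insert_row_sketch is not the Korus function, and the job table with a UNIQUE annotator column is invented for the demo. It only shows how a duplicate entry surfaces as sqlite3.IntegrityError when the table defines a matching constraint.

```python
import sqlite3


def insert_row_sketch(conn, table_name, values):
    """Insert one row given as a {column: value} dict and return the cursor."""
    columns = ", ".join(values)
    placeholders = ", ".join("?" for _ in values)
    # Table and column names cannot be bound as parameters, so they are
    # interpolated here; only pass trusted identifiers.
    sql = f"INSERT INTO {table_name} ({columns}) VALUES ({placeholders})"
    return conn.execute(sql, tuple(values.values()))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, annotator TEXT UNIQUE)")
c = insert_row_sketch(conn, "job", {"annotator": "ABC"})
print(c.lastrowid)

try:
    insert_row_sketch(conn, "job", {"annotator": "ABC"})  # violates UNIQUE
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```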

korus.db.insert_taxonomy(conn, tax, comment=None, overwrite=False)

Insert an acoustic taxonomy into the database.

Also adds all the sound-source, sound-type combinations to the table of allowed labels.

Args:

conn: sqlite3.Connection: Database connection
tax: kx.AcousticTaxonomy: Acoustic taxonomy
comment: str: Optional field. Typically used for describing the main changes made to the taxonomy since the last version.
overwrite: bool: Set to True to allow existing entries in the taxonomy table with the same name and version no. to be overwritten.

Returns:

c: sqlite3.Cursor: Database cursor

Raises:

sqlite3.IntegrityError: if the database already contains a taxonomy with the same name and version no.