Database

korus.db.add_annotations(conn, annot_tbl, job_id, progress_bar=False, error='replace')[source]

Add a set of annotations to the database.

The annotations must be provided in the form of Pandas DataFrame or Python Dictionary with the following structure,

Annotation table

Name

Type

Default Value

Description

job_id

int

Annotation job index

deployment_id

int

Hydrophone deployment index

file_id

int

Audio file index

sound_source

str

Sound-source assignment (confident)

sound_type

str

Sound-type assignment (confident)

tentative_sound_source

str

None

Tentative sound-source assignment

tentative_sound_type

str

None

Tentative sound-type assignment

ambiguous_sound_source

str or list(str)

None

A range of possible sound-source assignments, if specified as a str use commas to separate the individual assignments

ambiguous_sound_type

str or list(str)

None

A range of possible sound-type assignments, if specified as a str use commas to separate the individual assignments

tag

str or list(str)

None

Keywords that can be used in database queries, if specified as a str use commas to separate individual tags

start_utc

datetime or str

Deduced from the audio file timestamp (if available) and the within-file offset (@start_ms)

UTC start time, if specified as a str use a ISO-8601 compatible format

duration_ms

int

Computed as the audio file duration minus the within-file offset (@start_ms)

Duration in milliseconds

start_ms

int

0

Within-file offset in milliseconds

freq_min_hz

int

0

Minimum frequency in Hz

freq_max_hz

int

Nyquist frequency deduced from the audio file sampling rate

Maximum frequency in Hz

channel

int

0

Channel no.

granularity

str

window

Possible values are: call, window, batch, file, encounter

machine_prediction

dict

None

Reserved for predictions made by algorithms and machine-learning models

comments

str

None

Any other relavant observation

Columns without a default value are mandatory; columns with a default value are optional.

deployment_id and start_utc are normally not required, as they are inferred from the file_id, but must be specified in cases where the file_id column is missing, or some rows have invalid/missing file IDs.

Annotations without file IDs are inserted into the database with the ID value 0 (zero).

TODO: chech that tentative (source,type) assignments are more specific cases of confident assignments TODO: check that there are no conflicts with existing annotations in the database

Args:
conn: sqlite3.Connection

Database connection

annot_tbl: pandas DataFrame or dict

Table of annotations.

deployment_id: int

Deployment unique identifier

job_id: int

Annotation job unique identifier

progress_bar: bool

Display progress bar. Default is False.

error: str

Error handling. NOT YET IMPLEMENTED. Options are:

  • a/abort: If any of the annotations have invalid data, abort the entire submission

  • i/ignore: Ignore any annotations with invalid data, but proceed with

    submitting all other annotations to the database

  • r/replace: Automatically replace invalid data fields with default values (where possible)

    and flag the affected annotations for review; if replacement is not possible, switch to manual mode.

  • m/manual: Manually review and fix every annotation with invalid data

Returns:
annot_ids: list(int)

Unique identifiers assigned to the annotations

Raises:
ValueError: If the input table contains annotations with invalid (source,type) assignments.
Note: this consistency check is only performed for confident and tentative

assignments, not for ambiguous assignments.

AssertionError: If the annotation table does not have the required columns.

korus.db.add_negatives(conn, job_id)[source]

Auto-generate ‘negative’ annotations for a specified annotation job.

Note: This function should only be called once the annotation job is complete and all annotations have been submitted to the database.

TODO: consider renaming this function to generate_negatives TODO: add option to remove existing, auto-generated negatives for this job

e.g., with a boolen argument @replace with default value True

Args:
conn: sqlite3.Connection

Database connection

job_id: int

Annotation job index. Note that the job must be ‘exhaustive’

Returns:
annot_ids: list(int)

Unique identifiers assigned to the annotations

Raises:

AssertionError: if the job is not ‘exhaustive’

korus.db.assign_files_to_job(conn, job_id, file_id, channel=0, extendable=True)[source]

Associates a set of audio files with an annotation job.

The association is made by adding (file_id, job_id, channel) tuples to the ‘file_job_relation’ table in the database.

Args:
conn: sqlite3.Connection

Database connection

job_id: int

Annotation job unique identifier

file_id: int, list(int)

Audio file unique identifier(s)

channel: int, list(int), list(list(int))

For multi-channel recordings, this allows to specify which channels were inspected as part of the annotation job. Can either be a single int, a list of ints with len(channel) = no. channels, or a nested list of ints with len(channel) = no. files and len(channel[i]) = no. channels for file i.

extendable: bool

Allow this function to be called multiple times for the same annotation job. True by default. Set to False to help ensure that the database only contains completed annotation jobs.

Returns:
counter: int

Number of entries successfully added to the file_job_relation table.

korus.db.build_file_table(conn, job_id, top=False)[source]

Returns a table with the audio files that were inspected as part of a given annotation job or set of jobs.

The table has the following columns,

  • file_id (int): audio file unique identifier

  • deployment_id (int): deployment unique identifier

  • filename (str): audio filename

  • relative_path (str): relative path to audio file

  • sample_rate (int): sampling rate in samples/s

  • start_utc (datetime): file UTC start time

  • end_utc (datetime): file UTC end time

  • channel (str): the channels that were inspected (0;1;…)

Optionally, the following columns may be included,

  • top_path (str): path to the top directory, relative to which the

    audiofile relative paths are specified.

Args:
conn: sqlite3.Connection

Database connection

job_id: int, list(int)

Annotation job unique identifier(s)

top: bool

Whether to include the path to the top directory

Returns:
file_tbl: pandas.DataFrame

File table

korus.db.create_db(path)[source]

Create an SQLite database with the Korus schema

Args:
path: str

Full path to the database file (.sqlite)

Returns:
conn: sqlite3.Connection

Database connection

korus.db.filter_annotation(conn, source_type=None, exclude=None, tag=None, granularity=None, invert=False, strict=False, tentative=False, ambiguous=False, file=False, valid=False, taxonomy_id=None, job_id=None, deployment_id=None)[source]

Query annotation table by filtering on sound source and sound type.

TODO: implement strict TODO: implement file TODO: implement valid TODO: consider renaming source_type to select

Args:
conn: sqlite3.Connection

Database connection

source_type: tuple, list(tuple)

Select annotations with this (source,type) label. The character ‘%’ can be used as wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the @strict argument to change this behaviour.

exclude: tuple, list(tuple)

Select annotations with this (source,type) exclusion label while also excluding annotations with this (source,type) label. The character ‘%’ can be used as wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the @strict argument to change this behaviour.

tag: str,list(str)

Select annotations with this tag.

granularity: str, list(str)

Annotation granularity. Options are ‘unit’, ‘window’, ‘file’, ‘batch’, ‘encounter’.

invert: bool

Invert the label filtering criteria so that annotations with the (source,type) specified by the @source_type argument are excluded rather than selected. The character ‘%’ can be used as wildcard. Accepts both a single tuple and a list of tuples. By default both ancestral and descendant nodes in the taxonomy tree are considered when performing an inverted search. Use the @strict argument to change this behaviour.

strict: bool

Whether to interpret labels ‘strictly’, meaning that ancestral/descendant nodes in the taxonomy tree are not considered. For example, when filtering on ‘KW’ annotations labelled as ‘SRKW’ will not be selected if @strict is set to True. Default is False.

tentative: bool

Whether to filter on tentative label assignments, when available. Default is False.

ambiguous: bool

Whether to also filter on ambiguous label assignments. Default is False.

file: bool

If True, exclude annotations pertaining to audio files not present in the database. Default is False. NOT YET IMPLEMENTED.

valid: bool

If True, exclude annotations with invalid data or flagged as requiring review. Default is False. NOT YET IMPLEMENTED.

taxonomy_id: int

Acoustic taxonomy that the (source,type) label arguments refer to. If not specified, the latest taxonomy will be used.

job_id: int, list(int)

Restrict search to the specified annotation job(s).

deployment_id: int, list(int)

Restrict search to the specified deployment(s).

Returns:
indices: list(int)

Annotation indices

korus.db.filter_files(conn, deployment_id=None, start_utc=None, end_utc=None, job_id=None)[source]

Search for the files in the database based on deployment and time range

Args:
conn:

Database connection

deployment_id: int

Deployment identifier

start_utc: datetime.datetime

UTC start time

end_utc: datetime.datetime

UTC end time

job_id: int

Job identifier

Returns:
ids: list(int)

File identifiers matching the search criteria

korus.db.filter_negative(conn, source_type=None, strict=False, taxonomy_id=None)[source]

Query annotation table by filtering on auto-generated negatives.

Args:
conn: sqlite3.Connection

Database connection

source_type: tuple, list(tuple)

Select auto-generated annotations guaranteed to not contain any sounds of the the class (source,type) or descedant classes. The character ‘%’ can be used as wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the @strict argument to change this behaviour.

strict: bool

Whether to interpret labels ‘strictly’, meaning that descendant nodes in the taxonomy tree are not considered. For example, when filtering on ‘KW’ annotations labelled as ‘SRKW’ will not be selected if @strict is set to True.

taxonomy_id: int

Acoustic taxonomy that the (source,type) label arguments refer to. If not specified, the latest taxonomy will be used.

Returns:
indices: list(int)

Annotation indices

korus.db.find_negatives(file_tbl, annot_tbl, max_gap_ms=100, tag_id=1)[source]

Find time periods without annotations, also referred to as ‘negatives’.

Args:
file_tbl: pandas.DataFrame

Table of audio files generated by build_file_table().

annot_tbl: pandas.DataFrame

Table of annotations.

max_gap_ms: int

Negatives are allowed to span multiple audio files (from the same deployment) provided the temporal gap between the files is below this value.

tag_id: int

Tag index assigned to negatives

Returns:
neg_tbl: pandas.DataFrame

Negatives annotation table

korus.db.get_annotations(conn, indices=None, format='korus', label=None)[source]

Extract annotation data from the database.

TODO: create tests for the case format=”raven”

Args:
conn: sqlite3.Connection

Database connection

indices: list(int)

Indices in the annotation table. Optional.

format: bool

Currently supported formats are: korus, ketos, raven

label: int,str

Label assigned to all in the ketos formatted table. Optional.

Returns:
annot_tbl: Pandas DataFrame

Annotation table

korus.db.get_label_id(conn, source_type=None, taxonomy_id=None, ascend=False, descend=False, always_list=False)[source]

Returns the label identifier corresponding to a sound-source, sound-type tag tuple.

If @ascend is set to True, the function will also return the label ids of all the ancestral nodes in the taxonomy tree. For example, if the sound source is specified as SRKW, it will return labels corresponding not only to SRKW, but also KW, Toothed, Cetacean, Mammal, Bio, and Unknown.

If @descend is set to True, the function will also return the label ids of all the descendant nodes in the taxonomy tree. For example, if the sound source is specified as SRKW, it will return labels corresponding not only to SRKW, but also J, K, and L pod.

Args:
conn: sqlite3.Connection

Database connection

source_type: tuple(str, str) or list(tuple)

Sound source and sound type tags. The character ‘%’ can be used as wildcard. For example, use (‘SRKW’,’%’) to retrieve all labels associated with the sound source ‘SRKW’, irrespective of sound type. Multiple source-type pairs can be specified as a list of tuples.

taxonomy_id: int

Acoustic taxonomy unique identifier. If not specified, the latest taxonomy will be used.

ascend: bool

Also return the labels of ancestral nodes.

descend: bool

Also return the labels of descendant nodes.

always_list: bool

Whether to always return a list. Default is False.

Returns:
id: int, list(int)

Label identifier(s)

Raises:

ValueError: if a label with the specified @source_type does not exist

korus.db.get_taxonomy(conn, taxonomy_id=None, return_id=False)[source]

Loads the specified acoustic taxonomy from the database.

Args:
conn: sqlite3.Connection

Database connection

taxonomy_id: int

Acoustic taxonomy unique identifier. If not specified, the latest added taxonomy will be loaded.

return_id: bool

Whether to also return the taxonomy identifier. Default is False.

Returns:
tax: kx.AcousticTaxonomy

Acoustic taxonomy

id: int

Taxonomy identifier. Only returned if @return_id has been set to True.

korus.db.import_taxonomy(conn, src, name, new_name=None)[source]

Import an acoustic taxonomy.

Args:
conn: sqlite3.Connection

Database connection. (The database into which the taxonomy will be imported.)

src: str

Path to the database file (.sqlite) from which the taxonomy is being imported.

name: str

Name of the taxonomy.

new_name: str

Optional field for renaming the taxonomy.

Returns:
c: sqlite3.Cursor

Database cursor

Raises:
sqlite3.IntegrityError: if the database already contains

a taxonomy with the same name.

korus.db.insert_job(conn, values)[source]

Insert an annotation job into the database.

Args:
conn: sqlite3.Connection

Database connection

values: dict

Values to be inserted

Returns:
c: sqlite3.Cursor

Database cursor

korus.db.insert_row(conn, table_name, values)[source]

Insert a row of values into a table in the database.

Args:
conn: sqlite3.Connection

Database connection

table_name: str

Table name

values: dict

Values to be inserted

Returns:
c: sqlite3.Cursor

Database cursor

Raises:

sqlite3.IntegrityError: if the table already contains an entry with these data

korus.db.insert_taxonomy(conn, tax, comment=None, overwrite=False)[source]

Insert acoustic taxonomy into database.

Also adds all the sound-source, sound-type combinations to the table of allowed labels.

Args:
conn: sqlite3.Connection

Database connection

tax: kx.AcousticTaxonomy

Acoustic taxonomy

comment: str

Optional field. Typically used for describing the main changes made to the taxonomy since the last version.

overwrite: bool

Set to True to allow existing entries in the taxonomy table with the same name and version no. to be overwritten.

Returns:
c: sqlite3.Cursor

Database cursor

Raises:
sqlite3.IntegrityError: if the database already contains

a taxonomy with the same name and version no.