Database
- korus.db.add_annotations(conn, annot_tbl, job_id, progress_bar=False, error='replace')
Add a set of annotations to the database.
The annotations must be provided as a pandas DataFrame or a Python dictionary with the following structure:

Annotation table

| Name | Type | Default Value | Description |
|---|---|---|---|
| job_id | int | | Annotation job index |
| deployment_id | int | | Hydrophone deployment index |
| file_id | int | | Audio file index |
| sound_source | str | | Sound-source assignment (confident) |
| sound_type | str | | Sound-type assignment (confident) |
| tentative_sound_source | str | None | Tentative sound-source assignment |
| tentative_sound_type | str | None | Tentative sound-type assignment |
| ambiguous_sound_source | str or list(str) | None | A range of possible sound-source assignments; if specified as a str, use commas to separate the individual assignments |
| ambiguous_sound_type | str or list(str) | None | A range of possible sound-type assignments; if specified as a str, use commas to separate the individual assignments |
| tag | str or list(str) | None | Keywords that can be used in database queries; if specified as a str, use commas to separate individual tags |
| start_utc | datetime or str | Deduced from the audio file timestamp (if available) and the within-file offset (@start_ms) | UTC start time; if specified as a str, use an ISO 8601 compatible format |
| duration_ms | int | Computed as the audio file duration minus the within-file offset (@start_ms) | Duration in milliseconds |
| start_ms | int | 0 | Within-file offset in milliseconds |
| freq_min_hz | int | 0 | Minimum frequency in Hz |
| freq_max_hz | int | Nyquist frequency deduced from the audio file sampling rate | Maximum frequency in Hz |
| channel | int | 0 | Channel no. |
| granularity | str | window | Possible values are: call, window, batch, file, encounter |
| machine_prediction | dict | None | Reserved for predictions made by algorithms and machine-learning models |
| comments | str | None | Any other relevant observation |
Columns without a default value are mandatory; columns with a default value are optional.
deployment_id and start_utc are normally not required, as they are inferred from the file_id, but must be specified in cases where the file_id column is missing, or some rows have invalid/missing file IDs.
Annotations without file IDs are inserted into the database with the ID value 0 (zero).
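The default values listed in the table can be illustrated with a small sketch. The helper `fill_defaults` below is hypothetical and is not part of korus; it only mirrors the documented defaults for a single annotation given as a dict, assuming the file duration and sampling rate are already known. The label values are placeholders.

```python
def fill_defaults(row, file_duration_ms, sample_rate):
    """Return a copy of `row` with optional columns set to their documented defaults.

    Hypothetical helper for illustration only; not the korus implementation.
    """
    out = dict(row)
    out.setdefault("start_ms", 0)
    out.setdefault("freq_min_hz", 0)
    out.setdefault("channel", 0)
    out.setdefault("granularity", "window")
    # Duration defaults to the part of the file that follows the offset
    out.setdefault("duration_ms", file_duration_ms - out["start_ms"])
    # The Nyquist frequency is half the sampling rate
    out.setdefault("freq_max_hz", sample_rate // 2)
    # Tags may be given as a comma-separated str or as a list of str
    tag = out.get("tag")
    if isinstance(tag, str):
        out["tag"] = [t.strip() for t in tag.split(",") if t.strip()]
    return out


# Placeholder labels; real values must exist in the acoustic taxonomy
row = {
    "job_id": 1,
    "file_id": 7,
    "sound_source": "SRKW",
    "sound_type": "PC",
    "start_ms": 1500,
    "tag": "faint, overlapping",
}
filled = fill_defaults(row, file_duration_ms=60000, sample_rate=48000)
print(filled["duration_ms"], filled["freq_max_hz"], filled["tag"])
```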
TODO: check that tentative (source,type) assignments are more specific cases of confident assignments
TODO: check that there are no conflicts with existing annotations in the database
- Args:
- conn: sqlite3.Connection
Database connection
- annot_tbl: pandas DataFrame or dict
Table of annotations.
- deployment_id: int
Deployment unique identifier
- job_id: int
Annotation job unique identifier
- progress_bar: bool
Display progress bar. Default is False.
- error: str
Error handling. NOT YET IMPLEMENTED. Options are:
- a/abort: If any of the annotations have invalid data, abort the entire submission
- i/ignore: Ignore any annotations with invalid data, but proceed with submitting all other annotations to the database
- r/replace: Automatically replace invalid data fields with default values (where possible) and flag the affected annotations for review; if replacement is not possible, switch to manual mode.
- m/manual: Manually review and fix every annotation with invalid data
- Returns:
- annot_ids: list(int)
Unique identifiers assigned to the annotations
- Raises:
- ValueError: If the input table contains annotations with invalid (source,type) assignments. Note: this consistency check is only performed for confident and tentative assignments, not for ambiguous assignments.
- AssertionError: If the annotation table does not have the required columns.
- korus.db.add_negatives(conn, job_id)
Auto-generate ‘negative’ annotations for a specified annotation job.
Note: This function should only be called once the annotation job is complete and all annotations have been submitted to the database.
TODO: consider renaming this function to generate_negatives
TODO: add option to remove existing, auto-generated negatives for this job, e.g., with a boolean argument @replace with default value True
- Args:
- conn: sqlite3.Connection
Database connection
- job_id: int
Annotation job index. Note that the job must be ‘exhaustive’
- Returns:
- annot_ids: list(int)
Unique identifiers assigned to the annotations
- Raises:
AssertionError: if the job is not ‘exhaustive’
- korus.db.assign_files_to_job(conn, job_id, file_id, channel=0, extendable=True)
Associates a set of audio files with an annotation job.
The association is made by adding (file_id, job_id, channel) tuples to the ‘file_job_relation’ table in the database.
- Args:
- conn: sqlite3.Connection
Database connection
- job_id: int
Annotation job unique identifier
- file_id: int, list(int)
Audio file unique identifier(s)
- channel: int, list(int), list(list(int))
For multi-channel recordings, specifies which channels were inspected as part of the annotation job. Can be a single int, a list of ints with len(channel) = no. channels, or a nested list of ints with len(channel) = no. files and len(channel[i]) = no. channels for file i.
- extendable: bool
Allow this function to be called multiple times for the same annotation job. True by default. Set to False to help ensure that the database only contains completed annotation jobs.
- Returns:
- counter: int
Number of entries successfully added to the file_job_relation table.
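The three accepted forms of the channel argument can be sketched as follows. The function `expand_relations` is a hypothetical illustration, not the korus implementation; it assumes that a flat list of channels applies to every file and only shows how the (file_id, job_id, channel) tuples might be enumerated.

```python
def expand_relations(job_id, file_id, channel=0):
    """Enumerate (file_id, job_id, channel) tuples for the three channel forms.

    Hypothetical helper for illustration only.
    """
    file_ids = [file_id] if isinstance(file_id, int) else list(file_id)
    if isinstance(channel, int):
        # Single int: the same channel for every file
        per_file = [[channel]] * len(file_ids)
    elif all(isinstance(c, int) for c in channel):
        # Flat list: the same set of channels for every file (assumption)
        per_file = [list(channel)] * len(file_ids)
    else:
        # Nested list: one list of channels per file
        if len(channel) != len(file_ids):
            raise ValueError("nested channel list must have one entry per file")
        per_file = [list(c) for c in channel]
    return [(f, job_id, c) for f, chans in zip(file_ids, per_file) for c in chans]


single = expand_relations(3, [10, 11])
flat = expand_relations(3, [10, 11], channel=[0, 1])
nested = expand_relations(3, [10, 11], channel=[[0], [0, 1]])
print(single, flat, nested, sep="\n")
```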
- korus.db.build_file_table(conn, job_id, top=False)
Returns a table with the audio files that were inspected as part of a given annotation job or set of jobs.
The table has the following columns,
file_id (int): audio file unique identifier
deployment_id (int): deployment unique identifier
filename (str): audio filename
relative_path (str): relative path to audio file
sample_rate (int): sampling rate in samples/s
start_utc (datetime): file UTC start time
end_utc (datetime): file UTC end time
channel (str): the channels that were inspected (0;1;…)
Optionally, the following column may be included,
top_path (str): path to the top directory, relative to which the audio file relative paths are specified
- Args:
- conn: sqlite3.Connection
Database connection
- job_id: int, list(int)
Annotation job unique identifier(s)
- top: bool
Whether to include the path to the top directory
- Returns:
- file_tbl: pandas.DataFrame
File table
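As a rough illustration of how the channel column (0;1;…) could be derived from the file_job_relation table, the sketch below joins a toy file table with toy relation entries in an in-memory SQLite database. The `file` table layout and the helper `file_rows` are assumptions made for this example; the actual Korus schema and query may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy schema: only file_job_relation and its three fields come from the text above
conn.executescript("""
    CREATE TABLE file (id INTEGER PRIMARY KEY, deployment_id INTEGER, filename TEXT);
    CREATE TABLE file_job_relation (file_id INTEGER, job_id INTEGER, channel INTEGER);
    INSERT INTO file VALUES (10, 1, 'a.wav'), (11, 1, 'b.wav');
    INSERT INTO file_job_relation VALUES (10, 3, 0), (10, 3, 1), (11, 3, 0), (11, 4, 0);
""")


def file_rows(conn, job_id):
    """Collect one row per file inspected in the given job(s) (hypothetical helper)."""
    job_ids = [job_id] if isinstance(job_id, int) else list(job_id)
    marks = ",".join("?" * len(job_ids))
    query = f"""
        SELECT f.id, f.deployment_id, f.filename, r.channel
        FROM file AS f JOIN file_job_relation AS r ON r.file_id = f.id
        WHERE r.job_id IN ({marks})
        ORDER BY f.id, r.channel
    """
    rows = {}
    for fid, dep, name, ch in conn.execute(query, job_ids):
        row = rows.setdefault(
            fid, {"file_id": fid, "deployment_id": dep, "filename": name, "channel": []}
        )
        if ch not in row["channel"]:  # a channel may appear in several jobs
            row["channel"].append(ch)
    # Join the inspected channels into a single string such as "0;1"
    return [dict(r, channel=";".join(map(str, r["channel"]))) for r in rows.values()]


table = file_rows(conn, 3)
print(table)
```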
- korus.db.create_db(path)
Create an SQLite database with the Korus schema
- Args:
- path: str
Full path to the database file (.sqlite)
- Returns:
- conn: sqlite3.Connection
Database connection
- korus.db.filter_annotation(conn, source_type=None, exclude=None, tag=None, granularity=None, invert=False, strict=False, tentative=False, ambiguous=False, file=False, valid=False, taxonomy_id=None, job_id=None, deployment_id=None)
Query annotation table by filtering on sound source and sound type.
TODO: implement strict
TODO: implement file
TODO: implement valid
TODO: consider renaming source_type to select
- Args:
- conn: sqlite3.Connection
Database connection
- source_type: tuple, list(tuple)
Select annotations with this (source,type) label. The character ‘%’ can be used as wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the @strict argument to change this behaviour.
- exclude: tuple, list(tuple)
Select annotations with this (source,type) exclusion label while also excluding annotations with this (source,type) label. The character ‘%’ can be used as wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the @strict argument to change this behaviour.
- tag: str, list(str)
Select annotations with this tag.
- granularity: str, list(str)
Annotation granularity. Options are ‘unit’, ‘window’, ‘file’, ‘batch’, ‘encounter’.
- invert: bool
Invert the label filtering criteria so that annotations with the (source,type) specified by the @source_type argument are excluded rather than selected. By default both ancestral and descendant nodes in the taxonomy tree are considered when performing an inverted search. Use the @strict argument to change this behaviour. Default is False.
- strict: bool
Whether to interpret labels ‘strictly’, meaning that ancestral/descendant nodes in the taxonomy tree are not considered. For example, when filtering on ‘KW’, annotations labelled as ‘SRKW’ will not be selected if @strict is set to True. Default is False.
- tentative: bool
Whether to filter on tentative label assignments, when available. Default is False.
- ambiguous: bool
Whether to also filter on ambiguous label assignments. Default is False.
- file: bool
If True, exclude annotations pertaining to audio files not present in the database. Default is False. NOT YET IMPLEMENTED.
- valid: bool
If True, exclude annotations with invalid data or flagged as requiring review. Default is False. NOT YET IMPLEMENTED.
- taxonomy_id: int
Acoustic taxonomy that the (source,type) label arguments refer to. If not specified, the latest taxonomy will be used.
- job_id: int, list(int)
Restrict search to the specified annotation job(s).
- deployment_id: int, list(int)
Restrict search to the specified deployment(s).
- Returns:
- indices: list(int)
Annotation indices
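The ‘%’ wildcard in (source,type) labels behaves like the SQL LIKE wildcard. The sketch below illustrates this on a toy label table in an in-memory SQLite database; the table layout, the label values, and the helper `match_labels` are assumptions made for this example and do not reflect the actual Korus schema. Taxonomy traversal (descendant nodes) is not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy label table with placeholder (source, type) pairs
conn.executescript("""
    CREATE TABLE label (id INTEGER PRIMARY KEY, sound_source TEXT, sound_type TEXT);
    INSERT INTO label (sound_source, sound_type) VALUES
        ('KW', 'PC'), ('SRKW', 'PC'), ('SRKW', 'W'), ('HW', 'W');
""")


def match_labels(conn, source_type):
    """Return ids of labels matching one or several (source, type) patterns.

    Hypothetical helper; accepts a single tuple or a list of tuples.
    """
    pairs = [source_type] if isinstance(source_type, tuple) else list(source_type)
    ids = set()
    for source, stype in pairs:
        cur = conn.execute(
            "SELECT id FROM label WHERE sound_source LIKE ? AND sound_type LIKE ?",
            (source, stype),
        )
        ids.update(row[0] for row in cur)
    return sorted(ids)


srkw_any_type = match_labels(conn, ("SRKW", "%"))
mixed = match_labels(conn, [("KW", "PC"), ("%", "W")])
print(srkw_any_type, mixed)
```

Note that SQLite's LIKE is case-insensitive for ASCII characters by default and also treats ‘_’ as a single-character wildcard.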
- korus.db.filter_files(conn, deployment_id=None, start_utc=None, end_utc=None, job_id=None)
Search for the files in the database based on deployment and time range
- Args:
- conn: sqlite3.Connection
Database connection
- deployment_id: int
Deployment identifier
- start_utc: datetime.datetime
UTC start time
- end_utc: datetime.datetime
UTC end time
- job_id: int
Job identifier
- Returns:
- ids: list(int)
File identifiers matching the search criteria
- korus.db.filter_negative(conn, source_type=None, strict=False, taxonomy_id=None)
Query annotation table by filtering on auto-generated negatives.
- Args:
- conn: sqlite3.Connection
Database connection
- source_type: tuple, list(tuple)
Select auto-generated annotations guaranteed to not contain any sounds of the class (source,type) or descendant classes. The character ‘%’ can be used as wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the @strict argument to change this behaviour.
- strict: bool
Whether to interpret labels ‘strictly’, meaning that descendant nodes in the taxonomy tree are not considered. For example, when filtering on ‘KW’, annotations labelled as ‘SRKW’ will not be selected if @strict is set to True.
- taxonomy_id: int
Acoustic taxonomy that the (source,type) label arguments refer to. If not specified, the latest taxonomy will be used.
- Returns:
- indices: list(int)
Annotation indices
- korus.db.find_negatives(file_tbl, annot_tbl, max_gap_ms=100, tag_id=1)
Find time periods without annotations, also referred to as ‘negatives’.
- Args:
- file_tbl: pandas.DataFrame
Table of audio files generated by build_file_table().
- annot_tbl: pandas.DataFrame
Table of annotations.
- max_gap_ms: int
Negatives are allowed to span multiple audio files (from the same deployment) provided the temporal gap between the files is below this value.
- tag_id: int
Tag index assigned to negatives
- Returns:
- neg_tbl: pandas.DataFrame
Negatives annotation table
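The idea of a ‘negative’ as a time period without annotations can be sketched for a single audio file as the complement of the annotated intervals. The function `gaps_in_file` below is a hypothetical illustration and not the korus implementation; it ignores channels and does not merge negatives across files (the role of @max_gap_ms).

```python
def gaps_in_file(file_duration_ms, annotations):
    """Return the un-annotated (start_ms, end_ms) periods within one file.

    `annotations` is an iterable of (start_ms, duration_ms) pairs.
    Hypothetical helper for illustration only.
    """
    negatives = []
    cursor = 0  # end of the annotated region seen so far
    for start, duration in sorted(annotations):
        # Clip the annotation to the extent of the file
        end = min(start + duration, file_duration_ms)
        start = max(0, min(start, file_duration_ms))
        if start > cursor:
            negatives.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < file_duration_ms:
        negatives.append((cursor, file_duration_ms))
    return negatives


# Two overlapping annotations and one that runs past the end of a 10 s file
gaps = gaps_in_file(10000, [(2000, 1000), (2500, 2000), (8000, 5000)])
print(gaps)
```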
- korus.db.get_annotations(conn, indices=None, format='korus', label=None)
Extract annotation data from the database.
TODO: create tests for the case format='raven'
- Args:
- conn: sqlite3.Connection
Database connection
- indices: list(int)
Indices in the annotation table. Optional.
- format: str
Currently supported formats are: korus, ketos, raven
- label: int, str
Label assigned to all annotations in the ketos-formatted table. Optional.
- Returns:
- annot_tbl: Pandas DataFrame
Annotation table
- korus.db.get_label_id(conn, source_type=None, taxonomy_id=None, ascend=False, descend=False, always_list=False)
Returns the label identifier corresponding to a sound-source, sound-type tag tuple.
If @ascend is set to True, the function will also return the label ids of all the ancestral nodes in the taxonomy tree. For example, if the sound source is specified as SRKW, it will return labels corresponding not only to SRKW, but also KW, Toothed, Cetacean, Mammal, Bio, and Unknown.
If @descend is set to True, the function will also return the label ids of all the descendant nodes in the taxonomy tree. For example, if the sound source is specified as SRKW, it will return labels corresponding not only to SRKW, but also J, K, and L pod.
- Args:
- conn: sqlite3.Connection
Database connection
- source_type: tuple(str, str) or list(tuple)
Sound source and sound type tags. The character ‘%’ can be used as wildcard. For example, use (‘SRKW’,‘%’) to retrieve all labels associated with the sound source ‘SRKW’, irrespective of sound type. Multiple source-type pairs can be specified as a list of tuples.
- taxonomy_id: int
Acoustic taxonomy unique identifier. If not specified, the latest taxonomy will be used.
- ascend: bool
Also return the labels of ancestral nodes.
- descend: bool
Also return the labels of descendant nodes.
- always_list: bool
Whether to always return a list. Default is False.
- Returns:
- id: int, list(int)
Label identifier(s)
- Raises:
ValueError: if a label with the specified @source_type does not exist
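The effect of @ascend and @descend can be illustrated with a toy taxonomy built from the SRKW example above. The parent map, the node names for the pods, and the helper functions are assumptions made for this sketch; the real function returns label identifiers rather than node names.

```python
# Toy sound-source tree: child -> parent (node names follow the SRKW example)
PARENT = {
    "Bio": "Unknown",
    "Mammal": "Bio",
    "Cetacean": "Mammal",
    "Toothed": "Cetacean",
    "KW": "Toothed",
    "SRKW": "KW",
    "J": "SRKW",
    "K": "SRKW",
    "L": "SRKW",
}


def ancestors(node):
    """Walk up the tree from `node` to the root."""
    out = []
    while node in PARENT:
        node = PARENT[node]
        out.append(node)
    return out


def descendants(node):
    """Collect all nodes below `node`, depth first."""
    out = []
    for child, parent in PARENT.items():
        if parent == node:
            out.append(child)
            out.extend(descendants(child))
    return out


def related(node, ascend=False, descend=False):
    """Return `node` plus, optionally, its ancestors and/or descendants."""
    nodes = [node]
    if ascend:
        nodes += ancestors(node)
    if descend:
        nodes += descendants(node)
    return nodes


up = related("SRKW", ascend=True)
down = related("SRKW", descend=True)
print(up, down, sep="\n")
```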
- korus.db.get_taxonomy(conn, taxonomy_id=None, return_id=False)
Loads the specified acoustic taxonomy from the database.
- Args:
- conn: sqlite3.Connection
Database connection
- taxonomy_id: int
Acoustic taxonomy unique identifier. If not specified, the latest added taxonomy will be loaded.
- return_id: bool
Whether to also return the taxonomy identifier. Default is False.
- Returns:
- tax: kx.AcousticTaxonomy
Acoustic taxonomy
- id: int
Taxonomy identifier. Only returned if @return_id has been set to True.
- korus.db.import_taxonomy(conn, src, name, new_name=None)
Import an acoustic taxonomy.
- Args:
- conn: sqlite3.Connection
Database connection. (The database into which the taxonomy will be imported.)
- src: str
Path to the database file (.sqlite) from which the taxonomy is being imported.
- name: str
Name of the taxonomy.
- new_name: str
Optional field for renaming the taxonomy.
- Returns:
- c: sqlite3.Cursor
Database cursor
- Raises:
- sqlite3.IntegrityError: if the database already contains
a taxonomy with the same name.
- korus.db.insert_job(conn, values)
Insert an annotation job into the database.
- Args:
- conn: sqlite3.Connection
Database connection
- values: dict
Values to be inserted
- Returns:
- c: sqlite3.Cursor
Database cursor
- korus.db.insert_row(conn, table_name, values)
Insert a row of values into a table in the database.
- Args:
- conn: sqlite3.Connection
Database connection
- table_name: str
Table name
- values: dict
Values to be inserted
- Returns:
- c: sqlite3.Cursor
Database cursor
- Raises:
sqlite3.IntegrityError: if the table already contains an entry with these data
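A dict-to-row insert of this kind can be sketched with the standard sqlite3 module alone. The helper `insert_row_sketch`, the toy `job` table, and the decision to commit after each insert are assumptions made for this example; the korus implementation may differ.

```python
import sqlite3


def insert_row_sketch(conn, table_name, values):
    """Insert the dict `values` as one row and return the cursor.

    Hypothetical helper. Table and column names cannot be bound as SQL
    parameters, so they must come from trusted code, never from user input.
    """
    columns = ", ".join(values.keys())
    marks = ", ".join("?" for _ in values)
    cur = conn.cursor()
    cur.execute(
        f"INSERT INTO {table_name} ({columns}) VALUES ({marks})",
        tuple(values.values()),
    )
    conn.commit()
    return cur


conn = sqlite3.connect(":memory:")
# Toy table with a uniqueness constraint so that duplicates are rejected
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, annotator TEXT UNIQUE)")

c = insert_row_sketch(conn, "job", {"annotator": "AB"})
first_id = c.lastrowid

try:
    insert_row_sketch(conn, "job", {"annotator": "AB"})
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # Raised because the table already contains an entry with these data
    duplicate_rejected = True

print(first_id, duplicate_rejected)
```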
- korus.db.insert_taxonomy(conn, tax, comment=None, overwrite=False)
Insert acoustic taxonomy into database.
Also adds all the sound-source, sound-type combinations to the table of allowed labels.
- Args:
- conn: sqlite3.Connection
Database connection
- tax: kx.AcousticTaxonomy
Acoustic taxonomy
- comment: str
Optional field. Typically used for describing the main changes made to the taxonomy since the last version.
- overwrite: bool
Set to True to allow existing entries in the taxonomy table with the same name and version no. to be overwritten.
- Returns:
- c: sqlite3.Cursor
Database cursor
- Raises:
- sqlite3.IntegrityError: if the database already contains
a taxonomy with the same name and version no.