Annotation Interface

class korus.database.interface.annotation.AnnotationInterface(backend: TableBackend, taxonomy: TaxonomyInterface, job: JobInterface, file: FileInterface, tag: TagInterface, granularity: GranularityInterface)

Bases: TableInterface

add(row: dict) -> int

Add a single annotation to the table.

If the deployment ID is not specified, it will be inferred from file ID.

Either the UTC start time of the annotation or the within-file start time must be specified.

If the UTC start time is not specified, it will be inferred from the within-file start time, using the audio file’s UTC start time. Conversely, if the within-file start time is not specified, it will be inferred from the UTC start time.

If the duration is not specified, it is inferred by assuming that the annotation extends to the end of the specified audio file(s).

Args:
row: dict

Input data in the form of a dict, where the keys are the field names and the values are the values to be added to the database.
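The inference rules for the start times and the duration can be sketched in plain Python. This is only an illustration of the rules stated above, not the korus implementation; the helper name `infer_times` and its argument names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def infer_times(file_start_utc, file_duration, start_utc=None, start=None, duration=None):
    """Fill in whichever of the start times or the duration is missing.

    Hypothetical helper: the argument names are illustrative only.
    `start` is the within-file start time in seconds.
    """
    if start_utc is None and start is None:
        raise ValueError("either start_utc or start must be specified")
    if start_utc is None:
        # Infer the UTC start time from the within-file start time.
        start_utc = file_start_utc + timedelta(seconds=start)
    elif start is None:
        # Infer the within-file start time from the UTC start time.
        start = (start_utc - file_start_utc).total_seconds()
    if duration is None:
        # Assume the annotation extends to the end of the audio file.
        duration = file_duration - start
    return start_utc, start, duration


file_start = datetime(2023, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
result = infer_times(file_start, file_duration=600.0, start=30.0)
```

Here only the within-file start time is supplied, so both the UTC start time and the duration are inferred.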

add_batch(df: DataFrame, progress_bar: bool = False) -> list[int]

Add a batch of annotations to the table.

Args:
df: pandas.DataFrame

Annotations to be added to the table.

progress_bar: bool

Whether to display a progress bar.

Returns:
indices: list[int]

Row indices of the added entries.

create_selections(indices: list[int], window: float, step: float | None = None, center: bool = False, exclusive: bool = False, num_max: int | None = None, exclude: tuple[str, str] | list[tuple[str, str]] | None = None, data_support: bool = True, progress_bar: bool = False)

Create uniform-length selection windows on a set of annotations.

Args:
indices: list[int]

Annotation indices

window: float

Window size in seconds.

step: float

Step size in seconds. Used for creating temporally translated views of the same annotation. If None, at most one (1) selection will be created per annotation.

center: bool

Align the selection window temporally with the midpoint of the annotation. If False, the temporal alignment will be chosen at random (uniform distribution).

exclusive: bool

If True, the selection window is not allowed to contain anything but the annotated section of data. In other words, the selection window is not allowed to extend beyond the start/end point of the annotation. In particular, this means that selections will not be created for annotations shorter than window. Default is False.

num_max: int

Create at most this many selections.

exclude: tuple[str, str] | list[tuple[str, str]]

Only return selections that have been verified to not contain sounds with this (source,type) label. Note that the requirement extends to all ancestral and descendant nodes in the taxonomy tree. NOT YET IMPLEMENTED.

data_support: bool

If True, selection windows are not allowed to extend beyond the start/end times of the audio files in the database. Default is True.

progress_bar: bool

Whether to display a progress bar. Default is False.

Returns:
pandas.DataFrame

Selection table with columns sel_id, filename, start, end, and annot_id.
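The interplay of window, step and exclusive can be illustrated with a small sketch for the center=True case. This is one plausible reading of the parameter descriptions above, not the korus algorithm; the function `selection_windows` is hypothetical, and random alignment, num_max and data_support are left out.

```python
import math


def selection_windows(start, end, window, step=None, exclusive=False):
    """Return (start, end) windows centred on an annotation, plus shifted copies.

    Illustrative only: assumes center=True and ignores the other options.
    All times are in seconds.
    """
    if exclusive and (end - start) < window:
        return []  # annotation is too short to contain a full window
    first = (start + end) / 2 - window / 2  # window centred on the midpoint
    if step is None:
        return [(first, first + window)]  # at most one selection
    # Allowed range for the start of a shifted window.
    if exclusive:
        lo, hi = start, end - window  # window stays inside the annotation
    else:
        lo, hi = start - window, end  # window at least touches the annotation
    k_min = math.ceil((lo - first) / step)
    k_max = math.floor((hi - first) / step)
    return [(first + k * step, first + k * step + window) for k in range(k_min, k_max + 1)]


# A 10 s annotation, 4 s windows shifted in 2 s steps, kept inside the annotation.
windows = selection_windows(10, 20, window=4, step=2, exclusive=True)
```

With exclusive=True, an annotation shorter than the window yields no selections, which matches the description of the exclusive argument.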

filter(*conditions: dict, **kwargs)

Search the table.

Note: Search criteria specified by keyword arguments take priority over search criteria specified using the positional arguments. Specifically, keyword search criteria are inserted into every condition dict replacing any pre-existing criteria for the same field.

Args:
conditions: sequence of dict

Search criteria, where the keys are the field names and the values are the search values. Use tuples to search on a range of values and lists to search on multiple values.

Keyword args:
select: tuple | list[tuple]

Select annotations with this (source,type) label. The character ‘*’ can be used as wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the strict argument to change this behaviour.

exclude: tuple | list[tuple]

Exclude annotations with this (source,type) label, while still selecting annotations for which this (source,type) is recorded as the excluded label. The character ‘*’ can be used as wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the strict argument to change this behaviour.

strict: bool

Whether to interpret labels ‘strictly’, meaning that ancestral/descendant nodes in the taxonomy tree are not considered. For example, when filtering on ‘KW’, annotations labelled as ‘SRKW’ will not be selected if strict is set to True. Default is False. NOT YET IMPLEMENTED.

tentative: bool

Whether to filter on tentative label assignments, when available. Default is False.

ambiguous: bool

Whether to also filter on ambiguous label assignments. Default is False.

file: bool

If True, only include annotations pertaining to audio files in the database. Default is False. NOT YET IMPLEMENTED.

taxonomy_version: int

Acoustic taxonomy that the (source,type) label arguments refer to. If not specified, the latest version will be used.

Returns:
self: TableInterface

A reference to this instance

generate_negatives(job_id: int)

Generate negative annotations.

Here, negatives are understood as (uninterrupted) time periods during which no sounds were annotated.

Negatives are added to the annotation table with negative=True.

Args:
job_id: int

Job index
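The notion of a negative as an uninterrupted, unannotated time period can be illustrated with a short gap-finding sketch. How korus determines the time span covered by a job is not described here, so the span is passed in explicitly; the function `find_gaps` is hypothetical.

```python
def find_gaps(annotations, span_start, span_end):
    """Return uninterrupted (start, end) periods within the span that no annotation covers.

    Illustrative only: `annotations` is a list of (start, end) pairs in seconds.
    """
    gaps = []
    cursor = span_start  # end of the annotated region seen so far
    for start, end in sorted(annotations):
        if start > cursor:
            gaps.append((cursor, min(start, span_end)))
        cursor = max(cursor, end)
        if cursor >= span_end:
            break
    if cursor < span_end:
        gaps.append((cursor, span_end))
    return gaps


# Two overlapping annotations and one isolated annotation in a 100 s span.
negatives = find_gaps([(10, 20), (15, 30), (50, 60)], span_start=0, span_end=100)
```

Overlapping annotations are merged implicitly because the cursor only moves forward.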

load_raven(path: str, deployment_id: int | None = None, granularity: str = 'unit', taxonomy_version: int | None = None, progress_bar: bool = False)

Load annotations from a RavenPro TSV file.

Checks that the audio files exist in the database and that the labels exist in the taxonomy.

Args:
path: str

Path to the RavenPro file with tab-separated values (TSV).

deployment_id: int

If not specified, the annotation table must contain the column Deployment ID.

granularity: str

Annotation granularity for entries not marked as ‘Batch’ annotations.

taxonomy_version: int

Acoustic taxonomy that the (source,type) label arguments refer to. If not specified, the latest version will be used.

progress_bar: bool

Whether to display a progress bar. Default is False.

Returns:
df: pandas.DataFrame

The validated annotation table, with the format expected by the add_batch method.

df_raven: pandas.DataFrame

The input table with two extra columns:

* Valid (bool): True if the row was successfully validated; False if errors were detected.
* Errors (str): Errors produced by the validation algorithm.
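The idea of returning the input table with Valid and Errors columns can be sketched with the standard csv module. This is not the korus validation algorithm: the function `validate_rows`, the column names 'Begin File' and 'Label', and the error messages are all assumptions made for illustration.

```python
import csv
import io


def validate_rows(tsv_text, known_files, known_labels):
    """Parse TSV text and flag rows whose audio file or label is unknown.

    Hypothetical sketch: column names and error messages are assumptions.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    for row in rows:
        errors = []
        if row["Begin File"] not in known_files:
            errors.append("unknown audio file")
        if row["Label"] not in known_labels:
            errors.append("unknown label")
        row["Valid"] = not errors  # True only if no errors were detected
        row["Errors"] = "; ".join(errors)
    return rows


tsv = "Begin File\tLabel\na.wav\tKW\nb.wav\tXX\n"
checked = validate_rows(tsv, known_files={"a.wav"}, known_labels={"KW"})
```

Rows with Valid set to False can then be inspected and corrected before the validated table is passed on.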

to_raven(path: str, indices: int | list[int] | None = None)

Export annotations to a TSV file in RavenPro format.

Args:
path: str

Output path

indices: int | list[int]

The indices of the annotations to be exported. If None, all annotations are exported.
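For orientation, a tab-separated table of the general kind that RavenPro reads can be written with the standard csv module. The sketch below is not the korus exporter: the function `write_tsv` is hypothetical, and the three columns shown are an assumed minimal subset rather than the columns korus actually writes.

```python
import csv
import io


def write_tsv(rows, fieldnames):
    """Serialise a list of dicts as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Assumed column names, for illustration only.
columns = ["Selection", "Begin Time (s)", "End Time (s)"]
text = write_tsv([{"Selection": 1, "Begin Time (s)": 0.5, "End Time (s)": 2.0}], columns)
```

Writing to a file instead of a string only requires replacing the StringIO buffer with a file opened in text mode with newline="".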