Annotation Interface

class korus.database.interface.annotation.AnnotationInterface(backend: TableBackend, taxonomy: TaxonomyInterface, job: JobInterface, file: FileInterface, tag: TagInterface, granularity: GranularityInterface)

Bases: TableInterface

add(row: dict) → int

Add a single annotation to the table.

If the deployment ID is not specified, it will be inferred from the file ID.

Either the UTC start time of the annotation or the within-file start time must be specified.

If the UTC start time is not specified, it will be inferred from the within-file start time, using the audio file’s UTC start time. Conversely, if the within-file start time is not specified, it will be inferred from the UTC start time.

If the duration is not specified, it is inferred by assuming that the annotation extends to the end of the specified audio file(s).

Args:

row: dict: Input data in the form of a dict, where the keys are the field names and the values are the values to be added to the database.
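The timing rules above can be sketched in plain Python. This is a minimal illustration, not the korus implementation: the field names start_utc, start_s and duration_s and the helper infer_annotation_times are hypothetical, and the sketch assumes the annotation lies within a single audio file whose UTC start time and duration are known (the documented behaviour also covers annotations spanning several files).

```python
from datetime import datetime, timedelta


def infer_annotation_times(row, file_start_utc, file_duration_s):
    """Fill in missing timing fields of an annotation.

    Hypothetical field names: 'start_utc' (datetime), 'start_s' (seconds
    from the start of the audio file) and 'duration_s' (seconds).
    Assumes the annotation lies within a single audio file.
    """
    row = dict(row)  # work on a copy; leave the caller's dict untouched
    has_utc = row.get("start_utc") is not None
    has_offset = row.get("start_s") is not None
    if not (has_utc or has_offset):
        raise ValueError("either 'start_utc' or 'start_s' must be specified")

    if not has_utc:
        # UTC start = file UTC start + within-file offset
        row["start_utc"] = file_start_utc + timedelta(seconds=row["start_s"])
    elif not has_offset:
        # within-file offset = annotation UTC start - file UTC start
        row["start_s"] = (row["start_utc"] - file_start_utc).total_seconds()

    if row.get("duration_s") is None:
        # Assume the annotation runs to the end of the audio file
        row["duration_s"] = file_duration_s - row["start_s"]
    return row


file_start = datetime(2020, 6, 1, 12, 0, 0)
print(infer_annotation_times({"start_s": 10.0}, file_start, 60.0))
```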

add_batch(df: DataFrame, progress_bar: bool = False) → list[int]

Add a batch of annotations to the table.

Args:

df: pandas.DataFrame: Annotations to be added to the table.
progress_bar: bool: Whether to display a progress bar.

Returns:

indices: list[int]: Row indices of the added entries.

create_selections(indices: list[int], window: float, step: float | None = None, center: bool = False, exclusive: bool = False, num_max: int | None = None, exclude: tuple[str, str] | list[tuple[str, str]] | None = None, data_support: bool = True, progress_bar: bool = False)

Create uniform-length selection windows on a set of annotations.

Args:

indices: list[int]: Annotation indices
window: float: Window size in seconds.
step: float: Step size in seconds. Used for creating temporally translated views of the same annotation. If None, at most one (1) selection will be created per annotation.
center: bool: Align the selection window temporally with the midpoint of the annotation. If False, the temporal alignment will be chosen at random (uniform distribution).
exclusive: bool: If True, the selection window may contain only the annotated section of data. In other words, the selection window is not allowed to extend beyond the start/end points of the annotation. In particular, this means that no selections will be created for annotations shorter than window. Default is False.
num_max: int: Create at most this many selections.
exclude: tuple[str, str] | list[tuple[str, str]]: Only return selections that have been verified to not contain sounds with this (source,type) label. Note that the requirement extends to all ancestral and descendant nodes in the taxonomy tree. NOT YET IMPLEMENTED.
data_support: bool: If True, selection windows are not allowed to extend beyond the start/end times of the audio files in the database. Default is True.
progress_bar: bool: Whether to display a progress bar. Default is False.

Returns:

pandas.DataFrame: Selection table with columns sel_id, filename, start, end, annot_id.
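One possible reading of the window-placement options is sketched below. It is an assumption-laden illustration, not the korus algorithm: the helper place_windows is hypothetical and works on a single annotation given as start/end times in seconds. It assumes that a window either contains the whole annotation (when the annotation is shorter than the window) or lies entirely inside it (when it is longer), that step lays out evenly spaced copies of the window over that admissible range, and that data_support discards, rather than shifts, windows that leave the available audio.

```python
import math
import random


def place_windows(start, end, window, step=None, center=False, exclusive=False,
                  num_max=None, data_bounds=None, rng=None):
    """Return (start, end) selection windows for one annotation, in seconds.

    Hypothetical helper illustrating the create_selections options.
    data_bounds: optional (start, end) of the available audio.
    """
    if step is not None and step <= 0:
        raise ValueError("step must be positive")
    rng = rng or random.Random()
    if exclusive and end - start < window:
        return []  # annotation too short to hold a window entirely inside it

    # Admissible window start times: the window either covers the whole
    # annotation (short annotations) or lies within it (long annotations).
    lo, hi = min(start, end - window), max(start, end - window)
    slack = hi - lo
    if step is None:
        n, rest = 0, slack  # a single window
    else:
        n = math.floor(slack / step + 1e-9)  # tolerance for float round-off
        rest = slack - n * step
    # Position the window (or grid of windows): centred, or uniformly at random
    offset = rest / 2 if center else rng.uniform(0.0, rest)

    windows = []
    for k in range(n + 1):
        s = lo + offset + k * (step or 0.0)
        e = s + window
        if data_bounds is not None and (s < data_bounds[0] or e > data_bounds[1]):
            continue  # window would extend beyond the available audio
        windows.append((s, e))
    return windows if num_max is None else windows[:num_max]


print(place_windows(0.0, 10.0, 4.0, step=2.0, center=True))
```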

filter(*conditions: dict, **kwargs)

Search the table.

Note: Search criteria specified by keyword arguments take priority over search criteria specified using the positional arguments. Specifically, keyword search criteria are inserted into every condition dict, replacing any pre-existing criteria for the same field.

Args:

conditions: sequence of dict: Search criteria, where the keys are the field names and the values are the search values. Use tuples to search on a range of values and lists to search on multiple values.

Keyword args:

select: tuple | list[tuple]: Select annotations with this (source,type) label. The character ‘*’ can be used as wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the strict argument to change this behaviour.
exclude: tuple | list[tuple]: Exclude annotations with this (source,type) label, but select annotations with this (source,type) excluded_label. The character ‘*’ can be used as wildcard. Accepts both a single tuple and a list of tuples. By default all descendant nodes in the taxonomy tree are also considered. Use the strict argument to change this behaviour.
strict: bool: Whether to interpret labels ‘strictly’, meaning that ancestral/descendant nodes in the taxonomy tree are not considered. For example, when filtering on ‘KW’, annotations labelled as ‘SRKW’ will not be selected if strict is set to True. Default is False. NOT YET IMPLEMENTED.
tentative: bool: Whether to filter on tentative label assignments, when available. Default is False.
ambiguous: bool: Whether to also filter on ambiguous label assignments. Default is False.
file: bool: If True, only include annotations pertaining to audio files in the database. Default is False. NOT YET IMPLEMENTED.
taxonomy_version: int: Acoustic taxonomy that the (source,type) label arguments refer to. If not specified, the latest version will be used.
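The wildcard and descendant rules for select and exclude can be illustrated with a small tree walk. This is a sketch under stated assumptions, not the korus implementation: the taxonomy is represented here by hypothetical child-to-parent dicts (one for sound sources, one for sound types), and label_matches is a hypothetical helper. Since strict is marked as not yet implemented, the sketch only shows its described semantics.

```python
def is_self_or_descendant(node, query, parent):
    """True if `node` equals `query` or lies below it in the tree."""
    while node is not None:
        if node == query:
            return True
        node = parent.get(node)  # climb one level; None at the root
    return False


def label_matches(label, query, source_parent, type_parent=None, strict=False):
    """Check a (source, type) label against a (source, type) query.

    '*' in the query matches anything. Unless strict is True, descendants
    of the queried node also match.
    """
    type_parent = type_parent or {}
    for value, q, parent in zip(label, query, (source_parent, type_parent)):
        if q == "*":
            continue  # wildcard: any value is accepted
        if strict:
            if value != q:
                return False
        elif not is_self_or_descendant(value, q, parent):
            return False
    return True


# Hypothetical source hierarchy: SRKW and NRKW are children of KW
sources = {"SRKW": "KW", "NRKW": "KW", "KW": "Odontocete"}
print(label_matches(("SRKW", "S01"), ("KW", "*"), sources))
```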

Returns:

self: TableInterface: A reference to this instance
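The precedence rule in the note above, where keyword criteria overwrite the same field in every condition dict, can be sketched as follows. The helpers are hypothetical and operate on plain dicts rather than on the database. The sketch also assumes that a row matches if it satisfies any one of the condition dicts and that tuple ranges are inclusive, and it treats keyword arguments as ordinary field criteria (special keywords such as select or exclude are not modelled).

```python
def merge_conditions(conditions, **kwargs):
    """Insert keyword criteria into every condition dict, overriding same-named fields."""
    if not conditions:
        conditions = ({},)  # keyword criteria alone still form one condition
    return [{**cond, **kwargs} for cond in conditions]


def matches(row, cond):
    """True if `row` satisfies every field criterion in `cond`."""
    for field, value in cond.items():
        x = row.get(field)
        if isinstance(value, tuple):  # (min, max) range, assumed inclusive
            lo, hi = value
            if x is None or not lo <= x <= hi:
                return False
        elif isinstance(value, list):  # any of several values
            if x not in value:
                return False
        elif x != value:
            return False
    return True


def filter_rows(rows, *conditions, **kwargs):
    """Return indices of rows matching at least one (merged) condition."""
    merged = merge_conditions(conditions, **kwargs)
    return [i for i, row in enumerate(rows) if any(matches(row, c) for c in merged)]


rows = [{"job_id": 1, "duration": 2.0},
        {"job_id": 2, "duration": 5.0},
        {"job_id": 3, "duration": 9.0}]
print(filter_rows(rows, {"job_id": [1, 2]}, {"duration": (8, 10)}))
```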

generate_negatives(job_id: int)

Generate negative annotations.

Here, negatives are understood as (uninterrupted) time periods during which no sounds were annotated.

Negatives are added to the annotation table with negative=True.

Args:

job_id: int: Job index
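Finding negatives amounts to computing the complement of the annotated intervals. The sketch below assumes that each annotation is a (start, end) pair in seconds and that negatives are sought within a known analysed period; both this representation and the helper find_negatives are hypothetical, and how korus determines the period covered by a job is not shown.

```python
def find_negatives(annotations, period):
    """Return the uninterrupted gaps in `period` not covered by any annotation."""
    t0, t1 = period
    gaps = []
    cursor = t0  # end of the annotated coverage seen so far
    for start, end in sorted(annotations):
        if start > cursor:
            gaps.append((cursor, min(start, t1)))  # unannotated stretch
        cursor = max(cursor, end)  # overlapping annotations merge naturally
        if cursor >= t1:
            break
    if cursor < t1:
        gaps.append((cursor, t1))  # tail after the last annotation
    return gaps


print(find_negatives([(2, 4), (3, 6), (8, 9)], (0, 10)))
```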

load_raven(path: str, deployment_id: int | None = None, granularity: str = 'unit', taxonomy_version: int | None = None, progress_bar: bool = False)

Load annotations from a RavenPro TSV file.

Checks that the audio files exist in the database and that the labels exist in the taxonomy.

Args:

path: str: Path to the RavenPro file with tab-separated values (TSV).
deployment_id: int: If not specified, the annotation table must contain the column Deployment ID.
granularity: str: Annotation granularity for entries not marked as ‘Batch’ annotations.
taxonomy_version: int: Acoustic taxonomy that the (source,type) label arguments refer to. If not specified, the latest version will be used.
progress_bar: bool: Whether to display a progress bar. Default is False.

Returns:

df: pandas.DataFrame: The validated annotation table, with the format expected by the add_batch method.
df_raven: pandas.DataFrame: The input table with two extra columns:
  * Valid (bool): True if the row was successfully validated, False if errors were detected.
  * Errors (str): Errors produced by the validation algorithm.
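The validation step can be pictured as follows. This sketch is not the korus parser: it takes the TSV content as a string rather than a path, and the column names Begin File, Sound Source and Sound Type, like the helper validate_raven, are hypothetical. It checks each row against a set of known audio files and a set of known (source,type) labels and appends Valid and Errors fields, mirroring the df_raven output described above.

```python
import csv
import io


def validate_raven(text, known_files, known_labels):
    """Validate tab-separated rows; add 'Valid' and 'Errors' to each row.

    Column names used here are hypothetical placeholders.
    """
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    for row in rows:
        errors = []
        if row.get("Begin File") not in known_files:
            errors.append(f"unknown audio file: {row.get('Begin File')}")
        label = (row.get("Sound Source"), row.get("Sound Type"))
        if label not in known_labels:
            errors.append(f"label not in taxonomy: {label}")
        row["Valid"] = not errors  # True only if no errors were found
        row["Errors"] = "; ".join(errors)
    return rows


text = ("Selection\tBegin File\tSound Source\tSound Type\n"
        "1\ta.wav\tKW\tS01\n"
        "2\tb.wav\tHW\tSong\n")
for row in validate_raven(text, {"a.wav"}, {("KW", "S01")}):
    print(row["Selection"], row["Valid"], row["Errors"])
```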

to_raven(path: str, indices: int | list[int] | None = None)

Export annotations to a TSV file in RavenPro format.

Args:

path: str: Output path.
indices: int | list[int]: The indices of the annotations to be exported. If None, all annotations are exported.
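A minimal sketch of the export logic follows, assuming annotations are held in a plain list of dicts with hypothetical start and end keys (in seconds) and using list positions in place of database indices. Selection, Begin Time (s) and End Time (s) are column headers commonly found in Raven selection tables, but the exact set of columns korus writes is not shown here. Unlike to_raven, the hypothetical annotations_to_tsv returns the text instead of writing it to path.

```python
import csv
import io


def annotations_to_tsv(annotations, indices=None):
    """Render selected annotations as Raven-style tab-separated text."""
    if indices is None:
        indices = list(range(len(annotations)))  # export everything
    elif isinstance(indices, int):
        indices = [indices]  # a single index is accepted too
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Selection", "Begin Time (s)", "End Time (s)"])
    for sel, i in enumerate(indices, start=1):  # Raven selections count from 1
        a = annotations[i]
        writer.writerow([sel, a["start"], a["end"]])
    return out.getvalue()


ann = [{"start": 1.0, "end": 2.5}, {"start": 3.0, "end": 4.0}]
print(annotations_to_tsv(ann))
```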