Annotation Interface
- class korus.database.interface.annotation.AnnotationInterface(backend: TableBackend, taxonomy: TaxonomyInterface, job: JobInterface, file: FileInterface, tag: TagInterface, granularity: GranularityInterface)
Bases: TableInterface
- add(row: dict) → int
Add a single annotation to the table.
If the deployment ID is not specified, it will be inferred from file ID.
Either the UTC start time of the annotation or the within-file start time must be specified.
If the UTC start time is not specified, it will be inferred from the within-file start time, using the audio file’s UTC start time. Conversely, if the within-file start time is not specified, it will be inferred from the UTC start time.
If the duration is not specified, it is inferred by assuming that the annotation extends to the end of the specified audio file(s).
- Args:
- row: dict
Input data in the form of a dict, where the keys are the field names and the values are the values to be added to the database.
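The inference rules described for add can be sketched in plain Python. This only illustrates the documented logic and is not korus code: the helper `infer_times` and its parameter names are hypothetical, and a single audio file of known duration is assumed.

```python
from datetime import datetime, timedelta, timezone


def infer_times(file_start_utc, file_duration, start_utc=None, start=None, duration=None):
    """Fill in missing timing fields for one annotation (hypothetical helper).

    file_start_utc: UTC start time of the audio file (datetime)
    file_duration:  length of the audio file in seconds
    start_utc:      UTC start time of the annotation, or None
    start:          within-file start time in seconds, or None
    duration:       annotation duration in seconds, or None
    """
    if start_utc is None and start is None:
        raise ValueError("specify either start_utc or the within-file start")

    if start_utc is None:
        # Infer the UTC start from the within-file offset.
        start_utc = file_start_utc + timedelta(seconds=start)
    elif start is None:
        # Infer the within-file offset from the UTC start.
        start = (start_utc - file_start_utc).total_seconds()

    if duration is None:
        # Assume the annotation runs to the end of the audio file.
        duration = file_duration - start

    return start_utc, start, duration


file_start = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
result = infer_times(file_start, file_duration=60.0, start=12.5)
print(result)
```

Here only the within-file start time is given, so both the UTC start time and the duration are filled in from the file's start time and length.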
- add_batch(df: DataFrame, progress_bar: bool = False) → list[int]
Add a batch of annotations to the table.
- Args:
- df: pandas.DataFrame
Annotations to be added to the table.
- progress_bar: bool
Whether to display a progress bar.
- Returns:
- indices: list[int]
Row indices of the added entries.
- create_selections(indices: list[int], window: float, step: float | None = None, center: bool = False, exclusive: bool = False, num_max: int | None = None, exclude: tuple[str, str] | list[tuple[str, str]] | None = None, data_support: bool = True, progress_bar: bool = False)
Create uniform-length selection windows on a set of annotations.
- Args:
- indices: list[int]
Annotation indices
- window: float
Window size in seconds.
- step: float
Step size in seconds. Used for creating temporally translated views of the same annotation. If None, at most one (1) selection will be created per annotation.
- center: bool
Align the selection window temporally with the midpoint of the annotation. If False, the temporal alignment will be chosen at random (uniform distribution).
- exclusive: bool
If True, the selection window is not allowed to contain anything but the annotated section of data. In other words, the selection window is not allowed to extend beyond the start/end point of the annotation. In particular, this means that selections will not be created for annotations shorter than window. Default is False.
- num_max: int
Create at most this many selections.
- exclude: tuple[str, str] | list[tuple[str, str]]
Only return selections that have been verified to not contain sounds with this (source,type) label. Note that the requirement extends to all ancestral and descendant nodes in the taxonomy tree. NOT YET IMPLEMENTED.
- data_support: bool
If True, selection windows are not allowed to extend beyond the start/end times of the audio files in the database. Default is True.
- progress_bar: bool
Whether to display a progress bar. Default is False.
- Returns:
- pandas.DataFrame
Selection table with columns sel_id, filename, start, end, annot_id.
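The interplay of window, center and exclusive can be sketched for a single annotation as follows. This is a simplified illustration under stated assumptions, not the korus implementation: the helper `place_window` is hypothetical, step and data_support are ignored (one window per annotation, as with step=None), and the random placement rule (the window either contains the annotation or lies inside it) is an assumption made for the example.

```python
import random


def place_window(start, end, window, center=False, exclusive=False, rng=random):
    """Return (sel_start, sel_end) for one annotation, or None (hypothetical helper)."""
    duration = end - start

    if exclusive and duration < window:
        # The window cannot fit inside the annotation, so no selection is made.
        return None

    if center:
        # Align the window midpoint with the annotation midpoint.
        sel_start = start + 0.5 * (duration - window)
    else:
        # Assumed rule: draw the window start uniformly such that the window
        # contains a short annotation, or lies inside a long one.
        lo = min(start, end - window)
        hi = max(start, end - window)
        sel_start = rng.uniform(lo, hi)

    return sel_start, sel_start + window


centered = place_window(10.0, 12.0, window=4.0, center=True)
rejected = place_window(10.0, 12.0, window=4.0, exclusive=True)
inside = place_window(10.0, 20.0, window=4.0, exclusive=True)
print(centered, rejected, inside)
```

The 2-second annotation yields a centered 4-second window that extends 1 second beyond each end, while the same annotation is skipped when exclusive is True.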
- filter(*conditions: dict, **kwargs)
Search the table.
Note: Search criteria specified by keyword arguments take priority over search criteria specified by positional arguments. Specifically, keyword search criteria are inserted into every condition dict, replacing any pre-existing criteria for the same field.
- Args:
- conditions: sequence of dict
Search criteria, where the keys are the field names and the values are the search values. Use tuples to search on a range of values and lists to search on multiple values.
- Keyword args:
- select: tuple | list[tuple]
Select annotations with this (source,type) label. The character ‘*’ can be used as a wildcard. Accepts both a single tuple and a list of tuples. By default, all descendant nodes in the taxonomy tree are also considered. Use the strict argument to change this behaviour.
- exclude: tuple | list[tuple]
Exclude annotations with this (source,type) label, but select annotations that have this (source,type) as their excluded label (excluded_label). The character ‘*’ can be used as a wildcard. Accepts both a single tuple and a list of tuples. By default, all descendant nodes in the taxonomy tree are also considered. Use the strict argument to change this behaviour.
- strict: bool
Whether to interpret labels ‘strictly’, meaning that ancestral/descendant nodes in the taxonomy tree are not considered. For example, when filtering on ‘KW’, annotations labelled as ‘SRKW’ will not be selected if strict is set to True. Default is False. NOT YET IMPLEMENTED.
- tentative: bool
Whether to filter on tentative label assignments, when available. Default is False.
- ambiguous: bool
Whether to also filter on ambiguous label assignments. Default is False.
- file: bool
If True, only include annotations pertaining to audio files in the database. Default is False. NOT YET IMPLEMENTED.
- taxonomy_version: int
Acoustic taxonomy that the (source,type) label arguments refer to. If not specified, the latest version will be used.
- Returns:
- self: TableInterface
A reference to this instance
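The priority rule for keyword criteria can be illustrated with plain dicts. This is a minimal sketch of the documented behaviour, not the korus implementation: the helper `apply_keyword_criteria` is hypothetical, the field names are illustrative only, and treating the keywords as the sole condition when no positional dicts are given is an assumption.

```python
def apply_keyword_criteria(conditions, **kwargs):
    """Insert keyword criteria into every condition dict (hypothetical helper)."""
    if not conditions:
        # Assumption: with no positional conditions, the keywords form the only condition.
        return [dict(kwargs)]
    # Later keys win, so keyword criteria replace same-named criteria in each dict.
    return [{**cond, **kwargs} for cond in conditions]


conditions = [
    {"job_id": 1, "tentative": True},       # single value
    {"job_id": [2, 3], "start": (0.0, 10.0)},  # list = multiple values, tuple = range
]
merged = apply_keyword_criteria(conditions, tentative=False)
print(merged)
```

Every condition dict ends up with tentative=False, including the first one, where the positional value True is overridden.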
- generate_negatives(job_id: int)
Generate negative annotations.
Here, negatives are understood as (uninterrupted) time periods during which no sounds were annotated.
Negatives are added to the annotation table with negative=True.
- Args:
- job_id: int
Job index
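The notion of a negative as an uninterrupted period without annotations can be sketched as a gap search over time intervals. This illustrates the concept only and is not how korus necessarily computes negatives: the helper `find_gaps` is hypothetical, times are plain seconds, and the enclosing period (for example, the time span covered by the job) is an assumed input.

```python
def find_gaps(annotations, period_start, period_end):
    """Return the (start, end) periods not covered by any annotation (hypothetical helper).

    annotations: iterable of (start, end) tuples, possibly overlapping
    """
    gaps = []
    cursor = period_start  # end of the region covered so far
    for start, end in sorted(annotations):
        if start > cursor:
            # Nothing was annotated between the cursor and this annotation.
            gaps.append((cursor, min(start, period_end)))
        cursor = max(cursor, end)
        if cursor >= period_end:
            break
    if cursor < period_end:
        # Trailing gap after the last annotation.
        gaps.append((cursor, period_end))
    return gaps


gaps = find_gaps([(5, 10), (8, 12), (20, 25)], period_start=0, period_end=30)
print(gaps)
```

Overlapping annotations such as (5, 10) and (8, 12) are merged implicitly, so the gap between them and the next annotation starts at 12.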
- load_raven(path: str, deployment_id: int | None = None, granularity: str = 'unit', taxonomy_version: int | None = None, progress_bar: bool = False)
Load annotations from a RavenPro TSV file.
Checks that the audio files exist in the database and that the labels exist in the taxonomy.
- Args:
- path: str
Path to the RavenPro file with tab-separated values (TSV).
- deployment_id: int
If not specified, the annotation table must contain the column Deployment ID.
- granularity: str
Annotation granularity for entries not marked as ‘Batch’ annotations.
- taxonomy_version: int
Acoustic taxonomy that the (source,type) label arguments refer to. If not specified, the latest version will be used.
- progress_bar: bool
Whether to display a progress bar. Default is False.
- Returns:
- df: pandas.DataFrame
The validated annotation table, with the format expected by the add_batch method.
- df_raven: pandas.DataFrame
The input table with two extra columns:
* Valid (bool): True if the row was successfully validated; False if errors were detected.
* Errors (str): Errors produced by the validation algorithm.
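The shape of the second return value, with its Valid and Errors columns, can be illustrated with a small stand-alone validation pass over tab-separated text. This is a sketch under assumptions, not the korus validation algorithm: the helper `validate_rows` is hypothetical, the column names Sound Source and Sound Type and the label (‘KW’, ‘PC’) are made up for the example, and the database and taxonomy lookups are replaced by plain Python sets.

```python
import csv
import io


def validate_rows(tsv_text, known_files, known_labels):
    """Return the table rows with added 'Valid' and 'Errors' fields (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        errors = []
        # Stand-in for checking that the audio file exists in the database.
        if row["Begin File"] not in known_files:
            errors.append(f"unknown file: {row['Begin File']}")
        # Stand-in for checking that the label exists in the taxonomy.
        label = (row["Sound Source"], row["Sound Type"])
        if label not in known_labels:
            errors.append(f"unknown label: {label}")
        row["Valid"] = not errors
        row["Errors"] = "; ".join(errors)
        rows.append(row)
    return rows


tsv_text = (
    "Begin File\tSound Source\tSound Type\n"
    "a.wav\tKW\tPC\n"
    "b.wav\tKW\tXX\n"
)
rows = validate_rows(tsv_text, known_files={"a.wav"}, known_labels={("KW", "PC")})
for row in rows:
    print(row["Begin File"], row["Valid"], row["Errors"])
```

Rows that fail a check stay in the table with Valid set to False and a message in Errors, which makes it possible to inspect and correct the input file.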