Tutorial 2
Retrieving data from the database¶
In this tutorial, we show how to retrieve data from a (rather minimal) Korus example database. Jump to the next tutorial (#3), to see how the database was created and populated with data.
Connecting to the database ¶
We begin by importing the necessary modules and opening a connection to the example database,
import sqlite3
import korus.db as kdb
conn = sqlite3.connect("db_t3.sqlite")
Viewing annotation taxonomies ¶
As you may recall from Tutorial 1, the taxonomy used by the acoustic analyst for labelling sound sources and sound types, is saved to the Korus database. The taxonomy can be readily retrieved using the get_taxonomy
function,
tax = kdb.get_taxonomy(conn)
The get_taxonomy
function returns an instance of the AcousticTaxonomy
class,
type(tax)
korus.tax.AcousticTaxonomy
The show
method helps us visualize the node structure of the taxonomy,
tax.show(append_name=True) #sound-source tree
Unknown ├── Anthro [Anthropogenic] │ └── Boat │ ├── Engine │ └── Prop [Propeller] └── Bio [Biological] └── Whale ├── HW [Humpback whale] └── NARW [North Atlantic right whale]
Retrieving annotations ¶
Korus contains the function filter_annotation
to retrieve annotations based on various search criteria. For example, we can use the filter_annotation
function to retrieve annotations with a specific (sound-source, sound-type) label,
indices = kdb.filter_annotation(conn, select=("NARW","Upcall"))
print(indices)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
Note that the filter_annotation
function merely returns the indices of the annotations that match our search criteria. We can use the get_annotations
function to obtain the data associated with these annotations as a Pandas DataFrame,
df = kdb.get_annotations(conn, indices)
print(df)
job_id deployment_id file_id sound_source sound_type \ 0 1 2 2 NARW LU 1 1 2 2 NARW LU 2 1 2 2 NARW LU 3 1 2 2 NARW LU 4 1 2 2 NARW LU 5 1 2 2 NARW LU 6 1 2 2 NARW LU 7 1 2 2 NARW LU 8 1 2 2 NARW LU 9 1 2 2 NARW LU 10 1 2 2 NARW LU 11 1 2 2 NARW LU 12 1 2 2 NARW LU 13 1 2 3 NARW LU 14 1 2 3 NARW LU 15 1 2 3 NARW LU 16 1 2 3 NARW LU 17 1 2 4 NARW LU 18 1 2 4 NARW LU 19 1 2 4 NARW LU 20 1 2 4 NARW LU 21 1 2 4 NARW LU 22 1 2 4 NARW LU 23 1 2 4 NARW LU 24 1 2 4 NARW LU 25 1 2 4 NARW LU tentative_sound_source tentative_sound_type tag start_utc \ 0 None None None 2013-06-23 08:18:47.456 1 None None None 2013-06-23 08:19:12.142 2 None None None 2013-06-23 08:19:55.394 3 None None None 2013-06-23 08:20:26.258 4 None None None 2013-06-23 08:22:36.797 5 None None None 2013-06-23 08:23:56.098 6 None None None 2013-06-23 08:24:47.904 7 None None None 2013-06-23 08:25:10.286 8 None None None 2013-06-23 08:25:29.211 9 None None None 2013-06-23 08:25:35.196 10 None None None 2013-06-23 08:28:32.988 11 None None None 2013-06-23 08:29:26.867 12 None None None 2013-06-23 08:29:36.451 13 None None None 2013-06-23 08:31:06.766 14 None None None 2013-06-23 08:41:27.124 15 None None None 2013-06-23 08:42:34.557 16 None None None 2013-06-23 08:42:49.057 17 None None None 2013-06-24 08:01:07.471 18 None None None 2013-06-24 08:01:44.545 19 None None None 2013-06-24 08:17:35.633 20 None None None 2013-06-24 08:17:45.900 21 None None None 2013-06-24 08:21:29.181 22 None None None 2013-06-24 08:22:57.573 23 None None None 2013-06-24 08:23:47.266 24 None None None 2013-06-24 08:27:42.240 25 None None None 2013-06-24 08:27:55.300 duration_ms start_ms freq_min_hz freq_max_hz channel granularity \ 0 3000 1127340 0 500 0 window 1 3000 1152026 0 500 0 window 2 3000 1195278 0 500 0 window 3 3000 1226142 0 500 0 window 4 3000 1356681 0 500 0 window 5 3000 1435982 0 500 0 window 6 3000 1487788 0 500 0 window 7 3000 1510170 0 500 0 window 8 3000 1529095 0 500 0 window 9 3000 1535080 0 500 0 window 10 3000 1712872 0 500 0 window 11 3000 1766751 0 500 0 window 12 3000 1776335 0 500 0 window 13 3000 66649 0 500 0 window 14 3000 687007 0 500 0 window 15 3000 754440 0 500 0 window 16 3000 768940 0 500 0 window 17 3000 67353 0 500 0 window 18 3000 104427 0 500 0 window 19 3000 1055515 0 500 0 window 20 3000 1065782 0 500 0 window 21 3000 1289063 0 500 0 window 22 3000 1377455 0 500 0 window 23 3000 1427148 0 500 0 window 24 3000 1662122 0 500 0 window 25 3000 1675182 0 500 0 window machine_prediction comments 0 None None 1 None None 2 None None 3 None None 4 None None 5 None None 6 None None 7 None None 8 None None 9 None None 10 None None 11 None None 12 None None 13 None None 14 None None 15 None None 16 None None 17 None None 18 None None 19 None None 20 None None 21 None None 22 None None 23 None None 24 None None 25 None None
Note that although we filtered on the sound type 'Upcall', as defined in 2nd version of the taxonomy, the filter_annotation
function still helped us find all the annotations that had the "old" sound-type label, 'LU', used in the 1st version of the taxonomy as a label for 'loud' upcalls.
Retrieving annotations in Ketos format ¶
We could also have specified that we wanted the table formatted to be compatible with format used in Ketos,
df_kt, label_dict = kdb.get_annotations(conn, indices, ketos=True)
Note that with ketos=True
the get_annotations
function returns not one, but two arguments. The first argument is the Pandas DataFrame with the annotation data (now in the standard Ketos format) while the second argument is a Python dictionary, mapping each integer label (0,1,2,...) appearing the DataFrame to the corresponding (sound-source, sound-type) label pair. Let's take a look at the dictionary first,
print(label_dict)
{0: 'NARW;LU'}
And then the annotation table,
print(df_kt)
filename dir_path start duration freq_min \ 0 audio_20130623T080000.116Z.flac 20130623 1127.340 3.0 0 1 audio_20130623T080000.116Z.flac 20130623 1152.026 3.0 0 2 audio_20130623T080000.116Z.flac 20130623 1195.278 3.0 0 3 audio_20130623T080000.116Z.flac 20130623 1226.142 3.0 0 4 audio_20130623T080000.116Z.flac 20130623 1356.681 3.0 0 5 audio_20130623T080000.116Z.flac 20130623 1435.982 3.0 0 6 audio_20130623T080000.116Z.flac 20130623 1487.788 3.0 0 7 audio_20130623T080000.116Z.flac 20130623 1510.170 3.0 0 8 audio_20130623T080000.116Z.flac 20130623 1529.095 3.0 0 9 audio_20130623T080000.116Z.flac 20130623 1535.080 3.0 0 10 audio_20130623T080000.116Z.flac 20130623 1712.872 3.0 0 11 audio_20130623T080000.116Z.flac 20130623 1766.751 3.0 0 12 audio_20130623T080000.116Z.flac 20130623 1776.335 3.0 0 13 audio_20130623T083000.117Z.flac 20130623 66.649 3.0 0 14 audio_20130623T083000.117Z.flac 20130623 687.007 3.0 0 15 audio_20130623T083000.117Z.flac 20130623 754.440 3.0 0 16 audio_20130623T083000.117Z.flac 20130623 768.940 3.0 0 17 audio_20130624T080000.118Z.flac 20130624 67.353 3.0 0 18 audio_20130624T080000.118Z.flac 20130624 104.427 3.0 0 19 audio_20130624T080000.118Z.flac 20130624 1055.515 3.0 0 20 audio_20130624T080000.118Z.flac 20130624 1065.782 3.0 0 21 audio_20130624T080000.118Z.flac 20130624 1289.063 3.0 0 22 audio_20130624T080000.118Z.flac 20130624 1377.455 3.0 0 23 audio_20130624T080000.118Z.flac 20130624 1427.148 3.0 0 24 audio_20130624T080000.118Z.flac 20130624 1662.122 3.0 0 25 audio_20130624T080000.118Z.flac 20130624 1675.182 3.0 0 freq_max label comments 0 500 0 None 1 500 0 None 2 500 0 None 3 500 0 None 4 500 0 None 5 500 0 None 6 500 0 None 7 500 0 None 8 500 0 None 9 500 0 None 10 500 0 None 11 500 0 None 12 500 0 None 13 500 0 None 14 500 0 None 15 500 0 None 16 500 0 None 17 500 0 None 18 500 0 None 19 500 0 None 20 500 0 None 21 500 0 None 22 500 0 None 23 500 0 None 24 500 0 None 25 500 0 None
Filtering on tags ¶
In addition to filtering on the standardized (sound-source, sound-type) labels, which can also filter on the free-text tags. For example,
indices = kdb.filter_annotation(conn, tag="NEGATIVE")
print(indices)
[27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54]
As above, we may use the get_annotations
function to retrieve the data associated with the annotations.
df_kt, label_dict = kdb.get_annotations(conn, indices, ketos=True)
print(label_dict)
print(df_kt)
{0: 'NEGATIVE'} filename dir_path start duration freq_min \ 0 audio_20130623T080000.116Z.flac 20130623 0.000 1127.340 0 1 audio_20130623T080000.116Z.flac 20130623 1130.340 21.686 0 2 audio_20130623T080000.116Z.flac 20130623 1155.026 40.252 0 3 audio_20130623T080000.116Z.flac 20130623 1198.278 27.864 0 4 audio_20130623T080000.116Z.flac 20130623 1229.142 127.539 0 5 audio_20130623T080000.116Z.flac 20130623 1359.681 76.301 0 6 audio_20130623T080000.116Z.flac 20130623 1438.982 48.806 0 7 audio_20130623T080000.116Z.flac 20130623 1490.788 19.382 0 8 audio_20130623T080000.116Z.flac 20130623 1513.170 15.925 0 9 audio_20130623T080000.116Z.flac 20130623 1532.095 2.985 0 10 audio_20130623T080000.116Z.flac 20130623 1538.080 174.792 0 11 audio_20130623T080000.116Z.flac 20130623 1715.872 50.879 0 12 audio_20130623T080000.116Z.flac 20130623 1769.751 6.584 0 13 audio_20130623T080000.116Z.flac 20130623 1779.335 87.315 0 14 audio_20130623T083000.117Z.flac 20130623 69.649 617.358 0 15 audio_20130623T083000.117Z.flac 20130623 690.007 64.433 0 16 audio_20130623T083000.117Z.flac 20130623 757.440 11.500 0 17 audio_20130623T083000.117Z.flac 20130623 771.940 1028.115 0 18 audio_20130624T080000.118Z.flac 20130624 0.000 67.353 0 19 audio_20130624T080000.118Z.flac 20130624 70.353 34.074 0 20 audio_20130624T080000.118Z.flac 20130624 107.427 948.088 0 21 audio_20130624T080000.118Z.flac 20130624 1058.515 7.267 0 22 audio_20130624T080000.118Z.flac 20130624 1068.782 220.281 0 23 audio_20130624T080000.118Z.flac 20130624 1292.063 85.392 0 24 audio_20130624T080000.118Z.flac 20130624 1380.455 46.693 0 25 audio_20130624T080000.118Z.flac 20130624 1430.148 231.974 0 26 audio_20130624T080000.118Z.flac 20130624 1665.122 10.060 0 27 audio_20130624T080000.118Z.flac 20130624 1678.182 121.873 0 freq_max label comments 0 500 0 None 1 500 0 None 2 500 0 None 3 500 0 None 4 500 0 None 5 500 0 None 6 500 0 None 7 500 0 None 8 500 0 None 9 500 0 None 10 500 0 None 11 500 0 None 12 500 0 None 13 500 0 None 14 500 0 None 15 500 0 None 16 500 0 None 17 500 0 None 18 500 0 None 19 500 0 None 20 500 0 None 21 500 0 None 22 500 0 None 23 500 0 None 24 500 0 None 25 500 0 None 26 500 0 None 27 500 0 None
Ending the session ¶
As the last step of the tutorial, we close the connection to the database.
conn.close()