bacpipe.core.workflows

Functions

clustering_pipeline(model_name, ...[, ...])

Clustering pipeline, generating clusterings based on the settings file.

cross_model_evaluation(dim_reduction_model, ...)

Generate plots to compare models by the specified tasks.

ensure_models_exist(model_base_path, model_names)

Ensure that the model checkpoints for the selected models are available locally.

evaluation_with_settings_already_exists(...)

Check if the evaluation with the specified settings already exists.

generate_embeddings([...])

Run the embedding generation pipeline including classification using the pretrained classifier (if included).

get_model_names(models, audio_dir, ...[, ...])

Get the names of the models used for processing.

ground_truth_by_model(model, audio_dir[, ...])

Generate ground truth labels that are mapped onto the timestamps of a model, based on the model-specific input lengths.

hf_hub_download(repo_id, filename, *[, ...])

Download a given file if it's not already present in the local cache.

make_set_paths_func(audio_dir[, ...])

model_specific_evaluation(loader_dict, ...)

Perform evaluation of the embeddings using the specified evaluation task.

play([bool_save_logs])

Play the bacpipe! The pipeline will run using the models specified in bacpipe.config.models and generate results in the directory bacpipe.settings.results_dir.

plot_comparison(plot_path, models, ...[, ...])

Create a large overview visualization of all embedding spaces.

plot_embeddings(loader, model_name, label_by)

Generate figures and axes to plot points corresponding to embeddings.

probing_pipeline(model_name, ground_truth, ...)

Probing pipeline consisting of building the classifier, evaluating it and saving metrics and plots of performance.

replace_default_kwargs_with_user_kwargs([...])

run_pipeline_for_models(models, audio_dir, ...)

Generate embeddings for each model in the list of model names.

run_pipeline_for_single_model(model_name, ...)

Run the bacpipe pipeline, including embedding generation, classification using the pretrained classifier (if included), dimensionality reduction (if passed), and plotting of visualization to files.

save_logs()

visualise_results_across_models(plot_path, ...)

Create visualizations to compare models by specified tasks.

visualize_using_dashboard(models[, ...])

Create and serve the dashboard for visualization.

Classes

EmbedAndLabelLoader(dim_reduction_model[, ...])

Embedder(model_name[, loader, CustomModel, ...])

This class takes care of loading the specified model and using it to process the audio data to create embeddings.

Loader(audio_dir[, model_name, ...])

Initiate the generation of embeddings by creating a Loader object.

Path(*args, **kwargs)

PurePath subclass that can make system calls.

bacpipe.core.workflows.cross_model_evaluation(dim_reduction_model, evaluation_task, models, **kwargs)[source]

Generate plots to compare models by the specified tasks.

Parameters:
  • dim_reduction_model (str) – name of dimensionality reduction model

  • evaluation_task (list) – tasks to evaluate models by

  • models (list) – embedding models

bacpipe.core.workflows.ensure_models_exist(model_base_path, model_names, repo_id='vskode/bacpipe_models')[source]

Ensure that the model checkpoints for the selected models are available locally. Downloads from Hugging Face Hub if missing.

Parameters:
  • model_base_path (Path) – Local base directory where the checkpoints should be stored.

  • model_names (str or list) – Model name or list of model names to run

  • repo_id (str, optional) – Hugging Face Hub repo ID, by default “vskode/bacpipe_models”

Returns:

path to saved models

Return type:

str
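The check-then-download behaviour described above can be sketched as follows. This is a minimal illustration, not bacpipe's implementation: the helper name `ensure_local_checkpoints`, the injected `download` callable and the one-folder-per-model layout are all assumptions made for this example.

```python
from pathlib import Path
from typing import Callable, Iterable, Union


def ensure_local_checkpoints(
    model_base_path: Path,
    model_names: Union[str, Iterable[str]],
    download: Callable[[str, Path], None],
) -> str:
    """Hypothetical helper: fetch only the checkpoints missing locally."""
    # Accept a single model name as well as a list of names.
    if isinstance(model_names, str):
        model_names = [model_names]
    model_base_path = Path(model_base_path)
    model_base_path.mkdir(parents=True, exist_ok=True)
    for name in model_names:
        # Assumed layout: one folder per model below the base path.
        target = model_base_path / name
        if not target.exists():
            # Only missing checkpoints trigger a download.
            download(name, target)
    return str(model_base_path)
```

Passing the download step in as a callable keeps the sketch independent of any particular hub client; a real caller would supply a function that fetches the files from the configured repository.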

bacpipe.core.workflows.evaluation_with_settings_already_exists(audio_dir, dim_reduction_model, models, testing=False, **kwargs)[source]

Check if the evaluation with the specified settings already exists. The function checks if the embeddings, dimensionality reduction, probing and clustering evaluation results already exist in the specified directory. If any of these results do not exist, the function returns False. Otherwise, it returns True.

Parameters:
  • audio_dir (string) – full path to audio files

  • dim_reduction_model (string) – name of the dimensionality reduction model to be used

  • models (list) – embedding models

Returns:

True if the evaluation with the specified settings already exists, False otherwise

Return type:

bool
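The all-or-nothing logic of this check can be illustrated with a short sketch. The function name `all_results_exist`, the `stages` argument and the `<results_dir>/<stage>/<model>` layout are hypothetical and chosen only for illustration; bacpipe's actual directory structure may differ.

```python
from pathlib import Path
from typing import Iterable


def all_results_exist(
    results_dir: Path, models: Iterable[str], stages: Iterable[str]
) -> bool:
    """Hypothetical check: True only if every model has every stage folder."""
    results_dir = Path(results_dir)
    stages = list(stages)  # reused for every model, so materialize once
    for model in models:
        for stage in stages:
            # Assumed layout: <results_dir>/<stage>/<model>
            if not (results_dir / stage / model).exists():
                return False  # a single missing result is enough
    return True
```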

bacpipe.core.workflows.generate_embeddings(avoid_pipelined_gpu_inference=False, **kwargs)[source]

Run the embedding generation pipeline including classification using the pretrained classifier (if included). All of this will be done for one model. The predefined folder structure will be created so that subsequent processing runs will be very fast, as they then only load the data. kwargs that are not specifically passed will be taken from bacpipe.config and bacpipe.settings.

Parameters:

avoid_pipelined_gpu_inference (bool, optional) – set to True to avoid multiprocessing, by default False

Returns:

loader object to access embeddings and classifier predictions

Return type:

bacpipe.Loader

Raises:

ValueError – if no model name is provided
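The fallback from explicitly passed kwargs to configured defaults, together with the ValueError for a missing model name, can be sketched like this. The helper `resolve_kwargs` and the `model_name` key are assumptions made for illustration; they are not taken from the bacpipe source.

```python
def resolve_kwargs(defaults: dict, **user_kwargs) -> dict:
    """Hypothetical helper: user kwargs override the configured defaults."""
    # Later entries win, so explicitly passed values replace the defaults.
    merged = {**defaults, **user_kwargs}
    # Assumed key name; a missing or empty model name is rejected.
    if not merged.get("model_name"):
        raise ValueError("No model name was provided.")
    return merged
```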

bacpipe.core.workflows.get_model_names(models, audio_dir, main_results_dir, embed_parent_dir, already_computed=False, **kwargs)[source]

Get the names of the models used for processing. This is either done by using already computed embeddings or by using the selected models from the config file. If already computed embeddings are used, the model names are extracted from the directory structure.

Parameters:
  • models (list) – list of embedding models

  • audio_dir (string) – full path to audio files

  • main_results_dir (string) – top level directory for the results of the embedding evaluation

  • embed_parent_dir (string) – parent directory for the embeddings

  • already_computed (bool, optional) – ignore the model list and use only models whose embeddings have already been computed and saved in the results directory, by default False

Raises:

ValueError – If already computed embeddings are used, but no embeddings are found in the specified directory.
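Extracting model names from a directory structure, and failing when nothing is found, might look like the sketch below. It assumes one subdirectory per model directly under the embeddings directory; both that layout and the helper name `model_names_from_dir` are hypothetical.

```python
from pathlib import Path
from typing import List


def model_names_from_dir(embed_parent_dir: Path) -> List[str]:
    """Hypothetical helper: treat each subdirectory as one model name."""
    embed_parent_dir = Path(embed_parent_dir)
    names: List[str] = []
    if embed_parent_dir.is_dir():
        # Plain files are ignored; only folders count as models.
        names = sorted(p.name for p in embed_parent_dir.iterdir() if p.is_dir())
    if not names:
        raise ValueError(f"No computed embeddings found in {embed_parent_dir}")
    return names
```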

bacpipe.core.workflows.model_specific_evaluation(loader_dict, evaluation_task, probe_configs, models, dim_reduction_model=False, **kwargs)[source]

Perform evaluation of the embeddings using the specified evaluation task. The evaluation task can be either probing or clustering. The evaluation is performed using the functions from the probing and clustering modules. The results of the evaluation are saved in the directory specified by the audio_dir parameter.

Parameters:
  • loader_dict (dict) – dictionary containing the loader objects for each model

  • evaluation_task (string) – name of the evaluation task to be performed.

  • probe_configs (dict) – dictionary containing the configuration for the probing tasks. The configurations are specified in the bacpipe/settings.yaml file.

  • models (list) – embedding models

bacpipe.core.workflows.play(bool_save_logs=False, **kwargs)[source]

Play the bacpipe! The pipeline will run using the models specified in bacpipe.config.models and generate results in the directory bacpipe.settings.results_dir. For more details see the ReadMe file on the repository page https://github.com/bioacoustic-ai/bacpipe or the documentation under https://bacpipe.readthedocs.io/en/latest/.

Parameters:

bool_save_logs (bool, optional) – Save the logs, config and settings files. If you run into a bug, sharing these files makes it much easier to find the source of the problem. By default False

Raises:

FileNotFoundError – If no audio files are found we can’t compute any embeddings. So make sure the path is correct :)

bacpipe.core.workflows.run_pipeline_for_models(models, audio_dir, dim_reduction_model, **kwargs)[source]

Generate embeddings for each model in the list of model names. The embeddings are generated using the generate_embeddings function from the generate_embeddings module. The embeddings are saved in the directory specified by the audio_dir parameter. The function returns a dictionary containing the loader objects for each model, by which metadata and paths are stored. kwargs that are not specifically passed will be taken from bacpipe.config and bacpipe.settings.

code example:

```
import bacpipe

loader = bacpipe.run_pipeline_for_models(
    models=['birdnet', 'naturebeats'],
    audio_dir='bacpipe/tests/test_data',
    dim_reduction_model='umap',
)

# This call initiates the embedding generation process. It checks whether
# embeddings already exist for the combination of each model and the dataset
# and, if so, it will be ready to load them. The loader keys are the model
# names and the values are the loader objects for each model. Each object
# contains all the information on the generated embeddings. To access them:
loader['birdnet'].embeddings()
# This gives you a dictionary with the keys corresponding to embedding files
# and the values corresponding to the embeddings as numpy arrays.

loader['birdnet'].metadata_dict
# This gives you a dictionary overview of:
# - where the audio data came from,
# - where the embeddings were saved,
# - all the audio files,
# - the embedding size of the model,
# - the audio file lengths,
# - the number of embeddings for each audio file,
# - the sample rate,
# - the number of samples per window,
# - and the total length of the processed dataset in seconds.
# This dictionary is also saved as a yaml file in the directory of the embeddings.
```

Parameters:
  • models (list) – embedding models

  • audio_dir (string) – full path to audio files

  • dim_reduction_model (string) – name of the dimensionality reduction model to be used for the embeddings. If “None” is selected, no dimensionality reduction is performed.

Returns:

loader_dict – dictionary containing the loader objects for each model

Return type:

dict

bacpipe.core.workflows.run_pipeline_for_single_model(model_name, audio_dir, dim_reduction_model='None', check_if_already_processed=True, check_if_already_dim_reduced=True, testing=False, **kwargs)[source]

Run the bacpipe pipeline, including embedding generation, classification using the pretrained classifier (if included), dimensionality reduction (if passed), and plotting of visualization to files. All of this will be done for one model. The predefined folder structure will be created so that subsequent processing runs will be very fast, as they then only load the data. kwargs that are not specifically passed will be taken from bacpipe.config and bacpipe.settings.

Parameters:
  • model_name (string) – model name

  • audio_dir (str) – path to audio data

  • dim_reduction_model (str, optional) – name of dimensionality reduction model, by default “None”

  • check_if_already_processed (bool, optional) – set to False if you want to force recomputing of embeddings, by default True

  • check_if_already_dim_reduced (bool, optional) – set to False if you want to force recomputing of dimensionality reduced embeddings, by default True

  • overwrite (bool, optional) – set to True if you want default labels and ground truth labels to be processed again, by default False

  • testing (bool, optional) – set to True for testing, by default False

Returns:

loader object to access the processed embeddings and classifier predictions

Return type:

bacpipe.Loader