bacpipe.embedding_evaluation.probing package

Submodules

bacpipe.embedding_evaluation.probing.dataset_probe module

class bacpipe.embedding_evaluation.probing.dataset_probe.ProbeDatasetLoader(class_df, embeds, label2index, set_name=None, **kwargs)[source]

Bases: Dataset

__getitem__(idx)[source]

Return the embedding and its true label at the given index.

Parameters:

idx (int) – index of the item to retrieve

Returns:

(embedding, true label)

Return type:

tuple

__init__(class_df, embeds, label2index, set_name=None, **kwargs)[source]

Class to initialize and iterate through classification dataset.

Parameters:
  • class_df (pd.DataFrame) – classification dataframe

  • embeds (np.array) – embeddings

  • label2index (dict) – linking labels to integers

  • set_name (string, optional) – train, test or val set, by default None
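
To make the roles of label2index and the (embedding, true label) tuple concrete, here is a minimal sketch of a map-style dataset in plain Python. It is not the bacpipe implementation: the class name EmbeddingLabelDataset, the plain list of labels (standing in for the classification dataframe), and the sorted label ordering are all assumptions made for illustration.

```python
class EmbeddingLabelDataset:
    """Hypothetical stand-in for a map-style dataset of (embedding, label) pairs."""

    def __init__(self, labels, embeds, label2index):
        if len(labels) != len(embeds):
            raise ValueError("labels and embeds must have the same length")
        self.embeds = embeds
        # Convert string labels to integer class indices once, up front.
        self.targets = [label2index[label] for label in labels]

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, idx):
        # Return one (embedding, integer label) pair.
        return self.embeds[idx], self.targets[idx]


# Toy data: three 2-dimensional embeddings with string labels (made up).
labels = ["owl", "frog", "owl"]
embeds = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]

# Assumed convention: labels are sorted, then numbered from 0.
label2index = {label: i for i, label in enumerate(sorted(set(labels)))}
dataset = EmbeddingLabelDataset(labels, embeds, label2index)
```

Indexing such an object, for example dataset[1], yields the second embedding together with the integer assigned to "frog".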

bacpipe.embedding_evaluation.probing.dataset_probe.generate_annotations_for_probing_task(ground_truth, paths, label_column, dataset_csv_path='probe_annotations.csv', train_ratio=None, test_ratio=None, **kwargs)[source]
bacpipe.embedding_evaluation.probing.dataset_probe.probe_dataset_loader(set_name, clean_df, embeds, label2index, batch_size=64, shuffle=False, **kwargs)[source]

Create dataset loader object for classification.

Parameters:
  • set_name (string) – train, test or val set

  • clean_df (pd.DataFrame) – classification dataframe

  • embeds (np.array) – embeddings

  • label2index (dict) – link labels to ints

  • batch_size (int, optional) – number of embeddings per batch, by default 64

  • shuffle (bool, optional) – shuffle or not, by default False

Returns:

dataset loader object to iterate over during training

Return type:

DataLoader obj

bacpipe.embedding_evaluation.probing.evaluate_probe module

bacpipe.embedding_evaluation.probing.evaluate_probe.accuracy_per_class(y_true, y_pred, label2index, items_per_class)[source]

Accuracy per class

Parameters:
  • y_true (list) – ground truth

  • y_pred (list) – predictions

  • label2index (dict) – link labels to ints

  • items_per_class (list) – number of items per class

Returns:

classwise accuracy

Return type:

dict
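
As an illustration of what a class-wise accuracy dictionary can look like, the sketch below computes, for each class, the fraction of its items that were predicted correctly. It is an assumption-laden stand-in, not the bacpipe function: the name per_class_accuracy is hypothetical, and it counts the items per class itself instead of taking them as an items_per_class argument.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred, label2index):
    """Hypothetical helper: fraction of correctly predicted items per class."""
    index2label = {index: label for label, index in label2index.items()}
    totals = Counter(y_true)  # number of items per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {index2label[i]: correct[i] / totals[i] for i in sorted(totals)}


# Toy labels (made up): two "frog" items and four "owl" items.
label2index = {"frog": 0, "owl": 1}
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 0]
scores = per_class_accuracy(y_true, y_pred, label2index)
# scores == {"frog": 0.5, "owl": 0.75}
```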

bacpipe.embedding_evaluation.probing.evaluate_probe.auc(y_true, probability_scores)[source]

Compute the AUC

bacpipe.embedding_evaluation.probing.evaluate_probe.compute_task_metrics(y_pred, y_true, probability_scores, label2index)[source]

Compute the evaluation metrics

bacpipe.embedding_evaluation.probing.evaluate_probe.eval_probe(probe, embeds, df, label2index, device='cuda:0', config='linear', paths=None, save_probe=False, **kwargs)[source]

Perform inference using probe.

Parameters:
  • probe (object) – trained classification object

  • test_dataloader (DataLoader object) – dataset iterator

  • device (str, optional) – ‘cpu’ or ‘cuda’, by default “cuda:0”

  • config (str, optional) – type of classification, by default “linear”

Returns:

  • list – prediction values in ints corresponding to labels

  • list – ground truth values in ints

  • np.array – probabilities for each class and each embedding

bacpipe.embedding_evaluation.probing.evaluate_probe.macro_accuracy(y_true, y_pred)[source]

Compute macro accuracy.

Parameters:
  • y_true (list) – ground truth

  • y_pred (list) – predictions

Returns:

balanced accuracy score

Return type:

float
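
Balanced accuracy is the mean of the per-class recalls, so every class counts equally regardless of its size. The sketch below contrasts it with plain (micro) accuracy on an imbalanced toy example. The function names are hypothetical, and this is a from-scratch illustration of the metric, not necessarily how bacpipe computes it.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall: each class contributes equally."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [correct[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


def overall_accuracy(y_true, y_pred):
    """Fraction of all items predicted correctly (micro accuracy)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (made up): 8 items of class 0, 2 of class 1, and a model
# that always predicts class 0.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

macro = balanced_accuracy(y_true, y_pred)  # (1.0 + 0.0) / 2 = 0.5
micro = overall_accuracy(y_true, y_pred)   # 8 / 10 = 0.8
```

The gap between the two values shows why the macro variant is the more informative one when class sizes differ strongly.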

bacpipe.embedding_evaluation.probing.evaluate_probe.macro_f1(y_true, y_pred)[source]

Compute the macro f1 score

bacpipe.embedding_evaluation.probing.evaluate_probe.micro_accuracy(y_true, y_pred)[source]
bacpipe.embedding_evaluation.probing.evaluate_probe.micro_f1(y_true, y_pred)[source]

Compute the micro f1 score
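
The difference between the macro and micro F1 scores can be sketched from raw counts: macro F1 averages the per-class F1 values, while micro F1 pools true positives, false positives and false negatives over all classes before computing a single F1. The function name f1_scores is hypothetical and this is an illustration of the definitions for single-label data, not the bacpipe code.

```python
def f1_scores(y_true, y_pred):
    """Hypothetical helper returning (macro F1, micro F1) for single-label data."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy labels (made up).
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 0]
macro_f1_value, micro_f1_value = f1_scores(y_true, y_pred)
```

For single-label classification the pooled (micro) F1 coincides with plain accuracy, which is 4 of 6 correct in this toy example.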

bacpipe.embedding_evaluation.probing.evaluate_probe.save_probe_results(paths, config, metrics, **kwargs)[source]

Save a dict with all performance metrics.

Parameters:
  • paths (SimpleNamespace object) – namespace whose attributes are the paths for loading and saving

  • config (string) – type of classification (linear or knn)

  • metrics (dict) – performance

bacpipe.embedding_evaluation.probing.inference_probe module

bacpipe.embedding_evaluation.probing.inference_probe.prepare_probe_inference(model, probe_path='')[source]

Load a previously trained and saved linear probe. The state_dict of the model is restored so that the probe is ready and in the exact same state as after training.

Parameters:
  • model (str) – model name of backbone

  • probe_path (str, optional) – path to probe, will default to the standard bacpipe path, by default ‘’

Returns:

  • torch model object – linear probe model

  • dict – dictionary to associate the columns of the generated predictions array with the corresponding class label

bacpipe.embedding_evaluation.probing.inference_probe.run_probe_inference(model, linear_probe, threshold, embeds=None, return_binary_presence=True, callbacks=None, device='cpu')[source]

Apply a previously trained linear probe to data. The embeddings must either have been created with the backbone and saved in the bacpipe folder structure, or be passed directly to this function. The function then loads the embeddings and applies the linear probe to classify the data. See the example notebooks for a use case.

Parameters:
  • model (str) – model name

  • linear_probe (torch model) – linear probe torch model object

  • threshold (float) – float value to process the predictions

  • embeds (torch.Tensor, optional) – embeddings array, by default None

  • return_binary_presence (bool, optional) – if true a binary presence array is returned, by default True

  • callbacks (function, optional) – use to have custom progress bars increment, by default None

  • device (str, optional) – select device to process the probe, by default ‘cpu’

Returns:

generated probe predictions

Return type:

np.ndarray
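
One plausible reading of threshold and return_binary_presence is that each class score is compared against the threshold to obtain a 0/1 presence value. The sketch below shows that idea only. Whether bacpipe uses ">=" or ">", and whether the scores are sigmoid or softmax outputs, is not stated here, so both are assumptions, as is the function name binary_presence.

```python
def binary_presence(scores, threshold):
    """Hypothetical helper: 1 where a class score reaches the threshold, else 0."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


# Toy scores (made up): 2 embeddings x 3 classes.
scores = [
    [0.91, 0.05, 0.40],
    [0.10, 0.75, 0.80],
]
presence = binary_presence(scores, threshold=0.5)
# presence == [[1, 0, 0], [0, 1, 1]]
```

Each column of such an array corresponds to one class, which is where a column-to-label dictionary like the one returned by prepare_probe_inference comes in.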

bacpipe.embedding_evaluation.probing.probe module

bacpipe.embedding_evaluation.probing.probe.embeds_array_without_noise(embeds, ground_truth, label_column, **kwargs)[source]
bacpipe.embedding_evaluation.probing.probe.prepare_probe_inference(model, probe_path='')[source]
bacpipe.embedding_evaluation.probing.probe.probing_pipeline(model_name, ground_truth, embeds, paths=None, name='linear', overwrite=True, label_column='species', **kwargs)[source]

Probing pipeline consisting of building the classifier, evaluating it and saving metrics and plots of performance.

Parameters:
  • paths (SimpleNamespace object) – dict with attributes corresponding to paths for loading and saving

  • embeds (np.array) – embeddings

  • name (string) – Type of Probing

  • dataset_csv_path (string) – name of Probing dataframe as specified in settings.yaml

  • overwrite (bool) – whether to overwrite existing probing results, by default True

bacpipe.embedding_evaluation.probing.probe.run_probe_inference(model, linear_probe, threshold, embeds=None, return_binary_presence=True, callbacks=None)[source]

bacpipe.embedding_evaluation.probing.train_probe module

class bacpipe.embedding_evaluation.probing.train_probe.KNNProbe(n_neighbors=15, testing=False, **kwargs)[source]

Bases: Module

__init__(n_neighbors=15, testing=False, **kwargs)[source]

K-nearest neighbor classifier.

Parameters:

n_neighbors (int, optional) – hyperparameter specified in settings.yaml file, by default 15

fit(x, y)[source]

Train KNN classifier with numpy data

forward(x)[source]

Predict using KNN (only after it’s trained)
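
The fit-then-predict pattern of a k-nearest-neighbor classifier can be sketched in plain Python: fitting only stores the training data, and prediction takes a majority vote among the n_neighbors closest training points. The class TinyKNN, its predict method and the Euclidean distance are assumptions for illustration. They do not describe KNNProbe's internals.

```python
import math
from collections import Counter


class TinyKNN:
    """Hypothetical k-nearest-neighbor classifier with Euclidean distance."""

    def __init__(self, n_neighbors=3):
        self.n_neighbors = n_neighbors
        self.x = []
        self.y = []

    def fit(self, x, y):
        # "Training" a KNN classifier just memorizes the data.
        self.x = list(x)
        self.y = list(y)
        return self

    def predict(self, queries):
        predictions = []
        for q in queries:
            # Indices of the training points closest to the query.
            nearest = sorted(
                range(len(self.x)), key=lambda i: math.dist(q, self.x[i])
            )[: self.n_neighbors]
            votes = Counter(self.y[i] for i in nearest)
            predictions.append(votes.most_common(1)[0][0])
        return predictions


# Toy data (made up): two well-separated clusters.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
knn = TinyKNN(n_neighbors=3).fit(train_x, train_y)
predicted = knn.predict([[0.2, 0.1], [5.5, 5.2]])
# predicted == [0, 1]
```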

class bacpipe.embedding_evaluation.probing.train_probe.LinearProbe(in_dim, out_dim, device='cpu', **kwargs)[source]

Bases: Module

__init__(in_dim, out_dim, device='cpu', **kwargs)[source]

Linear classification layer.

Parameters:
  • in_dim (int) – number of input dimensions (dictated by embeddings)

  • out_dim (int) – number of output dimensions (dictated by classes in ground truth)

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

bacpipe.embedding_evaluation.probing.train_probe.train_knn_probe(knn_classifier, train_dataloader, device='cpu', **kwargs)[source]

Pipeline for knn classifier training.

Parameters:
  • knn_classifier (object) – classifier object

  • train_dataloader (DataLoader object) – iterator for dataset

  • device (str, optional) – ‘cpu’ or ‘cuda’, by default “cpu”

Returns:

classifier object

Return type:

object

bacpipe.embedding_evaluation.probing.train_probe.train_linear_probe(linear_classifier, train_dataloader, learning_rate, num_epochs, device='cuda:0', **kwargs)[source]

Linear classification training pipeline. Hyperparameters are specified in settings.yaml file and passed to this function.

Parameters:
  • linear_classifier (object) – classification object

  • train_dataloader (DataLoader object) – dataset loader to iterate over

  • learning_rate (float) – learning rate

  • num_epochs (int) – number of epochs for training

  • device (str, optional) – ‘cpu’ or ‘cuda’, by default “cuda:0”

Returns:

trained linear classification object

Return type:

object
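
To show how learning_rate and num_epochs enter a linear classification training loop, the following is a minimal sketch of softmax regression trained with plain stochastic gradient descent in pure Python. It swaps in a hand-written SGD loop for whatever optimizer, loss and batching bacpipe actually uses. All function names and the toy data are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def compute_logits(weights, biases, x):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def train_softmax_probe(xs, ys, n_classes, learning_rate, num_epochs):
    """Hypothetical linear probe: one weight row and one bias per class."""
    in_dim = len(xs[0])
    weights = [[0.0] * in_dim for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(num_epochs):
        for x, y in zip(xs, ys):
            probs = softmax(compute_logits(weights, biases, x))
            for c in range(n_classes):
                # Cross-entropy gradient w.r.t. logit c: p_c - 1[c == y].
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(in_dim):
                    weights[c][j] -= learning_rate * grad * x[j]
                biases[c] -= learning_rate * grad
    return weights, biases


def predict(weights, biases, x):
    logits = compute_logits(weights, biases, x)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy embeddings (made up): two linearly separable classes.
xs = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9], [-2.0, -1.7], [-1.6, -2.3], [-2.4, -2.0]]
ys = [0, 0, 0, 1, 1, 1]
weights, biases = train_softmax_probe(xs, ys, n_classes=2, learning_rate=0.1, num_epochs=20)
train_predictions = [predict(weights, biases, x) for x in xs]
```

Here in_dim is dictated by the embedding size and the number of weight rows by the number of classes, mirroring the in_dim and out_dim parameters of LinearProbe.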

bacpipe.embedding_evaluation.probing.train_probe.train_probe(embeds, df, label2index, config='linear', learning_rate=None, num_epochs=None, n_neighbors=None, **kwargs)[source]

Classification pipeline. First the classification dataframe is loaded, then a dict is created to link labels to ints, and then the dataset loaders to iterate over are created. Next, depending on the specified config, a linear or KNN classifier is trained. Finally, the classifier is used for inference and performance metrics are computed from its predictions.

Parameters:
  • paths (SimpleNamespace dict) – dictionary object containing paths for loading and saving

  • dataset_csv_path (string) – name of classification dataframe as specified in the settings.yaml file

  • embeds (np.array) – the embeddings

  • config (str, optional) – type of classification, by default ‘linear’

Returns:

performance dictionary

Return type:

dict

Module contents