bacpipe.embedding_evaluation.probing package
Submodules
bacpipe.embedding_evaluation.probing.dataset_probe module
- class bacpipe.embedding_evaluation.probing.dataset_probe.ProbeDatasetLoader(class_df, embeds, label2index, set_name=None, **kwargs)
Bases: Dataset
- __getitem__(idx)
Return a single sample from the dataset.
- Parameters:
idx (int) – index of the sample to retrieve
- Returns:
(embedding, true label)
- Return type:
tuple
- __init__(class_df, embeds, label2index, set_name=None, **kwargs)
Class to initialize and iterate through classification dataset.
- Parameters:
class_df (pd.DataFrame) – classification dataframe
embeds (np.array) – embeddings
label2index (dict) – linking labels to integers
set_name (string, optional) – train, test or val set, by default None
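ProbeDatasetLoader follows the PyTorch Dataset protocol: `__len__` gives the number of samples and `__getitem__` returns one (embedding, label) pair. The sketch below is a hypothetical, simplified stand-in (`ToyProbeDataset` is not part of bacpipe) that assumes the i-th label belongs to the i-th embedding, which the docs do not state, and uses plain lists instead of pandas, numpy and torch.

```python
# Hypothetical stand-in for ProbeDatasetLoader. Assumes labels[i] belongs to
# embeds[i]; uses plain lists instead of pandas/numpy/torch.
class ToyProbeDataset:
    def __init__(self, labels, embeds, label2index):
        if len(labels) != len(embeds):
            raise ValueError("labels and embeds must have the same length")
        self.embeds = embeds
        # Map each label to its integer class index once, up front.
        self.targets = [label2index[label] for label in labels]

    def __len__(self):
        # Number of samples; a DataLoader uses this to plan its batches.
        return len(self.targets)

    def __getitem__(self, idx):
        # One (embedding, integer label) pair, like the documented return value.
        return self.embeds[idx], self.targets[idx]


label2index = {"bird": 0, "frog": 1}
ds = ToyProbeDataset(["frog", "bird"], [[0.1, 0.2], [0.3, 0.4]], label2index)
print(len(ds), ds[0])  # 2 ([0.1, 0.2], 1)
```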
- bacpipe.embedding_evaluation.probing.dataset_probe.generate_annotations_for_probing_task(ground_truth, paths, label_column, dataset_csv_path='probe_annotations.csv', train_ratio=None, test_ratio=None, **kwargs)
- bacpipe.embedding_evaluation.probing.dataset_probe.probe_dataset_loader(set_name, clean_df, embeds, label2index, batch_size=64, shuffle=False, **kwargs)
Create dataset loader object for classification.
- Parameters:
set_name (string) – train, test or val set
clean_df (pd.DataFrame) – classification dataframe
embeds (np.array) – embeddings
label2index (dict) – link labels to ints
batch_size (int, optional) – number of embeddings per batch, by default 64
shuffle (bool, optional) – shuffle or not, by default False
- Returns:
dataset loader object to iterate over during training
- Return type:
DataLoader obj
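The returned DataLoader groups samples into batches of `batch_size` and, if `shuffle` is set, visits them in random order. A minimal plain-Python sketch of just those two semantics follows; `iterate_batches` is a hypothetical helper, and unlike a real torch DataLoader it does not collate the samples into tensors.

```python
import random


def iterate_batches(dataset, batch_size=64, shuffle=False, seed=None):
    """Yield lists of (embedding, label) pairs, at most batch_size at a time."""
    indices = list(range(len(dataset)))
    if shuffle:
        # Randomize the visiting order for this pass over the data.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


pairs = [([float(i)], i % 2) for i in range(5)]
print([len(b) for b in iterate_batches(pairs, batch_size=2)])  # [2, 2, 1]
```

The last batch is smaller when the number of samples is not a multiple of `batch_size`.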
bacpipe.embedding_evaluation.probing.evaluate_probe module
- bacpipe.embedding_evaluation.probing.evaluate_probe.accuracy_per_class(y_true, y_pred, label2index, items_per_class)
Compute the accuracy for each class.
- Parameters:
y_true (list) – ground truth
y_pred (list) – predictions
label2index (dict) – link labels to ints
items_per_class (list) – number of items per class
- Returns:
classwise accuracy
- Return type:
dict
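Per-class accuracy is the fraction of samples of a class that were predicted as that class (equivalently, per-class recall). The sketch below assumes integer-encoded labels; `toy_accuracy_per_class` is hypothetical and, as a simplification, counts the items per class from `y_true` instead of taking an `items_per_class` argument.

```python
from collections import Counter


def toy_accuracy_per_class(y_true, y_pred, label2index):
    """Fraction of samples of each class that were predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {
        label: correct[idx] / totals[idx]
        for label, idx in label2index.items()
        if totals[idx] > 0  # skip classes that never occur in y_true
    }


print(toy_accuracy_per_class([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], {"bird": 0, "frog": 1}))
```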
- bacpipe.embedding_evaluation.probing.evaluate_probe.auc(y_true, probability_scores)
Compute the AUC.
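The docstring does not say which AUC variant is computed (ROC or precision-recall, or how multiple classes are averaged). As an illustration only, the sketch below computes the binary ROC AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, counting ties as half. For multiclass data, one common choice is one-vs-rest averaging, for example scikit-learn's `roc_auc_score(..., multi_class="ovr")`; whether bacpipe does this is not stated here.

```python
def binary_roc_auc(y_true, scores):
    """ROC AUC = P(score of random positive > score of random negative)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    # Compare every positive/negative pair (O(n_pos * n_neg), fine for a sketch).
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


print(binary_roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```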
- bacpipe.embedding_evaluation.probing.evaluate_probe.compute_task_metrics(y_pred, y_true, probability_scores, label2index)
Compute the evaluation metrics.
- bacpipe.embedding_evaluation.probing.evaluate_probe.eval_probe(probe, embeds, df, label2index, device='cuda:0', config='linear', paths=None, save_probe=False, **kwargs)
Perform inference using probe.
- Parameters:
probe (object) – trained classification object
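embeds (np.array) – embeddings
df (pd.DataFrame) – classification dataframe
label2index (dict) – link labels to ints
device (str, optional) – ‘cpu’ or ‘cuda’, by default “cuda:0”
config (str, optional) – type of classification, by default “linear”
paths (SimpleNamespace object, optional) – object with path attributes for loading and saving, by default None
save_probe (bool, optional) – whether to save the probe, by default False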
- Returns:
list – prediction values in ints corresponding to labels
list – ground truth values in ints
np.array – probabilities for each class and each embedding
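eval_probe returns integer predictions alongside per-class probabilities. A minimal sketch of how such outputs can be derived from a classifier's raw scores, assuming (not confirmed by the docs) that the probe outputs unnormalized logits which are turned into probabilities with a softmax and into predictions with an argmax; both helper names are hypothetical.

```python
import math


def softmax(logits):
    """Convert one row of raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_logits(batch_logits):
    """Return (predicted class indices, per-class probabilities) for a batch."""
    probs = [softmax(row) for row in batch_logits]
    # The prediction is the index of the most probable class in each row.
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return preds, probs


preds, probs = predict_from_logits([[2.0, 0.5], [0.1, 1.0]])
print(preds)  # [0, 1]
```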
- bacpipe.embedding_evaluation.probing.evaluate_probe.macro_accuracy(y_true, y_pred)
Compute macro accuracy.
- Parameters:
y_true (list) – ground truth
y_pred (list) – predictions
- Returns:
balanced accuracy score
- Return type:
float
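Balanced accuracy is commonly defined as the mean of the per-class recalls, which is also how scikit-learn's `balanced_accuracy_score` defines it; that bacpipe uses exactly this definition is an assumption. The hypothetical sketch below shows why it differs from plain accuracy on imbalanced data: always predicting the majority class scores 0.8 plain accuracy here but only 0.5 balanced accuracy.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; every class weighs the same regardless of size."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [correct[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


print(balanced_accuracy([0, 0, 0, 0, 1], [0, 0, 0, 0, 0]))  # 0.5
```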
- bacpipe.embedding_evaluation.probing.evaluate_probe.macro_f1(y_true, y_pred)
Compute the macro F1 score.
- bacpipe.embedding_evaluation.probing.evaluate_probe.micro_f1(y_true, y_pred)
Compute the micro F1 score.
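Macro F1 computes an F1 score per class and averages them, so rare classes count as much as frequent ones; micro F1 pools true positives, false positives and false negatives over all classes first. The sketch below uses these standard definitions for single-label multiclass data, where micro F1 equals plain accuracy; `f1_scores` is a hypothetical helper, not the bacpipe implementation.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multiclass predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted wrongly
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: F1 per class, then an unweighted mean over classes.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


print(f1_scores([0, 0, 0, 0, 1], [0, 0, 0, 0, 0]))  # macro ~0.444, micro 0.8
```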
- bacpipe.embedding_evaluation.probing.evaluate_probe.save_probe_results(paths, config, metrics, **kwargs)
Save a dict with all performance metrics.
- Parameters:
paths (SimpleNamespace object) – object whose attributes hold the paths for loading and saving
config (string) – type of classification (linear or knn)
metrics (dict) – performance
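The output location and file format are not documented here. As a hedged illustration of saving a metrics dict through a SimpleNamespace of paths, the sketch writes JSON; the attribute `results_dir`, the file name pattern and the choice of JSON are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def toy_save_probe_results(paths, config, metrics):
    """Write metrics to <paths.results_dir>/probe_results_<config>.json."""
    out_dir = Path(paths.results_dir)  # hypothetical attribute name
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"probe_results_{config}.json"
    with out_file.open("w") as f:
        json.dump(metrics, f, indent=2)
    return out_file


with tempfile.TemporaryDirectory() as tmp:
    path = toy_save_probe_results(SimpleNamespace(results_dir=tmp), "linear", {"macro_f1": 0.8})
    print(json.loads(path.read_text()))  # {'macro_f1': 0.8}
```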
bacpipe.embedding_evaluation.probing.inference_probe module
- bacpipe.embedding_evaluation.probing.inference_probe.prepare_probe_inference(model, probe_path='')
Load a previously trained and saved linear probe. The saved state_dict is loaded into the probe model, so the probe is in exactly the same state as after training and ready for inference.
- Parameters:
model (str) – model name of backbone
probe_path (str, optional) – path to probe, will default to the standard bacpipe path, by default ‘’
- Returns:
torch model object – linear probe model
dict – dictionary to associate the columns of the generated predictions array with the corresponding class label
- bacpipe.embedding_evaluation.probing.inference_probe.run_probe_inference(model, linear_probe, threshold, embeds=None, return_binary_presence=True, callbacks=None, device='cpu')
Apply a previously trained linear probe to data. The embeddings must either already have been created with the backbone and saved in the bacpipe folder structure, in which case this function loads them, or be passed directly via embeds. The linear probe is then applied to classify the data. See the example notebooks for a use case.
- Parameters:
model (str) – model name
linear_probe (torch model) – linear probe torch model object
threshold (float) – float value to process the predictions
embeds (torch.Tensor, optional) – embeddings array, by default None
return_binary_presence (bool, optional) – if True, a binary presence array is returned, by default True
callbacks (function, optional) – callback used to increment custom progress bars, by default None
device (str, optional) – select device to process the probe, by default ‘cpu’
- Returns:
generated probe predictions
- Return type:
np.ndarray
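With return_binary_presence enabled, the predictions are reduced to a presence/absence array using threshold, and the dictionary from prepare_probe_inference links the array's columns to class labels. The sketch below illustrates that post-processing on plain lists; whether the comparison is `>=` or `>`, and the exact orientation of the mapping dict, are assumptions, and both helper names are hypothetical.

```python
def binary_presence(probabilities, threshold):
    """1 where a class score meets the threshold, else 0 (one row per embedding)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probabilities]


def present_labels(presence, index2label):
    """Translate each binary row into the list of class labels marked present."""
    return [[index2label[i] for i, flag in enumerate(row) if flag] for row in presence]


presence = binary_presence([[0.9, 0.2], [0.4, 0.6]], threshold=0.5)
print(presence)  # [[1, 0], [0, 1]]
print(present_labels(presence, {0: "bird", 1: "frog"}))  # [['bird'], ['frog']]
```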
bacpipe.embedding_evaluation.probing.probe module
- bacpipe.embedding_evaluation.probing.probe.embeds_array_without_noise(embeds, ground_truth, label_column, **kwargs)
- bacpipe.embedding_evaluation.probing.probe.probing_pipeline(model_name, ground_truth, embeds, paths=None, name='linear', overwrite=True, label_column='species', **kwargs)
Probing pipeline consisting of building the classifier, evaluating it and saving metrics and plots of performance.
- Parameters:
paths (SimpleNamespace object) – object with attributes corresponding to paths for loading and saving
embeds (np.array) – embeddings
name (string) – type of probing (linear or knn), by default 'linear'
dataset_csv_path (string) – name of the probing dataframe as specified in settings.yaml
overwrite (bool) – whether to overwrite existing probing results, by default True
bacpipe.embedding_evaluation.probing.train_probe module
- class bacpipe.embedding_evaluation.probing.train_probe.KNNProbe(n_neighbors=15, testing=False, **kwargs)
Bases: Module
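KNNProbe's internals (distance metric, tie-breaking, whether it wraps another library) are not documented here. The sketch shows the general k-nearest-neighbour technique it is named after: "training" just stores the labelled embeddings, and prediction is a majority vote among the `n_neighbors` closest training embeddings by Euclidean distance. `ToyKNN` is hypothetical; ties in the vote go to the label seen first among the neighbours.

```python
import math
from collections import Counter


class ToyKNN:
    """Majority-vote k-nearest-neighbour classifier on plain Python lists."""

    def __init__(self, n_neighbors=15):
        self.n_neighbors = n_neighbors
        self.X, self.y = [], []

    def fit(self, X, y):
        # Fitting a KNN classifier only stores the labelled embeddings.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Rank training points by Euclidean distance to the query.
            order = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))
            votes = Counter(self.y[i] for i in order[: self.n_neighbors])
            preds.append(votes.most_common(1)[0][0])
        return preds


knn = ToyKNN(n_neighbors=3).fit([[0, 0], [0, 1], [5, 5], [5, 6]], [0, 0, 1, 1])
print(knn.predict([[0, 0.5], [5, 5.5]]))  # [0, 1]
```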
- class bacpipe.embedding_evaluation.probing.train_probe.LinearProbe(in_dim, out_dim, device='cpu', **kwargs)
Bases: Module
- __init__(in_dim, out_dim, device='cpu', **kwargs)
Linear classification layer.
- Parameters:
in_dim (int) – number of input dimensions (dictated by embeddings)
out_dim (int) – number of output dimensions (dictated by classes in ground truth)
- forward(x)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
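A linear classification layer maps an `in_dim`-dimensional embedding to `out_dim` class scores via `logits = W x + b`, with one weight row and one bias per class. In PyTorch this mapping is what `torch.nn.Linear(in_dim, out_dim)` computes; whether LinearProbe wraps exactly that layer is not stated here. The dependency-free sketch below (`ToyLinearProbe` is hypothetical) spells out the arithmetic.

```python
import random


class ToyLinearProbe:
    """Single linear layer mapping an in_dim embedding to out_dim class logits."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # One weight row per output class (small random init) and one bias per class.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def forward(self, x):
        # logits[k] = sum_j W[k][j] * x[j] + b[k]
        return [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(self.W, self.b)]


probe = ToyLinearProbe(in_dim=2, out_dim=3)
probe.W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probe.b = [0.0, 0.0, 1.0]
print(probe.forward([2.0, 3.0]))  # [2.0, 3.0, 6.0]
```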
- bacpipe.embedding_evaluation.probing.train_probe.train_knn_probe(knn_classifier, train_dataloader, device='cpu', **kwargs)
Pipeline for KNN classifier training.
- Parameters:
knn_classifier (object) – classifier object
train_dataloader (DataLoader object) – iterator for dataset
device (str, optional) – ‘cpu’ or ‘cuda’, by default “cpu”
- Returns:
classifier object
- Return type:
object
- bacpipe.embedding_evaluation.probing.train_probe.train_linear_probe(linear_classifier, train_dataloader, learning_rate, num_epochs, device='cuda:0', **kwargs)
Linear classification training pipeline. Hyperparameters are specified in settings.yaml file and passed to this function.
- Parameters:
linear_classifier (object) – classification object
train_dataloader (DataLoader object) – dataset loader to iterate over
learning_rate (float) – learning rate
num_epochs (int) – number of epochs for training
device (str, optional) – ‘cpu’ or ‘cuda’, by default “cuda:0”
- Returns:
trained linear classification object
- Return type:
object
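The loss function and optimizer used by train_linear_probe are not documented here. As an illustration of how `learning_rate` and `num_epochs` enter such a loop, the sketch trains a linear softmax classifier with cross-entropy loss and plain per-sample stochastic gradient descent, reshuffling the samples every epoch. It relies on the fact that the gradient of cross-entropy with respect to the logits is `probs - one_hot(target)`. All names are hypothetical, and the real function may use a different optimizer or batching.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits_of(W, b, x):
    return [sum(w * xj for w, xj in zip(W[k], x)) + b[k] for k in range(len(W))]


def predict(W, b, x):
    z = logits_of(W, b, x)
    return z.index(max(z))


def train_toy_linear_probe(X, y, out_dim, learning_rate=0.1, num_epochs=50, seed=0):
    """Fit W, b of a linear softmax classifier with per-sample SGD on cross-entropy."""
    in_dim = len(X[0])
    W = [[0.0] * in_dim for _ in range(out_dim)]
    b = [0.0] * out_dim
    order = list(range(len(X)))
    rng = random.Random(seed)
    for _ in range(num_epochs):
        rng.shuffle(order)  # new sample order every epoch
        for i in order:
            x, target = X[i], y[i]
            probs = softmax(logits_of(W, b, x))
            for k in range(out_dim):
                # d(loss)/d(logit_k) = probs[k] - 1{k == target}
                g = probs[k] - (1.0 if k == target else 0.0)
                for j in range(in_dim):
                    W[k][j] -= learning_rate * g * x[j]
                b[k] -= learning_rate * g
    return W, b


W, b = train_toy_linear_probe([[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1], out_dim=2)
print([predict(W, b, [v]) for v in (-2.0, -1.0, 1.0, 2.0)])  # [0, 0, 1, 1]
```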
- bacpipe.embedding_evaluation.probing.train_probe.train_probe(embeds, df, label2index, config='linear', learning_rate=None, num_epochs=None, n_neighbors=None, **kwargs)
Classification pipeline. First the classification dataframe is loaded, then a dict is created to link labels to ints, then the dataset loaders are created to iterate over. Next, depending on the specified config, a linear or KNN classification is performed. Finally, the classifiers are used for inference and performance metrics are computed from the results.
- Parameters:
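df (pd.DataFrame) – classification dataframe
label2index (dict) – link labels to ints
embeds (np.array) – the embeddings
config (str, optional) – type of classification, by default ‘linear’
learning_rate (float, optional) – learning rate for the linear probe, by default None
num_epochs (int, optional) – number of training epochs for the linear probe, by default None
n_neighbors (int, optional) – number of neighbors for the KNN probe, by default None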
- Returns:
performance dictionary
- Return type:
dict