bacpipe.embedding_evaluation.probing.train_probe

Functions

probe_dataset_loader(set_name, clean_df, ...)

Create dataset loader object for classification.

train_knn_probe(knn_classifier, train_dataloader)

Pipeline for knn classifier training.

train_linear_probe(linear_classifier, ...[, ...])

Linear classification training pipeline.

train_probe(embeds, df, label2index[, ...])

Classification pipeline.

Classes

KNNProbe([n_neighbors, testing])

K-nearest neighbor classifier.

KNeighborsClassifier([n_neighbors, weights, ...])

Classifier implementing the k-nearest neighbors vote.

LinearProbe(in_dim, out_dim[, device])

Linear classification layer.

class bacpipe.embedding_evaluation.probing.train_probe.KNNProbe(n_neighbors=15, testing=False, **kwargs)[source]

Bases: Module

__init__(n_neighbors=15, testing=False, **kwargs)[source]

K-nearest neighbor classifier.

Parameters:

n_neighbors (int, optional) – number of neighbors used for the vote; a hyperparameter specified in the settings.yaml file, by default 15

fit(x, y)[source]

Train KNN classifier with numpy data

forward(x)[source]

Predict using KNN (only after it’s trained)
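The fit-then-predict contract above can be illustrated with a minimal sketch. It assumes the probe wraps scikit-learn's KNeighborsClassifier inside a torch Module; the class name SketchKNNProbe, the probability output, and the toy data are hypothetical and are not taken from the bacpipe source.

```python
import numpy as np
import torch
from sklearn.neighbors import KNeighborsClassifier


class SketchKNNProbe(torch.nn.Module):
    """Hypothetical stand-in for KNNProbe; not the bacpipe implementation."""

    def __init__(self, n_neighbors=15):
        super().__init__()
        self.knn = KNeighborsClassifier(n_neighbors=n_neighbors)

    def fit(self, x, y):
        # x: (n_samples, n_features) numpy array, y: (n_samples,) integer labels
        self.knn.fit(x, y)

    def forward(self, x):
        # Accept tensors or arrays; return per-class probabilities as a tensor
        if isinstance(x, torch.Tensor):
            x = x.detach().cpu().numpy()
        return torch.from_numpy(self.knn.predict_proba(x))


rng = np.random.default_rng(0)
# Two well-separated clusters of 20 points each (toy embeddings)
x_train = np.concatenate([rng.normal(0, 0.1, (20, 4)), rng.normal(5, 0.1, (20, 4))])
y_train = np.array([0] * 20 + [1] * 20)

probe = SketchKNNProbe(n_neighbors=3)
probe.fit(x_train, y_train)       # must happen before any prediction
probs = probe(torch.zeros(1, 4))  # query point sits inside the first cluster
```

Calling the probe before fit would raise scikit-learn's NotFittedError, which is why prediction is only valid after training.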

class bacpipe.embedding_evaluation.probing.train_probe.LinearProbe(in_dim, out_dim, device='cpu', **kwargs)[source]

Bases: Module

__init__(in_dim, out_dim, device='cpu', **kwargs)[source]

Linear classification layer.

Parameters:
  • in_dim (int) – number of input dimensions (dictated by embeddings)

  • out_dim (int) – number of output dimensions (dictated by classes in ground truth)
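As a rough illustration of how in_dim and out_dim relate to the data, the sketch below builds a single linear layer that maps embeddings to class logits. SketchLinearProbe, the embedding size of 128, and the 5 classes are assumptions made for the example, not values from bacpipe.

```python
import torch


class SketchLinearProbe(torch.nn.Module):
    """Hypothetical stand-in for LinearProbe; not the bacpipe implementation."""

    def __init__(self, in_dim, out_dim, device="cpu"):
        super().__init__()
        # One weight matrix of shape (out_dim, in_dim) plus a bias vector
        self.linear = torch.nn.Linear(in_dim, out_dim).to(device)

    def forward(self, x):
        # Returns raw logits, one column per class
        return self.linear(x)


# in_dim follows the embedding size, out_dim the number of classes
probe = SketchLinearProbe(in_dim=128, out_dim=5)
logits = probe(torch.randn(8, 128))  # batch of 8 embeddings
```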

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
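The difference described in the note can be seen with a small example that uses a plain torch.nn.Linear layer and a forward hook. The variable names are illustrative only.

```python
import torch

layer = torch.nn.Linear(4, 2)
calls = []

# The hook records the output shape each time the module runs through __call__
handle = layer.register_forward_hook(
    lambda module, inputs, output: calls.append(tuple(output.shape))
)

x = torch.randn(3, 4)
layer(x)          # goes through __call__, so the hook fires
layer.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()   # detach the hook when it is no longer needed
```

After both calls, calls holds a single entry, because only the layer(x) invocation triggered the hook.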

bacpipe.embedding_evaluation.probing.train_probe.train_knn_probe(knn_classifier, train_dataloader, device='cpu', **kwargs)[source]

Pipeline for knn classifier training.

Parameters:
  • knn_classifier (object) – classifier object

  • train_dataloader (DataLoader object) – iterator for dataset

  • device (str, optional) – ‘cpu’ or ‘cuda’, by default “cpu”

Returns:

classifier object

Return type:

object
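A KNN classifier has no gradient steps, so one plausible shape for this pipeline is to gather every batch from the loader and fit once. The sketch below assumes the loader yields (embedding, label) pairs and uses scikit-learn's KNeighborsClassifier directly; sketch_train_knn and the toy data are hypothetical and may differ from what train_knn_probe does internally.

```python
import numpy as np
import torch
from sklearn.neighbors import KNeighborsClassifier
from torch.utils.data import DataLoader, TensorDataset


def sketch_train_knn(knn, train_dataloader):
    # Collect all batches as numpy arrays, then fit the classifier once
    xs, ys = [], []
    for x, y in train_dataloader:
        xs.append(x.cpu().numpy())
        ys.append(y.cpu().numpy())
    knn.fit(np.concatenate(xs), np.concatenate(ys))
    return knn


torch.manual_seed(0)
# Toy embeddings: one cluster around 0, one around 5
x = torch.cat([torch.randn(20, 4) * 0.1, torch.randn(20, 4) * 0.1 + 5])
y = torch.cat([torch.zeros(20), torch.ones(20)]).long()
loader = DataLoader(TensorDataset(x, y), batch_size=8, shuffle=True)

knn = sketch_train_knn(KNeighborsClassifier(n_neighbors=3), loader)
pred = knn.predict(np.full((1, 4), 5.0))  # query inside the second cluster
```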

bacpipe.embedding_evaluation.probing.train_probe.train_linear_probe(linear_classifier, train_dataloader, learning_rate, num_epochs, device='cuda:0', **kwargs)[source]

Linear classification training pipeline. Hyperparameters are specified in settings.yaml file and passed to this function.

Parameters:
  • linear_classifier (object) – classification object

  • train_dataloader (DataLoader object) – dataset loader to iterate over

  • learning_rate (float) – learning rate

  • num_epochs (int) – number of epochs for training

  • device (str, optional) – ‘cpu’ or ‘cuda’, by default “cuda:0”

Returns:

trained linear classification object

Return type:

object
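The parameters above map onto a standard supervised training loop. The sketch below is one way such a loop could look; the Adam optimizer, the cross-entropy loss, the (embedding, label) batch format, and the name sketch_train_linear are assumptions, not details confirmed by the bacpipe source. The device defaults to "cpu" here so the example runs without a GPU.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def sketch_train_linear(classifier, train_dataloader, learning_rate, num_epochs, device="cpu"):
    classifier = classifier.to(device)
    optimizer = torch.optim.Adam(classifier.parameters(), lr=learning_rate)
    loss_fn = torch.nn.CrossEntropyLoss()
    classifier.train()
    for _ in range(num_epochs):
        for x, y in train_dataloader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()             # clear gradients from the last step
            loss = loss_fn(classifier(x), y)  # logits against integer labels
            loss.backward()
            optimizer.step()
    return classifier


torch.manual_seed(0)
# Toy embeddings: two linearly separable clusters
x = torch.cat([torch.randn(50, 4) * 0.5 - 2, torch.randn(50, 4) * 0.5 + 2])
y = torch.cat([torch.zeros(50), torch.ones(50)]).long()
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

probe = sketch_train_linear(torch.nn.Linear(4, 2), loader, learning_rate=0.05, num_epochs=20)
with torch.no_grad():
    accuracy = (probe(x).argmax(dim=1) == y).float().mean().item()
```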

bacpipe.embedding_evaluation.probing.train_probe.train_probe(embeds, df, label2index, config='linear', learning_rate=None, num_epochs=None, n_neighbors=None, **kwargs)[source]

Classification pipeline. First the classification dataframe is loaded, then a dict is created to link labels to ints, and then the dataset loaders are created to iterate over. Next, depending on the specified config, a linear or KNN classifier is trained. Finally, the trained classifier is used for inference, and performance metrics are computed from its predictions.

Parameters:
  • paths (SimpleNamespace dict) – dictionary object containing paths for loading and saving

  • dataset_csv_path (string) – name of the classification dataframe as specified in the settings.yaml file

  • embeds (np.array) – the embeddings

  • config (str, optional) – type of classification, by default ‘linear’

Returns:

performance dictionary

Return type:

dict
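The label-to-integer step and the final metrics step of the pipeline can be sketched in plain Python. The species labels, the predictions, and the "accuracy" key are made up for illustration; the actual keys of the returned performance dictionary are not specified here.

```python
labels = ["wren", "robin", "wren", "finch"]

# Sorting the unique labels gives a deterministic label -> int mapping
label2index = {label: i for i, label in enumerate(sorted(set(labels)))}
targets = [label2index[label] for label in labels]

# Hypothetical classifier output for the four samples
predictions = [2, 1, 0, 0]

# Fraction of samples whose predicted index matches the target index
correct = sum(p == t for p, t in zip(predictions, targets))
performance = {"accuracy": correct / len(targets)}
```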