bacpipe.embedding_evaluation.probing.dataset_probe

Functions

generate_annotations_for_probing_task(...[, ...])

probe_dataset_loader(set_name, clean_df, ...)

Create dataset loader object for classification.

Classes

DataLoader(dataset[, batch_size, shuffle, ...])

Data loader combines a dataset and a sampler, and provides an iterable over the given dataset.

Dataset()

An abstract class representing a Dataset.

Path(*args, **kwargs)

PurePath subclass that can make system calls.

ProbeDatasetLoader(class_df, embeds, label2index)

class bacpipe.embedding_evaluation.probing.dataset_probe.ProbeDatasetLoader(class_df, embeds, label2index, set_name=None, **kwargs)[source]

Bases: Dataset

__getitem__(idx)[source]

Return the embedding and its true label at the given index.

Parameters:

idx (int) – index of the item to retrieve

Returns:

(embedding, true label)

Return type:

tuple

__init__(class_df, embeds, label2index, set_name=None, **kwargs)[source]

Initialize a classification dataset that can be indexed and iterated over.

Parameters:
  • class_df (pd.DataFrame) – classification dataframe

  • embeds (np.array) – embeddings

  • label2index (dict) – mapping from labels to integers

  • set_name (string, optional) – train, test or val set, by default None
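
A minimal, dependency-free sketch of the indexable dataset pattern described above. It is not bacpipe's implementation: the class name `ToyProbeDataset`, the `"label"` key, and the use of plain lists in place of a pd.DataFrame and np.array are all assumptions made for illustration. How `set_name` is used internally is not documented here, so the sketch only stores it.

```python
class ToyProbeDataset:
    """Hypothetical stand-in for an indexable classification dataset."""

    def __init__(self, rows, embeds, label2index, set_name=None):
        # rows: list of dicts standing in for the rows of class_df (assumption)
        # embeds: one embedding (list of floats) per row, in the same order
        if len(rows) != len(embeds):
            raise ValueError("rows and embeds must have the same length")
        self.rows = rows
        self.embeds = embeds
        self.label2index = label2index
        self.set_name = set_name  # stored only; not interpreted in this sketch

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        # Return (embedding, true label), with the label encoded as an integer.
        label = self.rows[idx]["label"]  # the "label" key is an assumption
        return self.embeds[idx], self.label2index[label]


rows = [{"label": "bird"}, {"label": "frog"}, {"label": "bird"}]
embeds = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
label2index = {"bird": 0, "frog": 1}

dataset = ToyProbeDataset(rows, embeds, label2index, set_name="train")
embedding, target = dataset[1]
```

Indexing the dataset returns a tuple, so it can be unpacked directly into the embedding and its integer-encoded label.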

bacpipe.embedding_evaluation.probing.dataset_probe.generate_annotations_for_probing_task(ground_truth, paths, label_column, dataset_csv_path='probe_annotations.csv', train_ratio=None, test_ratio=None, **kwargs)[source]

bacpipe.embedding_evaluation.probing.dataset_probe.probe_dataset_loader(set_name, clean_df, embeds, label2index, batch_size=64, shuffle=False, **kwargs)[source]

Create dataset loader object for classification.

Parameters:
  • set_name (string) – train, test or val set

  • clean_df (pd.DataFrame) – classification dataframe

  • embeds (np.array) – embeddings

  • label2index (dict) – mapping from labels to integers

  • batch_size (int, optional) – number of embeddings per batch, by default 64

  • shuffle (bool, optional) – whether to shuffle the data, by default False

Returns:

dataset loader object to iterate over during training

Return type:

DataLoader
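
The effect of batch_size and shuffle can be sketched without PyTorch. The helper below is a hypothetical illustration of batching over an indexable dataset, not the DataLoader that probe_dataset_loader returns; the name `iter_batches` and its `seed` argument are assumptions introduced for this example.

```python
import random


def iter_batches(dataset, batch_size=64, shuffle=False, seed=None):
    """Yield lists of items from an indexable dataset (hypothetical helper)."""
    indices = list(range(len(dataset)))
    if shuffle:
        # Reorder the indices; the dataset itself is left untouched.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# 150 (embedding, label) pairs standing in for a probing dataset.
items = [([float(i)], i % 3) for i in range(150)]

batches = list(iter_batches(items, batch_size=64))
batch_sizes = [len(batch) for batch in batches]
```

With 150 items and a batch size of 64, this produces two full batches and a final batch of 22. With shuffle=False the items arrive in their original order, which is usually what is wanted for test and val sets.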