bacpipe.embedding_evaluation.clustering package

Submodules

bacpipe.embedding_evaluation.clustering.cluster module

bacpipe.embedding_evaluation.clustering.cluster.clustering_pipeline(model_name, ground_truth, embeds, paths=None, overwrite=True, label_column='species', **kwargs)[source]

Runs the clustering pipeline: generates clusterings based on the settings file, evaluates them, and saves and returns a dictionary with the evaluation scores.

Parameters:

model_name (str) – name of model backbone
ground_truth (dict) – ground truth labels and a label2dict dictionary
embeds (np.array) – embeddings
paths (SimpleNamespace object) – namespace with path attributes for saving and loading
overwrite (bool, optional) – whether to overwrite existing clustering files, by default True
label_column (str, optional) – name of column in annotations file, defaults to bacpipe.settings.label_column

bacpipe.embedding_evaluation.clustering.cluster.convert_numpy_types(obj)[source]

bacpipe.embedding_evaluation.clustering.cluster.eval_clustering(clusterings, ground_truth=[], embeds=None, default_labels=None, label_column=None, **kwargs)[source]

Evaluate clustering performance.

Parameters:

clusterings (dict) – dictionary with clusterings
ground_truth (list) – ground truth labels
default_labels (dict) – default labels for the dataset
label_column (str) – label type defined in annotations.csv file

Returns:

performance metrics

Return type:

dict

bacpipe.embedding_evaluation.clustering.cluster.eval_with_silhouette(embeds, ground_truth, metrics=None)[source]

Evaluate clustering using Silhouette Score.

Parameters:

embeds (np.ndarray) – embeddings
ground_truth (list) – ground truth array
metrics (dict, optional) – already generated evaluation metrics, if any, by default None

Returns:

evaluation metrics including Silhouette score

Return type:

dict
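To make clear what the Silhouette score measures, here is a minimal pure-Python sketch. It is not bacpipe's implementation; the function name `silhouette` and the toy data are assumptions made for illustration. scikit-learn's `sklearn.metrics.silhouette_score` computes the same quantity for real workloads.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance (illustrative only)."""
    # Group sample indices by their label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two distinct labels")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # a singleton cluster scores 0 by convention
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two tight groups that are far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, ["a", "a", "b", "b"])  # labels match the groups
bad = silhouette(points, ["a", "b", "a", "b"])   # labels cut across the groups
```

Labels that match the geometry of the embeddings give a score close to 1, while labels that cut across it give a negative score.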

bacpipe.embedding_evaluation.clustering.cluster.get_clustering_models(clust_params)[source]

Initialize the clustering models specified in settings.yaml

Parameters:

clust_params (dict) – clusterings specified in settings.yaml

Returns:

clustering objects to run the data on

Return type:

dict

bacpipe.embedding_evaluation.clustering.cluster.get_nr_of_clusters(labels, clust_configs, **kwargs)[source]

Get the number of clusters from the ground truth labels or, if no ground truth exists, from settings.yaml.

Parameters:

labels (list) – ground truth labels
clust_configs (dict) – clusterings specified in settings.yaml

Returns:

clustering dict with correct number of clusters

Return type:

dict
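The fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not bacpipe's implementation: the helper name `resolve_n_clusters` and the config layout (`{"kmeans": {"n_clusters": 8}}`) are hypothetical.

```python
import copy


def resolve_n_clusters(labels, clust_configs):
    """Hypothetical helper: take n_clusters from the labels when they exist."""
    configs = copy.deepcopy(clust_configs)  # leave the caller's settings untouched
    if len(labels) > 0:
        n_clusters = len(set(labels))  # one cluster per distinct ground truth label
        for params in configs.values():
            if "n_clusters" in params:
                params["n_clusters"] = n_clusters
    # Without labels, the values from the settings are returned unchanged.
    return configs


settings = {"kmeans": {"n_clusters": 8}}  # assumed layout of the clustering settings
with_labels = resolve_n_clusters(["a", "b", "a", "c"], settings)
without_labels = resolve_n_clusters([], settings)
```

With labels present, the configured value is replaced by the number of distinct labels; with an empty label list, the configured value is kept.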

bacpipe.embedding_evaluation.clustering.cluster.run_clustering(embeds, cluster_configs, label_column=None, ground_truth=[])[source]

Fit clustering algorithms to embeddings.

Parameters:

embeds (np.array) – embeddings
cluster_configs (dict) – clustering algorithm objects
label_column (str) – label type defined in annotations.csv file
ground_truth (list) – ground truth labels

Returns:

labels assigned by each clustering algorithm

Return type:

dict
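As a rough sketch of the fitting step, the loop below collects one label list per clustering object. It assumes each object exposes a `fit_predict` method, as scikit-learn clusterers such as `KMeans` do. `SignClusterer` and `fit_all` are hypothetical names used only to keep the example self-contained.

```python
class SignClusterer:
    """Toy stand-in for a clustering model (hypothetical, for illustration only)."""

    def fit_predict(self, X):
        # Assign each row to cluster 0 or 1 by the sign of its first coordinate.
        return [0 if row[0] < 0 else 1 for row in X]


def fit_all(embeds, models):
    """Fit every model on the embeddings and store its labels under the model's name."""
    return {name: list(model.fit_predict(embeds)) for name, model in models.items()}


embeds = [[-1.0, 0.2], [-0.8, 0.1], [0.9, 0.3], [1.1, 0.0]]
cluster_labels = fit_all(embeds, {"toy": SignClusterer()})
```

The result is a dict keyed by algorithm name, which matches the documented return type.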

bacpipe.embedding_evaluation.clustering.cluster.save_clustering_performance(paths, clusterings, metrics, label_column)[source]

Save the clustering performance: a JSON file with the performance metrics and a .npy file with the cluster labels for visualizations.

Parameters:

paths (SimpleNamespace object) – namespace with path attributes
clusterings (np.array) – clustering labels
metrics (dict) – clustering performance
label_column (str) – label as defined in annotations.csv file
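The JSON half of this step might look like the sketch below. The attribute name `clust_path`, the file name pattern, the helper name `save_metrics`, and the metric values are all assumptions for illustration, not bacpipe's actual names or results. The cluster labels would be written in a similar way with `numpy.save`.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_metrics(paths, metrics, label_column):
    """Hypothetical helper: write the metrics dict to a JSON file and return its path."""
    paths.clust_path.mkdir(parents=True, exist_ok=True)
    out_file = paths.clust_path / f"clust_results_{label_column}.json"
    out_file.write_text(json.dumps(metrics, indent=2))
    return out_file


with tempfile.TemporaryDirectory() as tmp:
    # A SimpleNamespace carries the paths as attributes, as in the parameter list above.
    paths = SimpleNamespace(clust_path=Path(tmp) / "clustering")
    example_metrics = {"kmeans": {"score": 0.5}}  # made-up values
    saved = save_metrics(paths, example_metrics, "species")
    reloaded = json.loads(saved.read_text())
```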


Module contents