bacpipe.embedding_evaluation.clustering package

Submodules

bacpipe.embedding_evaluation.clustering.cluster module

bacpipe.embedding_evaluation.clustering.cluster.clustering_pipeline(model_name, ground_truth, embeds, paths=None, overwrite=True, label_column='species', **kwargs)[source]

Run the clustering pipeline, generating clusterings based on the settings file. The clusterings are then evaluated, and a dictionary with the evaluation scores is saved and returned.

Parameters:
  • model_name (str) – name of model backbone

  • ground_truth (dict) – ground truth labels and a label2dict dictionary

  • embeds (np.array) – embeddings

  • paths (SimpleNamespace object) – object with path attributes for saving and loading

  • overwrite (bool, optional) – whether to overwrite existing clustering files, by default True

  • label_column (str, optional) – name of column in annotations file, defaults to bacpipe.settings.label_column

bacpipe.embedding_evaluation.clustering.cluster.convert_numpy_types(obj)[source]

bacpipe.embedding_evaluation.clustering.cluster.eval_clustering(clusterings, ground_truth=[], embeds=None, default_labels=None, label_column=None, **kwargs)[source]

Evaluate clustering performance.

Parameters:
  • clusterings (dict) – dictionary with clusterings

  • ground_truth (list) – ground truth labels

  • default_labels (dict) – default labels for the dataset

  • label_column (string) – label type defined in annotations.csv file

Returns:

performance metrics

Return type:

dict
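
The evaluation step can be illustrated with a minimal sketch. It assumes scikit-learn is installed and that clusterings are scored against ground truth with adjusted mutual information (AMI) and the adjusted Rand index (ARI); the reference above does not state which metrics bacpipe computes, and `score_clusterings` is a hypothetical helper, not part of bacpipe.

```python
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score


def score_clusterings(clusterings, ground_truth):
    """Hypothetical helper: score each clustering against ground-truth labels."""
    metrics = {}
    for name, predicted in clusterings.items():
        metrics[name] = {
            # Both scores are invariant to how cluster ids are numbered.
            "AMI": float(adjusted_mutual_info_score(ground_truth, predicted)),
            "ARI": float(adjusted_rand_score(ground_truth, predicted)),
        }
    return metrics


ground_truth = [0, 0, 1, 1, 2, 2]
clusterings = {
    "perfect": [1, 1, 0, 0, 2, 2],  # same partition, permuted cluster ids
    "merged": [0, 0, 0, 0, 1, 1],   # two classes merged into one cluster
}
scores = score_clusterings(clusterings, ground_truth)
```

Both metrics compare partitions rather than label values, so a clustering that recovers the classes under different ids still scores perfectly.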

bacpipe.embedding_evaluation.clustering.cluster.eval_with_silhouette(embeds, ground_truth, metrics=None)[source]

Evaluate clustering using Silhouette Score.

Parameters:
  • embeds (np.ndarray) – embeddings

  • ground_truth (list) – ground truth array

  • metrics (dict, optional) – already generated evaluation metrics, if any, by default None

Returns:

evaluation metrics including Silhouette score

Return type:

dict
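
As a rough illustration, a silhouette score can be added to an existing metrics dictionary as sketched below. This assumes scikit-learn's `silhouette_score` and computes the score from the embeddings and the ground-truth labels, which is what the signature suggests; `add_silhouette` and the `"silhouette"` key are hypothetical names, not bacpipe's own.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def add_silhouette(embeds, labels, metrics=None):
    """Hypothetical helper: add a silhouette score to a metrics dict."""
    # Copy so that a dict passed in by the caller is not mutated.
    metrics = {} if metrics is None else dict(metrics)
    metrics["silhouette"] = float(silhouette_score(embeds, labels))
    return metrics


# Two tight, well-separated groups of synthetic 8-dimensional embeddings.
rng = np.random.default_rng(0)
embeds = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 8)),
    rng.normal(loc=5.0, scale=0.1, size=(20, 8)),
])
labels = [0] * 20 + [1] * 20
result = add_silhouette(embeds, labels, metrics={"AMI": 1.0})
```

A score close to 1 indicates that samples lie much closer to their own class than to the nearest other class.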

bacpipe.embedding_evaluation.clustering.cluster.get_clustering_models(clust_params)[source]

Initialize the clustering models specified in settings.yaml.

Parameters:

clust_params (dict) – clusterings specified in settings.yaml

Returns:

clustering objects to run the data on

Return type:

dict
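
One way to turn such a configuration into model objects is sketched below. The layout of `clust_params` (algorithm name mapped to keyword arguments) and the `AVAILABLE` registry are assumptions made for illustration; the actual structure in settings.yaml may differ.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans

# Hypothetical registry mapping config names to scikit-learn classes.
AVAILABLE = {"kmeans": KMeans, "agglomerative": AgglomerativeClustering}


def build_models(clust_params):
    """Hypothetical helper: instantiate one estimator per config entry."""
    models = {}
    for name, params in clust_params.items():
        if name not in AVAILABLE:
            raise ValueError(f"Unknown clustering algorithm: {name}")
        models[name] = AVAILABLE[name](**params)
    return models


# Assumed config layout: algorithm name -> constructor keyword arguments.
clust_params = {
    "kmeans": {"n_clusters": 3, "n_init": 10, "random_state": 0},
    "agglomerative": {"n_clusters": 3},
}
models = build_models(clust_params)
```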

bacpipe.embedding_evaluation.clustering.cluster.get_nr_of_clusters(labels, clust_configs, **kwargs)[source]

Get the number of clusters from the ground truth or, if no ground truth exists, from settings.yaml.

Parameters:
  • labels (list) – ground truth labels

  • clust_configs (dict) – clusterings specified in settings.yaml

Returns:

clustering dict with correct number of clusters

Return type:

dict
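
The fallback logic described above could look roughly like the following sketch. It assumes each configuration entry carries an `n_clusters` key and that an empty label list means no ground truth is available; `set_n_clusters` is a hypothetical name.

```python
import copy


def set_n_clusters(labels, clust_configs):
    """Hypothetical helper: derive n_clusters from labels when available."""
    configs = copy.deepcopy(clust_configs)  # leave the caller's config untouched
    n_classes = len(set(labels))
    if n_classes > 0:
        # Ground truth exists: use the number of distinct classes.
        for params in configs.values():
            params["n_clusters"] = n_classes
    # Otherwise the configured n_clusters values are kept as they are.
    return configs


clust_configs = {"kmeans": {"n_clusters": 5}}
with_labels = set_n_clusters(["a", "b", "a", "c"], clust_configs)
without_labels = set_n_clusters([], clust_configs)
```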

bacpipe.embedding_evaluation.clustering.cluster.run_clustering(embeds, cluster_configs, label_column=None, ground_truth=[])[source]

Fit clustering algorithms to embeddings.

Parameters:
  • embeds (np.array) – embeddings

  • cluster_configs (dict) – clustering algorithm objects

  • label_column (string) – label type defined in annotations.csv file

  • ground_truth (list) – ground truth labels

Returns:

labels according to the clustering algorithms

Return type:

dict
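
Fitting several algorithms to the same embeddings can be sketched as follows, assuming the clustering objects follow scikit-learn's `fit_predict` interface. `fit_all` is a hypothetical helper and the embeddings are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans


def fit_all(embeds, models):
    """Hypothetical helper: return one label array per clustering model."""
    return {name: model.fit_predict(embeds) for name, model in models.items()}


# Two tight, well-separated groups of synthetic 4-dimensional embeddings.
rng = np.random.default_rng(0)
embeds = np.vstack([
    rng.normal(0.0, 0.1, (15, 4)),
    rng.normal(3.0, 0.1, (15, 4)),
])
models = {"kmeans": KMeans(n_clusters=2, n_init=10, random_state=0)}
cluster_labels = fit_all(embeds, models)
```

The result maps each algorithm name to an array with one cluster id per embedding.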

bacpipe.embedding_evaluation.clustering.cluster.save_clustering_performance(paths, clusterings, metrics, label_column)[source]

Save the clustering performance: a JSON file with the performance metrics and an .npy file with the cluster labels for visualizations.

Parameters:
  • paths (SimpleNamespace object) – object with path attributes

  • clusterings (np.array) – clustering labels

  • metrics (dict) – clustering performance

  • label_column (str) – label as defined in annotations.csv file
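
A minimal sketch of writing both files is shown below. NumPy scalars such as `np.int64` are not JSON serializable, so the metrics are converted to built-in Python types first. The file names, `to_builtin`, and `save_results` are hypothetical, and the cluster labels are assumed to be a dictionary of arrays; bacpipe's actual paths and data layout may differ.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def to_builtin(obj):
    """Recursively convert NumPy scalars and arrays to plain Python types."""
    if isinstance(obj, dict):
        return {str(k): to_builtin(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_builtin(v) for v in obj]
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):
        return obj.item()
    return obj


def save_results(out_dir, clusterings, metrics):
    """Hypothetical helper: write metrics as JSON and labels as .npy."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    metrics_path = out_dir / "clust_results.json"  # hypothetical file name
    labels_path = out_dir / "clust_labels.npy"     # hypothetical file name
    with open(metrics_path, "w") as f:
        json.dump(to_builtin(metrics), f, indent=2)
    # A dict is stored as a pickled 0-d object array.
    np.save(labels_path, clusterings, allow_pickle=True)
    return metrics_path, labels_path


with tempfile.TemporaryDirectory() as tmp:
    metrics = {"kmeans": {"AMI": np.float64(0.5), "n_clusters": np.int64(2)}}
    clusterings = {"kmeans": np.array([0, 0, 1, 1])}
    metrics_path, labels_path = save_results(tmp, clusterings, metrics)
    loaded_metrics = json.loads(metrics_path.read_text())
    loaded_labels = np.load(labels_path, allow_pickle=True).item()
```

Loading the label file requires `allow_pickle=True` because the dictionary is stored as an object array.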

Module contents