Internal API

ProjectDF

The ProjectDF class is the core back-end class of spacemake.

class spacemake.project_df.ProjectDF(file_path, config: Optional[ConfigFile] = None)

ProjectDF: class responsible for managing spacemake projects.

Parameters:
  • file_path – path to the project_df.csv file, where the project list is saved.

  • config (ConfigFile) – config file object

  • df (pd.DataFrame) – A pandas dataframe, containing one row per sample

add_sample_sheet(sample_sheet_path, basecalls_dir)

add_sample_sheet.

Parameters:
  • sample_sheet_path

  • basecalls_dir

add_samples_from_yaml(projects_yaml_file)

add_samples_from_yaml.

Parameters:

projects_yaml_file

add_update_sample(action=None, project_id=None, sample_id=None, R1=None, R2=None, reads=None, dge=None, longreads=None, longread_signature=None, sample_sheet=None, basecalls_dir=None, is_merged=False, return_series=False, map_strategy=None, puck_barcode_file=None, puck_barcode_file_id=None, **kwargs)

add_update_sample.

Parameters:
  • action

  • project_id

  • sample_id

  • R1

  • R2

  • reads

  • dge

  • longreads

  • longread_signature

  • sample_sheet

  • basecalls_dir

  • is_merged

  • return_series

  • map_strategy

  • kwargs

assert_projects_samples_exist(project_id_list=[], sample_id_list=[])

assert_projects_samples_exist.

Parameters:
  • project_id_list

  • sample_id_list

assert_valid()

assert_valid.

This function iterates over the projects/samples in the project_df and checks that the specified variables are consistent with the configuration file, and that the specified files (R1, R2, puck_barcode_files) exist at the specified locations.
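The file-existence part of this check can be sketched in a few lines. This is only an illustration, not spacemake's implementation: the helper name find_missing_files, the list-of-dicts input, and the assumption that each of R1, R2 and puck_barcode_file holds a single path string are all hypothetical.

```python
import os
from typing import Dict, Iterable, List, Optional

# Fields assumed (for this sketch only) to hold one file path each.
FILE_FIELDS = ("R1", "R2", "puck_barcode_file")


def find_missing_files(samples: Iterable[Dict[str, Optional[str]]]) -> List[str]:
    """Return one message per configured file that does not exist on disk."""
    problems = []
    for sample in samples:
        for field in FILE_FIELDS:
            path = sample.get(field)
            # Empty or unset fields are skipped; only configured paths are checked.
            if path and not os.path.exists(path):
                problems.append(
                    f"{sample['project_id']}/{sample['sample_id']}: "
                    f"{field} not found at {path}"
                )
    return problems


if __name__ == "__main__":
    demo = [
        {
            "project_id": "p1",
            "sample_id": "s1",
            "R1": "/nonexistent/R1.fastq.gz",
            "R2": None,
            "puck_barcode_file": None,
        }
    ]
    for message in find_missing_files(demo):
        print(message)
```

Collecting all problems before reporting them, rather than stopping at the first one, lets a user fix every broken path in a single pass; whether assert_valid behaves this way is not stated here.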

compute_max_barcode_mismatch(indices: List[str]) → int

compute_max_barcode_mismatch.

Parameters:

indices (List[str]) – List of Illumina I7 index barcodes

Returns:

The maximum number of mismatches to allow for this set of index barcodes

Return type:

int
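The rule spacemake applies here is not documented in this section. As a rough sketch of how such a bound can be derived, the example below assumes the standard error-correction argument: if the closest pair of barcodes differs at d positions, up to floor((d - 1) / 2) mismatches can be tolerated without a read becoming ambiguous between two barcodes. The function names and the choice to return 0 for fewer than two barcodes are assumptions made for illustration.

```python
from itertools import combinations
from typing import List


def _hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def max_mismatch_sketch(indices: List[str]) -> int:
    """Largest mismatch count that keeps every barcode unambiguous.

    Assumption: all barcodes have the same length. This is NOT taken
    from spacemake's source; it is the generic floor((d - 1) / 2) bound.
    """
    if len(indices) < 2:
        # Conservative choice for this sketch: nothing to compare against.
        return 0
    min_dist = min(_hamming(a, b) for a, b in combinations(indices, 2))
    # Duplicate barcodes give min_dist == 0, so clamp the result at 0.
    return max((min_dist - 1) // 2, 0)


if __name__ == "__main__":
    # Closest pair differs at 3 positions, so 1 mismatch is safe.
    print(max_mismatch_sketch(["AAAAAA", "AAATTT", "TTTTTT"]))  # 1
```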

delete_sample(project_id, sample_id)

delete_sample.

Parameters:
  • project_id

  • sample_id

dump()

dump.

find_barcode_file(puck_barcode_file_id: str) → str

Tries to find the path of a barcode file, using the puck_barcode_file_id.

Parameters:

puck_barcode_file_id (str) – puck_barcode_file_id of the puck we are looking for.

Returns:

path of the puck file, containing barcodes, or None

Return type:

str

get_ix_from_project_sample_list(project_id_list=[], sample_id_list=[])

get_ix_from_project_sample_list.

Parameters:
  • project_id_list

  • sample_id_list

get_metadata(field, project_id=None, sample_id=None, **kwargs)

get_metadata.

Parameters:
  • field

  • project_id

  • sample_id

  • kwargs

get_puck(project_id: str, sample_id: str, return_empty=False) → Puck

get_puck.

Parameters:
  • project_id (str) – project_id of a sample

  • sample_id (str) – sample_id of a sample

  • return_empty

Returns:

The Puck object of the given sample

Return type:

Puck

get_puck_variables(project_id: str, sample_id: str, return_empty=False) → Dict

get_puck_variables.

Parameters:
  • project_id (str) – project_id of a sample

  • sample_id (str) – sample_id of a sample

  • return_empty

Returns:

A dictionary containing the puck variables of a given sample

Return type:

Dict

get_sample_info(project_id: str, sample_id: str) → Dict

get_sample_info.

Parameters:
  • project_id (str) –

  • sample_id (str) –

Returns:

A dictionary containing all the values of a given sample, from the ProjectDF.

Return type:

Dict

hamming_distance(string1: str, string2: str) → int

Calculate the Hamming distance between two strings.

Parameters:
  • string1 (str) –

  • string2 (str) –

Return type:

int
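For reference, the Hamming distance counts the positions at which two strings of equal length differ. A minimal standalone sketch follows; the function name is hypothetical, and raising on unequal lengths is an assumption, since this section does not say how the method treats that case.

```python
def hamming_distance_sketch(string1: str, string2: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(string1) != len(string2):
        # Assumption for this sketch: unequal lengths are an error.
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(a != b for a, b in zip(string1, string2))


if __name__ == "__main__":
    # Positions 3 (G vs C) and 6 (C vs G) differ.
    print(hamming_distance_sketch("ACGTAC", "ACCTAG"))  # 2
```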

has_dge(project_id: str, sample_id: str) → bool

Returns True if a sample has a DGE. Returns False for PacBio-only samples.

Parameters:
  • project_id (str) –

  • sample_id (str) –

Return type:

bool

is_external(project_id: str, sample_id: str) → bool

is_external.

Parameters:
  • project_id (str) –

  • sample_id (str) –

Return type:

bool

is_spatial(project_id: str, sample_id: str, puck_barcode_file_id: str) → bool

Returns True if the sample with index (project_id, sample_id) is spatial, meaning that it has spatial barcodes attached. Also returns True if the puck_barcode_file_id is ‘puck_collection’, in which case the sample is necessarily spatial (transformed from local to global coordinates).

Parameters:
  • project_id (str) –

  • sample_id (str) –

  • puck_barcode_file_id (str) –

Return type:

bool
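The decision described for is_spatial reduces to a two-way "or". The sketch below restates it as a standalone function; the name is_spatial_sketch and the has_spatial_barcodes flag are hypothetical stand-ins for whatever lookup the real method performs on the project_df.

```python
def is_spatial_sketch(has_spatial_barcodes: bool, puck_barcode_file_id: str) -> bool:
    """Mirror the documented rule for deciding whether a sample is spatial.

    has_spatial_barcodes is a hypothetical flag standing in for the
    lookup of barcodes attached to (project_id, sample_id).
    """
    # A puck collection is always spatial, whatever the barcode lookup says.
    return has_spatial_barcodes or puck_barcode_file_id == "puck_collection"


if __name__ == "__main__":
    print(is_spatial_sketch(False, "puck_collection"))  # True
    print(is_spatial_sketch(False, "some_other_id"))    # False
```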

merge_samples(merged_project_id, merged_sample_id, project_id_list=[], sample_id_list=[], **kwargs)

merge samples.

Parameters:
  • merged_project_id

  • merged_sample_id

  • project_id_list

  • sample_id_list

  • kwargs

remove_variable(ix, variable_name, variable_key)

remove_variable.

Parameters:
  • ix

  • variable_name

  • variable_key

sample_exists(project_id=None, sample_id=None)

sample_exists.

Parameters:
  • project_id

  • sample_id

set_remove_variable(variable_name, variable_key, action, project_id_list=[], sample_id_list=[], keep_old=False)

set_remove_variable.

Parameters:
  • variable_name

  • variable_key

  • action

  • project_id_list

  • sample_id_list

  • keep_old

set_variable(ix, variable_name, variable_key, keep_old=False)

set_variable.

Parameters:
  • ix

  • variable_name

  • variable_key

  • keep_old

ConfigFile

This class is responsible for updating spacemake’s configuration.

spacemake.config.get_species_parser(required=True)

A parser for adding a reference sequence and annotation belonging to a species.