Configuration

Once installed, spacemake must be configured before running.

After you have installed spacemake as described in the installation instructions, you are ready to process and analyze spatial samples.

To initialize spacemake, cd into the directory in which you want to start spacemake. This directory will be your project_root.

Then simply type:

spacemake init \
   --dropseq_tools <path_to_dropseq_tools_dir>

Here, <path_to_dropseq_tools_dir> should point to the directory of the Dropseq-tools package downloaded in Step 2 of the installation.

Optionally, you can also provide the --download_species flag, which will download GENCODE genomes and annotations for mouse and human and place them under project_root/species_data/<species>, where <species> is either mouse or human.

One of the most important parts of spacemake is the so-called ‘shared sample-variables’: reusable, user-definable variables that can be assigned to several samples.

They can be briefly defined as follows:

species
a collection of genome, annotation and rRNA_genome. There is no default species, and each sample can have exactly one species.
barcode_flavor
the variable which specifies the structure of Read1 and Read2, namely how the cell_barcode and UMI should be extracted. If no value is provided for a sample, the default will be used.
run_mode
each sample can have several run_mode-s, all of which are user definable. If no run_mode-s are specified, a sample will be processed using default run_mode settings.
puck (for spatial samples only)
if a sample is spatial, it has to have a puck variable. If no puck is specified, a default puck will be used.
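The fallback rules above can be summarized in a short Python sketch. The Sample class and the resolve_sample helper are hypothetical names used only for illustration; they are not part of spacemake's API, and spacemake's own handling may differ in detail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    # Hypothetical model of a sample's shared sample-variables (not spacemake's API).
    name: str
    species: Optional[str] = None
    barcode_flavor: Optional[str] = None
    run_modes: List[str] = field(default_factory=list)
    puck: Optional[str] = None
    is_spatial: bool = True


def resolve_sample(sample: Sample) -> dict:
    """Apply the fallback rules described above and return the effective variables."""
    if sample.species is None:
        # There is no default species: every sample needs exactly one.
        raise ValueError(f"sample {sample.name!r} has no species")
    resolved = {
        "species": sample.species,
        # Missing barcode_flavor -> the 'default' flavor.
        "barcode_flavor": sample.barcode_flavor or "default",
        # No run_modes -> process with the 'default' run_mode.
        "run_modes": sample.run_modes or ["default"],
    }
    if sample.is_spatial:
        # Spatial samples always end up with a puck; 'default' if none was given.
        resolved["puck"] = sample.puck or "default"
    return resolved


print(resolve_sample(Sample(name="s1", species="mouse", run_modes=["visium"])))
```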

To add, update, delete or list a shared sample-variable, you can use the following commands:

spacemake config add_<shared-sample-variable>
spacemake config update_<shared-sample-variable>
spacemake config delete_<shared-sample-variable>
spacemake config list_<shared-sample-variable>

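where <shared-sample-variable> is one of species, barcode_flavor, run_mode or puck. The list commands use the plural form: list_species, list_barcode_flavors, list_run_modes and list_pucks.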

Configure species

To add a species, use the following command:

# --name:         name of the species to be added
# --genome:       path to the genome (.fa) file for the species
# --annotation:   path to the annotation (.gtf) file for the species
# --rRNA_genome:  (optional) path to the ribosomal-RNA genome (.fa) file for the species
spacemake config add_species \
    --name NAME \
    --genome GENOME \
    --annotation ANNOTATION \
    --rRNA_genome RRNA_GENOME
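If you script your setup in Python, the same command can be assembled as an argument list. This is a minimal sketch: add_species_argv is a hypothetical helper, it uses only the flags documented above, and the file names are placeholders.

```python
import shlex
from typing import List, Optional


def add_species_argv(name: str, genome: str, annotation: str,
                     rrna_genome: Optional[str] = None) -> List[str]:
    """Build the argument list for `spacemake config add_species` (hypothetical helper)."""
    argv = [
        "spacemake", "config", "add_species",
        "--name", name,
        "--genome", genome,
        "--annotation", annotation,
    ]
    if rrna_genome is not None:
        # --rRNA_genome is optional, so it is only added when given.
        argv += ["--rRNA_genome", rrna_genome]
    return argv


argv = add_species_argv("mouse", "mm_genome.fa", "mm_annotation.gtf")
print(shlex.join(argv))  # inspect the command before running it
```

To actually run it, pass the list to subprocess.run(argv, check=True) from inside project_root.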

The spacemake config update_species command takes the same arguments as above, while spacemake config delete_species takes only --name.

To list the currently available species, type:

spacemake config list_species

Configure barcode_flavors

This sample-variable describes how the cell barcode and the UMI should be extracted from Read1 and Read2. The default barcode_flavor (named default in the list below) follows the Drop-seq read structure: cell_barcode = r1[0:12] (the cell barcode comes from the first 12 nt of Read1) and UMI = r1[12:20] (the UMI comes from nucleotides 13-20 of Read1).
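Because the flavor expressions use Python's slice syntax, the default flavor can be illustrated directly in Python. The read sequence below is a made-up example, not real data.

```python
# Default barcode_flavor: cell barcode = r1[0:12], UMI = r1[12:20].
# The read below is a made-up 20-nt example.
r1 = "ACGTACGTACGTTTGGCCAA"

cell_barcode = r1[0:12]  # nucleotides 1-12 of Read1
umi = r1[12:20]          # nucleotides 13-20 of Read1

print(cell_barcode, umi)  # ACGTACGTACGT TTGGCCAA
```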

If a sample has no barcode_flavor provided, the default barcode_flavor will be used.

Provided barcode_flavors

Spacemake provides the following barcode_flavors out of the box:

default:
    cell: "r1[0:12]"
    UMI: "r1[12:20]"
slide_seq_14bc:
    cell: "r1[0:14]"
    UMI: "r1[14:23]"
slide_seq_15bc:
    cell: "r1[0:14]"
    UMI: "r1[15:23]"
visium:
    cell: "r1[0:16]"
    UMI: "r1[16:28]"
sc_10x_v2:
    cell: "r1[0:16]"
    UMI: "r1[16:26]"
seq_scope:
    UMI: "r2[0:9]"
    cell: "r1[0:20]"
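To see how a flavor maps a read pair to a cell barcode and UMI, the sketch below evaluates the expressions from the listing above against made-up reads. The extract helper and its restricted eval are assumptions for illustration, not spacemake's implementation.

```python
from typing import Tuple

# Provided barcode_flavors, copied from the listing above.
FLAVORS = {
    "default":        {"cell": "r1[0:12]", "UMI": "r1[12:20]"},
    "slide_seq_14bc": {"cell": "r1[0:14]", "UMI": "r1[14:23]"},
    "slide_seq_15bc": {"cell": "r1[0:14]", "UMI": "r1[15:23]"},
    "visium":         {"cell": "r1[0:16]", "UMI": "r1[16:28]"},
    "sc_10x_v2":      {"cell": "r1[0:16]", "UMI": "r1[16:26]"},
    "seq_scope":      {"cell": "r1[0:20]", "UMI": "r2[0:9]"},
}


def extract(flavor: str, r1: str, r2: str) -> Tuple[str, str]:
    """Evaluate a flavor's expressions against a read pair (illustrative only)."""
    spec = FLAVORS[flavor]
    # Only r1 and r2 are visible to the expressions and builtins are disabled.
    # Note: this is not a security sandbox; only evaluate trusted expressions.
    namespace = {"__builtins__": {}}
    reads = {"r1": r1, "r2": r2}
    cell = eval(spec["cell"], namespace, reads)
    umi = eval(spec["UMI"], namespace, reads)
    return cell, umi


r1 = "A" * 16 + "C" * 12  # hypothetical 28-nt Read1
r2 = "G" * 9 + "T" * 41   # hypothetical 50-nt Read2
print(extract("visium", r1, r2))
print(extract("seq_scope", r1, r2))
```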

To list the currently available barcode_flavor-s, type:

spacemake config list_barcode_flavors

Add a new barcode_flavor

# --name:          name of the barcode flavor
# --umi:           structure of the UMI, using Python's slice syntax.
#                  Example: to use nucleotides 13-20 of Read1 as the UMI, use --umi "r1[12:20]".
#                  The first 8 nt of Read2 can also be used as the UMI: --umi "r2[0:8]".
# --cell_barcode:  structure of the cell barcode, using Python's slice syntax.
#                  Example: to use nucleotides 1-12 of Read1, use --cell_barcode "r1[0:12]".
#                  The cell barcode can also be reversed, e.g. with "r1[0:12][::-1]".
# Quote the expressions so that the shell does not treat the brackets as glob patterns.
spacemake config add_barcode_flavor \
   --name NAME \
   --umi UMI \
   --cell_barcode CELL_BARCODE
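Before adding a flavor, it can help to check an expression on a test read. The following sketch is a hypothetical, deliberately narrow parser: it accepts only r1 or r2, one [start:end] slice, and an optional [::-1] reversal, so it may reject expressions that spacemake itself accepts.

```python
import re

# Hypothetical validator for --umi / --cell_barcode expressions: a read (r1 or r2),
# one [start:end] slice, and optionally a trailing [::-1] reversal.
EXPR = re.compile(r"^(r[12])\[(\d*):(\d*)\](\[::-1\])?$")


def apply_expr(expr: str, r1: str, r2: str) -> str:
    """Apply a validated slice expression to a read pair."""
    m = EXPR.match(expr)
    if m is None:
        raise ValueError(f"unsupported barcode expression: {expr!r}")
    read_name, start, end, reverse = m.groups()
    read = r1 if read_name == "r1" else r2
    # Empty start/end behave like omitted bounds in a Python slice.
    lo = int(start) if start else None
    hi = int(end) if end else None
    sliced = read[lo:hi]
    return sliced[::-1] if reverse else sliced


r1 = "AACCGGTTAACCGGTTAACC"
r2 = "TTTTGGGGCCCC"
print(apply_expr("r1[0:12]", r1, r2))        # AACCGGTTAACC
print(apply_expr("r1[0:12][::-1]", r1, r2))  # CCAATTGGCCAA
print(apply_expr("r2[0:8]", r1, r2))         # TTTTGGGG
```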

Update/delete a barcode_flavor

The spacemake config update_barcode_flavor command takes the same arguments as above, while spacemake config delete_barcode_flavor takes only --name.

Configure run_modes

Specifying a “run mode” is an essential part of the flexibility that spacemake offers. By setting a run_mode, a sample can be processed and analyzed downstream in different ways.

Each run_mode can have the following variables:

n_beads
number of cell barcodes expected.
umi_cutoff
a list of integers. The downstream analysis will be run with each of these UMI cutoffs; cell barcodes with fewer UMIs than the cutoff will be discarded.
clean_dge
whether to clean cell barcodes of overhang primers before creating the DGE (digital gene expression matrix).
detect_tissue (spatial only)
if True, in addition to applying the UMI cutoff, spacemake will try to detect the tissue in silico.
polyA_adapter_trimming
if True, 3’ polyA stretches and adapters will be trimmed from Read2.
count_intronic_reads
if True, intronic reads will be counted when creating the DGE.
count_mm_reads
if True, multi-mappers will be counted. Only multi-mapping reads that map to exactly one CDS or UTR segment of a gene are counted this way.
mesh_data (spatial only)
if True, a mesh will be created when running this run_mode.
mesh_type (spatial only)
spacemake currently offers two types of meshes: (1) circle, where circles with a diameter of mesh_spot_diameter_um are placed on a hexagonal grid, mesh_spot_distance_um apart; (2) hexagon, where equal hexagons with sides of mesh_spot_diameter_um are placed so that they cover the whole area.
mesh_spot_diameter_um (spatial only)
the diameter of a mesh spot (the spatial unit of the mesh), in microns.
mesh_spot_distance_um (spatial only, only for circle mesh)
distance between the meshed circles, in microns.
parent_run_mode
each run_mode can have a parent to which it falls back: if one of the run_mode variables is missing, the parent's value will be used. If no parent is provided, the default run_mode will be the parent.
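The parent_run_mode fallback can be pictured as merging dictionaries from the root down. The sketch below assumes that a missing variable is looked up in the parent, then in its parent, and so on up to the default run_mode, as described above. The RUN_MODES subset and the visium_lowcut run_mode are illustrative, and resolve_run_mode is a hypothetical helper, not spacemake's code.

```python
from typing import Dict

# Subset of the provided run_modes (only a few variables shown).
RUN_MODES: Dict[str, dict] = {
    "default": {"n_beads": 100000, "umi_cutoff": [100, 300, 500],
                "detect_tissue": False, "mesh_data": False},
    "visium": {"n_beads": 10000, "umi_cutoff": [1000], "detect_tissue": True},
    # Hypothetical user-defined run_mode with an explicit parent.
    "visium_lowcut": {"parent_run_mode": "visium", "umi_cutoff": [500]},
}


def resolve_run_mode(name: str, run_modes: Dict[str, dict] = RUN_MODES) -> dict:
    """Merge a run_mode with its parents; 'default' is the implicit root."""
    chain = []
    current = name
    while True:
        if current in chain:
            raise ValueError(f"cyclic parent_run_mode at {current!r}")
        chain.append(current)
        if current == "default":
            break
        # Without an explicit parent, fall back to the default run_mode.
        current = run_modes[current].get("parent_run_mode", "default")
    resolved: dict = {}
    # Apply from the root (default) down to the requested run_mode,
    # so that child values override parent values.
    for mode in reversed(chain):
        resolved.update({k: v for k, v in run_modes[mode].items() if k != "parent_run_mode"})
    return resolved


print(resolve_run_mode("visium_lowcut"))
```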

Provided run_mode(s)

default:
    n_beads: 100000
    umi_cutoff: [100, 300, 500]
    clean_dge: False
    detect_tissue: False
    polyA_adapter_trimming: True
    count_intronic_reads: True
    count_mm_reads: False
    mesh_data: False
    mesh_type: 'circle'
    mesh_spot_diameter_um: 55
    mesh_spot_distance_um: 100
visium:
    n_beads: 10000
    umi_cutoff: [1000]
    clean_dge: False
    detect_tissue: True
    polyA_adapter_trimming: False
    count_intronic_reads: False
    count_mm_reads: True
slide_seq:
    n_beads: 100000
    umi_cutoff: [50]
    clean_dge: False
    detect_tissue: False
scRNA_seq:
    n_beads: 10000
    umi_cutoff: [500]
    detect_tissue: False
    polyA_adapter_trimming: True
    count_intronic_reads: True
    count_mm_reads: False
seq_scope:
    n_beads: 1000
    umi_cutoff: [100, 300]
    clean_dge: False
    detect_tissue: False
    count_intronic_reads: False
    count_mm_reads: False
    mesh_data: True
    mesh_type: 'hexagon'
    mesh_spot_diameter_um: 10
    mesh_spot_distance_um: 15
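To get a feel for the circle-mesh parameters, the sketch below places spot centers on a hexagonal grid using the default run_mode values (mesh_spot_diameter_um: 55, mesh_spot_distance_um: 100) over the 3000 µm wide default puck. It assumes that mesh_spot_distance_um is the center-to-center distance between neighboring circles; hex_grid_centers is a hypothetical helper and spacemake's actual mesh placement may differ.

```python
import math


def hex_grid_centers(width_um: float, distance_um: float):
    """Centers of a hexagonal grid covering a width_um x width_um square.

    Assumes distance_um is the center-to-center distance between neighboring
    circles (an assumption, not taken from spacemake's code).
    """
    row_step = distance_um * math.sqrt(3) / 2  # vertical spacing of hexagonal rows
    centers = []
    row = 0
    y = 0.0
    while y <= width_um:
        # Every other row is shifted by half a spacing.
        x = distance_um / 2 if row % 2 else 0.0
        while x <= width_um:
            centers.append((x, y))
            x += distance_um
        row += 1
        y = row * row_step
    return centers


# default run_mode: circles of 55 um diameter, 100 um apart; default puck is 3000 um wide
centers = hex_grid_centers(width_um=3000, distance_um=100)
diameter_um = 55
covered = len(centers) * math.pi * (diameter_um / 2) ** 2 / 3000 ** 2
print(len(centers), f"approx. fraction of area inside circles: {covered:.2f}")
```

Because the circles (55 µm) are narrower than their spacing (100 µm), a circle mesh leaves gaps between spots, whereas the hexagon mesh covers the whole area.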

Note

If a sample has no run_mode provided, the default run_mode will be used.

Note

If a run_mode variable is not provided and no parent_run_mode is set, the value from the default run_mode will be used.

To list the currently available run_mode-s, type:

spacemake config list_run_modes

Add a new run_mode

See the variable descriptions above.

spacemake config add_run_mode \
   --name NAME \
   --parent_run_mode PARENT_RUN_MODE \
   --umi_cutoff UMI_CUTOFF [UMI_CUTOFF ...] \
   --n_beads N_BEADS \
   --clean_dge {True,true,False,false} \
   --detect_tissue {True,true,False,false} \
   --polyA_adapter_trimming {True,true,False,false} \
   --count_intronic_reads {True,true,False,false} \
   --count_mm_reads {True,true,False,false} \
   --mesh_data {True,true,False,false} \
   --mesh_type {circle,hexagon} \
   --mesh_spot_diameter_um MESH_SPOT_DIAMETER_UM \
   --mesh_spot_distance_um MESH_SPOT_DISTANCE_UM
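When creating run_modes from a script, boolean and list-valued options have to be converted to command-line strings. This hypothetical helper does that for the flags listed above (booleans as True/False, umi_cutoff as several integers); the run_mode name my_slide_seq is a placeholder.

```python
import shlex
from typing import Dict, List, Union

Value = Union[bool, int, float, str, List[int]]


def add_run_mode_argv(name: str, options: Dict[str, Value]) -> List[str]:
    """Build `spacemake config add_run_mode` arguments from run_mode variables."""
    argv = ["spacemake", "config", "add_run_mode", "--name", name]
    for key, value in options.items():
        argv.append(f"--{key}")
        if isinstance(value, bool):
            # Boolean flags take an explicit True/False value.
            argv.append(str(value))
        elif isinstance(value, list):
            # --umi_cutoff accepts one or more integers.
            argv.extend(str(v) for v in value)
        else:
            argv.append(str(value))
    return argv


argv = add_run_mode_argv("my_slide_seq", {
    "parent_run_mode": "slide_seq",
    "umi_cutoff": [50, 100],
    "mesh_data": True,
    "mesh_type": "circle",
    "mesh_spot_diameter_um": 55,
    "mesh_spot_distance_um": 100,
})
print(shlex.join(argv))
```

Options left out of the dictionary are simply not passed, so they fall back to the parent run_mode.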

Update/delete a run_mode

The spacemake config update_run_mode command takes the same arguments as above, while spacemake config delete_run_mode takes only --name.

Configure pucks

Each spatial sample needs to have a puck. The puck sample-variable defines the dimensions of the underlying spatial structure, which spacemake then uses during the automated analysis and plotting.

Each puck has the following variables:

  • width_um: the width of the puck, in microns
  • spot_diameter_um: the diameter of the beads on this puck, in microns.
  • barcodes (optional): the path to a barcode file containing each cell_barcode and its (x, y) position. This is useful when several pucks share the same barcodes, as with 10x Visium.
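As an illustration of what a barcode file holds, the sketch below reads a small CSV into a mapping from cell barcode to (x, y) position. The column names cell_bc, x_pos and y_pos and the barcode values are assumptions made up for this example; check the header of your own barcode file.

```python
import csv
import io

# Hypothetical barcode file contents; column names and values are made up.
BARCODE_CSV = """cell_bc,x_pos,y_pos
ACGTACGTACGTACGT,50,102
TGCATGCATGCATGCA,59,19
"""


def read_barcode_positions(handle) -> dict:
    """Map each cell barcode to its (x, y) position."""
    reader = csv.DictReader(handle)
    return {row["cell_bc"]: (float(row["x_pos"]), float(row["y_pos"])) for row in reader}


positions = read_barcode_positions(io.StringIO(BARCODE_CSV))
print(positions["ACGTACGTACGTACGT"])  # (50.0, 102.0)
```

For a real file, open it with open(path, newline="") and pass the handle to read_barcode_positions.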

Provided pucks

default:
    width_um: 3000
    spot_diameter_um: 10
visium:
    barcodes: 'puck_data/visium_barcode_positions.csv'
    width_um: 6500
    spot_diameter_um: 55
seq_scope:
    width_um: 1000
    spot_diameter_um: 1
slide_seq:
    width_um: 3000
    spot_diameter_um: 10

As you can see, the visium puck comes with a barcodes variable, which points to puck_data/visium_barcode_positions.csv. Spacemake places this file there automatically upon initialization.

To list the currently available puck-s, type:

spacemake config list_pucks

Add a new puck

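# --name:              name of the puck
# --width_um:          width of the puck, in microns
# --spot_diameter_um:  diameter of the beads on this puck, in microns
# --barcodes:          (optional) path to the barcode file
spacemake config add_puck \
   --name NAME \
   --width_um WIDTH_UM \
   --spot_diameter_um SPOT_DIAMETER_UM \
   --barcodes BARCODES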