Quick start guide

The examples here are minimal code snippets for getting started with spacemake quickly. It is assumed that spacemake has been installed following the instructions here.

Initialize spacemake

After you have installed spacemake as specified here, you are ready to process and analyze spatial samples.

To initialize spacemake, cd into the directory in which you want to start spacemake. This directory will be your project_root.

Then simply type:

spacemake init \
   --dropseq_tools <path_to_dropseq_tools_dir>

Here, <path_to_dropseq_tools_dir> should point to the directory of the Dropseq-tools package downloaded in Step 2 of the installation.

Shared sample-variables

One of the most important parts of spacemake is the set of so-called ‘shared sample-variables’. These are reusable, user-definable variables that can be assigned to several samples.

They can be briefly defined as follows:

species: a collection of genome, annotation and rRNA_genome. There is no default species, and each sample can have exactly one species.
barcode_flavor: the variable which specifies the structure of Read1 and Read2, namely how the cell_barcode and UMI should be extracted. If no value is provided for a sample, the default will be used.
run_mode: each sample can have several run_mode-s, all of which are user definable. If no run_mode-s are specified, a sample will be processed using default run_mode settings.
puck (for spatial samples only): if a sample is spatial, it has to have a puck variable. If no puck is specified, a default puck will be used.

To add, update, delete or list a shared sample-variable, you can use the following commands:

spacemake config add_<shared-sample-variable>
spacemake config update_<shared-sample-variable>
spacemake config delete_<shared-sample-variable>
spacemake config list_<shared-sample-variable>

where <shared-sample-variable> is one of species, barcode_flavor, run_mode or puck.
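Because the four variables share the same command pattern, the list commands can be generated in a loop. The following is a minimal sketch that only prints the commands rather than running them; the helper name list_commands is made up for this example and is not part of spacemake.

```shell
# Print the `list` command for every shared sample-variable named above.
# Nothing is executed here; list_commands is a hypothetical helper name.
list_commands() {
    for variable in species barcode_flavor run_mode puck; do
        printf 'spacemake config list_%s\n' "$variable"
    done
}

list_commands
```

Piping the output into sh would run the four commands, assuming spacemake is installed and the project has been initialized.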

As spacemake comes with no default value for species, before anything can be done, a new species has to be added:

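spacemake config add_species \
   --name <species_name> \
   --genome <path_to_genome.fa> \
   --annotation <path_to_annotation.gtf> \
   --rRNA_genome <path_to_rRNA_genome> \
   --STAR_index_dir <path_to_STAR_index_dir>

Here --name is the name of the species, --genome is the path to the .fa file and --annotation is the path to the .gtf file. The --rRNA_genome (path to a ribosomal-RNA genome) and --STAR_index_dir (path to an existing STAR index directory) flags are optional.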

More info here.

Warning

If the --STAR_index_dir flag is provided, spacemake will check whether the provided STAR index was built with the same version of STAR as the command-line STAR. If this is not the case, an error will be raised.

Visium quick start

Step 1: add a Visium sample

After spacemake has been initialized, a Visium sample can be added.

To add a Visium sample, type in terminal:

spacemake projects add_sample \
   --project_id <project_id> \
   --sample_id <sample_id> \
   --R1 <path_to_R1.fastq.gz> \
   --R2 <path_to_R2.fastq.gz> \
   --species <species> \
   --puck visium \
   --run_mode visium \
   --barcode_flavor visium

Above we add a new Visium sample with puck, run_mode and barcode_flavor all set to visium.

This is possible because spacemake comes with pre-defined variables suited for Visium. The visium run_mode will process the sample in the same way as spaceranger would: intronic reads will not be counted, multi-mappers (where the multi-mapping read maps only to one CDS or UTR region) will be counted, and 3’ polyA stretches will not be trimmed from Read2.

Note

With the --R1 and --R2 flags it is possible to provide a single .fastq.gz file per mate or several files per mate. For example, suppose a demultiplexing run produces the following files:

sample_1a_R1.fastq.gz, sample_1b_R1.fastq.gz, sample_1a_R2.fastq.gz, sample_1b_R2.fastq.gz

Here R1 and R2 are each split into two files, so spacemake can be called with the following command:

spacemake projects add_sample \
   ...
   --R1 sample_1a_R1.fastq.gz sample_1b_R1.fastq.gz \
   --R2 sample_1a_R2.fastq.gz sample_1b_R2.fastq.gz \

The important thing is to always keep the order consistent between the two mates.
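The ordering requirement can be checked before adding a sample. The sketch below assumes that file names follow the _R1/_R2 convention from the example above and contain no whitespace; check_mate_order is a hypothetical helper written for this guide, not a spacemake command.

```shell
# check_mate_order "R1 files" "R2 files": succeed only if the i-th R2 name
# equals the i-th R1 name with "_R1" replaced by "_R2".
# Assumes the _R1/_R2 naming convention and no whitespace in file names.
check_mate_order() {
    r1_files=$1
    r2_files=$2
    # Load the R2 names into the positional parameters so they can be shifted.
    set -- $r2_files
    for r1 in $r1_files; do
        expected=$(printf '%s\n' "$r1" | sed 's/_R1/_R2/')
        if [ "$#" -eq 0 ] || [ "$1" != "$expected" ]; then
            echo "mismatch: $r1 is not paired with ${1:-<nothing>}" >&2
            return 1
        fi
        shift
    done
    # Any R2 file left over means the two lists differ in length.
    [ "$#" -eq 0 ]
}

check_mate_order "sample_1a_R1.fastq.gz sample_1b_R1.fastq.gz" \
                 "sample_1a_R2.fastq.gz sample_1b_R2.fastq.gz" \
    && echo "mate order is consistent"
```

If the check passes, the same two lists can be given to --R1 and --R2 in that order.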

To see the values of these predefined variables, check out the configuration docs.

To add several Visium samples at once, follow the tutorial here.
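Independently of that tutorial, the add_sample command shown above can also be generated once per sample from a small table. The following is only a sketch: the tab-separated layout (project_id, sample_id, R1, R2, species, no header row) and the helper name print_add_sample_commands are assumptions made for this example, and the commands are printed rather than executed.

```shell
# print_add_sample_commands: read tab-separated rows from standard input
# (project_id, sample_id, R1, R2, species; hypothetical layout, no header)
# and print one add_sample command per row using the visium presets.
print_add_sample_commands() {
    while IFS="$(printf '\t')" read -r project_id sample_id r1 r2 species; do
        printf 'spacemake projects add_sample --project_id %s --sample_id %s' \
            "$project_id" "$sample_id"
        printf ' --R1 %s --R2 %s --species %s' "$r1" "$r2" "$species"
        printf ' --puck visium --run_mode visium --barcode_flavor visium\n'
    done
}

# Made-up row for illustration only.
printf 'proj_1\tsample_a\ta_R1.fastq.gz\ta_R2.fastq.gz\tmouse\n' \
    | print_add_sample_commands
```

Printing first makes it easy to review the generated commands before running any of them.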

Step 2: running spacemake

After a sample is added, spacemake can be run with:

spacemake run --cores <n_cores> --keep-going

The --keep-going flag is optional; however, it ensures that spacemake runs all the jobs it can, even if one job fails (this behavior is taken directly from snakemake).

For a complete explanation of the spacemake run command, check out the documentation here.
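One way to choose a value for <n_cores> is to ask the system how many cores are online. The sketch below assumes that using every core is acceptable, which suits a dedicated machine better than a shared server; it only builds and prints the run command.

```shell
# Detect the number of online CPU cores; fall back to 1 if getconf cannot tell.
n_cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Build the command as a string and print it for review (nothing is run here).
run_command="spacemake run --cores $n_cores --keep-going"
echo "$run_command"
```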

Slide-seq quick start

Step 1: add a Slide-seq sample

After spacemake has been initialized, a Slide-seq sample can be added.

To add a Slide-seq sample, type in terminal:

spacemake projects add_sample \
   --project_id <project_id> \
   --sample_id <sample_id> \
   --R1 <path_to_R1.fastq.gz> \
   --R2 <path_to_R2.fastq.gz> \
   --species <species> \
   --puck slide_seq \
   --run_mode slide_seq \
   --barcode_flavor slide_seq_14bc \
   --puck_barcode_file <path_to_puck_barcode_file>

Above we add a new Slide-seq sample with puck and run_mode both set to slide_seq, the pre-defined settings for Slide-seq samples.

Note

For spatial samples other than Visium, such as Slide-seq, we need to provide a puck_barcode_file (since each puck has different barcodes, unlike for Visium samples). This file should be comma- or tab-separated, with column names in the first row. Acceptable column names are:

cell_bc, barcodes or barcode for cell-barcode
xcoord or x_pos for x-positions
ycoord or y_pos for y-positions
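The header requirements listed above can be verified before adding a sample. This is a minimal sketch, not spacemake's own validation: check_puck_header is a hypothetical helper, and the barcode and coordinates in the demo file are made up.

```shell
# check_puck_header FILE: succeed only if the first row names a barcode column,
# an x column and a y column, using the accepted names listed above.
check_puck_header() {
    # Put every column name of the first row on its own line.
    header=$(head -n 1 "$1" | tr ',\t' '\n\n' | tr -d '\r')
    for group in 'cell_bc|barcodes|barcode' 'xcoord|x_pos' 'ycoord|y_pos'; do
        if ! printf '%s\n' "$header" | grep -Eqx "$group"; then
            echo "missing column, expected one of: $group" >&2
            return 1
        fi
    done
}

# Demo with a temporary tab-separated file (made-up values).
puck_file=$(mktemp)
printf 'cell_bc\txcoord\tycoord\nACGTACGTACGTAC\t10.5\t20.0\n' > "$puck_file"
check_puck_header "$puck_file" && echo "header looks fine"
rm -f "$puck_file"
```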

In this example barcode_flavor is set to slide_seq_14bc, a pre-defined barcode_flavor in spacemake, where the cell_barcode comes from the first 14 nt of Read1 and the UMI comes from nt 15-23 (the remaining 9 nt). The other pre-defined barcode_flavor for Slide-seq is slide_seq_15bc: here the cell_barcode again comes from the first 14 nt of Read1, but the UMI comes from nt 16-23 (the remaining 8 nt) of Read1.

To see the values of these predefined variables, check out the configuration docs.

To add several Slide-seq samples at once, follow the tutorial here.

Step 2: running spacemake

After a sample is added, spacemake can be run with:

spacemake run --cores <n_cores> --keep-going

The --keep-going flag is optional; however, it ensures that spacemake runs all the jobs it can, even if one job fails (this behavior is taken directly from snakemake).

For a complete explanation of the spacemake run command, check out the documentation here.

Seq-scope quick start

Step 1: add a Seq-scope sample

After spacemake has been initialized, a Seq-scope sample can be added.

Adding a Seq-scope sample is similar to Slide-seq:

spacemake projects add_sample \
   --project_id <project_id> \
   --sample_id <sample_id> \
   --R1 <path_to_R1.fastq.gz> \
   --R2 <path_to_R2.fastq.gz> \
   --species <species> \
   --puck seq_scope \
   --run_mode seq_scope \
   --barcode_flavor seq_scope \
   --puck_barcode_file <path_to_puck_barcode_file>

Here we used the pre-defined variables for puck, barcode_flavor and run_mode, all set to seq_scope. As for Visium, --R1 and --R2 each accept either a single file or several files.

The seq_scope puck has 1000 micron width and bead size set to 1 micron.

The seq_scope barcode_flavor describes how the cell_barcode and the UMI should be extracted. As described in the Seq-scope paper, the cell_barcode comes from nt 1-20 of Read1, and the UMI comes from nt 1-9 of Read2.
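To make these positions concrete, the sketch below cuts the two ranges out of a pair of made-up reads. It only illustrates the 1-based, inclusive coordinates used in this guide and is not how spacemake itself extracts barcodes; slice is a hypothetical helper.

```shell
# slice SEQ FIRST LAST: print nucleotides FIRST..LAST of SEQ (1-based, inclusive).
slice() {
    printf '%s\n' "$1" | cut -c"$2-$3"
}

# Made-up reads for illustration only.
read1="ACGTACGTACGTACGTACGTTTTT"   # 24 nt
read2="GGGGCCCCAATTTTTTTT"         # 18 nt

cell_barcode=$(slice "$read1" 1 20)  # nt 1-20 of Read1
umi=$(slice "$read2" 1 9)            # nt 1-9 of Read2
echo "cell_barcode=$cell_barcode UMI=$umi"
```

The same helper applies to any other barcode_flavor by changing the read and the two positions.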

The seq_scope run_mode has its settings as follows:

seq_scope:
    clean_dge: false
    count_intronic_reads: false
    count_mm_reads: false
    detect_tissue: false
    mesh_data: true
    mesh_spot_diameter_um: 10
    mesh_type: hexagon
    n_beads: 1000
    umi_cutoff:
    - 100
    - 300

The most important thing to notice here is that, by default, the seq_scope run_mode creates a hexagonal mesh. This means that downstream, rather than working with the 1 micron beads, spacemake will create a mesh of adjacent, equal hexagons with 10 micron sides.

Step 2: running spacemake

After a sample is added, spacemake can be run with:

spacemake run --cores <n_cores> --keep-going

The --keep-going flag is optional; however, it ensures that spacemake runs all the jobs it can, even if one job fails (this behavior is taken directly from snakemake).

For a complete explanation of the spacemake run command, check out the documentation here.

scRNA-seq quick start

Step 1: add a single-cell RNA-seq sample

spacemake was written as a spatial-transcriptomics pipeline; however, it also works for single-cell experiments, where no spatial information is available.

To add a scRNA-seq sample, simply type:

spacemake projects add_sample \
   --project_id <project_id> \
   --sample_id <sample_id> \
   --R1 <path_to_R1.fastq.gz> \
   --R2 <path_to_R2.fastq.gz> \
   --species <species> \
   --run_mode scRNA_seq \
   --barcode_flavor default # use other flavors for 10X Chromium

As seen above, we define fewer variables than before: only species and run_mode are needed. As in the other quick starts, --R1 and --R2 each accept either a single file or several files.

Warning

As shown above, we set the barcode_flavor to default, which uses the Drop-seq scRNA-seq barcoding strategy (cell_barcode is nt 1-12 of Read1, UMI is nt 13-20 of Read1).

For 10X samples, either sc_10x_v2 (10X Chromium Single Cell 3’ V2) or visium (10X Chromium Single Cell 3’ V3, which uses the same barcode structure as Visium) should be used as barcode_flavor. Both are pre-defined in spacemake.

If another barcoding strategy is used, a custom barcode_flavor has to be defined. Here is a complete guide on how to add a custom barcode_flavor.
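The choice described in this warning can be written down as a small lookup. In the sketch below the chemistry labels (dropseq, 10x_3p_v2, 10x_3p_v3) and the helper name flavor_for_chemistry are invented for this example; only the returned barcode_flavor names come from spacemake.

```shell
# flavor_for_chemistry LABEL: print the pre-defined barcode_flavor for a
# chemistry label. The labels are hypothetical; the flavors are the ones
# named in the warning above.
flavor_for_chemistry() {
    case "$1" in
        dropseq)   echo default ;;
        10x_3p_v2) echo sc_10x_v2 ;;
        10x_3p_v3) echo visium ;;
        *)
            echo "no pre-defined barcode_flavor for: $1" >&2
            return 1
            ;;
    esac
}

echo "--barcode_flavor $(flavor_for_chemistry 10x_3p_v2)"
```

Any label that falls through to the last branch is a case where a custom barcode_flavor would have to be defined.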

Note

By setting run_mode to scRNA_seq we used the pre-defined run_mode settings tailored for single-cell experiments: expected number of cells (or beads) will be 10k, introns will be counted, UMI cutoff will be at 500, multi-mappers will not be counted and polyA and adapter sequences will be trimmed from Read2.

Of course, running single-cell samples with other run_mode settings is also possible.

See here for how to add a custom run_mode tailored to the experimental needs.

To see the values of these predefined variables, check out the configuration docs.

To add several single-cell samples at once, follow the tutorial here.

Step 2: running spacemake

After a sample is added, spacemake can be run with:

spacemake run --cores <n_cores> --keep-going

The --keep-going flag is optional; however, it ensures that spacemake runs all the jobs it can, even if one job fails (this behavior is taken directly from snakemake).

For a complete explanation of the spacemake run command, check out the documentation here.