Configuration
Once installed and initialized, spacemake needs to be configured.
One of the most important parts of spacemake is the set of so-called ‘shared sample-variables’. These are reusable, user-definable variables that can be assigned to several samples. They can be briefly defined as follows:
species: a collection of genome, annotation and rRNA_genome. There is no default species, and each sample can have exactly one species.
barcode-flavor: the variable which specifies the structure of Read1 and Read2, namely how the cell-barcode and UMI should be extracted. If no value is provided for a sample, the default will be used.
run-mode: each sample can have several run-modes, all of which are user-definable. If no run-modes are specified, a sample will be processed using the default run-mode settings.
puck (spatial only): if a sample is spatial, it has to have a puck variable. If no puck is specified, a default puck will be used.
To add, update, delete or list a shared sample-variable, you can use the following commands:
spacemake config add-<shared-sample-variable>
spacemake config update-<shared-sample-variable>
spacemake config delete-<shared-sample-variable>
spacemake config list-<shared-sample-variable>
where <shared-sample-variable> is one of species, barcode-flavor, run-mode or puck.
Configure species
To add species, the following command can be used:
spacemake config add_species \
    --name NAME \              # name of the species to be added
    --reference REF \          # name of the reference sequence
                               # ('genome', 'rRNA', 'spike_in', ...)
                               # if omitted defaults to 'genome'
    --sequence SEQUENCE \      # path to the reference sequence file
                               # (.fa) to be added
    --genome SEQUENCE \        # DEPRECATED! Please use --sequence instead.
    --annotation ANNOTATION \
                               # path to the annotation (.gtf) file for the species
                               # to be added
The spacemake config update-species command takes the same arguments as above, while spacemake config delete-species takes only --name.
As of version 0.7 you can add multiple reference sequences per species. For that,
simply execute add-species multiple times, varying --reference ... but keeping --name constant.
To list the currently available species, type:
spacemake config list-species
Configure adapter-flavors and pre-processing
Spacemake allows you to pre-process raw reads based on adapter-flavors. An adapter-flavor describes how adapters and polyA stretches should be trimmed from the cDNA read (usually read2). The complete set of operations that can be performed is:
- trim polyA stretches
- trim adapters
- clip low-quality bases
- clip a fixed number of bases from either end of read2
Access to these operations is provided through the adapter-flavors section of config.yaml only. Here is an example of an adapter-flavor:
adapter_flavors:
  example:
    - nextseq_quality:
        cutoff: 25
    - polyA:
    - adapter:
        name: SMART
        seq: AAGCAGTGGTATCAACGCAGAGTGAATGGG
        where: left
        min_overlap: 10
        max_errors: 0.1
Below follows a list of each operation and the supported parameters and default values.
quality
Trim low-quality bases from 3’ end and/or 5’ end of read. Functionality is provided by cutadapt.
Two parameters are supported: left and right, which define the quality threshold below which bases will be trimmed from the 5’ and 3’ end of read2, respectively.
Default is left: 0 and right: 25.
nextseq_quality
Trim low-quality bases from 3’ end of read. Functionality is provided by cutadapt.
The sole parameter is cutoff, which defines the quality threshold below which bases will be trimmed. Analogous to quality with right=cutoff, except that terminal G nucleotides
are always treated as below cutoff quality. Default is cutoff: 25.
Note
Before version 0.9.1 there was no quality trimming of bases at all, which led to issues on some runs. Between versions 0.9.1 and 0.9.5, the default was set
to nextseq_quality with cutoff: 32, which is a common default for quality trimming, but relatively strict. In version 0.9.5 the default was changed to cutoff: 25,
which is in our experience a good compromise, because low quality bases may still be soft-clipped in the mapping stage. However, if you experience a drop in UMI counts between pre 0.9.1 and
current versions, you can try lowering the quality cutoff further (or even set it to 0) and rerun your samples, to restore pre 0.9.1 behavior.
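To make the difference between quality (3’ end) and nextseq_quality concrete, here is a minimal sketch of 3’ quality trimming. It is modeled on the running-sum procedure described in the cutadapt documentation, but it is not spacemake's or cutadapt's actual code. The function name trim_index_3prime and the Phred+33 quality encoding are assumptions made for illustration.

```python
def trim_index_3prime(bases, qualities, cutoff=25, nextseq=False, phred_base=33):
    """Return the index at which the read would be cut at its 3' end.

    Sketch only: walks from the 3' end, accumulating (cutoff - quality),
    and cuts where that running sum is largest.
    """
    running_sum = 0
    best_sum = 0
    cut_at = len(qualities)  # default: keep the whole read
    for i in reversed(range(len(qualities))):
        q = ord(qualities[i]) - phred_base
        if nextseq and bases[i] == "G":
            # nextseq mode: a G is always treated as just below the cutoff
            q = cutoff - 1
        running_sum += cutoff - q
        if running_sum < 0:
            break  # enough high-quality bases seen, stop scanning
        if running_sum > best_sum:
            best_sum = running_sum
            cut_at = i
    return cut_at


bases = "ACGTACGGGG"
qualities = "IIIIIIIIII"  # Phred 40 throughout

plain = bases[:trim_index_3prime(bases, qualities, cutoff=25)]
nextseq = bases[:trim_index_3prime(bases, qualities, cutoff=25, nextseq=True)]
```

In this example the high-quality terminal G run survives plain quality trimming but is removed in nextseq mode. With cutoff set to 0 the running sum never becomes positive, so nothing is trimmed.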
clip
Clip bases from either end of read2. Two parameters are supported: left and right, which define how many bases should be clipped from the 5’ and 3’ end of read2,
respectively. Default is left: 0 and right: 0.
polyA
Trim polyA stretches from 3’ end of read. Functionality is provided by cutadapt.
The only supported parameter is revcomp, which if set to True will trim polyT stretches instead of polyA. Default is revcomp: False.
adapter
Trim adapters from either end of read. Functionality is provided by cutadapt. Parameters are:
name: name of the adapter. Only for logging purposes.
seq: sequence of the adapter to be trimmed.
min_overlap: minimum overlap between read and adapter for a successful trimming. Default is 3.
max_errors: maximum error rate allowed for a successful trimming. Default is 0.1.
where: where to search for the adapter. Possible values are 'left' and 'right'. Default is 'right' (3 prime end of cDNA).
Note
Internally, spacemake uses the cutadapt python module to perform all trimming operations. If where == 'left' we use cutadapt.adapters.NonInternalFrontAdapter,
for where == 'right' we use cutadapt.adapters.BackAdapter.
For more information about the parameters and their meaning, please refer to the cutadapt source code.
Each adapter-flavor in the config.yaml is a list of operations to be performed in the given order. If needed, you can chain multiple operations of the same type (for
example to remove multiple adapters).
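The effect of operation order can be illustrated with a small, self-contained sketch. It assumes the adapter-flavor has already been loaded from config.yaml as a list of single-key mappings, and it uses deliberately naive stand-ins for two operations (clip and polyA). The helper names are hypothetical, and the real trimming is done by cutadapt and is more tolerant of sequencing errors than these stand-ins.

```python
def clip(read, left=0, right=0):
    """Remove a fixed number of bases from the 5' (left) and 3' (right) end."""
    end = len(read) - right
    return read[left:end] if end > left else ""


def trim_poly_a(read):
    """Naive stand-in for polyA trimming: strip an uninterrupted 3' run of A."""
    return read.rstrip("A")


# Hypothetical dispatch table: operation name -> function
OPERATIONS = {"clip": clip, "polyA": trim_poly_a}


def apply_flavor(read, flavor):
    """Apply each operation of the flavor to the read, in the listed order."""
    for step in flavor:
        (op_name, params), = step.items()  # each step has exactly one key
        read = OPERATIONS[op_name](read, **(params or {}))
    return read


read = "ACGTAAAACC"
clip_then_polya = apply_flavor(read, [{"clip": {"right": 2}}, {"polyA": None}])
polya_then_clip = apply_flavor(read, [{"polyA": None}, {"clip": {"right": 2}}])
```

Clipping two bases first exposes the polyA stretch, so the first flavor yields ACGT, while the second leaves ACGTAAAA. An operation listed without parameters (such as polyA in the example flavor) is represented here as None and falls back to the function defaults.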
Configure barcode-flavors
This sample-variable describes how the cell-barcode and the UMI should be extracted from Read1 and Read2.
The default value for barcode_flavor will be dropseq: cell = r1[0:12] (the cell-barcode comes from the first 12 nt of Read1) and
UMI = r1[12:20] (the UMI comes from nt 13-20 of Read1).
If a sample has no barcode_flavor provided, the default barcode_flavor will be used.
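The slice notation follows Python conventions: the start index is 0-based and the end index is exclusive, so r1[12:20] selects nt 13-20. The sketch below shows what such an expression selects. The extract helper and its regular expression are hypothetical, handle only a single slice with an optional [::-1] reversal, and say nothing about how spacemake evaluates these expressions internally.

```python
import re

# Matches e.g. "r1[0:12]" or "r1[0:12][::-1]" (hypothetical, simplified grammar)
_SLICE = re.compile(r"^(r[12])\[(\d+):(\d+)\](\[::-1\])?$")


def extract(spec, r1, r2):
    """Return the part of Read1/Read2 selected by a barcode-flavor expression."""
    match = _SLICE.match(spec)
    if match is None:
        raise ValueError(f"unsupported expression: {spec!r}")
    read_name, start, end, reverse = match.groups()
    read = r1 if read_name == "r1" else r2
    segment = read[int(start):int(end)]
    return segment[::-1] if reverse else segment


r1 = "ACGTACGTACGTTTTTGGGGCC"  # made-up Read1
r2 = "CATCATCATCAT"            # made-up Read2

cell = extract("r1[0:12]", r1, r2)   # first 12 nt of Read1
umi = extract("r1[12:20]", r1, r2)   # nt 13-20 of Read1
```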
Barcode correction
As of version 0.9.3, spacemake performs spatial barcode correction with edit distance 1, which boosts counts by ~5-15% for many samples.
For performance reasons, this employs some heuristics:
- all N bases are replaced with A, in the reference (flowcell) catalog as well as in the samples.
- a capture-area catalog of reference barcodes is built for each sample, based on exact match counts alone.
- exact matches to the capture-area catalog are searched first and preferred. Unmatched barcodes go on to a second stage of potential error correction.
- spacemake looks up all edit distance 1 variants of an unmatched sample barcode in the capture-area catalog in a defined order.
The first match is reported and no further matches are considered. The order is as follows: (1) substitutions, (2) insertions, (3) deletions. This means that if a barcode has no exact matches but multiple edit distance 1 matches, the correction will be deterministic, but is not guaranteed to be correct. In practice, however, the fraction of barcodes with multiple edit distance 1 matches is extremely low and dwarfed by other sources of experimental and technical noise.
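The heuristics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not spacemake's implementation: the function names are hypothetical, the catalog is built from mere presence of an exact match (no count threshold), variants are enumerated by position and then by base, and indels are kept at the original barcode length by truncating after an insertion and padding with each possible base after a deletion.

```python
ALPHABET = "ACGT"


def normalize(barcode):
    """Replace every N with A, as is done for reference and sample barcodes."""
    return barcode.replace("N", "A")


def build_catalog(reference_barcodes, sample_barcodes):
    """Reference barcodes that occur as exact matches in the sample."""
    observed = {normalize(bc) for bc in sample_barcodes}
    return {normalize(bc) for bc in reference_barcodes if normalize(bc) in observed}


def edit1_variants(barcode):
    """Yield edit distance 1 variants: substitutions, insertions, deletions."""
    n = len(barcode)
    for i in range(n):  # (1) substitutions
        for base in ALPHABET:
            if base != barcode[i]:
                yield barcode[:i] + base + barcode[i + 1:]
    for i in range(n):  # (2) insertions, truncated to length n (assumption)
        for base in ALPHABET:
            yield (barcode[:i] + base + barcode[i:])[:n]
    for i in range(n):  # (3) deletions, padded at the 3' end (assumption)
        for base in ALPHABET:
            yield barcode[:i] + barcode[i + 1:] + base


def correct(barcode, catalog):
    """Exact match first, then the first edit distance 1 hit, else None."""
    barcode = normalize(barcode)
    if barcode in catalog:
        return barcode
    for variant in edit1_variants(barcode):
        if variant in catalog:
            return variant  # first match wins, later matches are ignored
    return None


catalog = build_catalog(
    ["ACGTACGT", "TTTTCCCC", "GGGGAAAA"],  # made-up reference barcodes
    ["ACGTACGT", "TTTTCCCC", "ACGTACGA"],  # made-up sample barcodes
)
```

Because the enumeration order is fixed, the same unmatched barcode always resolves to the same catalog entry, which is what makes the correction deterministic even when several edit distance 1 candidates exist.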
Note
Barcode correction requires you to configure --puck-barcode-files for your sample. Otherwise it will not be treated as a spatial sample and no capture-area catalog
can be built.
Note
If you have already run your samples with a previous version of spacemake and want to apply the new barcode correction, you can run
spacemake run estimate-correction-gains to get an estimate of the increase in UMI counts to expect for each sample. In our experience,
this is close to the actual increase, unless your ratio of reads to UMIs is already high, indicating saturation of the library, in which case the gains may be lower.
If you want to give it a try, just update spacemake and run again. The correction should be applied automatically.
Provided barcode-flavors
Note
Future versions of spacemake will merge barcode-flavors into adapter-flavors (which arguably become pre-processing flavors at that point)
by defining barcode as a pre-processing step with cell and UMI as parameters.
In the current implementation, barcode-flavors are kept separate for backwards compatibility. The new implementation will give additional
flexibility, for example to remove additional adapters/primers, or clip the read further, after barcode extraction. Currently, if barcode
is not in the list of pre-processing steps, it is taken to be implied as the last step and its parameters are loaded from the barcode-flavor.
Spacemake provides the following barcode-flavors out of the box:
default:
  cell: "r1[0:12]"
  UMI: "r1[12:20]"
openst:
  cell: "r1[2:27]"
  UMI: "r2[0:9]"
sc_10x_v2:
  cell: "r1[0:16]"
  UMI: "r1[16:26]"
seq_scope:
  UMI: "r2[0:9]"
  cell: "r1[0:20]"
slide_seq_14bc:
  cell: "r1[0:14]"
  UMI: "r1[14:23]"
slide_seq_15bc:
  cell: "r1[0:14]"
  UMI: "r1[15:23]"
visium:
  cell: "r1[0:16]"
  UMI: "r1[16:28]"
To list the currently available barcode-flavors, type:
spacemake config list_barcode-flavors
Warning
The command line interface for adding, updating, and deleting barcode-flavors will be deprecated in future versions of spacemake.
Please consider editing the config.yaml file directly to manage barcode-flavors.
Add a new barcode_flavor
spacemake config add_barcode-flavor \
    --name NAME \
        # name of the barcode flavor
    --umi UMI \
        # structure of UMI, using python's list syntax.
        # Example: to set UMI to 13-20 NT of Read1, use --umi r1[12:20].
        # It is also possible to use the first 8nt of Read2 as UMI: --umi r2[0:8].
    --cell-barcode CELL-BARCODE
        # structure of CELL BARCODE, using python's list syntax.
        # Example: to set the cell-barcode to 1-12 nt of Read1, use --cell-barcode r1[0:12].
        # It is also possible to reverse the CELL BARCODE, for instance with r1[0:12][::-1].
Update/delete a barcode-flavor
The spacemake config update-barcode-flavor command takes the same arguments as above, while spacemake config delete-barcode-flavor takes only --name.
Configure run-modes
Run-modes are a key source of flexibility in spacemake.
Through setting a run-mode, a sample can be processed and analysed downstream in various fashions.
Each run-mode can have the following variables:
n_beads: number of cell-barcodes expected.
umi_cutoff: a list of integers. Downstream analysis will be run using these UMI cutoffs, that is, cell-barcodes with fewer UMIs will be discarded.
clean_dge: whether to clean cell-barcodes from overhang primers before creating the DGE.
detect_tissue (spatial only): if True, apart from the UMI cutoff, spacemake will try to detect the tissue in-silico.
polyA_adapter_trimming: if True, 3’ polyA stretches and adapters will be trimmed from Read2.
count_intronic_reads: if True, intronic reads will be counted when creating the DGE.
count_mm_reads: if True, multi-mappers will be counted. Only those multi-mapping reads that map to exactly one CDS or UTR segment of a gene will be counted this way.
mesh_data (spatial only): if True, a mesh will be created when running this run-mode.
mesh_type (spatial only): spacemake currently offers two types of meshes: (1) circle, where circles with a given mesh_spot_diameter_um will be placed in a hexagonal grid, mesh_spot_distance_um distance apart; (2) hexagon, a hexagonal grid where equal hexagons with mesh_spot_diameter_um sides will be placed in a full mesh grid, such that the whole area is covered.
mesh_spot_diameter_um (spatial only): the diameter of the mesh spatial-unit, in microns.
mesh_spot_distance_um (spatial only, only for circle mesh): distance between the meshed circles, in microns.
spatial_barcode_min_matches (spatial only): ratio of spatial barcode matches, expressed as a value in the 0-1 interval, used as a minimum threshold to filter out pucks from DGE creation and subsequent steps of the pipeline. If set to 0, no pucks are excluded.
parent_run-mode: each run-mode can have a parent, to which it will fall back. If one of the run-mode variables is missing, the variable of the parent will be used. If no parent is provided, the default run-mode will be the parent.
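For the circle mesh, the geometry of a hexagonal grid of spot centers can be sketched as follows. This is a generic construction, not spacemake's code: the function name, the origin at (0, 0) and the row orientation are assumptions. Neighbouring centers are mesh_spot_distance_um apart, odd rows are shifted by half that distance, and rows are spaced by distance × √3/2.

```python
import math


def circle_mesh_centers(width_um, height_um, spot_distance_um):
    """Centers of a hexagonal grid covering a width_um x height_um rectangle."""
    centers = []
    row_height = spot_distance_um * math.sqrt(3) / 2
    row = 0
    y = 0.0
    while y <= height_um:
        # every second row is shifted by half the spot distance
        x = spot_distance_um / 2 if row % 2 else 0.0
        while x <= width_um:
            centers.append((x, y))
            x += spot_distance_um
        row += 1
        y = row * row_height
    return centers


# 100 um spacing as in the default run-mode, on a made-up 300 x 300 um area
centers = circle_mesh_centers(300, 300, spot_distance_um=100)
nearest = min(
    math.dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
)
```

With the default mesh_spot_diameter_um of 55 and mesh_spot_distance_um of 100, circles drawn around these centers do not touch, so part of the area is left uncovered, which is the difference from the hexagon mesh that tiles the whole area.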
Provided run-modes
default:
  clean_dge: false
  count_intronic_reads: true
  count_mm_reads: false
  detect_tissue: false
  mesh_data: false
  mesh_spot_diameter_um: 55
  mesh_spot_distance_um: 100
  mesh_type: circle
  n_beads: 100000
  polyA_adapter_trimming: true
  spatial_barcode_min_matches: 0
  umi_cutoff:
    - 100
    - 300
    - 500
openst:
  clean_dge: false
  count_intronic_reads: true
  count_mm_reads: true
  detect_tissue: false
  mesh_data: true
  mesh_spot_diameter_um: 7
  mesh_spot_distance_um: 7
  mesh_type: hexagon
  n_beads: 100000
  polyA_adapter_trimming: true
  spatial_barcode_min_matches: 0.1
  umi_cutoff:
    - 100
    - 250
    - 500
scRNA_seq:
  count_intronic_reads: true
  count_mm_reads: false
  detect_tissue: false
  n_beads: 10000
  umi_cutoff:
    - 500
seq_scope:
  clean_dge: false
  count_intronic_reads: false
  count_mm_reads: false
  detect_tissue: false
  mesh_data: true
  mesh_spot_diameter_um: 10
  mesh_spot_distance_um: 15
  mesh_type: hexagon
  n_beads: 1000
  umi_cutoff:
    - 100
    - 300
slide_seq:
  clean_dge: false
  detect_tissue: false
  n_beads: 100000
  umi_cutoff:
    - 50
visium:
  clean_dge: false
  count_intronic_reads: false
  count_mm_reads: true
  detect_tissue: true
  n_beads: 10000
  umi_cutoff:
    - 1000
Note
If a sample has no run-mode provided, the default will be used.
Note
If a run-mode variable is not provided, the variable of the default run-mode will be used.
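The fallback logic (own value, then parent, then default) can be sketched as a simple merge along the parent chain. This is an illustration only: the resolve helper is hypothetical, the key spelling parent_run_mode is borrowed from the command line flag and may differ in config.yaml, the visium_strict run-mode is made up, and only a few variables of the provided default and visium run-modes are reproduced.

```python
RUN_MODES = {
    "default": {"n_beads": 100000, "umi_cutoff": [100, 300, 500],
                "detect_tissue": False, "count_mm_reads": False},
    "visium": {"n_beads": 10000, "umi_cutoff": [1000],
               "detect_tissue": True, "count_mm_reads": True},
    # hypothetical user-defined run-mode that only overrides the UMI cutoff
    "visium_strict": {"parent_run_mode": "visium", "umi_cutoff": [2000]},
}


def resolve(name, run_modes):
    """Merge a run-mode with its ancestors; closer definitions win."""
    chain = []
    current = name
    while current is not None:
        if current in chain:
            raise ValueError(f"cyclic parent chain at {current!r}")
        chain.append(current)
        parent = run_modes[current].get("parent_run_mode")
        if parent is None and current != "default":
            parent = "default"  # implicit parent when none is given
        current = parent
    resolved = {}
    for mode in reversed(chain):  # start with default, end with the run-mode itself
        resolved.update(
            {k: v for k, v in run_modes[mode].items() if k != "parent_run_mode"}
        )
    return resolved


settings = resolve("visium_strict", RUN_MODES)
```

Here settings takes umi_cutoff from visium_strict and n_beads, detect_tissue and count_mm_reads from its parent visium. Had visium left one of these out, the value from default would be used.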
To list the currently available run-modes, type:
spacemake config list_run-modes
Warning
The command line interface for adding, updating, and deleting run-modes will be deprecated in future versions of spacemake.
Please consider editing the config.yaml file directly to manage run-modes.
Add a new run_mode
See the variable descriptions above.
spacemake config add_run-mode \
    --name NAME \
    --parent_run_mode PARENT_RUN_MODE \
    --umi_cutoff UMI_CUTOFF [UMI_CUTOFF ...] \
    --n_beads N_BEADS \
    --clean_dge {True,true,False,false} \
    --detect_tissue {True,true,False,false} \
    --polyA_adapter_trimming {True,true,False,false} \
    --count_intronic_reads {True,true,False,false} \
    --count_mm_reads {True,true,False,false} \
    --mesh_data {True,true,False,false} \
    --mesh_type {circle,hexagon} \
    --mesh_spot_diameter_um MESH_SPOT_DIAMETER_UM \
    --mesh_spot_distance_um MESH_SPOT_DISTANCE_UM
Update/delete a run-mode
The spacemake config update-run-mode command takes the same arguments as above, while spacemake config delete-run-mode takes only --name.
Configure pucks
Each spatial sample is associated with a puck. The puck variable defines the
dimensionality of the underlying spatial structure, which spacemake uses
during the automated analysis and plotting, as well as the binning (meshing) of
the data when selected in the run-mode.
Each puck has the following variables:
width_um: the width of the puck, in microns.
spot_diameter_um: the diameter of a bead on this puck, in microns.
barcodes (optional): the path to the barcode file, containing the cell_barcode and (x,y) position for each. This is handy when several pucks have the same barcodes, such as for 10x Visium.
coordinate_system (optional): the path to the coordinate system file, containing puck IDs and the (x,y,z) position for each, in global coordinates. This coordinate system is analogous to the global coordinate system for image stitching. When specified, this ‘stitching’ is automatically performed on pucks with spatial information.
Provided pucks
default:
  coordinate_system: ''
  spot_diameter_um: 10
  width_um: 3000
openst:
  coordinate_system: puck_data/openst_coordinate_system.csv
  spot_diameter_um: 0.6
  width_um: 1200
seq_scope:
  spot_diameter_um: 1
  width_um: 1000
slide_seq:
  spot_diameter_um: 10
  width_um: 3000
visium:
  barcodes: puck_data/visium_barcode_positions.csv
  spot_diameter_um: 55
  width_um: 6500
The visium puck comes with a barcodes variable, which points to
puck_data/visium_barcode_positions.csv. Similarly, the openst puck comes with
a coordinate_system variable, pointing to puck_data/openst_coordinate_system.csv.
Upon initialization, these files will be placed there automatically by spacemake.
To list the currently available pucks, type:
spacemake config list_pucks
Warning
The command line interface for adding, updating, and deleting pucks will be deprecated in future versions of spacemake.
Please consider editing the config.yaml file directly to manage pucks.
Add a new puck
spacemake config add_puck \
    --name NAME \                            # name of the puck
    --width_um WIDTH_UM \
    --spot_diameter_um SPOT_DIAMETER_UM \
    --barcodes BARCODES \                    # path to the barcode file, optional
    --coordinate_system COORDINATE_SYSTEM    # path to the coordinate system file, optional
Custom snakemake rules
As of version 0.7 it is possible to add custom snakemake rules to your spacemake workflow.
Simply add the following line to the config.yaml in your spacemake root folder:
custom_rules: /path/to/my_own_custom_snakefile.smk
Within your custom code, you can import spacemake modules and have access to internal variables. If you need to make spacemake aware of new top-level targets that have to be made, you can register a callback:
register_module_output_hook(get_my_custom_targets, "my_own_custom_snakefile.smk")
The function get_my_custom_targets() will be called once all other, internal spacemake code has been executed
and is expected to return a list of files that will be appended to the input: dependencies of the top-level
rule. Providing rules to make these files is up to your custom rules.
The second parameter is mainly for logging purposes and lets you track which module or part of the code injected which dependencies. It is good practice to use the filename.