Configuration¶
Once installed, spacemake must be configured before running.
After you have installed spacemake as specified here, you are ready to process and analyze spatial samples.
To initialize spacemake, cd into the directory in which you want to start spacemake. This directory will be your project_root.
Then simply type:
spacemake init \
--dropseq_tools <path_to_dropseq_tools_dir>
Here the path_to_dropseq_tools_dir should point to the directory of the downloaded Dropseq-tools package, downloaded in Step 2 of the installation.
Optionally, you can also provide the --download_species flag, which will download Gencode genomes and annotations for mouse and human, and place them under project_root/species_data/<species>, where <species> is either mouse or human.
One of the most important parts of spacemake is the set of so-called ‘shared sample-variables’. These are reusable, user-definable variables that can be assigned to several samples.
Briefly, they are defined as follows:
- species
- a collection of genome, annotation and rRNA_genome. There is no default species, and each sample can have exactly one species.
- barcode_flavor
- the variable which specifies the structure of Read1 and Read2, namely how the cell_barcode and UMI should be extracted. If no value is provided for a sample, the default will be used.
- run_mode
- each sample can have several run_modes, all of which are user definable. If no run_modes are specified, a sample will be processed using the default run_mode settings.
- puck (for spatial samples only)
- if a sample is spatial, it has to have a puck variable. If no puck is specified, a default puck will be used.
To add, update, delete or list a shared sample-variable, you can use the following commands:
spacemake config add_<shared-sample-variable>
spacemake config update_<shared-sample-variable>
spacemake config delete_<shared-sample-variable>
spacemake config list_<shared-sample-variable>
where <shared-sample-variable> is one of species, barcode_flavor, run_mode or puck.
Configure species¶
To add species, the following command can be used:
spacemake config add_species \
--name NAME \ # name of the species to be added
--genome GENOME \ # path to the genome (.fa) file for the species to
# be added
--annotation ANNOTATION \
# path to the annotation (.gtf) file for the species
# to be added
--rRNA_genome RRNA_GENOME
# (optional) path to the ribosomal-RNA genome (.fa)
# file for the species to be added
The spacemake config update_species command takes the same arguments as above, while spacemake config delete_species takes only --name.
To list the currently available species, type:
spacemake config list_species
Configure barcode_flavors¶
This sample-variable describes how the cell-barcode and the UMI should be extracted from Read1 and Read2.
The default value for barcode_flavor will be dropseq: cell_barcode = r1[0:12] (the cell-barcode comes from the first 12 nt of Read1) and UMI = r1[12:20] (the UMI comes from nt 13-20 of Read1).
If a sample has no barcode_flavor provided, the default barcode_flavor will be used.
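To make the slice notation concrete, here is a minimal Python sketch of how the default barcode_flavor would split a read. The Read1 sequence is made up for illustration, and the variable names are not part of spacemake.

```python
# Hypothetical 20 nt Read1 sequence, made up for illustration.
r1 = "ACGTACGTACGTTTGGCCAA"

# Python slices are 0-based and end-exclusive.
cell_barcode = r1[0:12]  # nucleotides 1-12 of Read1
umi = r1[12:20]          # nucleotides 13-20 of Read1

print(cell_barcode)  # ACGTACGTACGT
print(umi)           # TTGGCCAA
```

Because the end index is exclusive, r1[12:20] covers exactly 8 nucleotides, which matches the "13-20 nt" wording above.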
Provided barcode_flavors¶
Spacemake provides the following barcode_flavors out of the box:
default:
    cell: "r1[0:12]"
    UMI: "r1[12:20]"
slide_seq_14bc:
    cell: "r1[0:14]"
    UMI: "r1[14:23]"
slide_seq_15bc:
    cell: "r1[0:14]"
    UMI: "r1[15:23]"
visium:
    cell: "r1[0:16]"
    UMI: "r1[16:28]"
sc_10x_v2:
    cell: "r1[0:16]"
    UMI: "r1[16:26]"
seq_scope:
    UMI: "r2[0:9]"
    cell: "r1[0:20]"
To list the currently available barcode_flavors, type:
spacemake config list_barcode_flavors
Add a new barcode_flavor¶
spacemake config add_barcode_flavor \
--name NAME \
# name of the barcode flavor
--umi UMI \
# structure of UMI, using python's list syntax.
# Example: to set UMI to 13-20 NT of Read1, use --umi r1[12:20].
# It is also possible to use the first 8nt of Read2 as UMI: --umi r2[0:8].
--cell_barcode CELL_BARCODE
# structure of CELL BARCODE, using python's list syntax.
# Example: to set the cell_barcode to 1-12 nt of Read1, use --cell_barcode r1[0:12].
# It is also possible to reverse the CELL BARCODE, for instance with r1[0:12][::-1].
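The expressions accepted by --umi and --cell_barcode can be pictured with a small Python sketch. The helper below is hypothetical and only covers the forms shown in the examples above (a slice of r1 or r2, optionally followed by [::-1]); how spacemake itself parses these strings is not described here.

```python
import re

# Matches expressions such as "r1[0:12]" or "r2[0:8][::-1]".
# This pattern is an assumption for illustration only.
_PATTERN = re.compile(r"^(r[12])\[(\d+):(\d+)\](\[::-1\])?$")


def extract(expr: str, r1: str, r2: str) -> str:
    """Apply a slice expression to Read1/Read2 without using eval()."""
    match = _PATTERN.match(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    read_name, start, end, reverse = match.groups()
    read = r1 if read_name == "r1" else r2
    segment = read[int(start):int(end)]
    return segment[::-1] if reverse else segment


r1 = "ACGTACGTACGTTTGGCCAA"  # made-up Read1
r2 = "GGGGAAAATTTTCCCC"      # made-up Read2

print(extract("r1[0:12]", r1, r2))        # ACGTACGTACGT
print(extract("r1[0:12][::-1]", r1, r2))  # TGCATGCATGCA
print(extract("r2[0:8]", r1, r2))         # GGGGAAAA
```

Note that [::-1] reverses the sequence; it does not compute the reverse complement.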
Update/delete a barcode_flavor¶
The spacemake config update_barcode_flavor command takes the same arguments as above, while spacemake config delete_barcode_flavor takes only --name.
Configure run_modes¶
Specifying a “run mode” is an essential piece of flexibility that spacemake offers.
By setting a run_mode, a sample can be processed and analysed downstream in various fashions.
Each run_mode can have the following variables:
n_beads
- number of cell-barcodes expected.
umi_cutoff
- a list of integers. Downstream analysis will be run using these UMI cutoffs, that is, cell-barcodes with fewer UMIs will be discarded.
clean_dge
- whether to clean cell-barcodes from overhang primers before creating the DGE.
detect_tissue (spatial only)
- if True, in addition to applying the UMI cutoff, spacemake will try to detect the tissue in-silico.
polyA_adapter_trimming
- if True, 3’ polyA stretches and adapters will be trimmed from Read2.
count_intronic_reads
- if True, intronic reads will be counted when creating the DGE.
count_mm_reads
- if True, multi-mappers will be counted. Only those multi-mapping reads that map to exactly one CDS or UTR segment of a gene will be counted this way.
mesh_data (spatial only)
- if True, a mesh will be created when running this run_mode.
mesh_type (spatial only)
- spacemake currently offers two types of meshes: (1) circle, where circles with a given mesh_spot_diameter_um will be placed in a hexagonal grid, mesh_spot_distance_um distance apart; (2) hexagon, a hexagonal grid where equal hexagons with mesh_spot_diameter_um sides will be placed in a full mesh grid, such that the whole area is covered.
mesh_spot_diameter_um (spatial only)
- the diameter of the mesh spatial-unit, in microns.
mesh_spot_distance_um (spatial only, only for circle mesh)
- distance between the meshed circles, in microns.
parent_run_mode
- each run_mode can have a parent, to which it will fall back. If one of the run_mode variables is missing, the variable of the parent will be used. If no parent is provided, the default run_mode will be the parent.
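As an illustration of what a list of UMI cutoffs means, the following Python sketch filters a made-up table of UMI counts once per cutoff. The counts, barcode names and function name are hypothetical; this is not spacemake's implementation.

```python
# Hypothetical UMI counts per cell-barcode, made up for illustration.
umi_counts = {"AAAC": 50, "AAAG": 120, "AAAT": 350, "AACA": 900}

# Cutoff values of the provided default run_mode.
umi_cutoffs = [100, 300, 500]


def filter_by_cutoff(counts, cutoff):
    """Discard cell-barcodes that have fewer UMIs than the cutoff."""
    return {bc: n for bc, n in counts.items() if n >= cutoff}


# One filtered set of barcodes per cutoff.
kept = {cutoff: filter_by_cutoff(umi_counts, cutoff) for cutoff in umi_cutoffs}
for cutoff, barcodes in kept.items():
    print(cutoff, sorted(barcodes))
```

Each cutoff yields its own filtered set, so a run_mode with three cutoffs corresponds to three downstream analyses of the same sample.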
Provided run_mode(s)¶
default:
    n_beads: 100000
    umi_cutoff: [100, 300, 500]
    clean_dge: False
    detect_tissue: False
    polyA_adapter_trimming: True
    count_intronic_reads: True
    count_mm_reads: False
    mesh_data: False
    mesh_type: 'circle'
    mesh_spot_diameter_um: 55
    mesh_spot_distance_um: 100
visium:
    n_beads: 10000
    umi_cutoff: [1000]
    clean_dge: False
    detect_tissue: True
    polyA_adapter_trimming: False
    count_intronic_reads: False
    count_mm_reads: True
slide_seq:
    n_beads: 100000
    umi_cutoff: [50]
    clean_dge: False
    detect_tissue: False
scRNA_seq:
    n_beads: 10000
    umi_cutoff: [500]
    detect_tissue: False
    polyA_adapter_trimming: True
    count_intronic_reads: True
    count_mm_reads: False
seq_scope:
    clean_dge: false
    count_intronic_reads: false
    count_mm_reads: false
    detect_tissue: false
    mesh_data: true
    mesh_spot_diameter_um: 10
    mesh_spot_distance_um: 15
    mesh_type: hexagon
    n_beads: 1000
    umi_cutoff:
    - 100
    - 300
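The circle mesh described above places spot centers on a hexagonal grid, mesh_spot_distance_um apart. A rough Python sketch of such a grid follows, using the distance of 100 from the default run_mode. The function name, the grid origin and the small 300 x 300 area are assumptions chosen for illustration; spacemake's own mesh construction may differ.

```python
import math


def hex_grid_centers(width_um, height_um, spot_distance_um):
    """Centers of a hexagonal grid covering a width x height rectangle."""
    # Rows are sqrt(3)/2 * distance apart; odd rows are shifted by half
    # the distance, so every nearest neighbor is exactly one distance away.
    row_height = spot_distance_um * math.sqrt(3) / 2
    centers = []
    row = 0
    y = 0.0
    while y <= height_um:
        x = spot_distance_um / 2 if row % 2 else 0.0
        while x <= width_um:
            centers.append((x, y))
            x += spot_distance_um
        row += 1
        y = row * row_height
    return centers


centers = hex_grid_centers(300, 300, 100)
nearest = min(math.dist(centers[0], c) for c in centers[1:])
print(len(centers), round(nearest, 6))
```

In such a layout every interior center has six nearest neighbors at the same distance, which is what makes the grid hexagonal.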
Note
If a sample has no run_mode provided, the default will be used.
Note
If a run_mode variable is not provided, the variable of the default run_mode will be used.
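The fallback behaviour can be sketched in Python as a walk along the parent chain. The run_mode named my_slide_seq and the resolve function are hypothetical, only a subset of the variables is shown, and it is an assumption here that a chain of parents eventually ends at default.

```python
# Subset of the provided run_modes, plus one hypothetical user-defined mode.
run_modes = {
    "default": {"n_beads": 100000, "umi_cutoff": [100, 300, 500],
                "detect_tissue": False, "polyA_adapter_trimming": True},
    "slide_seq": {"n_beads": 100000, "umi_cutoff": [50],
                  "detect_tissue": False},
    "my_slide_seq": {"parent_run_mode": "slide_seq", "detect_tissue": True},
}


def resolve(name, modes):
    """Return the effective settings of a run_mode via its parent chain."""
    chain = []
    current = name
    # Collect the mode, its parent, grandparent, ... up to "default".
    while current not in chain:  # also guards against circular parents
        chain.append(current)
        if current == "default":
            break
        current = modes[current].get("parent_run_mode", "default")
    resolved = {}
    # Apply from the most general mode to the most specific one.
    for mode_name in reversed(chain):
        resolved.update(modes[mode_name])
    resolved.pop("parent_run_mode", None)
    return resolved


print(resolve("my_slide_seq", run_modes))
```

Here my_slide_seq takes detect_tissue from itself, umi_cutoff from slide_seq, and polyA_adapter_trimming from default.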
To list the currently available run_modes, type:
spacemake config list_run_modes
Add a new run_mode¶
See the variable descriptions above.
spacemake config add_run_mode \
--name NAME \
--parent_run_mode PARENT_RUN_MODE \
--umi_cutoff UMI_CUTOFF [UMI_CUTOFF ...] \
--n_beads N_BEADS \
--clean_dge {True,true,False,false} \
--detect_tissue {True,true,False,false} \
--polyA_adapter_trimming {True,true,False,false} \
--count_intronic_reads {True,true,False,false} \
--count_mm_reads {True,true,False,false} \
--mesh_data {True,true,False,false} \
--mesh_type {circle,hexagon} \
--mesh_spot_diameter_um MESH_SPOT_DIAMETER_UM \
--mesh_spot_distance_um MESH_SPOT_DISTANCE_UM
Update/delete a run_mode¶
The spacemake config update_run_mode command takes the same arguments as above, while spacemake config delete_run_mode takes only --name.
Configure pucks¶
Each spatial sample needs to have a puck. The puck sample-variable defines the dimensionality of the underlying spatial structure, which spacemake will then use during the automated analysis and plotting.
Each puck has the following variables:
- width_um: the width of the puck, in microns.
- spot_diameter_um: the diameter of a bead on this puck, in microns.
- barcodes (optional): the path to the barcode file, containing the cell_barcode and (x,y) position for each. This is handy when several pucks have the same barcodes, such as for 10x visium.
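A barcode file of this kind can be thought of as a table mapping each cell_barcode to a position. The Python sketch below reads such a table from made-up CSV content; the column names cell_bc, x_pos and y_pos and the values are assumptions for illustration, so check the file shipped with spacemake for the actual layout.

```python
import csv
import io

# Made-up file content; column names and values are assumptions.
barcode_file = io.StringIO(
    "cell_bc,x_pos,y_pos\n"
    "AAACGGGT,0.0,0.0\n"
    "AAACTTTG,10.5,3.2\n"
)

# Map each cell-barcode to its (x, y) position.
positions = {
    row["cell_bc"]: (float(row["x_pos"]), float(row["y_pos"]))
    for row in csv.DictReader(barcode_file)
}

print(positions["AAACTTTG"])  # (10.5, 3.2)
```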
Provided pucks¶
default:
    width_um: 3000
    spot_diameter_um: 10
visium:
    barcodes: 'puck_data/visium_barcode_positions.csv'
    width_um: 6500
    spot_diameter_um: 55
seq_scope:
    width_um: 1000
    spot_diameter_um: 1
slide_seq:
    width_um: 3000
    spot_diameter_um: 10
As you can see, the visium puck comes with a barcodes variable, which points to puck_data/visium_barcode_positions.csv. Upon initialization, this file will automatically be placed there by spacemake.
To list the currently available pucks, type:
spacemake config list_pucks
Add a new puck¶
spacemake config add_puck \
--name NAME \ # name of the puck
--width_um WIDTH_UM \
--spot_diameter_um SPOT_DIAMETER_UM \
--barcodes BARCODES # path to the barcode file, optional