Manage projects and samples¶
In spacemake, each sample and its settings are stored in the project_df.csv file under the root directory of the spacemake project.
Each sample has exactly one row in this file. In the back-end, spacemake uses a pandas.DataFrame to load and save this .csv file on disk. This data frame is indexed by the key (project_id, sample_id).
The spacemake class responsible for this back-end logic is the ProjectDF class.
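As a rough illustration of the (project_id, sample_id) indexing described above, the sketch below loads a small CSV and keys each row by that pair. It uses only the Python standard library rather than pandas, it is not the actual ProjectDF implementation, and both the example rows and the helper name load_project_rows are made up for illustration.

```python
import csv
import io

# Hypothetical contents of a project_df.csv; real files have more columns.
EXAMPLE_CSV = """project_id,sample_id,species
visium,visium_1,mouse
visium,visium_2,human
slideseq,slideseq_1,mouse
"""


def load_project_rows(text):
    """Return a dict mapping (project_id, sample_id) to the rest of the row."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row.pop("project_id"), row.pop("sample_id"))
        if key in rows:
            # Each sample may appear in exactly one row.
            raise ValueError(f"duplicate sample: {key}")
        rows[key] = dict(row)
    return rows


project_rows = load_project_rows(EXAMPLE_CSV)
print(project_rows[("visium", "visium_2")]["species"])
```

Keying by the pair rather than by sample_id alone means two projects can reuse the same sample name without colliding.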
Add a single sample¶
Sample parameters¶
In spacemake each sample can have the following variables:
- project_id: project_id of a sample
- sample_id: sample_id of a sample
- R1: .fastq.gz file path(s) to Read1 read file(s). Can be either a single file, or a space separated list of consecutive files. If a list is provided, the files will be merged together and the merged R1.fastq.gz will be processed downstream.
- R2: same as before, but for Read2 read file(s).
- longreads (optional): fastq(.gz)|fq(.gz)|bam file path to pacbio long reads for library debugging
- longread-signature (optional): identify the expected longread signature (see longread.yaml)
- dge (optional): since the 0.1 version of spacemake, it is possible to only provide the count matrix as input data for spacemake. Note: a raw count matrix is expected. If a non-count matrix is provided, spacemake will raise an error.
- barcode_flavor (optional): barcode_flavor of the sample. If not provided, default will be used (Drop-seq).
- species: species of the sample
- puck (optional): name of the puck for this sample. If the puck contains a barcodes variable with a path to a coordinate file, those coordinates will be used when processing this sample. If not provided, a default puck will be used with width_um=3000, spot_diameter_um=10.
- puck_id (optional): puck_id of a sample
- puck_barcode_file (optional): the path to the file containing (x,y) positions of the barcodes. If the puck for this sample has a barcodes variable, it will be ignored, and puck_barcode_file will be used.
- investigator (optional): name of the investigator(s) responsible for this sample
- experiment (optional): description of the experiment
- sequencing_date (optional): sequencing date of the sample
- run_mode (optional): a list of run_mode names for this sample. The sample will be processed as defined in the run_mode-s provided. If not provided, the default run_mode will be used.
To add a single sample, we can use the following command:
spacemake projects add_sample \
--project_id PROJECT_ID \ # required
--sample_id SAMPLE_ID \ # required
--R1 R1 [R1 R1 ...] \ # required, if no longreads
--R2 R2 [R2 R2 ...] \ # required, if no longreads
--longreads LONGREADS \ # required, if no R1 & R2
--longread-signature LONGREAD_SIGNATURE \ # optional
--barcode_flavor BARCODE_FLAVOR \ # optional
--species SPECIES \ # required
--puck PUCK \ # optional
--puck_id PUCK_ID \ # optional
--puck_barcode_file PUCK_BARCODE_FILE \ # optional
--investigator INVESTIGATOR \ # optional
--experiment EXPERIMENT \ # optional
--sequencing_date SEQUENCING_DATE \ # optional
--run_mode RUN_MODE [RUN_MODE ...] # optional
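When samples are added from a script, the argument list can be assembled programmatically. The sketch below is one possible way to do that, using only flags that appear in the usage listing above. The helper name build_add_sample_command and all file names are hypothetical, and the check mirrors only the "R1 and R2, or longreads" rule from the listing (it does not cover the dge case).

```python
import shlex

# Flags that accept several values, per the usage listing above.
LIST_FLAGS = ("R1", "R2", "run_mode")


def build_add_sample_command(sample):
    """Turn a dict of sample variables into an argument list."""
    has_reads = "R1" in sample and "R2" in sample
    if not has_reads and "longreads" not in sample:
        raise ValueError("provide R1 and R2, or longreads")
    for required in ("project_id", "sample_id", "species"):
        if required not in sample:
            raise ValueError(f"missing required variable: {required}")

    argv = ["spacemake", "projects", "add_sample"]
    for name, value in sample.items():
        argv.append(f"--{name}")
        if name in LIST_FLAGS and not isinstance(value, str):
            argv.extend(value)  # several files or run modes
        else:
            argv.append(value)
    return argv


# Placeholder values, for illustration only.
command = build_add_sample_command({
    "project_id": "test_project",
    "sample_id": "test_sample",
    "R1": ["lane1_R1.fastq.gz", "lane2_R1.fastq.gz"],
    "R2": ["lane1_R2.fastq.gz", "lane2_R2.fastq.gz"],
    "species": "mouse",
})
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run as is, which avoids shell quoting problems with unusual file paths.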
Warning
A sample is spatial only if either a puck_barcode_file is provided, or the sample’s puck has a barcodes variable pointing to a barcode position file.
If this is not the case, spacemake won’t be able to find the spatial barcodes for this sample, and the sample will be processed as a single-cell sample.
If both the puck_barcode_file is provided and the sample’s puck has the barcodes variable set, puck_barcode_file will be used for the spatial coordinates.
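The precedence rule in this warning can be written down as a small function. This is only a sketch of the described behaviour, not spacemake's own code: the function name resolve_barcode_file, the dict layout, and the file names are assumptions made for illustration.

```python
def resolve_barcode_file(sample, pucks):
    """Return the barcode position file for a sample, or None if not spatial."""
    # An explicit puck_barcode_file always wins.
    if sample.get("puck_barcode_file"):
        return sample["puck_barcode_file"]
    # Otherwise fall back to the barcodes variable of the sample's puck.
    puck = pucks.get(sample.get("puck"), {})
    return puck.get("barcodes")


# Hypothetical puck configuration and samples.
pucks = {
    "visium": {"barcodes": "visium_barcode_positions.csv"},
    "default": {},
}

from_puck = resolve_barcode_file({"puck": "visium"}, pucks)
from_sample = resolve_barcode_file(
    {"puck": "visium", "puck_barcode_file": "my_positions.csv"}, pucks
)
not_spatial = resolve_barcode_file({"puck": "default"}, pucks)

print(from_puck, from_sample, not_spatial)
```

A result of None corresponds to the case where the sample would be processed as a single-cell sample.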
Add a Visium/Seq-scope/Slide-seq sample¶
Currently spacemake works out of the box with three spatial methods: Visium, Seq-scope and Slide-seq.
- To add a Visium sample, follow the quick start guide here.
- To add a Seq-scope sample, follow the quick start guide here.
- To add a Slide-seq sample, follow the quick start guide here.
Add a custom spatial sample¶
To process a custom spatial sample with spacemake, follow the step-by-step guide below.
Step 1: specifying a puck¶
Each spatial sample needs a so-called puck to be configured first. By ‘puck’ we mean the physical properties of the underlying method. Visium, for instance, works with capture areas of 6.5 mm by 6.5 mm, where each spot has a diameter of 55 microns. To configure a custom puck, follow the guide here.
Warning
If a puck is not specified, spacemake will still run but will use the default
puck as specified here.
Step 2: formatting a custom puck_barcode_file¶
For all spatial samples we need to provide a puck_barcode_file. This file needs to be comma or tab separated, and it needs to have the following three (named) columns:
- cell_bc, barcodes or barcode for the cell barcode
- xcoord or x_pos for x-positions
- ycoord or y_pos for y-positions
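Before adding a sample, it can be useful to check that a barcode file has the expected header. The sketch below reads comma or tab separated text and accepts any of the column names listed above. It is an independent check written with the standard library, not spacemake's own parser, and the function name, the delimiter guess, and the example barcode are assumptions.

```python
import csv
import io

# Accepted header names, as listed above, mapped to one canonical name each.
COLUMN_ALIASES = {
    "cell_bc": "cell_bc", "barcodes": "cell_bc", "barcode": "cell_bc",
    "xcoord": "x_pos", "x_pos": "x_pos",
    "ycoord": "y_pos", "y_pos": "y_pos",
}


def read_barcode_positions(text):
    """Parse comma or tab separated text into (barcode, x, y) tuples."""
    header = text.splitlines()[0]
    # Simple guess: a tab in the header means the file is tab separated.
    delimiter = "\t" if "\t" in header else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # Map each canonical name to the column name actually used in the file.
    canonical = {COLUMN_ALIASES[name]: name
                 for name in reader.fieldnames if name in COLUMN_ALIASES}
    missing = {"cell_bc", "x_pos", "y_pos"} - set(canonical)
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    return [(row[canonical["cell_bc"]],
             float(row[canonical["x_pos"]]),
             float(row[canonical["y_pos"]]))
            for row in reader]


# Made-up example content.
positions = read_barcode_positions("barcode\txcoord\tycoord\nACGT\t1.5\t2.0\n")
print(positions)
```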
Step 3: configure run_mode(s), barcode_flavor and species¶
Before a custom sample is added, the run_mode(s), barcode_flavor and species should be configured. The guides on how to do this can be found here for run-modes, here for barcode flavors and here for species.
The configured run_mode(s) will specify how a sample is processed downstream, and the barcode_flavor will specify the barcoding strategy used (i.e. how many nucleotides are used for the UMI, and which nucleotides are used for the spot barcodes).
Step 4: add your sample¶
Once everything is configured you can add your custom spatial sample with the following command:
# --project_id: your sample's project_id
# --sample_id: your sample's sample_id
# --R1: one or more R1.fastq.gz files
# --R2: one or more R2.fastq.gz files
# --barcode_flavor: name of the barcode_flavor, configured in Step 3
# --species: name of the species, configured in Step 3
# --puck: name of the puck, configured in Step 1
# --puck_barcode_file: path to your custom barcode file, configured in Step 2
# --run_mode: name of the run_mode(s), configured in Step 3
spacemake projects add_sample \
    --project_id PROJECT_ID \
    --sample_id SAMPLE_ID \
    --R1 R1 [R1 R1 ...] \
    --R2 R2 [R2 R2 ...] \
    --barcode_flavor BARCODE_FLAVOR \
    --species SPECIES \
    --puck PUCK \
    --puck_barcode_file PUCK_BARCODE_FILE \
    --run_mode RUN_MODE [RUN_MODE ...]
Add a single-cell sample¶
To add a single-cell sample follow the quick start guide here.
Add a pre-processed count-matrix¶
Coming soon!
Add several samples at once¶
It is possible to add several samples in just one command. First, the sample variables have
to be defined in a samples.yaml
file, then we can run the following command:
spacemake projects add_samples_from_yaml --samples_yaml samples.yaml
The samples.yaml
should have the following structure:
additional_projects:
  - project_id: visium
    sample_id: visium_1
    R1: <path_to_visium_1_R1.fastq.gz>
    R2: <path_to_visium_1_R2.fastq.gz>
    species: mouse
    puck: visium
    barcode_flavor: visium
    run_mode: [visium]
  - project_id: visium
    sample_id: visium_2
    R1: <path_to_visium_2_R1.fastq.gz>
    R2: <path_to_visium_2_R2.fastq.gz>
    species: human
    puck: visium
    barcode_flavor: visium
    run_mode: [visium]
  - project_id: slideseq
    sample_id: slideseq_1
    R1: <path_to_slideseq_1_R1.fastq.gz>
    R2: <path_to_slideseq_1_R2.fastq.gz>
    species: mouse
    puck: slideseq
    barcode_flavor: slideseq_14bc
    run_mode: [default, slideseq]
    puck_barcode_file: <path_to_slideseq_puck_barcode_file>
Under additional_projects we define a list where each element is a set of key: value pairs describing one sample, to be inserted in the project_df.csv.
Note
When using the above command, if a sample is already present in the project_df.csv, spacemake will update it rather than adding it again.
If someone runs spacemake projects add_samples_from_yaml --samples_yaml samples.yaml, then modifies something in the samples.yaml and runs the command again, the project_df.csv will contain the updated version of the settings.
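The update-instead-of-duplicate behaviour described in this note amounts to an "upsert" keyed by (project_id, sample_id). The sketch below illustrates the idea with plain dictionaries. It is not spacemake's implementation, the function name upsert_samples is hypothetical, and whether spacemake merges or fully replaces an existing row is an assumption here (the sketch merges).

```python
def upsert_samples(project_rows, samples):
    """Insert new samples and update existing ones, keyed by (project_id, sample_id)."""
    for sample in samples:
        key = (sample["project_id"], sample["sample_id"])
        settings = {name: value for name, value in sample.items()
                    if name not in ("project_id", "sample_id")}
        # setdefault creates the row the first time; update overwrites changed values.
        project_rows.setdefault(key, {}).update(settings)
    return project_rows


rows = {}
# First run: the sample is added.
upsert_samples(rows, [
    {"project_id": "visium", "sample_id": "visium_1", "species": "mouse"},
])
# Second run after editing the settings: the same row is updated, not duplicated.
upsert_samples(rows, [
    {"project_id": "visium", "sample_id": "visium_1", "species": "human"},
])
print(rows)
```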
Add samples from Illumina sample-sheet¶
Coming soon…
Listing projects¶
To list the projects that are already configured and added, simply type:
spacemake projects list
It will show the main variables for each project in the project_df.csv.
To view extra variables which are not shown, use the --variables option to specify which extra variables to show.