Detecting somatic single nucleotide variants (SNVs) is critical in research and clinical applications. Reliable detection typically requires a matched normal reference, but in many cases no such reference is available. S22S is a random forest classifier that identifies somatic SNVs in a reference-free context.


1. Download the package and the User Manual, then follow the installation instructions in the User Manual.
2. Download the data files and unzip them into a directory on your machine.
3. Set the "snv_freq_data_dir" parameter of the config file to the location of that directory.
4. See the examples/ folder in the package for an example input config file (the User Manual describes which fields are mandatory).
5. Various SNP/INDEL reference files can be used to calculate the distance to the nearest known SNP/INDEL.
We recommend the INDEL reference from the GATK bundle resource.
Download at: https://software.broadinstitute.org/gatk/download/bundle
We recommend using dbSNP for the SNP reference.
Download at: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/
Remember to split the SNP reference file by chromosome, then compress each per-chromosome file with bgzip (tabix requires bgzip-compressed input) and index it using tabix.
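
The SNP reference preparation in step 5 can be sketched in Python as follows. This is a minimal illustration, not part of the S22S package: the function names and the "chr_<name>.vcf" output naming are hypothetical, and the second function assumes the bgzip and tabix command-line tools are installed and on your PATH. Check the User Manual for the file names S22S actually expects.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def split_vcf_by_chromosome(vcf_path, out_dir):
    """Write one uncompressed VCF per chromosome and return {chrom: Path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Read plain-text or gzip/bgzip-compressed input, based on the extension.
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    header = []    # header lines, copied into every output file
    handles = {}   # chromosome name -> open output file
    paths = {}     # chromosome name -> output path
    try:
        with opener(vcf_path, "rt") as src:
            for line in src:
                if line.startswith("#"):
                    header.append(line)
                    continue
                chrom = line.split("\t", 1)[0]
                if chrom not in handles:
                    # Hypothetical naming scheme; adjust to your setup.
                    paths[chrom] = out_dir / f"chr_{chrom}.vcf"
                    handles[chrom] = open(paths[chrom], "w")
                    handles[chrom].writelines(header)
                handles[chrom].write(line)
    finally:
        for fh in handles.values():
            fh.close()
    return paths


def compress_and_index(vcf_file):
    """Compress with bgzip, then index with tabix (both assumed on PATH)."""
    if shutil.which("bgzip") is None or shutil.which("tabix") is None:
        raise RuntimeError("bgzip and tabix must be installed and on PATH")
    subprocess.run(["bgzip", "-f", str(vcf_file)], check=True)
    subprocess.run(["tabix", "-p", "vcf", f"{vcf_file}.gz"], check=True)
```

A typical use would call split_vcf_by_chromosome on the downloaded dbSNP VCF and then run compress_and_index on each returned path, which leaves a .vcf.gz file and its .tbi index per chromosome.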