CHOPOFF.jl

About

Uncompromising finding of CRISPR off-targets:

many fast alignment algorithms optimized specifically for CRISPR
search for larger distances allowing for mismatches and bulges
support for ambiguous bases
arbitrarily large genomes
VCF support - with multiple overlapping SNPs
near-instant alignment-free off-target filtering
pruning of off-targets by their location (remove overlapping, competing off-targets)
extensively tested
full framework that can be extended for your own algorithms with ease

Requirements

Some algorithms generate as many files as there are prefixes (e.g. a prefix of length 7 produces 4^7 = 16384 files). This strategy lets the searches run independently on multiple cores without being throttled when querying a large number of guides. However, some systems limit the number of open files; on Ubuntu, for example, 'ulimit -n' shows the limit. Increase the limit if it causes problems for you.
When using many cores to build the indexes, you need about 1 GB of RAM per thread.
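
As a quick check before building, the file count for a given prefix length can be compared with the current open-file limit. This is a minimal sketch: `prefix_file_count` and `PREFIX_LEN` are hypothetical names used for illustration and are not part of CHOPOFF, and the count assumes one file per prefix over the four-letter DNA alphabet, as described above. Whether all files are open at the same time depends on the algorithm.

```shell
# Hypothetical helper (not part of CHOPOFF): number of files for a prefix
# length, assuming one file per prefix over the alphabet A, C, G, T.
prefix_file_count() {
    count=1
    i=0
    while [ "$i" -lt "$1" ]; do
        count=$((count * 4))
        i=$((i + 1))
    done
    echo "$count"
}

PREFIX_LEN=7                  # illustrative value, matches the example above
needed=$(prefix_file_count "$PREFIX_LEN")
limit=$(ulimit -n)            # current soft limit on open files

echo "prefix $PREFIX_LEN -> $needed files, open-file limit: $limit"
if [ "$limit" != "unlimited" ] && [ "$limit" -lt "$needed" ]; then
    echo "limit is below the file count; consider raising it, e.g. 'ulimit -n $needed'"
fi
```

Raising the soft limit with 'ulimit -n' only affects the current shell session and cannot exceed the hard limit set by the system.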

It is possible to build CHOPOFF into a standalone application, which bundles all dependencies and Julia into one compiled program. This is the recommended way to use CHOPOFF if you are not a developer. If you know how to code in Julia, you can make use of the whole framework by using CHOPOFF as a package.

To build a standalone application, run the ./build_standalone.sh script from the main directory. The script will produce a binary in a "build" folder. From inside that folder you can then run ./bin/CHOPOFF --help. To learn about building a database, run ./bin/CHOPOFF build --help, and to use an existing database, check out ./bin/CHOPOFF search --help. You can skip the testing and precompile step to speed up the build with ./build_standalone.sh --noprecompile.

Alternatively, you can download the latest release from the releases page on GitHub.

When using the application as self-contained compiled software, you can control the number of cores by setting the JULIA_NUM_THREADS environment variable.
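
When building indexes, the thread count can be capped by available memory, following the rule of thumb of about 1 GB of RAM per thread given in the requirements. The sketch below is only an illustration: `threads_for_build` is a hypothetical helper, not part of CHOPOFF, and the core count and RAM size are passed in by hand rather than detected.

```shell
# Hypothetical helper (not part of CHOPOFF): pick the smaller of the core
# count and the RAM size in GB, assuming roughly 1 GB of RAM per thread.
threads_for_build() {
    cores=$1
    ram_gb=$2
    if [ "$ram_gb" -lt "$cores" ]; then
        echo "$ram_gb"
    else
        echo "$cores"
    fi
}

# Example: 10 cores but only 8 GB of RAM, so 8 threads are used.
JULIA_NUM_THREADS=$(threads_for_build 10 8)
export JULIA_NUM_THREADS
echo "JULIA_NUM_THREADS=$JULIA_NUM_THREADS"
```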

Example commands for using the standalone application

Building a prefixHashDB database for the standard Cas9 motif (--motif), with support for up to Levenshtein distance 3 (--distance), for an example genome using 10 threads.

export JULIA_NUM_THREADS=10  
EXAMPLE_GENOME="./test/sample_data/genome/semirandom.fa"
CHOPOFF build --name Cas9_hg38 --genome "$EXAMPLE_GENOME" -o out_dir/phDB_16_3/ --distance 3 --motif Cas9 prefixHashDB

Searching the above database for all off-targets of the guides listed in --guides, up to Levenshtein distance 2 (--distance), using 15 threads and writing the results into the --output file. Because the --early_stopping argument is not supplied below, prefixHashDB will by default search for up to 1e6 off-targets per guide per distance. Note that, by default, guides for Cas9 are 20 bp long, as can be seen in the example file.

export JULIA_NUM_THREADS=15  
EXAMPLE_GUIDES="./test/sample_data/guides.txt"
CHOPOFF search --database out_dir/phDB_16_3/ --guides "$EXAMPLE_GUIDES" --output out_dir/phDB_16_2.csv --distance 2 prefixHashDB
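
Since the guides are expected to be 20 bp long for Cas9, it can help to check a guides file before searching. The following is a minimal sketch, assuming a plain-text file with one guide per line; `check_guide_lengths` is a hypothetical helper, not part of CHOPOFF, and the demo sequences are made up.

```shell
# Hypothetical helper (not part of CHOPOFF): print "line_number<TAB>guide"
# for every non-empty line whose length differs from the expected length.
# Assumes one guide per line.
check_guide_lengths() {
    awk -v len="$2" 'length($0) > 0 && length($0) != len { print NR "\t" $0 }' "$1"
}

# Demo on a temporary file with one 20 bp guide and one 12 bp guide.
demo=$(mktemp)
printf '%s\n' "ACGTACGTACGTACGTACGT" "ACGTACGTACGT" > "$demo"
check_guide_lengths "$demo" 20    # reports line 2 only
rm -f "$demo"
```

An empty output means every guide has the expected length.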

No-build application

Run the CHOPOFF package as an application without building it first. From the package directory run:

julia --threads 4 --project="." ./src/CHOPOFF.jl --help

Quick Use

To search for off-targets you have a couple of options:

prefixHashDB - the fastest; we apply hashes to symbolic alignments for fast filtering of off-targets
linearDB - most rigorously tested
motifDB - on top of linearDB we apply a pigeonhole-principle-like filter, which you can adjust
treeDB - works best for longer gRNAs, uses vantage points for filtering
fmiDB - the smallest file size, very fast, but only for distances ≤ 2
binaryFuseFilterDB - uses hashing on top of FM-index
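
To illustrate the general pigeonhole idea mentioned for motifDB: if a guide is split into d + 1 non-overlapping chunks, any site within distance d of the guide must contain at least one of those chunks unchanged, so sites matching none of the chunks can be discarded early. The sketch below only shows the splitting step and is not CHOPOFF's actual implementation; `pigeonhole_chunks` is a hypothetical helper and the guide sequence is made up.

```shell
# Hypothetical helper (not part of CHOPOFF): split a guide into
# (distance + 1) consecutive, non-overlapping chunks of near-equal size.
pigeonhole_chunks() {
    awk -v g="$1" -v d="$2" 'BEGIN {
        k = d + 1                 # number of chunks
        n = length(g)
        base = int(n / k)         # minimum chunk size
        extra = n % k             # the first "extra" chunks get one more base
        pos = 1
        for (i = 1; i <= k; i++) {
            size = base + (i <= extra ? 1 : 0)
            print substr(g, pos, size)
            pos += size
        }
    }'
}

# A 20 bp guide at distance 3 is split into four 5 bp chunks.
pigeonhole_chunks "ACGTACGTACGTACGTACGT" 3
```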

If you would like to filter or rank gRNAs down to those that are most likely off-target free, use hashDB, or the slower and much larger but less probabilistic dictDB.

For VCF file support use vcfDB. To use the framework as a Julia package, consult the Public Interface section of the documentation.

Support

You can buy me a coffee to show some love and appreciation!

LICENSE

The license for non-commercial applications is AGPL-3.0. For commercial applications you should acquire permission or a licensing contract.

https://tldrlegal.com/license/gnu-affero-general-public-license-v3-(agpl-3.0)

This program is free software for non-commercial applications: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program inside the LICENSE file. If not, see https://www.gnu.org/licenses/.