Utils

gRNAs and kmers

CHOPOFF.getseq - Function

getseq(n = 20, letters = ['A', 'C', 'G', 'T'])

Generates a random sequence of length n, with each position drawn from letters.
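
As a rough illustration of what drawing a random sequence involves, the sketch below uses only Julia's standard library. The name `random_seq`, its `rng` keyword, and the `String` return type are assumptions made for this example; they are not part of CHOPOFF, and getseq itself may return a different type.

```julia
using Random

# Hypothetical stand-in for getseq: draw n letters uniformly at random
# (with replacement) and join them into a String.
function random_seq(n::Int = 20, letters::Vector{Char} = ['A', 'C', 'G', 'T'];
                    rng::AbstractRNG = Random.default_rng())
    return join(rand(rng, letters, n))
end

# A fixed-seed generator is passed only to make the demo reproducible.
s = random_seq(20; rng = MersenneTwister(1))
println(s, " (length ", length(s), ")")
```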

CHOPOFF.as_kmers - Function

as_kmers(x::LongDNA{4}, kmer_size::Int)

Transforms x into a vector of k-mers of size kmer_size. All ambiguous bases are expanded.

Examples

julia> as_kmers(dna"ACTGG", 4)
2-element Vector{LongSequence{DNAAlphabet{4}}}:
 ACTG
 CTGG
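
The sliding-window idea behind the example above can be sketched on plain strings. `sliding_kmers` is a hypothetical helper written for this illustration, not CHOPOFF code; it assumes ASCII input and does not expand ambiguous bases.

```julia
# Hypothetical helper: every overlapping window of length k, left to right.
function sliding_kmers(s::AbstractString, k::Int)
    k > 0 || throw(ArgumentError("k must be positive"))
    # A sequence of length L has L - k + 1 overlapping k-mers.
    return [s[i:i+k-1] for i in 1:(length(s) - k + 1)]
end

kmers = sliding_kmers("ACTGG", 4)
println(kmers)
```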

CHOPOFF.as_skipkmers - Function

as_skipkmers(x::LongDNA{4}, kmer_size::Int)

Transforms x into a vector of skip-kmers of size kmer_size. All ambiguous bases are expanded. Leftover bases that do not fill a complete k-mer are ignored.

Examples

julia> as_skipkmers(dna"ACTGG", 2)
2-element Vector{LongSequence{DNAAlphabet{4}}}:
 AC
 TG
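
In the example above, the trailing G is dropped because it cannot fill a whole 2-mer. A minimal sketch of that non-overlapping chunking on plain strings follows; `chunk_kmers` is a hypothetical name used only here, assumes ASCII input, and does not expand ambiguous bases.

```julia
# Hypothetical helper: consecutive, non-overlapping windows of length k.
function chunk_kmers(s::AbstractString, k::Int)
    k > 0 || throw(ArgumentError("k must be positive"))
    # Step by k so that windows never overlap; an incomplete tail is skipped.
    return [s[i:i+k-1] for i in 1:k:(length(s) - k + 1)]
end

chunks = chunk_kmers("ACTGG", 2)
println(chunks)
```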

CHOPOFF.all_kmers - Function

all_kmers(size = 4; alphabet = [DNA_A, DNA_C, DNA_G, DNA_T])

Makes a list of all possible k-mers of the given size using the bases in alphabet.

Examples

julia> all_kmers(2; alphabet = [DNA_A, DNA_N])
4-element Vector{LongSequence{DNAAlphabet{4}}}:
 AA
 AN
 NA
 NN
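
Enumerating every k-mer amounts to taking the Cartesian product of the alphabet with itself `size` times. The sketch below does this on characters with `Iterators.product`; `enumerate_kmers` is a hypothetical helper, not the CHOPOFF implementation, and the ordering is chosen to mirror the example output above.

```julia
# Hypothetical helper: all strings of length k over the given alphabet.
function enumerate_kmers(k::Int, alphabet::Vector{Char} = ['A', 'C', 'G', 'T'])
    k > 0 || throw(ArgumentError("k must be positive"))
    # Iterators.product varies its first iterator fastest, so each tuple is
    # reversed to make the leftmost base change slowest (AA, AN, NA, NN).
    combos = Iterators.product(ntuple(_ -> alphabet, k)...)
    return vec([join(reverse(t)) for t in combos])
end

println(enumerate_kmers(2, ['A', 'N']))
```

The result has `length(alphabet)^k` entries, so the list grows quickly with k.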

CHOPOFF.minkmersize - Function

minkmersize(len::Int = 20, d::Int = 4)

Applies the pigeonhole principle to return the minimum k-mer size required for two strings of length len to be aligned within a distance of d.

Examples

julia> minkmersize(20, 3)
5

julia> minkmersize(20, 6)
2
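
Both results above are consistent with the usual pigeonhole argument: d edits can touch at most d of d + 1 non-overlapping pieces, so at least one piece of length floor(len / (d + 1)) must match exactly. The sketch below encodes that formula. That minkmersize computes exactly this is an assumption inferred from the two examples, and `pigeonhole_kmer_size` is a hypothetical name.

```julia
# Hypothetical re-derivation: split `len` bases into d + 1 equal pieces and
# take the (floored) piece length.
pigeonhole_kmer_size(len::Int = 20, d::Int = 4) = fld(len, d + 1)

println(pigeonhole_kmer_size(20, 3))
println(pigeonhole_kmer_size(20, 6))
```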

Persistence

CHOPOFF.save - Function

save(object::Any, destination::String)

Uses the Julia serializer to save the data in a binary format. See the Julia documentation on serialization for details. Note that:

  1. This function will overwrite destination!
  2. The serialized format depends on the Julia build! Files may fail to load when read back under a different Julia build.

CHOPOFF.load - Function

load(destination::String)

Loads a file saved with the save function. Files saved under other Julia builds may not load properly.
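
For orientation, a round trip through Julia's `Serialization` standard library looks like the sketch below. It calls `serialize` and `deserialize` directly; whether save and load wrap exactly these calls is an assumption, and the `data` dictionary is made up for the demo.

```julia
using Serialization

path = tempname()                      # temporary file path for the demo
data = Dict("guide" => "ACTG", "distance" => 3)

serialize(path, data)                  # overwrites `path` if it already exists
restored = deserialize(path)           # read back within the same Julia build

println(restored == data)
```

Because the format is tied to the Julia build, serialized files are better suited to caching than to long-term storage or exchange between machines.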

Summarize off-targets

CHOPOFF.summarize_offtargets - Function

summarize_offtargets(res::DataFrame; distance::Int = maximum(res.distance))

Summarizes all off-targets from the detail file into a count table. This does not automatically filter overlaps. You can specify distance to exclude off-targets at higher distances.

Arguments

res - DataFrame created by one of the off-target finding methods; it contains columns such as :guide, :chromosome, :strand, :distance, :start.

distance - The maximum distance to assume in the data frame. It is possible to specify a smaller distance than the one contained in the res DataFrame, which filters out off-targets at higher distances.

Examples

using CHOPOFF, BioSequences

# make a temporary directory
tdir = tempname()
db_path = joinpath(tdir, "linearDB")
mkpath(db_path)

# use CHOPOFF example genome
chopoff_path = splitpath(dirname(pathof(CHOPOFF)))[1:end-1]
genome = joinpath(vcat(chopoff_path, 
    "test", "sample_data", "genome", "semirandom.fa"))

# build a linearDB
build_linearDB(
    "semirandom", genome, 
    Motif("Cas9"), 
    db_path)

# load up example gRNAs
guides_s = Set(readlines(joinpath(vcat(chopoff_path, 
    "test", "sample_data", "guides.txt"))))
guides = LongDNA{4}.(guides_s)
    
# finally, make results!
res_path = joinpath(tdir, "linearDB", "results.csv")
search_linearDB(db_path, guides, res_path; distance = 3)

# load results
using DataFrames, CSV
res = DataFrame(CSV.File(res_path))

# filter results by close proximity
res = filter_overlapping(res, 23)

# summarize results into a table of counts by distance
summary = summarize_offtargets(res; distance = 3)
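
To make the idea of a count table concrete without building a database, the sketch below tallies toy hits per guide and per distance using only Base Julia. The `rows` data and the `count_by_distance` helper are invented for this illustration; the actual layout of the table returned by summarize_offtargets is not shown here and may differ.

```julia
# Toy rows carrying only the two columns this sketch needs.
rows = [
    (guide = "g1", distance = 0),
    (guide = "g1", distance = 2),
    (guide = "g1", distance = 2),
    (guide = "g2", distance = 1),
    (guide = "g2", distance = 3),
]

# Hypothetical helper: counts[guide][d + 1] is the number of hits at distance d.
function count_by_distance(rows, maxdist::Int)
    counts = Dict{String, Vector{Int}}()
    for r in rows
        r.distance <= maxdist || continue          # drop hits beyond the cutoff
        v = get!(() -> zeros(Int, maxdist + 1), counts, r.guide)
        v[r.distance + 1] += 1
    end
    return counts
end

counts = count_by_distance(rows, 2)
println(counts)
```

With a cutoff of 2, the distance-3 hit for `g2` is excluded, which is the effect of passing a smaller distance than the maximum present in the data.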

Proximity filter

CHOPOFF.filter_overlapping - Function

filter_overlapping(res::DataFrame, distance::Int)

Filter overlapping off-targets. Remember that off-targets have their start relative to the PAM location.

Arguments

res - DataFrame created by one of the off-target finding methods; it contains columns such as :guide, :chromosome, :strand, :distance, :start.

distance - To what distance from the :start do we consider the off-target to be overlapping?

Examples

using CHOPOFF, BioSequences

# make a temporary directory
tdir = tempname()
db_path = joinpath(tdir, "linearDB")
mkpath(db_path)

# use CHOPOFF example genome
chopoff_path = splitpath(dirname(pathof(CHOPOFF)))[1:end-1]
genome = joinpath(vcat(chopoff_path, 
    "test", "sample_data", "genome", "semirandom.fa"))

# build a linearDB
build_linearDB(
    "semirandom", genome, 
    Motif("Cas9"), 
    db_path)

# load up example gRNAs
guides_s = Set(readlines(joinpath(vcat(chopoff_path, 
    "test", "sample_data", "guides.txt"))))
guides = LongDNA{4}.(guides_s)
    
# finally, make results!
res_path = joinpath(tdir, "linearDB", "results.csv")
search_linearDB(db_path, guides, res_path; distance = 3)

# load results
using DataFrames, CSV
res = DataFrame(CSV.File(res_path))

# filter results by close proximity
res = filter_overlapping(res, 23)

# summarize results into a table of counts by distance
summary = summarize_offtargets(res; distance = 3)
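
One plausible reading of proximity filtering is sketched below on toy data: hits for the same guide and chromosome whose starts lie within a window of each other are collapsed, keeping the hit with the smallest distance. The `drop_overlapping` helper, the tie-breaking rule, and the choice to ignore strand are all assumptions made for this example; filter_overlapping may behave differently.

```julia
# Toy hits with the columns named in the Arguments section (strand omitted).
hits = [
    (guide = "g1", chromosome = "chr1", start = 100, distance = 2),
    (guide = "g1", chromosome = "chr1", start = 105, distance = 1),
    (guide = "g1", chromosome = "chr1", start = 500, distance = 3),
    (guide = "g2", chromosome = "chr1", start = 102, distance = 0),
]

# Hypothetical helper: collapse nearby hits of the same guide and chromosome.
function drop_overlapping(hits, window::Int)
    kept = eltype(hits)[]
    # Sorting puts hits of the same guide and chromosome next to each other,
    # ordered by start position.
    for h in sort(hits; by = x -> (x.guide, x.chromosome, x.start))
        if !isempty(kept)
            prev = kept[end]
            same = prev.guide == h.guide && prev.chromosome == h.chromosome
            if same && h.start - prev.start < window
                # Overlap: keep whichever hit has the smaller distance.
                if h.distance < prev.distance
                    kept[end] = h
                end
                continue
            end
        end
        push!(kept, h)
    end
    return kept
end

kept = drop_overlapping(hits, 23)
println([(h.guide, h.start) for h in kept])
```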