Align gRNA and off-target

CHOPOFF.isinclusive - Function

isinclusive(x::S, y::S) where {S<:BioSymbol}

Check whether x contains all options of y, or y contains all options of x. Can be used in place of BioSequences.iscompatible or BioSequences.isequal.

Examples

julia> isinclusive(DNA_N, DNA_A)
true

julia> isinclusive(DNA_W, DNA_R) # W is (A/T), R is (A/G)
false

julia> isinclusive(DNA_W, DNA_A)
true
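
The difference between "inclusive" and merely "compatible" can be sketched with plain bit sets. This is a minimal illustration, not the package's code: it assumes each IUPAC code is a 4-bit set of the bases it allows, and the names IUPAC_BITS, isinclusive_sketch and iscompatible_sketch are hypothetical helpers that work on Char rather than on BioSymbol.

```julia
# Assumed encoding: each IUPAC code is a 4-bit set (A = 0001, C = 0010, G = 0100, T = 1000).
const IUPAC_BITS = Dict(
    'A' => 0b0001, 'C' => 0b0010, 'G' => 0b0100, 'T' => 0b1000,
    'R' => 0b0101, 'Y' => 0b1010, 'W' => 0b1001, 'S' => 0b0110,
    'K' => 0b1100, 'M' => 0b0011, 'N' => 0b1111,
)

# Hypothetical helper: true when one code's option set fully contains the other's.
function isinclusive_sketch(x::Char, y::Char)
    a, b = IUPAC_BITS[x], IUPAC_BITS[y]
    shared = a & b
    return shared == a || shared == b
end

# Compatibility, by contrast, only requires one shared option.
iscompatible_sketch(x::Char, y::Char) = (IUPAC_BITS[x] & IUPAC_BITS[y]) != 0

println(isinclusive_sketch('N', 'A'))   # N covers every option of A
println(isinclusive_sketch('W', 'R'))   # W (A/T) and R (A/G) only overlap in A
println(iscompatible_sketch('W', 'R'))  # overlap alone is enough here
```

Under these assumptions W and R are compatible but not inclusive, which is the stricter behaviour the examples above show.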

CHOPOFF.hamming - Function

hamming(s1::T, s2::K, ismatch = iscompatible) where {T <: BioSequence, K <: BioSequence}

Compute the Hamming distance between s1 and s2, using ismatch to decide whether two symbols match. Should be faster than pairalign(HammingDistance(), s1, s2). The inputs must have the same length.

Examples

julia> hamming(dna"ACGC", dna"AWRC")
1
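
To see how the ismatch argument changes the result, here is a minimal sketch on plain strings. It is not the package's implementation: hamming_sketch, compatible_sketch and the small IUPAC_OPTIONS table are hypothetical names introduced for illustration, and the table covers only the codes used in the example.

```julia
# Assumed option sets for a few IUPAC codes (illustrative subset only).
const IUPAC_OPTIONS = Dict(
    'A' => "A", 'C' => "C", 'G' => "G", 'T' => "T",
    'W' => "AT", 'R' => "AG", 'N' => "ACGT",
)

# Two codes are treated as compatible when their option sets overlap.
compatible_sketch(a::Char, b::Char) = any(in(IUPAC_OPTIONS[b]), IUPAC_OPTIONS[a])

# Count the positions at which ismatch reports a mismatch.
function hamming_sketch(s1::AbstractString, s2::AbstractString, ismatch = isequal)
    length(s1) == length(s2) ||
        throw(ArgumentError("inputs must have the same length"))
    return count(!ismatch(a, b) for (a, b) in zip(s1, s2))
end

println(hamming_sketch("ACGC", "AWRC"))                     # exact matching
println(hamming_sketch("ACGC", "AWRC", compatible_sketch))  # ambiguity-aware matching
```

With exact matching both ambiguous positions count as mismatches; with the compatibility rule only C against W does, because G is one of the options of R.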

CHOPOFF.levenshtein - Function

levenshtein(guide::T, ref::K, k::Int = 4,
    ismatch::Function = iscompatible) where {T <: BioSequence, K <: BioSequence}

Calculate the Levenshtein distance, bounded by a maximum edit distance of k, with a twist for guide and off-target comparisons.

The Levenshtein distance is the minimum number of single-character operations (insertions, deletions, substitutions) required to change one string into the other. The guide input is a gRNA sequence; the ref input is the reference sequence extended by k bases on its 3' end. This extension does not count toward the score unless the aligned guide covers it. The function terminates early and returns k + 1 if the distance is higher than k. It should be 10x faster than pairalign(LevenshteinDistance(), guide, ref, distance_only = true).

Note that guide and ref must both be oriented towards the left side, e.g. PAM-guide and PAM-offtarget-extension. The examples below show why.

Examples

julia> levenshtein(dna"ATGA", dna"AGACCT") # CCT as extension of the ref does not count towards the score in optimal alignment
1

julia> levenshtein(dna"ATGATCG", dna"AGAAATCGATG") # ATG does not count in optimal alignment ATG--ATCG/A-GAAATCG
3
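
The "free extension" rule can be sketched as a standard edit-distance table in which the guide must be fully aligned but the reference may stop at any prefix. This is a plain dynamic-programming illustration on strings with exact matching, assuming that reading of the rule; it is not CHOPOFF's optimized routine, it does not terminate early, and prefix_levenshtein_sketch is a hypothetical name.

```julia
# Plain DP sketch: full guide against the best-scoring prefix of ref.
function prefix_levenshtein_sketch(guide::AbstractString, ref::AbstractString, k::Int = 4)
    g, r = collect(guide), collect(ref)
    m, n = length(g), length(r)
    prev = collect(0:n)              # empty guide against every ref prefix
    for i in 1:m
        curr = Vector{Int}(undef, n + 1)
        curr[1] = i                  # guide[1:i] against an empty ref prefix
        for j in 1:n
            cost = g[i] == r[j] ? 0 : 1
            curr[j + 1] = min(prev[j] + cost,   # match or substitution
                              prev[j + 1] + 1,  # guide base left unpaired
                              curr[j] + 1)      # ref base left unpaired
        end
        prev = curr
    end
    d = minimum(prev)                # unused bases at the end of ref are free
    return d > k ? k + 1 : d         # cap the result at k + 1
end

println(prefix_levenshtein_sketch("ATGA", "AGACCT"))
println(prefix_levenshtein_sketch("ATGATCG", "AGAAATCGATG"))
```

Taking the minimum over the last row, rather than reading its final cell, is what lets the trailing reference bases go unscored. It also shows why both sequences must start at the same anchored end: only the far end of ref is allowed to be left over.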

CHOPOFF.Aln - Type

struct Aln
    guide::String
    ref::String
    dist::Int
end

Simple data structure that holds the optimal alignment. Because guide and ref are aligned sequences, they may contain gaps (DNA_Gap). The align function returns this object.

CHOPOFF.align - Function

align(guide::T, ref::K, k::Int = 4,
    ismatch::Function = iscompatible) where {T <: BioSequence, K <: BioSequence}

Calculate the Levenshtein distance, bounded by a maximum edit distance of k, with a twist for guide and off-target comparisons, and also return the alignment as an Aln object.

The Levenshtein distance is the minimum number of single-character operations (insertions, deletions, substitutions) required to change one string into the other. The guide input is a gRNA sequence; the ref input is the reference sequence extended by k bases on its 3' end. This extension does not count toward the score unless the aligned guide covers it. The function terminates early and returns k + 1 as the distance if the distance is higher than k.

Note that guide and ref must both be oriented towards the left side, e.g. PAM-guide and PAM-offtarget-extension. The examples below show why.

Examples

julia> align(dna"ATGA", dna"AGACCT") # CCT as extension of the ref does not count towards the score in optimal alignment
Aln("ATGA", "A-GA", 1)

julia> align(dna"ATGATCG", dna"AGAAATCGATG") # ATG does not count
Aln("ATG--ATCG", "A-GAAATCG", 3)
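
The relation between the three fields of the returned object can be checked by counting alignment columns. The sketch below uses a hypothetical stand-in struct, AlnSketch, with the same fields as Aln, and a hypothetical helper, column_distance. It compares characters exactly, so it only agrees with dist when no ambiguous bases are matched through ismatch.

```julia
# Hypothetical stand-in with the same fields as the Aln struct.
struct AlnSketch
    guide::String
    ref::String
    dist::Int
end

# Count alignment columns whose two characters differ (gap columns included).
function column_distance(aln::AlnSketch)
    length(aln.guide) == length(aln.ref) ||
        throw(ArgumentError("aligned sequences must have equal length"))
    return count(a != b for (a, b) in zip(aln.guide, aln.ref))
end

first_aln = AlnSketch("ATGA", "A-GA", 1)
second_aln = AlnSketch("ATG--ATCG", "A-GAAATCG", 3)

println(column_distance(first_aln) == first_aln.dist)
println(column_distance(second_aln) == second_aln.dist)
```

Each gap column is one insertion or deletion and each differing base pair is one substitution, so for the two examples above the column count equals the reported distance.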