Align gRNA and off-target
CHOPOFF.isinclusive — Function
isinclusive(x::S, y::S) where {S<:BioSymbol}
Check whether x contains all options of y, or y contains all options of x. Can be used as a replacement for BioSequences.iscompatible or BioSequences.isequal.
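The sketch below shows how this "one covers the other" check differs from plain compatibility. It uses plain Julia only. It assumes a hypothetical 4-bit encoding of IUPAC codes (one bit per base) and plain Char values instead of BioSymbol. The names isinclusive_sketch and iscompatible_sketch are illustrative and are not part of CHOPOFF.

```julia
# Hypothetical 4-bit encoding of IUPAC nucleotide codes: bit 1 = A, 2 = C, 3 = G, 4 = T.
const IUPAC_BITS = Dict{Char,UInt8}(
    'A' => 0b0001, 'C' => 0b0010, 'G' => 0b0100, 'T' => 0b1000,
    'R' => 0b0101, 'Y' => 0b1010, 'S' => 0b0110, 'W' => 0b1001,
    'K' => 0b1100, 'M' => 0b0011, 'B' => 0b1110, 'D' => 0b1101,
    'H' => 0b1011, 'V' => 0b0111, 'N' => 0b1111)

# One code "includes" the other when its set of bases is a superset of the other's.
function isinclusive_sketch(x::Char, y::Char)
    bx, by = IUPAC_BITS[x], IUPAC_BITS[y]
    common = bx & by
    return common == bx || common == by
end

# For contrast: two codes are compatible as soon as they share at least one base.
iscompatible_sketch(x::Char, y::Char) = (IUPAC_BITS[x] & IUPAC_BITS[y]) != 0
```

W (A/T) and R (A/G) share A, so the two codes are compatible. Neither covers the other, however, which is why the inclusive check rejects the pair.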
Examples
julia> isinclusive(DNA_N, DNA_A)
true
julia> isinclusive(DNA_W, DNA_R) # W is (A/T), R is (A/G)
false
julia> isinclusive(DNA_W, DNA_A)
true
CHOPOFF.hamming — Function
hamming(s1::T, s2::K, ismatch = iscompatible) where {T <: BioSequence, K <: BioSequence}
Compute the Hamming distance between s1 and s2, using ismatch as the definition of a match. This should be faster than pairalign(HammingDistance(), s1, s2). Both inputs must have the same length.
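Here is a minimal sketch of the idea on plain strings: count the positions where ismatch fails. The expansion table and the helper names (EXPAND, compatible, hamming_sketch) are hypothetical stand-ins for the BioSequences types and iscompatible.

```julia
# Hypothetical IUPAC expansion table: code => bases it can stand for.
const EXPAND = Dict('A' => "A", 'C' => "C", 'G' => "G", 'T' => "T",
                    'R' => "AG", 'Y' => "CT", 'S' => "CG", 'W' => "AT",
                    'K' => "GT", 'M' => "AC", 'N' => "ACGT")

# Two codes are compatible when they can stand for at least one common base.
compatible(x::Char, y::Char) = any(b -> b in EXPAND[y], EXPAND[x])

# Count the positions where `ismatch` fails; the inputs must have equal length.
function hamming_sketch(s1::AbstractString, s2::AbstractString,
                        ismatch::Function = compatible)
    length(s1) == length(s2) ||
        throw(ArgumentError("inputs must have the same length"))
    dist = 0
    for (a, b) in zip(s1, s2)
        ismatch(a, b) || (dist += 1)
    end
    return dist
end
```

With the default compatibility check, "ACGC" vs "AWRC" scores 1, because G and R (A/G) count as a match. Passing == instead scores 2.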
Examples
julia> hamming(dna"ACGC", dna"AWRC")
1
CHOPOFF.levenshtein — Function
levenshtein(guide::T, ref::K, k::Int = 4,
    ismatch::Function = iscompatible) where {T <: BioSequence, K <: BioSequence}
Calculate the Levenshtein distance, bounded by a maximum edit distance of k, with a twist for guide + off-target comparisons.
Levenshtein distance is the minimum number of single-character operations (insertions, deletions, substitutions) required to change one string into the other. The guide input is a gRNA sequence; ref is the reference sequence, extended by k bases on its 3' end. This extension does not count toward the score unless it is covered by the aligned guide. If the distance exceeds k, the function terminates early and returns k + 1. This function should be roughly 10x faster than pairalign(LevenshteinDistance(), guide, ref, distance_only = true).
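One way to model this bounded, extension-aware distance is a standard Levenshtein dynamic program with two changes:

- The alignment may stop anywhere within the last k bases of ref, so unaligned extension bases are free.
- The computation stops as soon as every cell of a row exceeds k.

The sketch below works on plain strings under those assumptions. It is not CHOPOFF's implementation; levenshtein_sketch is a hypothetical name, and ismatch defaults to == here for simplicity.

```julia
# Hypothetical sketch of a k-bounded Levenshtein distance in which the last k
# bases of `ref` form a free 3' extension. Not CHOPOFF's implementation.
function levenshtein_sketch(guide::AbstractString, ref::AbstractString, k::Int = 4,
                            ismatch::Function = ==)
    g, r = collect(guide), collect(ref)
    m, n = length(g), length(r)
    prev = collect(0:n)   # row 0: aligning no guide bases to ref[1:j] costs j
    curr = similar(prev)
    for i in 1:m
        curr[1] = i       # guide[1:i] against an empty ref prefix: i gaps
        for j in 1:n
            cost = ismatch(g[i], r[j]) ? 0 : 1
            curr[j + 1] = min(prev[j] + cost,    # match or substitution
                              prev[j + 1] + 1,   # guide base against a gap in ref
                              curr[j] + 1)       # ref base against a gap in guide
        end
        # Row minima never decrease, so once the whole row exceeds k, stop early.
        minimum(curr) > k && return k + 1
        prev, curr = curr, prev
    end
    # The alignment may end anywhere in the last k bases of ref; the bases after
    # that end point are uncovered extension and cost nothing.
    lo = max(n - k, 0)
    dist = minimum(prev[(lo + 1):(n + 1)])
    return dist > k ? k + 1 : dist
end
```

With the default k = 4, this sketch returns 1 for ("ATGA", "AGACCT") and 3 for ("ATGATCG", "AGAAATCGATG"), the same values as the documented examples.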
Note that guide and ref must be oriented towards the left side, e.g. PAM-guide and PAM-offtarget-extension. The examples below show why.
Examples
julia> levenshtein(dna"ATGA", dna"AGACCT") # CCT as extension of the ref does not count towards the score in optimal alignment
1
julia> levenshtein(dna"ATGATCG", dna"AGAAATCGATG") # ATG does not count in optimal alignment ATG--ATCG/A-GAAATCG
3
CHOPOFF.Aln — Type
struct Aln
    guide::String
    ref::String
    dist::Int
end
Simple data structure holding the optimal alignment. Because guide and ref are aligned sequences, they may contain gaps (DNA_Gap). Function align returns this object.
CHOPOFF.align — Function
align(guide::T, ref::K, k::Int = 4,
    ismatch::Function = iscompatible) where {T <: BioSequence, K <: BioSequence}
Calculate the Levenshtein distance, bounded by a maximum edit distance of k, with a twist for guide + off-target comparisons, and also return the alignment as an Aln object.
Levenshtein distance is the minimum number of single-character operations (insertions, deletions, substitutions) required to change one string into the other. The guide input is a gRNA sequence; ref is the reference sequence, extended by k bases on its 3' end. This extension does not count toward the score unless it is covered by the aligned guide. If the distance exceeds k, the function terminates early and returns k + 1.
Note that guide and ref must be oriented towards the left side, e.g. PAM-guide and PAM-offtarget-extension. The examples below show why.
Examples
julia> align(dna"ATGA", dna"AGACCT") # CCT as extension of the ref does not count towards the score in optimal alignment
Aln("ATGA", "A-GA", 1)
julia> align(dna"ATGATCG", dna"AGAAATCGATG") # ATG does not count
Aln("ATG--ATCG", "A-GAAATCG", 3)
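To recover the alignment itself, one approach is to keep the full dynamic-programming matrix and trace back from the best end column. The sketch below makes several assumptions:

- It works on plain strings and uses '-' as the gap character in place of DNA_Gap.
- AlnSketch is a hypothetical struct that mirrors Aln.
- It skips early termination for clarity.
- When no alignment within k exists, it returns empty strings with dist = k + 1.

The traceback prefers a diagonal step, then a gap in ref, then a gap in guide. When several optimal alignments tie, this order decides which one is reported.

```julia
# Hypothetical mirror of Aln: gapped guide, gapped ref, and edit distance.
struct AlnSketch
    guide::String
    ref::String
    dist::Int
end

# Hypothetical sketch: full DP matrix plus traceback; the last k ref bases are free.
function align_sketch(guide::AbstractString, ref::AbstractString, k::Int = 4,
                      ismatch::Function = ==)
    g, r = collect(guide), collect(ref)
    m, n = length(g), length(r)
    D = zeros(Int, m + 1, n + 1)
    D[:, 1] .= 0:m    # guide bases against an empty ref prefix
    D[1, :] .= 0:n    # ref bases against an empty guide prefix
    for i in 1:m, j in 1:n
        cost = ismatch(g[i], r[j]) ? 0 : 1
        D[i + 1, j + 1] = min(D[i, j] + cost, D[i, j + 1] + 1, D[i + 1, j] + 1)
    end
    # Pick the cheapest end point inside the free extension (first one on ties).
    lo = max(n - k, 0)
    dist, idx = findmin(D[m + 1, (lo + 1):(n + 1)])
    dist > k && return AlnSketch("", "", k + 1)
    i, j = m, lo + idx - 1    # j = number of ref bases covered by the alignment
    ga, ra = Char[], Char[]
    while i > 0 || j > 0
        if i > 0 && j > 0 && D[i + 1, j + 1] == D[i, j] + (ismatch(g[i], r[j]) ? 0 : 1)
            push!(ga, g[i]); push!(ra, r[j]); i -= 1; j -= 1   # match/substitution
        elseif i > 0 && D[i + 1, j + 1] == D[i, j + 1] + 1
            push!(ga, g[i]); push!(ra, '-'); i -= 1            # gap in ref
        else
            push!(ga, '-'); push!(ra, r[j]); j -= 1            # gap in guide
        end
    end
    return AlnSketch(String(reverse(ga)), String(reverse(ra)), dist)
end
```

For the first example above, this sketch reproduces ("ATGA", "A-GA", 1). For the second, it reports ("A-TGATCG", "AGAAATCG", 3), a different alignment with the same distance.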