Align gRNA and off-target
CHOPOFF.isinclusive — Function
isinclusive(x::S, y::S) where {S<:BioSymbol}
Check whether x contains all options of y, or y contains all options of x. Can be used in place of BioSequences.iscompatible or BioSequences.isequal.
Examples
julia> isinclusive(DNA_N, DNA_A)
true
julia> isinclusive(DNA_W, DNA_R) # W is (A/T), R is (A/G)
false
julia> isinclusive(DNA_W, DNA_A)
true
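The difference between an inclusion test and a plain compatibility test can be sketched with bit masks. This is a minimal illustration in plain Julia, not CHOPOFF's code: the 4-bit encoding and the names codes, covers, inclusive_sketch and compatible_sketch are assumptions made for this example only.

```julia
# Hypothetical one-bit-per-base encoding; ambiguity codes are unions of bits.
codes = (A = 0x1, C = 0x2, G = 0x4, T = 0x8)
W = codes.A | codes.T                        # W is (A/T)
R = codes.A | codes.G                        # R is (A/G)
N = codes.A | codes.C | codes.G | codes.T    # N is any base

# x covers y when every option of y is also an option of x.
covers(x, y) = (x & y) == y

# Inclusive: one symbol's options are a subset of the other's.
inclusive_sketch(x, y) = covers(x, y) || covers(y, x)

# Compatible: the two symbols share at least one option.
compatible_sketch(x, y) = (x & y) != 0

inclusive_sketch(N, codes.A)   # true, N contains A
inclusive_sketch(W, R)         # false, neither contains the other
compatible_sketch(W, R)        # true, both allow A
```

Under this encoding W and R are compatible because they overlap in A, but they are not inclusive, which is the stricter behaviour shown in the examples above.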
CHOPOFF.hamming — Function
hamming(s1::T, s2::K, ismatch = iscompatible) where {T <: BioSequence, K <: BioSequence}
Compute the Hamming distance between s1 and s2, using ismatch as the matching definition. Should be faster than pairalign(HammingDistance(), s1, s2). Make sure the inputs are of the same length.
Examples
julia> hamming(dna"ACGC", dna"AWRC")
1
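To show how the choice of ismatch changes the result, here is a small sketch in plain Julia that works on strings instead of BioSequence objects. The names IUPAC_SKETCH, compatible_sketch and hamming_sketch are hypothetical, and the lookup table covers only the symbols used in the example above.

```julia
# Hypothetical, partial IUPAC table: each symbol maps to the bases it allows.
IUPAC_SKETCH = Dict('A' => "A", 'C' => "C", 'G' => "G", 'T' => "T",
                    'W' => "AT", 'R' => "AG", 'N' => "ACGT")

# Two symbols are compatible when they share at least one base.
compatible_sketch(a::Char, b::Char) = any(in(IUPAC_SKETCH[b]), IUPAC_SKETCH[a])

# Count positions where the matching predicate fails.
function hamming_sketch(s1, s2, ismatch = isequal)
    length(s1) == length(s2) ||
        throw(ArgumentError("inputs must have the same length"))
    return count(!ismatch(a, b) for (a, b) in zip(s1, s2))
end

hamming_sketch("ACGC", "AWRC", compatible_sketch)  # 1: only C vs W fails
hamming_sketch("ACGC", "AWRC")                     # 2: exact comparison
```

With the compatibility predicate, G against R counts as a match because R allows G, so only the C against W position contributes to the distance.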
CHOPOFF.levenshtein — Function
levenshtein(guide::T, ref::K, k::Int = 4,
    ismatch::Function = iscompatible) where {T <: BioSequence, K <: BioSequence}
Calculate the Levenshtein distance, bounded by a maximum edit distance of k, with a twist for guide + off-target comparisons.
Levenshtein distance is the minimum number of operations (insertions, deletions, or substitutions of a single character) required to change one string into the other. The guide input is a gRNA sequence; the ref input is a reference sequence extended on the 3' end by k bases. This extension does not count toward the score if it is not covered by the aligned guide. Returns k + 1 if the distance is higher than k, terminating early. This function should be 10x faster than pairalign(LevenshteinDistance(), guide, ref, distance_only = true).
Note that guide and ref have to be oriented towards the left side, e.g. PAM-guide and PAM-offtarget-extension! Take a look at the examples below to understand why.
Examples
julia> levenshtein(dna"ATGA", dna"AGACCT") # CCT as extension of the ref does not count towards the score in optimal alignment
1
julia> levenshtein(dna"ATGATCG", dna"AGAAATCGATG") # ATG does not count in optimal alignment ATG--ATCG/A-GAAATCG
3
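One way to picture the "extension does not count" rule is a standard edit-distance table in which the guide must be fully aligned but may stop at any prefix of ref. The sketch below is written under that assumption, in plain Julia on strings with exact character comparison. The name prefix_levenshtein is hypothetical, and this is not CHOPOFF's implementation, which may differ in details and in performance.

```julia
# Hypothetical sketch: distance between guide and its best-matching prefix of ref.
function prefix_levenshtein(guide::AbstractString, ref::AbstractString, k::Int = 4)
    g, r = collect(guide), collect(ref)
    prev = collect(0:length(r))        # row 0: empty guide against ref[1:j]
    for i in eachindex(g)
        curr = similar(prev)
        curr[1] = i                    # guide[1:i] against an empty ref
        for j in eachindex(r)
            cost = g[i] == r[j] ? 0 : 1
            curr[j + 1] = min(prev[j] + cost,   # match or substitution
                              prev[j + 1] + 1,  # guide base against a gap
                              curr[j] + 1)      # ref base against a gap
        end
        # Row minima never decrease, so the bound can be applied early.
        minimum(curr) > k && return k + 1
        prev = curr
    end
    # Taking the minimum over the last row leaves the ref tail unscored.
    return min(minimum(prev), k + 1)
end

prefix_levenshtein("ATGA", "AGACCT")            # 1
prefix_levenshtein("ATGATCG", "AGAAATCGATG")    # 3
```

Because only the right-hand end of ref is free, reversing the orientation of either input would score the wrong end, which is why the left-side orientation matters.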
CHOPOFF.Aln — Type
struct Aln
    guide::String
    ref::String
    dist::Int
end
Simple data structure to hold information on the optimal alignment. Because guide and ref are aligned sequences, they may contain DNA_Gap. The align function returns this object.
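As a reading aid, the sketch below mirrors the three fields in a stand-in type and shows how the gapped strings relate to dist. AlnSketch, edit_columns and ungap are hypothetical names, and the code assumes gaps appear as '-' in the stored strings.

```julia
# Hypothetical stand-in with the same fields as the documented type.
struct AlnSketch
    guide::String
    ref::String
    dist::Int
end

# Count alignment columns where the two rows differ (gaps or exact mismatches).
edit_columns(a::AlnSketch) = count(((g, r),) -> g != r, zip(a.guide, a.ref))

# Remove gap characters to recover the unaligned sequence.
ungap(s::AbstractString) = replace(s, "-" => "")

a = AlnSketch("ATG--ATCG", "A-GAAATCG", 3)
edit_columns(a)   # 3, equal to a.dist for this exact-match example
ungap(a.guide)    # "ATGATCG"
ungap(a.ref)      # "AGAAATCG"
```

When ambiguity codes are compared with a permissive matching function, two different characters can still count as a match, so the number of differing columns may be larger than dist.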
CHOPOFF.align — Function
align(guide::T, ref::K, k::Int = 4,
    ismatch::Function = iscompatible) where {T <: BioSequence, K <: BioSequence}
Calculate the Levenshtein distance, bounded by a maximum edit distance of k, with a twist for guide + off-target comparisons, and also return the alignment as an Aln object.
Levenshtein distance is the minimum number of operations (insertions, deletions, or substitutions of a single character) required to change one string into the other. The guide input is a gRNA sequence; the ref input is a reference sequence extended on the 3' end by k bases. This extension does not count toward the score if it is not covered by the aligned guide. Return k + 1 if the distance is higher than k, and terminate early.
Note that guide and ref have to be oriented towards the left side, e.g. PAM-guide and PAM-offtarget-extension! Take a look at the examples below to understand why.
Examples
julia> align(dna"ATGA", dna"AGACCT") # CCT as extension of the ref does not count towards the score in optimal alignment
Aln("ATGA", "A-GA", 1)
julia> align(dna"ATGATCG", dna"AGAAATCGATG") # ATG does not count
Aln("ATG--ATCG", "A-GAAATCG", 3)