Find potential off-targets

CHOPOFF.DBInfo - Type

DBInfo(filepath::String, name::String, motif::Motif; vcf_filepath::String = "")

DBInfo defines which genome file is used for the searches, together with the motif and an optional VCF file.

Arguments

filepath - Path to the genome file. If the file is FASTA (ends with .fa, .fasta or .fna), make sure a FASTA index file with the .fai extension is also present. Alternatively, you can use a .2bit genome file.

name - Your name for this instance of DBInfo, i.e. this combination of genome, motif and VCF file.

motif - Motif object defining search parameters

vcf_filepath - Optional. Path to the VCF file to include in the searches.
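
The input requirements for filepath can be checked before a DBInfo is constructed. The following is a minimal sketch using only Julia's standard library. The helper name genome_input_ok is hypothetical and not part of CHOPOFF, and the sketch assumes the index sits next to the FASTA file as "<filepath>.fai" (the naming that samtools faidx produces); adjust it if your index is named differently.

```julia
# Hypothetical helper, not part of CHOPOFF: checks that a genome path looks usable.
# Assumption: the FASTA index is stored as "<filepath>.fai" next to the FASTA file.
function genome_input_ok(filepath::AbstractString)
    isfile(filepath) || return false
    ext = lowercase(splitext(filepath)[2])
    if ext in (".fa", ".fasta", ".fna")
        return isfile(filepath * ".fai")   # FASTA input needs its index file
    elseif ext == ".2bit"
        return true                        # .2bit input needs no separate index
    end
    return false                           # extension not listed in the docs above
end
```

Calling genome_input_ok(genome) before DBInfo(genome, ...) gives an early, readable failure point instead of an error deeper in the search.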

Alignments are performed from the side opposite to the extension direction (which is defined by extend5).

Examples

using CHOPOFF
# use CHOPOFF example genome
genome = joinpath(vcat(splitpath(dirname(pathof(CHOPOFF)))[1:end-1], 
    "test", "sample_data", "genome", "semirandom.fa"))
# construct example DBInfo
DBInfo(genome, "Cas9_semirandom_noVCF", Motif("Cas9"))

CHOPOFF.gatherofftargets! - Function

function gatherofftargets!(
    output::T,
    dbi::DBInfo;
    remove_pam::Bool = true,
    normalize::Bool = true,
    restrict_to_len::Union{Nothing, Int64} = nothing) where {T<:Union{Vector{String}, Vector{UInt64}, Vector{UInt128}}}

Gathers all off-targets that conform to the given dbi Motif.

This function appends to output during the run and also returns all ambiguous guides as its return value. The UInt64 and UInt128 element types can be used to reduce the memory that the gRNAs occupy. With large genomes or non-specific PAMs, you may run out of memory when using this function.
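
To see why integer element types save memory, consider packing each unambiguous base into 2 bits, so that up to 32 bases fit into one 8-byte UInt64. The sketch below is only an illustration of that idea: pack2bit and unpack2bit are hypothetical names, and this is not necessarily the encoding CHOPOFF uses internally. It also shows two consequences of such a scheme: bases outside A, C, G, T cannot be represented, and the sequence length must be supplied when converting back.

```julia
# Illustrative 2-bit packing; an assumption for demonstration only,
# not necessarily the encoding CHOPOFF uses internally.
const BASES = ['A', 'C', 'G', 'T']

# Pack an unambiguous DNA string (at most 32 bases) into a UInt64.
function pack2bit(seq::AbstractString)
    length(seq) <= 32 || throw(ArgumentError("at most 32 bases fit in a UInt64"))
    x = UInt64(0)
    for c in seq
        code = findfirst(==(c), BASES)
        code === nothing && throw(ArgumentError("ambiguous or invalid base: $c"))
        x = (x << 2) | UInt64(code - 1)   # append 2 bits per base
    end
    return x
end

# Unpacking needs the length, because leading 'A's are stored as zero bits.
function unpack2bit(x::UInt64, len::Integer)
    chars = Vector{Char}(undef, len)
    for i in len:-1:1
        chars[i] = BASES[(x & 0x3) + 1]   # lowest 2 bits hold the last base
        x >>= 2
    end
    return String(chars)
end
```

For example, unpack2bit(pack2bit("ACGTACGTACGTACGTACGT"), 20) gives back the original 20-base string, while each packed guide occupies sizeof(UInt64) == 8 bytes.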

remove_pam - Whether the PAM sequence should be removed.

normalize - Whether all guides should be flipped into PAMseqEXT, e.g. GGn-20N-3bp.

restrict_to_len - Restricts the guides to a specific length, smaller than the initial motif. Whether this length includes the PAM depends on remove_pam, because remove_pam is applied before the length restriction.

Examples

using CHOPOFF, BioSequences # BioSequences provides LongDNA, used at the end
# use CHOPOFF example genome
genome = joinpath(
    vcat(
        splitpath(dirname(pathof(CHOPOFF)))[1:end-1], 
        "test", "sample_data", "genome", "semirandom.fa"))
# construct example DBInfo
dbi = DBInfo(genome, "Cas9_semirandom_noVCF", Motif("Cas9"))
# finally gather all off-targets
guides = Vector{String}()
ambig = gatherofftargets!(guides, dbi)

# here in the format of UInt64 encoding
guides2 = Vector{UInt64}()
ambig2 = gatherofftargets!(guides2, dbi)
guide_with_extension_len = length_noPAM(dbi.motif) + dbi.motif.distance

# transform UInt64 to LongDNA and String
guides2 = String.(LongDNA{4}.(guides2, guide_with_extension_len))