Utils
gRNAs and kmers
CHOPOFF.getseq
— Function
getseq(n = 20, letters = ['A', 'C', 'G', 'T'])
Generate a random sequence of length n from letters.
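As a rough illustration of the idea, the sketch below draws characters uniformly at random from a vector of letters using only the standard library. The name random_seq and its rng keyword are hypothetical and not part of CHOPOFF; the package's own getseq may differ in return type and sampling details.

```julia
using Random

# Hypothetical helper, not part of CHOPOFF: draw n characters
# uniformly at random from letters and join them into a String.
function random_seq(n::Int = 20, letters = ['A', 'C', 'G', 'T'];
                    rng = Random.default_rng())
    return join(rand(rng, letters, n))
end

seq = random_seq(20)
```

Passing a seeded generator, for example rng = MersenneTwister(42), makes the output reproducible.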
CHOPOFF.as_kmers
— Function
as_kmers(x::LongDNA{4}, kmer_size::Int)
Transforms x into a vector of kmers of size kmer_size. All ambiguous bases will be expanded.
Examples
julia> as_kmers(dna"ACTGG", 4)
2-element Vector{LongSequence{DNAAlphabet{4}}}:
ACTG
CTGG
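The sliding-window behaviour in the example above can be sketched on plain ASCII strings. The helper string_kmers is hypothetical and not CHOPOFF's implementation; in particular it does not expand ambiguous bases.

```julia
# Hypothetical helper on plain ASCII strings, not part of CHOPOFF:
# every window of length k, moving one position at a time.
string_kmers(x::AbstractString, k::Int) =
    [x[i:i+k-1] for i in 1:(length(x) - k + 1)]

kmers = string_kmers("ACTGG", 4)  # ["ACTG", "CTGG"]
```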
CHOPOFF.as_skipkmers
— Function
as_skipkmers(x::LongDNA{4}, kmer_size::Int)
Transforms x into a vector of skip-kmers of size kmer_size. All ambiguous bases will be expanded. Leftover bases are ignored!
Examples
julia> as_skipkmers(dna"ACTGG", 2)
2-element Vector{LongSequence{DNAAlphabet{4}}}:
AC
TG
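In contrast to overlapping kmers, skip-kmers in the example above are consecutive, non-overlapping chunks. A minimal sketch on plain ASCII strings follows; skip_kmers is a hypothetical name and does not expand ambiguous bases as the CHOPOFF function does.

```julia
# Hypothetical helper on plain ASCII strings, not part of CHOPOFF:
# step through x in strides of k and drop any incomplete trailing chunk.
skip_kmers(x::AbstractString, k::Int) =
    [x[i:i+k-1] for i in 1:k:(length(x) - k + 1)]

chunks = skip_kmers("ACTGG", 2)  # ["AC", "TG"], the final "G" is ignored
```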
CHOPOFF.all_kmers
— Function
all_kmers(size = 4; alphabet = [DNA_A, DNA_C, DNA_G, DNA_T])
Make a list of all possible kmers of the given size using bases in the alphabet.
Examples
julia> all_kmers(2; alphabet = [DNA_A, DNA_N])
4-element Vector{LongSequence{DNAAlphabet{4}}}:
AA
AN
NA
NN
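Enumerating all kmers amounts to taking the Cartesian product of the alphabet with itself size times. The sketch below does this on characters with Iterators.product; all_string_kmers is a hypothetical name, and the tuple reversal is only there so that the output order matches the example above.

```julia
# Hypothetical helper on characters, not part of CHOPOFF.
# Iterators.product varies its first component fastest, so each tuple is
# reversed to obtain the ordering AA, AN, NA, NN shown in the example.
function all_string_kmers(k::Int = 4; alphabet = ['A', 'C', 'G', 'T'])
    combos = Iterators.product(fill(alphabet, k)...)
    return vec([join(reverse(t)) for t in combos])
end

pairs_AN = all_string_kmers(2; alphabet = ['A', 'N'])  # ["AA", "AN", "NA", "NN"]
```

The result has length(alphabet)^k entries, so it grows quickly with k.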
CHOPOFF.minkmersize
— Function
minkmersize(len::Int = 20, d::Int = 4)
Pigeonhole principle: the minimum k-mer size that is required for two strings of size len to be aligned within a distance of d.
Examples
julia> minkmersize(20, 3)
5
julia> minkmersize(20, 6)
2
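One way to read the pigeonhole argument: if a string of length len is cut into d + 1 disjoint pieces, at most d of them can be touched by d edits, so at least one piece must match exactly. The sketch below computes the resulting piece length with integer division. The formula is an assumption that agrees with both examples above; pigeonhole_kmer_size is a hypothetical name and CHOPOFF's own implementation may differ.

```julia
# Hypothetical helper, not part of CHOPOFF: length of each of the d + 1
# disjoint pieces, at least one of which must match exactly.
pigeonhole_kmer_size(len::Int = 20, d::Int = 4) = len ÷ (d + 1)

k3 = pigeonhole_kmer_size(20, 3)  # 20 ÷ 4 = 5
k6 = pigeonhole_kmer_size(20, 6)  # 20 ÷ 7 = 2
```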
Persistence
CHOPOFF.save
— Function
save(object::Any, destination::String)
Uses the Julia serializer to save the data in binary format. Read more about serialization. Notice that:
- This function will overwrite destination!
- This serialization depends on the Julia build! Files may fail to load when reloaded under a different Julia build.
CHOPOFF.load
— Function
load(destination::String)
Loads a file saved with the save function. Files saved under other Julia builds may not load properly.
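To illustrate the kind of round trip described above, the sketch below uses the Serialization standard library directly. Whether save and load wrap exactly these calls is an assumption; the data and the temporary path are made up for the example.

```julia
using Serialization

# Example payload and temporary file, both made up for this sketch.
path = tempname()
data = Dict("guides" => ["ACTG", "CTGG"], "distance" => 3)

serialize(path, data)         # silently overwrites path if it already exists
restored = deserialize(path)  # only reliable under the same Julia build
```

For long-term or cross-version storage, a text format such as CSV is usually a safer choice than the binary serializer.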
Summarize off-targets
CHOPOFF.summarize_offtargets
— Function
summarize_offtargets(res::DataFrame; distance::Int = maximum(res.distance))
Summarizes all off-targets from the detail file into a count table. This does not automatically filter overlaps. You can specify distance to filter out some of the higher distances.
Arguments
res - DataFrame created by one of the off-target finding methods; it contains columns such as :guide, :chromosome, :strand, :distance, :start.
distance - The maximum distance to assume in the data frame. It is possible to specify a smaller distance than the maximum contained in the res DataFrame, in which case the higher distances are filtered out automatically.
Examples
using CHOPOFF, BioSequences
# make a temporary directory
tdir = tempname()
db_path = joinpath(tdir, "linearDB")
mkpath(db_path)
# use CHOPOFF example genome
chopoff_path = splitpath(dirname(pathof(CHOPOFF)))[1:end-1]
genome = joinpath(vcat(chopoff_path,
    "test", "sample_data", "genome", "semirandom.fa"))
# build a linearDB
build_linearDB(
    "samirandom", genome,
    Motif("Cas9"),
    db_path)
# load up example gRNAs
guides_s = Set(readlines(joinpath(vcat(chopoff_path,
    "test", "sample_data", "guides.txt"))))
guides = LongDNA{4}.(guides_s)
# finally, make results!
res_path = joinpath(tdir, "linearDB", "results.csv")
search_linearDB(db_path, guides, res_path; distance = 3)
# load results
using DataFrames, CSV
res = DataFrame(CSV.File(res_path))
# filter results by close proximity
res = filter_overlapping(res, 23)
# summarize results into a table of counts by distance
summary = summarize_offtargets(res; distance = 3)
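The shape of such a count table can be sketched without CHOPOFF or DataFrames. The example below tallies made-up hits per guide and per distance; count_by_distance, the hit records, and the Dict-of-vectors layout are all hypothetical and only illustrate the idea, not the actual output format of summarize_offtargets.

```julia
# Made-up hits; real results come from one of the search functions.
hits = [
    (guide = "G1", distance = 0),
    (guide = "G1", distance = 2),
    (guide = "G1", distance = 2),
    (guide = "G2", distance = 1),
    (guide = "G2", distance = 3),
]

# Hypothetical helper, not part of CHOPOFF: for each guide, count hits at
# distances 0..maxdist and skip anything above maxdist.
function count_by_distance(hits, maxdist::Int)
    counts = Dict{String, Vector{Int}}()
    for h in hits
        h.distance > maxdist && continue
        row = get!(() -> zeros(Int, maxdist + 1), counts, h.guide)
        row[h.distance + 1] += 1  # index 1 holds distance 0
    end
    return counts
end

counts = count_by_distance(hits, 2)
```

Here the G2 hit at distance 3 is dropped because it exceeds the requested maximum of 2.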
Proximity filter
CHOPOFF.filter_overlapping
— Function
filter_overlapping(res::DataFrame, distance::Int)
Filter overlapping off-targets. Remember that off-targets have their start relative to the PAM location.
Arguments
res - DataFrame created by one of the off-target finding methods; it contains columns such as :guide, :chromosome, :strand, :distance, :start.
distance - To what distance from the :start do we consider the off-target to be overlapping?
Examples
using CHOPOFF, BioSequences
# make a temporary directory
tdir = tempname()
db_path = joinpath(tdir, "linearDB")
mkpath(db_path)
# use CHOPOFF example genome
chopoff_path = splitpath(dirname(pathof(CHOPOFF)))[1:end-1]
genome = joinpath(vcat(chopoff_path,
    "test", "sample_data", "genome", "semirandom.fa"))
# build a linearDB
build_linearDB(
    "samirandom", genome,
    Motif("Cas9"),
    db_path)
# load up example gRNAs
guides_s = Set(readlines(joinpath(vcat(chopoff_path,
    "test", "sample_data", "guides.txt"))))
guides = LongDNA{4}.(guides_s)
# finally, make results!
res_path = joinpath(tdir, "linearDB", "results.csv")
search_linearDB(db_path, guides, res_path; distance = 3)
# load results
using DataFrames, CSV
res = DataFrame(CSV.File(res_path))
# filter results by close proximity
res = filter_overlapping(res, 23)
# summarize results into a table of counts by distance
summary = summarize_offtargets(res; distance = 3)
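A proximity filter of this kind can be sketched on plain records. The version below is a simplified assumption, not CHOPOFF's algorithm: it ignores guide and strand, sorts hits by chromosome and start, and when two neighbouring hits lie closer than window it keeps the one with the lower distance. The name drop_nearby and the hit records are hypothetical.

```julia
# Made-up hits; real results come from one of the search functions.
hits = [
    (chromosome = "chr1", start = 100, distance = 2),
    (chromosome = "chr1", start = 110, distance = 0),
    (chromosome = "chr1", start = 200, distance = 1),
    (chromosome = "chr2", start = 105, distance = 3),
]

# Hypothetical helper, not part of CHOPOFF: greedy single pass over hits
# sorted by position, comparing each hit with the last one kept.
function drop_nearby(hits, window::Int)
    kept = eltype(hits)[]
    for h in sort(hits; by = h -> (h.chromosome, h.start))
        if !isempty(kept) && kept[end].chromosome == h.chromosome &&
           h.start - kept[end].start < window
            # Overlapping pair: keep whichever has the lower distance.
            if h.distance < kept[end].distance
                kept[end] = h
            end
        else
            push!(kept, h)
        end
    end
    return kept
end

filtered = drop_nearby(hits, 23)
```

With a window of 23, the chr1 hits at 100 and 110 collapse into the one at 110, while the remaining two hits are far enough apart to be kept.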