Average Nucleotide Identity#
Sketch#
- class pyfastani.Sketch#
An index computing minimizers over the reference genomes.
Use this class to add reference genomes with the add_genome or add_draft methods, then call the index method to obtain a Mapper that can be used to map query genomes.
- minimizers#
A view over the minimizers currently recorded in the sketch.
- Type:
- __init__(*, k=16, fragment_length=3000, minimum_fraction=0.2, p_value=0.001, percentage_identity=80, reference_size=5000000000.0, protein=False)#
Create a new FastANI sequence sketch.
- Keyword Arguments:
k (int) – The size of the k-mers. FastANI authors recommend a size of at most 16, but any positive number up to pyfastani.MAX_KMER_SIZE will work.
fragment_length (int) – The length of the blocks the query is split into. Queries smaller than this number won’t be processed.
minimum_fraction (float) – The minimum fraction of genome that must be shared for a hit to be reported. If the reference and query genome sizes differ, the smaller of the two is considered.
p_value (float) – The p-value cutoff. Used to determine the recommended window size.
percentage_identity (int) – An identity percentage above which ANI values between two sequences can be trusted. Used to determine the recommended window size.
reference_size (int) – An estimate of the reference length. Used to determine the recommended window size.
protein (bool) – Whether or not protein sequences are expected. If True, the alphabet size is changed from 4 to 20, minimizers are not computed on the “reverse” strand, and the window size is set to 1.
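To make fragment_length and minimum_fraction more concrete, here is a minimal pure-Python sketch. It is not pyfastani code: the helper names split_fragments and passes_minimum_fraction are hypothetical, dropping the trailing partial fragment is an assumption, and the fraction check is only one reading of the parameter description above.

```python
def split_fragments(sequence, fragment_length=3000):
    """Split a query into consecutive, non-overlapping blocks.

    Assumption: a trailing block shorter than fragment_length is dropped,
    so a query shorter than fragment_length yields no block at all.
    """
    count = len(sequence) // fragment_length
    return [
        sequence[i * fragment_length:(i + 1) * fragment_length]
        for i in range(count)
    ]


def passes_minimum_fraction(shared_length, query_length, reference_length,
                            minimum_fraction=0.2):
    """Check the shared length against the smaller of the two genomes."""
    smaller = min(query_length, reference_length)
    return shared_length >= minimum_fraction * smaller
```

With the defaults, a query of 2,999 bases produces no fragment, which matches the statement that queries smaller than fragment_length are not processed.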
- add_draft(name, contigs)#
Add a reference draft genome to the sketcher.
Using this method is fine even when the genome has a single contig, although Sketch.add_genome is easier to use in that case.
- Parameters:
- Returns:
Sketch– the object itself, for method chaining.
Hint
Contigs smaller than the window size and the k-mer size will be skipped.
- add_genome(name, sequence)#
Add a reference genome to the sketcher.
This method is a shortcut for Sketch.add_draft when a genome is complete (i.e. only contains a single contig).
- Parameters:
- Returns:
Sketch– the object itself, for method chaining.
Hint
The sequence must be larger than the window size and the k-mer size to be sketched, otherwise no minimizers will be computed.
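The length requirement follows from how windowed minimizers are selected. The sketch below is an illustration only, not the library's implementation: it uses zlib.crc32 as a stand-in hash, ignores the reverse strand, and the function name windowed_minimizers and the default window of 24 are hypothetical.

```python
import zlib


def windowed_minimizers(sequence, k=16, window=24):
    """Return the sorted, unique (position, k-mer) minimizers of a sequence.

    For every run of `window` consecutive k-mers, the k-mer with the
    smallest hash is kept (leftmost one on ties).
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    hashes = [zlib.crc32(kmer.encode("ascii")) for kmer in kmers]
    selected = set()
    # No full window exists when there are fewer than `window` k-mers,
    # so the loop body never runs for sequences that are too short.
    for start in range(len(kmers) - window + 1):
        best = min(range(start, start + window), key=hashes.__getitem__)
        selected.add((best, kmers[best]))
    return sorted(selected)
```

In this sketch a sequence needs at least window + k - 1 characters to contain one full window; anything shorter returns an empty list.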
- clear()#
Reset the Sketch, removing any reference genome it may contain.
- Returns:
Sketch– the object itself, for method chaining.
- index()#
Index the reference genomes for fast lookups using the minimizers.
Once all the reference sequences have been added to the Sketch, use this method to create an efficient mapper, dropping the most common minimizers among the reference sequences.
- Returns:
Mapper– An indexed mapper that can be used for fast querying.
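The add-then-index workflow described in this section can be wrapped in a small helper. This is a sketch under assumptions: build_mapper is a hypothetical name, the helper only relies on the add_genome, add_draft and index methods documented above, and the dictionary-based inputs are an illustrative choice rather than part of the library.

```python
def build_mapper(sketch, genomes=None, drafts=None):
    """Fill a sketch with references and return the indexed mapper.

    sketch  -- an object exposing add_genome, add_draft and index,
               such as a pyfastani.Sketch
    genomes -- mapping of name -> single complete sequence
    drafts  -- mapping of name -> iterable of contig sequences
    """
    for name, sequence in (genomes or {}).items():
        sketch.add_genome(name, sequence)
    for name, contigs in (drafts or {}).items():
        sketch.add_draft(name, contigs)
    # index() is called last, once every reference has been added.
    return sketch.index()


# Typical use, assuming pyfastani is installed:
#   import pyfastani
#   mapper = build_mapper(pyfastani.Sketch(), genomes={"ref": sequence})
```

Because add_genome and add_draft return the sketch itself, the same calls can also be chained directly, ending with index().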
Mapper#
- class pyfastani.Mapper#
A genome mapper using Murmur3 hashes and k-mers to compute ANI.
- minimizers#
A view over the minimizers recorded in the mapper.
- Type:
- query_draft(contigs, threads=0)#
Query the mapper for a draft genome.
- Parameters:
- Returns:
Hint
Sequence must be larger than the window size, the k-mer size, and the fragment length to be mapped, otherwise an empty list of hits will be returned.
Note
This method is reentrant and releases the GIL when hashing the blocks, which allows querying the mapper in parallel for several individual genomes.
Added in version 0.4.0: The threads argument.
- query_genome(sequence, threads=0)#
Query the mapper for a complete genome.
- Parameters:
- Returns:
Hint
Sequence must be larger than the window size, the k-mer size, and the fragment length to be mapped, otherwise an empty list of hits will be returned.
Note
This method is reentrant and releases the GIL when hashing the blocks, which allows querying the mapper in parallel for several individual genomes.
Added in version 0.4.0: The threads argument.
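Since the method is described as reentrant, one mapper can in principle be shared between worker threads. The following is a minimal sketch using the standard library thread pool; query_many and its workers parameter are hypothetical names, and the shape of the values returned by query_genome is left untouched because the return type is not spelled out here.

```python
from concurrent.futures import ThreadPoolExecutor


def query_many(mapper, genomes, workers=4):
    """Query one shared mapper with several genomes in parallel.

    mapper  -- an object exposing query_genome, such as a pyfastani.Mapper
    genomes -- mapping of query name -> complete sequence
    Returns a mapping of query name -> whatever query_genome returned.
    """
    names = list(genomes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the inputs.
        results = pool.map(mapper.query_genome, (genomes[n] for n in names))
        return dict(zip(names, results))
```

This parallelises across queries; the threads argument is a separate mechanism and is left at its default in this sketch.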
- lookup_index#
The index of initial minimizer positions.
This table is used to retrieve at which positions the minimizers appear in the reference genomes.
- Type:
MinimizerLookupIndex
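Conceptually, such a table maps each minimizer to the places where it occurs. The plain-dictionary sketch below illustrates that idea only; it does not reproduce the interface of MinimizerLookupIndex, and the function name build_lookup_index and the (hash, position) input format are assumptions made for the example.

```python
from collections import defaultdict


def build_lookup_index(minimizers_by_genome):
    """Map each minimizer hash to the (genome, position) pairs where it occurs.

    minimizers_by_genome -- mapping of genome name -> list of
                            (hash, position) pairs
    """
    index = defaultdict(list)
    for genome, minimizers in minimizers_by_genome.items():
        for hash_value, position in minimizers:
            index[hash_value].append((genome, position))
    return dict(index)


# Looking up a hash shared by two references returns both locations.
example = build_lookup_index({"g1": [(11, 0), (42, 7)], "g2": [(42, 3)]})
shared_hits = example[42]
```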