assnake.api package
Submodules
assnake.api.anal module
assnake.api.bb_stats module
- assnake.api.bb_stats.basic_info(data, samples, queries, titles, print_stats=False)
- assnake.api.bb_stats.get_cov_stats(prefix, df, samples, tool, preproc, seq_set_path, seq_set_id)
- assnake.api.bb_stats.get_df_from_query(data, q)
- assnake.api.bb_stats.load_centrifuge(ref)
- assnake.api.bb_stats.load_cov_stats(samples, folder, light=False)
- assnake.api.bb_stats.load_coverage(samples, folder, ext='.bb_stats')
- assnake.api.bb_stats.plot_coverage(seq, samples, cov_dfs_list, cov_info_all, roll=False, remove_outliers=False)
- assnake.api.bb_stats.plot_gc_cov_portrait(gc, cov, ax=None, rgba_colors=[[1, 1, 1, 1]], size=10, title='GC-cov-taxa Portrait')
- assnake.api.bb_stats.plot_gc_cov_portrait_mult(df, samples, ax, norm=False, size=4, title='cov-gc poratrait')
- assnake.api.bb_stats.plot_portrait_with_diff(omg, sample, s_vs_s, ax, size=4)
- assnake.api.bb_stats.prepare(data, org, norm, roll=False, remove_outliers=False, agg_win=100)
- assnake.api.bb_stats.print_seq(seq_id, seq_dict)
- assnake.api.bb_stats.remove_outliers(df_in, col_name)
- assnake.api.bb_stats.roller(data, window)
- assnake.api.bb_stats.to_nucl_res(data)
assnake.api.dada2 module
- assnake.api.dada2.get_full_track(fs_prefix, df, preproc)
  Loads information about the number of reads at every step of the dada2 analysis. The tracking script in R must be run first.
- assnake.api.dada2.rename_seqs_to_asvs(otu_table, otu_table_renamed_loc, asvs_fasta_loc)
  By default, dada2 uses the full sequences as indexes, so this function gives them new human-readable names and saves the name-to-sequence mapping to a FASTA file.
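A minimal sketch of this renaming, assuming a pandas DataFrame whose index holds the full sequences and whose columns are samples. The helper `rename_sequences_to_asvs` and the `ASV_<n>` naming scheme are hypothetical and not taken from assnake; the sketch also does not handle anything corresponding to `otu_table_renamed_loc`.

```python
import os
import tempfile

import pandas as pd


def rename_sequences_to_asvs(otu_table: pd.DataFrame, fasta_path: str) -> pd.DataFrame:
    """Hypothetical sketch: replace full-sequence row labels with ASV_1, ASV_2, ...

    Assumes the index of otu_table holds the full sequences and samples are columns.
    The name -> sequence mapping is written to fasta_path as FASTA records.
    """
    new_names = [f"ASV_{i}" for i in range(1, len(otu_table.index) + 1)]
    with open(fasta_path, "w") as fasta:
        for name, seq in zip(new_names, otu_table.index):
            fasta.write(f">{name}\n{seq}\n")  # one record per sequence
    renamed = otu_table.copy()  # leave the caller's table untouched
    renamed.index = new_names
    return renamed


if __name__ == "__main__":
    table = pd.DataFrame(
        {"sample_A": [10, 3], "sample_B": [0, 7]},
        index=["ACGTACGTAA", "TTGCAAGCTT"],
    )
    out_path = os.path.join(tempfile.mkdtemp(), "asvs.fasta")
    print(rename_sequences_to_asvs(table, out_path))
```

Keeping the mapping in a FASTA file means the short names can be used in downstream tables while the original sequences stay recoverable.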
assnake.api.dataset module
assnake.api.fs_helpers module
- assnake.api.fs_helpers.create_links(dir_with_reads, original_dir, sample, hard=False)
  Parameters:
  - dir_with_reads – where to put the links
  - original_dir – where the original files are taken from
  - sample – sample dictionary, as returned by get_sample_dict_from_dir
  - hard – whether hard copying is needed or a symbolic link is sufficient (default: False)
  Returns:
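The difference between the two link types behind the `hard` flag can be illustrated with the standard library. This is a sketch under assumptions: the helper `link_reads` is hypothetical, takes plain file names instead of the sample dictionary that create_links expects, and treats `hard=True` as creating a hard link (os.link) rather than a full copy.

```python
import os
import tempfile


def link_reads(src_dir: str, dst_dir: str, file_names: list, hard: bool = False) -> list:
    """Hypothetical sketch: expose read files from src_dir inside dst_dir.

    Uses os.link (hard link) when hard=True, otherwise os.symlink.
    Existing targets are left untouched. Returns the paths that were created.
    """
    os.makedirs(dst_dir, exist_ok=True)
    created = []
    for name in file_names:
        src = os.path.abspath(os.path.join(src_dir, name))
        dst = os.path.join(dst_dir, name)
        if os.path.lexists(dst):
            continue  # do not overwrite an existing file or link
        if hard:
            os.link(src, dst)  # hard link: another name for the same file data
        else:
            os.symlink(src, dst)  # symbolic link: points at the original path
        created.append(dst)
    return created


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    original = os.path.join(root, "original")
    os.makedirs(original)
    reads = ["S1_R1.fastq.gz", "S1_R2.fastq.gz"]
    for name in reads:
        open(os.path.join(original, name), "wb").close()
    print(link_reads(original, os.path.join(root, "reads"), reads))
```

A hard link keeps the data reachable even if the original path is later removed, but it only works within one filesystem; a symbolic link works across filesystems but breaks if the original file is moved.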
- assnake.api.fs_helpers.delete_ds(dataset)
  Removes an assnake dataset from the database.
- assnake.api.fs_helpers.find_files(base, pattern)
  Returns a list of files in the base folder that match pattern.
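A minimal sketch of this behaviour, assuming a non-recursive search of `base` and shell-style wildcards (fnmatch) for `pattern`. The helper name `find_files_sketch` is hypothetical, and it returns bare file names rather than full paths.

```python
import fnmatch
import os
import tempfile


def find_files_sketch(base: str, pattern: str) -> list:
    """Hypothetical sketch: names of regular files directly inside base that match pattern."""
    return [
        name
        for name in fnmatch.filter(os.listdir(base), pattern)
        if os.path.isfile(os.path.join(base, name))  # skip directories with matching names
    ]


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    for name in ("S1_R1.fastq.gz", "S1_R2.fastq.gz", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    print(sorted(find_files_sketch(folder, "*.fastq.gz")))  # ['S1_R1.fastq.gz', 'S1_R2.fastq.gz']
```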
- assnake.api.fs_helpers.get_sample_dict_from_dir(loc, sample_name, variant, ext, modify_name=<function <lambda>>)
  Parameters:
  - loc –
  - sample_name –
  - variant –
  - ext –
  - modify_name –
  Returns:
- assnake.api.fs_helpers.get_samples_from_dir(loc, modify_name=<function <lambda>>)
  Searches for samples in loc. A sample must have both R1 and R2 read files.
  Parameters:
  - loc – location on the filesystem to search
  Returns: list of sample dicts found in loc
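One way to enforce the R1/R2 requirement is to group file names by a shared sample prefix and keep only complete pairs. The sketch below assumes a `<sample>_R1<ext>` / `<sample>_R2<ext>` naming convention; the helper `pair_samples` and the dict keys it returns are hypothetical and not the structure produced by assnake. In practice the names could come from os.listdir(loc).

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_R1<ext> and <sample>_R2<ext>, e.g. S1_R1.fastq.gz
READ_FILE_RE = re.compile(r"^(?P<sample>.+?)_R(?P<strand>[12])(?P<ext>\..+)$")


def pair_samples(file_names):
    """Hypothetical sketch: return one dict per sample that has both R1 and R2 files."""
    mates = defaultdict(dict)
    for name in file_names:
        match = READ_FILE_RE.match(name)
        if match:
            mates[match.group("sample")][match.group("strand")] = name
    samples = []
    for sample, found in sorted(mates.items()):
        if "1" in found and "2" in found:  # drop samples missing one of the mates
            samples.append({"sample": sample, "R1": found["1"], "R2": found["2"]})
    return samples


if __name__ == "__main__":
    names = ["S1_R1.fastq.gz", "S1_R2.fastq.gz", "S2_R1.fastq.gz", "readme.txt"]
    print(pair_samples(names))
    # [{'sample': 'S1', 'R1': 'S1_R1.fastq.gz', 'R2': 'S1_R2.fastq.gz'}]
```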
assnake.api.init_config module
- assnake.api.init_config.fill_and_write_config(assnake_db, fna_db_dir, bwa_index_dir, conda_dir, drmaa_log_dir, config_location)
assnake.api.loaders module
- assnake.api.loaders.df_full_info(prefix, df, preprocessing='longest')
  Returns the samples of a dataset in the form [{'df': df, 'preproc': preproc, 'sample': sample}].
  Parameters:
  - df – name of the dataset
  - preprocessing – longest/newest
- assnake.api.loaders.filter_mp2(mp2, level='g__', zeroes_in_samples=0.5)
- assnake.api.loaders.general_taxa_one(s)
- assnake.api.loaders.get_general_taxa_comp_krak_style(samples)
assnake.api.loaders.
load_biospecimens_in_df
(df, db_loc, return_as='dict')¶ Returns dict/dataframe with techical info about biospecimens in dataset
- assnake.api.loaders.load_count(fs_prefix, df, preproc, sample, report_bps=False, verbose=False, count_wc='')
  Loads the read and base-pair (bp) counts of a paired-end sample.
- assnake.api.loaders.load_df_from_db(df_name, db_loc='', include_preprocs=False)
  Returns a single dictionary with information about the dataset.
- assnake.api.loaders.load_dfs_from_db(db_loc)
  Returns a dict of dictionaries with information about the datasets in the filesystem database. Keys are dataset names. Mandatory fields: df, prefix.
- assnake.api.loaders.load_hm2(prefix, samples, dbs='chocophlan__uniref90', index_by='fs_name', norm=False, modifier='unstratified')
- assnake.api.loaders.load_hm2_grouped(samples, index_by='fs_name', norm=False, mapping='map_ko_uniref90')
- assnake.api.loaders.load_mag_contigs(samples, source, dfs, assembly, assembler, centr, binn, collection)
  Loads information about one MAG bin and returns a DataFrame with the coverage of its contigs in the samples.
- assnake.api.loaders.load_mags_info(meta, source, dfs, assembly, assembler, centr, collection, report_abundance_as='width')
  Loads information about MAGs for a specific assembly and set of samples, estimates their abundance, and returns a DataFrame whose index corresponds to bins and whose columns correspond to abundance in samples. The result can be transformed into an OTU table by applying df.T.
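The df.T step mentioned above simply swaps the axes. A toy illustration, with made-up bin names, sample names and values, assuming bins are rows and samples are columns as described:

```python
import pandas as pd

# Toy abundance table in the described layout: bins are rows, samples are columns.
# All names and values here are illustrative only.
mags = pd.DataFrame(
    {"sample_1": [0.5, 0.0, 1.2], "sample_2": [0.1, 0.8, 0.0]},
    index=["bin_1", "bin_2", "bin_3"],
)

# Transposing gives an OTU-style table: samples are rows, bins (features) are columns.
otu_table = mags.T
print(otu_table)
```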
- assnake.api.loaders.load_mg_samples_in_df(df, db_loc, return_as='dict')
  Returns a dict or DataFrame with technical information about the mg_samples in the dataset.
- assnake.api.loaders.load_mg_samples_in_df_fs(db_loc, df)
- assnake.api.loaders.load_mp2_new(samples, version='__v2.9.12', params='def')
- assnake.api.loaders.load_mp2_old(prefix, samples, level='s__', org='Bacteria', index_by='fs_name')
- assnake.api.loaders.load_resanal_reports(samples, level='Mechanism', norm=True)
- assnake.api.loaders.load_sample(fs_prefix, df, preproc, sample, report_bps=False, report_size=False, verbose=False, sample_dir_wc='', fastq_gz_file_wc='', count_wc='')
  Loads all necessary information about the given sample from the filesystem.
- assnake.api.loaders.load_samples_metadata(prefix, df)
- assnake.api.loaders.load_sources_in_df(df, db_loc, return_as='dict')
  Returns a dict or DataFrame with technical information about the sources in the dataset.
- assnake.api.loaders.mg_samples_for_df_fs(prefix, df)
- assnake.api.loaders.read_krak_node(df, node_name)
- assnake.api.loaders.samples_in_df(df, db_loc)
- assnake.api.loaders.samples_to_pd(samples)
assnake.api.mg_py module
- assnake.api.mg_py.create_taxa_count_from_dada2(tax_table, otu_table, rank_names=['Kingdom', 'Phylum', 'Class', 'Order', 'Family', 'Genus'])
- assnake.api.mg_py.single_level_from_mp2_table(tax_table, rank='g__', ranks=['k__', 'p__', 'c__', 'o__', 'f__', 'g__', 's__', 't__'])
  Takes a MetaPhlAn2-style table that includes all taxonomic levels (rows such as k__Archaea; k__Archaea|p__Euryarchaeota; etc.) and the desired taxonomic level, and returns a feature table containing only the features at the selected level.
  Parameters:
  - tax_table (pandas.DataFrame) – DataFrame with counts and full taxonomic information. Samples are columns.
  - rank (str) – rank level at which to agglomerate the data
  - ranks (list(str)) – list of all ranks present in the table
  Returns: tax_table_pruned
  Return type: pandas.DataFrame
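The selection described above can be sketched by looking at the deepest clade of each lineage string. This is a hypothetical illustration, not the assnake implementation: it assumes lineages are stored in the row index and separated by `|`, ignores the `ranks` parameter, and the helper name `single_level_sketch` is made up.

```python
import pandas as pd


def single_level_sketch(tax_table: pd.DataFrame, rank: str = "g__") -> pd.DataFrame:
    """Hypothetical sketch: keep only rows whose deepest clade is at `rank`.

    Assumes the index holds MetaPhlAn2-style lineages such as
    'k__Bacteria|p__Firmicutes' and that samples are columns.
    """
    deepest = tax_table.index.to_series().str.split("|").str[-1]
    mask = deepest.str.startswith(rank).to_numpy()
    pruned = tax_table.loc[mask].copy()
    # Label each remaining feature by its short clade name, e.g. 'g__Bacillus'
    pruned.index = deepest[mask].to_numpy()
    return pruned


if __name__ == "__main__":
    lineage = "k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Bacillaceae"
    table = pd.DataFrame(
        {"s1": [100, 60, 60, 40], "s2": [80, 80, 50, 30]},
        index=[
            "k__Bacteria",
            "k__Bacteria|p__Firmicutes",
            lineage + "|g__Bacillus",
            lineage + "|g__Bacillus|s__Bacillus_subtilis",
        ],
    )
    print(single_level_sketch(table, "g__"))
```

Matching on the deepest clade, rather than on whether the rank appears anywhere in the lineage, is what excludes the species row from the genus-level result.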
- assnake.api.mg_py.tax_glom(taxa_counts, rank='Phylum', include_na=True, index='simple_long', rank_names=['Kingdom', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species'], ranks_short=['k', 'p', 'c', 'o', 'f', 'g', 's'])
  Agglomerates data at the desired taxonomic rank.
  Parameters:
  - taxa_counts (pandas.DataFrame) – DataFrame with counts and a MultiIndex with taxonomic information. Samples are columns.
  - rank (str) – rank level at which to agglomerate the data
  - include_na (bool) – whether to include counts without a classification at the desired level
  Returns: agg_counts
  Return type: pandas.DataFrame
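A minimal sketch of agglomeration with pandas groupby, assuming the row MultiIndex has one named level per rank (as in `rank_names`) and that unclassified entries are missing values. It ignores the `index`, `rank_names` and `ranks_short` parameters; the helper `tax_glom_sketch` and the "NA" label are hypothetical.

```python
import pandas as pd


def tax_glom_sketch(taxa_counts: pd.DataFrame, rank: str = "Phylum", include_na: bool = True) -> pd.DataFrame:
    """Hypothetical sketch: sum counts over all rows sharing the same label at `rank`.

    Assumes one named MultiIndex level per rank and samples as columns.
    Unclassified entries at `rank` are expected to be NaN/None.
    """
    labels = taxa_counts.index.get_level_values(rank)
    if include_na:
        # Keep unclassified counts under an explicit label instead of dropping them
        labels = labels.fillna("NA")
    # groupby drops NaN keys by default, so unclassified rows vanish when include_na=False
    return taxa_counts.groupby(labels).sum()


if __name__ == "__main__":
    index = pd.MultiIndex.from_tuples(
        [
            ("Bacteria", "Firmicutes", "Bacillus"),
            ("Bacteria", "Firmicutes", "Clostridium"),
            ("Bacteria", "Bacteroidetes", "Bacteroides"),
            ("Bacteria", None, None),
        ],
        names=["Kingdom", "Phylum", "Genus"],
    )
    counts = pd.DataFrame({"s1": [5, 3, 7, 2], "s2": [1, 0, 4, 6]}, index=index)
    print(tax_glom_sketch(counts, "Phylum"))
```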
assnake.api.new_loaders module
assnake.api.oop module
assnake.api.prep module
- assnake.api.prep.prepare_list_for_multiqc_fastqc(sample_dicts_list)
  Saves a sample list file for MultiQC.
- assnake.api.prep.prepare_samples_for_dada2()
assnake.api.sample_set module
- class assnake.api.sample_set.SampleSet(fs_prefix, df, preproc, samples_to_add=[], do_not_add=[], pattern='')
  Bases: object
  Class that groups samples together and provides convenience functions for different tasks, such as constructing lists of desired result locations or preparing lists of files for rules.
  - samples_pd
    Pandas DataFrame with information about samples.
    Type: pandas.DataFrame
  - add_samples(fs_prefix, df, preproc, samples_to_add=[], do_not_add=[], pattern='')
    Adds samples to the SampleSet.
    Parameters:
    - fs_prefix – prefix of the dataset on the filesystem
    - df – name of the dataset
    - preproc – preprocessing to use
    - samples_to_add – list of sample names to add
    - do_not_add – list of sample names NOT to add
    - pattern – sample names must match this glob pattern to be included
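The three filters described for add_samples can be combined as in the following sketch. It only works on a list of names; the helper `select_samples` is hypothetical, and the assumptions that an empty `samples_to_add` means "all samples" and an empty `pattern` disables glob matching are not confirmed by the documentation.

```python
import fnmatch


def select_samples(available, samples_to_add=(), do_not_add=(), pattern=""):
    """Hypothetical sketch of the filtering described for add_samples.

    Assumptions: an empty samples_to_add means "all available samples",
    and an empty pattern means "no glob filter".
    """
    selected = []
    for name in available:
        if samples_to_add and name not in samples_to_add:
            continue  # not explicitly requested
        if name in do_not_add:
            continue  # explicitly excluded
        if pattern and not fnmatch.fnmatch(name, pattern):
            continue  # fails the glob filter
        selected.append(name)
    return selected


if __name__ == "__main__":
    names = ["ctrl_1", "ctrl_2", "case_1"]
    print(select_samples(names, do_not_add=["ctrl_2"], pattern="ctrl_*"))  # ['ctrl_1']
```

The sketch uses tuples as defaults; the documented signature uses `[]`, which in Python is a single list object shared across calls and is safe only as long as the method never mutates it.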
  - config = {}
  - general_taxa()
  - get_locs_for_result(result, preproc='', params='def')
  - prepare_assembly_set(assembler, params, set_name)
  - prepare_dada2_sample_list(set_name='sample_set')
  - prepare_mothur_set(dir_loc, set_name)
  - reads_info = Empty DataFrame, Columns: [], Index: []
  - samples_pd = Empty DataFrame, Columns: [df, fs_name, preproc, reads, sample], Index: []
  - wc_config = {}
assnake.api.snake_module module
assnake.api.update_fs_samples module
- assnake.api.update_fs_samples.update_fs_samples_csv(dataset)
  Scans the dataset folder and updates fs_samples.tsv if necessary.
  Parameters:
  - dataset – name of the dataset
  Returns: sample dict in loc