assnake.api package

Submodules

assnake.api.anal module

assnake.api.bb_stats module

assnake.api.bb_stats.basic_info(data, samples, queries, titles, print_stats=False)
assnake.api.bb_stats.get_cov_stats(prefix, df, samples, tool, preproc, seq_set_path, seq_set_id)
assnake.api.bb_stats.get_df_from_query(data, q)
assnake.api.bb_stats.load_centrifuge(ref)
assnake.api.bb_stats.load_cov_stats(samples, folder, light=False)
assnake.api.bb_stats.load_coverage(samples, folder, ext='.bb_stats')
assnake.api.bb_stats.plot_coverage(seq, samples, cov_dfs_list, cov_info_all, roll=False, remove_outliers=False)
assnake.api.bb_stats.plot_gc_cov_portrait(gc, cov, ax=None, rgba_colors=[[1, 1, 1, 1]], size=10, title='GC-cov-taxa Portrait')
assnake.api.bb_stats.plot_gc_cov_portrait_mult(df, samples, ax, norm=False, size=4, title='cov-gc poratrait')
assnake.api.bb_stats.plot_portrait_with_diff(omg, sample, s_vs_s, ax, size=4)
assnake.api.bb_stats.prepare(data, org, norm, roll=False, remove_outliers=False, agg_win=100)
assnake.api.bb_stats.print_seq(seq_id, seq_dict)
assnake.api.bb_stats.remove_outliers(df_in, col_name)
assnake.api.bb_stats.roller(data, window)
assnake.api.bb_stats.to_nucl_res(data)

assnake.api.dada2 module

assnake.api.dada2.get_full_track(fs_prefix, df, preproc)

Loads information about the number of reads at every step of the dada2 analysis. The R tracking script must be run first.

assnake.api.dada2.rename_seqs_to_asvs(otu_table, otu_table_renamed_loc, asvs_fasta_loc)

By default, dada2 uses full sequences as the table index, so this function gives them new human-readable names and saves the name-to-sequence mapping to a FASTA file.
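A minimal sketch of the renaming idea, using plain dicts instead of the OTU table and FASTA files the real function reads and writes. The helper name rename_seqs_to_asvs_sketch and the "ASV1, ASV2, ..." naming scheme are assumptions, not necessarily what assnake produces:

```python
def rename_seqs_to_asvs_sketch(otu_table, prefix="ASV"):
    """Replace full-sequence keys with short names and build FASTA text.

    otu_table: {sequence: {sample: count}} (hypothetical in-memory layout).
    Returns (renamed_table, fasta_text); the FASTA records the mapping
    from each new name back to its original sequence.
    """
    renamed = {}
    fasta_lines = []
    for i, (seq, counts) in enumerate(otu_table.items(), start=1):
        name = f"{prefix}{i}"          # assumed naming scheme
        renamed[name] = counts
        fasta_lines += [f">{name}", seq]
    return renamed, "\n".join(fasta_lines) + "\n"
```

For example, a table keyed by "ACGT" and "GGCC" comes back keyed by "ASV1" and "ASV2", with the FASTA text ">ASV1\nACGT\n>ASV2\nGGCC\n" preserving the link to the original sequences.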

assnake.api.dataset module

class assnake.api.dataset.Dataset(df)

Bases: object

biospecimens = None
df = ''
fs_prefix = ''
full_path = ''
mg_samples = None
sample_sets = {}
sources = None
to_dict()

assnake.api.fs_helpers module

Parameters:
  • dir_with_reads – where to put the reads
  • original_dir – where to take them from
  • sample – dictionary returned by get_sample_dict_from_dir
  • hard – whether a hard copy is needed; if False (default), a symbolic link is sufficient
Returns:
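The function these parameters belong to is not named above. A minimal sketch of the documented hard-copy versus symbolic-link choice, using a hypothetical helper place_read_file that handles a single file (the real function works from a sample dict):

```python
import os
import shutil


def place_read_file(original_path, dir_with_reads, hard=False):
    """Hypothetical helper: put one read file into dir_with_reads.

    hard=True makes a real copy (shutil.copy2 also keeps metadata);
    otherwise a symbolic link to the original file's absolute path is created.
    Returns the path of the new file or link.
    """
    os.makedirs(dir_with_reads, exist_ok=True)
    target = os.path.join(dir_with_reads, os.path.basename(original_path))
    if hard:
        shutil.copy2(original_path, target)
    else:
        os.symlink(os.path.abspath(original_path), target)
    return target
```

Symbolic links save disk space for large FASTQ files but break if the originals move; a hard copy avoids that at the cost of duplicating the data.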

assnake.api.fs_helpers.delete_ds(dataset)

Removes an assnake dataset from the database.

assnake.api.fs_helpers.find_files(base, pattern)

Return list of files matching pattern in base folder.
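A minimal sketch of glob-style file matching with the standard library. It assumes the search is recursive (os.walk); the real find_files may only look at the top level of base, and the name find_files_sketch is hypothetical:

```python
import fnmatch
import os


def find_files_sketch(base, pattern):
    """Return sorted paths under base whose file name matches a glob pattern."""
    matches = []
    for root, _dirs, files in os.walk(base):
        # fnmatch.filter keeps only the names matching the pattern
        for name in fnmatch.filter(files, pattern):
            matches.append(os.path.join(root, name))
    return sorted(matches)
```

For example, find_files_sketch(reads_dir, "*.fastq.gz") would collect every gzipped FASTQ file below reads_dir, including those in subfolders.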

assnake.api.fs_helpers.get_sample_dict_from_dir(loc, sample_name, variant, ext, modify_name=<function <lambda>>)
Parameters:
  • loc
  • sample_name
  • variant
  • ext
  • modify_name
Returns:

assnake.api.fs_helpers.get_samples_from_dir(loc, modify_name=<function <lambda>>)

Searches for samples in loc. Each sample must contain both R1 and R2 reads.

Parameters:
  • loc – location on the filesystem to search
Returns:
  list of sample dicts found in loc
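A minimal sketch of the "must contain R1 and R2" rule, working on a list of file names. The naming convention `<sample>_R1<ext>` / `<sample>_R2<ext>`, the identity default for modify_name, and the helper name pair_samples are all assumptions:

```python
import re

# Assumed naming convention: <sample>_R1<ext> and <sample>_R2<ext>
READ_RE = re.compile(r"^(?P<sample>.+)_R(?P<strand>[12])(?P<ext>\..+)$")


def pair_samples(filenames, modify_name=lambda name: name):
    """Group file names into samples and keep only those with both mates."""
    found = {}
    for fname in filenames:
        m = READ_RE.match(fname)
        if m is None:
            continue  # not a read file under the assumed convention
        name = modify_name(m.group("sample"))
        found.setdefault(name, {})["R" + m.group("strand")] = fname
    return [
        {"sample": name, "R1": reads["R1"], "R2": reads["R2"]}
        for name, reads in sorted(found.items())
        if "R1" in reads and "R2" in reads
    ]
```

To scan a directory, pass pair_samples(os.listdir(loc)); a sample with only an R1 file is silently skipped.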

assnake.api.init_config module

assnake.api.init_config.fill_and_write_config(assnake_db, fna_db_dir, bwa_index_dir, conda_dir, drmaa_log_dir, config_location)

assnake.api.loaders module

assnake.api.loaders.df_full_info(prefix, df, preprocessing='longest')

Returns a list of sample dicts: [{'df': df, 'preproc': preproc, 'sample': sample}]

Parameters:
  • df – name of the dataset
  • preprocessing – which preprocessing to pick: longest/newest
assnake.api.loaders.filter_mp2(mp2, level='g__', zeroes_in_samples=0.5)
assnake.api.loaders.general_taxa_one(s)
assnake.api.loaders.get_general_taxa_comp_krak_style(samples)
assnake.api.loaders.load_biospecimens_in_df(df, db_loc, return_as='dict')

Returns dict/dataframe with technical info about biospecimens in dataset

assnake.api.loaders.load_count(fs_prefix, df, preproc, sample, report_bps=False, verbose=False, count_wc='')

Loads information about read and bp count in paired-end sample.
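A minimal sketch of what read and bp counts mean for a paired-end sample, computed directly from FASTQ files. The real function takes a count_wc argument, which suggests it reads precomputed count files instead; the helpers count_fastq and count_paired are hypothetical and assume standard 4-line FASTQ records:

```python
import gzip


def count_fastq(path):
    """Return (reads, bps) for a plain or gzipped 4-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    reads = bps = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record is the sequence
                reads += 1
                bps += len(line.rstrip("\r\n"))
    return reads, bps


def count_paired(r1_path, r2_path):
    """Report read and bp counts separately for each mate."""
    result = {}
    for mate, path in (("R1", r1_path), ("R2", r2_path)):
        reads, bps = count_fastq(path)
        result[mate] = {"reads": reads, "bps": bps}
    return result
```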

assnake.api.loaders.load_df_from_db(df_name, db_loc='', include_preprocs=False)

Returns a single dictionary with dataset info.

assnake.api.loaders.load_dfs_from_db(db_loc)

Returns a dict of dictionaries with information about datasets from the filesystem database, keyed by dataset name. Mandatory fields: df, prefix.
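The documented return shape can be illustrated with a hypothetical check that flags records lacking the mandatory fields; the helper name and the example values are illustrative only:

```python
MANDATORY_FIELDS = ("df", "prefix")


def check_dataset_records(datasets):
    """Return the names of datasets missing any mandatory field.

    datasets: {dataset_name: {field: value, ...}}, the shape described above.
    """
    return sorted(
        name
        for name, info in datasets.items()
        if any(field not in info for field in MANDATORY_FIELDS)
    )
```

For instance, {"gut": {"df": "gut", "prefix": "/data"}, "broken": {"df": "broken"}} would be reported as ["broken"], since that record has no prefix.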

assnake.api.loaders.load_hm2(prefix, samples, dbs='chocophlan__uniref90', index_by='fs_name', norm=False, modifier='unstratified')
assnake.api.loaders.load_hm2_grouped(samples, index_by='fs_name', norm=False, mapping='map_ko_uniref90')
assnake.api.loaders.load_mag_contigs(samples, source, dfs, assembly, assembler, centr, binn, collection)

Loads information about one bin from MAGs and returns a DataFrame with contig coverage information in samples.

assnake.api.loaders.load_mags_info(meta, source, dfs, assembly, assembler, centr, collection, report_abundance_as='width')

Loads information about MAGs for a specific assembly and set of samples, estimates their abundance, and returns a DataFrame whose index corresponds to bins and whose columns correspond to abundance in samples. It can be transformed into an OTU table by applying df.T.

assnake.api.loaders.load_mg_samples_in_df(df, db_loc, return_as='dict')

Returns dict/dataframe with technical info about mg_samples in dataset

assnake.api.loaders.load_mg_samples_in_df_fs(db_loc, df)
assnake.api.loaders.load_mp2_new(samples, version='__v2.9.12', params='def')
assnake.api.loaders.load_mp2_old(prefix, samples, level='s__', org='Bacteria', index_by='fs_name')
assnake.api.loaders.load_resanal_reports(samples, level='Mechanism', norm=True)
assnake.api.loaders.load_sample(fs_prefix, df, preproc, sample, report_bps=False, report_size=False, verbose=False, sample_dir_wc='', fastq_gz_file_wc='', count_wc='')

Loads all necessary info about given sample from file system.

assnake.api.loaders.load_samples_metadata(prefix, df)
assnake.api.loaders.load_sources_in_df(df, db_loc, return_as='dict')

Returns dict/dataframe with technical info about sources in dataset

assnake.api.loaders.mg_samples_for_df_fs(prefix, df)
assnake.api.loaders.read_krak_node(df, node_name)
assnake.api.loaders.samples_in_df(df, db_loc)
assnake.api.loaders.samples_to_pd(samples)

assnake.api.mg_py module

assnake.api.mg_py.create_taxa_count_from_dada2(tax_table, otu_table, rank_names=['Kingdom', 'Phylum', 'Class', 'Order', 'Family', 'Genus'])
assnake.api.mg_py.single_level_from_mp2_table(tax_table, rank='g__', ranks=['k__', 'p__', 'c__', 'o__', 'f__', 'g__', 's__', 't__'])

Takes a MetaPhlAn2-style table with all taxonomic levels included (e.g. k__Archaea; k__Archaea|p__Euryarchaeota; etc.) and the desired taxonomic level as input, and returns a feature table containing only features at the selected taxonomic level.

Parameters:
  • tax_table (pandas.DataFrame) – DataFrame with counts and full taxonomic information. Samples are columns.
  • rank (str) – Rank level at which we want to agglomerate data
  • ranks (list(str)) – List with all ranks present in table
Returns:
  tax_table_pruned
Return type:
  pandas.DataFrame
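A minimal sketch of the selection rule with plain dicts instead of a DataFrame: a row belongs to the requested rank when the last pipe-separated component of its clade string starts with that rank prefix. Using that last component as the feature name is an assumption, and single_level_sketch is a hypothetical name:

```python
def single_level_sketch(tax_table, rank="g__"):
    """Keep only rows whose deepest taxon is at `rank`.

    tax_table: {clade_string: {sample: count}}, with MetaPhlAn2-style clade
    strings such as "k__Bacteria|p__Firmicutes|c__Bacilli".
    """
    pruned = {}
    for clade, counts in tax_table.items():
        deepest = clade.split("|")[-1]
        if deepest.startswith(rank):
            pruned[deepest] = counts  # assumed: feature named by its deepest taxon
    return pruned
```

Because MetaPhlAn2 already reports a separate row for every level, no summing is needed here; rows at deeper levels (e.g. s__) are simply dropped when rank="g__".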

assnake.api.mg_py.tax_glom(taxa_counts, rank='Phylum', include_na=True, index='simple_long', rank_names=['Kingdom', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species'], ranks_short=['k', 'p', 'c', 'o', 'f', 'g', 's'])

Agglomerates data at the desired taxonomic rank.

Parameters:
  • taxa_counts (pandas.DataFrame) – DataFrame with counts and a MultiIndex with taxonomic information. Samples are columns.
  • rank (str) – Rank level at which we want to agglomerate data
  • include_na (bool) – Whether to include counts without classification at desired level
Returns:
  agg_counts
Return type:
  pandas.DataFrame
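A minimal sketch of agglomeration with plain Python structures instead of a MultiIndex DataFrame: counts of all features sharing the same label at rank are summed per sample. The input layout, the "NA" label for unclassified features, and the name tax_glom_sketch are assumptions:

```python
from collections import defaultdict


def tax_glom_sketch(taxa_counts, rank="Phylum", include_na=True):
    """Sum counts of features that share the same label at `rank`.

    taxa_counts: list of (taxonomy, counts) pairs; taxonomy maps rank names
    to labels (None when unclassified), counts maps sample -> count.
    """
    agg = defaultdict(lambda: defaultdict(int))
    for taxonomy, counts in taxa_counts:
        label = taxonomy.get(rank)
        if label is None:
            if not include_na:
                continue  # drop features unclassified at this rank
            label = "NA"  # assumed label for unclassified counts
        for sample, value in counts.items():
            agg[label][sample] += value
    return {label: dict(per_sample) for label, per_sample in agg.items()}
```

With pandas, the same idea corresponds roughly to grouping the rows by the rank level of the index and summing.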

assnake.api.new_loaders module

assnake.api.oop module

assnake.api.prep module

assnake.api.prep.prepare_list_for_multiqc_fastqc(sample_dicts_list)

Saves a sample list file for MultiQC.

assnake.api.prep.prepare_samples_for_dada2()

assnake.api.sample_set module

class assnake.api.sample_set.SampleSet(fs_prefix, df, preproc, samples_to_add=[], do_not_add=[], pattern='')

Bases: object

Class that agglomerates samples and provides convenience functions for different tasks, such as constructing lists of desired result locations or preparing lists of files for rules.

samples_pd

Pandas DataFrame with information about samples

Type: pandas.DataFrame

add_samples(fs_prefix, df, preproc, samples_to_add=[], do_not_add=[], pattern='')

Adds samples to the SampleSet.

Parameters:
  • fs_prefix – Prefix of the dataset on filesystem
  • df – Name of the dataset
  • preproc – Preprocessing you want to use
  • samples_to_add – List of sample names to add
  • do_not_add – List of sample names NOT to add
  • pattern – Glob pattern that sample names must match to be included

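A minimal sketch of how the three filters could combine. The assumptions that an empty samples_to_add means "all available samples" and an empty pattern matches everything are not confirmed by the source, and select_samples is a hypothetical helper:

```python
from fnmatch import fnmatchcase


def select_samples(available, samples_to_add=(), do_not_add=(), pattern=""):
    """Filter sample names the way the add_samples parameters describe."""
    selected = []
    for name in available:
        if samples_to_add and name not in samples_to_add:
            continue  # an explicit include list was given and name is not on it
        if name in do_not_add:
            continue  # explicitly excluded
        if pattern and not fnmatchcase(name, pattern):
            continue  # does not match the glob pattern
        selected.append(name)
    return selected
```

The sketch uses tuples as defaults; the documented signature uses [] for samples_to_add and do_not_add, which is safe only as long as those lists are never mutated inside the method. fnmatchcase is used so matching is case-sensitive on every platform.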
config = {}
general_taxa()
get_locs_for_result(result, preproc='', params='def')
prepare_assembly_set(assembler, params, set_name)
prepare_dada2_sample_list(set_name='sample_set')
prepare_mothur_set(dir_loc, set_name)
reads_info = Empty DataFrame Columns: [] Index: []
samples_pd = Empty DataFrame Columns: [df, fs_name, preproc, reads, sample] Index: []
wc_config = {}

assnake.api.snake_module module

class assnake.api.snake_module.SnakeModule(name, install_dir, snakefiles, invocation_commands, initialization_commands=[], wc_configs=[])

Bases: object

initialization_commands = []
install_dir = ''
invocation_commands = []
name = ''
snakefiles = []
wc_configs = []

assnake.api.update_fs_samples module

assnake.api.update_fs_samples.update_fs_samples_csv(dataset)

Scans the dataset folder and updates fs_samples.tsv if necessary.

Parameters:
  • dataset – Name of the dataset
Returns:
  sample dict in loc
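A minimal sketch of the "update only if necessary" behaviour: the file is rewritten only when the scanned sample names differ from those already stored. The single fs_name column layout and the helper name update_samples_tsv are hypothetical; the real fs_samples.tsv may hold more columns:

```python
import csv
import os


def update_samples_tsv(tsv_path, scanned_names):
    """Rewrite tsv_path only when the scanned sample set differs.

    Returns True if the file was (re)written, False if it was already current.
    """
    current = set()
    if os.path.exists(tsv_path):
        with open(tsv_path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            current = {row["fs_name"] for row in reader}
    scanned = set(scanned_names)
    if scanned == current:
        return False  # nothing changed on disk, leave the file alone
    with open(tsv_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["fs_name"])
        for name in sorted(scanned):
            writer.writerow([name])
    return True
```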

assnake.api.viz module

Module contents