deepchem.molnet package

Submodules

deepchem.molnet.check_availability module

deepchem.molnet.preset_hyper_parameters module

Created on Tue Mar 7 00:07:10 2017

@author: zqwu

deepchem.molnet.run_benchmark module

Created on Mon Mar 06 14:25:40 2017

@author: Zhenqin Wu

deepchem.molnet.run_benchmark.benchmark_model(model, all_dataset, transformers, metric, test=False)[source]

Benchmark a custom model.

Parameters:

model: user-defined model structure

A user-defined model must implement the functions fit and evaluate.

all_dataset: (train, test, val) data tuple

As returned by the load_dataset function.

transformers: dc.trans.Transformer struct

transformer used for model evaluation

metric: string

choice of evaluation metrics

test: boolean, optional (default=False)

whether to evaluate on the test set
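The model argument only needs to expose fit and evaluate. A minimal sketch of that interface, using a hypothetical MeanPredictor and a stub dataset (neither class is part of deepchem; they only illustrate the expected shape):

```python
class StubDataset:
    """Stand-in for a deepchem dataset exposing labels via .y."""
    def __init__(self, y):
        self.y = y


class MeanPredictor:
    """Predicts the mean of the training labels for every sample."""
    def __init__(self):
        self.mean_ = 0.0

    def fit(self, dataset):
        labels = list(dataset.y)
        self.mean_ = sum(labels) / len(labels)
        return self

    def evaluate(self, dataset, metrics=None, transformers=None):
        # Mean squared error against the stored mean
        errors = [(y - self.mean_) ** 2 for y in dataset.y]
        return {"mse": sum(errors) / len(errors)}


data = StubDataset([1.0, 2.0, 3.0])
model = MeanPredictor().fit(data)
scores = model.evaluate(data)
```

Any object with this fit/evaluate pair can be handed to benchmark_model in place of a deepchem model.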
deepchem.molnet.run_benchmark.load_dataset(dataset, featurizer, split='random')[source]

Load a specific dataset for benchmarking.

Parameters:

dataset: string

choice of which datasets to use, should be: tox21, muv, sider, toxcast, pcba, delaney, kaggle, nci, clintox, hiv, pdbbind, chembl, qm7, qm7b, qm9, sampl

featurizer: string or dc.feat.Featurizer.

choice of featurization.

split: string, optional (default='random')

choice of splitter function
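A typical call and the unpacking of its result, following the (train, test, val) tuple described under benchmark_model. The real call requires deepchem, so it is shown commented and a stand-in tuple is used; the return signature (tasks, all_dataset, transformers) is an assumption for illustration:

```python
# With deepchem installed, the call would look like this:
# from deepchem.molnet.run_benchmark import load_dataset
# tasks, all_dataset, transformers = load_dataset('tox21', 'ECFP', split='random')

# Stand-in for the (train, test, val) tuple delivered via all_dataset:
all_dataset = ('train_split', 'test_split', 'val_split')
train, test, val = all_dataset
```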

deepchem.molnet.run_benchmark.run_benchmark(datasets, model, split=None, metric=None, featurizer=None, n_features=0, out_path='.', hyper_parameters=None, test=False, reload=True, seed=123)[source]

Run benchmark tests on designated datasets with a deepchem (or user-defined) model.

Parameters:

datasets: list of string

choice of which datasets to use, should be: bace_c, bace_r, bbbp, chembl, clearance, clintox, delaney, hiv, hopv, kaggle, lipo, muv, nci, pcba, pdbbind, ppb, qm7, qm7b, qm8, qm9, sampl, sider, tox21, toxcast

model: string or user-defined model structure

choice of which model to use; deepchem provides implementations of logistic regression, random forest, multitask network, bypass multitask network, IRV, and graph convolution. A user-defined model must implement the functions fit and evaluate.

split: string, optional (default=None)

choice of splitter function, None = using the default splitter

metric: string, optional (default=None)

choice of evaluation metrics, None = using the default metrics (AUC & R2)

featurizer: string or dc.feat.Featurizer, optional (default=None)

choice of featurization, None = using the default corresponding to model (string only applicable to deepchem models)

n_features: int, optional (default=0)

number of features, depending on the featurizer; set automatically when using deepchem featurizers, but must be specified for user-defined featurizers (if using deepchem models)

out_path: string, optional (default='.')

path to the result file

hyper_parameters: dict, optional (default=None)

hyperparameters for the designated model, None = use preset values

test: boolean, optional (default=False)

whether to evaluate on test set

reload: boolean, optional (default=True)

whether to save and reload featurized datasets

seed: int, optional (default=123)

random seed
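A sketch of how hyper_parameters might be assembled and passed. The key names below are assumptions (the authoritative presets live in deepchem.molnet.preset_hyper_parameters), and the run_benchmark call itself requires deepchem, so it is shown commented:

```python
# Hypothetical hyperparameter override; the key names here are assumptions,
# the presets in deepchem.molnet.preset_hyper_parameters are authoritative.
hyper_parameters = {
    'batch_size': 64,
    'nb_epoch': 10,
    'learning_rate': 1e-3,
}

# With deepchem installed:
# from deepchem.molnet.run_benchmark import run_benchmark
# run_benchmark(datasets=['tox21'], model='graphconv',
#               hyper_parameters=hyper_parameters, out_path='.')
```

Passing hyper_parameters=None falls back to the preset values, as noted above.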

deepchem.molnet.run_benchmark_low_data module

Created on Mon Mar 06 14:25:40 2017

@author: Zhenqin Wu

deepchem.molnet.run_benchmark_low_data.run_benchmark_low_data(datasets, model, split='task', metric=None, featurizer=None, n_features=0, out_path='.', K=4, hyper_parameters=None, cross_valid=False, seed=123)[source]

Run low-data benchmark tests on designated datasets with a deepchem (or user-defined) model.

Parameters:

datasets: list of string

choice of which datasets to use, should be: muv, tox21, sider

model: string or user-defined model structure

choice of which model to use, should be: siamese, attn, res

split: string, optional (default='task')

choice of splitter function, only task splitter supported

metric: string, optional (default=None)

choice of evaluation metrics, None = using the default metrics (AUC)

featurizer: string or dc.feat.Featurizer, optional (default=None)

choice of featurization, None = using the default corresponding to model (string only applicable to deepchem models)

n_features: int, optional (default=0)

number of features, depending on the featurizer; set automatically when using deepchem featurizers, but must be specified for user-defined featurizers (if using deepchem models)

out_path: string, optional (default='.')

path to the result file

K: int, optional (default=4)

number of folds for K-fold splitting of the datasets

hyper_parameters: dict, optional (default=None)

hyperparameters for the designated model, None = use preset values

cross_valid: boolean, optional (default=False)

whether to cross-validate

seed: int, optional (default=123)

random seed
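K controls a K-fold split over tasks. A rough pure-Python sketch of such a partition (illustrative round-robin assignment only; deepchem's task splitter is the actual implementation):

```python
def kfold_tasks(tasks, K=4):
    """Partition a task list into K roughly equal folds (illustrative
    round-robin assignment, not deepchem's task splitter)."""
    folds = [[] for _ in range(K)]
    for i, task in enumerate(tasks):
        folds[i % K].append(task)
    return folds

folds = kfold_tasks(['task%d' % i for i in range(10)], K=4)
# folds[0] collects tasks 0, 4 and 8; the rest are distributed round-robin
```

With cross_valid=True, each fold would in turn serve as the held-out set.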

deepchem.molnet.run_benchmark_models module

Created on Mon Mar 6 23:41:26 2017

@author: zqwu

deepchem.molnet.run_benchmark_models.benchmark_classification(train_dataset, valid_dataset, test_dataset, tasks, transformers, n_features, metric, model, test=False, hyper_parameters=None, seed=123)[source]

Calculate the performance of different models on the specified dataset and tasks.

Parameters:

train_dataset: dataset struct

dataset used for model training and evaluation

valid_dataset: dataset struct

dataset only used for model evaluation (and hyperparameter tuning)

test_dataset: dataset struct

dataset only used for model evaluation

tasks: list of string

list of target names (tasks) in the dataset

transformers: dc.trans.Transformer struct

transformer used for model evaluation

n_features: integer

number of features, or length of binary fingerprints

metric: list of dc.metrics.Metric objects

metrics used for evaluation

model: string, optional (default='tf')

choice of which model to use, should be: rf, tf, tf_robust, logreg, irv, graphconv, dag, xgb, weave

test: boolean

whether to calculate test-set performance

hyper_parameters: dict, optional (default=None)

hyperparameters for the designated model, None = use preset values

Returns:

train_scores : dict

evaluation results (AUC) on the training set

valid_scores : dict

evaluation results (AUC) on the validation set

test_scores : dict

evaluation results (AUC) on the test set
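The three returned dicts can be consumed like ordinary mappings. The nesting sketched below (model name to metric name to score) and the metric key are assumptions for illustration, and the numbers are placeholders, not real results:

```python
# Stand-in score dicts mirroring the Returns section above; nesting,
# metric key, and values are all illustrative placeholders.
train_scores = {'graphconv': {'mean-roc_auc_score': 0.93}}
valid_scores = {'graphconv': {'mean-roc_auc_score': 0.85}}

# Pick the model with the best validation score
best_model = max(valid_scores,
                 key=lambda m: valid_scores[m]['mean-roc_auc_score'])
```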

deepchem.molnet.run_benchmark_models.benchmark_regression(train_dataset, valid_dataset, test_dataset, tasks, transformers, n_features, metric, model, test=False, hyper_parameters=None, seed=123)[source]

Calculate the performance of different models on the specified dataset and tasks.

Parameters:

train_dataset: dataset struct

dataset used for model training and evaluation

valid_dataset: dataset struct

dataset only used for model evaluation (and hyperparameter tuning)

test_dataset: dataset struct

dataset only used for model evaluation

tasks: list of string

list of target names (tasks) in the dataset

transformers: dc.trans.Transformer struct

transformer used for model evaluation

n_features: integer

number of features, or length of binary fingerprints

metric: list of dc.metrics.Metric objects

metrics used for evaluation

model: string, optional (default='tf_regression')

choice of which model to use, should be: tf_regression, tf_regression_ft, graphconvreg, rf_regression, dtnn, dag_regression, xgb_regression, weave_regression

test: boolean

whether to calculate test-set performance

hyper_parameters: dict, optional (default=None)

hyperparameters for the designated model, None = use preset values

Returns:

train_scores : dict

evaluation results on the training set

valid_scores : dict

evaluation results on the validation set

test_scores : dict

evaluation results on the test set

deepchem.molnet.run_benchmark_models.low_data_benchmark_classification(train_dataset, valid_dataset, n_features, metric, model='siamese', hyper_parameters=None, seed=123)[source]

Calculate low-data benchmark performance.

Parameters:

train_dataset : dataset struct

loaded dataset (ConvMol struct), used for training

valid_dataset : dataset struct

loaded dataset (ConvMol struct), used for validation

n_features : integer

number of features, or length of binary fingerprints

metric: list of dc.metrics.Metric objects

metrics used for evaluation

model : string, optional (default='siamese')

choice of which model to use, should be: siamese, attn, res

hyper_parameters: dict, optional (default=None)

hyperparameters for the designated model, None = use preset values

Returns:

valid_scores : dict

evaluation results (AUC) on the validation set

Module contents