Usage

Representations

Simple molecular representations can be generated from molreps.descriptors:

from molreps.descriptors import coulomb_matrix
atoms = ['C','C']
coords = [[0,0,0],[1,0,0]]
cm = coulomb_matrix(atoms,coords)

Individual functions from molreps.methods can also be used directly, for example to go in the reverse direction and recover a geometry from the Coulomb matrix:

from molreps.methods.geo_npy import geometry_from_coulombmat
atom_cord = geometry_from_coulombmat(cm)
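
For orientation, the Coulomb matrix is commonly defined with diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|. The following is a minimal pure-Python sketch of that textbook definition. It is not the molreps implementation, which may differ in units, atom ordering, or normalization; the helper name coulomb_matrix_sketch and the small ATOMIC_NUMBER table are assumptions made for illustration only.

```python
import math

# Hypothetical lookup table for this sketch only; extend as needed.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def coulomb_matrix_sketch(atoms, coords):
    """Return the Coulomb matrix as a nested list (textbook definition)."""
    charges = [ATOMIC_NUMBER[a] for a in atoms]
    n = len(charges)
    cm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: fitted self-interaction term 0.5 * Z^2.4.
                cm[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j.
                cm[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return cm


cm_sketch = coulomb_matrix_sketch(["C", "C"], [[0, 0, 0], [1, 0, 0]])
```

For the two-carbon example at unit distance, the off-diagonal entries are 6 * 6 / 1 = 36.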

Graph

Many ML models require a graph representation of the molecule. The MolGraph class from molreps.graph inherits from networkx's nx.Graph and can generate a molecular graph from a mol object provided by a cheminformatics package such as rdkit, openbabel, or ase. This makes it possible to combine functionality from networkx with that of packages like rdkit. First, create a mol object:

from rdkit import Chem  # a plain "import rdkit" does not load the Chem submodule
m = Chem.MolFromSmiles("CC1=CC=CC=C1")
m = Chem.AddHs(m)  # add explicit hydrogens

The mol object is passed to the MolGraph constructor and remains accessible through the mol attribute:

import networkx as nx
import numpy as np
from molreps.graph import MolGraph
mgraph = MolGraph(m)
mgraph.mol  # Access the mol object.

The networkx graph is generated by make(), where the features and keys can be specified. Pre-defined features can be assigned by an identifier as 'key': 'identifier' or, if further arguments are required, as 'key': {'class': 'identifier', 'args': {'arg1': value1, 'arg2': value2}}. In the latter case a custom function or class can also be provided, as in 'key': {'class': my_fun, 'args': {'arg1': value1, 'arg2': value2}}. The dictionary of predefined identifiers can be listed with print(MolGraph._mols_implemented):

mgraph.make()
mgraph.make(nodes = {"AtomicNum" : 'AtomicNum'},
            edges = {"BondType" : 'BondType',
                     "Distance" : {'class':'Distance', 'args':{'bonds_only':True}}},
            state = {"ExactMolWt" : 'ExactMolWt'}
            )
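
A custom feature function is expected to take the key and the graph object (which exposes .mol) and to return a list of (index, {key: value}) tuples. The following is a minimal sketch of such a function for formal charges. The argument names and order are assumptions, and the Fake* classes are hypothetical stand-ins that mimic only the rdkit methods GetAtoms(), GetIdx() and GetFormalCharge(), so that the sketch runs without rdkit or molreps installed.

```python
# Stand-ins that mimic the small part of the rdkit API used below.
# With rdkit installed, mgraph.mol would be a real Mol object instead.
class FakeAtom:
    def __init__(self, idx, charge):
        self._idx, self._charge = idx, charge

    def GetIdx(self):
        return self._idx

    def GetFormalCharge(self):
        return self._charge


class FakeMol:
    def __init__(self, atoms):
        self._atoms = atoms

    def GetAtoms(self):
        return self._atoms


class FakeMolGraph:
    """Placeholder for MolGraph: only the .mol attribute is modelled."""

    def __init__(self, mol):
        self.mol = mol


def formal_charge(key, mgraph):
    # Assumed signature: the key first, then the graph object exposing .mol.
    # Returns one (node index, attribute dict) tuple per atom.
    return [(atom.GetIdx(), {key: atom.GetFormalCharge()})
            for atom in mgraph.mol.GetAtoms()]


demo = FakeMolGraph(FakeMol([FakeAtom(0, 0), FakeAtom(1, -1)]))
node_features = formal_charge("FormalCharge", demo)
```

The resulting list has the shape that networkx's add_nodes_from() accepts, so a function like this could presumably be supplied in the 'class' slot of the 'key': {'class': my_fun, 'args': {...}} form.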

Note that a custom function must accept the key and the MolGraph instance (which provides .mol) as arguments. It should return a list of tuples such as [(i, {key: property})] for atoms and [(i, j, {key: property})] for bonds, so that the result can be read by add_nodes_from() or add_edges_from(), respectively.

The generated graph can then be viewed and treated like any networkx graph, for example plotted with nx.draw(mgraph, with_labels=True).

Finally, a closed-form tensor is collected from the selected features, which are identified by their key. For each key, a function to process the features and a default value can optionally be provided; the processing function defaults to np.array. A default value must be given if any node or edge is missing a key, since otherwise no closed-form tensor can be generated:

mgraph.to_tensor()
graph_tensors = mgraph.to_tensor(nodes = ["AtomicNum"],
                                 edges = ["BondType"],
                                 state = ["ExactMolWt"],
                                 out_tensor = np.array)

The returned dictionary containing the feature tensors can be passed to graph models.
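
To illustrate why a default value matters when a key is missing on some node, here is a minimal pure-Python sketch. It does not reproduce to_tensor(); the helper collect_feature and the sample data are hypothetical and only show the idea of filling gaps so that every node contributes exactly one value.

```python
# Node attributes as a networkx graph would store them: one dict per node.
# Node 2 lacks "AtomicNum", so a default is needed for a regular result.
node_attrs = {
    0: {"AtomicNum": 6},
    1: {"AtomicNum": 6},
    2: {},
}


def collect_feature(attrs, key, default):
    """Gather one feature per node, in node order, filling gaps with default."""
    return [attrs[n].get(key, default) for n in sorted(attrs)]


atomic_nums = collect_feature(node_attrs, "AtomicNum", default=0)
```

Passing the resulting list to np.array gives a regular one-dimensional array. Without the default, the missing entry on node 2 would have to be handled separately before a tensor could be built.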