molreps.methods package

Submodules

molreps.methods.geo_npy module

Molecular geometric feature representation based on numpy.

A loose collection of modular functions to compute distance matrices, angles, coordinates, connectivity, etc. Many functions also work on batches, and ideally all functions are vectorized. Note: All functions are supposed to work out of the box without any dependencies, i.e. they do not depend on each other.

molreps.methods.geo_npy.add_edges_reverse_indices(edge_indices, edge_values=None, remove_duplicates=True, sort_indices=True)

Add the edges for (i,j) as (j,i) with the same edge values. If they already exist, no edge is added. By default, all indices are sorted.

Parameters
  • edge_indices (np.array) – Index list of shape (N,2).

  • edge_values (np.array) – Edge values of shape (N,M) matching the edge_indices

  • remove_duplicates (bool) – Remove duplicate edge indices. Default is True.

  • sort_indices (bool) – Sort final edge indices. Default is True.

Returns

edge_indices or [edge_indices, edge_values]

Return type

np.array
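
The behaviour described above can be illustrated with plain NumPy. This is a minimal sketch under stated assumptions, not the library's implementation: the helper name `add_reverse_edges_sketch` is hypothetical, and it always removes duplicates and sorts, which corresponds to the documented defaults.

```python
import numpy as np

def add_reverse_edges_sketch(edge_indices, edge_values=None):
    """Append (j, i) for every (i, j), drop duplicate pairs and sort the rows."""
    edge_indices = np.asarray(edge_indices)
    both = np.concatenate([edge_indices, edge_indices[:, ::-1]], axis=0)
    # np.unique(axis=0) sorts rows lexicographically and removes duplicates;
    # return_index points at the first occurrence of every kept row.
    unique_edges, first = np.unique(both, axis=0, return_index=True)
    if edge_values is None:
        return unique_edges
    values = np.concatenate([edge_values, edge_values], axis=0)
    return unique_edges, values[first]

edges = np.array([[0, 1], [1, 0], [1, 2]])
vals = np.array([[1.0], [1.0], [2.5]])
new_edges, new_vals = add_reverse_edges_sketch(edges, vals)
```

Here only the edge (2,1) is new, because (0,1) and (1,0) are both present already.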

molreps.methods.geo_npy.all_angle_combinations(ind1, ind2)

Get all angles between ALL possible bonds, including unrelated bonds, e.g. (1,2) and (17,20), which are not connected. Input shape is (…,N).

Note: This is mostly impractical and not wanted, see make_angle_list for normal use.

Parameters
  • ind1 (np.array) – Indexlist of start index for a bond. This must be sorted. Shape (…,N)

  • ind2 (np.array) – Indexlist of end index for a bond. Shape (…,N)

Returns

Index tuples of shape (…,N*N/2-N,2,2), where the bonds are specified at the last axis and the bond pairs at axis=-2.

Return type

np.array

molreps.methods.geo_npy.coordinates_from_distancematrix(distance, use_center=None, dim=3)

Compute list of coordinates from a distance matrix of shape (N,N).

Uses a vectorized algorithm, see http://scripts.iucr.org/cgi-bin/paper?S0567739478000522 and https://www.researchgate.net/publication/252396528_Stable_calculation_of_coordinates_from_distance_information. No check of positive semi-definiteness or of a possible k-dim >= 3 is done here. Performs an SVD from numpy. May even work for (…,N,N), but this is not tested.

Parameters
  • distance (np.array) – distance matrix of shape (N,N) with Dij = abs(ri-rj)

  • use_center (int) – which atom should be the center; default = None means center of mass

  • dim (int) – the dimension of embedding, 3 is default

Returns

List of Atom coordinates [[x_1,x_2,x_3],[x_1,x_2,x_3],…]

Return type

np.array
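
As an illustration of recovering coordinates from distances, the following sketch uses classical multidimensional scaling with the centroid as origin. It is an assumption that this resembles the `use_center=None` case; the helper name `coords_from_distances_sketch` is hypothetical, and the result is only defined up to rotation and reflection.

```python
import numpy as np

def coords_from_distances_sketch(distance, dim=3):
    """Classical MDS: double-center the squared distances, then factorize."""
    d2 = np.asarray(distance, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Gram matrix of the centered coordinates.
    gram = -0.5 * centering @ d2 @ centering
    # For a symmetric positive semi-definite matrix the SVD yields its eigenpairs.
    u, s, _ = np.linalg.svd(gram)
    return u[:, :dim] * np.sqrt(s[:dim])

points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
recovered = coords_from_distances_sketch(dist)
dist_again = np.linalg.norm(recovered[:, None, :] - recovered[None, :, :], axis=-1)
```

The recovered coordinates generally differ from the original points, but they reproduce the same distance matrix.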

molreps.methods.geo_npy.coordinates_to_distancematrix(coord3d)

Transform coordinates to distance matrix.

Will apply transformation on last dimension. Changing of shape (…,N,3) -> (…,N,N)

Parameters

coord3d (np.array) – Coordinates of shape (…,N,3) for cartesian coordinates (x,y,z), with N the number of atoms or points. Coordinates are the last dimension.

Returns

distance matrix as numpy array with shape (…,N,N) where N is the number of atoms

Return type

np.array
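
The shape change (…,N,3) -> (…,N,N) can be obtained with broadcasting. The following is a minimal sketch, not necessarily how the library does it; the helper name `distance_matrix_sketch` is hypothetical.

```python
import numpy as np

def distance_matrix_sketch(coord3d):
    """Pairwise Euclidean distances over the last two axes."""
    coord3d = np.asarray(coord3d, dtype=float)
    # (..., N, 1, 3) - (..., 1, N, 3) broadcasts to (..., N, N, 3).
    diff = coord3d[..., :, None, :] - coord3d[..., None, :, :]
    return np.sqrt(np.sum(diff * diff, axis=-1))

coords = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.0]])
dist = distance_matrix_sketch(coords)
batch_dist = distance_matrix_sketch(np.zeros((2, 4, 3)))
```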

molreps.methods.geo_npy.coulombmatrix_to_inversedistance_proton(coulmat, unit_conversion=1)

Convert a Coulombmatrix back to inverse distance matrix + atomic number.

(…,N,N) -> (…,N,N) + (…,N)

Parameters
  • coulmat (np.array) – Full Coulombmatrix of shape (…,N,N)

  • unit_conversion (float) – Whether to scale units for distance. Default is 1.

Returns

[inv_dist,z]

  • inv_dist(np.array): Inverse distance Matrix of shape (…,N,N)

  • z(np.array): Atomic numbers (proton numbers) recovered from the diagonal, shape (…,N).

Return type

tuple

molreps.methods.geo_npy.define_adjacency_from_distance(distance_matrix, max_distance=inf, max_neighbours=inf, exclusive=True, self_loops=False)

Construct adjacency matrix from a distance matrix by distance and number of neighbours. Works for batches.

This does not take into account special bonds (e.g. chemical), just a general distance measure. Tries to connect nearest neighbours.

Parameters
  • distance_matrix (np.array) – distance Matrix of shape (…,N,N)

  • max_distance (float, optional) – Maximum distance to allow connections, can also be None. Defaults to np.inf.

  • max_neighbours (int, optional) – Maximum number of neighbours, can also be None. Defaults to np.inf.

  • exclusive (bool, optional) – Whether both max distance and neighbours must be fulfilled. Defaults to True.

  • self_loops (bool, optional) – Allow self-loops on diagonal. Defaults to False.

Returns

[graph_adjacency,graph_indices]

  • graph_adjacency (np.array): Adjacency Matrix of shape (…,N,N) of dtype=np.bool.

  • graph_indices (np.array): Flattened indices from former array that have Adjacency == True.

Return type

tuple
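
To make the two criteria concrete, here is a small sketch that combines a distance cutoff with a nearest-neighbour limit. It only covers the case where both conditions must hold and self-loops are removed; the helper name `adjacency_sketch` is hypothetical, and how the library breaks ties or orders its index output is not reproduced here.

```python
import numpy as np

def adjacency_sketch(distance_matrix, max_distance=np.inf, max_neighbours=np.inf):
    """Connect i -> j if j is within max_distance AND among the nearest neighbours of i."""
    dist = np.asarray(distance_matrix, dtype=float)
    n = dist.shape[-1]
    within = dist <= max_distance
    # Double argsort gives the rank of each entry in its row (0 = closest).
    # Rank 0 is assumed to be the atom itself (distance 0, no coincident atoms).
    rank = np.argsort(np.argsort(dist, axis=-1), axis=-1)
    nearest = rank <= max_neighbours
    adj = within & nearest & ~np.eye(n, dtype=bool)
    return adj, np.argwhere(adj)

# Three points on a line at positions 0, 1 and 3.
dist = np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
adj, indices = adjacency_sketch(dist, max_distance=2.5, max_neighbours=1)
```

Note that the result need not be symmetric: point 2 connects to its nearest neighbour 1, while point 1 prefers point 0.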

molreps.methods.geo_npy.distance_to_gaussdistance(distance, bins=30, gauss_range=5.0, gauss_sigma=0.2)

Convert distance array to smooth one-hot representation using Gaussian functions.

Changes shape for gaussian distance (…,) -> (…,GBins). The default values match units in Angstroem.

Parameters
  • distance (np.array) – Array of distances of shape (…,)

  • bins (int) – number of Bins to sample distance from, default = 30

  • gauss_range (value) – maximum distance to be captured by bins, default = 5.0

  • gauss_sigma (value) – sigma of the gaussian function, determining the width/sharpness, default = 0.2

Returns

Numpy array of gaussian distance with expanded last axis (…,GBins)

Return type

np.array
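
A Gaussian expansion of this kind can be sketched as follows. The placement of the bin centers (evenly spaced from 0 to `gauss_range`, endpoints included) and the normalization are assumptions made for illustration and may differ from the library; the helper name `gauss_expand_sketch` is hypothetical.

```python
import numpy as np

def gauss_expand_sketch(distance, bins=30, gauss_range=5.0, gauss_sigma=0.2):
    """Expand each distance into `bins` Gaussian activations on a new last axis."""
    centers = np.linspace(0.0, gauss_range, bins)  # assumed center placement
    diff = np.asarray(distance, dtype=float)[..., None] - centers
    return np.exp(-0.5 * (diff / gauss_sigma) ** 2)

d = np.array([[0.0, 1.5], [1.5, 0.0]])
expanded = gauss_expand_sketch(d)
```

Each distance activates the bins whose centers lie close to it, so the representation behaves like a smoothed one-hot encoding.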

molreps.methods.geo_npy.geometry_from_coulombmat(coulmat, unit_conversion=1)

Generate a geometry from Coulombmatrix.

Parameters
  • coulmat (np.array) – Coulombmatrix of shape (N,N).

  • unit_conversion (value, optional) – If units are converted from or to a. Defaults to 1.

Returns

[ats,cords]

  • ats (list): List of atoms e.g. [‘C’,’C’].

  • cords (np.array): Coordinates of shape (N,3).

Return type

list

molreps.methods.geo_npy.get_angles(coords, inds)

Compute angles between coordinates (…,N,3) from a matching index list that has shape (…,M,3) with (ind0,ind1,ind2).

Angles are between ind1<(ind0,ind2) taking coords[ind]. The angle is oriented as ind1->ind0,ind1->ind2.

Parameters
  • coords (np.array) – list of coordinates of points (…,N,3)

  • inds (np.array) – Index list of points (…,M,3) that means coords[i] with i in axis=-1.

Returns

[angle_sin,angle_cos,angles ,norm_vec1,norm_vec2]

  • angle_sin (np.array): sin() of the angles between ind1<(ind0,ind2)

  • angle_cos (np.array): cos() of the angles between ind1<(ind0,ind2)

  • angles (np.array): angles in rads

  • norm_vec1 (np.array): length of vector ind1,ind2a

  • norm_vec2 (np.array): length of vector ind1,ind2b

Return type

list
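
The angle convention ind1->ind0, ind1->ind2 can be illustrated for a single, unbatched molecule. This sketch returns only the angles in radians, and the helper name `angles_sketch` is hypothetical.

```python
import numpy as np

def angles_sketch(coords, inds):
    """Angle at the center atom inds[:, 1] between inds[:, 0] and inds[:, 2]."""
    coords = np.asarray(coords, dtype=float)
    inds = np.asarray(inds)
    vec1 = coords[inds[:, 0]] - coords[inds[:, 1]]  # ind1 -> ind0
    vec2 = coords[inds[:, 2]] - coords[inds[:, 1]]  # ind1 -> ind2
    norm1 = np.linalg.norm(vec1, axis=-1)
    norm2 = np.linalg.norm(vec2, axis=-1)
    cos = np.sum(vec1 * vec2, axis=-1) / (norm1 * norm2)
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0))

coords = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
angles = angles_sketch(coords, [[0, 1, 2]])
```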

molreps.methods.geo_npy.get_connectivity_from_inversedistancematrix(invdistmat, protons, radii_dict=None, k1=16.0, k2=1.3333333333333333, cutoff=0.85, force_bonds=True)

Get connectivity table from inverse distance matrix defined at last dimensions (…,N,N) and corresponding bond-radii.

Keeps shape with (…,N,N). Covalent radii are taken from Pyykko and Atsumi, Chem. Eur. J. 15, 2009, 188-197. Values for metals are decreased by 10% according to Robert Paton’s Sterimol implementation. Partially based on code from Robert Paton’s Sterimol script, which based this part on Grimme’s D3 code.

Parameters
  • invdistmat (np.array) – inverse distance matrix defined at last dimensions (…,N,N); distances must be in Angstroem, not in Bohr

  • protons (np.array) – An array of atomic numbers matching the invdistmat (…,N), for which the radii are to be computed.

  • radii_dict (np.array) – covalent radii for each element. If default=None, stored values are used. Otherwise array with covalent bonding radii. example: np.array([0, 0.24, 0.46, 1.2, …]) from {‘H’: 0.34, ‘He’: 0.46, ‘Li’: 1.2, …}

  • k1 (value) – default = 16

  • k2 (value) – default = 4.0/3.0

  • cutoff (value) – cutoff value to set values to Zero (no bond) default = 0.85

  • force_bonds (value) – whether to force at least one bond in the bond table per atom (default = True)

Returns

Connectivity table with 1 for chemical bond and zero otherwise, of shape (…,N,N) -> (…,N,N)

Return type

np.array

molreps.methods.geo_npy.get_indexmatrix(shape, flatten=False)

Matrix of indices with a_ijk… = [i,j,k,..] for shape (N,M,…,len(shape)) with Indexlist being the last dimension.

Note: numpy indexing does not work this way but as indexlist per dimension

Parameters
  • shape (list, int) – list of target shape, e.g. (2,2)

  • flatten (bool) – whether to flatten the output or keep inputshape, default=False

Returns

Index array of shape (N,M,…,len(shape)) e.g. [[[0,0],[0,1]],[[1,0],[1,1]]]

Return type

np.array
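
An index array with this layout can be built from `np.indices`. This is a sketch only; the helper name `index_matrix_sketch` is hypothetical, and the flattened layout (one index tuple per row) is an assumption.

```python
import numpy as np

def index_matrix_sketch(shape, flatten=False):
    """Array whose entry at position (i, j, ...) is the index tuple [i, j, ...]."""
    # np.indices returns shape (len(shape), *shape); move that axis to the end.
    idx = np.moveaxis(np.indices(shape), 0, -1)
    if flatten:
        idx = idx.reshape(-1, len(shape))
    return idx

index_matrix = index_matrix_sketch((2, 2))
index_flat = index_matrix_sketch((2, 3), flatten=True)
```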

molreps.methods.geo_npy.inversedistancematrix_to_coulombmatrix(dinv, proton_number)

Calculate Coulombmatrix from inverse distance Matrix plus nuclear charges/proton number.

Transform shape as (…,N,N) + (…,N) -> (…,N,N)

Parameters
  • dinv (np.array) – Inverse distance matrix defined at last two axis. Array of shape (…,N,N) with N number of atoms storing inverse distances.

  • proton_number (np.array) – Nuclear charges given in last dimension. Order must match entries in inverse distance matrix. array of shape (…,N)

Returns

Numpy array with Coulombmatrix at last two dimension (…,N,N).

The function multiplies Z_i*Z_j with 1/d_ij and sets the diagonal to 0.5*Z_i^2.4.

Return type

np.array
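
The rule stated above, off-diagonal Z_i*Z_j/d_ij and diagonal 0.5*Z_i^2.4, can be written compactly with broadcasting. This is a sketch of that formula rather than the library code, and the helper name `coulomb_matrix_sketch` is hypothetical.

```python
import numpy as np

def coulomb_matrix_sketch(dinv, proton_number):
    """Coulomb matrix from inverse distances (..., N, N) and charges (..., N)."""
    dinv = np.asarray(dinv, dtype=float)
    z = np.asarray(proton_number, dtype=float)
    off_diagonal = z[..., :, None] * z[..., None, :] * dinv
    on_diagonal = 0.5 * z[..., :, None] ** 2.4
    mask = np.eye(dinv.shape[-1], dtype=bool)
    return np.where(mask, on_diagonal, off_diagonal)

# Two atoms with charges 8 and 1 at unit distance.
cm = coulomb_matrix_sketch([[0.0, 1.0], [1.0, 0.0]], [8, 1])
```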

molreps.methods.geo_npy.invert_distance(d, nan=0, posinf=0, neginf=0)

Invert distance array, e.g. distance matrix.

Inversion is done for all entries. Keeping of shape (…,) -> (…,)

Parameters
  • d (np.array) – array of distance values of shape (…,)

  • nan (value) – replacement for np.nan after division, default = 0

  • posinf (value) – replacement for np.inf after division, default = 0

  • neginf (value) – replacement for -np.inf after division, default = 0

Returns

Inverted distance array as numpy array of identical shape (…,), where np.nan and np.inf are replaced, e.g. with 0.

Return type

np.array
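
The replacement of non-finite values after the division maps directly onto `np.nan_to_num`. The following is a minimal sketch with a hypothetical helper name `invert_distance_sketch`; whether the library uses the same call is an assumption.

```python
import numpy as np

def invert_distance_sketch(d, nan=0, posinf=0, neginf=0):
    """Element-wise 1/d with non-finite results replaced."""
    d = np.asarray(d, dtype=float)
    # Division by zero yields inf; silence the warning and clean up afterwards.
    with np.errstate(divide="ignore", invalid="ignore"):
        inv = 1.0 / d
    return np.nan_to_num(inv, nan=nan, posinf=posinf, neginf=neginf)

inv = invert_distance_sketch([[0.0, 2.0], [2.0, 0.0]])
```

For a distance matrix this sets the diagonal, where the distance is zero, to 0 instead of inf.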

molreps.methods.geo_npy.make_angle_list(ind1, ind2)

Generate list of indices that match all angles for connections defined by (ind1,ind2).

Angles are generated for each unique index in ind1, i.e. for each center. ind1 should be sorted. Vectorized, but requires memory for connections of Max_bonds_per_atom*Number_atoms. Uses masking.

Parameters
  • ind1 (np.array) – Indexlist of start index for a bond. This must be sorted. Shape (N,)

  • ind2 (np.array) – Indexlist of end index for a bond. Shape (N,)

Returns

Indexlist containing an angle-index-set. Shape (M,3)

Where the angle is defined by 0-1-2 as 1->0,1->2 or 1<(0,2)

Return type

np.array

molreps.methods.geo_npy.make_rotationmatrix(vector, angle)

Generate a rotation matrix around a given vector with a certain angle.

Only defined for 3 dimensions here.

Parameters
  • vector (np.array, list) – vector of rotation axis (3,) with (x,y,z)

  • angle (value) – angle in degrees ° to rotate around

Returns

Rotation matrix R of shape (3,3) that performs the rotation with y = R*x

Return type

list
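
One common way to build such a matrix is Rodrigues' rotation formula. The sketch below assumes a right-handed rotation sense and returns an np.array; both points and the helper name `rotation_matrix_sketch` are assumptions, not statements about the library.

```python
import numpy as np

def rotation_matrix_sketch(vector, angle_deg):
    """Rotation by angle_deg (degrees) around `vector`, via Rodrigues' formula."""
    axis = np.asarray(vector, dtype=float)
    axis = axis / np.linalg.norm(axis)
    theta = np.deg2rad(angle_deg)
    x, y, z = axis
    # Cross-product matrix K, so that K @ v == np.cross(axis, v).
    k = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)

rot_z90 = rotation_matrix_sketch([0.0, 0.0, 1.0], 90.0)
rotated_x = rot_z90 @ np.array([1.0, 0.0, 0.0])
```

Rotating the x unit vector by 90° around z gives the y unit vector, following y = R*x.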

molreps.methods.geo_npy.rigid_transform(a, b, correct_reflection=False)

Rotate and shift pointcloud A to pointcloud B. This should implement Kabsch algorithm.

Important: the numbering of points of A and B must match, no shuffled pointcloud. This works for 3 dimensions only. Uses SVD.

Parameters
  • a (np.array) – list of points (N,3) to rotate (and translate)

  • b (np.array) – list of points (N,3) to rotate towards: A to B, where the coordinates (3) are (x,y,z)

  • correct_reflection (bool) – Whether to allow reflections or just rotations. Default is False.

Returns

[A_rot,R,t]

  • A_rot (np.array): Rotated and shifted version of A to match B

  • R (np.array): Rotation matrix

  • t (np.array): translation from A to B

Return type

list
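
For reference, a textbook Kabsch alignment looks roughly like the sketch below. It always enforces a proper rotation (no reflection), which may differ from how `correct_reflection` is handled in the library; the helper name `kabsch_sketch` is hypothetical.

```python
import numpy as np

def kabsch_sketch(a, b):
    """Find R, t minimizing ||(a @ R.T + t) - b|| for matched (N, 3) point sets."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    center_a, center_b = a.mean(axis=0), b.mean(axis=0)
    # Cross-covariance of the centered point clouds.
    h = (a - center_a).T @ (b - center_b)
    u, _, vt = np.linalg.svd(h)
    # Flip the last singular direction if the result would be a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = center_b - r @ center_a
    return a @ r.T + t, r, t

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
r0 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 degrees about z
t0 = np.array([1.0, 2.0, 3.0])
b = a @ r0.T + t0
aligned, rot, shift = kabsch_sketch(a, b)
```

Because b is an exact rotated and shifted copy of a, the sketch recovers r0 and t0 up to rounding.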

molreps.methods.geo_npy.rotate_to_principle_axis(coord)

Rotate a pointcloud to its principal axes.

This can be a molecule but also some general data. It uses PCA via SVD from numpy.linalg.svd(). PCA from scikit uses SVD too (scipy.sparse.linalg).

Note

The data is centered before SVD but shifted back at the output.

Parameters

coord (np.array) – Array of points forming a pointcloud. Important: coord has shape (N,p) where N is the number of samples and p is the feature/coordinate dimension e.g. 3 for x,y,z

Returns

[R,rotated]

  • R (np.array): rotation matrix of shape (p,p) if input has (N,p)

  • rotated (np.array): rotated pointcloud of coord that was the input.

Return type

tuple
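
The PCA-via-SVD idea, including the centering and the shift back noted above, can be sketched as follows. The helper name `principal_axes_sketch` is hypothetical, and the returned matrix is orthogonal but not guaranteed to have determinant +1 in this sketch.

```python
import numpy as np

def principal_axes_sketch(coord):
    """Express a centered point cloud in its principal-axis basis, then shift back."""
    coord = np.asarray(coord, dtype=float)
    center = coord.mean(axis=0)
    centered = coord - center
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    rotated = centered @ vt.T + center
    return vt, rotated

cloud = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.0],
                  [3.0, 3.0, 0.5], [4.0, 4.0, 0.0]])
rot, rotated = principal_axes_sketch(cloud)
```

After the transformation the scatter matrix of the centered points is diagonal, i.e. the coordinates are uncorrelated.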

molreps.methods.geo_npy.sort_distmatrix(distance_matrix)

Sort a flexible shaped distance matrix along last dimension.

Keeps shape (…,N,M) -> index (…,N,M) + sorted (…,N,M)

Parameters

distance_matrix (np.array) – Matrix of distances of shape (…,N,M)

Returns

[sorting_index, sorted_distance]

  • sorting_index (np.array): Indices of sorted last dimension entries. Shape (…,N,M)

  • sorted_distance (np.array): Sorted distance Matrix, sorted at last dimension.

Return type

tuple
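
Sorting along the last dimension while keeping the sorting indices is a two-step NumPy pattern. This is an illustrative sketch with a hypothetical helper name `sort_last_axis_sketch`.

```python
import numpy as np

def sort_last_axis_sketch(distance_matrix):
    """Return (sorting_index, sorted_distance) along the last axis."""
    distance_matrix = np.asarray(distance_matrix)
    sorting_index = np.argsort(distance_matrix, axis=-1)
    sorted_distance = np.take_along_axis(distance_matrix, sorting_index, axis=-1)
    return sorting_index, sorted_distance

dm = np.array([[0.0, 2.0, 1.0], [2.0, 0.0, 3.0], [1.0, 3.0, 0.0]])
order, dm_sorted = sort_last_axis_sketch(dm)
```

The index array can be reused to reorder any other per-pair feature in the same way.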

molreps.methods.geo_npy.value_to_onehot(vals, compare)

Convert array of values e.g. nuclear charge to one-hot representation thereof.

A dictionary of all possible values is required. Expands shape from (…,) + (M,) -> (…,M)

Parameters
  • vals (np.array) – array of values to convert.

  • compare (np.array) – 1D-numpy array with a list of possible values.

Returns

A one-hot representation of the vals input with the last dimension expanded to match the compare dictionary. Entries are 1.0 if vals == compare[i] and 0.0 otherwise.

Return type

np.array
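
The comparison vals == compare[i] translates into a single broadcasted equality test. A minimal sketch, with the hypothetical helper name `onehot_sketch`:

```python
import numpy as np

def onehot_sketch(vals, compare):
    """One-hot encode `vals` against the 1D list of allowed values `compare`."""
    vals = np.asarray(vals)
    compare = np.asarray(compare)
    # (..., 1) == (M,) broadcasts to (..., M).
    return (vals[..., None] == compare).astype(float)

# Nuclear charges of H, C, O encoded against the allowed values H, C, N, O.
onehot = onehot_sketch([1, 6, 8], [1, 6, 7, 8])
```

A value that does not occur in compare yields a row of zeros.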

molreps.methods.legacy_mixed module

Old unused functions.

Do not use! Kept for record and comparison only.

molreps.methods.mol_pybel module

Functions for openbabel.

Specific functions for molecular features.

molreps.methods.mol_pybel.ob_build_xyz_string_from_list(atoms, coords)

Make an xyz string from atom and coordinate list.

Parameters
  • atoms (list) – Atom list of type [‘H’,’C’,’H’,…].

  • coords (array) – Coordinate list of shape (N,3).

Returns

XYZ string for a xyz file.

Return type

str
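
For orientation, the conventional xyz layout is an atom count, a comment line, and one line per atom. The sketch below builds such a string in pure Python; the exact number formatting and comment line used by the library are not known here, and the helper name `xyz_string_sketch` is hypothetical.

```python
def xyz_string_sketch(atoms, coords):
    """Build an xyz-format string: count, empty comment line, then 'El x y z' rows."""
    lines = [str(len(atoms)), ""]
    for atom, (x, y, z) in zip(atoms, coords):
        lines.append(f"{atom} {x:.10f} {y:.10f} {z:.10f}")
    return "\n".join(lines) + "\n"

xyz = xyz_string_sketch(["H", "H"], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]])
```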

molreps.methods.mol_pybel.ob_get_bond_table_from_coordinates(atoms, coords)

Get bond order information by reading an xyz string.

The order of atoms in the list should be the same as in the output, but the output is completely generated from OBMol.

Parameters
  • atoms (list) – Atom list of type [‘H’,’C’,’H’,…].

  • coords (array) – Coordinate list of shape (N,3).

Returns

ob_ats,ob_proton,bonds,ob_coord

  • ob_ats (list): Atom list of type [‘H’,’Car’,’O2’,…] of OBType i.e. with aromatic and state info.

  • ob_proton (list): Atomic Number of atoms as list.

  • bonds (list): Bond information of shape [i,j,order].

  • ob_coord (list): Coordinates as list.

Return type

tuple

molreps.methods.mol_pybel.ob_readXYZs(filename)

Read stacked xyz’s from file.

Parameters

filename (str) – Filepath.

Returns

[elements,coords]

  • elements (list): Molecule list of shape (Molecules,Atoms).

  • coords (list): Coordinate list of shape (Molecules,Atoms,3).

Return type

tuple

molreps.methods.mol_rdkit module

Functions for molecular properties from rdkit.

Note: All functions are supposed to work out of the box without any dependencies, i.e. do not depend on each other.

molreps.methods.mol_rdkit.rdkit_add_conformer(mol, coords, assign_id=False)

Add a conformer to a mol object.

Parameters
  • mol (rdkit.Chem.Mol) – Mol object to add conformer.

  • coords (array) – Array of coordinates of shape (N,3).

  • assign_id (bool,int) – Whether to assign a conformer ID. Default is False.

Returns

Mol Object with added conformer.

Return type

rdkit.Chem.Mol

molreps.methods.mol_rdkit.rdkit_atom_list(mol, key, method, args=None)

Make a list of atoms with atomic information from rdkit.mol.

Parameters
  • mol (rdkit.Chem.Mol) – Mol object to get information from.

  • key (str) – Key of property to put in list.

  • method (func) – Class member method for rdkit.Chem.rdchem.Atom.

  • args (dict, optional) – Optional arguments for class method. Defaults to {}.

Returns

Atomlist that can be used in a networkx graph of shape [(i, {key: property})]

Return type

list

molreps.methods.mol_rdkit.rdkit_bond_distance_list(mol, key, conf_selection=0, methods='ETKDG', seed=61453, max_distance=inf, max_partners=inf, bonds_only=True, exclusive=True)

Generate bond list with distance information from rdkit.mol.

Parameters
  • mol (rdkit.Chem.Mol) – Mol object to get information from.

  • key (str, optional) – Key of property to put in list.

  • conf_selection (int, optional) – Select a conformer. Defaults to 0.

  • methods (string, optional) – Method to generate conformer if none exist. Defaults to ‘ETKDG’.

  • seed (int, optional) – Random seed. Defaults to 0xf00d.

  • max_distance (value, optional) – Maximum distance to allow bonds. Defaults to np.inf.

  • max_partners (value, optional) – Maximum number of neighbours to allow bonds. Defaults to np.inf.

  • bonds_only (bool, optional) – Only take actual chemical bonds as bonds. Defaults to True.

Returns

Bondlist that can be used in a networkx graph of shape [(i,j, {key: distance})].

Return type

list

molreps.methods.mol_rdkit.rdkit_bond_list(mol, key, method, args=None)

Make a list of bonds with bond-type information from rdkit.mol.

Parameters
  • mol (rdkit.Chem.Mol) – Mol object to get information from.

  • key (str, optional) – Key of property to put in list.

  • method (func) – Class member method for rdkit.Chem.rdchem.Bond.

  • args (dict, optional) – Optional arguments for class method. Defaults to {}.

  • trafo (func) – Casting or trafo function. Default is int.

Returns

Bondlist that can be used in a networkx graph of shape [(i,j, {key: property})]

Return type

list

molreps.methods.mol_rdkit.rdkit_make_mol_from_structure(atoms, bondlist, coordinates=None)

Make a mol object with

Parameters
  • atoms (list) – List of atoms (N,).

  • bondlist (list, np.array) – Bond list matching atom index. Shape (N,3) or (N,2). For (N,3) last entry can give bond order.

  • coordinates (np.array, optional) – Coordinates of Atoms of shape (N,3). Defaults to None.

Returns

Mol Object with added conformer.

Return type

rdkit.Chem.Mol

molreps.methods.mol_rdkit.rdkit_mol_from_atoms_bonds(atoms, bonds, sani=False)

Convert an atom list and bond list to a rdkit mol class.

Parameters
  • atoms (list) – List of atoms (N,).

  • bonds (list, np.array) – Bond list matching atom index. Shape (N,3) or (N,2).

  • sani (bool, optional) – Whether to sanitize molecule. Defaults to False.

Returns

Rdkit Mol object. Molecule generated.

Return type

rdkit.Chem.Mol

molreps.methods.props_py module

Functions using only python.

Note: All functions are supposed to work out of the box without any dependencies, i.e. do not depend on each other.

molreps.methods.props_py.element_list_to_value(elem_list, replace_dict)

Translate list of atoms as string to a list of values according to a dictionary.

This is recursive and should also work for nested lists.

Parameters
  • elem_list (list) – List of elements like [‘H’,’C’,’O’]

  • replace_dict (dict) – python dictionary of atom label and value e.g. {‘H’:1,…}

Returns

List of values for each atom.

Return type

list
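
The recursive translation of nested lists can be sketched in a few lines of pure Python. The helper name `elements_to_values_sketch` is hypothetical, and unknown element labels simply raise a KeyError in this sketch.

```python
def elements_to_values_sketch(elem_list, replace_dict):
    """Replace every element label by its dictionary value, recursing into nested lists."""
    return [
        elements_to_values_sketch(entry, replace_dict)
        if isinstance(entry, list)
        else replace_dict[entry]
        for entry in elem_list
    ]

proton_numbers = {"H": 1, "C": 6, "O": 8}
values = elements_to_values_sketch([["H", "C"], ["O"]], proton_numbers)
```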

molreps.methods.props_py.get_atom_property_dicts(identifier)

Get a Dictionary for properties by identifier.

Parameters

identifier (str) – Which property to get.

Returns

Dictionary of Properties.

Return type

dict

Module contents