molreps.methods package
Submodules
molreps.methods.geo_npy module
Molecular geometric feature representation based on numpy.
A loose collection of modular functions to compute distance matrices, angles, coordinates, connectivity, etc. Many functions also work on batches, and ideally all functions are vectorized. Note: all functions are meant to work out of the box without any dependencies, i.e. they do not depend on each other.
-
molreps.methods.geo_npy.add_edges_reverse_indices(edge_indices, edge_values=None, remove_duplicates=True, sort_indices=True)
For each edge (i,j), add the reverse edge (j,i) with the same edge value. If the reverse edge already exists, no edge is added. By default, all indices are sorted.
- Parameters
- Returns
edge_indices or [edge_indices, edge_values]
- Return type
np.array
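A minimal sketch of the reverse-edge idea in plain numpy, assuming edges are given as an (E,2) index array. The name `add_edges_reverse_indices_sketch` is hypothetical, and unlike the documented function it always removes duplicates and sorts (both via `np.unique`); a reverse edge that already exists keeps its original value.

```python
import numpy as np

def add_edges_reverse_indices_sketch(edge_indices, edge_values=None):
    """Hypothetical sketch: add (j, i) for every (i, j), keep one copy per edge, sort."""
    edge_indices = np.asarray(edge_indices)
    reverse = edge_indices[:, ::-1]  # (i, j) -> (j, i)
    all_indices = np.concatenate([edge_indices, reverse], axis=0)
    # np.unique on rows drops duplicates and returns rows in lexicographic order;
    # return_index gives the first occurrence, i.e. the original edge if present
    unique_indices, first_pos = np.unique(all_indices, axis=0, return_index=True)
    if edge_values is None:
        return unique_indices
    edge_values = np.asarray(edge_values)
    # the reversed edge inherits the value of the edge it was created from
    all_values = np.concatenate([edge_values, edge_values], axis=0)
    return unique_indices, all_values[first_pos]
```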
-
molreps.methods.geo_npy.all_angle_combinations(ind1, ind2)
Get all angles between ALL possible bonds, including unrelated bonds, e.g. (1,2) and (17,20), which are not connected. Input shape is (…,N).
Note: This is mostly impractical and not wanted; see make_angle_list for normal use.
- Parameters
ind1 (np.array) – Index list of start indices for a bond. This must be sorted. Shape (…,N)
ind2 (np.array) – Index list of end indices for a bond. Shape (…,N)
- Returns
- np.array: index tuples of shape (…,N*N/2-N,2,2), where the bonds are specified at the last axis and the bond pairs at axis=-2
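A sketch of pairing every bond with every other bond, assuming 1D inputs of shape (N,). The name `all_bond_pairs_sketch` is hypothetical. It enumerates each unordered pair of distinct bonds exactly once via `np.triu_indices`, which gives N(N-1)/2 pairs; the count and ordering produced by the library function may differ.

```python
import numpy as np

def all_bond_pairs_sketch(ind1, ind2):
    """Hypothetical sketch: pair every bond with every other bond exactly once."""
    bonds = np.stack([np.asarray(ind1), np.asarray(ind2)], axis=-1)  # (N, 2)
    first, second = np.triu_indices(len(bonds), k=1)  # all index pairs i < j
    # shape (P, 2, 2): axis -2 selects the bond within a pair, axis -1 its two atom indices
    return np.stack([bonds[first], bonds[second]], axis=-2)
```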
-
molreps.methods.geo_npy.coordinates_from_distancematrix(distance, use_center=None, dim=3)
Compute a list of coordinates from a distance matrix of shape (N,N).
Uses a vectorized algorithm, described in http://scripts.iucr.org/cgi-bin/paper?S0567739478000522 and https://www.researchgate.net/publication/252396528_Stable_calculation_of_coordinates_from_distance_information
No check for positive semi-definiteness or for a possible embedding dimension k >= 3 is done here. Performs SVD from numpy. May even work for (…,N,N), but this is not tested.
- Parameters
- Returns
List of Atom coordinates [[x_1,x_2,x_3],[x_1,x_2,x_3],…]
- Return type
np.array
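A sketch of recovering coordinates from distances via classical multidimensional scaling (Gram matrix from double centering, then an eigendecomposition). This is an assumption about the general approach: the documented function uses SVD and a `use_center` option, whereas this hypothetical `coordinates_from_distances_sketch` always centers on the centroid and uses `np.linalg.eigh`. Coordinates are only determined up to rotation, reflection, and translation.

```python
import numpy as np

def coordinates_from_distances_sketch(distance, dim=3):
    """Classical MDS sketch: coordinates up to rotation/reflection/translation."""
    d2 = np.asarray(distance, dtype=float) ** 2
    n = d2.shape[0]
    # Double centering puts the origin at the centroid: G = -1/2 * J D^2 J
    j = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * j @ d2 @ j
    # The Gram matrix is symmetric; eigh returns eigenvalues in ascending order
    eigval, eigvec = np.linalg.eigh(gram)
    order = np.argsort(eigval)[::-1][:dim]
    # Negative eigenvalues (non-Euclidean input) are clipped, no further checks
    scale = np.sqrt(np.clip(eigval[order], 0.0, None))
    return eigvec[:, order] * scale
```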
-
molreps.methods.geo_npy.coordinates_to_distancematrix(coord3d)
Transform coordinates to a distance matrix.
Applies the transformation on the last dimension, changing the shape (…,N,3) -> (…,N,N).
- Parameters
coord3d (np.array) – Coordinates of shape (…,N,3) with Cartesian coordinates (x,y,z) in the last dimension and N the number of atoms or points.
- Returns
distance matrix as numpy array with shape (…,N,N) where N is the number of atoms
- Return type
np.array
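A broadcasting sketch of the coordinates-to-distance transformation, which works for any leading batch axes. The name `coordinates_to_distancematrix_sketch` is hypothetical and the library implementation may differ in detail.

```python
import numpy as np

def coordinates_to_distancematrix_sketch(coord3d):
    coord3d = np.asarray(coord3d, dtype=float)
    # (..., N, 1, 3) - (..., 1, N, 3) -> pairwise difference vectors (..., N, N, 3)
    diff = coord3d[..., :, None, :] - coord3d[..., None, :, :]
    # Euclidean norm over the coordinate axis -> (..., N, N)
    return np.sqrt(np.sum(diff ** 2, axis=-1))
```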
-
molreps.methods.geo_npy.coulombmatrix_to_inversedistance_proton(coulmat, unit_conversion=1)
Convert a Coulomb matrix back to an inverse distance matrix plus atomic numbers.
(…,N,N) -> (…,N,N) + (…,N)
- Parameters
coulmat (np.array) – Full Coulomb matrix of shape (…,N,N)
unit_conversion (float) – Unit conversion factor for distances. Default is 1.
- Returns
[inv_dist,z]
inv_dist (np.array): Inverse distance matrix of shape (…,N,N)
z (np.array): Atomic (proton) numbers recovered from the diagonal.
- Return type
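A sketch of inverting the Coulomb matrix definition C_ij = Z_i*Z_j/d_ij, C_ii = 0.5*Z_i^2.4 (as stated for inversedistancematrix_to_coulombmatrix in this module). The name `coulomb_to_invdist_sketch` is hypothetical, the `unit_conversion` factor is deliberately ignored, and the recovered Z are floats that may need rounding.

```python
import numpy as np

def coulomb_to_invdist_sketch(coulmat):
    coulmat = np.asarray(coulmat, dtype=float)
    diag = np.diagonal(coulmat, axis1=-2, axis2=-1)
    # Invert C_ii = 0.5 * Z_i**2.4
    z = (2.0 * diag) ** (1.0 / 2.4)
    zz = z[..., :, None] * z[..., None, :]
    # Invert C_ij = Z_i * Z_j / d_ij; guard against Z = 0 (padding atoms)
    with np.errstate(divide="ignore", invalid="ignore"):
        inv_dist = np.where(zz > 0, coulmat / zz, 0.0)
    n = coulmat.shape[-1]
    inv_dist = inv_dist * (1 - np.eye(n))  # no self-distance on the diagonal
    return inv_dist, z
```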
-
molreps.methods.geo_npy.define_adjacency_from_distance(distance_matrix, max_distance=inf, max_neighbours=inf, exclusive=True, self_loops=False)
Construct an adjacency matrix from a distance matrix by distance and number of neighbours. Works for batches.
This does not take into account special bonds (e.g. chemical bonds), just a general distance measure. Tries to connect nearest neighbours.
- Parameters
distance_matrix (np.array) – Distance matrix of shape (…,N,N)
max_distance (float, optional) – Maximum distance to allow connections, can also be None. Defaults to np.inf.
max_neighbours (int, optional) – Maximum number of neighbours, can also be None. Defaults to np.inf.
exclusive (bool, optional) – Whether both the max_distance and the max_neighbours conditions must be fulfilled. Defaults to True.
self_loops (bool, optional) – Allow self-loops on the diagonal. Defaults to False.
- Returns
[graph_adjacency,graph_indices]
graph_adjacency (np.array): Adjacency matrix of shape (…,N,N) of dtype=np.bool.
graph_indices (np.array): Flattened indices from the former array where the adjacency is True.
- Return type
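A sketch of the distance/neighbour criterion, assuming that `exclusive=True` combines the two conditions with AND and `exclusive=False` with OR, and that the neighbour rank excludes the diagonal unless self-loops are allowed. The name `adjacency_from_distance_sketch` is hypothetical, and `np.argwhere` stands in for the library's index output.

```python
import numpy as np

def adjacency_from_distance_sketch(distance_matrix, max_distance=np.inf, max_neighbours=np.inf,
                                   exclusive=True, self_loops=False):
    max_distance = np.inf if max_distance is None else max_distance
    max_neighbours = np.inf if max_neighbours is None else max_neighbours
    d = np.asarray(distance_matrix, dtype=float)
    n = d.shape[-1]
    eye = np.eye(n, dtype=bool)  # broadcasts over leading batch axes
    # Push the diagonal to the end of the ranking unless self-loops are allowed
    d_rank = d if self_loops else np.where(eye, np.inf, d)
    # rank[..., i, j] = position of j in the sorted neighbour list of i
    rank = np.argsort(np.argsort(d_rank, axis=-1, kind="stable"), axis=-1)
    within_distance = d <= max_distance
    within_neighbours = rank < max_neighbours
    if exclusive:
        adj = within_distance & within_neighbours
    else:
        adj = within_distance | within_neighbours
    if not self_loops:
        adj = adj & ~eye
    return adj, np.argwhere(adj)
```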
-
molreps.methods.geo_npy.distance_to_gaussdistance(distance, bins=30, gauss_range=5.0, gauss_sigma=0.2)
Convert a distance array to a smooth one-hot representation using Gaussian functions.
Changes the shape for the Gaussian distance (…,) -> (…,GBins). The default values assume units of Angstrom.
- Parameters
distance (np.array) – Array of distances of shape (…,)
bins (int) – Number of bins to sample the distance from, default = 30
gauss_range (value) – Maximum distance to be captured by the bins, default = 5.0
gauss_sigma (value) – Sigma of the Gaussian function, determining the width/sharpness, default = 0.2
- Returns
Numpy array of Gaussian distances with an expanded last axis (…,GBins)
- Return type
np.array
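A sketch of a Gaussian distance expansion, assuming `bins` evenly spaced centres from 0 to `gauss_range` and the form exp(-(d - c)^2 / (2*sigma^2)). Both the centre placement and the exact exponent are assumptions, and `gauss_expand_sketch` is a hypothetical name.

```python
import numpy as np

def gauss_expand_sketch(distance, bins=30, gauss_range=5.0, gauss_sigma=0.2):
    distance = np.asarray(distance, dtype=float)
    # Assumed: evenly spaced Gaussian centres from 0 to gauss_range
    centres = np.linspace(0.0, gauss_range, bins)
    # (...,) -> (..., 1) so it broadcasts against the (bins,) centres
    diff = distance[..., None] - centres
    return np.exp(-(diff ** 2) / (2.0 * gauss_sigma ** 2))
```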
-
molreps.methods.geo_npy.geometry_from_coulombmat(coulmat, unit_conversion=1)
Generate a geometry from a Coulomb matrix.
- Parameters
coulmat (np.array) – Coulomb matrix of shape (N,N).
unit_conversion (value, optional) – Unit conversion factor for distances. Defaults to 1.
- Returns
[ats,cords]
ats (list): List of atoms e.g. [‘C’,’C’].
cords (np.array): Coordinates of shape (N,3).
- Return type
-
molreps.methods.geo_npy.get_angles(coords, inds)
Compute angles between coordinates (…,N,3) from a matching index list of shape (…,M,3) with (ind0,ind1,ind2).
Angles are between ind1<(ind0,ind2), taking coords[ind]. The angle is oriented as ind1->ind0, ind1->ind2.
- Parameters
coords (np.array) – List of coordinates of points (…,N,3)
inds (np.array) – Index list of points (…,M,3), i.e. coords[i] with i in axis=-1.
- Returns
[angle_sin,angle_cos,angles,norm_vec1,norm_vec2]
angle_sin (np.array): sin() of the angles ind1<(ind0,ind2)
angle_cos (np.array): cos() of the angles ind1<(ind0,ind2)
angles (np.array): angles in radians
norm_vec1 (np.array): length of the vector ind1->ind0
norm_vec2 (np.array): length of the vector ind1->ind2
- Return type
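A non-batched sketch of the angle computation for coords (N,3) and inds (M,3), with the vertex at ind1. The name `get_angles_sketch` is hypothetical; using `arctan2(sin, cos)` is a choice made here for numerical stability and may not be what the library does.

```python
import numpy as np

def get_angles_sketch(coords, inds):
    coords = np.asarray(coords, dtype=float)
    inds = np.asarray(inds)
    # vectors from the vertex ind1 to the two outer points ind0 and ind2
    v1 = coords[inds[:, 0]] - coords[inds[:, 1]]  # ind1 -> ind0
    v2 = coords[inds[:, 2]] - coords[inds[:, 1]]  # ind1 -> ind2
    norm1 = np.linalg.norm(v1, axis=-1)
    norm2 = np.linalg.norm(v2, axis=-1)
    cos = np.sum(v1 * v2, axis=-1) / (norm1 * norm2)
    sin = np.linalg.norm(np.cross(v1, v2), axis=-1) / (norm1 * norm2)
    angles = np.arctan2(sin, cos)  # radians in [0, pi]
    return sin, cos, angles, norm1, norm2
```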
-
molreps.methods.geo_npy.get_connectivity_from_inversedistancematrix(invdistmat, protons, radii_dict=None, k1=16.0, k2=1.3333333333333333, cutoff=0.85, force_bonds=True)
Get a connectivity table from an inverse distance matrix defined at the last dimensions (…,N,N) and the corresponding bond radii.
Keeps the shape (…,N,N). Covalent radii are from Pyykko and Atsumi, Chem. Eur. J. 15, 2009, 188-197. Values for metals are decreased by 10% according to Robert Paton’s Sterimol implementation. Partially based on code from Robert Paton’s Sterimol script, which based this part on Grimme’s D3 code.
- Parameters
invdistmat (np.array) – Inverse distance matrix defined at the last dimensions (…,N,N). Distances must be in Angstrom, not in Bohr.
protons (np.array) – An array of atomic numbers matching the invdistmat (…,N), for which the radii are to be computed.
radii_dict (np.array) – Covalent radii for each element. If None (default), stored values are used. Otherwise an array with covalent bonding radii, for example np.array([0, 0.24, 0.46, 1.2, …]) from {‘H’: 0.34, ‘He’: 0.46, ‘Li’: 1.2, …}
k1 (value) – default = 16
k2 (value) – default = 4.0/3.0
cutoff (value) – Cutoff value for setting entries to zero (no bond), default = 0.85
force_bonds (value) – Whether to force at least one bond in the bond table per atom (default = True)
- Returns
np.array: Connectivity table of shape (…,N,N) with 1 for a chemical bond and 0 otherwise, (…,N,N) -> (…,N,N)
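A sketch of a D3-style bond criterion, assuming the Grimme damping 1/(1+exp(-k1*(k2*(r_i+r_j)/r_ij - 1))) that the defaults k1=16 and k2=4/3 suggest, with entries below `cutoff` set to 0. The name `connectivity_sketch` is hypothetical, `force_bonds` and the library's stored radii are not reproduced, and the hydrogen radius 0.32 in the usage test is only a placeholder value.

```python
import numpy as np

def connectivity_sketch(invdistmat, protons, radii, k1=16.0, k2=4.0 / 3.0, cutoff=0.85):
    invdistmat = np.asarray(invdistmat, dtype=float)
    r = np.asarray(radii, dtype=float)[np.asarray(protons)]  # covalent radius per atom
    # scaled sum of covalent radii for every atom pair
    rco = k2 * (r[..., :, None] + r[..., None, :])
    # smooth 0..1 bond measure: 1 / (1 + exp(-k1 * (rco / r_ij - 1)))
    damp = 1.0 / (1.0 + np.exp(-k1 * (rco * invdistmat - 1.0)))
    bonds = np.where(damp < cutoff, 0, 1)
    n = bonds.shape[-1]
    return bonds * (1 - np.eye(n, dtype=int))  # no self-bonds
```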
-
molreps.methods.geo_npy.get_indexmatrix(shape, flatten=False)
Matrix of indices with a_ijk… = [i,j,k,..] for shape (N,M,…,len(shape)), with the index list being the last dimension.
Note: numpy indexing does not work this way, but with one index list per dimension.
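A sketch built on `np.indices`, which returns one index array per dimension; moving that axis to the end gives the per-entry index list described above. The name `get_indexmatrix_sketch` is hypothetical, and reading `flatten=True` as "reshape to (prod(shape), len(shape))" is an assumption.

```python
import numpy as np

def get_indexmatrix_sketch(shape, flatten=False):
    # np.indices gives shape (len(shape), *shape); move the index axis to the end
    ind = np.moveaxis(np.indices(shape), 0, -1)
    if flatten:
        ind = ind.reshape(-1, len(shape))  # assumed meaning of flatten
    return ind
```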
-
molreps.methods.geo_npy.inversedistancematrix_to_coulombmatrix(dinv, proton_number)
Calculate the Coulomb matrix from an inverse distance matrix plus nuclear charges (proton numbers).
Transforms the shape as (…,N,N) + (…,N) -> (…,N,N)
- Parameters
dinv (np.array) – Inverse distance matrix defined at the last two axes. Array of shape (…,N,N) with N the number of atoms, storing inverse distances.
proton_number (np.array) – Nuclear charges given in the last dimension. The order must match the entries in the inverse distance matrix. Array of shape (…,N)
- Returns
- Numpy array with the Coulomb matrix at the last two dimensions (…,N,N).
The function multiplies Z_i*Z_j with 1/d_ij and sets the diagonal to 0.5*Z_i^2.4.
- Return type
np.array
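A broadcasting sketch of the formula stated above (off-diagonal Z_i*Z_j/d_ij, diagonal 0.5*Z_i^2.4). The name `invdist_to_coulomb_sketch` is hypothetical; it works for leading batch axes because `np.eye` and `np.where` broadcast.

```python
import numpy as np

def invdist_to_coulomb_sketch(dinv, proton_number):
    dinv = np.asarray(dinv, dtype=float)
    z = np.asarray(proton_number, dtype=float)
    # off-diagonal entries: Z_i * Z_j / d_ij
    cm = z[..., :, None] * z[..., None, :] * dinv
    n = cm.shape[-1]
    eye = np.eye(n, dtype=bool)
    # diagonal entries: 0.5 * Z_i**2.4 (row-wise value picked where i == j)
    return np.where(eye, 0.5 * z[..., :, None] ** 2.4, cm)
```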
-
molreps.methods.geo_npy.invert_distance(d, nan=0, posinf=0, neginf=0)
Invert a distance array, e.g. a distance matrix.
Inversion is done element-wise for all entries, keeping the shape (…,) -> (…,)
- Parameters
d (np.array) – Array of distance values of shape (…,)
nan (value) – Replacement for np.nan after division, default = 0
posinf (value) – Replacement for np.inf after division, default = 0
neginf (value) – Replacement for -np.inf after division, default = 0
- Returns
- Inverted distance array of identical shape (…,), with np.nan and ±np.inf replaced by the given values (e.g. 0).
- Return type
np.array
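A sketch of the inversion using `np.nan_to_num`, whose `nan`, `posinf`, and `neginf` keywords (numpy 1.17+) match the parameters listed above. The name `invert_distance_sketch` is hypothetical; the zeros on a distance-matrix diagonal become inf and are then replaced.

```python
import numpy as np

def invert_distance_sketch(d, nan=0.0, posinf=0.0, neginf=0.0):
    d = np.asarray(d, dtype=float)
    # silence the division-by-zero warning for zeros, e.g. on the diagonal
    with np.errstate(divide="ignore", invalid="ignore"):
        inv = 1.0 / d
    return np.nan_to_num(inv, nan=nan, posinf=posinf, neginf=neginf)
```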
-
molreps.methods.geo_npy.make_angle_list(ind1, ind2)
Generate a list of indices matching all angles for the connections defined by (ind1,ind2).
Angles are collected for each unique index in ind1, i.e. for each center; ind1 should be sorted. The function is vectorized but requires memory of Max_bonds_per_atom*Number_atoms for the connections. Uses masking.
- Parameters
ind1 (np.array) – Index list of start indices for a bond. This must be sorted. Shape (N,)
ind2 (np.array) – Index list of end indices for a bond. Shape (N,)
- Returns
- Index list containing one angle-index set per angle, shape (M,3), where the angle is defined by 0-1-2 as 1->0,1->2 or 1<(0,2).
- Return type
out (np.array)
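A loop-based sketch of the same angle enumeration, written for readability rather than with the masking-based vectorization described above. The name `make_angle_list_sketch` is hypothetical, and listing each unordered pair of bond partners once (rather than both orderings) is an assumption.

```python
import numpy as np
from itertools import combinations

def make_angle_list_sketch(ind1, ind2):
    ind1 = np.asarray(ind1)
    ind2 = np.asarray(ind2)
    angles = []
    for centre in np.unique(ind1):
        partners = ind2[ind1 == centre]
        # each unordered pair of bond partners forms one angle partner_a-centre-partner_b
        for a, b in combinations(partners, 2):
            angles.append((a, centre, b))
    return np.array(angles, dtype=int).reshape(-1, 3)
```

For water with O as atom 0 and bonds listed in both directions, only the oxygen has two partners, so a single angle (1,0,2) results.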
-
molreps.methods.geo_npy.make_rotationmatrix(vector, angle)
Generate a rotation matrix for a rotation around a given vector by a certain angle.
Only defined for 3 dimensions here.
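A sketch using Rodrigues' rotation formula R = I + sin(a)K + (1 - cos(a))K^2, where K is the cross-product matrix of the normalized axis. The name `make_rotationmatrix_sketch` is hypothetical, and the right-hand (counterclockwise) sign convention is an assumption about the library.

```python
import numpy as np

def make_rotationmatrix_sketch(vector, angle):
    k = np.asarray(vector, dtype=float)
    k = k / np.linalg.norm(k)  # unit rotation axis
    # cross-product matrix K, so that K @ v == np.cross(k, v)
    kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    # Rodrigues: R = I + sin(a) K + (1 - cos(a)) K^2
    return np.eye(3) + np.sin(angle) * kx + (1.0 - np.cos(angle)) * kx @ kx
```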
-
molreps.methods.geo_npy.rigid_transform(a, b, correct_reflection=False)
Rotate and shift point cloud A onto point cloud B. This should implement the Kabsch algorithm.
Important: the points of A and B must be numbered in the same order (no shuffled point cloud). This works for 3 dimensions only. Uses SVD.
Note
Explanation of the Kabsch algorithm: https://en.wikipedia.org/wiki/Kabsch_algorithm
For further literature, see https://link.springer.com/article/10.1007/s10015-016-0265-x and https://link.springer.com/article/10.1007%2Fs001380050048
May work for (…,N,3), but this is not tested.
- Parameters
a (np.array) – List of points (N,3) to rotate (and translate)
b (np.array) – List of points (N,3) to rotate towards (A to B), where the coordinates (3) are (x,y,z)
correct_reflection (bool) – Whether to allow reflections or just rotations. Default is False.
- Returns
[A_rot,R,t]
A_rot (np.array): Rotated and shifted version of A to match B
R (np.array): Rotation matrix
t (np.array): translation from A to B
- Return type
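A sketch of the standard Kabsch algorithm: center both clouds, take the SVD of the 3x3 cross-covariance, and build R = V diag(1,1,d) U^T. The name `kabsch_sketch` and its `force_rotation` flag are hypothetical; in this sketch `force_rotation=True` guarantees det(R) = +1, and how this maps onto the library's `correct_reflection` flag is not confirmed.

```python
import numpy as np

def kabsch_sketch(a, b, force_rotation=True):
    """Hypothetical Kabsch sketch: find R, t minimising sum ||R a_i + t - b_i||^2."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ca, cb = a.mean(axis=0), b.mean(axis=0)
    # 3x3 cross-covariance of the centred point clouds
    h = (a - ca).T @ (b - cb)
    u, _, vt = np.linalg.svd(h)
    d = np.eye(3)
    if force_rotation and np.linalg.det(vt.T @ u.T) < 0:
        d[2, 2] = -1.0  # flip the weakest axis so det(R) = +1 (no reflection)
    r = vt.T @ d @ u.T
    t = cb - r @ ca
    a_rot = a @ r.T + t  # apply R and t to every row of a
    return a_rot, r, t
```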
-
molreps.methods.geo_npy.rotate_to_principle_axis(coord)
Rotate a point cloud to its principal axes.
This can be a molecule, but also general data. It uses PCA via SVD from numpy.linalg.svd(). PCA from scikit-learn also uses SVD (scipy.sparse.linalg).
Note
The data is centered before the SVD but shifted back at the output.
- Parameters
coord (np.array) – Array of points forming a point cloud. Important: coord has shape (N,p), where N is the number of samples and p is the feature/coordinate dimension, e.g. 3 for x,y,z
- Returns
[R,rotated]
R (np.array): Rotation matrix of shape (p,p) if the input has shape (N,p)
rotated (np.array): Rotated point cloud of the input coord.
- Return type
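A PCA-via-SVD sketch: center, take the right singular vectors as principal axes, rotate, then add the centroid back as the note above describes. The name `rotate_to_principal_axes_sketch` is hypothetical, and returning `vt` (rows are the axes, ordered by decreasing variance) as R is an assumption about the library's convention.

```python
import numpy as np

def rotate_to_principal_axes_sketch(coord):
    coord = np.asarray(coord, dtype=float)
    centre = coord.mean(axis=0)
    centred = coord - centre
    # rows of vt are the principal axes, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    rotated = centred @ vt.T + centre  # rotate, then shift back to the original centroid
    return vt, rotated
```

After the rotation the covariance of the output is diagonal, which is what makes the axes "principal".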
-
molreps.methods.geo_npy.sort_distmatrix(distance_matrix)
Sort a distance matrix of flexible shape along the last dimension.
Keeps the shape (…,N,M) -> index (…,N,M) + sorted (…,N,M)
- Parameters
distance_matrix (np.array) – Matrix of distances of shape (…,N,M)
- Returns
[sorting_index, sorted_distance]
sorting_index (np.array): Indices of sorted last dimension entries. Shape (…,N,M)
sorted_distance (np.array): Sorted distance Matrix, sorted at last dimension.
- Return type
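A sketch of sorting along the last axis while keeping the shape, pairing `np.argsort` with `np.take_along_axis`. The name `sort_distmatrix_sketch` is hypothetical.

```python
import numpy as np

def sort_distmatrix_sketch(distance_matrix):
    distance_matrix = np.asarray(distance_matrix)
    sorting_index = np.argsort(distance_matrix, axis=-1)
    # gather the values in sorted order along the last axis, keeping the shape
    sorted_distance = np.take_along_axis(distance_matrix, sorting_index, axis=-1)
    return sorting_index, sorted_distance
```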
-
molreps.methods.geo_npy.value_to_onehot(vals, compare)
Convert an array of values, e.g. nuclear charges, to a one-hot representation.
A dictionary of all possible values is required. Expands the shape from (…,) + (M,) -> (…,M)
- Parameters
vals (np.array) – Array of values to convert.
compare (np.array) – 1D numpy array with a list of possible values.
- Returns
- A one-hot representation of vals with the last dimension expanded to match the compare dictionary. Entries are 1.0 if vals == compare[i] and 0.0 otherwise.
- Return type
np.array
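A one-line broadcasting sketch of the one-hot conversion: comparing vals[..., None] against the (M,) dictionary yields the (…,M) mask. The name `value_to_onehot_sketch` is hypothetical.

```python
import numpy as np

def value_to_onehot_sketch(vals, compare):
    vals = np.asarray(vals)
    compare = np.asarray(compare)
    # (..., 1) == (M,) broadcasts to (..., M); True/False -> 1.0/0.0
    return (vals[..., None] == compare).astype(float)
```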
molreps.methods.legacy_mixed module
Old, unused functions.
Do not use! Kept only for the record and for comparison.
molreps.methods.mol_pybel module
Functions for openbabel.
Specific functions for molecular features.
-
molreps.methods.mol_pybel.ob_build_xyz_string_from_list(atoms, coords)
Make an xyz string from an atom list and a coordinate list.
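A pure-Python sketch of the plain xyz text layout (atom count, comment line, then one "symbol x y z" line per atom), which needs no openbabel. The name `build_xyz_string_sketch`, the empty default comment, and the six-decimal number formatting are assumptions, not the library's exact output.

```python
def build_xyz_string_sketch(atoms, coords, comment=""):
    """Hypothetical sketch of the xyz text format (no openbabel needed)."""
    lines = [str(len(atoms)), comment]  # header: atom count, then a free comment line
    for symbol, (x, y, z) in zip(atoms, coords):
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"
```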
-
molreps.methods.mol_pybel.ob_get_bond_table_from_coordinates(atoms, coords)
Get bond order information by reading an xyz string.
The order of atoms in the input list should be the same as in the output, but the output is generated entirely from the OBMol object.
- Parameters
atoms (list) – Atom list of type [‘H’,’C’,’H’,…].
coords (array) – Coordinate list of shape (N,3).
- Returns
ob_ats,ob_proton,bonds,ob_coord
ob_ats (list): Atom list of type [‘H’,’Car’,’O2’,…] of OBType i.e. with aromatic and state info.
ob_proton (list): Atomic Number of atoms as list.
bonds (list): Bond information of shape [i,j,order].
ob_coord (list): Coordinates as list.
- Return type
molreps.methods.mol_rdkit module
Functions for molecular properties from rdkit.
Note: all functions are meant to work out of the box without any dependencies, i.e. they do not depend on each other.
-
molreps.methods.mol_rdkit.rdkit_add_conformer(mol, coords, assign_id=False)
Add a conformer to a mol object.
-
molreps.methods.mol_rdkit.rdkit_atom_list(mol, key, method, args=None)
Make a list of atoms with atomic information from rdkit.mol.
- Parameters
- Returns
Atom list that can be used in a networkx graph, of shape [(i, {key: property})]
- Return type
-
molreps.methods.mol_rdkit.
rdkit_bond_distance_list
(mol, key, conf_selection=0, methods='ETKDG', seed=61453, max_distance=inf, max_partners=inf, bonds_only=True, exclusive=True)[source]¶ Generate bond list with distance information from rdkit.mol.
- Parameters
mol (rdkit.Chem.Mol) – Mol object to get information from.
key (str, optional) – Key of property to put in list.
conf_selection (int, optional) – Select a conformer. Defaults to 0.
methods (string, optional) – Method to generate conformer if none exist. Defaults to ‘ETKDG’.
seed (TYPE, optional) – Random seed. Defaults to 0xf00d.
max_distance (value, optional) – Maximum distance to allow bonds. Defaults to np.inf.
max_partners (value, optional) – Maximum nieghbour to allow bonds. Defaults to np.inf.
bonds_only (TYPE, optional) – Only take actual chemical bonds as bonds. Defaults to True.
- Returns
Bondlist that can be used in a networkx graph of shape [(i,j, {key: distance})].
- Return type
-
molreps.methods.mol_rdkit.rdkit_bond_list(mol, key, method, args=None)
Make a list of bonds with bond-type information from rdkit.mol.
- Parameters
mol (rdkit.Chem.Mol) – Mol object to get information from.
key (str, optional) – Key of property to put in list.
method (func) – Class member method for rdkit.Chem.rdchem.Bond.
args (dict, optional) – Optional arguments for the class method. Defaults to {}.
trafo (func) – Casting or transformation function. Default is int.
- Returns
Bond list that can be used in a networkx graph, of shape [(i,j, {key: property})]
- Return type
-
molreps.methods.mol_rdkit.rdkit_make_mol_from_structure(atoms, bondlist, coordinates=None)
Make a mol object with
- Parameters
- Returns
Mol Object with added conformer.
- Return type
rdkit.Chem.Mol
molreps.methods.props_py module
Functions using only Python.
Note: all functions are meant to work out of the box without any dependencies, i.e. they do not depend on each other.