matador.fingerprints package¶
The fingerprints module provides functionality for calculating, in parallel, structural fingerprints like PDF and PXRD.
- matador.fingerprints.get_uniq_cursor(cursor, sim_tol=0.1, energy_tol=0.01, enforce_same_stoich=True, fingerprint=<class 'matador.fingerprints.pdf.PDF'>, hierarchy_order=None, hierarchy_values=None, debug=False, **fingerprint_calc_args) Tuple[List[int], Dict[int, int], List[Fingerprint], ndarray] [source]¶
Uses fingerprint to filter cursor into unique structures to some tolerance sim_tol, additionally returning a dict of duplicates and the correlation matrix.
The choice of which of the duplicates is kept in the unique cursor is defined by the “hierarchy”. By default, this will guess the provenance of a document and prefer structures from “primary sources”, i.e. ICSD -> OQMD -> Materials Project -> SWAPS -> AIRSS -> GA. A custom hierarchy can be provided through hierarchy_order, which must be accompanied by a list of values per structure (hierarchy_values) to check against that hierarchy.
- Parameters
cursor (list) – matador cursor to be filtered
- Keyword Arguments
fingerprint (Fingerprint) – fingerprint object type to compare (DEFAULT: PDF)
sim_tol (float/bool) – tolerance in similarity distance for duplicates (if True, default value of 0.1 used)
energy_tol (float) – compare only structures within a certain energy tolerance (1e20 if enforce_same_stoich is False)
enforce_same_stoich (bool) – compare only structures of the same stoichiometry
debug (bool) – print timings and list similarities
fingerprint_calc_args (dict) – kwargs to pass to fingerprint
- Returns
ordered list of indices of unique documents, a dict with keys from distinct_set, a list of Fingerprint objects, and the sparse correlation matrix of pairwise similarity distances
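The hierarchy-based choice described above can be sketched in plain Python. This is only an illustration of the idea, not matador's implementation: the function name `filter_unique`, the greedy strategy, and the shape of the returned duplicates dict (duplicate index mapped to the kept index) are all assumptions made for this example, and the distance matrix is taken as precomputed.

```python
from typing import Dict, List, Sequence, Tuple


def filter_unique(
    distances: Sequence[Sequence[float]],
    hierarchy_order: Sequence[str],
    hierarchy_values: Sequence[str],
    sim_tol: float = 0.1,
) -> Tuple[List[int], Dict[int, int]]:
    """Greedy duplicate filter (hypothetical helper, not part of matador).

    distances[i][j] is a precomputed similarity distance between structures
    i and j. Structures are visited from most to least preferred source, and
    each one is kept unless it lies within sim_tol of an already-kept one.
    """
    rank = {source: position for position, source in enumerate(hierarchy_order)}
    # sorted() is stable, so ties keep their original cursor order.
    visit_order = sorted(
        range(len(hierarchy_values)), key=lambda i: rank[hierarchy_values[i]]
    )

    kept: List[int] = []
    duplicate_of: Dict[int, int] = {}
    for i in visit_order:
        match = next((k for k in kept if distances[i][k] < sim_tol), None)
        if match is None:
            kept.append(i)
        else:
            duplicate_of[i] = match
    return sorted(kept), duplicate_of


# Structures 0 and 1 are near-identical; structure 1 comes from the preferred source.
distances = [
    [0.00, 0.02, 0.90],
    [0.02, 0.00, 0.85],
    [0.90, 0.85, 0.00],
]
unique, duplicates = filter_unique(
    distances,
    hierarchy_order=["ICSD", "OQMD", "AIRSS"],
    hierarchy_values=["AIRSS", "ICSD", "OQMD"],
)
```

Here structure 0 is dropped in favour of structure 1 because its source ranks lower in the supplied hierarchy, even though it appears first in the cursor.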
- class matador.fingerprints.Fingerprint(doc, lazy=True, *args, **kwargs)[source]¶
Bases:
ABC
- fingerprint = None¶
- default_key = None¶
- class matador.fingerprints.FingerprintFactory(cursor, required_inds=None, debug=False, **fprint_args)[source]¶
Bases:
ABC
This class computes Fingerprint objects from a list of structures, using multiprocessing to perform calculations concurrently. The computed fingerprints are stored in each structure’s dictionary under the default key defined by the Fingerprint objects.
Note
The number of processes used for concurrency is set by the following hierarchy: $SLURM_NTASKS -> $OMP_NUM_THREADS -> psutil.cpu_count(logical=False).
Compute fingerprints over n processes, where n is set by either $SLURM_NTASKS, $OMP_NUM_THREADS or physical core count.
- Parameters
fingerprint (Fingerprint) – class to compute for each structure
- Keyword Arguments
- fingerprint = None¶
- default_key = None¶
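The process-count hierarchy in the note above could be resolved along the following lines. This is a sketch under stated assumptions: the helper name `guess_num_processes` is hypothetical, and the fallback to `os.cpu_count()` when psutil is not installed is an addition for this example rather than documented behaviour.

```python
import os
from typing import Mapping, Optional


def guess_num_processes(env: Optional[Mapping[str, str]] = None) -> int:
    """Follow SLURM_NTASKS -> OMP_NUM_THREADS -> physical core count."""
    env = os.environ if env is None else env
    for variable in ("SLURM_NTASKS", "OMP_NUM_THREADS"):
        value = env.get(variable)
        if value:
            return int(value)
    try:
        import psutil  # third-party; optional in this sketch

        cores = psutil.cpu_count(logical=False)
    except ImportError:
        # Assumption: use the logical core count when psutil is unavailable.
        cores = os.cpu_count()
    # Both calls may return None on unusual platforms, so default to 1.
    return cores or 1
```

Passing an explicit mapping instead of relying on `os.environ` makes the precedence easy to check, e.g. `guess_num_processes({"SLURM_NTASKS": "4", "OMP_NUM_THREADS": "2"})` picks the SLURM value.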
- class matador.fingerprints.PDF(doc, lazy=False, **kwargs)[source]¶
Bases:
Fingerprint
This class implements the calculation and comparison of pair distribution functions.
- r_space¶
1-D array containing real space grid
- Type
ndarray
- gr¶
1-D array containing total PDF
- Type
ndarray
- elem_gr¶
dict with pairs of element symbol keys, containing 1-D arrays of projected PDFs (if calculated)
- Type
Initialise parameters and run PDF (unless lazy=True).
- Parameters
doc (dict) – matador document to calculate PDF of
- Keyword Arguments
dr (float) – bin width for PDF (Angstrom) (DEFAULT: 0.01)
gaussian_width (float) – width of Gaussian smearing (Angstrom) (DEFAULT: 0.01)
num_images (int/str) – number of unit cell images to include in PDF calculation (DEFAULT: ‘auto’)
max_num_images (int) – cutoff number of unit cells before crashing (DEFAULT: 50)
rmax (float) – maximum distance cutoff for PDF (Angstrom) (DEFAULT: 15)
projected (bool) – optionally calculate the element-projected PDF
standardize (bool) – standardize cell before calculating PDF
lazy (bool) – if True, calculator is not called when initializing PDF object
timing (bool) – if True, print the total time taken to calculate the PDF
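To illustrate how dr, gaussian_width and rmax interact, the sketch below builds a Gaussian-smeared histogram of interatomic distances for a finite set of points. It is deliberately simplified and is not the PDF class itself: periodic images (num_images) are ignored, and the usual normalisation by shell volume and number density is omitted. The function name is hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def smeared_distance_histogram(
    positions: Sequence[Sequence[float]],
    dr: float = 0.01,
    gaussian_width: float = 0.01,
    rmax: float = 15.0,
) -> Tuple[List[float], List[float]]:
    """Return (r_space, gr) for a finite cluster, without PDF normalisation."""
    num_bins = int(round(rmax / dr))
    r_space = [(i + 0.5) * dr for i in range(num_bins)]  # bin centres
    gr = [0.0] * num_bins
    norm = 1.0 / (gaussian_width * math.sqrt(2.0 * math.pi))
    for a, b in combinations(positions, 2):
        distance = math.dist(a, b)
        if distance > rmax:
            continue
        # Only bins within about 5 widths of the distance get a noticeable weight.
        lo = max(0, int((distance - 5 * gaussian_width) / dr))
        hi = min(num_bins, int((distance + 5 * gaussian_width) / dr) + 1)
        for i in range(lo, hi):
            offset = (r_space[i] - distance) / gaussian_width
            gr[i] += norm * math.exp(-0.5 * offset * offset)
    return r_space, gr


# Equilateral triangle with side 2 Angstrom: all three pair distances coincide.
triangle = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, math.sqrt(3.0), 0.0)]
r_space, gr = smeared_distance_histogram(triangle)
```

Each pair contributes a unit-area Gaussian, so the area under the curve counts the pairs within rmax; a smaller dr gives a finer grid, while gaussian_width controls how sharp each peak is.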
- class matador.fingerprints.PDFOverlap(pdf_a, pdf_b, projected=False)[source]¶
Bases:
object
Calculate the PDFOverlap between two PDF objects, pdf_a and pdf_b, with number density rescaling.
Perform the overlap and similarity distance calculations.
- Parameters
- Keyword Arguments
projected – if True, attempt to use projected PDFs.
- pdf_overlap()[source]¶
Calculate the overlap of two PDFs via a simple meshed sum of their difference.
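A "meshed sum of the difference" can be pictured as follows. This is a minimal sketch, assuming both PDFs are already sampled on the same grid with spacing dr; the normalisation by the combined area is an assumption chosen so that the result lies between 0 and 1, and the number density rescaling mentioned above is not included. The function name is hypothetical.

```python
from typing import Sequence


def similarity_distance(
    gr_a: Sequence[float], gr_b: Sequence[float], dr: float
) -> float:
    """Sum |gr_a - gr_b| on a shared grid, normalised to the range [0, 1]."""
    if len(gr_a) != len(gr_b):
        raise ValueError("both PDFs must be sampled on the same grid")
    difference = sum(abs(a - b) for a, b in zip(gr_a, gr_b)) * dr
    total = (sum(gr_a) + sum(gr_b)) * dr
    # Assumption: normalise by the combined area of the two (non-negative) curves.
    return difference / total if total > 0 else 0.0
```

With this convention identical curves give 0 and curves with no common support give 1, which makes a tolerance such as sim_tol easy to interpret.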
- class matador.fingerprints.CombinedProjectedPDF(pdf_cursor)[source]¶
Bases:
object
Take some computed PDFs and add them together.
Create CombinedPDF object from list of PDFs.
- class matador.fingerprints.PXRD(doc, wavelength: float = 1.5406, lorentzian_width: float = 0.03, two_theta_resolution: float = 0.01, two_theta_bounds: Tuple[float, float] = (0, 90), theta_m: float = 0.0, scattering_factors: str = 'RASPA', lazy=False, plot=False, progress=False, *args, **kwargs)[source]¶
Bases:
Fingerprint
This class computes powder X-ray diffraction patterns of a given crystal for a certain incident wavelength. The cell is standardised with spglib before computing PXRD.
This calculation takes into account atomic scattering factors, Lorentz polarisation and thermal broadening (with Debye-Waller factors set to 1). Note: this class does not perform any q-dependent peak broadening, and instead uses a simple Lorentzian broadening. The default width of 0.03 provides good agreement with e.g. GSAS-II’s default CuKa setup. Only one wavelength can be used at a time, but multiple patterns could be combined post hoc.
- self.peak_positions¶
sorted peak positions as values in 2θ
- Type
- self.hkls¶
Miller indices corresponding to peaks, sorted by peak angle.
- Type
- self.peak_intensities¶
intensity of each peak.
- Type
- self.pattern¶
Lorentzian-broadened pattern at values of self.two_thetas.
- Type
- self.two_thetas¶
continuous space of 2θ values corresponding to sample points of self.pattern.
- Type
Set up the PXRD, and compute it, if lazy is False.
- Parameters
doc (dict/Crystal) – matador document to compute PXRD for.
- Keyword Arguments
lorentzian_width (float) – width of Lorentzians for broadening (DEFAULT: 0.03)
wavelength (float) – incident X-ray wavelength in Å. (DEFAULT: CuKa, 1.5406 Å).
theta_m (float) – the monochromator angle in degrees (DEFAULT: 0)
two_theta_resolution (float) – resolution of the 2θ grid used for plotting.
two_theta_bounds (tuple of float) – values between which to compute the PXRD pattern.
scattering_factors (str) – either “GSAS” or “RASPA” (default), which set of atomic scattering factors to use.
lazy (bool) – whether to compute PXRD or just set it up.
plot (bool) – whether to display PXRD as a plot.
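As a rough illustration of how wavelength and two_theta_bounds determine which peaks appear, the sketch below applies Bragg's law, λ = 2d sin θ, to a simple cubic cell. It is an assumption-laden toy, not the PXRD class: it handles only cubic lattices, keeps one representative set of Miller indices per d-spacing, and computes no intensities, scattering factors or systematic absences. The function name and max_index parameter are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

Peak = Tuple[float, Tuple[int, int, int]]


def cubic_peak_positions(
    lattice_parameter: float,
    wavelength: float = 1.5406,
    two_theta_bounds: Tuple[float, float] = (0.0, 90.0),
    max_index: int = 3,
) -> List[Peak]:
    """Return (2-theta in degrees, hkl) pairs sorted by angle for a cubic cell."""
    peaks: Dict[int, Peak] = {}
    for hkl in product(range(max_index + 1), repeat=3):
        norm_sq = sum(index * index for index in hkl)
        if norm_sq == 0 or norm_sq in peaks:
            continue  # skip (0, 0, 0) and repeated d-spacings
        d_spacing = lattice_parameter / math.sqrt(norm_sq)
        sin_theta = wavelength / (2.0 * d_spacing)
        if sin_theta > 1.0:
            continue  # no diffraction possible at this wavelength
        two_theta = 2.0 * math.degrees(math.asin(sin_theta))
        if two_theta_bounds[0] <= two_theta <= two_theta_bounds[1]:
            peaks[norm_sq] = (two_theta, hkl)
    return sorted(peaks.values())


# Hypothetical cubic cell with a = 4 Angstrom, using the default wavelength.
peaks = cubic_peak_positions(4.0)
```

A shorter wavelength or wider two_theta_bounds admits reflections with smaller d-spacings, so max_index would need to grow accordingly.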
Submodules¶
matador.fingerprints.fingerprint module¶
This file implements the base class for all “fingerprints”, which here refers to any object derived from purely structural features of a crystal, e.g. pair distribution functions (PDF) or simulated powder X-ray diffraction (PXRD) spectra.
- class matador.fingerprints.fingerprint.Fingerprint(doc, lazy=True, *args, **kwargs)[source]¶
Bases:
ABC
- fingerprint = None¶
- default_key = None¶
- class matador.fingerprints.fingerprint.FingerprintFactory(cursor, required_inds=None, debug=False, **fprint_args)[source]¶
Bases:
ABC
This class computes Fingerprint objects from a list of structures, using multiprocessing to perform calculations concurrently. The computed fingerprints are stored in each structure’s dictionary under the default key defined by the Fingerprint objects.
Note
The number of processes used for concurrency is set by the following hierarchy: $SLURM_NTASKS -> $OMP_NUM_THREADS -> psutil.cpu_count(logical=False).
Compute fingerprints over n processes, where n is set by either $SLURM_NTASKS, $OMP_NUM_THREADS or physical core count.
- Parameters
fingerprint (Fingerprint) – class to compute for each structure
- Keyword Arguments
- fingerprint = None¶
- default_key = None¶
matador.fingerprints.pdf module¶
This submodule defines classes for computing, combining and convolving pair distribution functions.
- class matador.fingerprints.pdf.PDF(doc, lazy=False, **kwargs)[source]¶
Bases:
Fingerprint
This class implements the calculation and comparison of pair distribution functions.
- r_space¶
1-D array containing real space grid
- Type
ndarray
- gr¶
1-D array containing total PDF
- Type
ndarray
- elem_gr¶
dict with pairs of element symbol keys, containing 1-D arrays of projected PDFs (if calculated)
- Type
Initialise parameters and run PDF (unless lazy=True).
- Parameters
doc (dict) – matador document to calculate PDF of
- Keyword Arguments
dr (float) – bin width for PDF (Angstrom) (DEFAULT: 0.01)
gaussian_width (float) – width of Gaussian smearing (Angstrom) (DEFAULT: 0.01)
num_images (int/str) – number of unit cell images to include in PDF calculation (DEFAULT: ‘auto’)
max_num_images (int) – cutoff number of unit cells before crashing (DEFAULT: 50)
rmax (float) – maximum distance cutoff for PDF (Angstrom) (DEFAULT: 15)
projected (bool) – optionally calculate the element-projected PDF
standardize (bool) – standardize cell before calculating PDF
lazy (bool) – if True, calculator is not called when initializing PDF object
timing (bool) – if True, print the total time taken to calculate the PDF
- class matador.fingerprints.pdf.PDFFactory(cursor, required_inds=None, debug=False, **fprint_args)[source]¶
Bases:
FingerprintFactory
This class computes PDF objects from a list of structures, as concurrently as possible. The PDFs are stored under the pdf key inside each structure dict.
Compute PDFs over n processes, where n is set by either $SLURM_NTASKS, $OMP_NUM_THREADS or physical core count.
- Parameters
fingerprint (Fingerprint) – class to compute for each structure
- Keyword Arguments
- default_key = 'pdf'¶
- class matador.fingerprints.pdf.PDFOverlap(pdf_a, pdf_b, projected=False)[source]¶
Bases:
object
Calculate the PDFOverlap between two PDF objects, pdf_a and pdf_b, with number density rescaling.
Perform the overlap and similarity distance calculations.
- Parameters
- Keyword Arguments
projected – if True, attempt to use projected PDFs.
- pdf_overlap()[source]¶
Calculate the overlap of two PDFs via a simple meshed sum of their difference.
matador.fingerprints.pxrd module¶
This file implements the PXRD class for simulating the powder XRD pattern of a crystal.
- class matador.fingerprints.pxrd.PXRD(doc, wavelength: float = 1.5406, lorentzian_width: float = 0.03, two_theta_resolution: float = 0.01, two_theta_bounds: Tuple[float, float] = (0, 90), theta_m: float = 0.0, scattering_factors: str = 'RASPA', lazy=False, plot=False, progress=False, *args, **kwargs)[source]¶
Bases:
Fingerprint
This class computes powder X-ray diffraction patterns of a given crystal for a certain incident wavelength. The cell is standardised with spglib before computing PXRD.
This calculation takes into account atomic scattering factors, Lorentz polarisation and thermal broadening (with Debye-Waller factors set to 1). Note: this class does not perform any q-dependent peak broadening, and instead uses a simple Lorentzian broadening. The default width of 0.03 provides good agreement with e.g. GSAS-II’s default CuKa setup. Only one wavelength can be used at a time, but multiple patterns could be combined post hoc.
- self.peak_positions¶
sorted peak positions as values in 2θ
- Type
- self.hkls¶
Miller indices corresponding to peaks, sorted by peak angle.
- Type
- self.peak_intensities¶
intensity of each peak.
- Type
- self.pattern¶
Lorentzian-broadened pattern at values of self.two_thetas.
- Type
- self.two_thetas¶
continuous space of 2θ values corresponding to sample points of self.pattern.
- Type
Set up the PXRD, and compute it, if lazy is False.
- Parameters
doc (dict/Crystal) – matador document to compute PXRD for.
- Keyword Arguments
lorentzian_width (float) – width of Lorentzians for broadening (DEFAULT: 0.03)
wavelength (float) – incident X-ray wavelength in Å. (DEFAULT: CuKa, 1.5406 Å).
theta_m (float) – the monochromator angle in degrees (DEFAULT: 0)
two_theta_resolution (float) – resolution of the 2θ grid used for plotting.
two_theta_bounds (tuple of float) – values between which to compute the PXRD pattern.
scattering_factors (str) – either “GSAS” or “RASPA” (default), which set of atomic scattering factors to use.
lazy (bool) – whether to compute PXRD or just set it up.
plot (bool) – whether to display PXRD as a plot.
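The relationship between discrete peaks and the broadened pattern can be sketched as a sum of Lorentzians sampled on a regular 2θ grid. This is an illustration only: treating lorentzian_width as the half width at half maximum and normalising the pattern to a maximum of 1 are assumptions made for this example, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def broaden_pattern(
    peak_positions: Sequence[float],
    peak_intensities: Sequence[float],
    lorentzian_width: float = 0.03,
    two_theta_resolution: float = 0.01,
    two_theta_bounds: Tuple[float, float] = (0.0, 90.0),
) -> Tuple[List[float], List[float]]:
    """Return (two_thetas, pattern) with a Lorentzian placed on every peak."""
    start, stop = two_theta_bounds
    num_points = int(round((stop - start) / two_theta_resolution)) + 1
    two_thetas = [start + i * two_theta_resolution for i in range(num_points)]
    # Assumption: lorentzian_width acts as the half width at half maximum.
    width_sq = lorentzian_width**2
    pattern = [
        sum(
            intensity * width_sq / ((x - position) ** 2 + width_sq)
            for position, intensity in zip(peak_positions, peak_intensities)
        )
        for x in two_thetas
    ]
    # Normalise so that the strongest point of the pattern equals 1.
    maximum = max(pattern)
    if maximum > 0:
        pattern = [value / maximum for value in pattern]
    return two_thetas, pattern


# Two hypothetical peaks, the second half as intense as the first.
two_thetas, pattern = broaden_pattern(
    [30.0, 50.0], [1.0, 0.5], two_theta_bounds=(20.0, 60.0)
)
```

Because the width is fixed, every peak has the same shape regardless of angle, which matches the note above that no q-dependent broadening is applied.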
- class matador.fingerprints.pxrd.PXRDFactory(cursor, required_inds=None, debug=False, **fprint_args)[source]¶
Bases:
FingerprintFactory
Compute PXRD patterns over n processes, where n is set by either $SLURM_NTASKS, $OMP_NUM_THREADS or physical core count.
- Parameters
fingerprint (Fingerprint) – class to compute for each structure
- Keyword Arguments
- default_key = 'pxrd'¶
matador.fingerprints.similarity module¶
This submodule implements filtering based on Fingerprint objects, although only PDF has been implemented so far.
- matador.fingerprints.similarity.get_uniq_cursor(cursor, sim_tol=0.1, energy_tol=0.01, enforce_same_stoich=True, fingerprint=<class 'matador.fingerprints.pdf.PDF'>, hierarchy_order=None, hierarchy_values=None, debug=False, **fingerprint_calc_args) Tuple[List[int], Dict[int, int], List[Fingerprint], ndarray] [source]¶
Uses fingerprint to filter cursor into unique structures to some tolerance sim_tol, additionally returning a dict of duplicates and the correlation matrix.
The choice of which of the duplicates is kept in the unique cursor is defined by the “hierarchy”. By default, this will guess the provenance of a document and prefer structures from “primary sources”, i.e. ICSD -> OQMD -> Materials Project -> SWAPS -> AIRSS -> GA. A custom hierarchy can be provided through hierarchy_order, which must be accompanied by a list of values per structure (hierarchy_values) to check against that hierarchy.
- Parameters
cursor (list) – matador cursor to be filtered
- Keyword Arguments
fingerprint (Fingerprint) – fingerprint object type to compare (DEFAULT: PDF)
sim_tol (float/bool) – tolerance in similarity distance for duplicates (if True, default value of 0.1 used)
energy_tol (float) – compare only structures within a certain energy tolerance (1e20 if enforce_same_stoich is False)
enforce_same_stoich (bool) – compare only structures of the same stoichiometry
debug (bool) – print timings and list similarities
fingerprint_calc_args (dict) – kwargs to pass to fingerprint
- Returns
ordered list of indices of unique documents, a dict with keys from distinct_set, a list of Fingerprint objects, and the sparse correlation matrix of pairwise similarity distances