matador.fingerprints package

The fingerprints module provides functionality for calculating, in parallel, structural fingerprints like PDF and PXRD.

matador.fingerprints.get_uniq_cursor(cursor, sim_tol=0.1, energy_tol=0.01, enforce_same_stoich=True, fingerprint=<class 'matador.fingerprints.pdf.PDF'>, hierarchy_order=None, hierarchy_values=None, debug=False, **fingerprint_calc_args) Tuple[List[int], Dict[int, int], List[matador.fingerprints.fingerprint.Fingerprint], numpy.ndarray][source]

Uses fingerprint to filter cursor into unique structures to some tolerance sim_tol, additionally returning a dict of duplicates and the correlation matrix.

The choice of which of the dulpicates is kept in the unique cursor is defined by the “hierarchy”. By default, this will guess the provenance of a document and prefer structures from “primary sources”, i.e. ICSD -> OQMD -> Materials Project -> SWAPS -> AIRSS -> GA. A custom hiearchy can be provided through hierarchy_order, which must be accompanied by a list of values per structure to check against that hierarchy.

Parameters

cursor (list) – matador cursor to be filtered

Keyword Arguments
  • fingerprint (Fingerprint) – fingerprint object type to compare (DEFAULT: PDF)

  • sim_tol (float/bool) – tolerance in similarity distance for duplicates (if True, default value of 0.1 used)

  • energy_tol (float) – compare only structures within a certain energy tolerance (1e20 if enforce_same_stoich is False)

  • enforce_same_stoich (bool) – compare only structures of the same stoichiometry

  • debug (bool) – print timings and list similarities

  • fingerprint_calc_args (dict) – kwargs to pass to fingerprint

Returns

ordered list of indices of unique documents, a dict with keys from distinct_set, a list of Fingerprint objects, and the sparse correlation matrix of pairwise similarity distances

class matador.fingerprints.Fingerprint(doc, lazy=True, *args, **kwargs)[source]

Bases: abc.ABC

fingerprint = None
default_key = None
abstract calculate()[source]
class matador.fingerprints.FingerprintFactory(cursor, required_inds=None, debug=False, **fprint_args)[source]

Bases: abc.ABC

This class computes Fingerprint objects from a list of structures, using multiprocessing to perform calculations concurrently. The computed fingerprints are stored in each structure’s dictionary under the default key defined by the Fingerprint objects.

Note

The number of processes used to concurrency is set by the following hierarchy: $SLURM_NTASKS -> $OMP_NUM_THREADS -> psutil.cpu_count(logical=False).

nprocs

number of concurrent processes to be used.

Type

int

Compute PDFs over n processes, where n is set by either $SLURM_NTASKS, $OMP_NUM_THREADS or physical core count.

Parameters
  • cursor (list of dict) – list of matador structures

  • fingerprint (Fingerprint) – class to compute for each structure

Keyword Arguments
  • pdf_args (dict) – arguments to pass to the fingerprint calculator

  • required_inds (list(int)) – indices in cursor to skip.

fingerprint = None
default_key = None
class matador.fingerprints.PDF(doc, lazy=False, **kwargs)[source]

Bases: matador.fingerprints.fingerprint.Fingerprint

This class implements the calculation and comparison of pair distribution functions.

r_space

1-D array containing real space grid

Type

ndarray

gr

1-D array containing total PDF

Type

ndarray

dr

real-space grid spacing in Å

Type

float

rmax

extent of real-space grid in Å

Type

float

label

structure label

Type

str

elem_gr

dict with pairs of element symbol keys, containing 1-D arrays of projected PDFs (if calculated)

Type

dict

number_density

number density for renormalisation and comparison with other PDFs

Type

float

kwargs

arguments used to create PDF

Type

dict

Initialise parameters and run PDF (unless lazy=True).

Parameters

doc (dict) – matador document to calculate PDF of

Keyword Arguments
  • dr (float) – bin width for PDF (Angstrom) (DEFAULT: 0.01)

  • gaussian_width (float) – width of Gaussian smearing (Angstrom) (DEFAULT: 0.01)

  • num_images (int/str) – number of unit cell images include in PDF calculation (DEFAULT: ‘auto’)

  • max_num_images (int) – cutoff number of unit cells before crashing (DEFAULT: 50)

  • rmax (float) – maximum distance cutoff for PDF (Angstrom) (DEFAULT: 15)

  • projected (bool) – optionally calculate the element-projected PDF

  • standardize (bool) – standardize cell before calculating PDF

  • lazy (bool) – if True, calculator is not called when initializing PDF object

  • timing (bool) – if True, print the total time taken to calculate the PDF

calc_pdf()[source]

Wrapper to calculate PDF with current settings.

calculate()[source]

Alias for self.calc_pdf.

get_sim_distance(pdf_b, projected=False)[source]

Return the similarity between two PDFs.

pdf()[source]

Return G(r) and the r_space for easy plotting.

plot_projected_pdf(**kwargs)[source]

Plot projected PDFs.

Keyword Arguments
  • keys (list) – plot only a subset of projections, e.g. [(‘K’, )].

  • other_pdfs (list of PDF) – other PDFs to plot.

plot(**kwargs)[source]

Plot PDFs.

Keyword Arguments

other_pdfs (list of PDF) – other PDFs to add to the plot.

class matador.fingerprints.PDFOverlap(pdf_a, pdf_b, projected=False)[source]

Bases: object

Calculate the PDFOverlap between two PDF objects, pdf_a and pdf_b, with number density rescaling.

pdf_a

first PDF to compare.

Type

PDF

pdf_b

second PDF to compare.

Type

PDF

fine_dr

fine grid scale on which to compare.

Type

float

similarity_distance

“distance” between PDFs.

Type

float

overlap_int

the value of the overlap integral.

Type

float

Perform the overlap and similarity distance calculations.

Parameters
  • pdf_a (PDF) – first PDF to compare.

  • pdf_b (PDF) – second PDF to compare.

Keyword Arguments

projected – if True, attempt to use projected PDFs.

pdf_overlap()[source]

Calculate the overlap of two PDFs via a simple meshed sum of their difference.

projected_pdf_overlap()[source]

Calculate the overlap of two projected PDFs via a simple meshed sum of their difference.

plot_diff()[source]

Simple plot for comparing two PDFs.

plot_projected_diff()[source]

Simple plot for comparing two projected PDFs.

class matador.fingerprints.CombinedProjectedPDF(pdf_cursor)[source]

Bases: object

Take some computed PDFs and add them together.

Create CombinedPDF object from list of PDFs.

Parameters

pdf_cursor (list of PDF) – list of PDF objects to combine.

plot_projected_pdf()[source]

Plot the combined PDF.

class matador.fingerprints.PXRD(doc, wavelength: float = 1.5406, lorentzian_width: float = 0.03, two_theta_resolution: float = 0.01, two_theta_bounds: Tuple[float, float] = (0, 90), theta_m: float = 0.0, scattering_factors: str = 'RASPA', lazy=False, plot=False, progress=False, *args, **kwargs)[source]

Bases: matador.fingerprints.fingerprint.Fingerprint

This class for computes powder X-ray diffraction patterns of a given crystal for a certain incident wavelength. The cell is standardised with spglib before computing PXRD.

This calculation takes into account atomic scattering factors, Lorentz polarisation and thermal broadening (with Debye-Waller factors set to 1). Note: this class does not perform any q-dependent peak broadening, and instead uses a simple Lorentzian broadening. The default width of 0.03 provides good agreement with e.g. GSAS-II’s default CuKa setup. Only one wavelength can be used at a time, but multiple patterns could be combined post hoc.

self.peak_positions

sorted peak positions as values in 2θ

Type

numpy.ndarray

self.hkls

Miller indices correspnding to peaks, sorted by peak angle.

Type

numpy.ndarray

self.peak_intensities

intensity of each peak.

Type

numpy.ndarray

self.pattern

Lorentzian-broadened pattern at values of self.two_thetas.

Type

numpy.ndarray

self.two_thetas

continuous space of 2θ values corresponding to sample points of self.pattern.

Type

numpy.ndarray

Set up the PXRD, and compute it, if lazy is False.

Parameters

doc (dict/Crystal) – matador document to compute PXRD for.

Keyword Arguments
  • lorentzian_width (float) – width of Lorentzians for broadening (DEFAULT: 0.03)

  • wavelength (float) – incident X-ray wavelength (DEFAULT: CuKa, 1.5406).

  • theta_m (float) – the monochromator angle in degrees (DEFAULT: 0)

  • two_theta_resolution (float) – resolution of grid 2θ used for plotting.

  • two_theta_bounds (tuple of float) – values between which to compute the PXRD pattern.

  • scattering_factors (str) – either “GSAS” or “RASPA” (default), which set of atomic scattering factors to use.

  • lazy (bool) – whether to compute PXRD or just set it up.

  • plot (bool) – whether to display PXRD as a plot.

calc_pxrd()[source]

Calculate the PXRD pattern.

calculate()[source]

Alias for calculating the PXRD pattern.

atomic_scattering_factor(q_mag, species)[source]

Return fit for particular atom at given q-vector.

Parameters
  • q_mag (float) – magnitude of the q_vector.

  • species (str) – the element label.

Returns

the atomic scattering factor.

Return type

float

plot(**kwargs)[source]

Wrapper function to plot the PXRD pattern.

save_pattern(fname)[source]

Write a file to fname that contains the xy coordinates of the PXRD pattern.

save_peaks(fname)[source]

Write a file to fname that contains the peak list.

Submodules

matador.fingerprints.fingerprint module

This file implements the base class for all “fingerprints”, which here refers to any object derived from purely structural features of a crystal, e.g. pair distribution functions (PDF) or simulated powder X-ray diffraction (PXRD) spectra.

class matador.fingerprints.fingerprint.Fingerprint(doc, lazy=True, *args, **kwargs)[source]

Bases: abc.ABC

fingerprint = None
default_key = None
abstract calculate()[source]
class matador.fingerprints.fingerprint.FingerprintFactory(cursor, required_inds=None, debug=False, **fprint_args)[source]

Bases: abc.ABC

This class computes Fingerprint objects from a list of structures, using multiprocessing to perform calculations concurrently. The computed fingerprints are stored in each structure’s dictionary under the default key defined by the Fingerprint objects.

Note

The number of processes used to concurrency is set by the following hierarchy: $SLURM_NTASKS -> $OMP_NUM_THREADS -> psutil.cpu_count(logical=False).

nprocs

number of concurrent processes to be used.

Type

int

Compute PDFs over n processes, where n is set by either $SLURM_NTASKS, $OMP_NUM_THREADS or physical core count.

Parameters
  • cursor (list of dict) – list of matador structures

  • fingerprint (Fingerprint) – class to compute for each structure

Keyword Arguments
  • pdf_args (dict) – arguments to pass to the fingerprint calculator

  • required_inds (list(int)) – indices in cursor to skip.

fingerprint = None
default_key = None

matador.fingerprints.pdf module

This submodule defines classes for computing, combining and convolving pair distribution functions.

class matador.fingerprints.pdf.PDF(doc, lazy=False, **kwargs)[source]

Bases: matador.fingerprints.fingerprint.Fingerprint

This class implements the calculation and comparison of pair distribution functions.

r_space

1-D array containing real space grid

Type

ndarray

gr

1-D array containing total PDF

Type

ndarray

dr

real-space grid spacing in Å

Type

float

rmax

extent of real-space grid in Å

Type

float

label

structure label

Type

str

elem_gr

dict with pairs of element symbol keys, containing 1-D arrays of projected PDFs (if calculated)

Type

dict

number_density

number density for renormalisation and comparison with other PDFs

Type

float

kwargs

arguments used to create PDF

Type

dict

Initialise parameters and run PDF (unless lazy=True).

Parameters

doc (dict) – matador document to calculate PDF of

Keyword Arguments
  • dr (float) – bin width for PDF (Angstrom) (DEFAULT: 0.01)

  • gaussian_width (float) – width of Gaussian smearing (Angstrom) (DEFAULT: 0.01)

  • num_images (int/str) – number of unit cell images include in PDF calculation (DEFAULT: ‘auto’)

  • max_num_images (int) – cutoff number of unit cells before crashing (DEFAULT: 50)

  • rmax (float) – maximum distance cutoff for PDF (Angstrom) (DEFAULT: 15)

  • projected (bool) – optionally calculate the element-projected PDF

  • standardize (bool) – standardize cell before calculating PDF

  • lazy (bool) – if True, calculator is not called when initializing PDF object

  • timing (bool) – if True, print the total time taken to calculate the PDF

calc_pdf()[source]

Wrapper to calculate PDF with current settings.

calculate()[source]

Alias for self.calc_pdf.

get_sim_distance(pdf_b, projected=False)[source]

Return the similarity between two PDFs.

pdf()[source]

Return G(r) and the r_space for easy plotting.

plot_projected_pdf(**kwargs)[source]

Plot projected PDFs.

Keyword Arguments
  • keys (list) – plot only a subset of projections, e.g. [(‘K’, )].

  • other_pdfs (list of PDF) – other PDFs to plot.

plot(**kwargs)[source]

Plot PDFs.

Keyword Arguments

other_pdfs (list of PDF) – other PDFs to add to the plot.

class matador.fingerprints.pdf.PDFFactory(cursor, required_inds=None, debug=False, **fprint_args)[source]

Bases: matador.fingerprints.fingerprint.FingerprintFactory

This class computes PDF objects from a list of structures, as concurrently as possible. The PDFs are stored under the pdf key inside each structure dict.

nprocs

number of concurrent processes.

Type

int

Compute PDFs over n processes, where n is set by either $SLURM_NTASKS, $OMP_NUM_THREADS or physical core count.

Parameters
  • cursor (list of dict) – list of matador structures

  • fingerprint (Fingerprint) – class to compute for each structure

Keyword Arguments
  • pdf_args (dict) – arguments to pass to the fingerprint calculator

  • required_inds (list(int)) – indices in cursor to skip.

default_key = 'pdf'
fingerprint

alias of matador.fingerprints.pdf.PDF

class matador.fingerprints.pdf.PDFOverlap(pdf_a, pdf_b, projected=False)[source]

Bases: object

Calculate the PDFOverlap between two PDF objects, pdf_a and pdf_b, with number density rescaling.

pdf_a

first PDF to compare.

Type

PDF

pdf_b

second PDF to compare.

Type

PDF

fine_dr

fine grid scale on which to compare.

Type

float

similarity_distance

“distance” between PDFs.

Type

float

overlap_int

the value of the overlap integral.

Type

float

Perform the overlap and similarity distance calculations.

Parameters
  • pdf_a (PDF) – first PDF to compare.

  • pdf_b (PDF) – second PDF to compare.

Keyword Arguments

projected – if True, attempt to use projected PDFs.

pdf_overlap()[source]

Calculate the overlap of two PDFs via a simple meshed sum of their difference.

projected_pdf_overlap()[source]

Calculate the overlap of two projected PDFs via a simple meshed sum of their difference.

plot_diff()[source]

Simple plot for comparing two PDFs.

plot_projected_diff()[source]

Simple plot for comparing two projected PDFs.

class matador.fingerprints.pdf.CombinedProjectedPDF(pdf_cursor)[source]

Bases: object

Take some computed PDFs and add them together.

Create CombinedPDF object from list of PDFs.

Parameters

pdf_cursor (list of PDF) – list of PDF objects to combine.

plot_projected_pdf()[source]

Plot the combined PDF.

matador.fingerprints.pxrd module

This file implements the PXRD class for simulating powder XRD pattern of a crystal.

class matador.fingerprints.pxrd.PXRD(doc, wavelength: float = 1.5406, lorentzian_width: float = 0.03, two_theta_resolution: float = 0.01, two_theta_bounds: Tuple[float, float] = (0, 90), theta_m: float = 0.0, scattering_factors: str = 'RASPA', lazy=False, plot=False, progress=False, *args, **kwargs)[source]

Bases: matador.fingerprints.fingerprint.Fingerprint

This class for computes powder X-ray diffraction patterns of a given crystal for a certain incident wavelength. The cell is standardised with spglib before computing PXRD.

This calculation takes into account atomic scattering factors, Lorentz polarisation and thermal broadening (with Debye-Waller factors set to 1). Note: this class does not perform any q-dependent peak broadening, and instead uses a simple Lorentzian broadening. The default width of 0.03 provides good agreement with e.g. GSAS-II’s default CuKa setup. Only one wavelength can be used at a time, but multiple patterns could be combined post hoc.

self.peak_positions

sorted peak positions as values in 2θ

Type

numpy.ndarray

self.hkls

Miller indices correspnding to peaks, sorted by peak angle.

Type

numpy.ndarray

self.peak_intensities

intensity of each peak.

Type

numpy.ndarray

self.pattern

Lorentzian-broadened pattern at values of self.two_thetas.

Type

numpy.ndarray

self.two_thetas

continuous space of 2θ values corresponding to sample points of self.pattern.

Type

numpy.ndarray

Set up the PXRD, and compute it, if lazy is False.

Parameters

doc (dict/Crystal) – matador document to compute PXRD for.

Keyword Arguments
  • lorentzian_width (float) – width of Lorentzians for broadening (DEFAULT: 0.03)

  • wavelength (float) – incident X-ray wavelength (DEFAULT: CuKa, 1.5406).

  • theta_m (float) – the monochromator angle in degrees (DEFAULT: 0)

  • two_theta_resolution (float) – resolution of grid 2θ used for plotting.

  • two_theta_bounds (tuple of float) – values between which to compute the PXRD pattern.

  • scattering_factors (str) – either “GSAS” or “RASPA” (default), which set of atomic scattering factors to use.

  • lazy (bool) – whether to compute PXRD or just set it up.

  • plot (bool) – whether to display PXRD as a plot.

calc_pxrd()[source]

Calculate the PXRD pattern.

calculate()[source]

Alias for calculating the PXRD pattern.

atomic_scattering_factor(q_mag, species)[source]

Return fit for particular atom at given q-vector.

Parameters
  • q_mag (float) – magnitude of the q_vector.

  • species (str) – the element label.

Returns

the atomic scattering factor.

Return type

float

plot(**kwargs)[source]

Wrapper function to plot the PXRD pattern.

save_pattern(fname)[source]

Write a file to fname that contains the xy coordinates of the PXRD pattern.

save_peaks(fname)[source]

Write a file to fname that contains the peak list.

class matador.fingerprints.pxrd.PXRDFactory(cursor, required_inds=None, debug=False, **fprint_args)[source]

Bases: matador.fingerprints.fingerprint.FingerprintFactory

Compute PDFs over n processes, where n is set by either $SLURM_NTASKS, $OMP_NUM_THREADS or physical core count.

Parameters
  • cursor (list of dict) – list of matador structures

  • fingerprint (Fingerprint) – class to compute for each structure

Keyword Arguments
  • pdf_args (dict) – arguments to pass to the fingerprint calculator

  • required_inds (list(int)) – indices in cursor to skip.

fingerprint

alias of matador.fingerprints.pxrd.PXRD

default_key = 'pxrd'

matador.fingerprints.similarity module

This submodule implements filtering based on Fingerprint objects, although only PDF has been implemented so far.

matador.fingerprints.similarity.get_uniq_cursor(cursor, sim_tol=0.1, energy_tol=0.01, enforce_same_stoich=True, fingerprint=<class 'matador.fingerprints.pdf.PDF'>, hierarchy_order=None, hierarchy_values=None, debug=False, **fingerprint_calc_args) Tuple[List[int], Dict[int, int], List[matador.fingerprints.fingerprint.Fingerprint], numpy.ndarray][source]

Uses fingerprint to filter cursor into unique structures to some tolerance sim_tol, additionally returning a dict of duplicates and the correlation matrix.

The choice of which of the dulpicates is kept in the unique cursor is defined by the “hierarchy”. By default, this will guess the provenance of a document and prefer structures from “primary sources”, i.e. ICSD -> OQMD -> Materials Project -> SWAPS -> AIRSS -> GA. A custom hiearchy can be provided through hierarchy_order, which must be accompanied by a list of values per structure to check against that hierarchy.

Parameters

cursor (list) – matador cursor to be filtered

Keyword Arguments
  • fingerprint (Fingerprint) – fingerprint object type to compare (DEFAULT: PDF)

  • sim_tol (float/bool) – tolerance in similarity distance for duplicates (if True, default value of 0.1 used)

  • energy_tol (float) – compare only structures within a certain energy tolerance (1e20 if enforce_same_stoich is False)

  • enforce_same_stoich (bool) – compare only structures of the same stoichiometry

  • debug (bool) – print timings and list similarities

  • fingerprint_calc_args (dict) – kwargs to pass to fingerprint

Returns

ordered list of indices of unique documents, a dict with keys from distinct_set, a list of Fingerprint objects, and the sparse correlation matrix of pairwise similarity distances