CrystalFp: the crystal fingerprinting project

CrystalFp started as a way to solve a problem with the USPEX crystal structure predictor. Every USPEX run produces hundreds or thousands of crystal structures, some of which may be identical. To ease the extraction of unique and potentially interesting structures, a method was needed to find and remove duplicate structures.

[CrystalFp end user application]

The approach adopted was to apply usual high-dimensional classification concepts to the unusual field of crystallography.

We adopted a visual design and validation method to develop a classifier library (CrystalFp) and an end-user application to select and validate method choices, to gain users' acceptance and to tap into their domain expertise.

Using the end-user application with real datasets, we experimented with various crystal structure descriptors, distinct distance measures and different clustering methods to identify groups of similar structures. These methods are already applied to organic molecules in combinatorial chemistry, for a different goal and in somewhat different forms, but are not widely used for crystal structure classification.
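The descriptor, distance and clustering pipeline described above can be illustrated with a minimal sketch. It is not the CrystalFp library code or its API: the function names, the distance-histogram descriptor, the cosine distance, the threshold value and the toy coordinates are all assumptions chosen for illustration, and periodic boundary conditions are ignored.

```python
import math
from itertools import combinations


def fingerprint(atoms, cutoff=5.0, bins=20):
    """Histogram of pairwise interatomic distances shorter than `cutoff`.

    `atoms` is a list of (x, y, z) Cartesian coordinates. Periodic images
    are ignored, which a real crystal descriptor would have to handle.
    """
    hist = [0.0] * bins
    for a, b in combinations(atoms, 2):
        d = math.dist(a, b)
        if d < cutoff:
            hist[int(d / cutoff * bins)] += 1.0
    return hist


def cosine_distance(u, v):
    """Cosine distance between two fingerprints (0 means same direction)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        # Empty fingerprints: identical only if both are empty.
        return 0.0 if nu == nv else 1.0
    return max(0.0, 1.0 - dot / (nu * nv))


def group_similar(fingerprints, threshold=0.01):
    """Single-linkage grouping: structures closer than `threshold` are merged."""
    parent = list(range(len(fingerprints)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(fingerprints)), 2):
        if cosine_distance(fingerprints[i], fingerprints[j]) < threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(fingerprints)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data (hypothetical): a square, the same square translated, and a chain.
square = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
shifted_square = [(x + 10, y, z) for x, y, z in square]
chain = [(0, 0, 0), (1.6, 0, 0), (3.2, 0, 0), (4.8, 0, 0)]

fingerprints = [fingerprint(s) for s in (square, shifted_square, chain)]
groups = group_similar(fingerprints)
unique = [g[0] for g in groups]  # keep one representative per group
print(groups)  # → [[0, 1], [2]]
```

The two squares have the same distance histogram, so they fall into one group and only one of them is kept as a representative. The chain has a different histogram and stays in its own group.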

The use of the classifier has already accelerated the analysis of USPEX output by at least one order of magnitude, promoting new crystallographic insights and discoveries. Furthermore, the visual display of key algorithm indicators has led to diverse, unexpected discoveries that will improve the USPEX algorithms.

Resources

Here are some resources about the CrystalFp crystal classification project. Note that the project is still a work in progress: we are still exploring the implications and results of the analysis done so far.

  • First of all, the people behind the project: myself and Prof. Artem R. Oganov of the Dept. of Geosciences and New York Center for Computational Science, State University of New York at Stony Brook. Contact us if you want more information on the project.
  • The paper and presentation given at the IEEE VAST 2008 conference.
  • The STM4 chemistry visualization toolkit. It is the base on which the CrystalFp user application is built.
  • The library code as zip or compressed tar and a simple usage example (a slightly expanded version of this example is also included in the source code as main.cxx).
  • The CrystalFp library documentation.
  • Summary of the datasets used for testing and the corresponding correlation measure curves.
  • As a service to the CSCS users, I have compiled a short report: A Look at High Dimensional Spaces – Can They Be Useful In My Research? It collects references and peculiarities of high dimensional spaces.

Publications

[1] A. R. Oganov and M. Valle, How to quantify energy landscapes of solids, The Journal of Chemical Physics, vol. 130, p. 104504, Mar. 14 2009.
[2] M. Valle and A. R. Oganov, Crystal Structures Classifier for an Evolutionary Algorithm Structure Predictor, in Proceedings IEEE VAST 2008, Oct. 19 - 24 2008. (One of the paper images has been selected for the proceedings' cover.)
[3] A. R. Oganov, M. Valle, A. Lyakhov, Y. Ma, and Y. Xie, Evolutionary crystal structure prediction and its applications to materials at extreme conditions, in Proceedings IUCr2008, Aug. 23 - 31 2008.
[4] A. R. Oganov, Y. Ma, C. W. Glass, and M. Valle, Evolutionary crystal structure prediction: overview of the USPEX method and some of its applications, Psi-k Newsletter, vol. 84, pp. 1-10, Dec. 2007.