[I2S]DeepSMILES

ML.chang 2020. 8. 26. 15:57

https://github.com/baoilleach/deepsmiles

 

DeepSMILES

This Python module can convert well-formed SMILES (that is, as written by a cheminformatics toolkit) to DeepSMILES. It also does the reverse conversion.

 

Install the latest version with:

 

pip install --upgrade deepsmiles

 

DeepSMILES is a SMILES-like syntax suited to machine learning. Rings are indicated using a single symbol instead of two, while branches do not use matching parentheses but rather use a right parenthesis as a 'pop' operator.

 

For example, benzene is c1ccccc1 in SMILES but cccccc6 in DeepSMILES (where the 6 indicates the ring size). As a branch example, the SMILES C(Br)(OC)I can be converted to the DeepSMILES CBr)OC))I. For more information, please see the corresponding preprint (https://doi.org/10.26434/chemrxiv.7097960.v1) or the lightning talk at https://www.slideshare.net/NextMoveSoftware/deepsmiles.
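
To make the branch rule concrete, here is a minimal pure-Python sketch of how SMILES branches might be rewritten with ')' acting as a pop operator. It is not the deepsmiles package's implementation. The helper name branches_to_deepsmiles is hypothetical, and the sketch assumes a restricted input: no ring-closure digits, and atoms written as organic-subset symbols or bracket atoms.

```python
import re

# Hypothetical helper, not part of the deepsmiles package.
# Tokens: bracket atoms, Cl/Br, single-letter atoms, bond symbols, parentheses.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[A-Za-z]|[-=#:/\\]|[()]")

def branches_to_deepsmiles(smiles):
    """Rewrite SMILES branches so that ')' acts as a pop operator.

    Assumes the input contains no ring-closure digits.
    """
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported syntax in %r" % smiles)
    out = []
    depth = 0    # atoms currently on the implicit atom stack
    saved = []   # stack depth recorded at each '(' of the SMILES
    for tok in tokens:
        if tok == "(":
            saved.append(depth)
        elif tok == ")":
            start = saved.pop()
            # one ')' per branch atom that is still on the stack
            out.append(")" * (depth - start))
            depth = start
        else:
            if tok[0].isalpha() or tok[0] == "[":
                depth += 1   # atoms are pushed, bond symbols are not
            out.append(tok)
    return "".join(out)

print(branches_to_deepsmiles("C(Br)(OC)I"))  # CBr)OC))I
```

The one-atom branch Br is closed by a single ')', while the two-atom branch OC needs two, which reproduces the example above.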

The library is used as follows:
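
A minimal sketch of such usage, assuming the package exposes a Converter class with rings and branches options, encode and decode methods, and a DecodeError exception, as described in the project README. The roundtrip helper is a hypothetical name introduced here, and the import guard is only there so the sketch does nothing when the package is not installed.

```python
try:
    import deepsmiles  # pip install --upgrade deepsmiles
except ImportError:
    deepsmiles = None  # the sketch becomes a no-op without the package

def roundtrip(smiles):
    """Encode SMILES to DeepSMILES and decode it back.

    Returns (encoded, decoded), or None if deepsmiles is not installed.
    """
    if deepsmiles is None:
        return None
    converter = deepsmiles.Converter(rings=True, branches=True)
    encoded = converter.encode(smiles)
    try:
        decoded = converter.decode(encoded)
    except deepsmiles.DecodeError:
        decoded = None  # the string could not be decoded
    return encoded, decoded

result = roundtrip("c1ccccc1")
if result is not None:
    print("Encoded: %s, decoded: %s" % result)
```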

 

Abstract
Background
There has been increasing interest in the use of deep neural networks for de novo design of molecules with desired properties. A common approach is to train a generative model on SMILES strings and then use this to generate SMILES strings for molecules with a desired property.

Unfortunately, these SMILES strings are often not syntactically valid due to elements of SMILES syntax that must occur in pairs.
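
The paired elements in question are branch parentheses and ring-closure digits. As a rough illustration, the sketch below checks only those two pairings. The function name paired_syntax_ok is hypothetical, and the check is deliberately simplified (single-digit ring closures only, no '%nn' labels); it is not a full SMILES validator.

```python
def paired_syntax_ok(smiles):
    """Check only the paired elements of SMILES: parentheses and ring digits.

    Simplified: single-digit ring closures, no '%nn' labels; text inside
    square brackets is skipped. This is not a full SMILES validator.
    """
    open_branches = 0
    open_rings = set()
    in_bracket = False
    for ch in smiles:
        if in_bracket:
            in_bracket = ch != "]"
        elif ch == "[":
            in_bracket = True
        elif ch == "(":
            open_branches += 1
        elif ch == ")":
            if open_branches == 0:
                return False   # close without a matching open
            open_branches -= 1
        elif ch.isdigit():
            # first occurrence opens the ring, second occurrence closes it
            open_rings ^= {ch}
    return open_branches == 0 and not open_rings

print(paired_syntax_ok("c1ccccc1"))   # True
print(paired_syntax_ok("c1ccccc"))    # False: ring 1 is never closed
print(paired_syntax_ok("C(Br)(OC"))   # False: unbalanced parenthesis
```

A generative model that emits one character at a time has to get both pairings right over long distances, which is where truncated or mismatched strings come from.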

Results
We describe a SMILES-like syntax called DeepSMILES that addresses two of the main reasons for invalid syntax when using a probabilistic model to generate SMILES strings. The DeepSMILES syntax avoids the problem of unbalanced parentheses by only using close parentheses, where the number of parentheses indicates the branch length. In addition, DeepSMILES avoids the problem of pairing ring closure symbols by using only a single symbol at the ring closing location, where the symbol indicates the ring size. We show that this syntax can be interconverted to/from SMILES with string processing without any loss of information, including stereo configuration.
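
Both rules can be read as operations on a stack of atoms. The toy decoder below is a sketch of that reading, not the deepsmiles package's implementation. It assumes that a ring-size symbol counts back along the atoms still on the stack, and it supports only unbracketed atoms, ')' and ring sizes 3 to 9. The name deepsmiles_bonds is hypothetical.

```python
import re

# Hypothetical toy decoder, not the deepsmiles package's implementation.
TOKEN = re.compile(r"Cl|Br|[A-Za-z]|\)|[3-9]")

def deepsmiles_bonds(deepsmiles_string):
    """Return (atoms, bonds) for a restricted DeepSMILES string.

    Supported: unbracketed atoms, ')' pops, ring sizes 3 to 9.
    Bond orders, charges and stereo are out of scope here.
    """
    tokens = TOKEN.findall(deepsmiles_string)
    if "".join(tokens) != deepsmiles_string:
        raise ValueError("unsupported syntax in %r" % deepsmiles_string)
    atoms, bonds, stack = [], [], []
    for tok in tokens:
        if tok == ")":
            stack.pop()   # go back to the previous atom
        elif tok.isdigit():
            # Assumption: the ring size counts back along the atom stack.
            bonds.append((stack[-int(tok)], stack[-1]))
        else:
            atoms.append(tok)
            idx = len(atoms) - 1
            if stack:
                bonds.append((stack[-1], idx))   # bond to the stack top
            stack.append(idx)
    return atoms, bonds

print(deepsmiles_bonds("cccccc6")[1])
# [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
print(deepsmiles_bonds("CBr)OC))I")[1])
# [(0, 1), (0, 2), (2, 3), (0, 4)]
```

In the second example every substituent ends up bonded to atom 0, the central carbon, which matches the SMILES C(Br)(OC)I.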

Conclusion
We believe that DeepSMILES will be useful, not just for those using SMILES in deep neural networks, but also for other computational methods that use SMILES as the basis for generating molecular structures such as genetic algorithms.

 

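Summary
Background
Interest is growing in using deep neural networks to design new molecules with desired properties. The usual approach is to train a generative model on SMILES strings and then have it generate SMILES strings for molecules with a desired property.

Unfortunately, the generated strings are often syntactically invalid, because some elements of SMILES syntax must occur in pairs.

Results
The paper describes DeepSMILES, a SMILES-like syntax that addresses the two main causes of invalid syntax when a probabilistic model generates SMILES strings.
DeepSMILES avoids unbalanced parentheses by using only close parentheses, where the number of parentheses indicates the branch length.
It also avoids the problem of pairing ring-closure symbols by using a single symbol at the ring-closing position, where the symbol indicates the ring size. The authors show that the syntax can be converted to and from SMILES by string processing with no loss of information, including stereo configuration.

Conclusion
The authors believe DeepSMILES will be useful not only to those using SMILES in deep neural networks, but also for other computational methods that use SMILES as the basis for generating molecular structures, such as genetic algorithms.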