DNA storage technology: a new "yin-yang" dual-coding method

Information is being produced at an ever faster rate, and with it comes the problem of how to store data effectively. Traditional storage media, such as hard disks, flash memory, and other magnetic or optical media, are gradually becoming unable to meet worldwide data storage needs. The DNA molecule is emerging as a practical new information storage medium thanks to its stability, high storage density, and low maintenance cost.

In terms of how information is stored, every piece of information is ultimately a binary sequence of 0s and 1s. Whether it is a text or a song, it can be stored in this form. DNA is also a sequence: a chain built from four different bases, A, T, C, and G.

Based on this, each base can be assigned a value, for example A is 00 and C is 01, so that a DNA sequence can be described in binary and binary data can be written as a DNA sequence. Synthetic chemistry can then produce that sequence as a physical molecule. To read the information back, the DNA is put on a sequencer, which reads out the stored data.
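The base-to-bits idea can be sketched in a few lines of Python. The article gives only A = 00 and C = 01; the assignments G = 10 and T = 11, and the function names, are assumptions made purely for illustration.

```python
# Two bits per base. A=00 and C=01 come from the article;
# G=10 and T=11 are assumed here to complete the table.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Write each byte as 8 bits, then map every 2 bits to one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Inverse mapping: one base back to 2 bits, 8 bits back to one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    dna = bytes_to_dna(b"Hi")
    print(dna)                # CAGACGGC
    print(dna_to_bytes(dna))  # b'Hi'
```

In this sketch, synthesis corresponds to producing the string returned by `bytes_to_dna`, and sequencing corresponds to feeding the read-out string to `dna_to_bytes`.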

In practice, however, the encoding and decoding of DNA storage have long been a limiting factor. Before 2017, encoding and decoding schemes had not achieved full technical compatibility, and the GC content of the generated sequences depended largely on the 0/1 distribution of the original data. In 2017, the DNA fountain code developed by a research team at Columbia University in the United States almost solved this problem, but because it directly applies channel coding techniques, it has a strong data type preference, so there is a high risk that data cannot be recovered in practical storage applications.
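The dependence of GC content on the input bits is easy to see with a fixed two-bit mapping. The sketch below assumes the illustrative table A = 00, C = 01, G = 10, T = 11 (only the first two assignments appear in the article) and is not any published codec.

```python
# Assumed fixed mapping; only A=00 and C=01 are given in the article.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def naive_encode(bits: str) -> str:
    """Map every pair of bits to one base with the fixed table."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    for bits in ("00" * 8, "0011" * 4, "0110" * 4):
        seq = naive_encode(bits)
        print(bits, seq, gc_content(seq))
```

With this table, a run of zero bits yields a sequence made only of A (GC content 0), while the repeating pattern 0110 yields only C and G (GC content 1), so the skew of the data passes straight through to the DNA.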

To solve this problem, a research team from the Shenzhen BGI Life Sciences Research Institute drew inspiration from the DNA double-strand model and from the idea of "yin and yang" in Chinese culture. Their DNA encoding and decoding system applies two different sets of rules to convert two pieces of binary information "one-to-one", then takes the intersection of the two results as the final solution, so that two independent pieces of information are combined into a single DNA sequence.
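The "two rules, take the intersection" idea can be illustrated with a toy codec. This is a simplified sketch, not the team's implementation: the specific tables `YANG` and `YIN`, the virtual starting base `"A"`, and all function names are assumptions. Here the Yang rule maps a bit from the first message to a pair of candidate bases, the Yin rule maps a bit from the second message (together with the previous base) to another pair, and the two pairs always share exactly one base.

```python
from math import comb

# Yang rule (assumed instance): a bit selects a group of two bases.
YANG = {0: {"A", "T"}, 1: {"C", "G"}}

# Yin rule (assumed instance): previous base + bit selects two bases,
# always one from each Yang group, so the intersection is a single base.
YIN = {
    "A": {0: {"A", "C"}, 1: {"T", "G"}},
    "T": {0: {"A", "G"}, 1: {"T", "C"}},
    "C": {0: {"T", "C"}, 1: {"A", "G"}},
    "G": {0: {"T", "G"}, 1: {"A", "C"}},
}


def encode(yang_bits: str, yin_bits: str, prev: str = "A") -> str:
    """Merge two equal-length bit strings into one DNA sequence."""
    if len(yang_bits) != len(yin_bits):
        raise ValueError("both bit strings must have the same length")
    bases = []
    for a, b in zip(yang_bits, yin_bits):
        (base,) = YANG[int(a)] & YIN[prev][int(b)]  # unique intersection
        bases.append(base)
        prev = base
    return "".join(bases)


def decode(seq: str, prev: str = "A") -> tuple[str, str]:
    """Recover both bit strings from the DNA sequence."""
    yang_bits, yin_bits = [], []
    for base in seq:
        yang_bits.append("0" if base in YANG[0] else "1")
        yin_bits.append("0" if base in YIN[prev][0] else "1")
        prev = base
    return "".join(yang_bits), "".join(yin_bits)


def count_rule_combinations() -> int:
    """Count rule pairs of the shape used in this sketch."""
    yang_rules = comb(4, 2)      # which two of the four bases stand for bit 0
    yin_rules = (2 * 2) ** 4     # per previous base: 2 orderings in each Yang group
    return yang_rules * yin_rules


if __name__ == "__main__":
    dna = encode("0110", "1010")
    print(dna, decode(dna))
    print(count_rule_combinations())
```

Because each base carries one bit from each message, the sketch stores two bits per base. Counting all rule pairs of this shape gives 6 x 256 = 1536, which matches the number of coding rule combinations reported for the system.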

At the same time, the researchers introduced a screening mechanism that uses preset conditions to filter out sequences that are not compatible with existing sequencing-by-synthesis technologies. Depending on how the rules are combined, the system provides a total of 1536 different coding rule combinations, which greatly broadens its range of application scenarios.
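A screening step of this kind might look like the following sketch. The article does not list the actual conditions, so the two checks used here (a GC-content window and a cap on homopolymer run length) and the threshold values are assumptions chosen only to show the shape of such a filter.

```python
from itertools import groupby


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def passes_screen(seq: str, gc_min: float = 0.4, gc_max: float = 0.6,
                  max_run: int = 4) -> bool:
    """Return True if the sequence meets the (assumed) preset conditions."""
    if not seq:
        return False
    gc = sum(base in "GC" for base in seq) / len(seq)
    return gc_min <= gc <= gc_max and max_homopolymer(seq) <= max_run


if __name__ == "__main__":
    candidates = ["ACGTACGTAC", "AAAAAAGCGC", "GCGCGCGCGC"]
    print([seq for seq in candidates if passes_screen(seq)])
```

A sequence that fails the screen would be discarded and the data re-encoded, for example with a different pairing of segments; how the real system handles rejected sequences is not described in the article.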

Through theoretical derivation and simulated encoding of files of different data types, the researchers also showed that the system significantly improves data recovery stability while maintaining information density. The average recovery rate of stored data improves on the current level of DNA fountain codes by nearly two orders of magnitude.

The research team tested the data recovery stability of the system after the data had been stored in yeast cells and the cells had been passaged. The results show that the information carried by the yeast strain can still be completely recovered after more than 1,000 generations. This storage method is close to the theoretical limit of the physical information density of natural DNA molecules: about 432.2 EB of information can be stored per gram of DNA.
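To see why a figure of hundreds of exabytes per gram is plausible, here is a back-of-the-envelope estimate. The article does not state the assumptions behind 432.2 EB, so the inputs below (2 bits per nucleotide, an average mass of roughly 330 g/mol per nucleotide of single-stranded DNA, and 1 EB = 10^18 bytes) are assumptions for illustration only.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
NT_MOLAR_MASS = 330.0      # g/mol, assumed average per nucleotide
BITS_PER_NT = 2.0          # four possible bases -> at most 2 bits each


def max_exabytes_per_gram(nt_molar_mass: float = NT_MOLAR_MASS,
                          bits_per_nt: float = BITS_PER_NT) -> float:
    """Upper-bound estimate of storable data per gram of DNA, in EB."""
    nucleotides_per_gram = AVOGADRO / nt_molar_mass
    total_bits = nucleotides_per_gram * bits_per_nt
    return total_bits / 8 / 1e18  # bits -> bytes -> exabytes


if __name__ == "__main__":
    print(f"{max_exabytes_per_gram():.0f} EB per gram (rough upper bound)")
```

Under these assumptions the bound comes out at a few hundred exabytes per gram, the same order of magnitude as the reported 432.2 EB.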

Undoubtedly, the rapid development of synthetic biology in recent years, represented by high-throughput DNA synthesis technology and artificial chromosome synthesis, marks a new era in humanity's ability to design, synthesize, edit, and read DNA, and every technological advance will play a positive role in research on new media for the long-term storage of massive data.
