DNA- and RNA-Based Computing Systems. Группа авторов
in problem size leads to a significant bottleneck in scaling the existing DNA computing procedures for large size problem formulations. This is primarily because the amount of DNA required increases exponentially with size even though the number of biochemical steps required increases with a polynomial function. Further, there is a significant compounding of experimental errors involved, which makes these procedures redundant for solving the bigger size formulations. Also, the real‐life problems often have continuous search spaces and multiple optimal solutions, whereas the existing DNA computing procedures are mostly developed for discrete search spaces involving a single optimal solution.
2.2 Key Developments in DNA Computing
In 1994, Adleman [2] used the DNA for computation in the first‐ever molecular experiment performed to solve the HPP. Molecular biology experiments were performed to address the instance graph having seven vertices and fourteen edges. A year later, Lipton [3] presented a new model for solving another NP‐complete (nondeterministic polynomial time complete) problem known as satisfiability (SAT) problem using a DNA computer. SAT problem is also solved by Smith et al. [4] and Liu et al. [6] with new surface‐based DNA computing models.
Along with these models, other researchers [7,11,12] also solved NP‐complete problems with a different approach. Sakamoto et al. [5] used DNA hairpin formation to solve the SAT problem. Chao et al. [13] developed a single‐molecule “DNA navigator” to solve a maze. These models of DNA computing are discussed in the next section.
2.2.1 Adleman Model
Adleman's DNA computer [2] solved a small instance of a hard computational problem, the HPP. For a given graph, this problem asks for a path with a fixed start and end vertices that visits every vertex exactly once (Figure 2.1a). In his representation, each vertex is encoded using 20 bp nucleotides. Edges are encoded as 20‐mers, with the first 10 nucleotides complementary to the last 10 nucleotides of the start vertex and the last 10 complementary to the first 10 of the end vertex (Figure 2.1b). By mixing and ligating oligonucleotides corresponding to vertices and edges, concatenates are formed that represent paths through the network (Figure 2.1c). A path through the graph involves all vertices such that each vertex represented only once is a correct solution to the problem and is referred to as the Hamiltonian path. However, the random nature of ligations leads to the formation of other incorrect paths that do not meet the required condition of the Hamiltonian path. Therefore, several biochemical steps are required (Figure 2.1d,e) to extract the correct path from the set of all paths generated in the ligation process. First, such biochemical step is the PCR, which amplifies only those paths that start and end with right vertices. In the second step, only the paths of correct length are extracted using gel electrophoresis. Finally, the presence of every vertex sequence is confirmed using affinity separation. If any DNA remains after the last separation, this must correspond to a Hamiltonian path.
Figure 2.1 Adleman's DNA computing procedure [2]. For (a) given super graph, vertices and edges are encoded using the (b) encoding strategy, and (c) mixed to generate all possible paths through the graph using ligation. The correct length path is selected using (d) gel electrophoresis. All correct length paths are further screened to confirm the presence of sequence corresponding to each vertex one by one starting from first to the last vertex using (e) affinity chromatography.
Consider a supergraph shown in Figure 2.1a involving five vertices. The objective is to obtain a path that starts from vertex 1 and ends at vertex 5 such that each vertex is represented only once. Initially, the DNA sequences of 20 bp are designed for each vertex and edge such that these should lead to linear DNA. For starting and ending edge, the complementary sequence of all 20 bp of starting and ending vertex is used instead of just 10 complementary base pairs. All encoded single‐stranded DNA (ssDNA) for each vertex and edge are mixed together to form double‐stranded DNA (dsDNA) representing all possible paths in the given supergraph in the ligation step. Since each vertex must be visited exactly once in the Hamiltonian path, a graph of N (= 5) vertices having each vertex encoded with L (= 20 bp) nucleotides must comprise total N × L (= 100 bp) nucleotides. Therefore, all dsDNA molecules of different lengths obtained after the ligation step are separated using gel electrophoresis. In this step, the gel slice corresponding to the band of the desired length (= 100 bp) is then separated from the gel by a cutter as the correct DNA sequence corresponding to an optimal solution is expected to be present only in this slice. Further, this band may comprise some undesired solutions with the correct length of 100 bp. The DNA solutions are extracted from the gel slice and are amplified using PCR to generate enough number of desired sequences of DNA representing the solution to the given HPP beginning with one and ending with five. Subsequently, the DNA solution undergoes affinity separation process using streptavidin–biotin magnetic beads to check the presence of vertices 1–5 sequentially one after another in tubes 1–5. The PCR products of these tubes are then analyzed by gel electrophoresis where the bands of respective lengths are obtained, signifying the location of these vertices in the entire sequence. If these tubes give the bands of 20, 40, 60, 80, and 100 bp, then it confirms that all N (= 5) vertex are visited, and depending on the location of the primer, the location of the vertex in the entire sequence is also determined. For a given example, the Hamiltonian path is 1–3–4–2–5, which corresponds to the bands of 20, 40, 60, 80, and 100 bp, respectively, on the gel image.
2.2.2 Lipton's Model
Lipton [3] solved the SAT problem with a linear complexity of biomolecular operations. In SAT problem the operators logical OR (∨), logical AND (∧), and logical NOT (¬) along with the variables (x1, x2, x3, …, xn) constitute a Boolean formula, e.g. (x1∨x2) ∧ (¬x2∨x3). In this formula, a literal can be either a variable or the negation of the variable, e.g. x1 or ¬ x1. The clause (Ci) is a disjunction of literals, e.g. (x1∨x2). Finally, a Boolean formula is a conjunction of clauses [e.g. (x1∨x2) ∧ (¬x2∨x3)] such as C1 ∧ C2 ∧ C3 ∧ …. Cm for m clauses. In a SAT problem, the objective is to confirm whether there exists a set of input conditions (i.e. the input variables, x1, x2, x3, …, xn ∈ [0, 1]) for which a given Boolean formula is satisfiable, i.e. the output is 1.
To solve the SAT problem, first, the variables are arranged in a string as x1 x2 x3 … xn. In Lipton's model, this variable string, x1 x2 x3 … xn, is represented in a graph form. Since the variables can have 0 or 1 values, the vertices with no bars (e.g. x1, x2, etc.) represent the 1 value, whereas those with bars (e.g.