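Abstract:
GenoCAD (www.genocad.com) is a free web-based synthetic biology design tool for designing expression vectors and artificial gene networks. By successively clicking icons that represent standard synthetic biology "parts" and following a grammar, a user can build a complex plasmid vector composed of dozens of functional fragments. However, each category of standard parts in GenoCAD contains many entries, and the number keeps growing as new parts are developed. Choosing suitable parts and assembling them into a functional plasmid vector is therefore time-consuming, laborious, and error-prone. In the final stage of vector design, selecting the right part from the many available is often difficult. To address this problem, this paper adopts the statistical language model from natural language processing, which was originally used in natural language recognition to estimate the probability that a string of words forms a correct sentence. Building on this model, a dynamic programming algorithm is then applied to optimize plasmid vector design and find the best option among the many candidates. This approach can reduce redundant operations in biological experiments and thereby lower the cost of vector construction.
Keywords: synthetic biology; statistical language model; dynamic programming algorithm; biological "parts"; GenoCAD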
DOI: 10.3969/j.issn.1672-5565.2016.03.08
CLC number: Q291
Document code: A
Funding: National Natural Science Foundation of China (No. 61173113).

Dynamic programming optimization to GenoCAD design
FANG Gang1,2

(1. School of Biological and Environmental Engineering, Xi’an University, Xi’an 710065, China; 2. School of Information, Xi’an University of Finance and Economics, Xi’an 710100, China)

Abstract:
GenoCAD (www.genocad.com) is a free web-based application that guides the user through the design of protein expression vectors, artificial gene networks, and other genetic constructs composed of genetic parts. By successively clicking icons that represent actual genetic parts according to grammatical models, the user can design complex genetic constructs composed of dozens of functional blocks. At the last step of design, however, each icon representing a category of genetic parts usually offers many options. As the genetic parts database grows, more and more parts are imported into the GenoCAD library. Assembling more than a few sets of genetic parts can be costly, time-consuming, and error-prone, and at the last step of design it is often difficult to decide which part should be selected. A statistical language model, which is a probability distribution P(s) over strings s that attempts to reflect how frequently a string s occurs as a sentence, is used to favor the most commonly used parts. A dynamic programming algorithm built on this model then optimizes the GenoCAD design and finds an optimal solution. In this way, redundant operations can be reduced, and the time and cost of biological experiments can be lowered.
Key words: Synthetic biology; Statistical language model; Dynamic programming; BioBrick; GenoCAD
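
The abstract does not give the model's exact form, so the following is only an illustrative sketch of the general idea, not the paper's implementation. It assumes a bigram language model with add-one smoothing trained on previously built designs, and a Viterbi-style dynamic program that picks one part per design slot so that the whole sequence has the highest probability P(s). The part names, the `designs` corpus, and the function names `train_bigram`, `log_prob`, and `best_design` are all hypothetical.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-design marker


def train_bigram(designs):
    """Count unigram and bigram frequencies over part sequences (assumed corpus)."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for design in designs:
        seq = [START] + list(design)
        vocab.update(design)
        for prev, cur in zip(seq, seq[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams, len(vocab)


def log_prob(prev, cur, model):
    """Add-one smoothed log P(cur | prev); smoothing is an assumption of this sketch."""
    unigrams, bigrams, vocab_size = model
    return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))


def best_design(slots, model):
    """Dynamic programming over slots; each slot lists the candidate parts of one category."""
    # best[p] = (log-probability, parts) of the best partial design ending in part p
    best = {START: (0.0, [])}
    for candidates in slots:
        nxt = {}
        for cur in candidates:
            score, parts = max(
                ((s + log_prob(prev, cur, model), chosen)
                 for prev, (s, chosen) in best.items()),
                key=lambda item: item[0],
            )
            nxt[cur] = (score, parts + [cur])
        best = nxt
    return max(best.values(), key=lambda item: item[0])


# Hypothetical corpus of earlier designs: promoter, RBS, gene, terminator.
designs = [
    ["pLac", "rbsA", "gfp", "term1"],
    ["pLac", "rbsA", "gfp", "term1"],
    ["pTet", "rbsB", "rfp", "term1"],
    ["pLac", "rbsA", "rfp", "term2"],
]
slots = [["pLac", "pTet"], ["rbsA", "rbsB"], ["gfp", "rfp"], ["term1", "term2"]]

model = train_bigram(designs)
score, parts = best_design(slots, model)
print(parts, score)
```

Because each step only needs the best score for every part in the previous slot, the work grows with the sum of products of adjacent slot sizes rather than with the product of all slot sizes, which is what makes the search practical when every category holds many parts.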