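Microarray data mining with SAS-based multivariate statistical methods
Cite this article: HUANG Xiao-yun, CAO Bo, YANG Yue. Microarray data mining with SAS-based multivariate statistical methods[J]. 生物信息学, 2010, 8(2): 147-149.
Authors: HUANG Xiao-yun, CAO Bo, YANG Yue
Affiliations: 1. Department of Thoracic Surgery, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University School of Oncology, Beijing Cancer Hospital & Institute, Beijing 100142, China
2. Department of Biomathematics and Biostatistics, Peking University Health Science Center, Beijing 100191, China
Funding: Supported by a research grant from the Beijing Municipal Commission of Education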
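Abstract: SAS was used to mine a lung cancer microarray experiment from GEO. Nonparametric tests, discriminant analysis, and regression analysis were applied to the expression data of 14 nuclear receptors in this experiment. At the 0.05 significance level, four genes (ER1, VDR, RARα, and RORα) were differentially expressed between adenocarcinoma and squamous cell carcinoma, and RARβ was differentially expressed between the recurrent and non-recurrent groups. Discriminant analysis showed that VDR and RORα expression can predict pathological type, but the overall misclassification rate was high (0.2389); the overall misclassification rate of RARβ and PPARα for discriminating recurrence was higher still (0.3457). A regression equation built to predict pathological type also selected VDR and RORα, with ORs of 0.126 and 4.452, respectively. SAS-based multivariate statistics is therefore a potential approach to microarray data mining; once microarray experiments are standardized, conclusions from SAS-based integrated analysis of data from different microarray experiments should help in forming hypotheses.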

Keywords: data mining; microarray; SAS

Microarray data mining is achieved by multivariate statistics based on SAS
HUANG Xiao-yun,CAO Bo,YANG Yue.Microarray data mining is achieved by multivariate statistics based on SAS[J].China Journal of Bioinformation,2010,8(2):147-149.
Authors:HUANG Xiao-yun  CAO Bo  YANG Yue
Affiliation: 1. Key Laboratory of Carcinogenesis and Translational Research, Ministry of Education, Department of Thoracic Surgery, Peking University School of Oncology, Beijing Cancer Hospital & Institute, Beijing 100142, China; 2. Department of Biomathematics and Biostatistics, Peking University Health Science Center, Beijing 100191, China
Abstract: Multivariate statistics in SAS are applied to mine a dataset from GEO. Expression data for fourteen nuclear receptors in a lung cancer microarray experiment are analyzed by nonparametric tests, discriminant analysis, and regression analysis. ER1, VDR, RARα, and RORα are differentially expressed between adenocarcinoma and squamous cell carcinoma at a significance level of 0.05; RARβ is differentially expressed between recurrent and non-recurrent cancer. Discriminant analysis shows that VDR and RORα together can predict pathological type, and that RARβ and PPARα together can discriminate recurrence; the overall misclassification rates are 0.2389 and 0.3457, respectively. A logistic regression model is built to predict pathological type, and the variables it includes are also VDR and RORα, with ORs of 0.126 and 4.452, respectively. Multivariate statistics based on SAS is therefore a potential way to mine microarray data, and conclusions from SAS-based integration of different microarray experiments might help in forming hypotheses once microarray experiments are standardized.
Keywords: data mining; microarray; SAS
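
The odds ratios quoted in the abstract can be related to logistic regression coefficients through OR = exp(β). The sketch below is only an illustration of that arithmetic in Python, not the authors' SAS analysis. The abstract does not state which pathological type is the modeled event or the unit of the expression values, so the helper names and the "per one unit of expression" reading are assumptions.

```python
import math

# Odds ratios reported in the abstract for the logistic model of
# pathological type. The reference category and the expression unit
# are not given in the abstract (assumption: OR is per one unit).
odds_ratios = {"VDR": 0.126, "RORalpha": 4.452}


def log_odds_coefficient(odds_ratio):
    """Return the logistic coefficient beta such that exp(beta) == odds_ratio."""
    return math.log(odds_ratio)


def odds_multiplier(odds_ratio, delta):
    """Factor by which the odds change when the predictor rises by delta units."""
    return odds_ratio ** delta


for gene, or_value in odds_ratios.items():
    beta = log_odds_coefficient(or_value)
    direction = "lowers" if beta < 0 else "raises"
    print(f"{gene}: OR={or_value}, beta={beta:.3f}, "
          f"higher expression {direction} the odds of the modeled type")
```

Read this way, an OR below 1 (VDR) corresponds to a negative coefficient and an OR above 1 (RORα) to a positive one, so the two genes shift the odds of the modeled pathological type in opposite directions.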
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.