Cite this article: | WU Wenfeng, LIU Yihui. Feature selection for high-dimensional cancer protein mass spectrometry data[J]. Chinese Journal of Bioinformatics, 2015, 13(2): 131-140. |
|
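Abstract: |
Analysis of high-dimensional cancer protein mass spectrometry data has long been hampered by the high dimensionality of the data. To address the problems that arise when reducing the dimensionality of such data, we propose a feature extraction method based on wavelet analysis and principal component analysis; after feature extraction, a support vector machine is used for classification. A 2-level wavelet decomposition was applied to the 8-7-02 data set using the db1, db3, db4, db6, db8, db10, and haar wavelet bases, followed by classification with a support vector machine, giving accuracies of 98.18%, 98.35%, 98.04%, 98.36%, 97.89%, 97.96%, and 98.20%, respectively. The method further improves classification accuracy while also improving time efficiency. |
Keywords: wavelet analysis; principal component analysis; protein mass spectrometry; dimensionality reduction; classification |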
DOI:10.3969/j.issn.1672-5565.2015.02.10 |
Classification number: Q629.73 |
Funding: |
|
Feature selection for high-dimensional cancer protein mass spectrometry data |
WU Wenfeng, LIU Yihui
|
(School of Information, Qilu University of Technology, Jinan 250353, China)
|
Abstract: |
The analysis of high-dimensional cancer protein mass spectrometry data has long been hampered by the high dimensionality of the data. To address the problems that arise when reducing the dimensionality of such data, we propose a feature selection method based on wavelet analysis and principal component analysis. After feature selection, we use a support vector machine (SVM) for classification. We apply a second-level wavelet decomposition to the 8-7-02 data set with different wavelet bases (db1, db3, db4, db6, db8, db10, haar) and classify the results with the SVM, obtaining recognition rates of 98.18%, 98.35%, 98.04%, 98.36%, 97.89%, 97.96%, and 98.20%, respectively. The method improves classification accuracy and time efficiency simultaneously. |
Key words: wavelet analysis; principal component analysis; protein mass spectrometry; dimensionality reduction; classification |
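
The wavelet step named in the abstract can be illustrated with a minimal sketch, here for the haar (db1) basis only. It is not the authors' implementation: the function names, the toy spectrum, and the choice to keep only the level-2 approximation coefficients as features are assumptions made for illustration, since the abstract does not say which coefficients are passed on to principal component analysis and the SVM.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar (db1) wavelet transform.

    Returns (approximation, detail). An odd-length input is padded by
    repeating its last sample (an assumption of this sketch).
    """
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_approximation(signal, level=2):
    """Approximation coefficients after `level` Haar decompositions.

    Assumption: only the approximation coefficients are kept as features.
    """
    approx = list(signal)
    for _ in range(level):
        approx, _detail = haar_dwt(approx)
    return approx


# Hypothetical toy "spectrum"; a real mass spectrum has thousands of points.
spectrum = [4.0, 4.0, 2.0, 2.0, 6.0, 6.0, 8.0, 8.0]
features = haar_approximation(spectrum, level=2)
print(len(spectrum), "->", len(features))
```

Each level halves the length of the approximation, so a 2-level decomposition reduces the 8-point toy spectrum to 2 coefficients. In a full pipeline, these coefficients would then be passed to principal component analysis and an SVM classifier.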