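Abstract: |
In machine learning, class imbalance seriously degrades the performance of some standard classifiers, so addressing it is particularly important. Over-sampling is a common approach to the class-imbalance problem: it balances the class distribution by synthesizing new minority-class samples. This paper uses an over-sampling method based on the Gaussian mixture model to address imbalanced learning. A Gaussian mixture model is used to model the distribution of the minority class, and new minority-class samples are then generated from the fitted Gaussian models. Experimental results on UCI class-imbalanced datasets show that the proposed method mitigates the negative effects of class imbalance and helps improve classification performance. |
Keywords: imbalanced learning; support vector machine; Gaussian mixture model; over-sampling |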
DOI:10.3969/j.issn.1672-5565.20161019001 |
CLC number: TP181 |
Document code: A |
Funding: |
|
A new over-sampling algorithm by Gaussian mixture model
SHEN Leyang, SUN Tingkai
|
(School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China)
|
Abstract: |
It is important to solve class-imbalance problems, which seriously degrade the performance of standard classifiers in machine learning. Over-sampling is a popular method for dealing with class imbalance; it attempts to balance the sizes of different classes by generating additional samples for the minority class. We propose a new over-sampling algorithm that synthesizes additional minority-class samples using the Gaussian mixture model. Experimental comparisons with several state-of-the-art related methods on UCI datasets demonstrate that the proposed over-sampling algorithm can reduce the side effects of class imbalance and help improve classification performance. |
Key words: Imbalanced learning; Support vector machine; Gaussian mixture model; Over-sampling |
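
The abstract describes the approach only at a high level: fit a Gaussian mixture model to the minority class, then draw new minority samples from it. The following is a minimal sketch of that idea, not the authors' implementation. It assumes diagonal covariances, a fixed number of components `k`, and a plain EM loop with a fixed iteration count; the function names `fit_diag_gmm` and `oversample`, the toy data, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def fit_diag_gmm(points, k, iters=50, seed=0):
    """Fit a k-component diagonal-covariance GMM with plain EM (illustrative)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    means = [list(p) for p in rng.sample(points, k)]  # init means from data
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for p in points:
            logs = [math.log(weights[j]) + log_gauss(p, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)  # subtract the max for numerical stability
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: re-estimate weights, means, and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-9  # guard against empty components
            weights[j] = nj / (n + k * 1e-9)
            for a in range(d):
                mu = sum(r[j] * p[a] for r, p in zip(resp, points)) / nj
                var = sum(r[j] * (p[a] - mu) ** 2
                          for r, p in zip(resp, points)) / nj
                means[j][a] = mu
                variances[j][a] = max(var, 1e-6)  # variance floor
    return weights, means, variances


def oversample(minority, n_new, k=2, seed=0):
    """Draw n_new synthetic minority points from a GMM fitted to `minority`."""
    weights, means, variances = fit_diag_gmm(minority, k, seed=seed)
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        j = rng.choices(range(k), weights=weights)[0]  # pick a component
        synthetic.append([rng.gauss(m, math.sqrt(v))
                          for m, v in zip(means[j], variances[j])])
    return synthetic


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy imbalanced data: 200 majority points vs. 20 minority points
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    minority = ([[rng.gauss(4, 0.5), rng.gauss(4, 0.5)] for _ in range(10)]
                + [[rng.gauss(-4, 0.5), rng.gauss(4, 0.5)] for _ in range(10)])
    new_points = oversample(minority, len(majority) - len(minority))
    balanced_minority = minority + new_points
    print(len(majority), len(balanced_minority))
```

In this sketch the balanced minority set would then be combined with the majority class to train a classifier such as the support vector machine named in the keywords. How the number of mixture components is chosen, and whether full covariances are used, is not stated in the abstract, so those choices are left as parameters here.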