FPG是FP-growth算法的简称,推荐算法=》关联算法中最有名的算法之一,是Apriori算法的性能优化版。
参考了一些示例,自行再实现,具体算法如下。
步骤归纳为:
1、第一次遍历获取HeaderTable,包括去重、计频繁数、依据最小支持度去项、重排序(频繁数倒序);
2、第二次遍历更新原列表,包括依据headerTable去除小于最小支持度的项、重排序
3、建FP Tree,包括创建新节点、相似元素项节点合并
调用入口(test04.py):
#coding:utf-8import test03oneDimList = []def loadSimpDat():simpDat = [['r', 'z', 'h', 'j', 'p'],['z', 'y', 'x', 'w', 'v', 'u', 't', 's'],['z'],['r', 'x', 'n', 'o', 's'],['y', 'r', 'x', 'z', 'q', 't', 'p'],['y', 'z', 'x', 'e', 'q', 's', 't', 'm']]return simpDat# 数据准备simpDat = loadSimpDat()for list in simpDat:oneDimList += list# 建header tableheaderTable = test03.HeaderTable().create(oneDimList, 3)# 建FP treebuildTree = test03.BuildTree()updDat = buildTree.refactor(simpDat, headerTable)fpTree = buildTree.update(updDat)# 打印结果for fpTreeItem in fpTree:print 'parent:' + fpTreeItem.parent + ', name:' + fpTreeItem.name + \', num:' + str(fpTreeItem.numOccur)
核心实现 (test03.py):
#coding:utf-8class HeaderTable:def __init__(self):passdef create(self, dat, minsup):headerTable = {}# 去重setDat = set(dat)# 计频繁数for key in setDat:headerTable[key] = dat.count(key)# 依据最小支持度去项for k,v in headerTable.items():if v < minsup:del(headerTable[k])# 重排序headerTable = sorted(headerTable.items(),key=lambda i:i[1],reverse=True)print headerTablereturn headerTableclass FPTreeItem:def __init__(self, key, name, numOccur, parent):self.key = key # keyself.name = name # 项名self.numOccur = numOccur # 频繁值self.parent = parent # 父节点class BuildTree:# inDat: (list) [[],[]]def refactor(self, inDat, headerTable):lineCounter = 0datLine =[]dat = []# 依据headerTable去除小于最小支持度的项、重排序for list in inDat:lineCounter += 1for i in headerTable:if i[0] in list:datLine.append(i[0])dat.append(datLine)datLine = []return dat# updDat: (list) [[],[]]def update(self, updDat):fpTree = []for list in updDat:parent = ''keyLink = ''for item in list:parent = keyLinkkeyLink += itemfor fpTreeItem in fpTree:if keyLink == fpTreeItem.key:# 相似元素项节点合并fpTreeItem.numOccur += 1break# 没有这个元素项时创建一个新节点else:fpTreeItem = FPTreeItem(keyLink, item, 1, parent)fpTree.append(fpTreeItem)return fpTree
参考:
/zhangchaoyang/articles/2198946.html
/qwertWZ/p/4510857.html
如果觉得《[Python]FPG(FP-growth)算法核心实现》对你有帮助,请点赞、收藏,并留下你的观点哦!