sql计算用户留存率的方法
概念
电商业务中需要计算用户留存情况,在时间维度上可以分为用户次日、多日后的留存情况,用留存率表示。留存率计算方式如下:
次日留存率 = 当日新用户在次日登录的数量 / 当日新用户注册数量
三日留存率 = 当日新用户在第四天登录的数量 / 当日新用户注册数量
七日留存率 = 当日新用户在第八天登录的数量 / 当日新用户注册数量
其他时间的留存率计算同理。
情况一
假设现在有两张表,一张是注册信息表register_info,另一张是登录信息表login_info,两个表格数据如下。
现在分别计算出每日的新增用户数,以及次日、三日、七日留存率。
首先,将两张表做一个连接,用datediff函数选出用户注册后七日仍有登录的数据(条件1),连接条件是用户id相同(条件2)。查询代码如下:
1 2 3 4 5 6 7 8 9 10 | select
r.uid,
date (r.register_date) as rt,
date (l.login_time) as lt,
datediff(l.login_time, r.register_date) as tdiff
from register_info r
left join login_info l
on r.uid=l.uid
and date (l.login_time) between date (r.register_date)+interval 1 day and date (r.register_date)+interval 7 day
order by uid asc
|
查询结果如下:(这里每个用户的注册时间只有一个,所以会得出每个用户所有注册登录时间的排列组合便于后续计算)
之后利用case条件判断和count函数计算出各种留存率情况。代码如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | select
rt as 日期,
count ( distinct uid) as 新增用户数,
count ( distinct case when tdiff=1 then uid end )/ count ( distinct uid) as 次日留存率,
count ( distinct case when tdiff=3 then uid end )/ count ( distinct uid) as 三日留存率,
count ( distinct case when tdiff=7 then uid end )/ count ( distinct uid) as 七日留存率
from
(
select
r.uid,
date (r.register_date) as rt,
date (l.login_time) as lt,
datediff(l.login_time, r.register_date) as tdiff
from register_info r
left join login_info l
on r.uid=l.uid
and date (l.login_time) between date (r.register_date)+interval 1 day and date (r.register_date)+interval 7 day
order by uid asc
) t1
group by rt
|
得到结果如下:
情况二
假如现在只有一张用户登录信息表user_login_info,其中默认每个用户最早的登录时间就是注册时间,表格数据如下:
所以我们需要先用两个子查询得出类似(1)中的注册表t1和登录表t2,之后操作与(1)中相同,代码如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | select
count ( distinct uid) 新增用户数,
count ( distinct case when tdiff=1 then uid end )/ count ( distinct uid) as 次日留存率,
count ( distinct case when tdiff=3 then uid end )/ count ( distinct uid) as 三日留存率,
count ( distinct case when tdiff=7 then uid end )/ count ( distinct uid) as 七日留存率
from
(
select t1.uid, t1.rt, t2.lt, datediff(t2.lt,t1.rt) as tdiff
from
(
( select user_id as uid, min ( date (login_time)) as rt from user_login_info group by user_id) t1
LEFT JOIN
( select user_id as uid, date (login_time) as lt from user_login_info) t2
on t1.uid=t2.uid and t2.lt between t1.rt + interval 1 day and t1.rt + interval 7 day
)
) t3
group by rt
order by rt asc
|
其中t3就是情况一中得到的t1.
查询结果如下: