A Data Quality Framework for CRM Analytics

Case Study Queries

1) Query Performance and Scalability

Queries that were executed can be classified into three broad categories.
  • Sales:Compute product sales across clients, events, and customer segments.
  • Customer: Determine customers that belong to each stage of the lifecycle,and customer characteristics that lead to increased sales.
  • Aggregation queries: Compute statistical summaries to extract information about historical sales trends, customer buying activity, and product popularity.

1.1) Query Definitions

1.1.1) SALES QUERIES

1) Total sales across all events. SELECT name AS event name,SUM(amount) AS Total event sales FROM transactions,event GROUP BY name; 2) Total sales across all clients for one event. SELECT name AS event name,name AS clients name,transaction date,SUM(amount) AS total amount FROM transactions,event,client WHERE event id = '957' GROUP BY name,transaction date; 3) Client and Customer details for products purchased at an event. SELECT id,product name,name AS clients name,name AS event name,first name FROM product transactions,products,transactions,event,clients,customer WHERE product id IN (SELECT product_id FROM product transactions, transactions WHERE event_id = '957' GROUP BY product id, clients id) AND id = '957'; 4) Identify the top-5 sold products (in terms of quantity sold) at a given event. Return the product id and name for each product. SELECT id, product name FROM product WHERE id IN (SELECT product id FROM product transactions,transactions WHERE event_id = '957' GROUP BY product_id ORDER BY count(product id) DESC FETCH FIRST 5 ROWS ONLY);

1.1.2) CUSTOMER QUERIES

5) For each customer-to-product interaction at a given client, return the customer’s product rating and the customer feedback. SELECT DISTINCT question, answer, customer id FROM survey answers,questions,answers WHERE question id BETWEEN 49 AND 109 AND customer id IN (SELECT customer id FROM customer product interaction WHERE Clients id = 1 ); 6) Return the top-5 products according to men (measured by their interaction time). SELECT id, product name FROM product WHERE id IN (SELECT product id FROM customer product interaction,survey answers WHERE question id = '33' AND answer id = '39'GROUP BY product id ORDER BY COUNT(product id) DESC FETCH FIRST 5 ROWS ONLY; 7) ROI per client across all events. SELECT DISTINCT id, first name, Clients id, name AS Clients name, SUM(amount) AS transaction sum FROM ps_behavior_profile_500K,customer,clients WHERE transaction id IS NOT NULL AND customer id IN (SELECT customer id FROM transactions t) GROUP BY id, first name, Clients id, name ORDER BY transaction sum DESC; 8) Identify all products interacted and later purchased by a given customer. SELECT id AS ProdID Interacted, prod name AS ProdName Interacted,id AS ProdID purchased, prod name AS ProdName Purchased FROM (SELECT DISTINCT product id AS id, product name AS prod name FROM customer product interaction,product WHERE customer id = 232 AND product id NOT IN (SELECT product id FROM transactions,product transactions WHERE customer_id = 232)) temp1 ,(SELECT DISTINCT product id AS id, p1.product name AS prod name FROM customer product interaction, product WHERE customer id = 232 AND product id IN (SELECT product id FROM transactions,product transactions WHERE customer id = 232)) temp2;

1.1.3) AGGREGATION QUERIES

10C) For each potential customer, return client name and product name he/she expressed interest. SELECT id, first name,last name,email,notify_customer_flag,name AS event name,name AS Clients name,product name,Time_spent_product FROM customer,customer product interaction,product,event,clients,behavior profile WHERE ft7.Clients id = id AND event id = id AND product id = id AND id IN (SELECT customer id FROM customer product interaction WHERE NOT EXISTS (SELECT 'X' FROM transactions WHERE customer id = customer id ));

1.1.4) OVERLAPPING QUERY GROUPS

A) SALES AGGREGATION QUERIES

Following two queries return most popular products in terms of customer interaction time. Query 9A) runs over warehouse data and Query 9B) runs over transactional data. We compare the running time of both the queries. 9A) SELECT event id, name AS event name, product id, product name AS product name, SUM(Time_spent_product) FROM behavior profile, event, product GROUP BY product id, event id , product name, name ORDER BY SUM(Time_spent_product) DESC; 9B) SELECT event id, name AS event name, product id , product name AS product name, SUM(TotalTime spent on Product) FROM customer product interaction,event,product GROUP BY product id,event id,product name,name ORDER BY SUM(TotalTime spent on Product) DESC;

B) CUSTOMER AGGREGATION QUERIES

Following two queries return client name and product name each potential customer expressed interest. Query 10A) runs over warehouse data and Query 10B) runs over transactional data. We compare the running time of both the queries. 10A) SELECT id, first name, last name, email, name AS event name, name AS Clients_name, product name, Time_spent_product FROM ps_customer,behavior profile,product, event,Clients AND id IN (SELECT customer id FROM behavior profile WHERE NOT EXISTS (SELECT 'X' FROM transactions WHERE customer id = customer id)); 10B) SELECT id, first name, last name, email, notify customer flag,name AS event name, name AS Clients name, product name, TotalTime spent on Product FROM ps_customer,customer product interaction,product,event,Clients AND id IN (SELECT customer id FROM customer product interaction WHERE NOT EXISTS (SELECT 'X' FROM transactions WHERE customer id = customer id));

1.2) Query Running Times

1.2.1) Query Group Running Times

Database size (#records) Sales (ms) Customers (ms) Aggregation (ms)
0.5M 161 120 8803
1M 241 160 18803
2.5M 380 330 60083
6M 625 480 160303
10M 1005 780 360303

1.2.2) Individual Query Running Times (ms)

Database size (#records) Query1 Query2 Query3 Query4 Query5 Query10C Query6 Query7 Query8 Query9a Query9b Query10a Query10b
0.5M 271 69 271 85 120 8803 60 274 128 300 495 334 374
1M 320 105 441 125 160 18803 150 480 180 670 730 419 480
2.5M 470 205 641 205 330 60083 350 1024 480 1370 1730 941 1081
6M 770 285 941 505 480 160303 850 2024 870 2370 3730 1941 1961
10M 1250 425 1341 1005 780 360303 1607 3124 1280 5370 6930 2841 3361

1.2.3) QUERY ANALYSIS

Figure 1: Sales Queries

Figure 2: Customer Queries

Figure 3: Overlapping Queries

Figure 4: Aggregation Queries

a) The above figures show the running times of all the queries. b) Figure 1 and Figure 2 show 7 out of 8 queries ran in less than 1.5 seconds for 10 million records demonstrating the efficiency and value of our framework. In Figure 1, Query1 runs almost 3 times longer than Query2 as it runs across a larger dataset. Query1 extracts total sales across all events while Query2 extracts total sales at a given event. c) In Figure 3, we compare running times for similar queries executed over the transactional tables and over the warehousing tables. Query 9a, Query 9b identify products with high customer interaction times. Query 9a is run using the warehousing tables which contain pre-computed statistical summaries. These statistical summaries eliminate the computational overhead of computing statistical aggregations, which are necessary in Query 9b. d) Figure 4 shows the query running times to extract information from both warehousing tables and transactional tables. Due to larger table sizes and complex statistical summaries, we observe that the aggregation queries have higher running times than the sales and customer queries. Customer queries show improved performance over the sales queries due to a number of records being pruned based on qualifying sales and customer characteristics.

2) COMPARATIVE QUERY PERFORMANCE

In this section, we provide the definitions of the eight queries we used to evaluate comparative performance, using the original (e.g., Q1_orig) versus the new design (e.g., Q1_new).
  • Identify the top five products across all clients.
  • Q1_orig: select distinct t2.product_name,t1.product_id,count(t1.product_id) as Quantity from client1_check_in t1 inner join client1_product_list t2 on t1.product_id = t2.id group by t1.product_id,t2.product_name union all select distinct t3.product_name, t4.product_id, count(t4.product_id) as Quantity from client2_tap_history t4 inner join client2_product_list t3 on t4.product_id = t3.id group by t4.product_id,t3.product_name order by Quantity desc fetch first 5 rows only; Q1_new: select distinct t2.product_name,t1.product_id, count(product_id) as quantity from n_user_product_interaction t1 inner join n_product_list t2 on t1.product_id = t2.id group by t1.product_id,t2.product_name order by quantity desc fetch first 5 rows only;
  • Identify all the customers and the products they have expressed their interest, do this across all clients.
  • Q2_orig: select distinct t1.user_id,t2.first_name,t2.last_name,t3.product_name from client1_check_in t1 inner join client1_user_profile t2 on t1.user_id = t2.user_id inner join client1_product_list t3 on t1.product_id = t3.id union all select distinct t7.user_id,t8.first_name,t8.last_name,t9.product_name from client2_tap_history t7 inner join client2_user_profile t8 on t7.user_id = t8.user_id inner join client2_product_list t9 on t7.product_id = t9.id; Q2_new: select distinct t1.user_id,t2.first_name,t2.last_name,t3.product_name from n_user_product_interaction t1 inner join n_user t2 on t1.user_id = t2.id inner join n_product_list t3 on t1.product_id = t3.id;
  • Identify all the customers who are not interested in any products.
  • Q3_orig: select distinct t1.user_id,t1.first_name,t1.last_name from client1_user_profile t1 where t1.user_id not in (select user_id from client1_check_in t2) union all select distinct t1.user_id,t1.first_name,t1.last_name from client2_user_profile t1 where t1.user_id not in (select user_id from client2_tap_history t2); Q3_new: select distinct t1.id,t1.first_name,t1.last_name from n_user t1 where t1.id not in (select user_id from n_user_product_interaction t2);
  • Identify distinct customers across all clients.
  • Q4_orig: select distinct t1.user_id,t1.first_name,t1.last_name from client1_user_profile t1 union all select distinct t2.user_id,t2.first_name,t2.last_name from client2_user_profile t2; Q4_new: select distinct t1.id,t1.first_name,t1.last_name from n_user t1;
  • Identify all customers who have indicated they would like clients to contact them regarding the products they have expressed interest in.
  • Q5_orig: select t1.user_id,t1.first_name,t1.last_name from client1_user_profile t1 where contact_consent = 1 union all select t1.user_id,t1.first_name,t1.last_name from client2_user_profile t1 where contact_consent = 1; Q5_new: select distinct t1.id,t1.first_name,t1.last_name from n_user t1 where id in (select user_id from n_vendor_visit where contact_consent = 1);
  • Return survey responses for a given customer.
  • Q6_orig: select t2.question,t3.answer from client1_survey_data t1 inner join client1_questions t2 on t1.question_id = t2.id inner join client1_answers t3 on t1.answer_id = t3.id where user_id in (select user_id from client1_user_profile where first_name = 'John' and last_name = 'Doe' fetch first row only) union all select t2.question,t3.answer from client2_survey_data t1 inner join client2_questions t2 on t1.question_id = t2.id inner join client2_answers t3 on t1.answer_id = t3.id where user_id in (select user_id from client2_user_profile where first_name = 'John' and last_name = 'Doe'fetch first row only); Q6_new: select t2.question,t3.answer from n_survey_answers t1 inner join n_questions t2 on t1.question_id = t2.id inner join n_answers t3 on t1.answer_id = t3.id where user_id in (select id from n_user where first_name = 'John' and last_name = 'Doe');
  • Return customer ids of customers associated with a smart (NFC) card, do this for all smart cards.
  • Q7_orig: select id from (select id from client2_nfc_list union all select id from client1_nfc_list); Q7_new: select id from n_nfc_list;
  • Identify all the customers without a smart card.
  • Q8_orig: select user_id from client1_nfc_list where uuid is null; union all select user_id from client2_nfc_list where uuid is null; Q8_new: select user_id from n_nfc_list where uuid is null;

Contact details