Data mining

Question 1: Business Applications of Data Mining
(Web/Library research) List some business applications of data mining techniques. Case studies
and success stories will be helpful to you.
. See for example: http://www]01.ibm.com/software/success/cssdb.nsf/CS/STRD]
8QSKHK?OpenDocument&Site=spss&cty=en_us
o This is just one example of a place to find case studies and success stories. You should find
at least one other reference to use for this assignment.
. Create a table that lists:
. Each business application (including a brief description of the business objective, and the
company/companies that have used or could use data mining for these
applications/business-objectives)
o Example business application: ‘Predicting responses to a marketing campaign’
. The data mining techniques/algorithm(s) that are/were helpful in achieving the business
objective for each business application
o Your syllabus lists some of the most popular data mining techniques
. Possible/typical outputs of data mining in that business application area
o Example outputs: ‘Married women with one or more children are more likely to
respond to the campaign’, ‘People who buy chili are more likely to buy antacids’,
etc.
. The web-address of the page where you found the information or the citation for the
article or book where you found the information. You are required to provide at least 3
different web-addresses/citations.
(300-400 words [approx. 1 page]; 10 points).
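As an illustration of where an output such as ‘People who buy chili are more likely to buy antacids’ could come from, here is a minimal Python sketch that computes the usual association-rule measures (support, confidence, lift) for one rule. The transactions and the function name `rule_metrics` are invented for this example; they are assumptions, not data from any case study.

```python
# Toy market-basket data, invented purely for illustration (an assumption,
# not real sales data).
transactions = [
    {"chili", "antacid", "beer"},
    {"chili", "antacid"},
    {"chili", "bread"},
    {"bread", "milk"},
    {"antacid", "milk"},
]


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    # Count baskets containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_both / n                          # share of baskets with both
    confidence = n_both / n_a if n_a else 0.0     # P(consequent | antecedent)
    lift = confidence / (n_c / n) if n_c else 0.0  # confidence vs. base rate
    return support, confidence, lift


support, confidence, lift = rule_metrics(transactions, {"chili"}, {"antacid"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that the two items appear together more often than their individual frequencies would predict, which is the kind of statement a ‘sample output’ column can report.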
Question 2: The Data Mining Process
Describe the industry-standard CRISP-DM data mining process model
(http://www.dataminingtechniques.net/data-mining-tutorial/data-mining-processes/) and
SAS’s SEMMA model
(http://www.sas.com/offices/europe/uk/technologies/analytics/datamining/miner/semma.html).
. Choose one of your business applications from Question 1 and illustrate the use of either
CRISP-DM or SEMMA for that application: e.g. in the ‘data preparation’ phase of CRISP-DM,
describe the specific data sources you would use for that application, etc.
Further Instructions for Assignment 1
. For question 1, you should format your answer as a table with the headings ‘business
objective’, ‘companies (that do or could pursue this data mining objective)’, ‘data mining
algorithm/technique’, ‘sample output’, and ‘web address’ as requested in the question.
o Also, you should give specific answers. For example, a business objective of ‘provide
payment processing solutions’ is not specific enough; rather say ‘produce a model to
score transactions and identify the transactions most likely to be fraudulent’.
o Similarly, for outputs of data mining, ‘company lowers percentage of fraudulent
transactions’ is fine as a general goal, but give me more details of possible specific
outputs: e.g. give example rules that could be produced like ‘large transactions by
people in Queens who have held accounts for less than 6 weeks are likely to be
fraudulent’.
. For question 2, on the application of data mining techniques, make sure to describe both the
CRISP-DM and SEMMA processes. However, you only need to apply one of them
(preferably CRISP-DM).
o The goal is to provide good descriptions of how specifically to apply each of the 6
stages of the process to your particular case.
o More importantly, you should give specific actions for each phase of the data mining
process when explaining how the process could be applied to your particular case.
. For example, ‘understand data’ is not specific enough and would not earn you
any of the points for the ‘application’ part of the question; instead, under the
heading ‘data understanding’ give details like ‘gather customer data from
internal database, including customer identifier, age, purchase history, …’.
. Similarly ‘prepare data’ is not detailed enough. Instead, under the heading
‘data preparation’, write ‘bin the customers into 5 equal bins by income
attribute; compute aggregates for past 3 months, past 6 months, and past 12
months customer purchases, …’.
. Under the heading ‘evaluation’ you might explain that a model that picks
fraudulent transactions with 98% recall and 80% precision is probably
sufficient, and that low recall is costly because each fraudulent transaction that
falls through our checks is expensive, whereas low precision is not too costly
because falsely rejected transactions do not cost us much profit.
. (Model evaluation is dealt with in more detail in lectures after the
assignment is due; you should have picked up knowledge about
evaluation criteria from the Two Crows reading).
o For the deployment phase (of CRISP-DM), you should explain how the company
could exploit (profit from) the model produced and what specific actions it did or
could take: e.g. it could use the model to score prospects and email high-scoring
customers. Obviously the details would depend on the application you chose, but the
important thing is to be specific and apply the process to your particular application.
. Always cite the source of any comparative performance figures, or seemingly
unsubstantiated data, which you give.
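
To make two of the specific actions mentioned above concrete, here is a minimal Python sketch of binning customers into 5 bins by income (‘data preparation’) and computing precision and recall for a fraud model (‘evaluation’). It assumes that ‘equal bins’ means bins holding equal numbers of customers; equal-width bins would be another reasonable reading. The income values, labels, and function names are hypothetical and serve only as an example.

```python
def equal_frequency_bins(values, n_bins=5):
    """Assign each value a bin index 0..n_bins-1 so that bins hold
    (nearly) equal numbers of items. Tied values may land in
    different bins; this sketch does not handle ties specially."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def precision_recall(actual, predicted):
    """Compute (precision, recall) for binary labels, where 1 = fraudulent."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # fraud correctly flagged
    fp = sum(1 for a, p in pairs if not a and p)    # legitimate, wrongly flagged
    fn = sum(1 for a, p in pairs if a and not p)    # fraud that slipped through
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical customer incomes (assumed values, not real data).
incomes = [21000, 35000, 48000, 52000, 60000,
           75000, 83000, 90000, 120000, 150000]
income_bins = equal_frequency_bins(incomes, n_bins=5)

# Hypothetical ground truth and model predictions for 10 transactions.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall = precision_recall(actual, predicted)

print("income bins:", income_bins)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In the fraud example above, recall is the figure to watch: every fraudulent transaction counted in `fn` is one that fell through the checks, whereas a false positive in `fp` only means a legitimate transaction was rejected.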
