
The Hepatitis.arff dataset contains information about patients affected by hepatitis. The task is to build a classification model that predicts hepatitis histology: Yes/No.

Submit a report that answers the following questions:

a)    Select a suitable decision tree model for predicting Histology.
–    Which model evaluation method did you use: cross-validation (CV) or hold-out (H-O)? Provide an overview of the method and explain why you preferred it.
–    Interpret the classification outputs: the tree topology and the accuracy rates.
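To make the difference between the two evaluation methods concrete, here is a minimal sketch in plain Python of how hold-out and k-fold cross-validation partition the rows. The function names, the 34% test fraction, the seed and the row count of 155 are illustrative assumptions, not values taken from the assignment; substitute the settings you actually use in your tool.

```python
import random


def holdout_split(n_rows, test_fraction=0.34, seed=1):
    """Single hold-out split: one shuffled partition into train and test."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test = round(n_rows * test_fraction)
    return idx[n_test:], idx[:n_test]


def kfold_splits(n_rows, k=10, seed=1):
    """k-fold cross-validation: every row lands in the test set exactly once."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test


if __name__ == "__main__":
    n_rows = 155  # assumed size, replace with the row count of your file
    train, test = holdout_split(n_rows)
    print("hold-out:", len(train), "train rows,", len(test), "test rows")
    for fold_no, (train, test) in enumerate(kfold_splits(n_rows), start=1):
        print("fold", fold_no, ":", len(train), "train rows,", len(test), "test rows")
```

With a small dataset, a single hold-out estimate depends heavily on which rows happen to fall in the test set, whereas cross-validation averages over k test sets; that trade-off is usually the core of the justification.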

b)    Provide a detailed description of the classification model:
–    The tree induction algorithm
–    The attribute selection criteria
–    The pruning method
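For the attribute selection question, the sketch below computes entropy, information gain and gain ratio for a nominal attribute. It assumes a C4.5-style learner (the parameter names in part c suggest something like Weka's J48, but the brief does not say so), and the toy columns are made up for illustration only.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of nominal values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # penalises attributes with many branches
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy columns, not taken from Hepatitis.arff.
    labels = ["yes", "yes", "no", "no"]
    informative = ["a", "a", "b", "b"]
    uninformative = ["a", "b", "a", "b"]
    print(gain_ratio(informative, labels))
    print(gain_ratio(uninformative, labels))
```

The attribute with the highest gain ratio is the one chosen for the split at a node; numeric attributes would first need a threshold, which this sketch leaves out.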

c)    Vary the model parameters and discuss the impact on the classification results:
–    Set the REP parameter (Reduced Error Pruning) to TRUE. Explain this tree pruning method. What impact did it have on the outputs, and why?
–    Set the parameter unpruned to TRUE. Report and explain any change in the accuracy of the results and in the tree structure.
–    Change the confidence factor to 15%. Report the impact on the classification outputs and explain the causes of the change.

d)    Visualise the tree and generate a set of rules along the subtree path: Varices – Ascites Spiders Bilirubin Sex Class No. If you were to generate association rules from the tree, how could you reduce the number of rules (hint: think about Support and Confidence)?
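As one way to think about the hint, the sketch below treats each root-to-leaf path as a rule and keeps only rules that meet minimum support and confidence thresholds. The `Rule` class, the placeholder conditions, the counts and the thresholds are all hypothetical; real values would come from the leaf statistics of your own tree.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: tuple  # tests along the path, e.g. ("attr1 = yes", "attr2 = no")
    predicted: str     # class label at the leaf
    covered: int       # rows that reach the leaf
    correct: int       # rows at the leaf that have the predicted class


def filter_rules(rules, n_rows, min_support, min_confidence):
    """Keep rules whose support and confidence both meet the thresholds."""
    kept = []
    for r in rules:
        if r.covered == 0:
            continue  # an empty leaf carries no evidence
        support = r.correct / n_rows        # share of all rows matching rule and class
        confidence = r.correct / r.covered  # share of covered rows with that class
        if support >= min_support and confidence >= min_confidence:
            kept.append(r)
    return kept


if __name__ == "__main__":
    # Made-up leaf counts for illustration only.
    rules = [
        Rule(("attr1 = yes", "attr2 = no"), "No", covered=40, correct=36),
        Rule(("attr1 = yes", "attr2 = yes"), "Yes", covered=3, correct=2),
        Rule(("attr1 = no",), "Yes", covered=57, correct=30),
    ]
    for r in filter_rules(rules, n_rows=100, min_support=0.10, min_confidence=0.80):
        print(" AND ".join(r.conditions), "=>", r.predicted)
```

Raising either threshold drops rules that cover few rows or that are frequently wrong, which is the same effect pruning has on the tree itself.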

e)    Perform predictions using two other classification models of your choice, e.g. ANN, SVM or an ensemble learner. Report the accuracy metrics and discuss how these models' performance compares with that of the decision tree.

f)    Create ROC and Lift charts and interpret them.
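To help with interpreting the charts, the following sketch shows how ROC points, the area under the curve and cumulative lift can be derived from a classifier's predicted scores. The function names and the example scores are assumptions for illustration; your tool will produce these charts directly from the real predictions.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points as the threshold sweeps from high to low; labels are 1 or 0."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    prev = None
    for score, y in ranked:
        # Emit a point only when the score changes, so tied scores share one point.
        if prev is not None and score != prev:
            points.append((fp / neg, tp / pos))
        if y == 1:
            tp += 1
        else:
            fp += 1
        prev = score
    points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def lift_at(scores, labels, fraction):
    """Positive rate in the top-scored fraction divided by the overall positive rate."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


if __name__ == "__main__":
    # Hypothetical scores for the positive class and the true labels.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    print("AUC:", auc(roc_points(scores, labels)))
    print("lift in top 25%:", lift_at(scores, labels, 0.25))
```

An ROC curve that hugs the top-left corner (AUC near 1) indicates good ranking of positives over negatives, while a lift above 1 in the top-scored segment means the model concentrates positives there better than random selection would.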
