Introduction to Data Mining (11446)
Course Project
Deadline: Last week of the semester
[exact date to be announced later]
Objectives:
In this project you will have a chance to apply the data mining concepts you have learned in class to solve a real-world problem. By the end of the project, you will have:
- Practiced cleaning and exploring data.
- Implemented a data mining algorithm.
- Performed automatic feature extraction from raw data.
- Practiced using a data mining tool.
- Practiced evaluating the performance of data mining algorithms.
- Practiced communicating results in a written report and in an oral presentation.
Description:
You can work either individually or in a team of two. Work on your project following the steps below:
1) Choose a suitable data mining problem:
Choose any classification problem that you are interested in.
Avoid choosing a problem that is either too complex and large or too simple and naive (e.g., predicting whether a customer would cheat, based on the dataset presented in the slides).
You can start by browsing available datasets (see the resources tab) and thinking of problems based on them, or by thinking of a problem first and then looking for a way to collect data for it. Note that there will be a bonus for very well-chosen and interesting problems.
In all cases, make sure to discuss your ideas with me early.
2) Implement a data mining algorithm:
After (or during) pre-processing, cleaning and initial exploration of the data, choose a suitable data mining algorithm and implement it from scratch.
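As an illustration only, here is a minimal sketch of what "implementing from scratch" could look like, assuming a purely numeric dataset and k-nearest neighbours as the chosen algorithm. The class name `KNNClassifier` and its `fit`/`predict` methods are hypothetical, not a required interface; the sketch also folds in one common pre-processing step, min-max scaling, fitted on the training data only.

```python
import math
from collections import Counter


class KNNClassifier:
    """Hypothetical from-scratch k-nearest-neighbour classifier (illustrative choice)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Learn per-feature min/max so every feature is scaled to [0, 1];
        # otherwise features with large ranges dominate the distance.
        self.mins = [min(col) for col in zip(*X)]
        self.maxs = [max(col) for col in zip(*X)]
        self.X = [self._scale(row) for row in X]
        self.y = list(y)
        return self

    def _scale(self, row):
        # Constant features (hi == lo) carry no information, so map them to 0.
        return [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, self.mins, self.maxs)
        ]

    def predict_one(self, row):
        q = self._scale(row)
        # Keep the k training points closest to the query (Euclidean distance).
        nearest = sorted(
            zip(self.X, self.y), key=lambda pair: math.dist(q, pair[0])
        )[: self.k]
        # Majority vote among their labels.
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(row) for row in X]


if __name__ == "__main__":
    X_train = [[1.0, 200.0], [1.2, 180.0], [5.0, 20.0], [5.5, 30.0]]
    y_train = ["A", "A", "B", "B"]
    model = KNNClassifier(k=3).fit(X_train, y_train)
    print(model.predict([[1.1, 190.0], [5.2, 25.0]]))  # ['A', 'B']
```

Whatever algorithm you pick, keeping a small `fit`/`predict` split like this makes it easy to plug the same model into your interface and your evaluation code.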
In the report, you will have to discuss the advantages and disadvantages of your choice for your specific project problem.
Once this is done, implement an interface that allows the user to run the algorithm on your dataset. The user interface can be either graphical or command-line based.
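A command-line interface can be quite small. The sketch below uses Python's standard `argparse` and `csv` modules and assumes a hypothetical file format (numeric feature columns, class label in the last column). The nearest-centroid classifier inside is only a placeholder so the example runs on its own; you would substitute your own algorithm. The `--predict` flag is likewise an illustrative choice.

```python
import argparse
import csv
import math
from collections import defaultdict


def load_csv(path):
    """Read numeric features; the last column is assumed to be the class label."""
    X, y = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            X.append([float(v) for v in row[:-1]])
            y.append(row[-1])
    return X, y


def train_centroids(X, y):
    """Placeholder classifier: one mean vector (centroid) per class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {label: [sum(c) / len(rows) for c in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids, row):
    # Assign the class whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))


def main(argv=None):
    parser = argparse.ArgumentParser(description="Train on a CSV file and classify an example.")
    parser.add_argument("train_csv", help="training data: feature columns, then label")
    parser.add_argument("--predict", nargs="+", type=float, metavar="X",
                        help="feature values of one example to classify")
    args = parser.parse_args(argv)

    X, y = load_csv(args.train_csv)
    model = train_centroids(X, y)
    print(f"Trained on {len(X)} examples, classes: {sorted(model)}")
    if args.predict:
        label = predict(model, args.predict)
        print(f"Predicted class: {label}")
        return label
    return None


if __name__ == "__main__":
    main()
```

Usage, assuming the script is saved as `classify.py`: `python classify.py train.csv --predict 5.1 3.5`.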
If your problem involves a difficult or time-consuming feature extraction step, you do not have to implement a data mining algorithm from scratch; in this case, you can use a tool such as WEKA. If there is no feature extraction step, or if this step is simple, then you must implement your chosen data mining algorithm from scratch.
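To give a rough sense of what a *simple* feature extraction step can look like, here is a bag-of-words sketch, assuming a hypothetical problem whose raw data is short text documents. Each document becomes a fixed-length vector of word counts that any classifier can consume; the function names are illustrative only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs, min_count=1):
    """Map each word seen at least min_count times across docs to a column index."""
    counts = Counter(w for doc in docs for w in tokenize(doc))
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(words)}


def bag_of_words(doc, vocab):
    """Turn one raw document into a fixed-length vector of word counts."""
    vec = [0] * len(vocab)
    for w in tokenize(doc):
        if w in vocab:  # words outside the training vocabulary are ignored
            vec[vocab[w]] += 1
    return vec


docs = ["Great product, works great", "Terrible: broke after a day"]
vocab = build_vocabulary(docs)
print(bag_of_words("great great day", vocab))  # [0, 0, 0, 1, 2, 0, 0, 0]
```

Extraction like this is a few lines of code; image, audio, or signal features that need substantial engineering are the kind of case where using a tool for the classifier itself may be agreed on instead.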
The type of implementation you will do will be decided jointly (between you and me) as soon as the project idea has been approved.
3) Evaluate the performance of the algorithm:
Test your algorithm with different parameter settings, record the results of your tests, and report your estimate of the generalization error.
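The handout does not prescribe an evaluation method; k-fold cross-validation is one common way to estimate the generalization error. The sketch below assumes a plain k-NN model (no feature scaling) as the algorithm under test and sweeps its `k` parameter; the synthetic data are generated only to demonstrate the procedure and are not results. `cross_val_error` and `knn_predict` are hypothetical names.

```python
import math
import random
from collections import Counter


def knn_predict(X_train, y_train, row, k):
    """Plain k-NN (no scaling), used here only as the model being evaluated."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(row, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_error(X, y, k, folds=5, seed=0):
    """Estimate the misclassification rate with k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # fixed seed: every setting sees the same folds
    errors = 0
    for f in range(folds):
        test = set(idx[f::folds])  # every folds-th shuffled index forms one fold
        X_tr = [X[i] for i in idx if i not in test]
        y_tr = [y[i] for i in idx if i not in test]
        errors += sum(knn_predict(X_tr, y_tr, X[i], k) != y[i] for i in test)
    return errors / len(X)  # each example is tested exactly once


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic two-class data, for demonstrating the procedure only.
    X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
         + [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(50)])
    y = ["a"] * 50 + ["b"] * 50
    for k in (1, 3, 5, 7):
        print(f"k={k}: estimated error = {cross_val_error(X, y, k):.3f}")
```

If you select the best parameter value using these same estimates, the selected error tends to be optimistic; a separate held-out test set (or nested cross-validation) gives a fairer final number.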
4) Write a report about your work:
The report should include:
- A description of the data mining task.
- A description of the dataset and any pre-processing steps performed.
- A description of the feature extraction step if any.
- A description of the algorithm and parameters used and a discussion of the suitability of the algorithm for the problem of the project.
- A description of the performed tests and a listing, explanation and discussion of the results.
- Screenshots showing that your program runs correctly.
5) Prepare a presentation:
The presentation should be 10-15 minutes long and should summarize what you have done in the project along the lines of what you have discussed in the report.
Grading Scheme:
Implementation (65%):
- Choice of the data mining problem.
- Data collection and preprocessing.
- Algorithm choice and implementation.
- Performance evaluation.
- User Interface.
Report (25%):
- Clarity.
- Depth of discussion.
Presentation (10%):
- Content value.
- Communication skills.
- Ability to respond to questions well.
There could be a bonus of up to 4% of the total course mark, depending on the value of the project.
Notes:
- You need to send me an email (within a week from now) indicating whether you will work individually or in a team (including your teammate's name).
- You need to send me an email (within two weeks from now) indicating the problem you have chosen to work on. Write a short paragraph that summarizes the data mining task you are going to work on.
Have fun :)