COURSE DESCRIPTION

This course introduces students to data mining and knowledge management. Data mining (DM) is concerned with the discovery of “hidden” knowledge in large data sets. This knowledge represents one aspect of an organization’s intellectual capital and is often expressed in the form of trends or major themes that recur in the data. Knowledge management (KM) systems are designed to exploit the results of data mining and facilitate the analysis and evaluation of both tangible and intangible knowledge assets. In this course, students will explore data mining methods used for prediction and knowledge discovery. These methods include regression, nearest neighbor, clustering (such as K-means), decision trees, association rules, and neural networks. In addition, students will become familiar with the current theories, practices, tools, and techniques used to manage knowledge assets.

COURSE TOPICS

Concepts and techniques in data mining and knowledge management
Models, components, and implementation issues in knowledge-based systems
Classification and prediction using data mining algorithms
Tools and practices in the area of data mining

COURSE OBJECTIVES

After completing this course, you should be able to:

CO1 Explain data mining concepts, principles, and tasks.

CO2 Implement efficient algorithms for preprocessing of large data sets.

CO3 Generate a plan to design and implement the different phases of the data mining process.

CO4 Explore large data sets to discover patterns using existing algorithms.

CO5 Apply data classification techniques to analyze data sets.

CO6 Identify and implement clustering algorithms pertaining to data mining.

CO7 Evaluate analytical problems in various areas of computational data analysis and knowledge management.

CO8 Use an open source data mining tool to analyze data sets.

COURSE MATERIALS

You will need the following materials to complete your coursework. Some course materials may be free, open source, or available from other providers. You can access free or open-source materials by clicking the links provided below or in the module details documents. To purchase course materials, please visit the University's textbook supplier.

Required Textbook

Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to data mining. Pearson.

ISBN-13: 978-0321321367

Textbook Resources from Authors

Required Open Source Software

R for Windows: https://cran.r-project.org/
R for Mac: https://cran.r-project.org/bin/macosx/
RStudio Desktop (open source edition) for Mac/Windows: https://www.rstudio.com/products/rstudio/download/

COURSE STRUCTURE

Data Mining and Knowledge Management is a three-credit, online course consisting of six modules. Modules include an overview, topics, learning objectives, study materials, and activities. Module titles are listed below.

Module 1: Introduction and Getting to Know Your Data
Course objectives covered in this module: CO1, CO2
Module 2: Exploring Data and Basic Classification Models
Course objectives covered in this module: CO3, CO4, CO5, CO8
Module 3: Classification Techniques and Association Analysis
Course objectives covered in this module: CO4, CO5, CO8
Module 4: Advanced Concepts in Association Analysis
Course objectives covered in this module: CO1, CO4
Module 5: Cluster Analysis
Course objectives covered in this module: CO6, CO8
Module 6: Advanced Topic: Anomaly Detection
Course objectives covered in this module: CO7

ASSESSMENT METHODS

For your formal work in the course, you are required to participate in online discussion forums, complete written assignments, and finish programming projects. See below for details.

Consult the Course Calendar for due dates.

Promoting Originality

One or more of your course activities may utilize a tool designed to promote original work and evaluate your submissions for plagiarism. More information about this tool is available in this document.

Discussion Forums

You will be required to participate in ten graded online discussion assignments. There is also one ungraded but required Introductions Forum in Module 1.

Discussion forums are on a variety of topics associated with the course modules. The purpose of the discussion forums is to help make the connection between the course concepts and the goals of the course. In discussion posts, you express your opinions and thoughts, provide support and evidence for the position(s) you take on a subject, and have the opportunity to ask questions and expand on insights provided by your classmates. Active participation is vital to your overall success in this course.

Located within the Evaluation Rubrics section of the course website is the online discussion forum rubric used to aid in the grading of all online discussion assignments.

Written Assignments

You are required to complete six written assignments. The written assignments are on a variety of topics associated with the course modules.

Programming Assignments

You are required to complete three programming projects using R (an open source programming language and software environment for statistical computing and graphics) and RStudio (a set of integrated tools designed to help you be more productive with R). Both R and RStudio are widely used for data mining and analysis. You are required to use them to generate reports on statistical properties of a data set, to implement a classification technique to classify different data sets, and to utilize a clustering technique on a different data set.

GRADING AND EVALUATION

Your grade in the course will be determined as follows:

Online discussions (10)—40 percent
Written assignments (6)—30 percent
Programming projects (3)—30 percent

All activities will receive a numerical grade of 0–100. You will receive a score of 0 for any work not submitted. Your final grade in the course will be a letter grade. Letter grade equivalents for numerical grades are as follows:

A	=	93–100
A–	=	90–92
B+	=	88–89
B	=	83–87
C	=	73–82
F	=	Below 73

To receive credit for the course, you must earn a letter grade of C or higher on the weighted average of all assigned course work (e.g., assignments, discussion postings, projects). Graduate students must maintain a B average overall to remain in good academic standing.
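
As an informal illustration of how the category weights and letter-grade bands combine, the arithmetic can be sketched in Python. This is not an official grade calculator. It assumes that each category's score is the plain mean of its activities and that the weighted average is compared to each band's lower bound without rounding; the syllabus does not state either detail. The student scores are hypothetical.

```python
from statistics import mean

# Category weights from the grading breakdown (share of the final grade).
WEIGHTS = {"discussions": 0.40, "written": 0.30, "projects": 0.30}

# Lower bound of each letter-grade band, highest first.
BANDS = [(93, "A"), (90, "A-"), (88, "B+"), (83, "B"), (73, "C")]


def weighted_average(scores):
    """Assumption: each category's score is the plain mean of its activities."""
    return sum(WEIGHTS[cat] * mean(vals) for cat, vals in scores.items())


def letter_grade(average):
    """Assumption: no rounding; compare the average to each band's lower bound."""
    for lower, letter in BANDS:
        if average >= lower:
            return letter
    return "F"


# Hypothetical student: one discussion was not submitted, so it scores 0.
scores = {
    "discussions": [90, 85, 100, 95, 0, 90, 88, 92, 85, 95],
    "written": [88, 92, 85, 90, 95, 90],
    "projects": [90, 85, 95],
}
average = weighted_average(scores)
print(f"{average:.1f} -> {letter_grade(average)}")
```

Under these assumptions, the single missed discussion lowers the discussion mean to 82, and the weighted average works out to 0.40 × 82 + 0.30 × 90 + 0.30 × 90 = 86.8, which falls in the B band.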

STRATEGIES FOR SUCCESS

First Steps to Success

To succeed in this course, take the following first steps:

Carefully read the entire Syllabus, making sure that all aspects of the course are clear to you and that you have all the materials required for the course.
Take time to read the entire Online Student Handbook. The Handbook answers many questions about how to proceed through the course and how to get the most from your educational experience at Thomas Edison State University.
Familiarize yourself with the learning management system environment: how to navigate it and what the various course areas contain. If you know what to expect as you navigate the course, you can better pace yourself and complete the work on time.
If you are not familiar with web-based learning, be sure to review the processes for posting responses online and submitting assignments before class begins.

Study Tips

Consider the following study tips for success:

To stay on track throughout the course, begin each week by consulting the Course Calendar. The Course Calendar provides an overview of the course and indicates due dates for submitting assignments, posting discussions, and scheduling and taking examinations.
Check Announcements regularly for new course information.

ACADEMIC POLICIES

To ensure success in all your academic endeavors and coursework at Thomas Edison State University, familiarize yourself with all administrative and academic policies including those related to academic integrity, course late submissions, course extensions, and grading policies.

For more, see: