CIS 490: Introduction to Programming Techniques for Big Data Analytics

3 credit hours

Prerequisites: CIS 200 or CIS 209 or instructor permission. This is an introductory course on programming techniques for working with big data, including data sets that are terascale or larger and complex, heterogeneous data from various domains. It is intended for students who have had at least one introductory programming course and are prepared to learn a new programming language and one or more libraries. No additional background is assumed. The course will survey the programming concepts that underlie the MapReduce architecture, their implementation on platforms such as Apache Hadoop, and specific tools for data integration and data transformation such as Apache Hive and Scalding. Basic NoSQL databases and query processing will be presented in the context of real-world problems. Students will be given full-scale data sets and problems to work with, along with the opportunity to bring data and problems from other disciplines.

Course information and interest form

CIS 798: Programming Techniques for Big Data Analytics

3 credit hours

Prerequisites: CIS 560 or CIS 300 and instructor permission. This is an advanced undergraduate or first graduate-level course on programming architectures, algorithms, and techniques for working with big data, including terascale and petascale (or larger) data sets, text corpora for natural language processing and information retrieval, and complex, heterogeneous data from various domains of science, business, and the humanities. It is intended for students who have had at least two introductory programming courses, including one on data structures, and are prepared to learn one or more new programming languages and libraries. No additional background is assumed, though further programming experience is recommended and a first course in databases will be helpful. The course will survey the programming paradigms that underlie parallel and distributed processing architectures such as MapReduce, their implementation on platforms such as Apache Hadoop, and specific tools for data integration and data transformation such as Apache Hive and Scalding. A brief survey of machine learning and data mining techniques using Apache Mahout will also be given. NoSQL databases and query processing will be presented in the context of real-world problems. Students will be given full-scale data sets and problems to work with, or may bring data and problems from their own disciplines and areas of research.

Course information and interest form