MODULE 1 – FUNDAMENTALS OF PYTHON
Data scientists must know how to code. Start by learning the fundamentals of Python, one of the most popular programming languages.
- Basics of Python
- Conditionals and Loops
- String and List Objects
- Functions & Object-Oriented Programming (OOP) Concepts
- Exception Handling
- Database Programming
MODULE 2 – DATA WRANGLING
Once you have covered the core skill of programming, dip your feet into the nitty-gritty of working with data by learning how to wrangle and visualize it.
- Reading CSV, JSON, XML and HTML files using Python
- NumPy & Pandas
- Relational Databases and Data Manipulation with SQL
- SciPy Libraries
- Loading, Cleaning, Transforming, Merging, and Reshaping Data
MODULE 3 – STATISTICS AND PROBABILITY
It is impossible to use data well without a knowledge of statistics. Learn to collect, organize, analyze, interpret, and present data using these statistical concepts.
- Descriptive Statistics & Data Distributions
- Probability Concepts and Set Theory
- Probability Mass Functions
- Probability Density Functions
- Cumulative Distribution Functions
- Modeling Distributions
- Inferential Statistics
- Hypothesis Testing
- Implementation of Statistical Concepts in Python
MODULE 4 – MACHINE LEARNING MODELS IN PYTHON
Machines have greatly increased our ability to interpret large volumes of complex data. Combine aspects of computer science and statistics to formulate algorithms that help machines draw insights from structured and unstructured data.
- Building Models Using the Following Algorithms:
  - Linear and Logistic Regression
  - Decision Trees
  - Support Vector Machines (SVMs)
  - Random Forests
  - K-Nearest Neighbours (KNN) & Hierarchical Clustering
  - Principal Component Analysis (PCA)
- Text Analytics and Time Series Forecasting
MODULE 5 – DATA VISUALIZATION USING MATPLOTLIB
Complex data sets call for simple representations that are easy to follow. Visualize and communicate key insights derived from data effectively by using tools like Matplotlib.
- Interactive Visualizations with Matplotlib
MODULE 6 – DEEP LEARNING USING TENSORFLOW
Go beyond superficial analysis by learning to interpret data in depth. Use deep neural networks built with TensorFlow to uncover hidden structure even in unlabeled and unstructured data.
- Basics of Neural Networks
- Linear Algebra
- Implementation of a Neural Network in Vanilla Python
- Basics of TensorFlow
- Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs)
- Generative Models
- Semi-supervised Learning Using GANs
- Sequence-to-Sequence (Seq2Seq) Models
- Encoder and Decoder
MODULE 7 – HANDLING BIG DATA WITH SPARK
Finally, manage your infrastructure with a data engineering platform such as Spark, so that you can focus on solving data problems rather than machine problems.
- Introduction to Big Data & Spark
- RDDs in Spark, DataFrames & Spark SQL
- Spark Streaming, MLlib & GraphX
ON-CAMPUS COMPONENT
This program includes two on-campus components of 3 days each, which will take place at the IIIT Allahabad campus. On-Campus Session 1 will be held in early January 2021 and On-Campus Session 2 in early June 2021. The exact dates of the on-campus sessions will be communicated in due course. Attendance at the on-campus components is mandatory for all participants of this course.