Course outline:
Introduction to Machine Learning
Motivations of machine learning. Machine learning, artificial intelligence and big data. Applications of machine learning. Representation of input data. Machine learning process.
Exploratory data analysis
Data validation and cleansing, detection of outliers and missing values. Data transformation. Data reduction. Sampling. Feature selection. Feature extraction by filtering. Principal component analysis. Data discretization. Univariate analysis: graphical analysis, measures of central tendency, dispersion, relative location, heterogeneity, analysis of the empirical density. Bivariate analysis: graphical analysis, measures of correlation, contingency tables. Multivariate analysis: graphical analysis, measures of correlation.
Supervised learning: classification and regression
Taxonomy of supervised methods. Evaluation of classification models: holdout, cross-validation, confusion matrix and derived metrics, ROC curve, cumulative gain and lift. Treatment of categorical attributes. Nearest neighbor. Classification and regression trees: splitting, stopping and pruning. Bayesian methods: naive methods, Bayesian networks. Logistic regression. Neural networks: Rosenblatt perceptron, multilayer feed-forward networks. Support vector machines: structural risk minimization, maximal margin hyperplane for linear separation, nonlinear separation. Simple and multiple linear regression. Assumptions on residuals. Least squares regression: normality and independence of residuals, significance of coefficients, analysis of variance, coefficients of determination and linear correlation, multicollinearity, confidence and prediction limits. Selection of predictive variables. Ridge regression. Generalized linear regression.
Association rules
Motivation and evaluation of association rules. Single-dimension association rules. Apriori algorithm. Generation of frequent itemsets, generation of strong rules. General association rules.
Clustering
Taxonomy of clustering methods. Affinity measures. Partition methods: K-means, K-medoids. Hierarchical methods: agglomerative methods, divisive methods. Evaluation of clustering models.
Applications and use cases
Introduction to the Python programming language and its main libraries for machine learning (Scikit-learn, Keras). Applications in relational marketing using Python: lifetime value analysis, acquisition, retention, cross-selling, market basket analysis. Web mining. Social market analysis. Speech recognition. Text mining. Fraud and anomaly detection. Bioinformatics.
Exams
First partial exam (text): 20/11/2014
Second partial exam (text): 26/01/2015