
Research school on Data Mining: Statistical Modeling and Learning from Data

The course aims to provide basic skills for the analysis and statistical modeling of data, with special attention to supervised and unsupervised machine learning.
When? From 11 January 2016 at 09:30
to 15 January 2016 at 16:45
Where? ENS Lyon, Monod site, Amphi B
Teachers: Ciro Cattuto, Laetitia Gauvin and André Panisson (ISI Torino)

An important objective of the course is operational knowledge of the techniques and algorithms covered. To this end, the lectures will address both theoretical and practical aspects of machine learning. The practical part requires a good knowledge of programming, preferably in Python. The expected outcomes include (1) an understanding of the theoretical foundations of machine learning and (2) the ability to use some Python libraries for machine learning in simple applications.

 
Topics will include:
– The major paradigms of learning from data, the learning problem, the feasibility of learning
– The architecture of machine learning algorithms: model structure, scoring, and model selection
– The theory of generalization, model complexity, the approximation-generalization tradeoff, bias and variance, the learning curve
– Score functions and optimization techniques. Gradient descent and stochastic gradient descent.
– Validation and cross-validation: validation set, leave-one-out cross-validation, K-fold cross-validation
– Linear models: linear classification, linear regression, ordinary least squares, logistic regression, non-linear transformations
– Non-linear models for classification: support vector machines, tree models, nearest-neighbor methods, Naive Bayes
– Overfitting and Regularization: model complexity and overfitting, commonly used regularizers, Lasso.
– Unsupervised learning: cluster analysis, the K-means algorithm, hierarchical clustering
– Feature selection and dimensionality reduction: Singular Value Decomposition, Matrix Factorisation
– Information retrieval, text representation and classification, term weighting
 
An overview of the theoretical aspects of machine learning will be followed by the application of algorithms to real problems such as image classification, text mining and spam detection. The exercises will be implemented in an interactive Python environment, using standard tools for data analysis and visualization such as the Scientific Python stack, scikit-learn, pandas and NLTK.
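
As an illustration of the kind of exercise this could involve, the sketch below combines two of the listed topics, gradient descent and K-fold cross-validation, in plain Python. It is not taken from the course material: the function names (`k_fold_indices`, `fit_line`, `cross_validate`), the synthetic data, the learning rate and the number of steps are all assumptions chosen for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by batch gradient descent on the mean squared error.

    lr and steps are illustrative values, not recommendations.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def cross_validate(xs, ys, k=5):
    """Return the validation mean squared error averaged over k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Train on k-1 folds, evaluate on the held-out fold.
        w, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((w * xs[i] + b - ys[i]) ** 2 for i in held_out) / len(held_out)
        scores.append(mse)
    return sum(scores) / len(scores)


# Synthetic data (an assumption for this sketch): y = 3x + 1 plus small Gaussian noise.
rng = random.Random(42)
xs = [i / 20 for i in range(40)]
ys = [3 * x + 1 + rng.gauss(0, 0.1) for x in xs]

cv_mse = cross_validate(xs, ys, k=5)
print(f"5-fold cross-validated MSE: {cv_mse:.4f}")
```

Libraries such as scikit-learn provide ready-made versions of these building blocks; the plain-Python version is only meant to show what they compute.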
