Paris-Saclay Data Science Master

The two-year master gives students the skills needed to handle today’s massive amounts of data, characterized by the four V’s: volume, velocity, variety, and veracity. The curriculum is organized along three axes: 1) massive data management, both theoretical (principles of DBMS, NoSQL and graph databases) and applied; 2) artificial intelligence, both symbolic and numerical (machine learning); and 3) data mining and knowledge management. The theoretical foundations are complemented by applied projects and research or industrial internships.

To complete the master, students must earn 120 ECTS credits through a combination of mandatory classes, optional classes, soft skills classes, and projects/internships.

In the first year (M1), the following courses are mandatory:

  • [DS] Bases de données avancées I : Optimisation
  • [DS] Bases de données avancées II : Transactions
  • [DS] Intelligence Artificielle, Logique et Contraintes I
  • [DS] Intelligence Artificielle, Logique et Contraintes II
  • [DS] Distributed Systems for Massive Data Management

  • [AI] Machine Learning
  • [AI] Datacomp 2

In the second year (M2), the following courses are mandatory:

  • [DS] Algorithms for Data Science
  • [DS] Semantic Web and Ontologies
  • [DS] Social and Graph Data Management
  • [DS] Knowledge Discovery from Graph Data
  • [DS] Data Science Project

  • [AI] Optimization
  • [HCI] Visual Analytics

For optional classes, students can choose any course from the other masters in the computer science department. The full list can be found [here]. Exceptionally, students admitted directly into the second year can choose first-year mandatory Data Science classes as their optional classes.

Schedule: classes start the week of September 7th, 2020, with orientation on September 4th, 2020. The schedule of the track and of the master in general can be found [here]. Classes will take place in building [PUIO] of Paris-Saclay University and building [Eiffel] of CentraleSupélec.

Contacts: [Silviu Maniu] (first year, overall), [Fatiha Saïs] (second year, internships), [Alexandre Verrecchia] (administrative issues)

Master Presentation (French) [pdf]

Other Links: [Computer Science Master] (French), [AI Track] (English), [PDCS Track] (English)

Data Science Project (M2 Data Science, U. Paris-Saclay)

Language: English Last version: 2020–2021

Online via [eCampus]

Schedule:

  • 08/01/2021: Project Presentation – Collaborative Filtering-Based Systems [pdf]
  • 15/01/2021: Team composition [list]
  • 22/01/2021: First presentation [guidelines]

Datasets:

  • GroupLens - MovieLens ratings of movies, also contains tags of movies

Bibliography:

  1. R. Chen, Q. Hua, Y.-S. Chang, B. Wang, L. Zhang, X. Kong. “A Survey of Collaborative Filtering-Based Recommender Systems: From Traditional Methods to Hybrid Methods Based on Social Networks”. IEEE Access, 2018 [pdf]
  2. J. Leskovec, A. Rajaraman, J. Ullman. “Mining of Massive Datasets”. (chapters 9, 3, 11) [site]

Social and Graph Data Management (M2 Data Science, U. Paris-Saclay)

Language: English Last version: 2020–2021

Lectures and labs are held via eCampus

Lectures:

  • 06/11/2020: Introduction [pdf]; Graph Models [pdf] – recording available [here]
  • 13/11/2020: Degree Correlations [pdf] – recording available [here]
  • 20/11/2020: Network Robustness [pdf]; Communities [pdf] – recording available [here]
  • 27/11/2020: Spreading Phenomena [pdf]; Social Influence [pdf] – recording available [here]
  • 11/12/2020: Node Measures [pdf]

Labs:

Project:

  • 27/11/2020: Network Analysis Project [pdf] – deadline January 8th 2021, 23:59 CET

Exam:

  • 18/12/2020: 14:00 CET, room D203 (PUIO)

References:

  1. A.-L. Barabási. “Network Science.” Cambridge University Press [site]
  2. M. Newman. “Networks: An Introduction.” Oxford University Press
  3. D. Easley, J. Kleinberg. “Networks, Crowds, and Markets.” Cambridge University Press [site]

Databases (Polytech APP3, U. Paris-Saclay)

Language: French Last version: 2020–2021

This course is based on the BD2 course (L3 Info Paris-Saclay) by Emmanuel Waller

Lectures:

  • 15/10/2020: Introduction [pdf], Model [pdf], Updates [pdf], Persistence [pdf], Querying [pdf]
  • 15/10/2020: Constraints [pdf]
  • 19/10/2020: PL/SQL - Intro [pdf], Basics [pdf]
  • 19/10/2020: PL/SQL - Cursors [pdf]
  • 07/12/2020: JDBC [pdf1] [pdf2]
  • 11/12/2020: JDBC [pdf1] [pdf2]

Tutorials/Labs:

Specifications for the tutorials/labs [pdf]; connection instructions for the Oracle database [pdf]

  • 15/10/2020: Updates [pdf]; solution [sql]
  • 15/10/2020: Constraints [pdf]; solution [sql]
  • 19/10/2020: PL/SQL Basics [pdf]; solution [sql]
  • 22/10/2020: PL/SQL Cursors [pdf]; solution [sql]
  • 07/12/2020: JDBC 1 [pdf] [Menu.java]; solution [java]
  • 07/12/2020: JDBC 2 [pdf]; solution [java]
  • 11/12/2020: JDBC 3 [pdf]; solution [java]
  • 17/12/2020: JDBC 4 [pdf]

Examples:

Algorithms for Data Science (M2 Data Science, U. Paris-Saclay)

Language: English Last version: 2020–2021

Lectures:

  • 11/09/2020: Intro [slides], Frequent Itemsets [slides], Finding Similar Items [slides]; recording available [here]
  • 25/09/2020: Data Streams I [slides]; recording available [here]
  • 02/10/2020: Data Streams II [slides]; recording available [here]
  • 16/10/2020: Advertising on the Web [slides]; recording available [here]

Labs:

Project:

  • 09/10/2020: [text] – deadline November 9th, 23:59 CET

Exam:

References:

  1. J. Leskovec, A. Rajaraman, J. Ullman. “Mining of Massive Datasets”. [site]

Web Data Models (M2 Data&Knowledge, U. Paris-Saclay)

Language: English Last version: 2018–2019

Course dates and slides:

Practical labs and project:

References:

  1. Makoto Murata, Dongwon Lee, Murali Mani, and Kohsuke Kawaguchi. 2005. “Taxonomy of XML schema languages using formal language theory”. ACM Trans. Internet Technol. 5, 4, 660-704. [paper]
  2. Georg Gottlob, Christoph Koch, and Reinhard Pichler. 2005. “Efficient algorithms for processing XPath queries”. ACM Trans. Database Syst. 30, 2, 444-491. [paper]
  3. Todd J. Green, Ashish Gupta, Gerome Miklau, Makoto Onizuka, and Dan Suciu. 2004. “Processing XML streams with deterministic automata and stream indexes”. ACM Trans. Database Syst. 29, 4, 752-788. [paper]
  4. Michael Benedikt and Christoph Koch. 2009. “XPath leashed”. ACM Comput. Surv. 41, 1, Article 3, 54 pages. [paper]
  5. Thomas Schwentick. 2004. “XPath query containment”. SIGMOD Rec. 33, 1, 101-109. [paper]
  6. Gerome Miklau and Dan Suciu. 2004. “Containment and equivalence for a fragment of XPath”. J. ACM 51, 1, 2-45. [paper]
  7. Felipe Pezoa, Juan L. Reutter, Fernando Suarez, Martín Ugarte, and Domagoj Vrgoc. 2016. “Foundations of JSON Schema”. ACM WWW. [paper]

Useful reading:

  • C. Maneth’s course “XML and Databases” [page]
  • S. Abiteboul et al. “Web Data Management”. 2011. Cambridge University Press [page]
  • H. Comon et al. “Tree Automata Techniques and Applications”. 2007 [page]
  • W3Schools tutorials [site]

Previous exams: 2015–2016 [pdf], 2017–2018 [pdf]

Architectures for Massive Data Management (M2 Data&Knowledge, U. Paris-Saclay)

Language: English Last version: 2018–2019

Courses:

  • 02/10/2018: JSON Stores [slides]
  • 23/10/2018: Graph Stores [slides]

Practical labs: