Enrichment Course on Web Data Extraction and Regression Analysis
Fired with educational zeal, Preferred.AI conducted a 5-day enrichment course (Oct 3-8, 2018) on Web Data Extraction and Regression Analysis. The course was organized by the SMU School of Information Systems in conjunction with the impending launch of its BSc (Computer Science).
We welcomed a group of 20 bright students from Singapore Polytechnic (SP), Ngee Ann Polytechnic (NP), and Temasek Polytechnic (TP). They are part of the prestigious Industry Preparation for Pre-graduate (iPREP) Programme or Infocomm Polytechnic (iPoly) Scholarship run by IMDA.
Through the course, we instilled an appreciation of the data science pipeline by taking the students through the stages of data collection, feature engineering, and building a prediction model using regression, culminating in an end-to-end project.
On Day 1, Ween anchored the data collection module of the course, which is based on Venom, a focused crawler framework for the deep web developed in-house by Preferred.AI and open-sourced for public use.
To help the students internalize the lessons, we had them work in small groups on a realistic project involving a specific website, with each group paired with a coach from Preferred.AI. That the groups managed to build working crawlers by the end of Day 2 spoke to their effective teamwork, the coaches’ able guidance, and Venom’s powerful features.
On Day 3, Max instructed the students on how to use machine learning techniques such as linear regression and logistic regression to build prediction models. On Day 4, the project groups began training their models using the data they were collecting with the crawlers they’d built earlier.
In true SIS learning style, on Day 5 the student groups took turns presenting and defending their projects. The team camaraderie was palpable, and the coaches were as anxious as their protégés: in just 5 short days, they had bonded. In the post-course feedback, 90% of the students rated the coaches as ‘Helpful’ or even ‘Awesome’; the learning experience just wouldn’t have been the same without them.
The projects were evaluated by Hady and a guest judge, Wu Huayu (VP, Data Science – DBS Bank). Huayu had earlier kindly shared his expertise in a talk titled “Applications of Data Science in Industries”, a wide-ranging survey of the history of big data and AI and of how their applications have touched industries such as banking and manufacturing.
We awarded two prizes. The Best Project was awarded to the group with the most creative, technically rigorous, impactful, and well-presented project. The Best Class Participation was awarded to the individual student who contributed the most to the class learning.
With the announcement of the prizes, the winners were finally revealed. Yet there were no losers in sight. After all, when learning takes place, we’re all winners.