Crash Course on Data Science Pipeline
Building on the success of the premiere of the Preferred.AI-designed enrichment course on Web Data Extraction & Regression Analysis Using Java, we staged another run of the course from 12 to 14 December 2018. It was attended by 21 cheerful undergraduates from the SMU School of Information Systems.
The course emphasized the data science pipeline as a whole, that is, the awareness that every stage of the pipeline matters in deriving a useful outcome. Starting from how the data was obtained, we instilled a healthy respect for exploring and processing the data before building a model, engineering useful features, and continually interpreting and refining that model.
The course was densely informative and packed with hands-on exercises based on our own libraries: Venom for focused crawling, and CSV Processing Language for data manipulation and regression. In just two and a half days, students learnt to solve practical data science problems using real-world data, such as estimating the sale and rental prices of property, the prices of used cars and food, and job salaries. All the teams landed their projects in time for evaluation at the end of the crash course, without crashing (pun intended).
For an inside look, enjoy our photo slideshow.