June 1, 2019

Top Skills for a Data Scientist – 2019

Synopsis

We look at the key skills required of a data scientist, updated for 2019. We do so by gathering 2,000 job postings and using text mining to extract information from them. We also use algorithms to find out how the skills relate to one another. Finally, we look at the implications of identifying the in-demand skills that will affect the workforce and economy.
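As a rough illustration of the text-mining idea, the base R sketch below counts how often each skill is mentioned and how often two skills appear in the same post. The three posts and the skill dictionary are hypothetical stand-ins, not the 2,000 postings or the algorithms used in the full article.

```r
# Toy stand-ins for scraped job posts (hypothetical text, not the study's data).
posts <- c(
  "Seeking data scientist with Python, SQL and machine learning experience",
  "Must know R and SQL; Python is a plus",
  "Machine learning engineer: Python, Spark"
)

# Hypothetical skill dictionary, matched as case-insensitive whole words.
skills <- c("python", "r", "sql", "spark", "machine learning")

# One row per post, one column per skill; TRUE when the skill is mentioned.
mentions <- sapply(skills, function(s) {
  grepl(paste0("\\b", s, "\\b"), posts, ignore.case = TRUE, perl = TRUE)
})

# Share of posts that mention each skill, most frequent first.
skill_share <- sort(colMeans(mentions), decreasing = TRUE)

# Co-occurrence counts: entry [i, j] is the number of posts naming both skills.
co_occurrence <- crossprod(mentions * 1)

print(skill_share)
print(co_occurrence)
```

The co-occurrence matrix is only one simple way to express how skills relate; clustering or association rules could be layered on top of the same `mentions` matrix.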

October 6, 2017 (updated June 1, 2018)

Data Science Webinar

Data Science, Machine Learning, Natural Language Processing, Statistical Analysis, Text Mining, Visualization

I was recently invited to deliver a webinar regarding the data science field. I’d like to share my presentation slides. Click on the full screen icon below to watch it on a larger display. Feel free to contact me if you have further questions. Stay tuned for my next webinar.

April 11, 2016

Open Data Day: Code and the City

Data Science, Machine Learning, Natural Language Processing, Text Mining

I was a participant in a hackathon called Code and the City. The event was held in celebration of Open Data Day. Along with industry sponsors like Soti, Amazon, Microsoft and Cisco, the event sponsors included the City of Mississauga and Sheridan College.

The goal was to use open data to address a problem statement that would benefit the City of Mississauga, which has a population of almost 800,000:

How can Mississauga gain greater awareness and engagement with the community in a digital environment?

March 14, 2016

Wearable Fitness Tracker Predictive Modeling

Data Science, Machine Learning, Projects

Synopsis

This report was created for a Canadian startup that builds wearable fitness trackers for use in gyms, along with an accompanying mobile application. My solution yielded the best results among all the reports submitted by a select group of highly qualified individuals. Some of the code has intentionally been removed.

May 11, 2015 (updated June 1, 2018)

Human Activity Recognition and Machine Learning

Data Science, Machine Learning, Projects

Synopsis

Human Activity Recognition is emerging as a new field in which wearable devices are commonly used to quantify how much time is spent on an activity. In our analysis, we instead look at how well weight lifting exercises were performed in a study. Each individual in the experiment performed barbell exercises in five different ways while accelerometer data was collected from devices on different parts of the body. We developed machine learning models that predict the way an exercise was performed based on the accelerometer data. Our final model, a random forest trained with 10-fold cross-validation repeated 5 times, achieved 100% in-sample accuracy and 99.0% out-of-sample accuracy.
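The resampling scheme described above can be sketched with the caret package as follows. This is a minimal sketch under assumptions, not the code from the report: the built-in iris data stands in for the accelerometer data, and the 70/30 split and the seed are arbitrary choices made for illustration.

```r
library(caret)  # train() and trainControl(); method = "rf" also needs randomForest

set.seed(2015)  # arbitrary seed so the folds are reproducible

# Hold out 30% of the rows to estimate out-of-sample accuracy.
# iris stands in for the accelerometer data, which is not reproduced here.
in_train <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

# 10-fold cross-validation repeated 5 times.
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

# Random forest tuned over caret's default mtry grid.
fit <- train(Species ~ ., data = training, method = "rf", trControl = ctrl)

# In-sample accuracy (rows the model saw) vs. out-of-sample accuracy (held-out rows).
in_sample_acc  <- mean(predict(fit, training) == training$Species)
out_sample_acc <- mean(predict(fit, testing) == testing$Species)

print(c(in_sample = in_sample_acc, out_of_sample = out_sample_acc))
```

A near-perfect in-sample figure is typical for random forests, which is why the held-out estimate is the more informative of the two numbers.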

May 6, 2015 (updated June 8, 2018)

Parallelize Machine Learning in R with Multi-Core CPUs

Data Science, Machine Learning

R supports parallel computation through the core parallel package. The doParallel package provides a parallel backend built on top of it. The caret package is used for developing and testing machine learning models in R. caret, along with other packages such as plyr, can take advantage of multicore CPUs if a parallel backend is registered before the supported functions are called.

The train function of the caret package has built-in support for parallel backends, but you have to register one yourself. If you don’t register a backend, train falls back to single-core computation. With a registered parallel backend, caret model training will use multiple CPU cores, since the allowParallel argument of trainControl already defaults to TRUE.
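A minimal sketch of registering a backend before calling train might look like this. The dataset (iris), the model ("rf", which needs the randomForest package), the 5-fold resampling and the choice to leave one core free are all assumptions made for illustration.

```r
library(doParallel)  # also attaches foreach and parallel
library(caret)

# Leave one core free for the rest of the system; this is a convention, not a rule.
cores <- parallel::detectCores()
n_workers <- if (is.na(cores)) 1 else max(1, cores - 1)

# Start the worker processes and register them as the foreach backend.
cl <- makeCluster(n_workers)
registerDoParallel(cl)

# allowParallel already defaults to TRUE; it is spelled out here for clarity.
ctrl <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

# The resampling iterations are now distributed across the registered workers.
fit <- train(Species ~ ., data = iris, method = "rf", trControl = ctrl)

# Shut the workers down and fall back to sequential execution.
stopCluster(cl)
registerDoSEQ()
```

Calling `getDoParWorkers()` after `registerDoParallel(cl)` is a quick way to confirm how many workers the backend will use.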

Data Acumen

Notes from the life of a data scientist and electrical engineer
