scikit-learn and Pandas

I’ve gotten into horse racing. I am not a horse person, but I am an engineer. And like any engineer, I wanted to engineer a solution that could help me find which horse might win a race. I decided to try a supervised machine learning technique called Learning to Rank to find the horses most likely to win.

During this project I tried different models for ranking. When I implemented ranking with the scikit-learn implementation of XGBoost, I found the documentation lacking and had a hard time making progress. …

This is the third and last part of my article series on how to get started with Machine Learning on GCP.
So far we have taken data from a source system and uploaded it into Google BigQuery. In the second article, we used the interactive notebook environment on GCP to explore our data and identify some issues. Finally, we used Google Dataflow to address those issues in a streaming data pipeline and load the results into a datastore with clean, quality-controlled data.

For the last…

In the last article we covered how to make our data accessible and available. Now that we have our raw data in a BigQuery table, we can begin exploring and getting to know our data.
Data exploration is a highly iterative process. It helps to work in an interactive environment where the data is kept in memory. This allows the user to apply multiple transformations, create visualizations, and test hypotheses without needing to query the raw data multiple times. …

Over the past few months we have been working on improving data insights and innovation. We have achieved this by copying data from on-prem systems to a new data platform on GCP.

The easy-to-use and powerful tooling in the GCP Data Platform has proven valuable in setting up a complete data pipeline; going from an empty sheet to a functional machine learning (ML) setup proved to be both speedy and cost-effective.

In this series of articles, I will show how we can take data from an on-prem system in a one-time export, load it into GCP, and…

Simon Lind

Master of Science in Biotechnology Engineering with a focus on Bioinformatics. Cloud + ML + Data + Python + Java.

