Thanks for reading. Here you will find a huge range of information in text, audio and video on topics such as Data Science, Data Engineering, Machine Learning Engineering, DataOps and much more. The show notes for “Data Science in Production” are also collated here.

10 reasons why Azure Databricks for Machine Learning Rocks!

Struggling to choose a machine learning platform for building and deploying models? Check out our blog on 10 reasons why Azure Databricks for machine learning is a great choice.

AI, Machine Learning, Data Science | Gavita Regunath | February 14, 2023 | Machine Learning, AI

Testing Data

This blog post covers testing data within dbt, focusing on the easiest aspect first: verifying the reliability of the query applied to the data. The author discusses the importance of defining what is being tested and how this can impact the validity of the data. The other testing questions, more aligned with integration and regression testing, are saved for another time.

Ust Oldfield | February 9, 2023 | dbt, sql, testing, databricks, data, analytics

Image Classification — Dealing with Imbalance in Datasets

Image classification is a standard computer vision task that involves training a model to assign a label to a given image, for example a model that classifies images of different root vegetables. A big problem with classification is bias, where a model favours a particular image class over the others. A common cause is dataset imbalance, which is often hard to spot because a model trained on an imbalanced dataset can still achieve good accuracy. For example, if the test dataset contains 1000 images, 950 potatoes and 50 carrots, and the model predicts all 1000 images to be potatoes, it would still have 95% accuracy. This is also an example of why more metrics than accuracy should be considered… but let’s leave that discussion for another day.
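To make the potato and carrot arithmetic concrete, here is a minimal sketch in plain Python. The label lists and the `accuracy` and `recall` helpers are illustrative assumptions, not code from the post; they simply reproduce the 950/50 split described above with a model that always predicts "potato".

```python
# Hypothetical test set matching the example: 950 potatoes and 50 carrots.
y_true = ["potato"] * 950 + ["carrot"] * 50

# A degenerate "model" that predicts potato for every image.
y_pred = ["potato"] * len(y_true)


def accuracy(true_labels, predicted_labels):
    """Fraction of images whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def recall(true_labels, predicted_labels, label):
    """Fraction of images that truly belong to `label` and were predicted as `label`."""
    predictions_for_label = [
        p for t, p in zip(true_labels, predicted_labels) if t == label
    ]
    if not predictions_for_label:
        return 0.0
    hits = sum(p == label for p in predictions_for_label)
    return hits / len(predictions_for_label)


print("accuracy:", accuracy(y_true, y_pred))
print("potato recall:", recall(y_true, y_pred, "potato"))
print("carrot recall:", recall(y_true, y_pred, "carrot"))
```

Overall accuracy looks healthy, yet the per-class recall for carrots is zero because not a single carrot is identified. Looking at a per-class metric like this is one simple way to surface the imbalance that accuracy alone hides.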

Alexander Billington | February 2, 2023
