Thanks for reading. Here you will find a huge range of information in text, audio and video on topics such as Data Science, Data Engineering, Machine Learning Engineering, DataOps and much more. The show notes for “Data Science in Production” are also collated here.

Recommender Systems — Using Snorkel for Relevance Labelling

In this example we will look at how Snorkel can be used as part of a recommender system to label the relevance of different books for a user. The dataset used will be an augmented and normalised version of the Goodreads dataset, containing user-book pairings and extensive metadata on each book.

Alexander Billington | January 19, 2022

On-Premise Self Hosted Integration Runtime for Azure Data Factory: How to configure with Private Endpoints and a Proxy Server

In this post we look at a very specific configuration of an Azure Data Factory Self Hosted Integration Runtime (SHIR): (1) the SHIR is installed on an on-prem machine, (2) the on-prem machine uses a proxy server, and (3) the SHIR has to talk to the Data Factory resource via a Private Endpoint. This post is aimed at people who are familiar with Azure Data Factory.

Grace O'Halloran | January 19, 2022

Getting Started with Graph Analysis in NetworkX

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex graphs. It’s a really cool package that contains heaps of graph algorithms for all different uses. In this tutorial, I will cover how to create a graph from an edge list and different ways we can query it.

Tori Tompkins | January 18, 2022

A Beginner’s Guide To Understanding Feature Stores

Feature stores are rapidly gaining popularity in the machine learning environment. Find out what feature stores are all about and the benefits they offer when implemented in a machine learning pipeline.

Data Science, AI | Gavita Regunath | January 13, 2022 | AI, Machine Learning

How to Fix Different Types of Model Drift

Model drift refers to the decline of model performance due to changes in data and relationships. Most drift is caused by factors entirely outside our control, so while we can’t stop it from happening, we can identify and mitigate it.

Tori Tompkins | January 11, 2022

10 Commonly Used Data Wrangling Codes Using Pandas And PySpark.

Data, Advancing Spark, Data Science, AI | Ayodeji Ogunlami | January 11, 2022 | Python, PySpark, Data Wrangling, Data Science

Optimisation in Business

Data Science | Luke Menzies | January 11, 2022 | Data Science, Optimization, Business Intelligence

DevOps for Databricks: Using Rest API & Python in YAML CI/CD Pipelines

Anna Wykes | January 4, 2022
