
Recommended Books, Courses, and Podcasts#

About Books, Courses, and Podcasts#

This is a collection of books and courses I can recommend personally. They are great for every data engineering learner.

I have either used or own each of these books in my professional work.

I also looked into every online course personally.

If you want to buy a book or course and support my work, please use one of my links below. They are all affiliate marketing links that help me fund this passion.

Of course, all this comes at no additional cost to you, but it helps me a lot.

You can find even more interesting books and my whole podcast equipment on my Amazon store:

Go to the Amazon store

PS: Don't just get a book and expect to learn everything

  • Course certificates alone won't help you
  • Have a purpose in mind, like a small project
  • Great for use at work

Books#

Languages#

Java#

Learning Java: A Bestselling Hands-On Java Tutorial

Python#

Learning Python, 5th Edition

Scala#

Programming Scala: Scalability = Functional Programming + Objects

Swift#

Learning Swift: Building Apps for macOS, iOS, and Beyond

Data Science Tools#

Apache Spark#

Learning Spark: Lightning-Fast Big Data Analysis

Apache Kafka#

Kafka Streams in Action: Real-time apps and microservices with the Kafka Streams API

Apache Hadoop#

Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale

Apache HBase#

HBase: The Definitive Guide: Random Access to Your Planet-Size Data

Business#

The Lean Startup#

The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses

Zero to One#

Zero to One: Notes on Startups, or How to Build the Future

The Innovator's Dilemma#

The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail (Management of Innovation and Change)

Crossing the Chasm#

Crossing the Chasm, 3rd Edition (Collins Business Essentials)

Crush It!#

Crush It!: Why Now Is The Time To Cash In On Your Passion

Community Recommendations#

Designing Data-Intensive Applications#

"In my opinion, the knowledge contained in this book differentiates a data engineer from a software engineer or a developer. The book strikes a good balance between breadth and depth of discussion on data engineering topics, as well as the tradeoffs we must make due to working with massive amounts of data." -- David Lee on LinkedIn

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Online Courses#

Preparation courses#

| Course name | Course description | Course URL |
|---|---|---|
| The Bits and Bytes of Computer Networking | This course is designed to provide a full overview of computer networking. We’ll cover everything from the fundamentals of modern networking technologies and protocols to an overview of the cloud to practical applications and network troubleshooting. | https://www.coursera.org/learn/computer-networking |
| Learn SQL \| Codecademy | In this SQL course, you'll learn how to manage large datasets and analyze real data using the standard data management language. | https://www.codecademy.com/learn/learn-sql |
| Learn Python 3 \| Codecademy | Learn the basics of Python 3, one of the most powerful, versatile, and in-demand programming languages today. | https://www.codecademy.com/learn/learn-python-3 |

Data engineering courses#

| Course name | Course description | Course URL |
|---|---|---|
| 1. Data Engineering Basics | | |
| Introduction to Data Engineering | Introduction to Data Engineering with over 1 hour of videos including my journey here. | https://learndataengineering.com/p/introduction-to-data-engineering |
| Computer Science Fundamentals | A complete guide of topics and resources you should know as a Data Engineer. | https://learndataengineering.com/p/data-engineering-fundamentals |
| Introduction to Python | Learn all the fundamentals of Python to start coding quickly. | https://learndataengineering.com/p/introduction-to-python |
| Python for Data Engineers | Learn all the Python topics a Data Engineer needs, even if you don't have a coding background. | https://learndataengineering.com/p/python-for-data-engineers |
| Docker Fundamentals | Learn all the fundamental Docker concepts with hands-on examples. | https://learndataengineering.com/p/docker-fundamentals |
| Successful Job Application | Everything you need to get your dream job in Data Engineering. | https://learndataengineering.com/p/successful-job-application |
| Data Preparation & Cleaning for ML | All you need for preparing data to enable Machine Learning. | https://learndataengineering.com/p/data-preparation-and-cleaning-for-ml |
| 2. Platform & Pipeline Design Fundamentals | | |
| Data Platform And Pipeline Design | Learn how to build data pipelines with templates and examples for Azure, GCP and Hadoop. | https://learndataengineering.com/p/data-pipeline-design |
| Platform & Pipelines Security | Learn the important security fundamentals for Data Engineering. | https://learndataengineering.com/p/platform-pipeline-security |
| Choosing Data Stores | Learn the different types of data stores and when to use which. | https://learndataengineering.com/p/choosing-data-stores |
| Schema Design Data Stores | Learn how to design schemas for SQL, NoSQL and Data Warehouses. | https://learndataengineering.com/p/data-modeling |
| 3. Fundamental Tools | | |
| Building APIs with FastAPI | Learn the fundamentals of designing, creating and deploying APIs with FastAPI and Docker. | https://learndataengineering.com/p/apis-with-fastapi-course |
| Apache Kafka Fundamentals | Learn the fundamentals of Apache Kafka. | https://learndataengineering.com/p/apache-kafka-fundamentals |
| Apache Spark Fundamentals | Apache Spark quick start course in Python with Jupyter notebooks, DataFrames, SparkSQL and RDDs. | https://learndataengineering.com/p/learning-apache-spark-fundamentals |
| Data Engineering on Databricks | Everything you need to get started with Databricks. From setup to building ETL pipelines & warehousing. | https://learndataengineering.com/p/data-engineering-on-databricks |
| MongoDB Fundamentals | Learn how to use MongoDB. | https://learndataengineering.com/p/mongodb-fundamentals-course |
| Log Analysis with Elasticsearch | Learn how to monitor and debug your data pipelines. | https://learndataengineering.com/p/log-analysis-with-elasticsearch |
| Airflow Workflow Orchestration | Learn how to orchestrate your data pipelines with Apache Airflow. | https://learndataengineering.com/p/learn-apache-airflow |
| Snowflake for Data Engineers | Everything you need to get started with Snowflake. | https://learndataengineering.com/p/snowflake-for-data-engineers |
| dbt for Data Engineers | Everything you need to work with dbt and Snowflake. | https://learndataengineering.com/p/dbt-for-data-engineers |
| 4. Full Hands-On Example Projects | | |
| Data Engineering on AWS | Full 5-hour course with a complete example project. Building stream and batch processing pipelines on AWS. | https://learndataengineering.com/p/data-engineering-on-aws |
| Data Engineering on Azure | Ingest, Store, Process, Serve and Visualize Streams of Data by Building Streaming Data Pipelines in Azure. | https://learndataengineering.com/p/build-streaming-data-pipelines-in-azure |
| Data Engineering on GCP | Everything you need to start with Google Cloud. | https://learndataengineering.com/p/data-engineering-on-gcp |
| Modern Data Warehouses & Data Lakes | How to integrate a Data Lake with a Data Warehouse and query data directly from files. | https://learndataengineering.com/p/modern-data-warehouses |
| Machine Learning & Containerization On AWS | Build an app that analyzes the sentiment of tweets and visualizes them on a user interface hosted as a container. | https://learndataengineering.com/p/ml-on-aws |
| Contact Tracing with Elasticsearch | Track 100,000 users in San Francisco using Elasticsearch and an interactive Streamlit user interface. | https://learndataengineering.com/p/contact-tracing-with-elasticsearch |
| Document Streaming Project | Document Streaming with FastAPI, Kafka, Spark Streaming, MongoDB and Streamlit. | https://learndataengineering.com/p/document-streaming |
| Storing & Visualizing Time Series Data with InfluxDB and Grafana | Learn how to use InfluxDB to store time series data and visualize interactive dashboards with Grafana. | https://learndataengineering.com/p/time-series-influxdb-grafana |
| Data Engineering with Hadoop | Hadoop Project with HDFS, YARN, MapReduce, Hive and Sqoop! | https://learndataengineering.com/p/data-engineering-with-hadoop |
| Dockerized ETL | Learn how to quickly set up a simple ETL script with AWS TDengine & Grafana. | https://learndataengineering.com/p/timeseries-etl-with-aws-tdengine-grafana |

Certifications#

Here's a list of great certifications you can take on AWS and Azure. We left out GCP because AWS and Azure have much higher adoption, which is why I recommend starting with one of them. The listed prices are usually the fees for taking the certification exams. We also added the level and prerequisites to make it easier for you to decide which one fits you.

| Platform | Certification Name | Price (USD) | Level | Prerequisite Experience | URL |
|---|---|---|---|---|---|
| AWS | AWS Certified Cloud Practitioner (maybe) | 100 | Beginner | Familiarity with the AWS platform is recommended but not required. | Link |
| AWS | AWS Certified Solutions Architect - Professional | 300 | Expert | AWS Certified Solutions Architect - Professional is intended for individuals with two or more years of hands-on experience designing and deploying cloud architecture on AWS. | Link |
| AWS | AWS Certified Solutions Architect - Associate | 150 | Intermediate | This is an ideal starting point for candidates with AWS Cloud or strong on-premises IT experience. This exam does not require deep hands-on coding experience, although familiarity with basic programming concepts would be an advantage. | Link |
| AWS | AWS Certified Data Engineer | 150 | Intermediate | The ideal candidate for this exam has the equivalent of 2-3 years of experience in data engineering or data architecture and a minimum of 1-2 years of hands-on experience with AWS services. | Link |
| Azure | Microsoft Certified: Azure Cosmos DB Developer Specialty | 165 | Intermediate | | Link |
| Azure | Microsoft Certified: Azure Data Engineer Associate - DP 203 | 165 | Intermediate | | Link |
| Azure | Microsoft Certified: Azure Data Fundamentals | 99 | Beginner | | Link |
| Azure | Microsoft Certified: Azure Database Administrator Associate | 165 | Intermediate | | Link |
| Azure | Microsoft Certified: Azure Developer Associate | 165 | Intermediate | | Link |
| Azure | Microsoft Certified: Azure Fundamentals | 99 | Beginner | | Link |
| Azure | Microsoft Certified: Azure Solutions Architect Expert | 165 | Expert | Microsoft Certified: Azure Administrator Associate certification | Link |
| Azure | Microsoft Certified: Fabric Analytics Engineer Associate | 165 | Intermediate | | Link |
| Azure | Microsoft Certified: Fabric Data Engineer Associate | 165 | Intermediate | | Link |
| Azure | Microsoft Certified: Power BI Data Analyst Associate | 165 | Intermediate | | Link |

Podcasts#

Top five podcasts by the number of episodes created.

Super Data Science#

The latest machine learning, A.I., and data career topics from across both academia and industry are brought to you by host Dr. Jon Krohn on the Super Data Science Podcast.

Data Skeptic#

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Data Engineering Podcast#

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Roaring Elephant BiteSized Big Tech#

A weekly community podcast about Big Technology with a focus on Open Source, Advanced Analytics and other modern magic.

SQL Data Partners Podcast#

Hosted by Carlos L Chacon, the SQL Data Partners Podcast focuses on Microsoft data platform related topics mixed with a sprinkling of professional development. Carlos and guests discuss new and familiar features and ideas and how you might apply them in your environments.

Complete list#

| Host name | Podcast name | Access podcast |
|---|---|---|
| Jon Krohn | Super Data Science | https://www.superdatascience.com/podcast |
| Kyle Polich | Data Skeptic | https://dataskeptic.com/ |
| Tobias Macey | Data Engineering Podcast | https://www.dataengineeringpodcast.com/ |
| Dave Russell | Roaring Elephant - Bite-Sized Big Tech | https://roaringelephant.org/ |
| Carlos L Chacon | SQL Data Partners Podcast | https://sqldatapartners.com/podcast/ |
| Jason Himmelstein | BIFocal - Clarifying Business Intelligence | https://bifocal.show/ |
| Scott Hirleman | Data Mesh Radio | https://daappod.com/data-mesh-radio/ |
| Jonathan Schwabish | PolicyViz | https://policyviz.com/podcast/ |
| Al Martin | Making Data Simple | https://www.ibm.com/blogs/journey-to-ai/2021/02/making-data-simple-this-week-we-continue-our-discussion-on-data-framework-and-what-is-meant-by-data-framework/ |
| John David Ariansen | How to Get an Analytics Job | https://www.silvertoneanalytics.com/how-to-get-an-analytics-job/ |
| Moritz Stefaner | Data Stories | https://datastori.es/ |
| Hilary Parker | Not So Standard Deviations | https://nssdeviations.com/ |
| Ben Lorica | The Data Exchange with Ben Lorica | https://thedataexchange.media/author/bglorica/ |
| Juan Sequeda | Catalog & Cocktails | https://data.world/resources/podcasts/ |
| Wayne Eckerson | Secrets of Data Analytics Leaders | https://www.eckerson.com/podcasts/secrets-of-data-analytics-leaders |
| Guy Glantser | SQL Server Radio | https://www.sqlserverradio.com/ |
| Eitan Blumin | SQL Server Radio | https://www.sqlserverradio.com/ |
| Jason Tan | The Analytics Show | https://ddalabs.ai/the-analytics-show/ |
| Hugo Bowne-Anderson | DataFramed | https://www.datacamp.com/podcast |
| Kostas Pardalis | The Data Stack Show | https://datastackshow.com/ |
| Eric Dodds | The Data Stack Show | https://datastackshow.com/ |
| Catherine King | The Business of Data Podcast | https://podcasts.apple.com/gb/podcast/the-business-of-data-podcast/id1528796448 |
| | The Business of Data | https://business-of-data.com/podcasts/ |
| James Le | Datacast | https://datacast.simplecast.com/ |
| Mike Delgado | DataTalk | https://podcasts.apple.com/us/podcast/datatalk/id1398548129 |
| Matt Housley | Monday Morning Data Chat | https://podcasts.apple.com/us/podcast/monday-morning-data-chat/id1565154727 |
| Francesco Gadaleta | Data Science at Home | https://datascienceathome.com/ |
| Alli Torban | Data Viz Today | https://dataviztoday.com/ |
| Steve Jones | Voice of the DBA | https://voiceofthedba.com/ |
| Lea Pica | The Present Beyond Measure Show: Data Storytelling, Presentation & Visualization | https://leapica.com/podcast/ |
| Samir Sharma | The Data Strategy Show | https://podcasts.apple.com/us/podcast/the-data-strategy-show/id1515194422 |
| Cindi Howson | The Data Chief | https://www.thoughtspot.com/data-chief/podcast |
| Cole Nussbaumer Knaflic | storytelling with data podcast | https://storytellingwithdata.libsyn.com/ |
| Margot Gerritsen | Women in Data Science | https://www.widsconference.org/podcast.html |
| Jonas Christensen | Leaders of Analytics | https://www.leadersofanalytics.com/episode/the-future-of-analytics-leadership-with-john-thompson |
| Matt Brady | ZUMA: Data For Good | https://www.youtube.com/@zuma-dataforgood |
| Julia Schottenstein | The Analytics Engineering Podcast | https://roundup.getdbt.com/s/the-analytics-engineering-podcast |
| | Data Unlocked | https://dataunlocked.buzzsprout.com/ |
| Boris Jabes | The Sequel Show | https://www.thesequelshow.com/ |
| | Data Radicals | https://www.alation.com/podcast/ |
| Nicola Askham | The Data Governance | https://www.nicolaaskham.com/podcast |
| Boaz Farkash | The Data Engineering Show | https://www.dataengineeringshow.com/ |
| Bob Haffner | The Engineering Side of Data | https://podcasts.apple.com/us/podcast/the-engineering-side-of-data/id1566999533 |
| Dan Linstedt | Data Vault Alliance | https://datavaultalliance.com/category/news/podcasts/ |
| Dustin Schimek | Data Ideas | https://podcasts.apple.com/us/podcast/data-ideas/id1650322207 |
| Alex Merced | The datanation | https://podcasts.apple.com/be/podcast/the-datanation-podcast-podcast-for-data-engineers/id1608638822 |
| Thomas Bustos | Let's Talk AI | https://www.youtube.com/@lets-talk-ai |
| Jahanvee Narang | Decoding Data Analytics | https://www.youtube.com/@decodingdataanalytics/videos |