Data is among the most important assets a company holds today, and data engineering has become a more familiar role in recent years. The purpose of this article is to share the skills required to be a successful data engineer.
Data engineering is an engineering discipline that focuses on the data transformation process in order to provide business teams with actionable insights. It covers the full data lifecycle, from raw format to actionable insight. Below are the primary responsibilities of a data engineering team.
AWS announced a new generation of Redshift node types, named RA3, which decouple compute and storage so that users can manage and scale each independently based on their needs.
The purpose of this article is to introduce RA3 node types and compare them with the existing node types. At the end, I will also discuss the migration options to RA3 nodes. Let’s get started.
RA3 nodes are built on next-generation Nitro-powered compute instances, which come with high-bandwidth networking and managed storage that uses local SSD-based storage backed by Amazon Simple Storage Service (S3).
RA3 instances with managed storage use high performance SSDs for…
Flask is a web framework for Python that provides a simple interface for dynamically generating responses to web requests.
Docker is an open-source application that allows administrators to create, manage, deploy, and replicate applications using containers.
The purpose of this article is to provide step-by-step instructions for running a Flask app integrated with Gunicorn and NGINX inside a single container hosted on an AWS EC2 instance.
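As a rough illustration of the kind of app such a setup serves, here is a minimal Flask sketch. The module name `app.py`, the `/` route, and the JSON payload are assumptions made for this example and are not taken from the article.

```python
# app.py (hypothetical file name): a minimal Flask application.
from flask import Flask, jsonify

# Gunicorn is pointed at this WSGI object, e.g. "app:app" for module app, object app.
app = Flask(__name__)


@app.route("/")
def index():
    # Return a small JSON body so the whole stack is easy to smoke-test.
    return jsonify(status="ok")


if __name__ == "__main__":
    # Flask's built-in development server, for local checks only.
    app.run(host="0.0.0.0", port=5000)
```

Inside the container, Gunicorn would typically be started with something like `gunicorn --bind 0.0.0.0:8000 app:app`, with NGINX proxying requests to that port; the port number here is likewise an assumption.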
Amazon Redshift is a fully managed, petabyte-scale data warehouse designed to handle large-scale datasets and to support data analysis and business intelligence reporting.
Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and by parallelizing queries across multiple nodes.
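The two ideas can be pictured with a toy model in plain Python. This is only a conceptual sketch, not how Redshift is implemented: the sample rows, the round-robin split, and every function name below are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample rows: (order_id, region, amount).
ROWS = [
    (1, "east", 10.0),
    (2, "west", 20.0),
    (3, "east", 5.0),
    (4, "north", 7.5),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (a toy columnar layout)."""
    order_ids, regions, amounts = zip(*rows)
    return {
        "order_id": list(order_ids),
        "region": list(regions),
        "amount": list(amounts),
    }


def split_across_nodes(rows, node_count):
    """Assign rows to nodes round-robin; a stand-in for a real distribution scheme."""
    return [rows[i::node_count] for i in range(node_count)]


def node_partial_sum(rows):
    """Each toy 'node' touches only the amount column of its own slice."""
    if not rows:
        return 0.0
    return sum(to_columns(rows)["amount"])


def parallel_total(rows, node_count=2):
    """Run the partial sums concurrently, then combine them into one result."""
    slices = split_across_nodes(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(node_partial_sum, slices))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_total(ROWS, node_count=2))
```

A query such as a sum over `amount` needs only one column, so a columnar layout avoids reading `order_id` and `region` at all, and each slice can be aggregated independently before the partial results are combined.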
The scope of this article is to share the table design practices that produced significant performance improvements.
Amazon Redshift is based on an MPP (massively parallel processing) architecture in which the cluster is the core component. A cluster is composed of a leader node and one or more compute nodes.
Leader Node: Coordinates the compute nodes and handles external communication.