EMR on ACK is newly released to help enterprises efficiently build big data platforms

Alibaba Cloud EMR on ACK gives users a new way to build a big data platform: deploying open source big data services on Alibaba Cloud Container Service for Kubernetes (ACK). Because ACK handles service deployment and provides high-performance, scalable container application management, users only need to focus on the big data jobs themselves. Spark, Presto, and Flink jobs can be run easily on an ACK cluster. The engines are 100% compatible with their open source versions and offer better performance.

1. Background

Technology Trends

Separation of storage and computing, evolution towards cloud native

Online business, AI, and big data are uniformly connected to the ACK cluster: peak-shifting scheduling, offline and online co-location, and improved machine utilization

Unified operation and maintenance entry, unified operation and maintenance tool chain, and unified monitoring system

Cluster-centric -> job-centric

Multi-version support, for example, Spark 2.x and Spark 3.x can run at the same time

Challenges of Going Cloud Native

Separating compute from storage: how to build an HCFS (Hadoop Compatible File System) on top of the OSS object storage service

Must be fully compatible with existing HDFS

Performance comparable to HDFS, with lower costs

Separating shuffle data storage from compute: how to support mixed, heterogeneous instance types on ACK

Heterogeneous instance types do not have local disks

The community discussed this problem in [SPARK-25299], and supporting Spark dynamic resource allocation has become an industry consensus

ACK Scheduling Capabilities: How to Solve Scheduling Performance Bottlenecks

Performance benchmarked against YARN

Multi-level queue management

Peak-shifting scheduling

Use K8s as the cluster-level operating system to orchestrate the peaks and troughs of different workloads
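The HDFS compatibility requirement in the first challenge above can be illustrated with a small sketch. With an HCFS implementation, a job keeps using the same file system API and only the URI changes. The helper below assumes that OSS paths use an `oss://<bucket>/<path>` form; the function name `to_oss_uri` and the bucket name are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit


def to_oss_uri(hdfs_uri: str, bucket: str) -> str:
    """Rewrite an hdfs:// URI to an oss:// URI, keeping the path unchanged.

    Assumption: the target layout is oss://<bucket>/<same path as on HDFS>.
    """
    parts = urlsplit(hdfs_uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {hdfs_uri!r}")
    # Drop the NameNode host:port and keep only the path portion.
    return f"oss://{bucket}{parts.path}"


# Hypothetical bucket name; the job's path layout stays the same.
print(to_oss_uri("hdfs://namenode:8020/warehouse/sales", "my-bucket"))
```

Because only the scheme and authority change, existing job code that reads or writes these paths would not need to be restructured under this assumption.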

Advantages of EMR on ACK

Remote Shuffle Service provides a storage and computing separation solution for intermediate shuffle data

Compute nodes no longer need local disks or cloud disks

Supports enabling Spark dynamic resource allocation, a complete solution to SPARK-25299

JindoFS provides lake acceleration solutions for OSS storage

In block mode, the 1 TB TPC-DS scenario shows a performance improvement of more than 15%

The scheduling layer supports Scheduler Framework V2

Scheduling performance is more than 3x higher than that of the community version

Provides multi-level queue management

Engine Capability Enhancement

In the 10 TB TPC-DS benchmark, EMR Spark delivers a 3x performance improvement over community Spark

Hudi and Delta Lake offer better performance than their community versions

Complete peak-shifting scheduling solution
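The Remote Shuffle Service item above mentions enabling Spark dynamic resource allocation. As a minimal sketch, the standard open source Spark settings for dynamic allocation can be assembled into `spark-submit` arguments as follows. The helper names and the executor limits are hypothetical. The settings that connect Spark to the EMR Remote Shuffle Service are product-specific, are not given in this article, and are therefore left out.

```python
def dynamic_allocation_conf(min_executors: int, max_executors: int) -> dict:
    """Build the standard Spark dynamic allocation settings."""
    if not 0 <= min_executors <= max_executors:
        raise ValueError("require 0 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_submit_args(conf: dict) -> list:
    """Turn a settings dict into repeated `--conf key=value` arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Hypothetical limits: scale between 1 and 10 executors.
print(" ".join(to_submit_args(dynamic_allocation_conf(1, 10))))
```

Dynamic allocation removes idle executors, so shuffle data must outlive the executor that wrote it. That is the role a remote shuffle service plays when the nodes have no local disk.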

2. EMR Containerized Architecture

EMR on ACK Architecture

Lightweight management and control

Connects to existing data platforms: jobs are submitted to different execution platforms through data development clusters or scheduling platforms

Peak-shift scheduling: adjusted according to business peak and off-peak strategies

Cloud-native data lake architecture: ACK has strong elastic scale-out and scale-in capabilities

ACK manages heterogeneous clusters with good flexibility
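The peak-shift idea can be sketched as a simple routing rule in a scheduling platform. Everything in this sketch is an assumption made for illustration: the peak window, the platform names, and the policy itself (batch jobs use shared ACK nodes only while the online business is off-peak). The article does not describe the actual strategy.

```python
from datetime import time

# Hypothetical peak window of the online business.
PEAK_START = time(9, 0)
PEAK_END = time(21, 0)


def is_peak(now: time) -> bool:
    """Return True when `now` falls inside the assumed online peak window."""
    return PEAK_START <= now < PEAK_END


def choose_platform(now: time) -> str:
    """Pick a (hypothetical) execution platform for a batch job.

    During the online peak, batch jobs stay on dedicated nodes; off-peak,
    they may borrow the shared ACK nodes that online services left idle.
    """
    return "dedicated-emr-nodes" if is_peak(now) else "shared-ack-nodes"


print(choose_platform(time(10, 30)))  # inside the assumed peak window
print(choose_platform(time(2, 0)))    # outside the assumed peak window
```

A real policy would likely also consider queue quotas and current cluster load rather than only the time of day.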

3. Product Introduction

Product Home

Reference link: https://www.aliyun.com/product/emapreduce

Create a new cluster

Region: Currently available in Hangzhou, Shanghai, Beijing, Shenzhen, and other regions (more regions are being opened)

Cluster type: Spark, Shuffle Service, Presto

Spark: a general-purpose distributed big data processing engine that provides ETL, offline batch processing, data modeling, and other capabilities

Shuffle Service: provides an optimized shuffle service for the EMR computing engines, removing the dependency on local disks under Kubernetes

Solves the network and disk I/O bottlenecks of large-scale computing clusters

Supports computing and storage separation architecture and can serve multiple EMR clusters

Presto: a memory-based distributed SQL engine for interactive queries that supports multiple data sources

Suitable for complex analysis of PB-level massive data and cross-data source queries

Component version: Spark (3.1.1)

Dedicated nodes:

Use an existing ACK cluster and share some of its nodes with EMR

Create a new ACK cluster and select the entire cluster as a dedicated node

OSS Bucket: used to store jobs, logs, jar packages, and other information

Cluster Management

Cluster ID/Name: Click to enter job management

Cluster status: Check whether the cluster is available.

ACK cluster: Can be associated with an existing ACK cluster.

Cluster configuration: Spark job configuration.

Release: Release space.
