Architect, Cloudera Spark – Big Data Developer, Spark
Posted 12 days ago | Apply before June 19, 2026
Job Description
Responsibilities
- Work on high-complexity data engineering and data analytics projects
- Ensure scalability and security of data architectures in enterprise environments
- Develop and optimize distributed data pipelines in enterprise contexts
Requirements
Big Data Architect
- Deep experience in distributed environments and Cloudera technologies
- Proven experience with Big Data architectures based on Cloudera Data Platform (CDP) and Apache Spark
- Strong knowledge of HDFS, Hive, Impala, HBase, Kafka and NiFi
- Proficiency with YARN, Ranger, Knox, and Atlas, as well as data security and governance tools
- Experience in data modeling and design of ETL/ELT pipelines
- Knowledge of Scala, Python and SQL
- Good understanding of microservices, containerization (Docker, Kubernetes) and REST APIs
- Familiarity with Linux/Unix environments and advanced scripting
- Experience with monitoring tools and performance tuning for Spark and Cloudera
- Experience in Public Administration or regulated environments is a plus
Big Data Developer
- Solid experience in Cloudera and Apache Spark environments
- At least 3 years' experience developing applications on Apache Spark (Core, SQL, Streaming)
- Deep knowledge of the Cloudera ecosystem (HDFS, Hive, Impala, Oozie, NiFi)
- Strong proficiency in Scala and Python
- Experience managing and optimizing Spark jobs in clustered environments
- Knowledge of Kafka for real-time ingestion
- Familiarity with Git, Jenkins, CI/CD pipelines and DevOps best practices
- Experience in query tuning, data ingestion pipelines and data transformation
- Basic knowledge of Linux, shell scripting and distributed systems
- Attention to detail and ability to work in structured environments
- Good communication skills and a team-oriented attitude
- Commitment to continuous improvement and adoption of quality standards
Apply
Apply to this job