We are seeking an experienced data engineer who uses a variety of methods to transform raw data into useful data systems. You will be responsible for designing, developing, testing, and deploying data pipelines, data warehouses, data lakes, and data products that support business needs. You will also work closely with data analysts, data scientists, and other stakeholders to ensure data quality, reliability, and availability. For example, you will develop and configure programs and jobs for data ingestion, transformation, and enrichment, with efficient database design and storage that keeps data accessible to end consumers. Overall, you will strive for efficiency by aligning data systems with business goals. In this data engineering position, you should have strong data analysis skills and the ability to combine, correlate, and resolve data conflicts across different sources. You will need experience with several programming languages; knowledge of machine learning methods is a plus.

Job Duties

  • Utilize and optimize Apache Spark for distributed data processing, handling both batch and stream processing
    workloads.
  • Design, develop, and maintain scalable data pipelines for processing and analyzing large datasets.
  • Collaborate with cross-functional teams to understand data requirements and implement effective solutions.
  • Implement ETL (Extract, Transform, Load) processes to ingest and transform data from various sources into usable formats (a brief sketch follows this list).
  • Implement data quality checks, data validation, and data governance processes to ensure data accuracy and
    consistency.
  • Develop and maintain data models, schemas, and metadata to support data analysis and reporting.
  • Create and manage data warehouses, data lakes, and data marts using cloud platforms such as AWS, Azure, or GCP.
  • Collaborate with data analysts, data scientists, and other business users to understand their data needs and provide data solutions.
  • Collaborate with technical teams, including DevOps, engineering, and compliance, to ensure seamless cloud implementation and adherence to best practices.
  • Develop data and cloud architecture documentation, including diagrams, guidelines, and best practices for reference and knowledge sharing.
  • Troubleshoot and resolve data pipeline issues, ensuring minimal downtime and data integrity.
  • Optimize data pipelines for performance, reliability, and data quality, utilizing best practices in data engineering.
  • Build algorithms and prototypes that combine raw information from different sources.
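
To make the ETL duty above concrete, here is a minimal PySpark sketch of a batch job with a simple data quality gate. It is an illustration only: the paths, column names, and checks are hypothetical placeholders, not references to any specific system you would work on.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical example: ingest raw order events, validate them, and
    # write a curated table. All paths and column names are placeholders.
    spark = SparkSession.builder.appName("orders_etl_example").getOrCreate()

    # Extract: read raw JSON events from a (hypothetical) landing zone.
    raw = spark.read.json("s3://example-bucket/landing/orders/")

    # Transform: normalize types and derive a partition column.
    orders = (
        raw.withColumn("order_ts", F.to_timestamp("order_ts"))
           .withColumn("amount", F.col("amount").cast("double"))
           .withColumn("order_date", F.to_date("order_ts"))
    )

    # Data quality check: fail the run if required fields are missing.
    bad_rows = orders.filter(
        F.col("order_id").isNull() | F.col("amount").isNull()
    ).count()
    if bad_rows > 0:
        raise ValueError(f"Data quality check failed: {bad_rows} invalid rows")

    # Load: write the curated dataset partitioned by date.
    orders.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3://example-bucket/curated/orders/"
    )

The same DataFrame API runs unchanged from a laptop to a cluster, which is why Spark appears throughout these duties; a streaming variant of this job would typically swap spark.read for spark.readStream against a source such as Kafka.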

Required Skills

  • Bachelor’s degree in Computer Science, Engineering, Mathematics, or related field, or equivalent work experience.
  • 3+ years of experience in data engineering or related roles.
  • Extensive experience with Apache Spark for large-scale data processing, including RDDs, DataFrames, and Spark SQL.
  • Familiarity with Hadoop ecosystem components such as HDFS, MapReduce, Hive, and HBase.
  • Experience with both SQL databases (such as MySQL or PostgreSQL) and NoSQL databases (such as DynamoDB).
  • Proficiency in SQL and at least one programming language, such as Python.
  • Experience with data pipeline orchestration and scheduling tools such as AWS Step Functions, Apache Airflow, etc. (a brief Airflow sketch follows this list).
  • Experience with cloud-based data platforms and services such as AWS, Azure, or GCP.
  • Experience with data warehouse and data lake design and architecture.
  • Experience with data quality, data testing, and data governance methodologies and tools.
  • Strong analytical, problem-solving, and communication skills, with close attention to detail.
  • Ability to work independently and collaboratively in a fast-paced environment.
  • Experience working with a modern data catalog such as Alation, Collibra, etc. is a plus.
  • Ability to prepare data for prescriptive and predictive modeling is a plus.
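
To illustrate the orchestration and scheduling experience listed above, here is a minimal Apache Airflow DAG sketch. The DAG name, schedule, and task callables are hypothetical placeholders, and the schedule argument assumes Airflow 2.4 or later.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical placeholder callables; a real pipeline would trigger
    # Spark jobs, warehouse loads, quality checks, and so on.
    def extract():
        print("pull raw data from source systems")

    def transform():
        print("clean and enrich the extracted data")

    def load():
        print("load curated data into the warehouse")

    with DAG(
        dag_id="example_daily_etl",      # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",               # assumed cadence; Airflow 2.4+ syntax
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        # Run the steps strictly in order: extract -> transform -> load.
        t_extract >> t_transform >> t_load

AWS Step Functions expresses the same dependency graph as a state machine definition rather than Python code; either way, the orchestration layer owns scheduling, retries, and alerting so the pipeline logic does not have to.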