The Sr. Data Engineer (Quality/Operations) will be part of an international team that designs, develops, and delivers new applications for Koch Industries. Koch Industries is a privately held global organization with over 120,000 employees around the world, with subsidiaries involved in manufacturing, trading, and investments. The Koch Technology Center (KTC) is being developed in India to extend Koch's IT operations and to act as a hub for innovation in the IT function. As KTC rapidly scales up its operations in India, its employees will have opportunities to carve out career paths within the organization. This role offers the opportunity to join on the ground floor and will play a critical part in building out the Koch Technology Center (KTC) over the next several years. Working closely with global colleagues will provide significant international exposure.
The Enterprise Data and Analytics team at Georgia-Pacific is focused on creating an enterprise capability around data lakes for operational and commercial data, as well as helping businesses develop, deploy, manage, and monitor predictive and prescriptive models to create business value in manufacturing, operations, supply chain, and other key areas.
What You Will Do In Your Role
- Daily support of the AWS data lake environment, including data pipeline management, troubleshooting, and root cause analysis, using tools like Talend, Spark, EMR (Hadoop on AWS), and SQL
- Data Quality testing and support, including creating and improving automated Data Quality checks in the data lake environment.
- Build automations for monitoring and alerting of ELT/Quality jobs utilizing Python, Lambda, Glue, and SQL.
- Develop enhancements to existing data pipelines while using best practices and latest technologies in data architecture, storage and access patterns.
- Assist Data Science teams with preparing, cleansing and delivering data for statistical analysis and data modeling.
- Document requirements, create system specifications, and document code. Must be able to present to technical teams across GP.
- The position will report to a domain leader at GP Atlanta and will be part of the Data Engineering team, reporting to the director of data engineering.
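To illustrate the kind of automated Data Quality checks and alerting described in the responsibilities above, here is a minimal sketch in Python. All column names, thresholds, and check rules are hypothetical; in the actual environment this logic would typically run inside an AWS Lambda or Glue job, with failures published to an alerting channel.

```python
# Hypothetical sketch of an automated data-quality check for a data lake
# table. Names and thresholds are illustrative only; in production this
# would run in a Lambda/Glue job and push failures to an alerting system.
from typing import Any


def null_rate(rows: list[dict[str, Any]], column: str) -> float:
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def run_quality_checks(
    rows: list[dict[str, Any]],
    required_columns: list[str],
    max_null_rate: float = 0.05,
    min_rows: int = 1,
) -> list[str]:
    """Return human-readable failure messages; an empty list means pass."""
    failures: list[str] = []
    if len(rows) < min_rows:
        failures.append(f"row count {len(rows)} below minimum {min_rows}")
    for col in required_columns:
        rate = null_rate(rows, col)
        if rate > max_null_rate:
            failures.append(
                f"column '{col}' null rate {rate:.1%} exceeds {max_null_rate:.0%}"
            )
    return failures
```

A caller (for example, a Lambda handler triggered after an ELT job completes) would invoke `run_quality_checks` on a freshly loaded batch and raise an alert whenever the returned list is non-empty.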
The Experience You Will Bring
- Bachelor’s degree in Engineering (preferably Analytics, MIS, or Computer Science). Master’s degree preferred.
- 9-13 years of IT experience.
- At least 3 to 5 years of development in Big Data and AWS Cloud (S3, Redshift, Aurora, Kinesis, Glue, Lambda, Hadoop/EMR, Hive, Sqoop, Spark, Docker/Containerization, Serverless Architecture)
- Programming / Scripting in Python is a MUST.
- SQL, Data Warehouse skills are a MUST.
- Data engineering concepts (ETL, near-/real-time streaming, data structures, metadata and workflow management)
- A passion and fearlessness for learning new technologies and methods.
- Ability to thrive in a team environment and juggle multiple priorities.
- Excellent written and verbal communication skills.
- Some work may involve hours outside of normal KTC work hours.
- Must participate in on-call production support on a weekend rotation basis.
What Will Put You Ahead
- Markup and data-interchange formats (JSON, XML), and working experience with JavaScript.
- Experience with code management tools (Git/Azure DevOps)
- Experience with tools like Talend
- Experience with Splunk
- Experience with CI/CD using AWS Fargate
- 5 to 7 years of active Big Data development experience.
- Good knowledge of cloud deployments of BI solutions including use of the AWS eco-system.
- Experience developing backend data solutions for front-end tools like Tableau, Power BI, and/or Qlik Sense.
- Ability to pull together complex and disparate data sources, warehouse them, and architect a foundation for producing BI and analytical content, while operating in a fluid, rapidly changing data environment.