The ideal candidate has hands-on data processing experience, is familiar with current distributed computing technologies running on commodity servers, and has experience supporting 24×7 production systems.
• Strong SQL skills on large-scale databases, with knowledge of at least one engine such as MySQL, Postgres, Redshift, or Hive.
• Experience with Linux or Unix-based systems, including the Bourne shell, cron, and other Unix utilities.
• Strong software development skills, including experience with Java.
• Degree in Computer Science or a related field.
Desired Technology Experience
• Experience with at least one scripting language, such as Ruby, Python, or Bourne shell.
• Experience with Hadoop/MapReduce and/or EMR, including developing MapReduce jobs in Java or writing Hive UDFs (see the sketch after this list).
• Experience building and maintaining ETL across multiple data systems.
• Experience with Oozie or other Hadoop workflow solutions, and experience developing complex data processing pipelines, including regression tests and deployment strategies for such environments.
• Experience with data reporting, whether developed in-house or built on third-party solutions.
• Experience developing and supporting 24×7 production data services and pipelines on Linux systems, including being on-call for such services. AWS experience preferred.
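
For a sense of the Hive UDF work mentioned above, a minimal sketch in Java might look like the following, assuming the classic org.apache.hadoop.hive.ql.exec.UDF API (the class and function names here are illustrative, not from our codebase):

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Minimal Hive UDF: trims and lower-cases a string column.
    // Hypothetical registration in Hive:
    //   ADD JAR normalize-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize AS 'com.example.NormalizeUdf';
    public final class NormalizeUdf extends UDF {
        public Text evaluate(final Text input) {
            if (input == null) {
                return null;  // pass NULLs through, as Hive expects
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

Once registered, such a function could be called in a query like SELECT normalize(url) FROM events.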
Challenges we are tackling
• Reliably processing billions of events per day, 24×7, on commodity hardware with no data loss.
• Processing events in near real time.
• Building for the inherent fragility of cloud and distributed services.
• Performing complex processing over large volumes of data efficiently.
• Data reporting, distribution, analysis, visualization, and machine learning.
• Low-latency data stores for use in bidding and algorithm optimization (a minimal sketch of this kind of lookup follows).
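
To give a flavor of the low-latency lookup problem above: a bidding hot path typically serves prices from memory rather than making a database round trip on every request. A minimal sketch in Java, with hypothetical names and a made-up default floor price:

    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative in-memory price table keyed by campaign ID.
    // Refreshed periodically from an authoritative store (not shown).
    public final class BidPriceCache {
        private final ConcurrentHashMap<String, Long> microCentsByCampaign =
                new ConcurrentHashMap<>();

        // Called by a background refresh job.
        public void update(String campaignId, long priceMicroCents) {
            microCentsByCampaign.put(campaignId, priceMicroCents);
        }

        // Called on every bid request; constant-time, no network round trip.
        public long priceFor(String campaignId) {
            return microCentsByCampaign.getOrDefault(campaignId, 10_000L);
        }
    }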