The ideal candidate has hands-on data processing experience, is familiar with current distributed computing technologies running on commodity servers, and has experience supporting 24×7 production systems.
• Strong SQL skills on large-scale databases, with knowledge of at least one engine such as MySQL, Postgres, Redshift, or Hive.
• Experience with Linux or Unix-based systems, including the Bourne shell, cron, and other Unix utilities.
• Strong software development skills, including experience with Java.
• Degree in Computer Science or a related field.
Desired Technology Experience
• Experience with at least one scripting language, such as Ruby, Python, or Bourne shell.
• Experience with Hadoop/MapReduce and/or EMR, including developing MapReduce jobs in Java or writing Hive UDFs (see the sketch after this list).
• Experience building and maintaining ETL across multiple data systems.
• Experience with Oozie or other Hadoop workflow solutions, and experience developing complex data processing pipelines, including regression tests and deployment strategies for such environments.
• Experience with data reporting, whether developed in-house or built on third-party solutions.
• Experience developing and supporting 24×7 production data services and pipelines on Linux systems, including being on-call for such services. AWS experience preferred.
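
For a sense of the Hive UDF work mentioned above, a minimal sketch in Java might look like the following, assuming the classic org.apache.hadoop.hive.ql.exec.UDF API (the class and function names here are illustrative, not from our codebase):

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Minimal Hive UDF: trims and lower-cases a string column.
    // Hypothetical registration in Hive:
    //   ADD JAR normalize-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize AS 'com.example.NormalizeUdf';
    public final class NormalizeUdf extends UDF {
        public Text evaluate(final Text input) {
            if (input == null) {
                return null;  // pass NULLs through, as Hive expects
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

Once registered, such a function could be called in a query like SELECT normalize(url) FROM events.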
Challenges we are tackling
• Reliably processing billions of events per day, 24×7, on commodity hardware with no data loss.
• Processing events in near real time.
• Building for the inherent fragility of cloud and distributed services.
• Performing complex processing over large volumes of data efficiently.
• Data reporting, distribution, analysis, visualization, and machine learning.
• Low-latency data stores for use in bidding and algorithm optimization (a minimal sketch of this kind of lookup follows).
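
To give a flavor of the low-latency lookup problem above: a bidding hot path typically serves prices from memory rather than making a database round trip on every request. A minimal sketch in Java, with hypothetical names and a made-up default floor price:

    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative in-memory price table keyed by campaign ID.
    // Refreshed periodically from an authoritative store (not shown).
    public final class BidPriceCache {
        private final ConcurrentHashMap<String, Long> microCentsByCampaign =
                new ConcurrentHashMap<>();

        // Called by a background refresh job.
        public void update(String campaignId, long priceMicroCents) {
            microCentsByCampaign.put(campaignId, priceMicroCents);
        }

        // Called on every bid request; constant-time, no network round trip.
        public long priceFor(String campaignId) {
            return microCentsByCampaign.getOrDefault(campaignId, 10_000L);
        }
    }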