The ideal candidate has hands-on data-processing experience, is familiar with current distributed computing technologies on commodity servers, and has experience supporting 24×7 production systems.
- Strong SQL skills on large-scale databases; knowledge of an RDBMS or data warehouse such as MySQL, Postgres, Redshift, or Hive.
- Experience with Linux- or Unix-based systems, including the Bourne shell, cron, and other Unix utilities.
- Strong software development skills, including experience with Java.
- Degree in Computer Science or a related field.
Desired Technology Experience
- Experience with at least one scripting language, such as Ruby, Python, or Bourne shell.
- Experience with Hadoop/MapReduce and/or EMR, including developing MapReduce jobs in Java or writing Hive UDFs.
- ETL experience maintaining multiple data systems.
- Experience with Oozie or other Hadoop workflow schedulers, and with building complex data-processing pipelines, including developing regression tests and deployment strategies for such environments.
- Experience with data reporting solutions, either built in-house or using third-party tools.
- Experience developing and supporting 24×7 production data services and pipelines on Linux systems, including being on-call for such services. AWS experience preferred.
Challenges we are tackling
- Reliably processing billions of events per day, 24×7, with no data loss, on commodity hardware.
- Processing events in near real-time.
- Building for the fragility of cloud and distributed services.
- Efficient, complex processing of large volumes of data.
- Reporting, distribution of data, data analysis, data visualization and machine learning algorithms.
- Low-latency data stores for use in bidding and algorithm optimization.
Comp & Benefits
- Competitive compensation based on experience level
- Healthcare HMO & PPO
- Stock options and 401k
- Flexible Spending and Transit Reimbursement Accounts