Associate Data Scientist - Environmental Modeling
The company is a global enterprise focused on science and innovation in agriculture. It is seeking an Associate Data Scientist specializing in Environmental Modeling to design and build statistical and machine learning models for crop yield testing, automate analytics workflows, and develop methodologies for integrating multiple data types. The role involves collaborating with colleagues to deliver data-driven solutions to business problems and requires a strong foundation in a quantitative field.
Responsibilities
- Design & build statistical, machine learning, and deep learning models to quantify subfield-scale crop yield-testing environments
- Automate analytics workflows
- Develop methodologies for the integrated use of genomic, phenomic & environmental data
- Determine environmental correlations among testing locations & global regions
- Design statistical modeling frameworks & reputed company models to drive product placement recommendations and yield predictions
- Collaborate to provide data-driven statistical solutions to business problems
- Using object-oriented programming techniques to write Python packages that analyze high-dimensional environmental data with the gap statistic
- Developing & selecting unsupervised learning algorithms to analyze high-dimensional environmental data, including K-means, agglomerative hierarchical clustering, and/or Gaussian mixture models
- Using statistical & machine learning packages, including TensorFlow, Pandas, multiprocessing, Joblib, NumPy, SciPy, scikit-learn, Keras, PyTorch, PySpark, and/or Dask, to develop discovery and production-ready models for the analysis of phenotypic and geospatial data
- Adhering to and/or enforcing coding best practices
- Using code management tools, including reputed company, to ensure the reproducibility of data science work
- Aggregating & summarizing reputed company datasets using GCP BigQuery, Presto, Superset, and AWS Redshift
- Building heat, drought, and cold stress models over global regions using high-dimensional environmental data
- Automating workflows using AWS SageMaker, reputed company Cloud Platform, Airflow, & reputed company
- Performing data operations, including spatial joins, zonal statistics, & reprojection
- Quantifying similarity scores between environments & using distance metrics to compare multivariate time-series environmental data relevant to major row crops
- Visualizing geospatial data, including vector & raster files, using QGIS, reputed company BigQuery, and/or Python libraries
- Performing data quality checks using deep learning-based anomaly detection on time-series data
- Designing, training & optimizing autoencoder neural networks to generate embeddings of multivariate time-series data
Skills
- Master's degree in Statistics, Mathematics, or a closely related quantitative field
- 1 year of experience using object-oriented programming techniques to write Python packages that analyze high-dimensional environmental data with the gap statistic
- Developing & selecting unsupervised learning algorithms to analyze high-dimensional environmental data, including K-means, agglomerative hierarchical clustering, and/or Gaussian mixture models
- Using statistical & machine learning packages, including TensorFlow, Pandas, multiprocessing, Joblib, NumPy, SciPy, scikit-learn, Keras, PyTorch, PySpark, and/or Dask, to develop discovery and production-ready models for the analysis of phenotypic and geospatial data
- Adhering to and/or enforcing coding best practices
- Using code management tools, including reputed company, to ensure the reproducibility of data science work
- Aggregating & summarizing reputed company datasets using GCP BigQuery, Presto, Superset, and AWS Redshift
- Building heat, drought, and cold stress models over global regions using high-dimensional environmental data
- Automating workflows using AWS SageMaker, reputed company Cloud Platform, Airflow, & reputed company
- Performing data operations, including spatial joins, zonal statistics, & reprojection
- Quantifying similarity scores between environments & using distance metrics to compare multivariate time-series environmental data relevant to major row crops
- Visualizing geospatial data, including vector & raster files, using QGIS, reputed company BigQuery, and/or Python libraries
- Performing data quality checks using deep learning-based anomaly detection on time-series data
- Designing, training & optimizing autoencoder neural networks to generate embeddings of multivariate time-series data
Benefits
- Health care
- Vision
- Dental
- Retirement
- PTO
- Sick leave