Sr Language Data Scientist

100% remote Flexible hours Hiring now

Who we are:

Innodata (NASDAQ: INOD) is a leading data engineering company. With more than 2,000 customers and operations in 13 cities around the world, we are the AI technology solutions provider-of-choice to 4 out of 5 of the world’s biggest technology companies, as well as leading companies across financial services, insurance, technology, law, and medicine. By combining advanced machine learning and artificial intelligence (ML/AI) technologies, a global workforce of subject matter experts, and a high-security infrastructure, we’re helping usher in the promise of clean and optimized digital data to all industries. Innodata offers a powerful combination of both digital data solutions and easy-to-use, high-quality platforms. Our global workforce includes over 3,000 employees in the United States, Canada, United Kingdom, the Philippines, India, Sri Lanka, Israel and Germany. We’re poised for a period of explosive growth over the next few years.

Position Summary:

Innodata is building a team of Language Data Scientists and Gen AI experts to help our customers advance GenAI applications. You will work hands-on with multi-modal and multi-lingual datasets and collaborate with cross-functional partners. You will use your experience with human and synthetic data workflows to drive innovation and continuous improvement. The ideal candidate has the right mix of skills in (computational) linguistics and human evaluation tasks, data science, and data engineering.

Who We’re Looking For:

You have at least 5 years of relevant experience with data creation, curation, and analysis for GenAI applications (e.g. RAG, agents, reasoning). You are experienced in driving long-term projects where you set the strategic plan towards success, using your knowledge of AI, data science, and process design excellence. You are an expert at working cross-functionally with both technical and non-technical stakeholders. Despite ambiguity, you use your technical knowledge and your experience of working with multiple stakeholders to drive solutions. You bring a research-oriented mindset towards developing long-term excellence. You are an expert in designing collection, evaluation and quality assurance processes, using human-in-the-loop and synthetic techniques. You bring a wealth of expertise in language, culture, and multi-lingual projects. You are skilled in analyzing data with advanced statistical tools and driving success through process excellence. Your understanding of machine learning, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG) helps you tackle challenges with a critical, innovative mindset.

Tell Me More:

As a Senior Language Data Scientist, you lead projects and own processes for creating, validating and annotating data for use in LLM/ML applications. This can be natural language data or multimodal data including images, video, audio, and others. You consult and engage with customers to understand their business goals and design processes to meet them. You generate insights about the client’s processes and products to drive improvement and innovation. You advise and support business unit heads on engaging with customers to understand the upstream activities that would be performed using our services.

Responsibilities:

  • Lead long-term projects with high complexity and ambiguity, from the first discussion with the client through completion
  • Design/improve workflows to create data for AI/ML training and evaluation. Includes human annotation and data-collection workflows, as well as synthetic ones
  • Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers
  • Critically assess annotation tooling and workflows
  • Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance
  • Work closely with client stakeholders on understanding goals, gathering requirements, proposing solutions, and executing them.
  • Set an ambitious research agenda for improving our products and services
  • Contribute to establishing best practices and standards for generative AI development with customers and across the organization
  • MA in (computational) linguistics, data science, computer science (AI / ML / NLU), quantitative social sciences or a related scientific / quantitative field; PhD strongly preferred
  • Ability to collaborate directly with technical stakeholders including senior project managers, data engineers, and research scientists.
  • Collaborating with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals
  • Design efficient data strategies for long-term projects, potentially involving multiple teams and workflows.
  • Knowledge of how components of GenAI products or services combine to work
  • Developing clear and concise documentation, including technical specifications, user guides, and presentations, to communicate complex AI concepts to both technical and non-technical stakeholders
  • Familiarity with GenAI technologies that enables you to improve existing processes to handle future challenges.
  • Language and language data expertise: Extensive experience working with human language data and designing human evaluation tasks, including multi-phase workflows.
  • Deep understanding of language and its relationship with culture
  • Ability to identify ambiguity and subjectivity in language
  • Ability to work with multi-lingual and multi-modal projects
  • Quantitative analysis skills: Advanced knowledge of statistics, metrics (e.g. F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.

Technical skills:

  • Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy or NLTK
  • Proficiency in Python to:
    ◦ handle / transform large datasets (e.g. pre- and postprocessing data, pandas)
    ◦ perform quantitative analyses
    ◦ visualize data (for example matplotlib, seaborn)

Data processing:

  • Deep understanding of data pipelines to support ML and NLP workflows
  • Knowledge of efficient data collection, transformation, and storage
  • Knowledge of data structures, algorithms, and data engineering principles
  • Excellent interpersonal skills for effective cross-functional stakeholder engagement
  • Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI solutions
  • Ability to work independently and collaborate as part of a team
  • Adaptable to changing technologies and methodologies
  • Ability to translate experience, research and development information to understand client products and services.
  • Providing technical mentorship and guidance to junior team members

Preferred Skills

  • Conducting research to stay up-to-date with the latest advancements in generative AI, machine learning, and deep learning techniques
  • Knowledge of optimizing existing generative AI models for improved performance, scalability, and efficiency
  • Experience of developing and maintaining ML/AI pipelines, including data preprocessing, feature extraction, model training, and evaluation
  • Model fine-tuning: Knowledge of fine-tuning pre-trained models to adapt them to specific tasks and datasets, improving their performance and relevance
  • Understanding of techniques such as GPT, VAEs, and GANs

Please be aware of recruitment scams involving individuals or organizations falsely claiming to represent employers. Innodata will never ask for payment, banking details, or sensitive personal information during the application process. To learn more on how to recognize job scams, please visit the Federal Trade Commission’s guide at https://consumer.ftc.gov/articles/job-scams. If you believe you’ve been targeted by a recruitment scam, please report it to Innodata at [email protected] and consider reporting it to the FTC at ReportFraud.ftc.gov.
