





MACHINE-LEARNING / LLM ENGINEERING

Are you passionate about turning enterprise data into production-grade LLM services? Do you enjoy retrieval-augmented generation (RAG), vector-database pipelining and GPU optimisation? We are hiring several ML/LLM Engineers to train, fine-tune and serve AI models such as Llama 3 and Mixtral 8x7B, enabling the GRC platform we are building to understand real-world scenarios and use cases: correlating SIEM data and identifying anomalies; answering security and compliance questionnaires from customers; analysing new laws and regulations to identify which organisational policies need to be updated or written; and evaluating vulnerability trends and findings and recommending remediations, to name a few examples.

BACKGROUND

At Allendevaux & Company, we are fusing global regulatory expertise with cutting-edge AI to build a new GRC platform comprising multiple AI-enabled modules that automate data-protection and governance workflows. You will join a lean AI cohort in Alexandria, collaborating with DevOps, security and UX peers to ship trustworthy, low-latency inference services.

EXPERIENCE

Ideal candidates have hands-on experience training and serving transformer models, building RAG pipelines and optimising GPU workloads. Exposure to PyTorch, Hugging Face Transformers, vector stores (e.g., FAISS, Qdrant) and containerised deployment (Docker, Kubernetes) is expected. Familiarity with standards such as ISO 27001 or ISO 42001 is advantageous.

PERSONAL CHARACTERISTICS & SKILLS

* Clear written and spoken English; able to document experiments concisely.
* Analytical thinker who validates assumptions with metrics.
* Collaborative, eager to pair with DevOps and prompt engineers.
* Security-minded, treating data privacy and model governance as first-class concerns.
* Self-driven learner who keeps pace with the fast-moving open-source LLM ecosystem.
RESPONSIBILITIES

* Fine-tuning and quantising open-source LLMs on customer-domain corpora.
* Building RAG pipelines that combine embeddings, vector search and prompt templates.
* Packaging models behind Triton Inference Server or vLLM and exposing REST/gRPC endpoints.
* Profiling GPU utilisation, memory footprint and latency; applying optimisations (e.g., FlashAttention, speculative decoding).
* Collaborating with Prompt-Engineering Specialists to iterate on system prompts and guard-rails.
* Writing CI/CD jobs for automated evaluation (accuracy, hallucination rate, bias metrics).
* Documenting experiments, hyper-parameters and performance benchmarks for reproducibility.

MINIMUM QUALIFICATIONS

* Bachelor's degree in Computer Science, Data Science or a related field; master's degree preferred.
* 3+ years of experience in machine learning, including transformer models.
* Proficiency with Python, PyTorch, Hugging Face Transformers and vector databases.
* Experience deploying models via Docker/Kubernetes and monitoring with Prometheus or similar.

PREFERRED QUALIFICATIONS

* Exposure to security, GRC or SOC data sets (vulnerability scans, SIEM logs).
* Familiarity with MLflow or Kubeflow for experiment tracking.
* Knowledge of CUDA optimisation and mixed-precision training.
* Publications or competition medals (e.g., Kaggle, NeurIPS) in NLP or LLM domains.


