Automating IT Interviews with Ollama and Audio Capabilities in Python

In today’s tech-driven world, automation is revolutionizing recruitment. Imagine having a virtual IT interviewer that not only interacts intelligently but also communicates verbally with candidates. This post will guide you through building an IT interviewer using Ollama and Python, integrating audio capabilities for a more immersive experience.

📚 Introduction
Finding the right talent can be challenging and time-consuming. With advancements in AI and audio processing, it's possible to automate the initial interview phase. This project showcases how to create an interactive IT interviewer that asks questions and processes answers through voice, using Ollama and Google Cloud's Speech-to-Text and Text-to-Speech APIs.

🚀 What You Will Learn

  • How to set up Ollama for conversation handling.
  • How to integrate Google Cloud’s Speech-to-Text and Text-to-Speech APIs for audio capabilities.
  • How to structure a Python project to automate interviews.

🛠️ Prerequisites

  • Python 3.7+
  • Google Cloud Account: For Speech-to-Text and Text-to-Speech APIs.
  • Ollama Account: For conversational AI.

📂 Project Setup
1. Clone the Repository
Start by cloning the project repository:

git clone
cd ollama-it-interviewer
2. Create and Activate a Virtual Environment
Set up a virtual environment to manage dependencies:

python -m venv venv
source venv/bin/activate
3. Install Dependencies
Install the required Python packages:

pip install -r requirements.txt
4. Configure Google Cloud
a. Enable the APIs
Enable the Speech-to-Text and Text-to-Speech APIs in your Google Cloud Console.

b. Create Service Account and Download JSON Key

  1. Go to IAM & Admin > Service accounts.
  2. Create a new service account, grant it the necessary roles, and download the JSON credentials file.

c. Set the Environment Variable
Set the environment variable to point to your credentials file:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"
Replace /path/to/your/service-account-file.json with the actual path to your credentials file.
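
Before running the project, it can help to confirm that Python actually sees the variable and that the file exists. The following is a minimal sketch; `check_credentials` is a hypothetical helper name used for illustration and is not part of the project or of the Google Cloud libraries.

```python
import os


def check_credentials(env=None):
    """Return the credentials path if the variable is set and the file exists.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise EnvironmentError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Credentials file not found: {path}")
    return path
```

Calling `check_credentials()` at the top of a script makes a missing or mistyped path fail early, before any API client is created.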

5. Prepare Audio Files
Add sample audio files to the audio_samples/ directory. You need a candidate-response.mp3 file to simulate a candidate's response. You can record your voice or use text-to-speech tools to generate this file.

6. Update Configuration
Edit src/ to configure your Ollama credentials:

OLLAMA_API_URL = ''  # Replace with your Ollama API URL (hosted or local)
OLLAMA_MODEL = 'your-ollama-model'  # Replace with your Ollama model
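
If you would rather not hard-code these values, one option is to read them from the environment, which fits with the `load_dotenv` import in the main script. This is only a sketch under assumptions: using `OLLAMA_API_URL` and `OLLAMA_MODEL` as environment variable names and the helper `load_ollama_config` are illustrative choices, not something the project defines.

```python
import os


def load_ollama_config(env=None):
    """Read the Ollama settings from a mapping (defaults to os.environ).

    Hypothetical helper: the variable names are assumptions for illustration.
    """
    env = os.environ if env is None else env
    url = env.get("OLLAMA_API_URL")
    model = env.get("OLLAMA_MODEL")
    # Fail with a clear message instead of sending requests to an empty URL.
    missing = [name for name, value in
               (("OLLAMA_API_URL", url), ("OLLAMA_MODEL", model)) if not value]
    if missing:
        raise ValueError("Missing configuration: " + ", ".join(missing))
    return {"url": url, "model": model}
```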
7. Run the Project
Run the interviewer script:

# Option 1: Run as a module from the project root
python3 -m src.interviewer
# Option 2: Ensure PYTHONPATH is set and run directly
export PYTHONPATH=$(pwd)
python3 src/
📝 Detailed Explanation
The main script orchestrates the interview process:

from pydub import AudioSegment
from pydub.playback import play
from src.ollama_api import ask_question
from src.speech_to_text import recognize_speech
from src.text_to_speech import synthesize_speech
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Configure FFmpeg for macOS/Linux
os.environ["PATH"] += os.pathsep + '/usr/local/bin/'

def main():
    question = "Tell me about your experience with Python."
    synthesize_speech(question, "audio_samples/question.mp3")

    question_audio = AudioSegment.from_mp3("audio_samples/question.mp3")
    play(question_audio)

    candidate_response = recognize_speech("audio_samples/candidate-response.mp3")

    ollama_response = ask_question(candidate_response)
    print(f"Ollama Response: {ollama_response}")

    synthesize_speech(ollama_response, "audio_samples/response.mp3")

    response_audio = AudioSegment.from_mp3("audio_samples/response.mp3")
    play(response_audio)

if __name__ == "__main__":
    main()
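
The script asks a single question. A real interview needs several, and the model gives more useful feedback when it sees the question together with the answer. The sketch below shows one way to generalize the flow; `run_interview` and its three callables are hypothetical names, passed in as parameters so the loop can be tried without any cloud services.

```python
def run_interview(questions, speak, listen, evaluate):
    """Ask each question, capture the answer, and collect the model's feedback.

    speak(text)      -> None  e.g. synthesize the text and play it
    listen()         -> str   e.g. transcribe the candidate's recorded answer
    evaluate(prompt) -> str   e.g. send the prompt to the language model
    """
    transcript = []
    for question in questions:
        speak(question)
        answer = listen()
        # Give the model both the question and the answer so it has context.
        feedback = evaluate(f"Question: {question}\nAnswer: {answer}")
        speak(feedback)
        transcript.append(
            {"question": question, "answer": answer, "feedback": feedback}
        )
    return transcript
```

In this project, `speak` could wrap `synthesize_speech` plus `play`, `listen` could wrap `recognize_speech`, and `evaluate` could be `ask_question`.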
The ollama_api module handles interaction with the Ollama API:

import requests
from src.config import OLLAMA_API_URL, OLLAMA_MODEL

def ask_question(question):
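    # Send the text to the configured endpoint; adjust the payload and
    # response keys if your endpoint uses a different schema
    response = requests.post(
        OLLAMA_API_URL,
        json={"model": OLLAMA_MODEL, "input": question},
    )
    response.raise_for_status()
    response_data = response.json()
    return response_data["output"]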
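
The `input` and `output` keys depend on the service that `OLLAMA_API_URL` points to. If you target a locally running Ollama server instead, its `/api/generate` endpoint expects the text under `prompt` and returns the generated text under `response`. The following is a hedged sketch for that case using only the standard library; the function names are hypothetical, and the URL assumes Ollama is listening on its default local port.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
LOCAL_GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model, prompt):
    """Build the JSON body for a single, non-streamed completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def extract_reply(data):
    """Pull the generated text out of a decoded /api/generate response."""
    return data.get("response", "").strip()


def ask_local_ollama(model, prompt, url=LOCAL_GENERATE_URL, timeout=60):
    """POST the prompt to the local server and return the generated text."""
    body = json.dumps(build_generate_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_reply(json.load(reply))
```

Setting `stream` to `False` asks the server for one complete JSON object rather than a stream of partial ones, which keeps the parsing to a single `json.load` call.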
The speech_to_text module converts audio to text using Google Cloud:

from google.cloud import speech
import io

def recognize_speech(audio_file):
    client = speech.SpeechClient()

    with io.open(audio_file, "rb") as audio:
        content = audio.read()

    audio = speech.RecognitionAudio(content=content)
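    # NOTE: the original arguments were lost; the values below are assumptions
    # for an English-language MP3 file and may need adjusting
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.MP3,
        sample_rate_hertz=16000,
        language_code="en-US",
    )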

    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        return result.alternatives[0].transcript
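
Because the function returns from inside the loop, it yields only the first result's transcript, and it returns None when nothing was recognized. A longer answer may come back as several results, so one option is to join them. The sketch below uses a hypothetical helper, `join_transcripts`, and relies only on the `alternatives[0].transcript` shape already used above.

```python
from types import SimpleNamespace


def join_transcripts(results):
    """Concatenate the top alternative of every result into one string."""
    parts = [r.alternatives[0].transcript.strip() for r in results if r.alternatives]
    return " ".join(p for p in parts if p)


# Stand-in objects shaped like the API's results, for a quick local check.
fake_results = [
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="I have used Python ")]),
    SimpleNamespace(alternatives=[SimpleNamespace(transcript=" for five years.")]),
]
print(join_transcripts(fake_results))  # prints "I have used Python for five years."
```

With this helper, `recognize_speech` could end with `return join_transcripts(response.results)`, which also returns an empty string rather than None for silent audio.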
The text_to_speech module converts text to audio using Google Cloud:

from google.cloud import texttospeech
import os

def synthesize_speech(text, output_file):
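    # Verify that the environment variable is set
    if not os.getenv("GOOGLE_APPLICATION_CREDENTIALS"):
        raise EnvironmentError("GOOGLE_APPLICATION_CREDENTIALS is not set")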

    client = texttospeech.TextToSpeechClient()

    synthesis_input = texttospeech.SynthesisInput(text=text)
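    # NOTE: the original arguments were lost; the voice settings below are assumptions
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )
    # MP3 output matches the .mp3 file names used by the main script
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
    )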

    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    with open(output_file, "wb") as out:
        out.write(response.audio_content)
        print(f"Audio content written to file {output_file}")
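
Model responses can be long, and Text-to-Speech limits how much text a single request may contain. One way to stay under the limit is to split the text at sentence boundaries and synthesize each chunk separately. This is a sketch only: `split_for_synthesis` is a hypothetical helper, and the default `max_bytes` value is an assumption, so check the current quota for your account.

```python
import re


def split_for_synthesis(text, max_bytes=4500):
    """Group sentences into chunks of at most max_bytes (UTF-8 encoded).

    A single sentence longer than max_bytes is kept whole in its own chunk,
    so callers should still handle that edge case.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to `synthesize_speech` with its own output file and played in order.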
🎉 Conclusion
By integrating Ollama and Google Cloud’s audio capabilities, you can create a virtual IT interviewer that enhances the recruitment process by automating initial candidate interactions. This project demonstrates the power of combining conversational AI with audio processing in Python.

Give it a try and share your thoughts in the comments! If you encounter any issues or have suggestions, feel free to ask.

📂 Project Structure

├── audio_samples/
│   ├── candidate-response.mp3
├── src/
│   ├──
│   ├──
│   ├──
│   ├──
│   └──
├── requirements.txt
└── .gitignore
🛠️ Resources

  • Ollama
  • Google Cloud Speech-to-Text
  • Google Cloud Text-to-Speech
  • Python pydub

