Creating video calls with the Twilio Java SDK: a step-by-step guide

10/11/2023 Karolina Szafrańska


Technical input & supervision: Michał Chmielarz

Have you ever wondered how seamless communication works behind the scenes? Twilio, a cloud communications platform, has become a go-to solution for developers and businesses alike. Whether you want to enhance customer engagement, streamline internal communication, or build innovative applications, Twilio is a proven option. We used it as a tool for video meetings and recordings.

Curious about Twilio’s capabilities? Let’s start from the basics and walk through room creation and post-meeting video composition. Think of it as a kickoff for your further exploration of this powerful tool!

Getting started: how to set up Twilio

Every journey starts with a single step. In this case, that step is creating a Twilio account. Twilio offers a variety of integrations in the form of libraries, including a well-designed SDK for Java with a robust API that can be used in various projects. To use the Twilio Java SDK, visit the Twilio website, create an account and obtain four parameters: the Twilio account SID, the auth token, and an API key with its secret. You need all four to configure the Twilio client in your code. First, though, the SDK library has to be added as a dependency to your project:

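 // Gradle
dependencies {
    implementation "com.twilio.sdk:twilio:${twilioVersion}"
}

// Maven
<dependency>
    <groupId>com.twilio.sdk</groupId>
    <artifactId>twilio</artifactId>
    <version>${twilio.version}</version>
</dependency>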
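With the dependency in place, the four parameters still have to reach your code. The sketch below shows one possible way to load and sanity-check them before handing them to the SDK. The environment variable names and the `TwilioCredentials` type are assumptions made for illustration and are not part of the SDK.

```java
import java.util.Map;

/** Holds the four values needed to configure the Twilio client (hypothetical helper type). */
record TwilioCredentials(String accountSid, String authToken, String apiKeySid, String apiKeySecret) {

    /** Reads the values from a map such as System.getenv(); the variable names are assumptions. */
    static TwilioCredentials fromEnvironment(Map<String, String> env) {
        var credentials = new TwilioCredentials(
                required(env, "TWILIO_ACCOUNT_SID"),
                required(env, "TWILIO_AUTH_TOKEN"),
                required(env, "TWILIO_API_KEY_SID"),
                required(env, "TWILIO_API_KEY_SECRET"));
        // Twilio account SIDs start with "AC" and API key SIDs with "SK".
        if (!credentials.accountSid().startsWith("AC")) {
            throw new IllegalStateException("Account SID should start with AC");
        }
        if (!credentials.apiKeySid().startsWith("SK")) {
            throw new IllegalStateException("API key SID should start with SK");
        }
        return credentials;
    }

    /** Fails fast when a variable is missing or blank. */
    private static String required(Map<String, String> env, String name) {
        var value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing environment variable " + name);
        }
        return value.trim();
    }

    /** Keeps secrets out of logs by printing only the account SID. */
    @Override
    public String toString() {
        return "TwilioCredentials[accountSid=" + accountSid + "]";
    }
}
```

In an application you would call `TwilioCredentials.fromEnvironment(System.getenv())` once at startup and pass the values on to the SDK's client setup.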



Creating a video connection with the Twilio Java SDK with security in mind

Now that we’ve successfully configured Twilio, we’re ready to host online meetings. With the Java SDK, you can easily set up video connections, manage the participants and control the individual tracks. Establishing a secure video connection with Twilio involves creating a room with the configuration you choose during its setup. A pivotal step in this process is requesting Twilio to generate a unique access token for each user. With their access tokens, users can connect through the front-end SDK and join their designated room. It’s worth noting that even after a meeting ends, the room persists for a certain time; prolonged inactivity, however, leads to its automatic closure. To reconnect, a new token is required.

Below is a glimpse of our code:

Start event call:

 public EventCallAccessToken startEventCall(EventId eventId, EventParticipant patient, EventParticipant provider) {
   log.debug("Starting event call {} triggered by provider {}", eventId, provider.participantId());
   // Reuse the meeting stored for this event, or create the room and conversation on first use
   var twilioMeeting = twilioMeetingRepository.findByEventId(eventId)
           .orElseGet(() -> createTwilioMeeting(eventId, patient, provider));
   var roomSid = twilioMeeting.twilioRoomSid();
   return createAccessToken(eventId, provider, roomSid, twilioMeeting.twilioChatServiceSid());
 }

The createTwilioMeeting method is responsible, among other things, for setting up the Twilio room that the call requires, as well as a Twilio conversation. We use the latter to provide a chat feature for the call participants.
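The createAccessToken step is not shown in the listing. Twilio access tokens are JWTs signed with the API key secret, and in real code the SDK's access token builder produces them for you. To illustrate what such a token carries (who the user is, which room they may join, and when the token expires), here is a minimal sketch using only the JDK. The class name is hypothetical, and the exact claim layout is an assumption based on the commonly documented token structure, so treat it as an illustration rather than a replacement for the SDK.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.time.Duration;
import java.time.Instant;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative only: assembles a JWT shaped like a Twilio video access token. */
final class AccessTokenSketch {

    private static final Base64.Encoder URL_ENCODER = Base64.getUrlEncoder().withoutPadding();

    static String create(String accountSid, String apiKeySid, String apiKeySecret,
                         String identity, String roomName, Instant now, Duration ttl) {
        // The JSON below is written by hand, so accept only characters that need no escaping.
        for (String value : new String[] {accountSid, apiKeySid, identity, roomName}) {
            if (!value.matches("[A-Za-z0-9_.@-]+")) {
                throw new IllegalArgumentException("Unsupported characters in: " + value);
            }
        }
        long issuedAt = now.getEpochSecond();
        String header = "{\"typ\":\"JWT\",\"alg\":\"HS256\",\"cty\":\"twilio-fpa;v=1\"}";
        String payload = "{\"jti\":\"" + apiKeySid + "-" + issuedAt + "\","
                + "\"iss\":\"" + apiKeySid + "\","
                + "\"sub\":\"" + accountSid + "\","
                + "\"exp\":" + (issuedAt + ttl.getSeconds()) + ","
                + "\"grants\":{\"identity\":\"" + identity + "\","
                + "\"video\":{\"room\":\"" + roomName + "\"}}}";
        String signingInput = encode(header) + "." + encode(payload);
        try {
            // HS256: HMAC-SHA256 over "header.payload", keyed with the API key secret.
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(apiKeySecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] signature = mac.doFinal(signingInput.getBytes(StandardCharsets.UTF_8));
            return signingInput + "." + URL_ENCODER.encodeToString(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not sign the token", e);
        }
    }

    private static String encode(String json) {
        return URL_ENCODER.encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes the payload segment so the claims can be inspected. */
    static String payloadOf(String jwt) {
        return new String(Base64.getUrlDecoder().decode(jwt.split("\\.")[1]), StandardCharsets.UTF_8);
    }
}
```

Because the identity and the room are part of the signed payload, a token issued for one participant and one room cannot be altered to join another without invalidating the signature, and the expiry claim is why a new token is needed to reconnect later.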

Creating a room:

 TwilioRoomSid createRoom(String roomName, CallbackUrl callbackUrl) {
   var room = Room.creator()
           // ... room settings (name, status callback, region, recording flag) ...
           .create(client);
   return new TwilioRoomSid(room.getSid());
 }

When creating the room, you can specify the media region for the meeting. It’s good practice to set it to a location reasonably close to the participants; otherwise you might see poorer audio and video quality, for example when the client tests the application. Like Zoom or MS Teams, Twilio allows recording a meeting. To have recording start automatically as soon as a participant joins, set the recording flag when you create the room.
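To make these settings concrete, the sketch below assembles the form parameters that a room-creation request to the Rooms REST API would carry: the room name, the status callback, the media region and the record-on-connect flag. The SDK's room creator exposes setters for the same options, so you would not normally build this by hand. The class name, the example values and the choice of region are assumptions for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative only: the room settings discussed above, expressed as REST form parameters. */
final class RoomRequestSketch {

    static Map<String, String> roomParameters(String roomName, String callbackUrl,
                                              String mediaRegion, boolean recordOnConnect) {
        var params = new LinkedHashMap<String, String>();
        params.put("UniqueName", roomName);
        params.put("StatusCallback", callbackUrl);
        // Pick a media region close to the participants, e.g. "de1" for Germany.
        params.put("MediaRegion", mediaRegion);
        // Start recording automatically as soon as a participant connects.
        params.put("RecordParticipantsOnConnect", String.valueOf(recordOnConnect));
        return params;
    }

    /** Encodes the parameters as an application/x-www-form-urlencoded body. */
    static String toFormBody(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```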

Creation of a conversation:

 TwilioConversation createConversation(String conversationName) {
   var conversation = Conversation.creator()
           // ... conversation settings ...
           .create(client);
   log.debug("Created conversation {}", conversation);
   var conversationSid = new TwilioConversationSid(conversation.getSid());
   var chatServiceSid = new TwilioChatServiceSid(conversation.getChatServiceSid());
   return new TwilioConversation(conversationSid, chatServiceSid);
 }

As you can see, the SDK provides a fluent API, which is very convenient to use.

The must-remember basics: Setting the parameters

The SDK’s API provides a comprehensive set of options for precise parameter customization, including participant names, each of which must be unique to avoid confusion. Participants can also keep managing certain features during an ongoing connection. The SDK simplifies video configuration, offering options like background blurring, alternative backgrounds, various filters, and resolution adjustments. When recording, Twilio captures each participant’s feed as dedicated audio and video tracks. Notably, there is no restriction on the number of audio tracks each user can publish. The SDK also supports screen-sharing tracks.


Your server application tells Twilio to create a Room using the Rooms REST API. Image source. 


Our approach in a project for the healthcare industry

Twilio is a cornerstone of our project, which is tailored to a highly specific use case—an online meeting platform connecting healthcare specialists with patients. The paramount concern here is security, both in terms of connectivity and access, owing to the potentially sensitive data exchanged during these meetings. Furthermore, the need for recording these sessions is imperative, ensuring that the recordings are readily accessible when needed. In our project, the backend handles interacting with Twilio, including room and conversation creation and token management, while the frontend seamlessly takes care of video display, audio capture, and the direct uploading of the audio-video stream to Twilio.

Here are some crucial features of our solution:

When configuring a room, we can define a status callback URL for it. Twilio calls this URL to tell the backend about the room’s status in near real time: whether the room is in use or idle, whether a participant has ended the call, or whether all participants have disconnected. This callback mechanism is crucial, particularly for further processing of recordings.

The callback address for the room is a REST endpoint on our side. It must be reachable from the outside so that Twilio can call it. We save the callbacks to a database, which lets us add features such as technical analysis of connections.

One shortcoming of the Java SDK is also worth mentioning here. The address the SDK library uses to communicate with the Twilio API is fixed. What does that mean in practice? Integration-testing the code during the build without reaching Twilio’s servers is hard, because the library has the URL hardwired. We therefore had to provide a mock of the Twilio client that returns canned responses.
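As an illustration of what such an endpoint has to do, the sketch below parses a form-encoded status callback body and checks whether it reports that the room has ended. Twilio sends status callbacks as form parameters, including a StatusCallbackEvent field with values such as room-ended or participant-connected. The class and method names are hypothetical, and the web framework wiring and database storage are left out.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: minimal handling of a room status callback body. */
final class RoomCallbackSketch {

    /** Parses an application/x-www-form-urlencoded body into a map of fields. */
    static Map<String, String> parseForm(String body) {
        var fields = new HashMap<String, String>();
        if (body == null || body.isBlank()) {
            return fields;
        }
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            fields.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                    URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return fields;
    }

    /** True when the callback reports that the room has ended, so recordings can be processed. */
    static boolean isRoomEnded(Map<String, String> fields) {
        return "room-ended".equals(fields.get("StatusCallbackEvent"));
    }
}
```

Storing the parsed fields as they arrive, before acting on them, keeps a full history of each call for later technical analysis.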
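A related way to keep build-time tests off the network, different from mocking the Twilio client itself, is to hide the SDK behind a small application-level interface and swap in an in-memory fake during tests. The sketch below shows the idea. All type names here are hypothetical, and the production implementation that would delegate to the SDK is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical port that hides the Twilio SDK from the rest of the application. */
interface VideoRooms {
    String createRoom(String roomName, String callbackUrl);
}

/** In-memory stand-in used in tests; it never touches the network. */
final class FakeVideoRooms implements VideoRooms {
    private final AtomicInteger sequence = new AtomicInteger();
    private final List<String> createdRoomNames = new ArrayList<>();

    @Override
    public String createRoom(String roomName, String callbackUrl) {
        createdRoomNames.add(roomName);
        // Room SIDs start with "RM"; the rest is a made-up, zero-padded counter.
        return "RM" + String.format("%032d", sequence.incrementAndGet());
    }

    List<String> createdRoomNames() {
        return List.copyOf(createdRoomNames);
    }
}

/** Example of application code that depends only on the port. */
final class MeetingService {
    private final VideoRooms rooms;

    MeetingService(VideoRooms rooms) {
        this.rooms = rooms;
    }

    String startMeeting(String eventId) {
        return rooms.createRoom("event-" + eventId, "https://example.com/twilio/rooms");
    }
}
```

The trade-off is that the fake does not exercise the SDK's request building or response parsing, so mocked client responses like the ones described above remain useful for that layer.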

Composition at Twilio – merged results of all recordings

After the room is closed, we create a “Composition” from recordings of individual tracks. During this process, we select the desired layout and merge all audio, video, and screen share recordings into a single track. By default, the composition is stored in Twilio’s cloud. However, you can choose to write your video recordings and compositions to your own Amazon Web Services (AWS) S3 bucket instead. In our project, we’ve opted to save recordings in the cloud and retain control and security by saving the compositions on our S3 storage.

When creating a composition, we follow a similar pattern as for room creation or access token generation:

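 CompositionSid createComposition(TwilioRoomSid twilioRoomSid, CallbackUrl callbackUrl) {
   var sid = Composition.creator(twilioRoomSid.value())
           // ... composition settings (tracks, layout, format, resolution, status callback) ...
           .create(client)
           .getSid();
   return new CompositionSid(sid);
 }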

In the process of creating our composition, we decide on the tracks to include, preferred format, and the desired resolution. Additionally, we employ a composition callback mechanism to signal when the composition is available on our S3. The end result is a fully processed video, primed and ready for immediate use.
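The choices described above map onto a handful of request parameters. The sketch below collects them in a small value type and renders them as the form parameters a composition request would carry: all audio sources, a grid layout with all video sources, the format, the resolution and the status callback. The `CompositionRequest` type and the example values are assumptions for illustration, and the SDK's composition creator offers setters for the same options.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative only: the composition choices expressed as REST form parameters. */
record CompositionRequest(String roomSid, String format, int width, int height, String callbackUrl) {

    private static final Set<String> FORMATS = Set.of("mp4", "webm");

    CompositionRequest {
        if (!FORMATS.contains(format)) {
            throw new IllegalArgumentException("Unsupported format: " + format);
        }
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("Resolution must be positive");
        }
    }

    Map<String, String> toParameters() {
        var params = new LinkedHashMap<String, String>();
        params.put("RoomSid", roomSid);
        // "*" selects every audio track recorded in the room.
        params.put("AudioSources", "*");
        // Grid layout that includes every video track, screen shares included.
        params.put("VideoLayout", "{\"grid\":{\"video_sources\":[\"*\"]}}");
        params.put("Format", format);
        params.put("Resolution", width + "x" + height);
        // Called when the composition's status changes, e.g. when it becomes available.
        params.put("StatusCallback", callbackUrl);
        return params;
    }
}
```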
