The goal of this blog post is to evaluate the Confluent Platform.
The Confluent Platform is built around Apache Kafka and adds to it:
- Clients: a native C/C++ Kafka client
- Schema Registry
- REST Proxy
- Connectors to JDBC databases, HDFS and Hive
In particular, we are interested in analyzing the Schema Registry and the REST Proxy added on top of Apache Kafka.
Setup
The examples below can be executed by starting from this gist.
Requirements: docker, docker-compose
Start the services:

```bash
docker-compose up -d
```
Export the environment variable used by the curl examples.

Mac:

```bash
DOCKER_IP=$(docker-machine ip dev)
```

Linux:

```bash
DOCKER_IP=localhost
```
Schema Registry
The Confluent Schema Registry provides a serving layer for your metadata. It provides a RESTful interface for storing and retrieving Avro schemas. It stores a versioned history of all schemas, provides multiple compatibility settings and allows evolution of schemas according to the configured compatibility setting. It provides serializers that plug into Kafka clients that handle schema storage and retrieval for Kafka messages that are sent in the Avro format.
The main concepts related to the schema registry are: subject, schema and version.
- A subject is a textual label that is associated with a Kafka topic.
- A schema is an Avro schema that describes the data model used to store data in a topic.
- A version is a numeric id used to identify a schema registered under a subject.
API Tutorial
The first step is to understand how to list subjects and get their associated schemas.
List Subjects
Request:
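Assuming the registry is reachable on port 8081 (its default in this docker-compose setup):

```bash
# List all subjects currently registered in the schema registry
curl -X GET http://$DOCKER_IP:8081/subjects
```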
The call returns the list of registered subjects.
Response:
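On a fresh registry the list is empty; once the subjects registered below exist, it looks like:

```json
["kafka-key","kafka-value"]
```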
Version associated to a subject
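The versions live under the subject's versions resource; kafka-value is one of the subjects registered below:

```bash
# List the version ids registered under the subject kafka-value
curl -X GET http://$DOCKER_IP:8081/subjects/kafka-value/versions
```

This returns a list of version ids such as [1].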
Get schemas associated to a subject
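Fetching a specific version returns the full schema; version 1 is used as an example:

```bash
# Get the schema stored as version 1 of the subject kafka-value
curl -X GET http://$DOCKER_IP:8081/subjects/kafka-value/versions/1
```

The response carries the subject name, the version, the schema id and the Avro schema itself.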
Post a Subject
Register a new schema under the specified subject. If successfully registered, this returns the unique identifier of this schema in the registry. The returned identifier should be used to retrieve this schema.
If the same schema is registered under a different subject, the same identifier will be returned. However, the version of the schema may be different under different subjects.
Example 1: kafka-key subject
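A sketch of the registration call, using a plain Avro string schema for the key:

```bash
# Register a string schema under the subject kafka-key
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\": \"string\"}"}' \
  http://$DOCKER_IP:8081/subjects/kafka-key/versions
```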
Response 1:
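On a fresh registry the first registered schema gets id 1 (the exact value depends on the registration order):

```json
{"id":1}
```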
Example 2: kafka-value with the same schema of kafka-key
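The same schema, posted under a different subject:

```bash
# Register the same string schema under the subject kafka-value
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\": \"string\"}"}' \
  http://$DOCKER_IP:8081/subjects/kafka-value/versions
```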
Response 2:
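The registry recognizes the schema and returns the id it already assigned:

```json
{"id":1}
```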
Here we can see that the id returned for both subjects is the same, because they are associated with the same schema.
Post and Update a Subject
An easy way to test putting several versions of the same subject is to enable the FORWARD compatibility configuration:
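A sketch of the config update; here the registry-wide compatibility level is changed (a single subject can also be targeted via /config/&lt;subject&gt;):

```bash
# Switch the compatibility level to FORWARD
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "FORWARD"}' \
  http://$DOCKER_IP:8081/config
```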
Now we can create the subject complex with the schema complex:
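The record fields below are illustrative; any valid Avro record will do:

```bash
# Register version 1 of the subject complex: a record with a single field
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\": \"record\", \"name\": \"complex\", \"fields\": [{\"name\": \"f1\", \"type\": \"string\"}]}"}' \
  http://$DOCKER_IP:8081/subjects/complex/versions
```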
Then we update the subject by registering a new version of its schema:
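Adding a field is a forward-compatible change, so the registry accepts it as a new version of the same subject:

```bash
# Register version 2 of the subject complex: the same record with an extra field
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\": \"record\", \"name\": \"complex\", \"fields\": [{\"name\": \"f1\", \"type\": \"string\"}, {\"name\": \"f2\", \"type\": \"int\"}]}"}' \
  http://$DOCKER_IP:8081/subjects/complex/versions
```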
Here we can see that the ids associated with the two versions are different.
If we list the stored subjects we get ["complex","kafka-key","kafka-value"]:
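```bash
curl -X GET http://$DOCKER_IP:8081/subjects
```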
and if we query for the versions of the subject complex we get [1,2]:
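```bash
curl -X GET http://$DOCKER_IP:8081/subjects/complex/versions
```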
Finally, we can get the name, version and schema of the returned versions by:
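```bash
# Fetch subject name, version, id and schema for each registered version
curl -X GET http://$DOCKER_IP:8081/subjects/complex/versions/1
curl -X GET http://$DOCKER_IP:8081/subjects/complex/versions/2
```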
Check if a schema exists under a subject
Check if a schema has already been registered under the specified subject. If so, this returns:
- subject
- version
- id
- schema
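The check is a POST against the subject itself (not its versions resource), with the candidate schema in the body:

```bash
# Check whether the string schema is already registered under kafka-key
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\": \"string\"}"}' \
  http://$DOCKER_IP:8081/subjects/kafka-key
```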
Response:
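Something along these lines, with id and version depending on the registration history:

```json
{"subject":"kafka-key","version":1,"id":1,"schema":"\"string\""}
```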
Schema Registry Discussion
The goal of the schema registry is immediately clear, as is its data model. I spent some time fully understanding how to test the concept of a version associated with a subject, especially because it strongly depends on the base compatibility configuration of the registry (“BACKWARD”). I think the rationale here should be to use Avro optional fields to support basic schema evolution.
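For example, a field declared as a union with null and given a null default can be added or removed without breaking readers on either side of the evolution (the field name f2 is illustrative):

```json
{"name": "f2", "type": ["null", "int"], "default": null}
```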
Data Model
- A subject has an associated schema,
- Each schema has an id,
- Each schema can be associated with multiple subjects,
- It is possible to associate multiple schemas with the same subject. If compatibility is FORWARD we can set different schemas (or breaking evolutions of the same schema) on the same subject. Each update will have a different version id.
REST Proxy
The Kafka REST Proxy provides a RESTful interface to a Kafka cluster. It makes it easy to produce and consume messages (in JSON, Avro and binary), to view the state of the cluster, and to perform administrative actions without using the native Kafka protocol or clients.
Examples
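A first smoke test is to ask the proxy for the cluster metadata; we assume the proxy listens on port 8082, its default in this docker-compose setup:

```bash
# List the topics known to the cluster through the REST Proxy
curl -X GET http://$DOCKER_IP:8082/topics
```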
Produce and Consume Avro Messages
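A sketch of the round trip, modeled on the REST Proxy quickstart; the topic avrotest, the record type User and the consumer names are illustrative:

```bash
# Produce an Avro message: the schema travels with the request and
# is registered on the fly by the proxy
curl -X POST -H "Content-Type: application/vnd.kafka.avro.v1+json" \
  --data '{"value_schema": "{\"type\": \"record\", \"name\": \"User\", \"fields\": [{\"name\": \"name\", \"type\": \"string\"}]}", "records": [{"value": {"name": "testUser"}}]}' \
  http://$DOCKER_IP:8082/topics/avrotest

# Create a consumer instance that reads Avro from the beginning of the log
curl -X POST -H "Content-Type: application/vnd.kafka.v1+json" \
  --data '{"name": "my_consumer", "format": "avro", "auto.offset.reset": "smallest"}' \
  http://$DOCKER_IP:8082/consumers/avrogroup

# Fetch the messages through the consumer instance
curl -X GET -H "Accept: application/vnd.kafka.avro.v1+json" \
  http://$DOCKER_IP:8082/consumers/avrogroup/instances/my_consumer/topics/avrotest
```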
If we query the schema registry we will see a new subject named avrotest-value:
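```bash
curl -X GET http://$DOCKER_IP:8081/subjects
# ["avrotest-value","complex","kafka-key","kafka-value"]
```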
Thus, when we create a new topic using the Confluent clients or the HTTP proxy, a new mapping for the topic key and its value is added to the schema registry.
Features
The REST Proxy exposes all the functionality of the Java producers, consumers, and command-line tools:
- Metadata: metadata about the state of the cluster
- Producers and Consumers: expose creation and communication
- Data Formats: can write JSON, raw bytes encoded as base64, and JSON-encoded Avro.
The design of the API resembles the EventStore API.
Java Clients
Confluent Kafka extends the base Apache Kafka distribution by adding:
- HDFS and JDBC connectors
- a connection to the schema registry and topic validation
- Camus for Kafka-to-HDFS pipelines.
As we are interested in the registry interaction with the client APIs, we evaluated the examples provided by Confluent.
The producer and consumer APIs are similar to the Apache Foundation version. The interaction with the schema registry is done via a configuration entry added to the producer/consumer.
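A sketch of the relevant settings; the serializer class and the schema.registry.url key come from the Confluent Avro serializers, and the host should point at the docker machine in this setup:

```
# Producer configuration: Avro serializers backed by the schema registry
key.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
schema.registry.url=http://localhost:8081
```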
This makes everything transparent to the final user. What is not clear is whether it is possible to restrict the creation/modification of topics from the client.
In the Kafka broker configuration we can set auto.create.topics.enable=false
to disable automatic topic creation, but it is not clear from the Confluent documentation how this setting interacts with the registry and the APIs.