What is a Time Series Database?
Time series databases have exploded in popularity in just the past few years. According to the database ranking website DB-Engines, time series databases have seen the highest increase in demand of any type of database over the past 24 months. But what exactly are time series databases, and how do they work?
What are time series databases? How do time series databases work?
To answer what a time series database is, we first need to define the concept of time series data. A "time series" is a data set in which data points are tagged with timestamps. Time series databases are therefore databases that have been specifically built and optimized for storing time series data.
In a time series database, time is the most important axis; analysis of time series data typically focuses on how the data is evolving over time. This makes time series databases significantly different from relational databases:
- Time series databases are optimized for viewing changes over time. In a sense, a relational database can be viewed as a special case of a time series database in which only the latest state is kept: a record is written once and updated only when it changes in reality. Time series databases, meanwhile, record the state of reality repeatedly, often at regular intervals, and keep every observation.
- Time series databases may require highly precise timestamps. The timestamps in a typical relational schema are often fairly coarse, commonly stored at second or millisecond granularity. On the other hand, timestamps in a time series database may be precise down to the microsecond or even nanosecond.
- Time series databases are usually append-only databases. In other words, new information is constantly added to the database, but there is usually very little reason to edit the entries of historical data.
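The three properties above can be illustrated with a minimal in-memory sketch in plain Java. The class name SimpleTimeSeries and its methods are hypothetical and exist only for this example; a real time series database adds persistence, compression, and much more. The sketch assumes timestamps are stored as nanoseconds since the Unix epoch and that entries always arrive in time order.

```java
import java.time.Instant;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory time series, for illustration only. */
final class SimpleTimeSeries {
    // Keys are timestamps in nanoseconds since the Unix epoch; TreeMap keeps them sorted.
    private final TreeMap<Long, Double> points = new TreeMap<>();

    /** Converts an Instant to nanoseconds since the epoch. */
    static long toEpochNanos(Instant t) {
        return Math.addExact(Math.multiplyExact(t.getEpochSecond(), 1_000_000_000L), t.getNano());
    }

    /** Appends a point; rejects timestamps that are not newer than the latest one. */
    void append(long timestampNanos, double value) {
        if (!points.isEmpty() && timestampNanos <= points.lastKey()) {
            throw new IllegalArgumentException("timestamp must be newer than the last entry");
        }
        points.put(timestampNanos, value);
    }

    /** Returns a view of all points with from <= timestamp <= to (from must not exceed to). */
    NavigableMap<Long, Double> range(long fromNanos, long toNanos) {
        return points.subMap(fromNanos, true, toNanos, true);
    }

    /** Difference between the last and first value in the range; 0 if the range is empty. */
    double change(long fromNanos, long toNanos) {
        NavigableMap<Long, Double> r = range(fromNanos, toNanos);
        if (r.isEmpty()) {
            return 0.0;
        }
        return r.lastEntry().getValue() - r.firstEntry().getValue();
    }

    public static void main(String[] args) {
        SimpleTimeSeries temperature = new SimpleTimeSeries();
        long t0 = toEpochNanos(Instant.parse("2019-08-11T05:01:00Z"));
        temperature.append(t0, 20.5);
        temperature.append(t0 + 1_000, 20.7);            // one microsecond later
        temperature.append(t0 + 60_000_000_000L, 21.5);  // one minute later
        System.out.println(temperature.range(t0, t0 + 1_000).size());
        System.out.println(temperature.change(t0, t0 + 60_000_000_000L));
    }
}
```

Keeping the points in a sorted map makes "everything between two timestamps" a cheap lookup, which is the query that time series workloads run most often.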
The use cases for time series data, and time series databases, are almost limitless. Time series databases can be used to record weather observations, electricity usage, vehicle locations, stock prices, and much more—any situation in which the values in the database are likely to vary over time.
One major use case for time series databases is the Internet of Things (IoT), a vast network of many interconnected sensors that communicate and exchange data. IoT sensors may be in far-flung locations, each one recording data about its environment over a period of time. These observations are then sent to a centralized hub and recorded within a time series database.
Time series data in Redis
Redis is an open-source, in-memory data structure store that is used to implement NoSQL key-value databases, caches, and message brokers. Because Redis places a high priority on performance and efficiency, it's an excellent choice for working with time series data.
Developers have several options for working with time series data in Redis. For example, they can use the sorted set data structure, setting the timestamp as the score and the data as the corresponding member. Because the set is ordered by score, it's easy to process and analyze the data in chronological order. They can also use Redis Streams, a feature introduced in Redis 5.0 for handling logs and channels of information.
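One detail of the sorted set approach is worth a sketch: members of a Redis sorted set are unique, so adding a member that already exists only updates its score. If two readings have the same value, the second one replaces the first unless the member is made unique, for instance by including the timestamp in it. The plain-Java model below imitates that behavior without talking to Redis. SortedSetModel and the "timestamp:value" encoding are hypothetical choices made for this illustration, not part of any Redis client.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a Redis sorted set; not a Redis client. */
final class SortedSetModel {
    // Each member appears once; adding it again only updates its score.
    private final Map<String, Double> scoreByMember = new HashMap<>();

    /** Imitates the effect of ZADD for a single member. */
    void add(double score, String member) {
        scoreByMember.put(member, score);
    }

    /** Imitates ZRANGEBYSCORE with inclusive bounds: ordered by score, then by member. */
    List<String> rangeByScore(double min, double max) {
        List<String> members = new ArrayList<>();
        for (Map.Entry<String, Double> e : scoreByMember.entrySet()) {
            if (e.getValue() >= min && e.getValue() <= max) {
                members.add(e.getKey());
            }
        }
        members.sort(Comparator.comparingDouble((String m) -> scoreByMember.get(m))
                .thenComparing(Comparator.naturalOrder()));
        return members;
    }

    /** Encodes a reading so that equal values at different times stay distinct members. */
    static String encode(long timestamp, String value) {
        return timestamp + ":" + value;
    }

    public static void main(String[] args) {
        SortedSetModel naive = new SortedSetModel();
        naive.add(201908110501L, "10%");
        naive.add(201908110504L, "10%"); // same member: only its score changes
        System.out.println(naive.rangeByScore(201908110501L, 201908110508L));

        SortedSetModel encoded = new SortedSetModel();
        encoded.add(201908110501L, encode(201908110501L, "10%"));
        encoded.add(201908110504L, encode(201908110504L, "10%"));
        System.out.println(encoded.rangeByScore(201908110501L, 201908110508L));
    }
}
```

The first set ends up with a single "10%" entry, while the second keeps both readings. Sorted set scores are double-precision floating point numbers, so integer timestamps are represented exactly only up to 2^53.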
But why implement your own time series database when you can take advantage of a pre-built implementation in a friendly and familiar programming language like Java? Although Redis isn't compatible with Java out of the box, many Java developers using Redis choose to install a third-party Redis Java client such as Redisson.
Redisson implements time series data using the RTimeSeries interface. Below is an example of how to use the RTimeSeries interface in Redisson:
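    import java.util.Collection;
    import java.util.concurrent.TimeUnit;
    import org.redisson.api.RTimeSeries;

    // Assumes `redisson` is an already configured RedissonClient instance.
    RTimeSeries<String> ts = redisson.getTimeSeries("myTimeSeries");

    // Timestamps are long values, so the literals need the L suffix.
    ts.add(201908110501L, "10%");
    ts.add(201908110502L, "30%");
    ts.add(201908110504L, "10%");
    ts.add(201908110508L, "75%");

    // entry time-to-live is 10 hours
    ts.add(201908110510L, "85%", 10, TimeUnit.HOURS);
    ts.add(201908110510L, "95%", 10, TimeUnit.HOURS);

    // Read and remove a single entry by its timestamp.
    String value = ts.get(201908110508L);
    ts.remove(201908110508L);

    // Remove and return the first two entries, then query a time range.
    Collection<String> values = ts.pollFirst(2);
    Collection<String> range = ts.range(201908110501L, 201908110508L);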
The add() method is used to add new time series data to the database. Each call to the add() method provides a timestamp along with a string containing the time series data. In addition, users can include optional arguments for the add() method that define the time to live for each entry in the database (e.g. 10 hours). The range() method returns all the time series data within a given time range.
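To make the semantics of a per-entry time to live concrete, here is a small in-memory model in plain Java. It is not Redisson's implementation: ExpiringSeries is a hypothetical class, the current time is passed in explicitly so the behavior is easy to follow, and the range bounds are assumed to be inclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-memory model of per-entry expiry; not Redisson's implementation. */
final class ExpiringSeries<V> {
    private static final class Slot<V> {
        final V value;
        final long expiresAtMillis; // Long.MAX_VALUE means "never expires"

        Slot(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    // Entries sorted by timestamp.
    private final TreeMap<Long, Slot<V>> entries = new TreeMap<>();

    /** Adds an entry that never expires. */
    void add(long timestamp, V value) {
        entries.put(timestamp, new Slot<>(value, Long.MAX_VALUE));
    }

    /** Adds an entry that expires ttl units after nowMillis. */
    void add(long timestamp, V value, long ttl, TimeUnit unit, long nowMillis) {
        entries.put(timestamp, new Slot<>(value, nowMillis + unit.toMillis(ttl)));
    }

    /** Returns the values with from <= timestamp <= to that have not expired at nowMillis. */
    List<V> range(long from, long to, long nowMillis) {
        List<V> live = new ArrayList<>();
        for (Slot<V> slot : entries.subMap(from, true, to, true).values()) {
            if (slot.expiresAtMillis > nowMillis) {
                live.add(slot.value);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        ExpiringSeries<String> ts = new ExpiringSeries<>();
        long now = 0L;
        ts.add(201908110508L, "75%");
        ts.add(201908110510L, "85%", 10, TimeUnit.HOURS, now);
        // Eleven hours later only the entry without a time to live is still visible.
        System.out.println(ts.range(201908110501L, 201908110510L, now + TimeUnit.HOURS.toMillis(11)));
    }
}
```

Note that the time to live is measured from the moment the entry is written, not from the timestamp attached to the data point, so old measurements can be backfilled and still expire on a predictable schedule.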
Note that the RTimeSeries interface comes with asynchronous, reactive, and RxJava2 interfaces, so that you can use the programming model that best fits your needs.