While the use of machine learning (ML) and artificial intelligence (AI) for IT security may not be new, the extent to which data-driven analytics can detect and thwart nefarious activities is still in its infancy.
As we’ve recently discussed here on BriefingsDirect, an expanding universe of interdependent application programming interfaces (APIs) forms a new and complex threat vector that strikes at the heart of digital business.
How will ML and AI form the next best security solution for APIs across their dynamic and often uncharted use in myriad apps and services? Stay with us now as we answer that question by exploring how advanced big data analytics forms a powerful and comprehensive means to track, understand, and model safe APIs use.
To learn how AI makes APIs secure and more resilient across their life cycles and ecosystems, BriefingsDirect welcomes Ravi Guntur, Head of Machine Learning and Artificial Intelligence at Traceable.ai. The interview is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions.
Here are some excerpts:
Gardner: Why does API security provide such a perfect use case for the strengths of ML and AI? Why do these all come together so well?
Guntur: When you look at the strengths of ML, the biggest strength is to process data at scale. And newer applications have taken a turn in the form of API-driven applications.
Large pieces of applications have been broken down into smaller pieces, and these smaller pieces are being exposed as even smaller applications in themselves. To process the information going between all these applications, to monitor what activity is going on, the scale at which you need to deal with them has gone up many fold. That’s the reason why ML algorithms form the best-suited class of algorithms to deal with the challenges we face with API-driven applications.
Gardner: Given the scale and complexity of the app security problem, what makes the older approaches to security wanting? Why don’t we just scale up what we already do with security?
More than rules needed to secure apps
Guntur: I’ll give an analogy as to why older approaches don’t work very well. Think of the older approaches as a big box with, let’s say, a single door. For attackers to get into that big box, all they must do is crack through that single door.
Now, with the newer applications, we have broken that big box into multiple small boxes, and we have given a door to each one of those small boxes. If the attacker wants to get into the application, they only have to get into one of these smaller boxes. And once he gets into one of the smaller boxes, he needs to take a key out of it and use that key to open another box.
By creating API-driven applications, we have exposed a much bigger attack surface. That’s number one. Number two, of course, we have made it challenging to the attackers, but the attack surface being so much bigger now needs to be dealt with in a completely different way.
The older class of applications took a rules-based system as the common approach to solve security use cases. Because they just had a single application and the application would not change that much in terms of the interfaces it exposed, you could build in rules to analyze how traffic goes in and out of that application.
Now, when we break the application into multiple pieces, and we bring in other paradigms of software development, such as DevOps and Agile development methodologies, this creates a scenario where the applications are always rapidly changing. There is no way rules can catch up with these rapidly changing applications. We need automation to understand what is happening with these applications, and we need automation to solve these problems, which rules alone cannot do.
Gardner: We shouldn’t think of AI here as replacing old security or even humans. It’s doing something that just couldn’t be done any other way.
Guntur: Yes, absolutely. There’s no substitute for human intelligence, and there’s no substitute for the thinking capability of humans. If you go deeper into the AI-based algorithms, you realize that these algorithms are very simple in terms of how the AI is powered. They’re all based on optimization algorithms. Optimization algorithms don’t have thinking capability. They don’t have creativity, which humans have. So, there’s no way these algorithms are going to replace human intelligence.
They are going to work alongside humans to make all the mundane activities easier for humans and help humans look at the more creative and the difficult aspects of security, which these algorithms can’t do out of the box.
Gardner: And, of course, we’re also starting to see that the bad guys, the attackers, the hackers, are starting to rely on AI and ML themselves. You have to fight fire with fire. And so that’s another reason, in my thinking, to use the best combination of AI tools that you can.
Gardner: Another significant and growing security threat are bots, and the scale that threat vector takes. It seems like only automation and the best combination of human and machines can ferret out these bots.
Machines, humans combine to combat attacks
Guntur: You are right. Most of the best detection cases we see in security are a combination of humans and machines. The attackers are also starting to use automation to get into systems. We have seen such cases where the same bot comes in from geographically different locations and is trying to do the same thing in some of the customer locations.
The reason they’re coming from so many different locations is to challenge AI-based algorithms. One of the oldest schools of algorithms looks at rate anomaly, to see how quickly somebody is coming from a particular IP address. The moment you spread the IP addresses across the globe, you don’t know whether it’s different attackers or the same attacker coming from different locations. This kind of challenge has been brought by attackers using AI. The only way to challenge that is by building algorithms to counter them.
One thing is for sure, algorithms are not perfect. Algorithms can generate errors. Algorithms can create false positives. That’s where the human analyst comes in, to understand whether what the algorithm discovered is a true positive or a false positive. Going deeper into the output of an algorithm digs back into exactly how the algorithm figured out an attack is being launched. But some of these insights can’t be discovered by algorithms, only humans when they correlate different pieces of information, can find that out. So, it requires a team. Algorithms and humans work well as a team.
Gardner: What makes the way in which Traceable.ai is doing ML and AI different? How are you unique in your vision and execution for using AI for API security?
Guntur: When you look at any AI-based implementation, you will see that there are three basic components. The first is about the data itself. It’s not enough if you capture a large amount of data; it’s still not enough if you capture quality data. In most cases, you cannot guarantee data of high quality. There will always be some noise in the data.
But more than volume and quality of data, what is more important is whether the data that you’re capturing is relevant for the particular use-case you’re trying to solve. We want to use the data that is helpful in solving security use-cases.
Traceable.ai built a platform from the ground up to cater to those security use cases. Right from the foundation, we began looking at the specific type of data required to solve modern API-based application security use cases. That’s the first challenge that we address, it’s very important, and brings strength to the product.
Seek differences in APIs
Once you address the proper data issue, the next is about how you learn from it. What are the challenges around learning? What kind of algorithms do we use? What is the scenario when we deploy that in a customer location?
We realized that every customer is completely different and has a completely different set of APIs, too, and those APIs behave differently. The data that goes in and out is different. Even if you take two e-commerce customers, they’re doing the same thing. They’re allowing you to look at products, and they’re selling you products. But the way the applications have been built, and the API architecture — everything is different.
We realized it’s no use to build supervised approaches. We needed to come up with an architecture where the day we deploy at the customer location; the algorithm then self-learns.
We realized it’s no use to build supervised approaches. We needed to come up with an architecture where the day we deploy at the customer location; the algorithm then self-learns. The whole concept of being able to learn on its own just by looking at data is the core to the way we build security using the AI algorithms we have.
Finally, the last step is to look at how we deliver security use cases. What is the philosophy behind building a security product? We knew that rules-based systems are not going to work. The alternate system is modeled around anomaly detection. Now, anomaly detection is a very old subject, and we have used anomaly detection in various things. We have used it to understand whether machinery is going to go down, we have used them to understand whether the traffic patterns on the road are going to change, and we have used it for anomaly detection in security.
But within anomaly detection, we focused on behavioral anomalies. We realized that APIs and the people who use APIs are the two key entities in the system. We needed to model the behavior of these two groups — and when we see any deviation from this behavior, that’s when we’re able to capture the notion of an attack.
Behavioral anomalies are important because if you look at the attacks, they’re so subtle. You just can’t easily find the difference between the normal usage of an API and abnormal usage. But very deep inside the data and very deep into how the APIs are interacting, there is a deviation in the behavior. It’s very hard for humans to figure this out. Only algorithms can tease this out and determine that the behavior is different from a known behavior.
We have addressed this at all levels of our stack: The data-capture level, and the choice of how we want to execute our AI, and the choice of how we want to deliver our security use cases. And I think that’s what makes Traceable unique and holistic. We didn’t just bolt things on, we built it from the ground up. That’s why these three pieces gel well and work well together.
Gardner: I’d like to revisit the concept you brought up about the contextual use of the algorithms and the types of algorithms being deployed. This is a moving target, with so many different use cases and company by company.
How do you keep up with that rate of change? How do you remain contextual?
Function over form delivers context
Guntur: That’s a very good question. The notion of context is abstract. But when you dig deeper into what context is and how you build context, it boils down to basically finding all factors influencing the execution of a particular API.
Let’s take an example. We have an API, and we’re looking at how this API functions. It’s just not enough to look at the input and output of the API. We need to look at something around it. We need to see who triggered that input. Where did the user come from? Was it a residential IP address that the user came in from? Was it a hosted IP address? Which geolocation is the user coming from? Did this user have past anomalies within the system?
You need to bring in all these factors into the notion of context when we’re dealing with API security. Now, it’s a moving target. The context — because data is constantly changing. There comes a moment when you have fixed this context, when you say that you know where the users are coming from, and you know what the users have done in the past. There is some amount of determinism to whatever detection you’re performing on these APIs.
Let’s say an API takes in five inputs, and it gives out 10 outputs. The inputs and outputs are a constant for every user, but the values that go into the input varies from user to user. Your bank account is different from my bank account. The account number I put in there is different for you, and it’s different for me. If you build an algorithm that looks for an anomaly, you will say, “Hey, you know what? For this part of the field, I’m seeing many different bank account numbers.”
There is some problem with this, but that’s not true. It’s meant to have many variations in that account number, and that determination comes from context. Building a context engine is unique in our AI-based system. It helps us tease out false positives and helps us learn the fact that some variations are genuine.
That’s how we keep up with this constant changing environment, where the environment is changing not just because new APIs are coming in. It’s also because new data is coming into the APIs.
Gardner: Is there a way for the algorithms to learn more about what makes the context powerful to avoid false positives? Is there certain data and certain ways people use APIs that allow your model to work better?
Guntur: Yes. When we initially started, we thought of APIs as rigidly designed. We thought of an API as a small unit of execution. When developers use these APIs, they’ll all be focused on very precise execution between the APIs.
We soon realized that developers bundle various additional features within the same API. We started seeing that they just provide a few more input options, but they get completely different functionality from that same API.
But we soon realized that developers bundle various additional features within the same API. We started seeing that they just provide a few more input options, and by triggering those extra input options you get completely different functionality from the same API.
We had to come up with algorithms that discover that a particular API can behave in multiple ways — depending on the inputs being transmitted. It’s difficult for us to figure out whether the API is going to change and has ongoing change. But when we built our algorithms, we assumed that an API is going to have multiple manifestations, and we need to figure out which manifestation is currently being triggered by looking at the data.
We solved it differently by creating multiple personas for the same API. Although it looks like a single API, we have an internal representation of an API with multiple personas.
Gardner: Interesting. Another thing that’s fascinating to me about the API security problem is that the way hackers try not to abuse the API. Instead, they have subtle logic abuse attacks where they’re basically doing what the API is designed to do but using it as a tool for their nefarious activities.
How does your model help fight against these subtle logic abuse attacks?
Logic abuse detection
Guntur: When you look at the way hackers are getting into distributed applications and APIs using these attacks – it is very subtle. We classify these attacks as business logic abuse. They are using the existing business logic, but they are abusing it. Now, figuring out abuse to business logic is a very difficult task. It involves a lot of combinatorial issues that we need to solve. When I say combinatorial issues, it’s a problem of scale in terms of the number of APIs, the number of parameters that can be passed, and the types of values that can be passed.
When we built the Traceable.ai platform, it was not enough to just look at the front-facing APIs, we call them the external APIs. It’s also important for us to go deeper into the API ecosystem.
We have two classes of APIs. One, the external facing APIs, and the other is the internal APIs. The internal APIs are not called by users sitting outside of the ecosystem. They’re called by other APIs within the system. The only way for us to identify the subtle logic attacks is to be able to follow the paths taken by those internal APIs.
If the internal APIs are reaching a resource like a database, and within the database it reaches a particular row and column, it then returns the value. Only then you will be able to figure out that there was a subtle attack. We’re able to figure this out only because of the capability to trace the data deep into the ecosystem.
If we had done everything at the API gateway, if we had done everything at external facing APIs, we would not have figured out that there was an attack launched that went deep into the system and touched a resource it should never have touched.
It’s all about how well you capture the data and how rich your data representation is to capture this kind of attack. Once you capture this, using tons of data, and especially graph-like data, you have no option but to use algorithms to process it. That’s why we started using graph-based algorithms to discover variations in behavior, discover outliers, and uncover patterns of outliers, and so on.
Gardner: To fully tackle this problem, you need to know a lot about data integration, a lot about security and the vulnerabilities, as well as a lot about algorithms, AI, and data science. Tell me about your background. How are you able to keep these big, multiple balls in the air at once when it comes to solving this problem? There are so many different disciplines involved.
Multiple skills in data scientist toolbox
Guntur: Yes, it’s been a journey for me. When I initially started in 2005, I had just graduated from university. I used a lot of mathematical techniques to solve key problems in natural language processing (NLP) as part of my thesis. I realized that even security use cases can be modeled as a language. If you take any operating system (OS), we typically have a few system calls, right? About 200 system calls, or maybe 400 system calls. All the programs running in the operating system are using about 400 system calls in different ways to build the different applications.
It’s similar to natural languages. In natural language, you have words, and you compose the words according to a grammar to get a meaningful sentence. Something similar happens in the security world. We realized we could apply techniques from statistical NLP into the security use cases. We discovered, for example, way back then, certain Solaris login buffer and overflow vulnerabilities.
That’s how the journey began. I then went through multiple jobs and worked on different use cases. I learned if you want to be a good data scientist — or if you want to use ML effectively — you should think of yourself as a carpenter, as somebody with a toolbox with lots of tools in it, and who knows how to use those tools very well.
But to best use those tools, you also need the experience from building various things. You need to build a chair, a table, and a house. You need to build various things using the same set of tools, and that took me further along that journey.
While I began with NLP, I soon ventured into image processing and video processing, and I applied that to security, too. It furthered the journey. And through that whole process, I realized that almost all problems can be mapped to canonical forms. You can take any complex problem and break it down into simpler problems. Almost all fields can be broken down into simple mathematical problems. And if you know how to use various mathematical concepts, you can solve a lot of different problems.
We are applying these same principles at Traceable.ai as well. Yes, it’s been a journey, and every time you look at data you come up with different challenges. The only way to overcome that is to dirty your hands and solve it. That’s the only way to learn and the only way we could build this new class of algorithms — by taking a piece from here, a piece from there, putting it together, and building something different.
Gardner: To your point that complex things in nature, business, and technology can be brought down to elemental mathematical understandings, once you’ve attained that with APIs, for example, applying this first to security, and rightfully so, it’s the obvious low-lying fruit.
But over time, you also gain mathematical insights and understanding of more about how microservices are used and how they could be optimized. Or even how the relationship between developers and the IT production crews might be optimized.
Is that what you’re setting the stage for here? Will that mathematical foundation be brought to a much greater and potentially productive set of a problem-solving?
Something for everybody
Guntur: Yes, you’re right. If you think about it, we have embarked on that journey already. Based on what we have achieved as of today, and we look at the foundations over which we have built that, we see that we have something for everybody.
For example, we have something for the security folks as well as for the developer folks. The Traceable.ai system gives insights to developers as to what happens to their APIs when they’re in production. They need to know that. How is it all behaving? How many users are using the APIs? How are they using them? Mostly, they have no clue.
The mathematical foundation under which all these implementations are being done is based on relationships, relationships between APIs. You can call them graphs, but it’s all about relationships.
And on the other side, the security team doesn’t know exactly what the application is. They can see lots of APIs, but how are the APIs glued together to form this big application? Now, the mathematical foundation under which all these implementations are being done is based on relationships, relationships between APIs. You can call them graphs, you can call them sequences, but it’s all about relationships.
One aspect we are looking at is how do you expose these relationships. Today we have this relationship buried deep inside of our implementations, inside our platform. But how do you take it out and make it visual so that you can better understand what’s happening? What is this application? What happens to the APIs?
By looking at these visualizations, you can easily figure out if there are bottlenecks within the application, for example. Is one API constantly being hit on? If I always go through this API, but the same API is also leading me to a search engine or a products catalog page, why does this API need to go through all these various functions? Can I simplify the API? Can I break it down and make it into multiple pieces? These kinds of insights are now being made available to the developer community.
Gardner: For those listening or reading this interview, how should they prepare themselves for being better able to leverage and take advantage of what Traceable.ai is providing? How can developers, security teams, as well as the IT operators get ready?
Rapid insights result in better APIs
Guntur: The moment you deploy Traceable in your environment, the algorithms kick in and start learning about the patterns of traffic in your environment. Within a few hours — or if your traffic has high volume, within 48 hours — you will receive insights into the API landscape within your environment. This insight starts with how many APIs are there in your environment. That’s a fundamental problem that a lot of companies are facing today. They just don’t know how many APIs exist in their environment at any given point of time. Once you know how many APIs are there, you can figure out how many services there are. What are the different services, and which APIs belong to which services?
Traceable gives you the entire landscape within a few hours of deployment. Once you understand your landscape, the next interesting thing to see are your interfaces. You can learn how risky your APIs are. Are you exposing sensitive data? How many of the APIs are external facing? How to best use authentication to give access to APIs or not? And why do some APIs not have authentication? How are you exposing APIs without authentication?
All these questions are answered right there in the user interface. After that, you can look at whether your development team is in compliance. Do the APIs comply with the specifications in the requirements? Because usually the development teams are rapidly churning out code, they almost never maintain the API’s spec. They will have a draft spec and they will build against it, but finally, when you deploy it, the spec looks very different. But who knows it’s different? How do you know it’s different?
Traceable’s insights tell you whether your spec is compliant. You get to see that within a few hours of deployment. In addition to knowing what happened to your APIs and whether they are compliant with the spec, you start seeing various behaviors.
People think that when you have 100 APIs deployed, all users use those APIs the same way. We think all of them are using the apps the same way. But you’d be surprised to learn that users use apps in many different ways. Sometimes the APIs are accessed through computational means, sometimes they are accessed via user interfaces. There is now insight for the development team on how users are actually using the APIs, which in itself is a great insight to help build better APIs, which helps build better applications, and simplifies the application deployments.
All of these insights are available within a few hours of the Traceable.ai deployment. And I think that’s very exciting. You just deploy it and open the screen to look at all the information. It’s just fascinating to see how different companies have built their API ecosystems.
And, of course, you have the security use cases. You start seeing what’s at work. We have seen, for example, what Bingbot from Microsoft looks like. But how active is it? Is it coming from 100 different IP addresses, or is it always coming from one part of a geolocation?
You can see how, for example, what search spiders’ activity looks like. What are they doing with our APIs? Why is the search engine starting to look at the APIs, which are internal language and have no information? But why are they crawling these APIs? All this information is available to you within a few hours. It’s really fascinating when you just deploy and observe.