Ville Somppi of M-Files on Artificial Intelligence, Knowledge Work, and Structured vs. Unstructured Data

Artificial intelligence (AI) is transforming knowledge work every day, and the pace of innovation keeps accelerating. We sat down with Ville Somppi, Vice President of Industry Solutions at M-Files, to chat about these changes, what they mean for the future of knowledge workers, and whether (or how) AI will help organizations bridge the gap between structured and unstructured data.

Artificial Intelligence and the Future of Work


Some business leaders are predicting that AI is going to eliminate work altogether. What impact do you see AI having on the future of work?

AI will actually bolster human ingenuity and creativity. Computers do what you ask them. If you know the right question, an AI will know the answer. In the case of generative AI, the output is probably halfway or even 90 percent there, assuming you have an idea of what to produce. AI will empower workers by fostering new potential for knowledge work automation.

AI won’t randomly know what to create—there must be a person, an intelligent actor, asking it to do something and then validating that the content produced is as requested. The slow part in any creative work, where you actually have to produce assets based on your original idea, will speed up, but AI will not replace the need for knowledge workers to realize a vision and use their creativity to bring that vision to life.

The journey towards knowledge work automation

When people have talked about optimizing work performance, it has generally been about file management and process management taking us into the next phase of automation. Is this evolution a result of better bandwidth and processing speeds, and was this always the dream or is this a shift?

This has always been the dream, but it’s been a very slow process. The history of IT covers only 50 or 60 years. The first computers were barely able to store data. Eventually, graphical interfaces came onto the market, and we found ways to visualize information on a screen, not just printed on paper. The graphical user interface meant you didn't have to be a scientist to understand what a computer does, but important tasks like document management and workflow management were still totally manual.

The vision has always been the automation of knowledge work and innovation to make any work easier and improve productivity. Asking a computer to create something requires only spoken or written language. Decades ago, computers could do really cool, powerful calculations or simulations to help design something as complex as a space mission to another galaxy. Tomorrow you’ll be able to say, “Hey, ChatGPT, can you design a spacecraft for me based on this example?” Instead of spending thousands of hours using a mouse and keyboard to draw the 3D blueprint, you’ll get to a workable draft design much faster and refine it from there. You don't have to explain everything literally. That's the revolution.

Structured vs. unstructured data

So many of our systems today, from file management to business process, are all about structured data like Word or PowerPoint files. But tweets are legally discoverable in court, and they count as unstructured. When it comes to knowledge work, is there really a difference between structured and unstructured data or has it just been completely blurred?

Traditionally, computers need structure to understand data. Let's say the name of your company is just text. A structured system like a customer relationship management (CRM) tool can read an ID number corresponding to your company name, and then it only cares about the ID, and will always know which ID number represents your company. With unstructured data written by humans, without any explicitly defined meaning, a computer won’t understand it; it's just some text.

With large language models, structure is less important because computers can process unstructured data more efficiently to extract meaning and any interesting data points. Say a contract is valid in 2024. If we extract that time period as structured data, the computer knows when the contract applies. With generative AI and its inference engine, you can ask what a given item in an unstructured asset means, and the AI understands because it can read and interpret unstructured content.

There's a difference between structured and unstructured data. One is meant to be read and understood by computers and the other is free flowing—computers have had a hard time understanding it until now. It's less important today to have everything as structured data, but structured data is still how computers talk to each other. You can’t really calculate the phrase “quarter past noon,” because a computer by default sees only text. A large language model can convert that human expression into computer-understandable structured data, allowing normal computations to occur.
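To make the “quarter past noon” point concrete, here is a minimal Python sketch. A tiny hand-written lookup stands in for the large language model; the function name to_structured_time and the two phrase tables are assumptions made up for illustration, not anything a model provider or M-Files ships. What matters is the result: once the phrase becomes a structured value, ordinary arithmetic works on it.

```python
from datetime import date, datetime, time, timedelta

# Hypothetical phrase tables. A real system would ask a language model
# to do this interpretation instead of relying on a fixed lookup.
ANCHORS = {"noon": 12, "midnight": 0}
OFFSETS = {"quarter past": 15, "half past": 30}


def to_structured_time(phrase: str) -> time:
    """Turn a phrase such as 'quarter past noon' into a datetime.time."""
    text = phrase.lower().strip()
    for offset_words, minutes in OFFSETS.items():
        if text.startswith(offset_words):
            anchor = text[len(offset_words):].strip()
            if anchor in ANCHORS:
                return time(hour=ANCHORS[anchor], minute=minutes)
    raise ValueError(f"cannot interpret {phrase!r}")


meeting_start = to_structured_time("quarter past noon")

# With a structured value, normal computation becomes possible:
# add 45 minutes to find when the meeting ends.
meeting_end = (
    datetime.combine(date(2024, 1, 1), meeting_start) + timedelta(minutes=45)
).time()

print(meeting_start, meeting_end)
```

The raw string could not be added to or compared with anything; the time value can.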

The importance of informational intent

Why is structured data easier for computers to use, and how does M-Files process unstructured data?

With structured data, you don't just have the value of the data, you also have the meaning, the type, and the informational intent. For example, one type is the name of a company. Every system using that data point knows that designation refers to the name of the company, as well as the type of data. Structured data can help a system differentiate between a text field and a number field. And in a number field, what does the number mean? Is it an amount of money or a postal code? With structured data, computer systems know what you mean with any given piece of information.
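As a rough illustration of informational intent, the Python sketch below stores the same digits twice: once as an amount of money and once as a postal code. All class and field names (Money, PostalCode, CompanyRecord, describe) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str


@dataclass(frozen=True)
class PostalCode:
    value: str


@dataclass(frozen=True)
class CompanyRecord:
    company_id: int        # the ID that other systems use to refer to the company
    name: str              # the company name as plain text
    invoice_total: Money
    postal_code: PostalCode


def describe(field: object) -> str:
    """Report what a value means, based on its declared type."""
    if isinstance(field, Money):
        return f"{field.amount} {field.currency} (an amount of money)"
    if isinstance(field, PostalCode):
        return f"{field.value} (a postal code)"
    return f"{field} (unknown meaning)"


record = CompanyRecord(
    company_id=1001,
    name="Example Oy",
    invoice_total=Money(Decimal("90210"), "EUR"),
    postal_code=PostalCode("90210"),
)

print(describe(record.invoice_total))
print(describe(record.postal_code))
print(describe("90210"))
```

The bare string in the last call carries the value but not the meaning, which is the situation a computer faces with unstructured text.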

If you upload a contract to M-Files, it's just a document. It's unstructured, it's human-created, but we can extract interesting data points such as structured metadata. Because M-Files can tag documents with metadata such as contract validity, we can transform parts of that unstructured data into structured data so that computers can process it to apply business rules and empower all flavors of knowledge work automation.
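The idea of tagging a contract with its validity period can be sketched in a few lines of Python. This is not the M-Files API; ContractMetadata and is_valid_on are hypothetical names, and the dates are invented to match the “valid in 2024” example. The sketch assumes the validity period has already been extracted from the document.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ContractMetadata:
    """Structured data points extracted from an unstructured contract."""
    title: str
    valid_from: date
    valid_until: date


def is_valid_on(meta: ContractMetadata, day: date) -> bool:
    """Business rule: a contract applies on any day inside its validity period."""
    return meta.valid_from <= day <= meta.valid_until


# Assumed result of the extraction step for one uploaded document.
contract = ContractMetadata(
    title="Supplier agreement",
    valid_from=date(2024, 1, 1),
    valid_until=date(2024, 12, 31),
)

print(is_valid_on(contract, date(2024, 6, 15)))
print(is_valid_on(contract, date(2025, 1, 1)))
```

Once the period is a pair of dates rather than a sentence in the document, the rule is a simple comparison that any conventional system can run.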

Generative AI costs capital

Does data format impact the cost to use generative AI? Is unstructured data more expensive to process, and have new language models changed this?

Using generative AI could be expensive because the computer has to do a lot of processing to understand the content, to sift through any existing organizational chaos or find a given valid contract period as opposed to somebody simply reading that information from the metadata. If you do that a million times, you’ll probably pay $50,000 to whichever company is providing the generative AI service because their AI is doing so much hard work.

But you can do it forever from a structured data field and at low cost because it's trivial. Large language models are just gigantic mathematical formulas powered by deep neural networks. You enter an input value, and the output value comes out. But the cost of performing that mathematical operation is a lot compared to just doing normal things we could do 50 years ago.
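The cost comparison can be worked through with simple arithmetic. The Python sketch below takes the figure quoted above ($50,000 for a million requests) at face value, which works out to $0.05 per request. It also assumes, purely for illustration, that reading an already-extracted metadata field costs nothing; the function name and the document counts are made up.

```python
from decimal import Decimal

# Figure quoted in the interview: about $50,000 for one million requests.
QUOTED_TOTAL_USD = Decimal(50_000)
QUOTED_REQUESTS = Decimal(1_000_000)
COST_PER_REQUEST_USD = QUOTED_TOTAL_USD / QUOTED_REQUESTS


def generative_ai_cost(documents: int, reads_per_document: int,
                       extract_once: bool) -> Decimal:
    """Estimate spend on the AI service.

    extract_once=True:  one model request per document, later reads come
                        from the structured metadata (assumed to be free).
    extract_once=False: the model re-reads the document on every lookup.
    """
    requests = documents if extract_once else documents * reads_per_document
    return requests * COST_PER_REQUEST_USD


# 1,000 contracts, each looked up 1,000 times.
every_time = generative_ai_cost(1_000, 1_000, extract_once=False)
once_then_metadata = generative_ai_cost(1_000, 1_000, extract_once=True)

print(COST_PER_REQUEST_USD, every_time, once_then_metadata)
```

Under these assumptions, asking the model on every lookup costs a thousand times more than extracting the value once and reading it from a structured field afterward.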

The thought-cost element is super important. We cannot just replace all the old IT and use artificial intelligence and generative AI for everything because it's about a billion times more expensive.

I have a smartphone here with more computing power and more control over information technology than all of the supercomputers had in the 1990s. Large language models and generative AIs that understand language are possible because we have so much more computing power compared to the past. That doesn't make it cheap, just possible. Any company applying these technologies has to do a cost-benefit analysis. When should we use these cool, new, expensive technologies, as opposed to cheaper, more mechanical and traditional IT technologies?