The key challenge for generative AI: Delivering understandable and trustworthy quality

One reason for the rapid growth of the AI market is the urgent need to improve labor productivity. In Japan, where the birthrate is declining, the working population is unlikely to increase significantly in the short term. Japan needs to use AI to improve labor productivity and to implement realistic and swift policies to achieve this. Of course, Japan is making a national effort toward this goal. According to a survey by Accenture, a leading consulting firm, the use of AI is expected to increase labor productivity by 37% in Sweden, 35% in the United States, and 34% in Japan by 2035.

We should note, however, that statistics show that generative AI has already begun to take away white-collar jobs and wages in the online freelance world. Generative AI can even write programs. Software companies are reluctant to dedicate personnel to simple programming tasks. The same thing will occur within companies in other fields. White-collar workers could lose their jobs if they can only do easily automatable jobs. Each of us needs to develop our own unique skills.

While rapid growth is predicted, there are significant barriers to overcome.

Examples include how to ensure the reliability of generative AI and how to manage software quality. Large language models (LLMs), which underpin generative AI, are too vast and complex for us to sufficiently verify the accuracy of AI-generated outputs. In manufacturing quality control, we can take samples and process them statistically, but this is not the case with generative AI. Therefore, we must exercise great caution when making high-risk decisions like diagnosing life-threatening cancer cells.

The Japanese government has already begun working to increase the reliability of generative AI, and the National Institute of Advanced Industrial Science and Technology (AIST), under the Ministry of Economy, Trade and Industry (METI), has published AI guidelines. However, different industries and businesses require different specific actions to achieve the desired quality and to translate those actions into trustworthy AI systems. Therefore, transforming general theory into concrete practical knowledge may become a highly technical challenge.

Practical application of machine learning and AI requires a balance between accuracy and time

One of the machine learning libraries widely used for various AI applications, such as stock price prediction and purchase forecasting, is XGBoost.

XGBoost is regarded as a gold standard among data scientists: it combines many decision trees to improve accuracy. A decision tree has a branched structure that repeatedly splits into yes/no branches and is used to group data for predictions. Although a single decision tree produces low prediction accuracy, XGBoost achieves high accuracy by combining many trees, with each new tree learning to correct the errors of the previous ones. However, the large and complex ensembles of decision trees that XGBoost builds are said to be difficult to explain in an easily understandable manner.
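To make the idea of "correcting the errors of previous models" concrete, here is a minimal sketch in plain Python. It is not XGBoost's actual algorithm (XGBoost adds regularization and other refinements); it is a simplified form of boosting in which each tree is a one-split "stump" fitted to the residual errors left by the trees before it. The function names, the toy data, and the settings (200 trees, learning rate 0.3) are all assumptions chosen for illustration.

```python
# Simplified boosting sketch for illustration only; NOT XGBoost's real algorithm.
# Each "tree" is a stump: a single yes/no split of the form x <= threshold.

def fit_stump(xs, targets):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # drop the largest x so both sides are non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict(stumps, learning_rate, x):
    """Sum the (shrunken) outputs of all stumps for one input."""
    return sum(learning_rate * (lo if x <= t else hi) for t, lo, hi in stumps)


def boost(xs, ys, n_trees, learning_rate):
    """Fit each new stump to the residuals of the ensemble built so far."""
    stumps = []
    for _ in range(n_trees):
        residuals = [y - predict(stumps, learning_rate, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return stumps


def mse(stumps, learning_rate, xs, ys):
    """Mean squared error of the ensemble on the given data."""
    return sum((y - predict(stumps, learning_rate, x)) ** 2
               for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: a curved target that one split cannot capture.
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]

single_mse = mse([fit_stump(xs, ys)], 1.0, xs, ys)   # one tree on its own
model = boost(xs, ys, n_trees=200, learning_rate=0.3)
ensemble_mse = mse(model, 0.3, xs, ys)               # many trees combined

print("single tree error:", single_mse)
print("boosted ensemble error:", ensemble_mse)
```

The sketch also hints at the interpretability problem: the final prediction is a sum over 200 small trees, so no single yes/no path explains a result.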

Recently, decision tree experts have proposed a method that synthesizes the hundreds of decision trees produced by XGBoost into a single decision tree. Although the method is theoretically correct, it takes too much computation time to be practical. I have already proposed a totally different method, which increases learning speed by more than 50 times and significantly improves usability and efficiency for users.

Simply put, the number of rules produced by a single decision tree is equal to its number of terminal leaves. Therefore, more leaves mean highly complicated rules. For example, in the case of making a judgment like, “If XX, then it looks like it will rain,” with about 10 conditions such as A, B, and C stacked in the “if” clause, it becomes unclear which condition is most important. That makes it difficult to interpret the content of the “if” clause. My approach not only reduces the number of rules but also selects conditions appropriately. Thus, it is applicable to artificial intelligence as well as to medicine and financial engineering. I have submitted the results of my research to a renowned machine learning journal, and I am waiting for the review results of the revised manuscript.
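The relationship between leaves and rules can be sketched in a few lines of Python. The tree below is a hypothetical toy example for the rain judgment (its conditions and outcomes are made up for illustration, and it is not the author's method): every path from the root to a leaf becomes one "if ... then ..." rule, so the number of rules equals the number of leaves.

```python
# Hypothetical toy decision tree; conditions and outcomes are invented.
tree = {
    "condition": "humidity > 80%",
    "yes": {
        "condition": "pressure is falling",
        "yes": {"leaf": "rain"},
        "no": {"leaf": "cloudy"},
    },
    "no": {"leaf": "clear"},
}


def extract_rules(node, conditions=()):
    """Walk every root-to-leaf path and return (conditions, outcome) pairs."""
    if "leaf" in node:
        return [(list(conditions), node["leaf"])]
    cond = node["condition"]
    return (extract_rules(node["yes"], conditions + (cond,))
            + extract_rules(node["no"], conditions + (f"not ({cond})",)))


def count_leaves(node):
    """Count the terminal leaves of the tree."""
    if "leaf" in node:
        return 1
    return count_leaves(node["yes"]) + count_leaves(node["no"])


rules = extract_rules(tree)
for conds, outcome in rules:
    print("If " + " and ".join(conds) + f", then {outcome}")
print("rules:", len(rules), "leaves:", count_leaves(tree))
```

In a deep tree, each rule accumulates one condition per level, which is how an "if" clause ends up with about 10 stacked conditions.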

From a practical perspective, machine learning and AI need to balance performance (accuracy) and cost (time), but currently, accuracy seems to be prioritized over the time required. However, an enormous time requirement makes the implementation of AI impractical, and recent theories argue that even time-consuming and highly complex machine learning models do not significantly improve performance. Furthermore, although the emerging field of Automated Machine Learning (AutoML) represents a kind of paradigm shift for next-generation AI, factors like learning speed and interpretability of results are increasingly important issues to be solved. In fact, a growing number of ML studies point out that current AutoML is too time-consuming and costly.

In 2024, I published a paper on the topic “Can generative AI become interpretable?” It was cited as many as 530 times in just over one year. In the first place, users do not want to see the entire behavior of generative AI; in many cases, it should be sufficient if they can interpret and use the necessary parts. I hope that my research will trigger the emergence of methods that have a greater impact on generative AI and AutoML.

Emotion AI with the ability to understand romance novels could even proactively address customer needs

In recent years, emotion AI research has been growing, but here too, generative AI faces barriers to overcome.

Generative AI can process text to capture the preferences, trends, and behaviors of human groups. That is why it can easily be used for marketing, multilingual translation, and meeting summaries, and why it can provide excellent interfaces. This may lead to the assumption that AI can also understand human crowd psychology and consumer psychology, but currently, it can do no more than analyze text patterns on social media.

Expressions of nuanced and complex emotions differ depending on the writer and the language. Analyzing patterns in a text does not immediately lead to emotion AI. Recently, academic papers have been published on research into making generative AI understand essays. Essays are easy to grasp because their ideas are usually logically written without substantial human emotion.

However, understanding novels is much more challenging. In particular, novels written in minority languages, including Japanese, have few samples and are difficult to interpret. Even more difficult is understanding romance novels, which I am also working on. If you translate words expressing emotions literally, you might feel that you have successfully expressed them. But fully understanding the nonlinear relationship between words and emotions found in romance novels is not such an easy task.

ChatGPT and translation tools like DeepL are now available for translating Japanese into English. As part of our experiments with emotion AI, we compared the translations provided by generative AI with published English translations to see how we could improve the quality of English translation. The crucial point was whether subtle intentions could be translated into English in a way that matched the original emotions.

Specifically, we prompted ChatGPT to translate some Japanese sentences into English, and when the outputs were unsatisfactory, we advised ChatGPT using the short English sentence “pay attention to emotions.” As a result, not only did ChatGPT's translations of those expressions improve, but its translations of other sentences also improved qualitatively.

Enhanced accuracy of emotion AI would enable full-scale smartphone marketing. For example, further developed generative AI in an e-commerce site interface would allow smartphones to anticipate users’ desires and to suggest what they want.

You cannot fit an entire LLM into a smartphone. But if you could incorporate only the most essential functions of the interface into the device and transmit information, then it would be like always having a personalized smartphone marketer by your side. This provides users with the significant advantage of efficiently searching for desired products and information while enjoying personalized recommendations.

As for AI use within companies, they can probably use AI to provide a certain percentage of in-house training in a uniform manner. Since many people are responsible for in-house training, differences in their proficiency and manner of expression cause differences in training content and density, as well as in the trainees' level of understanding. Such disparities will be eliminated with AI, making training fairer. Emotion AI could read and respond to the trainees’ feelings. This progress is highly feasible because the questions to AI are specialized for in-house training and the answers mostly fall within a certain range.

The key to the implementation of emotion AI lies in how well users can understand and accept it. In the past, only those with a basic knowledge of AI technology could use AI, but from now on, people without such knowledge will use AI centered on generative AI. If people use AI without being able to explain how it works, AI will become a black box, like an old magic trick. To widely integrate AI into society, we must make generative AI a white box and its decision-making processes accountable. Achieving both accuracy and transparency will surely lead to realizing trustworthy generative AI.

* The information contained herein is current as of July 2025.
* The contents of articles on Meiji.net are based on the personal ideas and opinions of the author and do not indicate the official opinion of Meiji University.
* I work to achieve SDGs related to the educational and research themes that I am currently engaged in.

Information noted in the articles and videos, such as positions and affiliations, is current as of the time of production.