Aim/Background: Product attribute value extraction (PAVE) systems have emerged as a powerful tool for automating the extraction and organization of product attributes from diverse data sources. Large Language Models (LLMs) have repeatedly demonstrated significant potential for extracting relevant information, owing to their strong reasoning abilities. In this article, we propose the use of a rulebook that assists LLMs in extracting the correct information while maintaining compliance with predefined guidelines. We call this technique rulebook-based prompting, and it significantly outperforms zero-shot prompting. This approach has several advantages. First, LLMs do not need to be fine-tuned for every new product type; the rulebook can simply be updated with new information and guidelines. It also reduces manual effort, since around 60 attributes must be verified against complex rules.
Methods: The process involves converting the rulebook into a vectorized representation using an embedding model, enabling efficient semantic search. When an input containing product images and descriptions is entered, the LLM first identifies the product type. The list of attributes for that product type is then retrieved from the vectorized rulebook. A prompt is generated from the list of attributes and other instructions and passed to the LLM, which finally extracts all the required attribute information in a specified format.
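The retrieval step above can be sketched as follows. This is a minimal illustration only: the rulebook entries, product types, attribute names, and the toy bag-of-words "embedding" are all hypothetical placeholders, not the authors' actual data, embedding model, or vector database.

```python
import numpy as np

# Illustrative rulebook: product type -> attributes to extract (placeholder data).
RULEBOOK = {
    "athletic shoes": ["brand", "size", "color", "material", "sole type"],
    "laptop computer": ["brand", "screen size", "ram", "storage", "cpu"],
    "wrist watch": ["brand", "strap material", "dial color", "water resistance"],
}

# Tiny vocabulary built from the rulebook keys; unknown query words are ignored.
VOCAB = sorted({w for key in RULEBOOK for w in key.split()})
IDX = {w: i for i, w in enumerate(VOCAB)}

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    v = np.zeros(len(VOCAB))
    for w in text.lower().split():
        if w in IDX:
            v[IDX[w]] += 1.0
    return v

# Vectorize the rulebook once, one row per product type (a stand-in for a
# vector database index).
KEYS = list(RULEBOOK)
MATRIX = np.stack([embed(k) for k in KEYS])

def attributes_for(product_type: str) -> list[str]:
    """Retrieve the attribute list for the closest product type by cosine similarity."""
    q = embed(product_type)
    sims = MATRIX @ q / (np.linalg.norm(MATRIX, axis=1) * (np.linalg.norm(q) + 1e-9))
    return RULEBOOK[KEYS[int(np.argmax(sims))]]

# Assemble a dynamic prompt from the retrieved attributes.
prompt_attrs = attributes_for("running shoes")
prompt = "Extract these attributes as JSON: " + ", ".join(prompt_attrs)
```

The key design point is that only the retrieved, product-specific attribute list enters the prompt, rather than the full 60-attribute rulebook, which keeps the prompt compact and lets the rulebook evolve without retraining.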
Results: Our experiments demonstrated equivalent performance between Azure OpenAI's GPT-4o and Gemini 1.5 Flash owing to their multimodal ability, with both outperforming Azure OpenAI's GPT-3.5 and regex pattern matching. We also show that the rulebook-based prompt design improves model performance, with each LLM achieving an F1-score roughly 10% higher than under zero-shot prompting. Additionally, we report how the LLMs perform under both prompt designs across various conditions.
Conclusion: Using a rulebook to help LLMs build dynamic prompts ensures that all relevant attributes for a specific product are consistently identified and documented, thereby improving the overall quality and reliability of the extraction system. We also recommend Gemini 1.5 Flash for high-traffic commercial applications where cost is a key factor.
Key words: Generative AI, Large Language Model (LLM), Vector Database, Product attribute extraction, E-commerce