This article describes an approach to automating content tagging in a content management system (CMS) using Large Language Models (LLMs). It is based on work performed by More Awesome for Innova Market Insights.
Why bother with content tagging?
Content tagging facilitates efficient content organization, enhances searchability, and optimizes user experience. By categorizing content with relevant tags, content creators and platforms can quickly identify and deliver pertinent information to users, streamline content recommendations, and bolster SEO efforts, ultimately driving engagement and improving overall content accessibility and discoverability.
What is a taxonomy?
A taxonomy, in simple terms, is a system or method for organizing and categorizing things based on their similarities. It’s like sorting your music into different playlists based on genres or arranging books on a shelf by topics; it helps you find and understand them more easily.
In taxonomy, “terms” refer to the specific labels or names used within a particular category or level of organization. Essentially, they are the individual items or units within a taxonomy system. For instance, if you have a taxonomy for colors, the “terms” could be “red,” “blue,” “yellow,” and so forth.
How does tagging normally work?
The typical manual process of tagging in a content management system (CMS) usually involves the following steps:
Content Creation
The user writes or uploads the content into the CMS. This could be an article, blog post, product description, or any other type of content.
Perform Tagging
Once the content is ready, the user navigates to the section or field in the CMS interface where tags can be added. This is often labeled as “Tags,” “Keywords,” “Categories,” or something similar. In most cases, the user selects one or more previously used tags; if no existing tag is suitable, the user can create a new one.
The user must consider which tags are most relevant to the content. This may involve thinking about the content’s main themes, topics, or keywords. Some CMSs might suggest tags based on the content’s text, which the user can accept or modify.
Often, more than one tag can be relevant for a piece of content. In such cases, the user can add multiple tags to better categorize and make the content discoverable through various search terms.
Reviewing and Editing
Before publishing, the user or reviewer reviews the selected tags to ensure accuracy and relevance. Tags can be added, edited, or removed at this stage.
Publishing
Once satisfied with the tagging, the user publishes the content. Now, the content becomes searchable and categorizable within the CMS based on the assigned tags.
Maintenance and Revision
Over time, as more content gets added and the website evolves, tags might need revisiting. This can mean merging similar tags, deleting obsolete ones, or adding new tags to reflect emerging trends and topics.
This tagging process requires a clear understanding of the content and its potential relevance to the audience. Proper tagging enhances content discoverability and user experience, and plays a role in search engine optimization.
What are the benefits of automating tagging?
Automating tagging using a Large Language Model (LLM) like ChatGPT offers several benefits over manual tagging:
Speed. An LLM can process and tag vast amounts of content in a fraction of the time it would take a human, making it particularly useful for large content repositories or frequent content updates.
Consistency. Automated tagging ensures uniformity in tag selection and application. There’s no variation due to different human interpretations or biases.
Scalability. As the volume of content grows, the manual process can become unwieldy. LLMs can handle vast content scales without significant incremental effort or time.
Cost Efficiency. While there might be an initial investment in setting up the automated system, over time, using an LLM can be more cost-effective than hiring and training teams for manual tagging.
Comprehensive Analysis. LLMs can evaluate the entirety of a document, taking into account nuances and context that might be overlooked or deemed too time-consuming for manual taggers.
Reduced Human Error. Manual tagging can sometimes lead to missed tags or misinterpretations. LLMs, once correctly configured, can offer a consistent level of accuracy.
Integration Potential. LLMs can be integrated into various stages of the content lifecycle, from creation to distribution, enabling automated tagging during content ingestion, updates, or redistribution.
It’s important to note that while there are many benefits, automated tagging is not without challenges. It requires careful implementation, regular monitoring, and occasional human intervention to ensure the quality and relevance of tags.
Therefore, in this particular case, we adopted a hybrid approach combining both manual expertise and automated LLM insights.
The challenge
Our client had a vast, diverse library of content – from insight trends and detailed reports to upcoming webinars. They wanted a user-friendly way for visitors to navigate this sea of unstructured content.
We introduced a navigational tool that uses filters based on three categories: Topics (e.g., ‘sustainability’, ‘plant-based’), F&B Categories (like ‘Dairy’ or ‘Non-Alcoholic Beverages’), and Perspectives (such as ‘Consumer’ or ‘Packaging’). This allows users to refine their search. For instance, they could choose a combination like: topic: sustainability + fb-category: dairy + perspective: consumer.
The volume of content was not just vast but also growing. Manual tagging of each piece would not only be time-consuming but would also increase content production costs. Plus, with multiple people involved, there was a risk of inconsistent tagging, which could compromise the new navigation system’s efficiency.
A perfect situation for applying LLM technology!
Integrating an LLM for automating tagging in a CMS
We followed this process for implementing an LLM for tagging purposes:
- Evaluate accuracy in taxonomy tagging
- Design the CMS extension
- Develop and deploy technical integration
- Evaluate and optimize
1. Evaluate accuracy in taxonomy tagging
Before diving in, it’s vital to gauge the automated tagging’s precision by contrasting it with expert-led tagging. We took a sample of 10 content pieces, prompted ChatGPT for tags, and measured the results against tags assigned by experts. An initial prompt, for instance, asked ChatGPT to identify key topics from a predefined list and respond in JSON format, with the article content attached. The goal? To grasp the model’s accuracy and fine-tune the prompt for optimal results. An added layer of understanding came from having ChatGPT elucidate its tagging choices.
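The evaluation step can be sketched as follows. The taxonomy terms, the prompt wording, and the overlap metric here are illustrative assumptions for the sketch, not the client’s actual configuration:

```python
import json

# Hypothetical taxonomy terms (illustrative only, not the client's list)
TOPICS = ["sustainability", "plant-based", "clean label", "snacking"]

def build_prompt(article_text: str) -> str:
    """Ask the model to pick topics from a predefined list and answer in JSON,
    mirroring the kind of prompt used in the evaluation."""
    return (
        "Identify the key topics of the article below. "
        f"Choose only from this list: {', '.join(TOPICS)}. "
        'Respond as JSON, e.g. {"topics": ["sustainability"]}.\n\n'
        f"Article:\n{article_text}"
    )

def tag_overlap(llm_tags, expert_tags):
    """Jaccard overlap between model tags and expert tags, one simple way
    to score agreement over a sample of content pieces."""
    a, b = set(llm_tags), set(expert_tags)
    return len(a & b) / len(a | b) if a | b else 1.0

# Example: score one piece of content against the expert's tags
llm = json.loads('{"topics": ["sustainability", "snacking"]}')["topics"]
expert = ["sustainability", "plant-based"]
print(round(tag_overlap(llm, expert), 2))  # 0.33
```

Averaging such a score over the 10-piece sample gives a single number to track while iterating on the prompt.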
2. Design the CMS extension
There are many ways to extend a CMS with automated tagging functionality. We decided on a hybrid approach, combining both manual expertise and automated LLM insights:
- All content of specific content types is auto-tagged upon creation or modification.
- Content tagging can be manually reviewed and changed if required.
- Content can be manually marked for auto-tagging. This allows content managers to ‘refresh’ manually tagged content and re-run auto-tagging when needed. For example, when prompts are optimized for accuracy, content can be marked for retagging.
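The hybrid decision logic behind these rules can be sketched as follows. This is a language-neutral sketch of the flow (the real implementation lived in WordPress); the field names and the `auto_tagger` callback are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ContentItem:
    """Minimal stand-in for a CMS content item (illustrative fields)."""
    title: str
    tags: list = field(default_factory=list)
    manually_reviewed: bool = False   # an editor has confirmed the tags
    marked_for_retag: bool = False    # editor requested a fresh auto-tagging pass

def should_auto_tag(item: ContentItem, is_new_or_modified: bool) -> bool:
    """Auto-tag new or modified content, plus anything explicitly marked for
    retagging (e.g. after a prompt was optimized); otherwise leave manually
    reviewed content alone."""
    if item.marked_for_retag:
        return True
    return is_new_or_modified and not item.manually_reviewed

def on_save(item: ContentItem, is_new_or_modified: bool, auto_tagger) -> ContentItem:
    """Hook to run on every save: apply the LLM tagger only when the
    hybrid rules say so, then clear the retag flag."""
    if should_auto_tag(item, is_new_or_modified):
        item.tags = auto_tagger(item)
        item.marked_for_retag = False
    return item
```

In WordPress terms, `on_save` would hang off a save hook; the two boolean fields correspond to metadata stored alongside the post.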
Given that our CMS was WordPress, we utilized its flexible extension points for this hybrid approach. From a management perspective, our solution was designed to adapt to taxonomy term changes on its own. While taxonomies and their corresponding prompts remained non-configurable for end-users, we ensured an easy-to-use plugin interface for API configurations.
3. Develop and deploy technical integration
Integrating ChatGPT’s API was straightforward and developer-friendly. Leveraging ChatGPT for parts of this integration also proved fruitful: it performed really well at providing boilerplate code. Stack Overflow may need to rethink its business model.
Interestingly, while the term “chat” suggests a stateful back-and-forth, the API itself is stateless. In ChatGPT’s own interface, the conversational flow is maintained by the front end, and the full conversational context is sent along with every request.
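This statelessness can be made concrete with a small sketch. The payload shape follows the OpenAI chat completions format; the model name and message contents are illustrative assumptions:

```python
def build_chat_request(history, user_message, model="gpt-4"):
    """Because the chat API keeps no session state between calls, every
    request must re-send the full conversation so far plus the new turn."""
    messages = list(history) + [{"role": "user", "content": user_message}]
    return {"model": model, "messages": messages}

history = [
    {"role": "system", "content": "You tag articles with taxonomy terms."},
    {"role": "user", "content": "Tag article 1: ..."},
    {"role": "assistant", "content": '{"topics": ["sustainability"]}'},
]

# The follow-up request carries all three prior turns plus the new one:
payload = build_chat_request(history, "Now tag article 2: ...")
print(len(payload["messages"]))  # 4
```

For one-shot tagging this is actually convenient: each request is a single, self-contained system-plus-user exchange with no conversation to manage.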
4. Evaluate and optimize
The last step centers on reviewing tagging accuracy. For this client, feedback from the content management team, based on their tagging experience, served as an invaluable input. We’d look into patterns: Are certain tags regularly adjusted? Could prompt modifications rectify any consistent errors? At the time of writing, this phase is still pending, but updates are on the horizon.
Another worthy exploration involves studying tag frequencies. This could offer insights for the content team about potential content adjustments or for refining tagging to ensure a balanced user filtering experience.
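A tag-frequency study of this kind needs little more than a counter over the tagged library. A minimal sketch, with hypothetical sample data:

```python
from collections import Counter

def tag_frequencies(tagged_items):
    """Count how often each tag appears across the library; heavily skewed
    counts can signal content gaps or over-broad tags that make filtering
    less useful."""
    counts = Counter(tag for item in tagged_items for tag in item)
    return counts.most_common()

# Hypothetical library: each item is its list of assigned tags
library = [
    ["sustainability", "dairy"],
    ["sustainability", "plant-based"],
    ["dairy"],
]
print(tag_frequencies(library))
# [('sustainability', 2), ('dairy', 2), ('plant-based', 1)]
```

Tags that dominate the distribution, or that never appear, are candidates for splitting, merging, or prompting changes.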
Key take-aways
The key take-aways are:
- Manual content tagging, essential for user navigation and SEO, can be tedious and prone to inconsistencies.
- More Awesome integrated ChatGPT, a Large Language Model, into a CMS for automated content tagging.
- More Awesome implemented a hybrid approach, combining automated tagging with manual oversight, ensuring both speed and accuracy.
- Before full implementation, ChatGPT’s automated tags were compared against expert-generated tags to optimize accuracy.
- Automated tagging using ChatGPT offers speed, consistency, scalability, and cost-efficiency, transforming content management for large platforms.