Case Studies
Building AI Training Data for Personalised Tag Recommendation in Financial & Education News Search
Project Overview
A leading AI-powered knowledge management startup in Singapore and Korea set out to enhance the way machines interpret text meaning. Its proprietary tagging algorithms automatically extract and standardise keywords, topics, entities, and sentiments from large-scale news and document data, significantly improving the accuracy of search and recommendation systems. By integrating vector-database infrastructure with Retrieval-Augmented Generation (RAG), the company delivers hyper-personalised knowledge-search experiences that help users find context-relevant information quickly and precisely.
Data Voucher Project Background
With the rapid growth of Large Language Models (LLMs), the volume of unstructured text in finance and education has surged. Traditional keyword-based searches often fail to capture semantic meaning, leaving valuable content underutilised. To address this, the company launched a Data Voucher (DV) project to build a high-quality sentence-to-tag dataset based on Singapore’s financial and educational news. The goal was to enable AI systems to understand sentence-level meaning and automatically generate accurate tags for topics, entities, and sentiments, strengthening contextual search and recommendation performance.
What We Did:
Under the DV program, Tictag executed a full data cycle: collection > annotation > refinement > quality validation, ensuring compliance with Korea’s DV standards.
- Data Collection
- Domain: Financial and educational news articles
- Language: English (Singapore-focused, including non-native sources)
- Scale: 3,000 paragraphs (≥300 characters each)
- Data Annotation & Processing
- Each sentence was analysed, and corresponding tags (e.g., Singapore, Central Bank) were matched to capture its key concept.
- Tictag generated four tag categories to ensure complete semantic coverage:
| Tag Type | Description |
| --- | --- |
| Core Tag | Represents the overall main topic of the paragraph, derived from contextual understanding |
| Keyword Tag | Extracts explicit entities (e.g., people, institutions, brands) mentioned in the text |
| Location Tag | Identifies geographic references such as cities, countries, and regions |
| Sentiment Tag | Assigns context-based moods (positive/negative/neutral) and trend signals (e.g., growth, decline) |
- Data Refinement & Quality Assurance
- Reviewed for tag consistency, sentence balance, and duplication.
- Converted into standardised metadata optimised for AI model training.
- Final manual validation by domain experts in finance and education ensured accuracy and compliance with DV verification standards.
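To make the four tag categories and the refinement checks more concrete, here is a minimal Python sketch of what a single training record and a few automated checks might look like. It is illustrative only: the `TaggedParagraph` class, its field names, the helper functions, and the JSON Lines output are assumptions made for this example, not Tictag's actual schema or tooling. Only the 300-character minimum and the positive/negative/neutral sentiment values come from the project description above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

MIN_PARAGRAPH_CHARS = 300  # project scale: paragraphs of at least 300 characters
SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass
class TaggedParagraph:
    """One hypothetical record covering the four tag categories."""
    text: str
    core_tag: str                                            # overall main topic
    keyword_tags: List[str] = field(default_factory=list)    # explicit entities
    location_tags: List[str] = field(default_factory=list)   # geographic references
    sentiment: str = "neutral"                                # mood
    trend: Optional[str] = None                               # e.g. "growth", "decline"


def validate(record: TaggedParagraph) -> List[str]:
    """Return a list of problems found in a record (empty if none)."""
    problems = []
    if len(record.text) < MIN_PARAGRAPH_CHARS:
        problems.append("text shorter than 300 characters")
    if not record.core_tag.strip():
        problems.append("missing core tag")
    if record.sentiment not in SENTIMENTS:
        problems.append(f"unknown sentiment: {record.sentiment}")
    return problems


def find_duplicates(records: List[TaggedParagraph]) -> List[int]:
    """Return indices of records whose text repeats an earlier record.

    Texts are compared after lowercasing and collapsing whitespace,
    so this only catches near-verbatim duplicates.
    """
    seen = set()
    duplicates = []
    for index, record in enumerate(records):
        key = " ".join(record.text.lower().split())
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


def to_jsonl(records: List[TaggedParagraph]) -> str:
    """Serialise records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)
```

In a sketch like this, automated checks would only flag candidates for review. The final manual validation by domain experts described above would still decide whether the tags themselves are correct.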
Results & Impact
The project produced a machine learning–ready dataset that powers the startup’s personalised tag recommendation model, achieving:
- High-Quality Training Data: A curated sentence-to-tag dataset from Singapore’s news corpus.
- Structured Knowledge Graphs: Transformation of unstructured text into metadata-based knowledge networks.
- AI Model Optimisation: Improved semantic understanding and search efficiency in finance and education domains.
Why Tictag?
Tictag is a global AI and data solutions company building high-quality training datasets that help AI systems understand context and structure knowledge accurately. We go beyond annotation: we help clients develop AI models that learn from meaning, not just data, transforming raw information into actionable insights.
Through this project, we empowered our client to advance their AI tag recommendation technology and achieve a major milestone in semantic search innovation.
Need custom AI training data for your project? Contact us to get started.


