China-ASEAN Language Intelligence Institute independently builds a multilingual corpus

2025-09-19 08:47:50 science and technology

In recent years, with the rapid development of artificial intelligence and natural language processing, building multilingual corpora has become an important foundation for cross-language communication and technological innovation. The China-ASEAN Language Intelligence Institute (hereinafter the "Institute") recently announced that it has built a multilingual corpus covering the main languages of the ten ASEAN countries, with the aim of promoting language interoperability, cultural exchange, and cooperation on intelligent technology between China and ASEAN countries.

The corpus fills a gap in China's multilingual language resources and provides high-quality data for artificial intelligence applications such as machine translation, speech recognition, and text analysis. The table below summarizes its main features:

Language	Corpus size (100 million words)	Domains covered	Data sources
Chinese	50	News, law, science, literature	Public publications, government documents
Thai	12	Social media, news, travel	Web crawling, partner institutions
Vietnamese	10	Economics, culture, education	Academic papers, news media
Malay	8	Business, law, daily conversation	Corporate partnerships, translation agencies
Indonesian	8	News, social media, film and television	Public datasets, web crawling
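
To put the table's figures in perspective, the short Python sketch below adds up the sizes listed above and computes each language's share. It uses only the numbers in the table; the function and variable names are illustrative and do not come from the Institute.

```python
# Corpus sizes copied from the table above, in units of 100 million words.
CORPUS_SIZES = {
    "Chinese": 50,
    "Thai": 12,
    "Vietnamese": 10,
    "Malay": 8,
    "Indonesian": 8,
}

UNIT = 100_000_000  # one table unit equals 100 million words


def total_words(sizes):
    """Return the combined corpus size in words."""
    return sum(sizes.values()) * UNIT


def shares(sizes):
    """Return each language's share of the total, as a percentage rounded to one decimal."""
    total = sum(sizes.values())
    return {lang: round(100 * n / total, 1) for lang, n in sizes.items()}


if __name__ == "__main__":
    print(f"Total: {total_words(CORPUS_SIZES):,} words")  # Total: 8,800,000,000 words
    for lang, pct in shares(CORPUS_SIZES).items():
        print(f"{lang}: {pct}%")
```

By these figures, the five listed languages total about 8.8 billion words, of which Chinese accounts for roughly 57 percent.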

Corpus application scenarios

The corpus provides basic support for applications in several fields, mainly:

1. Machine translation: Using the high-quality multilingual parallel corpus, the Institute has trained translation models that support language pairs such as Chinese-English, Chinese-Thai, and Chinese-Vietnamese, with significantly improved translation accuracy.

2. Speech recognition: The speech data in the corpus provides training material for speech recognition systems in ASEAN countries, supporting the development of applications such as intelligent voice assistants and customer service systems.

3. Cross-language information retrieval: Users can search for related content in ASEAN languages using Chinese keywords, which makes academic research and commercial information gathering much easier.

4. Cultural communication and research: The literary, film, and television content in the corpus gives cultural scholars rich material for analysis and promotes cultural exchange between China and ASEAN countries.
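
The cross-language retrieval scenario in item 3 can be illustrated with a minimal Python sketch. It assumes a simple dictionary-based query translation: a Chinese keyword is mapped to its equivalent in each target language, and documents in that language are then matched on the translated term. The article does not describe how the Institute's system works, so this is not its method; the lexicon, the sample documents, and the `search` function are hypothetical and for illustration only.

```python
import unicodedata

# Hypothetical toy lexicon: Chinese keyword -> translation per language code.
# These entries are illustrative and are not taken from the Institute's corpus.
LEXICON = {
    "旅游": {"vi": "du lịch", "id": "pariwisata", "ms": "pelancongan"},
    "经济": {"vi": "kinh tế", "id": "ekonomi", "ms": "ekonomi"},
}

# Hypothetical toy documents, each tagged with its language code.
DOCUMENTS = [
    {"id": 1, "lang": "vi", "text": "Du lịch Việt Nam phát triển nhanh."},
    {"id": 2, "lang": "id", "text": "Pariwisata Indonesia terus tumbuh."},
    {"id": 3, "lang": "ms", "text": "Ekonomi Malaysia semakin kukuh."},
]


def normalize(text):
    """Normalize to NFC and case-fold so accented text compares reliably."""
    return unicodedata.normalize("NFC", text).casefold()


def search(keyword, documents=DOCUMENTS, lexicon=LEXICON):
    """Return ids of documents containing the keyword's translation in their own language."""
    translations = lexicon.get(keyword, {})
    hits = []
    for doc in documents:
        term = translations.get(doc["lang"])
        if term and normalize(term) in normalize(doc["text"]):
            hits.append(doc["id"])
    return hits


if __name__ == "__main__":
    print(search("旅游"))  # [1, 2]
    print(search("经济"))  # [3]
```

A production system would more likely rely on translation models or multilingual embeddings than on a fixed lexicon; the sketch only shows the basic idea of matching a Chinese query against content in other languages.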

Future plans

The Institute said that it will further expand the corpus's scale and language coverage, and plans to add less widely used ASEAN languages such as Burmese and Khmer (Cambodian). It will also cooperate with academic institutions and enterprises in ASEAN countries to promote open sharing of the corpus and contribute to global language intelligence research.

The corpus is an important achievement for the Institute and provides strong support for language interoperability and technical cooperation under the "Belt and Road" initiative. As artificial intelligence technology continues to advance, multilingual corpora will find ever broader applications.
