Quebec’s National Library Bets on AI Databank to Put the Province on the Cultural Map

Arshad Khan

In a bid to make artificial intelligence systems smarter about Quebec (its people, its culture, its Indigenous languages), the province’s national library is moving forward with an ambitious plan that sits at the crossroads of technology, heritage, and a deeply complicated question: what happens when a community’s stories start feeding a machine?

Bibliothèque et Archives nationales du Québec, better known as BAnQ, has launched the experimental phase of a proposed government and cultural databank after completing a feasibility study earlier this year. The goal is straightforward, even if the road ahead is not: build a curated reservoir of French and Indigenous-language content that AI developers can draw on to make their systems more fluent in all things Quebec.

The frustration driving the project is one that many Quebecers who have turned to AI chatbots for local information have felt firsthand. Ask a major generative AI system about Quebec’s economy, its cultural landscape, or its Indigenous communities, and the answers can feel vague, generic, or simply wrong. The reason, researchers say, is that Quebec-specific material makes up a tiny sliver of the datasets used to train these systems.

“We run the risk of reproducing linguistic biases and cultural biases,” said Destiny Tchéhouali, co-holder of a Quebec-based research chair focused on French-language artificial intelligence and digital technologies. “And when we also talk about Indigenous peoples, we run an even greater risk of all these biases.”

Tchéhouali, a professor in the communications department at Université du Québec à Montréal, described the proposed database as potentially “strategic infrastructure,” a framework that could help define how local content gets identified, catalogued, and tracked within AI systems globally.

The initiative traces its roots to a 2024 report from Quebec’s innovation council, which pointed a finger squarely at the “very small quantity of data on Quebec” present in AI training datasets. BAnQ took that recommendation seriously, commissioning a feasibility study that has now cleared the way for a 12-month experimental phase funded with $750,000 from the provincial government, on top of the $340,000 already received for the feasibility work.

Valérie D’Amour, who led the feasibility study, was measured but optimistic about where things stand. “All scenarios are a little bit on the table right now,” she said. “We have a lot of ideas and we want to validate the possibilities with cultural stakeholders, as well as with data owners and providers.”

BAnQ president and CEO Marie Grégoire framed the project in terms of identity as much as technology. “That means having Quebec references, whether in small models or large models, whether they come from research or from the business community,” she said. Quebec, she added, cannot afford to remain invisible in a technology that is reshaping how the world accesses and processes information.

The plan is to start modestly, drawing first from BAnQ’s own collections before considering whether to bring in data from external sources. Access to the platform would be tightly controlled, and BAnQ has been explicit that it will not function as a public distribution channel for creative works.

If the technical ambition is relatively clear, the ethical and legal terrain is considerably murkier. Copyright has emerged as one of the most contested issues as the project takes shape.

The cultural sector is not wrong to be cautious. Across the world, writers, musicians, visual artists, and publishers have raised alarms about the way their work has been swept up, often without consent or compensation, into the vast training datasets that power today’s AI models. Quebec’s artists are watching BAnQ’s initiative through that same anxious lens.

Grégoire, however, believes the database could actually offer creators more protection than the current free-for-all. “Right now, it’s a bit like the Wild West,” she said. “Data is being harvested for free, and that should not be the case.” A centralized, controlled platform, she argues, could serve as a gateway, one that makes it easier to track whose work is being used and to ensure those creators are compensated accordingly.

The logic has appeal, but not everyone is sold. Maxime Harvey, a postdoctoral researcher at the National Institute of Scientific Research and a member of the same research chair as Tchéhouali, put the dilemma bluntly: “The main criticism we hear in the field is that, even if artists earn income from it, they are still feeding the beast that will eventually be used to replace contracts they may lose because of AI.”

It is a tension that no database architecture can fully resolve, and one that BAnQ will have to navigate carefully as it brings cultural stakeholders into the conversation.

Quebec is not alone in recognizing that smaller language communities risk being sidelined in the AI era. In Scandinavia, similar efforts have taken shape, with large collections of Nordic-language texts assembled specifically to train generative AI models capable of handling Swedish, Norwegian, and Danish with far greater sophistication than general-purpose systems allow.

The parallel is instructive. What those projects have demonstrated is that building language-specific AI infrastructure is both feasible and consequential, and that governments and cultural institutions, not just tech companies, have a role to play in shaping what AI knows and how it speaks.

The feasibility study envisions the platform becoming operational by 2029, though D’Amour acknowledged the timeline would be revisited once the experimental phase wraps up. Total estimated costs through 2030 sit at nearly $10.5 million, covering both operating and capital expenditures.

The months ahead will be telling. BAnQ must do the painstaking work of consulting with the communities whose content would populate the database, including Indigenous language speakers whose oral traditions and written records carry particular sensitivities around sovereignty and cultural stewardship.

What is certain is that the stakes go beyond a software project. In Quebec, language has always been politics. And as artificial intelligence becomes an increasingly dominant medium through which people encounter information about the world, who controls the data that trains those systems, and whose stories end up inside them, is a question with consequences that will outlast any five-year budget cycle.
