
How a Brazilian IT company is leading the digitization of historical collections for universities

  • Writer: Marcelo Firpo
  • Feb 26
  • 5 min read

  • Innovation in the digitization of historical collections
  • Automation in research and cataloging
  • Scalability through artificial intelligence

AI-generated image

AI-generated summary:

A Brazilian company specializing in digital automation and document management has implemented, with the support of BlueMetrics, an innovative AI solution to modernize access to historical collections at higher education institutions. The project automates the extraction, organization and search of historical documents, using advanced semantic search and image processing techniques to structure information in a contextualized way.


 

Overview

The client is a technology company with more than 20 years in the market, offering innovative digital automation solutions for processes and documents. It is a recognized name in digital automation and document management in Brazil and stands out for supporting large higher education institutions in their digital transformation.

The Smart Search in Newspaper Archives project was created to meet the growing demand for digitization and efficient access to historical information. This solution directly addresses challenges faced by libraries, public archives, universities and media organizations.


Market context:


  • Increased demand for digitization and organization of historical collections.

  • Need to preserve unique and valuable documents.

  • Demand for faster, more precise documentary research.


 

Problem: How to improve the research experience in historical collections?

Researching historical archives presents complex challenges that directly affect operational efficiency, information quality and the growth potential of organizations. The main obstacles include the deterioration of physical documents, obsolete technological systems and the difficulty of providing accurate, contextualized results. These obstacles lead to slow processes, high costs and a less-than-ideal user experience, and they also limit the scalability and competitiveness of the services offered.


According to Gabriel Casara, CGO at BlueMetrics, “This is yet another practical example where AI can really make a difference in everyday life, streamlining processes and freeing up teams for more strategic, less manual work.”



 

Main challenges:


  1. Operational:

    • Deteriorated or low-quality digital documents.

    • Search limited to exact keywords, without contextualization.

    • Difficulty in relating information between different editions.

    • Time-consuming manual searches.

    • Low capacity to meet multiple simultaneous demands.

    • Loss of historical context.

    • Difficulty in validating sources and references.


  2. Technological:

    • Lack of structured data extraction.

    • Search systems with low precision and relevance.


  3. Business:

    • Rework for data validation.

    • Limited ability to expand the services offered.

    • High cost of specialized labor.


 

The solution: automation and scalability using AI

AI-generated image

To address this need, BlueMetrics implemented a robust solution that combines cutting-edge technologies to modernize access to historical collections.

According to Diórgenes Eugênio, Head of Gen AI at BlueMetrics, “This was undoubtedly one of the most challenging projects of the year. Beyond the complexity of dealing with the deterioration of the original material, we had to organize the texts while preserving the relationship between each article's title and its text. This was the biggest challenge, because the extraction is unstructured: each word is extracted with no relation to the others. To overcome it, we devised several strategies, such as using the coordinates of the extracted words to assemble the text in a logical sequence. We also used the sizes of the identified text boxes to help separate article text from title text. This last approach significantly improved how the large language models correlated titles with their articles. Those were the main challenges of the first component of the project, information extraction. Once past that stage, we faced the challenges of the second component, search. There, the biggest obstacle was ensuring that all the returned articles were actually semantically relevant. To do this, we drew on approaches from the literature, mainly applying confidence scores to the returned vectors.”
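The two extraction strategies in the quote (using word coordinates to rebuild a reading order, and using text-box sizes to tell titles from article text) can be sketched in Python against the word-level output of Amazon Textract, which reports each WORD block with a normalized BoundingBox (Left, Top, Width, Height). This is a minimal sketch under stated assumptions, not BlueMetrics' implementation: the function names, the line-grouping tolerance and the 1.5x height ratio used to flag titles are all hypothetical, and the sketch assumes a single-column page.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Word:
    text: str
    left: float    # normalized 0-1, as in Textract's BoundingBox
    top: float
    width: float
    height: float

    @property
    def center_y(self) -> float:
        return self.top + self.height / 2


def words_from_textract(response: dict) -> list[Word]:
    """Keep only WORD blocks (Textract also returns PAGE and LINE blocks)."""
    words = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") == "WORD":
            box = block["Geometry"]["BoundingBox"]
            words.append(Word(block["Text"], box["Left"], box["Top"], box["Width"], box["Height"]))
    return words


def group_into_lines(words: list[Word], tolerance: float = 0.5) -> list[list[Word]]:
    """Chain words top to bottom into lines, then order each line left to right.

    Hypothetical heuristic: two consecutive words share a line when their
    vertical centers differ by at most `tolerance` x the median word height.
    Assumes a single column of text.
    """
    if not words:
        return []
    max_gap = tolerance * median(w.height for w in words)
    lines: list[list[Word]] = []
    for word in sorted(words, key=lambda w: w.center_y):
        if lines and word.center_y - lines[-1][-1].center_y <= max_gap:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [sorted(line, key=lambda w: w.left) for line in lines]


def extract_articles(response: dict, title_ratio: float = 1.5) -> list[dict]:
    """Label lines as title or body by box height; attach body text to the preceding title."""
    lines = group_into_lines(words_from_textract(response))
    if not lines:
        return []
    heights = [max(w.height for w in line) for line in lines]
    body_height = median(heights)  # most lines on a newspaper page are body text
    articles: list[dict] = []
    for line, height in zip(lines, heights):
        text = " ".join(w.text for w in line)
        if height >= title_ratio * body_height:
            if articles and articles[-1]["title"] and not articles[-1]["body"]:
                articles[-1]["title"] += " " + text  # continuation of a multi-line headline
            else:
                articles.append({"title": text, "body": []})
        elif articles:
            articles[-1]["body"].append(text)
        else:
            articles.append({"title": None, "body": [text]})  # text before any title
    return [{"title": a["title"], "body": " ".join(a["body"])} for a in articles]
```

Real newspaper pages usually have several columns, so a fuller version would first split words into columns (for example by clustering their Left values) and then apply the same ordering within each column. The output of a heuristic like this is only a first pass; per the quote, large language models were then used to correlate titles with their articles.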


Main features


  • Contextual preservation: maintenance of the historical and documentary context.

  • Advanced semantic search: more accurate and relevant results.

  • Process automation: reducing search time and increasing efficiency.

  • Digital scalability: infrastructure prepared for large volumes of data.


Technological components:


  1. Intelligent extraction system:

    • Image processing and automatic text organization.

    • Structuring data with hierarchical relationships between titles and contents.


  2. Semantic search engine:

    • Contextual search with high accuracy.

    • Correlation of terms and identification of relevant sources.

    • Filtering by minimum relevance.


  3. Technological innovations:

    • Use of bounding boxes for spatial organization.

    • Vector database with embeddings and metadata.

    • Large-scale processing.
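The semantic search component, with its minimum-relevance filter and records that pair embeddings with metadata, can be illustrated with a small in-memory version. This is a hedged sketch: the source does not say which vector database, embedding model, similarity metric or threshold was used, so cosine similarity, the 0.75 cut-off, the `Record` fields and the exact-match metadata filter are all assumptions. In practice the embeddings would come from a model (for example one served through Amazon Bedrock, which the project lists), and a vector database would do the ranking at scale.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    article_id: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)  # hypothetical: newspaper, edition date, page...


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def search(query: list[float], records: list[Record], top_k: int = 5,
           min_score: float = 0.75, filters: Optional[dict] = None) -> list[tuple[str, float]]:
    """Rank records by cosine similarity, dropping anything below min_score.

    `filters` is an exact-match check on metadata, e.g. {"year": 1932}.
    """
    filters = filters or {}
    hits = []
    for record in records:
        if any(record.metadata.get(k) != v for k, v in filters.items()):
            continue
        score = cosine_similarity(query, record.embedding)
        if score >= min_score:  # minimum-relevance cut-off
            hits.append((score, record.article_id))
    hits.sort(reverse=True)  # highest score first
    return [(article_id, score) for score, article_id in hits[:top_k]]
```

The threshold is what keeps weakly related articles out of the results even when fewer than `top_k` documents are genuinely relevant; a plain top-k search would always return `top_k` items, relevant or not.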



How about developing a solution like this for your company?


 

Results:


The solution brought significant advances in operational efficiency, information quality and commercial strategy. Using cutting-edge technology, it was possible to optimize research processes, preserve the historical context of the data and increase the scalability of operations. These results turned the challenges faced into competitive advantages, consolidating the modernization and strategic value of access to historical collections.


According to Gabriel Casara, “This is a solution that has enormous potential for solving similar problems in other types of companies and business segments, and can be easily adapted within the context of our proprietary work method, blue4AI.”


Operational benefits:

  • Reduction of up to 80% in documentary research time.

  • Automation of structured data extraction.

  • Significant increase in simultaneous service capacity.


Technological benefits:

  • Modernization of access to collections.

  • Scalable infrastructure for large volumes of data.

  • Preservation of historical documents in standardized formats.


Strategic benefits:

  • Competitive differentiation in the market.

  • Potential for new business models and partnerships.

  • Improved end-user experience.


 

Technologies used

The solution was designed using several AWS technologies, including:


AWS Services

  • Textract

  • Lambda

  • Bedrock

  • S3

  • DynamoDB


Languages, Libraries and Frameworks

  • Python

  • Pillow

  • Fitz (PyMuPDF)

  • FPDF



 

Conclusion:


Thanks to the solution developed, the client was able to overcome significant challenges in digitizing and searching collections, and further consolidated its position as a leader in the digital transformation segment by providing higher education customers with a faster, more accurate and scalable research experience. By combining automation, artificial intelligence and contextual preservation, the company transformed the way historical information is accessed and used.


This model not only benefits educational institutions, public archives and libraries, but also opens up the possibility for other public and private organizations, from the most diverse segments, to take advantage of the potential of AI to improve their own processes and services.


How about creating a case like this for your company? Let's schedule a call.

Discover some Use Cases.


 

About BlueMetrics
BlueMetrics was founded in 2016 and has already delivered more than 160 successful solutions in the areas of Data & Analytics, GenAI and Machine Learning for more than 70 companies in the United States, Brazil, Argentina, Colombia and Mexico. It has its own methodology and a multidisciplinary team focused on delivering solutions to real challenges in the business world.

