OCR to Text Summary System

Overview

This project explores the integration of EasyOCR and Pegasus models for converting text in images into concise summaries. It investigates two distinct approaches: a sequential approach and an enclosed model system. The objective is to evaluate how well each approach extracts and summarizes text from images.

Integrated vs. Sequential Processing

Two processing methodologies are examined:

  • Sequential Processing: A two-step process where text extraction and summarization are conducted sequentially.
  • Integrated Processing: A custom neural network that combines OCR and summarization tasks in a single workflow. This is the enclosed model referred to in the results below.
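The sequential approach can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (run_sequential, build_easyocr_extractor, build_pegasus_summarizer) are hypothetical, and the google/pegasus-xsum checkpoint is an assumption, since the write-up does not say which Pegasus checkpoint was used. The two stages are passed in as plain callables so the pipeline logic can be exercised without downloading any model.

```python
from typing import Callable, List, Sequence


def run_sequential(
    image_path: str,
    extract_text: Callable[[str], List[str]],
    summarize: Callable[[str], str],
) -> str:
    """Step 1: OCR the image. Step 2: summarize the joined text."""
    fragments = extract_text(image_path)
    # Join the OCR fragments into one passage, dropping empty pieces.
    text = " ".join(f.strip() for f in fragments if f.strip())
    if not text:
        return ""  # nothing was recognized, so there is nothing to summarize
    return summarize(text)


def build_easyocr_extractor(languages: Sequence[str] = ("en",)) -> Callable[[str], List[str]]:
    """Wrap EasyOCR as the extraction stage (imported lazily)."""
    import easyocr

    reader = easyocr.Reader(list(languages))

    def extract(image_path: str) -> List[str]:
        # detail=0 makes readtext return only the recognized strings.
        return reader.readtext(image_path, detail=0)

    return extract


def build_pegasus_summarizer(checkpoint: str = "google/pegasus-xsum") -> Callable[[str], str]:
    """Wrap a Pegasus checkpoint as the summarization stage.

    The default checkpoint name is an assumption for illustration.
    """
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    tokenizer = PegasusTokenizer.from_pretrained(checkpoint)
    model = PegasusForConditionalGeneration.from_pretrained(checkpoint)

    def summarize(text: str) -> str:
        batch = tokenizer(text, truncation=True, return_tensors="pt")
        summary_ids = model.generate(**batch)
        return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]

    return summarize
```

With both libraries installed, a call would look like run_sequential("page.png", build_easyocr_extractor(), build_pegasus_summarizer()), where "page.png" is a placeholder path. Keeping the stages separate also makes it easy to inspect the OCR output before it reaches the summarizer.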

Results and Discussion

Results

Both the training and validation loss decrease over the 25 epochs, which indicates that the model keeps learning as training proceeds. The training loss starts at 5.5284 in the first epoch and falls consistently to 0.8207 by the 25th epoch. The validation loss begins at 5.5352 and also decreases, reaching 2.3586 by the 25th epoch.

The validation loss is higher than the training loss, which is common, since a model is typically better at predicting data it has seen (training data) than new data (validation data). The gap is noticeable, however: it grows from almost nothing in the first epoch to about 1.54 by the last. A gap of this size can indicate overfitting, where the model performs well on the training data but less so on unseen data. Because the validation loss is still decreasing, the model nonetheless appears to be generalizing reasonably well.
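The size of the gap between the two losses can be checked with a few lines of arithmetic. The sketch below uses only the first-epoch and 25th-epoch values reported above; the intermediate epochs are not given in this write-up, and the helper name generalization_gap is a hypothetical choice for illustration.

```python
def generalization_gap(train_loss: float, val_loss: float) -> float:
    """Validation loss minus training loss for one epoch."""
    return val_loss - train_loss


# Reported values for epoch 1 and epoch 25 (the only epochs given).
first_gap = generalization_gap(train_loss=5.5284, val_loss=5.5352)
last_gap = generalization_gap(train_loss=0.8207, val_loss=2.3586)

print(f"gap at epoch 1:  {first_gap:.4f}")
print(f"gap at epoch 25: {last_gap:.4f}")
print(f"validation/training ratio at epoch 25: {2.3586 / 0.8207:.2f}")
```

The gap starts near zero and ends above 1.5, which is the widening pattern usually associated with overfitting, even though the validation loss itself is still falling.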

Sequential Model (F1 Score: 0.87192): This model has a high F1 score, close to 1. This indicates that it has a strong balance of precision and recall. In other words, it is effectively identifying relevant information (high recall) and not including much irrelevant information (high precision) in its summaries.

Enclosed Model (F1 Score: 0.67618): The enclosed model has a lower F1 score compared to the sequential model. This suggests that it is less effective at summarization, either missing relevant information (lower recall), including more irrelevant information (lower precision), or both.
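For reference, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The write-up does not say how the scores above were computed, so the following is only one plausible reading: a token-overlap F1 between a generated summary and a reference summary. The function name token_f1 and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a candidate and a reference summary.

    Tokens are lowercased, whitespace-separated words (a simplifying
    assumption; the original evaluation may have tokenized differently).
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)  # share of candidate that is relevant
    recall = overlap / len(ref_tokens)      # share of reference that is covered
    return 2 * precision * recall / (precision + recall)
```

A candidate that is short but entirely correct has perfect precision and reduced recall, so its F1 falls between the two. This is why a lower F1 alone cannot show whether the enclosed model is missing information, adding irrelevant information, or both.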

Challenges Encountered

  • Dataset Limitations: The absence of a comprehensive dataset for training and evaluation posed a significant challenge.
  • Integration Difficulties: Integrating the post-processing steps of EasyOCR with Pegasus’ summarization process proved complex, especially within the integrated model approach.
  • Summary Quality: The limited dataset adversely affected the quality of the summaries.

Conclusion

While the sequential model demonstrated superior performance in summarization tasks, the enclosed model, despite its potential, faces significant challenges that affect its effectiveness. These challenges include dataset limitations, integration complexities, and resultant impacts on summary quality. To enhance the performance of the enclosed model, addressing these challenges is crucial. This might involve expanding and diversifying the dataset, refining the integration process, and implementing additional optimizations to improve its precision and recall. Overall, the sequential model stands out as the more reliable choice for current summarization needs, but with targeted improvements, the enclosed model could also become a viable alternative.

Examples
