EN:AI: Difference between revisions

From IP7 Wiki

Revision as of 11:48, 25 March 2024

These functions are currently still in the pilot phase.
Individual functions, interfaces and scope may change in the future.

The AI functions are not available in Compass by default.
They are part of an additional module that can be activated by IP7 on request.
There are additional costs for the AI module.
The AI from Averbis has been integrated into Compass.

The AI can (in future) be used for various areas in Compass:

  • Folder assignment or automatic classification of patents
  • Automatic filtering out of "not relevant" hits (available soon!)
  • Sorting of the results list (coming soon!)

The main feature of an AI is that it is trained on the basis of predefined data.
The AI should improve or "learn" through constant human correction and renewed training.
The quality of the AI therefore always depends on the data it is given and on a human correcting the AI.


Folder assignment

Folder structures are created in Compass to classify patents according to individual criteria.
For example, a technology tree can be created to assign patents to specific technologies.
With the "Automatic classification" function, the AI can take over this assignment task.

Classifier

AI classifier.jpg

Before the AI can assign patents to folders, the AI must learn or be trained based on existing assignments.
An existing folder structure with already correctly assigned patents is required.

A "classifier" can then be created.
The following settings are defined for it, many of which affect the training:

  • Activate

Here you can specify whether the classifier can be used and whether regular automatic training takes place.

  • Name
  • Training frequency

This defines how often the classifier is trained automatically.
As the contents of the folders change over time, it makes sense to have the classifier trained regularly.

  • Max. # Documents per folder

The maximum number of documents per folder that the AI can use for training.
Too few documents per folder reduce the quality of the training.
Too many documents per folder make the training take a long time.

Ideally, the AI should always receive the same number of documents for each folder.
For example, if there are a few folders with approx. 500 assignments and many folders with approx. 50 assignments, the value should be set to 50 so that the training is balanced and better results can be achieved.
The value can be set to a maximum of 1,000 documents.
If a folder contains more than 1,000 documents, 1,000 documents are randomly selected from this folder.

  • Min. # documents required for training

The minimum number of documents that a folder must contain in order to be used in training.
If there are only 3 documents in a folder, for example, the AI will not be able to draw any good conclusions from them.
The value must be at least 10 documents.
Folders that contain fewer documents than this value are not used for training.
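
The interplay of the two folder settings above can be sketched in a few lines of Python. This is only an illustration, not the Compass implementation: the function name, the data layout and the optional seed are hypothetical, and it assumes that the random selection described for the 1,000 limit is applied in the same way to any smaller configured value.

```python
import random

MAX_DOCS_HARD_LIMIT = 1000  # highest value allowed for "Max. # Documents per folder"
MIN_DOCS_LOWER_BOUND = 10   # lowest value allowed for "Min. # documents required for training"


def select_training_documents(folders, max_per_folder, min_required, seed=None):
    """Return {folder name: documents} as they might be handed to training.

    `folders` maps a folder name to the list of patents assigned to it
    (hypothetical data layout).
    """
    if not 1 <= max_per_folder <= MAX_DOCS_HARD_LIMIT:
        raise ValueError("max_per_folder must be between 1 and 1,000")
    if min_required < MIN_DOCS_LOWER_BOUND:
        raise ValueError("min_required must be at least 10")

    rng = random.Random(seed)
    selected = {}
    for name, documents in folders.items():
        if len(documents) < min_required:
            continue  # too few documents: the folder is not used at all
        if len(documents) > max_per_folder:
            # Assumption: a random subset is drawn when the cap is exceeded.
            documents = rng.sample(documents, max_per_folder)
        selected[name] = list(documents)
    return selected


folders = {
    "batteries": [f"B{i}" for i in range(500)],
    "brakes": [f"K{i}" for i in range(50)],
    "bells": ["X1", "X2", "X3"],
}
training_set = select_training_documents(folders, max_per_folder=50, min_required=10, seed=1)
```

With these example values, "bells" is skipped because it holds fewer than 10 documents, and "batteries" is reduced to 50 randomly chosen documents so that it is balanced against "brakes".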

  • Min. confidence level

The "confidence level" (in per cent) describes how confident the AI is about an assignment.
Every patent whose confidence is below the minimum value ends up in the "Non-classifiable" folder.
This folder is defined later in the automatic classification.
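
As a rough illustration of the minimum confidence level, the following Python sketch routes each prediction either to the folder proposed by the AI or to the fallback folder. The function name, the shape of the prediction data and the folder names are hypothetical; the only rule taken from the text is that everything below the minimum value goes to the fallback folder.

```python
def route_by_confidence(predictions, min_confidence, fallback_folder="Non-classifiable"):
    """Map each patent to its target folder.

    `predictions` maps a patent number to (proposed folder, confidence in per cent).
    """
    result = {}
    for patent, (folder, confidence) in predictions.items():
        # Only values below the minimum are sent to the fallback folder.
        result[patent] = folder if confidence >= min_confidence else fallback_folder
    return result


routed = route_by_confidence(
    {
        "EP1": ("bicycle", 92.0),
        "EP2": ("motorbike", 40.0),
        "EP3": ("bicycle", 70.0),
    },
    min_confidence=70,
)
```

Here "EP2" falls below the threshold of 70 and ends up in the fallback folder, while "EP3" sits exactly on the threshold and keeps its proposed folder.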

  • Folder

Here you can define which folders are used for training.
The "automatic classification" will later assign patents to these folders.
A maximum of 500 folders can be selected.

  • Data

Here you define which data the AI receives for the training:
Title, Summary, Claims, Description, IPC, CPC
If Description is selected, the AI receives significantly more data for training.
This will also have a corresponding effect on the training time.

  • Status

The current status of the classifier is displayed here:

"Training required"
After the classifier has been created, training is required.

"Training"
The classifier is currently being trained.
During training, automatic classifications that use this classifier cannot be executed.

"Ready"
The classifier is ready for use in an automatic classification.
This also means that the last training session was carried out successfully.

"Error"
An error has occurred.
Successful training is a prerequisite for use in automatic classification.

  • Last training

This shows when the last (successful) training session was carried out.

The current status can be retrieved using the Refresh button:
AI classifier Status refresh.jpg

As soon as the classifier has finished training (status "Ready"), it can be used in an automatic classification.

Training statistics

Evaluation of the last training run.
AI classifier Status analysis.jpg

Further information on the terms Precision, F1Score and Recall:
https://en.wikipedia.org/wiki/Precision_and_recall
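
For orientation, the three measures can be computed for a single folder from the number of correct and incorrect assignments. The Python sketch below uses the standard definitions; how Compass aggregates the values across folders is not described here, so the function and the example counts are purely illustrative.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Standard per-folder definitions of Precision, Recall and F1 score."""
    predicted = true_positives + false_positives  # everything the AI put into the folder
    actual = true_positives + false_negatives     # everything that belongs in the folder
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Example: 40 patents correctly assigned to a folder, 10 wrongly assigned to it,
# and 20 that belong in the folder but were assigned elsewhere.
precision, recall, f1 = precision_recall_f1(40, 10, 20)
```

In this example Precision is 40/50 = 0.8, Recall is 40/60 (about 0.67) and the F1 score, the harmonic mean of the two, is about 0.73.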

Limitations

In total (across all folders used in the classifier), a maximum of 10,000 patents are used for training.
If there are more than 10,000 patents in the folders, the number taken from each folder is reduced proportionally.
If a folder then falls below the minimum number of patents, its selection is increased to the minimum number again.
As a result, the total actually used can be slightly higher than 10,000.
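
The proportional reduction can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the function name is hypothetical, the scaled values are rounded down, and the minimum is the value configured in the classifier (10 in the example).

```python
def reduce_proportionally(folder_counts, total_limit=10_000, min_required=10):
    """Scale the per-folder counts down to the total limit, then restore the minimum."""
    total = sum(folder_counts.values())
    if total <= total_limit:
        return dict(folder_counts)  # nothing to reduce

    reduced = {}
    for name, count in folder_counts.items():
        scaled = count * total_limit // total  # proportional share, rounded down (assumption)
        # A folder must not drop below the minimum (or below what it actually contains).
        reduced[name] = max(scaled, min(min_required, count))
    return reduced


reduced = reduce_proportionally({"big": 29_985, "small": 15})
total_used = sum(reduced.values())
```

With 30,000 patents in total, every folder is scaled to one third: "big" is reduced to 9,995 and "small" would drop to 5, so it is raised back to the minimum of 10. The sum is then 10,005, which shows why the limit is not exactly 10,000.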


Automatic classification

AI automatic classification.jpg

The following settings are available for automatic classification:

  • Activate

Here you can specify whether classification is to be carried out automatically.
This option can be used to quickly stop an active automatic classification in the event of problems.

  • Name
  • Monitored folders

The "Incoming folders" are defined here.
All patents in these folders are classified by the AI, as are all patents that are assigned to these folders in the future.

Not to be confused with the folders of the classifier:
The folders into which the AI classifies/assigns the patents are defined in the classifier.

  • "Non-classifiable" folder

All patents that cannot be classified by the AI are assigned here.

  • Classifier

The previously created classifier is selected here.

  • Status

"idle" - automatic classification is currently not running.
"running" - automatic classification is currently running.

  • Last checked on

The system regularly checks whether "new" patents are available for automatic classification.
If "new" patents are available, automatic classification is started.
The date indicates when a check was last carried out.
A check can also be triggered manually using the "Execute" button.


A classifier can theoretically be used for several automatic classifications.
An example of this:

Several vehicle types are monitored and should be assigned to a technology tree:
"1, vehicle types" -> "bicycle" and "motorbike"

There is one folder structure or technology tree to which all of these vehicle types (vehicles with 2 wheels) are to be assigned:
"2, two wheel technologies"

However, the hits that the AI cannot classify should be saved separately for each vehicle type.
For this reason, a separate automatic classification is created for "bicycle" and for "motorbike".
In this case, however, the classifier only needs to be created once.

Limitations

A maximum of 5,000 hits/patents can be classified in one run.

If there are more than 5,000, automatic classification is not carried out and is deactivated.
If the run is triggered manually, a corresponding warning is displayed.
If the run is then started anyway, only up to 5,000 patents are classified.

Patents that have been classified in a run do not have to be removed from the monitored ("incoming") folder.
They are recognised as already classified in the next run and are not classified again.
To have one or more patents reclassified, they must be removed from the folder and then reassigned.
This reassignment means that they are no longer recognised as already classified.
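
The rules in this section can be summarised in a small Python sketch. It is not the actual Compass logic: the function and field names are hypothetical, it assumes that the 5,000 limit is applied to the patents that are not yet classified, and it simply takes the first 5,000 of them in a confirmed manual run because the selection order is not documented.

```python
RUN_LIMIT = 5000  # maximum number of hits/patents classified in one run


def plan_run(monitored, already_classified, manual, confirmed=False):
    """Decide what a single run would do.

    Returns a dict with the patents to classify, whether the automatic
    classification is deactivated, and an optional warning text.
    """
    # Patents classified in an earlier run are recognised and skipped.
    pending = [p for p in monitored if p not in already_classified]

    if len(pending) <= RUN_LIMIT:
        return {"classify": pending, "deactivate": False, "warning": None}

    warning = f"{len(pending)} patents pending, at most {RUN_LIMIT} are classified per run"
    if not manual:
        # Automatic run: not carried out, and the classification is deactivated.
        return {"classify": [], "deactivate": True, "warning": warning}
    if confirmed:
        # Manual run started despite the warning (selection order is an assumption).
        return {"classify": pending[:RUN_LIMIT], "deactivate": False, "warning": warning}
    return {"classify": [], "deactivate": False, "warning": warning}


monitored = [f"P{i}" for i in range(6000)]
done = {"P0", "P1"}
automatic = plan_run(monitored, done, manual=False)
forced = plan_run(monitored, done, manual=True, confirmed=True)
```

In the example, 5,998 patents are still unclassified. The automatic run is skipped and deactivated, whereas the confirmed manual run classifies 5,000 patents and leaves the rest for a later run.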