NLP Collective

A collective focused on NLP (natural language processing), the transformation or extraction of useful information from natural language data.
38.5k Questions
10.6k Members

Pinned content

NLP admins have deemed these posts noteworthy.

Pinned
9 votes
2k views
Collection

Natural Language Processing FAQ

Frequently asked questions relating to NLP. Many of these are questions that are asked over and over; duplicates would likely be closed in favor of these. Add the best answer (using the ...
Berthold • 101

Can you answer these questions?

These questions still don't have an answer

-1 votes
0 answers
19 views

transformer datasets 4.0.0, load_dataset issue

dataset: 4.0.0; pytorch: 2.7.1+cu126; system: Ubuntu 22.04.5 LTS. Following the official example, I tried this code: import datasets; print(datasets.__version__); import torch; print(torch.__version__) ...
-1 votes
0 answers
20 views

Problems installing pytorch with Anaconda - InvalidArchiveError ("Error with archive ...//pytorch-2.6.0-cpu_mkl_py3)

I installed the latest Anaconda and updated everything. When I try to install bertopic or pytorch itself I'm getting this error: InvalidArchiveError("Error with archive C:\Users\myuser\AppData\...
0 votes
0 answers
25 views

How can I get the pooled, projected output from CLIP in the transformers library when I don't have token embeddings?

I want to use text_embeddings and combine them with the output of an intermediate layer of CLIP's text_encoder. My input to the text_encoder is a set of learnable prompt embeddings, which is initialized ...
0 votes
0 answers
22 views

Custom NER to extract header, request and response from API document

I'm trying to extract API integration parameters like Authorization headers, query params, and request body fields from API documentation. This is essentially a custom NER task. I’ve experimented with ...
0 votes
0 answers
40 views

How can I train a multilingual Word2Vec model with aligned embeddings?

I'm working on a cross-lingual project involving semantic search in both Persian and English. I want to create a single Word2Vec model where semantically equivalent words (e.g., "خانه" ↔︎ ...