2,587 questions
-3 votes, 1 answer, 40 views
Appending list for NLP (stopwords)
I am using this Twitter dataset for NLP analysis.
To do a text analysis for text mining classes I wanted to remove unnecessary words.
Therefore I downloaded the stopwords dataset.
nltk.download('stopwords'...
0 votes, 1 answer, 35 views
Tokenization of Compound Words not Working in Quanteda in Japanese
Creating bigrams from unigrams doesn't seem to work in Japanese in {quanteda}. I can hack the text with gsub(), but I hope there's a better way. I can't post a complete reprex because SO won't allow ...
0 votes, 1 answer, 64 views
Catalog sentences into 5 words that represent them
I have a dataframe with 1000 text rows. df['text']
I also have 5 words, and for each one I want to know how well it represents the text (between 0 and 1).
Every score will be in df["word1...
0 votes, 1 answer, 68 views
Similarity from word to sentence after word embedding
I have a dataframe with 1000 text rows.
I ran word2vec.
Now I want to create a new field which gives me the distance from each sentence to the word that I want, let's say the word "king".
I ...
0 votes, 0 answers, 39 views
Spacy Lemmatization not correctly lemmatizing adjectives
I am using spaCy to lemmatize some text, and it lemmatizes the word robotic to robotic instead of robot. Could someone help me with this?
Here is the code:
import spacy
nlp = spacy.load('en')
...
0 votes, 0 answers, 64 views
How can I do text pre-processing in IBM SPSS Modeler?
I want to do the standard text cleaning: removing stop words, stemming, tokenisation, etc. I think SPSS has limitations, but how do people get around them using SPSS?
ChatGPT said to use Python ...
3 votes, 1 answer, 81 views
Regex for Parsing Japanese Parliamentary Speeches in Python
I'm a beginner in Python and am working on a project to preprocess Japanese text data for argument mining. I need to extract metadata (e.g., parliamentary session, date, speaker) and the speech ...
0 votes, 1 answer, 47 views
Extract Keywords from Text Vector -- one set of keywords for each element
Please consider the reprex at the end of the post.
It works along the lines of
https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html
It extracts a set ...
0 votes, 0 answers, 38 views
Errors attaching metadata to corpus
I am trying to generate a corpus with two documents: one is responses of participants characterized as "supporters" and one is responses of "non-supporters". I've entered this as ...
0 votes, 1 answer, 91 views
Unordered txt file contents: How to structure them as a proper dictionary
I have a txt file whose contents are unordered, like the sample below. I must select the first row because it has the exact time of the train run.
My txt file has a couple of summaries (1, 2, and so on); hence, the keys are the same ...
0 votes, 0 answers, 74 views
pdftools – How to skip errors?
I have an R script that converts all pdf files to text, but the "pdftools" package runs into various errors and stops the process. I would like to include in the code that if it finds an ...
1 vote, 1 answer, 51 views
Extracting Text via Web Scraping: Loop with several optional start/end strings
I would like to scrape the text of several press statements.
The problem I'm currently having is defining several strings where the scraping of the text should start/end. For example the ...
0 votes, 1 answer, 75 views
Export txt files from a corpus after preprocessing
I am struggling to export files from my corpus after preprocessing. I currently have 26 documents in my corpus, but I want to export them as txt files after they have been preprocessed so I can combine ...
1 vote, 1 answer, 55 views
I cannot get past data(stop_words) to analyze text in text mining
It's my first attempt at text mining and I have run into a wall. This is what I have done thus far:
library(tm)
library(tidytext)
library(dplyr)
library(ggplot2)
text1 <- c("Dear land of ...
-1 votes, 1 answer, 26 views
Error while creating the TDM - "No applicable method for 'meta' applied to an object of class "character""
While creating a TermDocumentMatrix with the tm package, I am getting an error. I used the following code:
int_vc <- VCorpus(int_vc)
int_vc <- tm_map(int_vc, tolower)
int_vc <- tm_map(int_vc, ...