Integrity
Write
Loading...

James Brockbank

3 years ago

Canonical URLs for Beginners

More on Technology

Frank Andrade

Frank Andrade

3 years ago

I discovered a bug that allowed me to use ChatGPT to successfully web scrape. Here's how it operates.

This method scrapes websites with ChatGPT (demo with Amazon and Twitter)

Photo by Mikhail Nilov on Pexels

In a recent article, I demonstrated how to scrape websites using ChatGPT prompts like scrape website X using Python.

But that doesn’t always work.

After scraping dozens of websites with ChatGPT, I realized that simple prompts rarely work for web scraping.

Using ChatGPT and basic HTML, we can scrape any website.

First things first, use ChatGPT's advanced version (Playground)

ChatGPT Playground's enhanced version is needed to scrape websites quickly. It generates code faster and has less constraints.

Here’s how it looks.

The classic ChatGPT display just allows you to type a prompt. Playground is more customizable and generates code faster than the basic version.

No more delays or restrictions.

In this tutorial, we'll write prompts underneath Playground.

How to Use ChatGPT to Scrape Any Website

To show you how to utilize ChatGPT to scrape any website, we'll use subslikescript, which lists movies.

Later in this post, I'll show you how to scrape Amazon and Twitter, but let's start easy.

If we try a basic prompt like “scrape the movie titles on this website: https://subslikescript.com/movies” it won’t scrape anything. Here’s when we need some HTML knowledge to build the right prompt for ChatGPT.Let's scrape the above movie titles. First, check the webpage. Right-click any element and choose Inspect.

Then see below.

We identify the element to scrape (those in the red box). The data elements are ul, a, and scripts-list.

Here's the prompt for movie titles.

Scrape this website: https://subslikescript.com/movies with Python and Beautiful Soup. Locate the element with tag “ul” and class “scripts-list”. Scrape all the “a” elements inside. Get the text attribute and print it.

Code generated:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://subslikescript.com/movies/')

soup = BeautifulSoup(page.content, 'html.parser')

scripts_list = soup.find(class_="scripts-list")

all_a_elements = scripts_list.find_all('a')

for element in all_a_elements:
    print(element.get_text())

It extracts movie titles successfully.

Let's scrape Amazon and Twitter.

ChatGPT's Amazon scraping

Consider scraping Amazon for self-help books. First, copy the Amazon link for self-help books.

Here’s the link I got. Location-dependent connection. Use my link to replicate my results.

Now we'll check book titles. Here's our element.

If we want to extract the book titles, we need to use the tag name spanclass attribute name and a-size-base-plus a-color-base a-text-normalattribute value.

This time I'll use Selenium. I'll add Selenium-specific commands like wait 5 seconds and generate an XPath.

Scrape this website https://www.amazon.com/s?k=self+help+books&sprefix=self+help+%2Caps%2C158&ref=nb_sb_ss_ts-doa-p_2_10 with Python and Selenium.

Wait 5 seconds and locate all the elements with the following xpath: “span” tag, “class” attribute name, and “a-size-base-plus a-color-base a-text-normal” attribute value. Get the text attribute and print them.

Code generated: (I only had to manually add the path where my chromedriver is located).

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

#initialize webdriver
driver = webdriver.Chrome('<add path of your chromedriver>')

#navigate to the website
driver.get("https://www.amazon.com/s?k=self+help+books&sprefix=self+help+%2Caps%2C158&ref=nb_sb_ss_ts-doa-p_2_10")

#wait 5 seconds to let the page load
sleep(5)

#locate all the elements with the following xpath
elements = driver.find_elements(By.XPATH, '//span[@class="a-size-base-plus a-color-base a-text-normal"]')

#get the text attribute of each element and print it
for element in elements:
    print(element.text)

#close the webdriver
driver.close()

It pulls Amazon book titles.

Utilizing ChatGPT to scrape Twitter

Say you wish to scrape ChatGPT tweets. Search Twitter for ChatGPT and copy the URL.

Here’s the link I got. We must check every tweet. Here's our element.

To extract a tweet, use the div tag and lang attribute.

Again, Selenium.

Scrape this website: https://twitter.com/search?q=chatgpt&src=typed_query using Python, Selenium and chromedriver.

Maximize the window, wait 15 seconds and locate all the elements that have the following XPath: “div” tag, attribute name “lang”. Print the text inside these elements.

Code generated: (again, I had to add the path where my chromedriver is located)

from selenium import webdriver
import time

driver = webdriver.Chrome("/Users/frankandrade/Downloads/chromedriver")
driver.maximize_window()
driver.get("https://twitter.com/search?q=chatgpt&src=typed_query")
time.sleep(15)

elements = driver.find_elements_by_xpath("//div[@lang]")
for element in elements:
    print(element.text)

driver.quit()

You'll get the first 2 or 3 tweets from a search. To scrape additional tweets, click X times.

Congratulations! You scraped websites without coding by using ChatGPT.

Waleed Rikab, PhD

Waleed Rikab, PhD

2 years ago

The Enablement of Fraud and Misinformation by Generative AI What You Should Understand

Recent investigations have shown that generative AI can boost hackers and misinformation spreaders.

Generated through Stable Diffusion with a prompt by the author

Since its inception in late November 2022, OpenAI's ChatGPT has entertained and assisted many online users in writing, coding, task automation, and linguistic translation. Given this versatility, it is maybe unsurprising but nonetheless regrettable that fraudsters and mis-, dis-, and malinformation (MDM) spreaders are also considering ChatGPT and related AI models to streamline and improve their operations.

Malign actors may benefit from ChatGPT, according to a WithSecure research. ChatGPT promises to elevate unlawful operations across many attack channels. ChatGPT can automate spear phishing attacks that deceive corporate victims into reading emails from trusted parties. Malware, extortion, and illicit fund transfers can result from such access.

ChatGPT's ability to simulate a desired writing style makes spear phishing emails look more genuine, especially for international actors who don't speak English (or other languages like Spanish and French).

This technique could let Russian, North Korean, and Iranian state-backed hackers conduct more convincing social engineering and election intervention in the US. ChatGPT can also create several campaigns and various phony online personas to promote them, making such attacks successful through volume or variation. Additionally, image-generating AI algorithms and other developing techniques can help these efforts deceive potential victims.

Hackers are discussing using ChatGPT to install malware and steal data, according to a Check Point research. Though ChatGPT's scripts are well-known in the cyber security business, they can assist amateur actors with little technical understanding into the field and possibly develop their hacking and social engineering skills through repeated use.

Additionally, ChatGPT's hacking suggestions may change. As a writer recently indicated, ChatGPT's ability to blend textual and code-based writing might be a game-changer, allowing the injection of innocent content that would subsequently turn out to be a malicious script into targeted systems. These new AI-powered writing- and code-generation abilities allow for unique cyber attacks, regardless of viability.

OpenAI fears ChatGPT usage. OpenAI, Georgetown University's Center for Security and Emerging Technology, and Stanford's Internet Observatory wrote a paper on how AI language models could enhance nation state-backed influence operations. As a last resort, the authors consider polluting the internet with radioactive or misleading data to ensure that AI language models produce outputs that other language models can identify as AI-generated. However, the authors of this paper seem unaware that their "solution" might cause much worse MDM difficulties.

Literally False News

The public argument about ChatGPTs content-generation has focused on originality, bias, and academic honesty, but broader global issues are at stake. ChatGPT can influence public opinion, troll individuals, and interfere in local and national elections by creating and automating enormous amounts of social media material for specified audiences.

ChatGPT's capacity to generate textual and code output is crucial. ChatGPT can write Python scripts for social media bots and give diverse content for repeated posts. The tool's sophistication makes it irrelevant to one's language skills, especially English, when writing MDM propaganda.

I ordered ChatGPT to write a news piece in the style of big US publications declaring that Ukraine is on the verge of defeat in its fight against Russia due to corruption, desertion, and exhaustion in its army. I also gave it a fake reporter's byline and an unidentified NATO source's remark. The outcome appears convincing:

Worse, terrible performers can modify this piece to make it more credible. They can edit the general's name or add facts about current wars. Furthermore, such actors can create many versions of this report in different forms and distribute them separately, boosting its impact.

In this example, ChatGPT produced a news story regarding (fictional) greater moviegoer fatality rates:

Editing this example makes it more plausible. Dr. Jane Smith, the putative author of the medical report, might be replaced with a real-life medical person or a real victim of this supposed medical hazard.

Can deceptive texts be found? Detecting AI text is behind AI advancements. Minor AI-generated text alterations can upset these technologies.

Some OpenAI individuals have proposed covert methods to watermark AI-generated literature to prevent its abuse. AI models would create information that appears normal to humans but would follow a cryptographic formula that would warn other machines that it was AI-made. However, security experts are cautious since manually altering the content interrupts machine and human detection of AI-generated material.

How to Prepare

Cyber security and IT workers can research and use generative AI models to fight spear fishing and extortion. Governments may also launch MDM-defence projects.

In election cycles and global crises, regular people may be the most vulnerable to AI-produced deceit. Until regulation or subsequent technical advances, individuals must recognize exposure to AI-generated fraud, dating scams, other MDM activities.

A three-step verification method of new material in suspicious emails or social media posts can help identify AI content and manipulation. This three-step approach asks about the information's distribution platform (is it reliable? ), author (is the reader familiar with them? ), and plausibility given one's prior knowledge of the topic.

Consider a report by a trusted journalist that makes shocking statements in their typical manner. AI-powered fake news may be released on an unexpected platform, such as a newly created Facebook profile. However, if it links to a known media source, it is more likely to be real.

Though hard and subjective, this verification method may be the only barrier against manipulation for now.

AI language models:

How to Recognize an AI-Generated Article ChatGPT, the popular AI-powered chatbot, can and likely does generate medium.com-style articles.

AI-Generated Text Detectors Fail. Do This. Online tools claim to detect ChatGPT output. Even with superior programming, I tested some of these tools. pub

Why Original Writers Matter Despite AI Language Models Creative writers may never be threatened by AI language models.

Nicolas Tresegnie

Nicolas Tresegnie

3 years ago

Launching 10 SaaS applications in 100 days

Photo by Mauro Sbicego / Unsplash

Apocodes helps entrepreneurs create SaaS products without writing code. This post introduces micro-SaaS and outlines its basic strategy.

Strategy

Vision and strategy differ when starting a startup.

  • The company's long-term future state is outlined in the vision. It establishes the overarching objectives the organization aims to achieve while also justifying its existence. The company's future is outlined in the vision.

  • The strategy consists of a collection of short- to mid-term objectives, the accomplishment of which will move the business closer to its vision. The company gets there through its strategy.

The vision should be stable, but the strategy must be adjusted based on customer input, market conditions, or previous experiments.

Begin modestly and aim high.

Be truthful. It's impossible to automate SaaS product creation from scratch. It's like climbing Everest without running a 5K. Physical rules don't prohibit it, but it would be suicide.

Apocodes 5K equivalent? Two options:

  • (A) Create a feature that includes every setting option conceivable. then query potential clients “Would you choose us to build your SaaS solution if we offered 99 additional features of the same caliber?” After that, decide which major feature to implement next.

  • (B) Build a few straightforward features with just one or two configuration options. Then query potential clients “Will this suffice to make your product?” What's missing if not? Finally, tweak the final result a bit before starting over.

(A) is an all-or-nothing approach. It's like training your left arm to climb Mount Everest. My right foot is next.

(B) is a better method because it's iterative and provides value to customers throughout.

Focus on a small market sector, meet its needs, and expand gradually. Micro-SaaS is Apocode's first market.

What is micro-SaaS.

Micro-SaaS enterprises have these characteristics:

  • A limited range: They address a specific problem with a small number of features.

  • A small group of one to five individuals.

  • Low external funding: The majority of micro-SaaS companies have Total Addressable Markets (TAM) under $100 million. Investors find them unattractive as a result. As a result, the majority of micro-SaaS companies are self-funded or bootstrapped.

  • Low competition: Because they solve problems that larger firms would rather not spend time on, micro-SaaS enterprises have little rivalry.

  • Low upkeep: Because of their simplicity, they require little care.

  • Huge profitability: Because providing more clients incurs such a small incremental cost, high profit margins are possible.

Micro-SaaS enterprises created with no-code are Apocode's ideal first market niche.

We'll create our own micro-SaaS solutions to better understand their needs. Although not required, we believe this will improve community discussions.

The challenge

In 100 days (September 12–December 20, 2022), we plan to build 10 micro-SaaS enterprises using Apocode.

They will be:

  • Self-serve: Customers will be able to use the entire product experience without our manual assistance.

  • Real: They'll deal with actual issues. They won't be isolated proofs of concept because we'll keep up with them after the challenge.

  • Both free and paid options: including a free plan and a free trial period. Although financial success would be a good result, the challenge's stated objective is not financial success.

This will let us design Apocodes features, showcase them, and talk to customers.

(Edit: The first micro-SaaS was launched!)

Follow along

If you want to follow the story of Apocode or our progress in this challenge, you can subscribe here.

If you are interested in using Apocode, sign up here.

If you want to provide feedback, discuss the idea further or get involved, email me at nicolas.tresegnie@gmail.com

You might also like

Vitalik

Vitalik

4 years ago

An approximate introduction to how zk-SNARKs are possible (part 1)

You can make a proof for the statement "I know a secret number such that if you take the word ‘cow', add the number to the end, and SHA256 hash it 100 million times, the output starts with 0x57d00485aa". The verifier can verify the proof far more quickly than it would take for them to run 100 million hashes themselves, and the proof would also not reveal what the secret number is.

In the context of blockchains, this has 2 very powerful applications: Perhaps the most powerful cryptographic technology to come out of the last decade is general-purpose succinct zero knowledge proofs, usually called zk-SNARKs ("zero knowledge succinct arguments of knowledge"). A zk-SNARK allows you to generate a proof that some computation has some particular output, in such a way that the proof can be verified extremely quickly even if the underlying computation takes a very long time to run. The "ZK" part adds an additional feature: the proof can keep some of the inputs to the computation hidden.

You can make a proof for the statement "I know a secret number such that if you take the word ‘cow', add the number to the end, and SHA256 hash it 100 million times, the output starts with 0x57d00485aa". The verifier can verify the proof far more quickly than it would take for them to run 100 million hashes themselves, and the proof would also not reveal what the secret number is.

In the context of blockchains, this has two very powerful applications:

  1. Scalability: if a block takes a long time to verify, one person can verify it and generate a proof, and everyone else can just quickly verify the proof instead
  2. Privacy: you can prove that you have the right to transfer some asset (you received it, and you didn't already transfer it) without revealing the link to which asset you received. This ensures security without unduly leaking information about who is transacting with whom to the public.

But zk-SNARKs are quite complex; indeed, as recently as in 2014-17 they were still frequently called "moon math". The good news is that since then, the protocols have become simpler and our understanding of them has become much better. This post will try to explain how ZK-SNARKs work, in a way that should be understandable to someone with a medium level of understanding of mathematics.

Why ZK-SNARKs "should" be hard

Let us take the example that we started with: we have a number (we can encode "cow" followed by the secret input as an integer), we take the SHA256 hash of that number, then we do that again another 99,999,999 times, we get the output, and we check what its starting digits are. This is a huge computation.

A "succinct" proof is one where both the size of the proof and the time required to verify it grow much more slowly than the computation to be verified. If we want a "succinct" proof, we cannot require the verifier to do some work per round of hashing (because then the verification time would be proportional to the computation). Instead, the verifier must somehow check the whole computation without peeking into each individual piece of the computation.

One natural technique is random sampling: how about we just have the verifier peek into the computation in 500 different places, check that those parts are correct, and if all 500 checks pass then assume that the rest of the computation must with high probability be fine, too?

Such a procedure could even be turned into a non-interactive proof using the Fiat-Shamir heuristic: the prover computes a Merkle root of the computation, uses the Merkle root to pseudorandomly choose 500 indices, and provides the 500 corresponding Merkle branches of the data. The key idea is that the prover does not know which branches they will need to reveal until they have already "committed to" the data. If a malicious prover tries to fudge the data after learning which indices are going to be checked, that would change the Merkle root, which would result in a new set of random indices, which would require fudging the data again... trapping the malicious prover in an endless cycle.

But unfortunately there is a fatal flaw in naively applying random sampling to spot-check a computation in this way: computation is inherently fragile. If a malicious prover flips one bit somewhere in the middle of a computation, they can make it give a completely different result, and a random sampling verifier would almost never find out.


It only takes one deliberately inserted error, that a random check would almost never catch, to make a computation give a completely incorrect result.

If tasked with the problem of coming up with a zk-SNARK protocol, many people would make their way to this point and then get stuck and give up. How can a verifier possibly check every single piece of the computation, without looking at each piece of the computation individually? There is a clever solution.

see part 2

Startup Journal

Startup Journal

3 years ago

The Top 14 Software Business Ideas That Are Sure To Succeed in 2023

Software can change any company.

Photo by Marvin Meyer on Unsplash

Software is becoming essential. Everyone should consider how software affects their lives and others'.

Software on your phone, tablet, or computer offers many new options. We're experts in enough ways now.

Software Business Ideas will be popular by 2023.

ERP Programs

ERP software meets rising demand.

ERP solutions automate and monitor tasks that large organizations, businesses, and even schools would struggle to do manually.

ERP software could reach $49 billion by 2024.

CRM Program

CRM software is a must-have for any customer-focused business.

Having an open mind about your business services and products allows you to change platforms.

Another company may only want your CRM service.

Medical software

Healthcare facilities need reliable, easy-to-use software.

EHRs, MDDBs, E-Prescribing, and more are software options.

The global medical software market could reach $11 billion by 2025, and mobile medical apps may follow.

Presentation Software in the Cloud

SaaS presentation tools are great.

They're easy to use, comprehensive, and full of traditional Software features.

In today's cloud-based world, these solutions make life easier for people. We don't know about you, but we like it.

Software for Project Management

People began working remotely without signs or warnings before the 2020 COVID-19 pandemic.

Many organizations found it difficult to track projects and set deadlines.

With PMP software tools, teams can manage remote units and collaborate effectively.

App for Blockchain-Based Invoicing

This advanced billing and invoicing solution is for businesses and freelancers.

These blockchain-based apps can calculate taxes, manage debts, and manage transactions.

Intelligent contracts help blockchain track transactions more efficiently. It speeds up and improves invoice generation.

Software for Business Communications

Internal business messaging is tricky.

Top business software tools for communication can share files, collaborate on documents, host video conferences, and more.

Payroll Automation System

Software development also includes developing an automated payroll system.

These software systems reduce manual tasks for timely employee payments.

These tools help enterprise clients calculate total wages quickly, simplify tax calculations, improve record-keeping, and support better financial planning.

System for Detecting Data Leaks

Both businesses and individuals value data highly. Yahoo's data breach is dangerous because of this.

This area of software development can help people protect their data.

You can design an advanced data loss prevention system.

AI-based Retail System

AI-powered shopping systems are popular. The systems analyze customers' search and purchase patterns and store history and are equipped with a keyword database.

These systems offer many customers pre-loaded products.

AI-based shopping algorithms also help users make purchases.

Software for Detecting Plagiarism

Software can help ensure your projects are original and not plagiarized.

These tools detect plagiarized content that Google, media, and educational institutions don't like.

Software for Converting Audio to Text

Machine Learning converts speech to text automatically.

These programs can quickly transcribe cloud-based files.

Software for daily horoscopes

Daily and monthly horoscopes will continue to be popular.

Software platforms that can predict forecasts, calculate birth charts, and other astrology resources are good business ideas.

E-learning Programs

Traditional study methods are losing popularity as virtual schools proliferate and physical space shrinks.

Khan Academy online courses are the best way to keep learning.

Online education portals can boost your learning. If you want to start a tech startup, consider creating an e-learning program.

Conclusion

Software is booming. There's never been a better time to start a software development business, with so many people using computers and smartphones. This article lists eight business ideas for 2023. Consider these ideas if you're just starting out or looking to expand.

Pat Vieljeux

Pat Vieljeux

3 years ago

The three-year business plan is obsolete for startups.

If asked, run.

Austin Distel — Unsplash

An entrepreneur asked me about her pitch deck. A Platform as a Service (PaaS).

She told me she hadn't done her 5-year forecasts but would soon.

I said, Don't bother. I added "time-wasting."

“I've been asked”, she said.

“Who asked?”

“a VC”

“5-year forecast?”

“Yes”

“Get another VC. If he asks, it's because he doesn't understand your solution or to waste your time.”

Some VCs are lagging. They're still using steam engines.

10-years ago, 5-year forecasts were requested.

Since then, we've adopted a 3-year plan.

But It's outdated.

Max one year.

What has happened?

Revolutionary technology. NO-CODE.

Revolution's consequences?

Product viability tests are shorter. Hugely. SaaS and PaaS.

Let me explain:

  • Building a minimum viable product (MVP) that works only takes a few months.

  • 1 to 2 months for practical testing.

  • Your company plan can be validated or rejected in 4 months as a consequence.

After validation, you can ask for VC money. Even while a prototype can generate revenue, you may not require any.

Good VCs won't ask for a 3-year business plan in that instance.

One-year, though.

If you want, establish a three-year plan, but realize that the second year will be different.

You may have changed your business model by then.

A VC isn't interested in a three-year business plan because your solution may change.

Your ability to create revenue will be key.

  • But also, to pivot.

  • They will be interested in your value proposition.

  • They will want to know what differentiates you from other competitors and why people will buy your product over another.

  • What will interest them is your resilience, your ability to bounce back.

  • Not to mention your mindset. The fact that you won’t get discouraged at the slightest setback.

  • The grit you have when facing adversity, as challenges will surely mark your journey.

  • The authenticity of your approach. They’ll want to know that you’re not just in it for the money, let alone to show off.

  • The fact that you put your guts into it and that you are passionate about it. Because entrepreneurship is a leap of faith, a leap into the void.

  • They’ll want to make sure you are prepared for it because it’s not going to be a walk in the park.

  • They’ll want to know your background and why you got into it.

  • They’ll also want to know your family history.

  • And what you’re like in real life.

So a 5-year plan…. You can bet they won’t give a damn. Like their first pair of shoes.