# Twint Twitter OSINT Tool Cheat Sheet

## Overview

Twint is an advanced Twitter scraping tool written in Python that scrapes tweets from Twitter profiles without using Twitter's API. It can retrieve tweets, followers, following, retweets, and more while bypassing most of Twitter's rate limits. Twint is particularly useful for OSINT investigations, social media monitoring, and research.

⚠️ Legal notice: Use Twint only for legitimate research, OSINT investigations, or authorized security testing. Comply with Twitter's Terms of Service and applicable privacy laws.

## Installation

### Python pip Installation

```bash
# Install via pip
pip3 install twint

# Install development version
pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint

# Install with additional dependencies
pip3 install twint[all]

# Verify installation
twint --help
```
### Docker Installation
```bash
# Pull Docker image
docker pull twintproject/twint

# Run with Docker
docker run -it --rm twintproject/twint

# Build from source
git clone https://github.com/twintproject/twint.git
cd twint
docker build -t twint .

# Run with volume mount
docker run -it --rm -v $(pwd)/output:/output twint

```

### Manual Installation

```bash
# Clone repository
git clone https://github.com/twintproject/twint.git
cd twint

# Install dependencies
pip3 install -r requirements.txt

# Install package
python3 setup.py install

# Alternative: Run directly
python3 -m twint --help

```

### Virtual Environment Setup

```bash
# Create virtual environment
python3 -m venv twint-env
source twint-env/bin/activate

# Install Twint
pip install twint

# Verify installation
twint --version

```

## Basic Usage

### Command Line Interface

```bash
# Basic tweet scraping
twint -u username

# Scrape tweets with specific search term
twint -s "search term"

# Scrape tweets from specific user
twint -u elonmusk

# Limit number of tweets
twint -u username --limit 100

# Save to file
twint -u username -o tweets.csv --csv

# Search with date range
twint -s "cybersecurity" --since "2023-01-01" --until "2023-12-31"

```

### Python API Usage

```python
import twint

# Configure Twint
c = twint.Config()
c.Username = "username"
c.Limit = 100
c.Store_csv = True
c.Output = "tweets.csv"

# Run search
twint.run.Search(c)
```
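
If you would rather keep results in memory than write a file, Twint can also collect tweet objects into a list. A minimal sketch; `Store_object` and `twint.output.tweets_list` follow common Twint usage patterns, so verify the attribute names against your installed version:

```python
import twint

# Collect tweet objects in memory instead of writing CSV/JSON.
# Store_object and twint.output.tweets_list are assumed from common
# Twint usage patterns; confirm against your installed version.
c = twint.Config()
c.Username = "username"
c.Limit = 20
c.Store_object = True
c.Hide_output = True

twint.run.Search(c)

for tweet in twint.output.tweets_list:
    print(tweet.username, tweet.tweet)
```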

## Advanced Search Options

### User-based Searches

```bash
# Get user's tweets
twint -u username

# Get user's followers
twint -u username --followers

# Get user's following
twint -u username --following

# Get user's favorites/likes
twint -u username --favorites

# Get user information
twint -u username --user-full

# Get verified users only
twint -s "search term" --verified
```
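
The follower and following lookups are also available from the Python API via `twint.run.Followers()` and `twint.run.Following()`, the same calls used in the investigation workflow later in this sheet:

```python
import twint

# Python equivalents of --followers / --following
c = twint.Config()
c.Username = "username"
c.Limit = 50

twint.run.Followers(c)   # or: twint.run.Following(c)
```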

### Content-based Searches

```bash
# Search by keyword
twint -s "cybersecurity"

# Search with hashtag
twint -s "#infosec"

# Search with multiple keywords
twint -s "cybersecurity OR infosec"

# Search for exact phrase
twint -s '"exact phrase"'

# Search excluding terms
twint -s "cybersecurity -spam"

# Search for tweets with links
twint -s "cybersecurity" --links

# Search for tweets with media
twint -s "cybersecurity" --media

```

### Geographic and Language Filters

```bash
# Search by location
twint -s "cybersecurity" --near "New York"

# Search with specific language
twint -s "cybersecurity" --lang en

# Search with geolocation
twint -s "cybersecurity" --geo "40.7128,-74.0060,10km"

# Search popular tweets only
twint -s "cybersecurity" --popular

# Search for tweets with minimum likes
twint -s "cybersecurity" --min-likes 10

# Search for tweets with minimum retweets
twint -s "cybersecurity" --min-retweets 5

```

### Date and Time Filters

```bash
# Search with date range
twint -s "cybersecurity" --since "2023-01-01" --until "2023-12-31"

# Filter tweets before the specified year
twint -s "cybersecurity" --year 2023

# Search tweets from specific hour
twint -s "cybersecurity" --hour 14

# Search tweets from today
twint -s "cybersecurity" --since $(date +%Y-%m-%d)

# Search tweets from last week
twint -s "cybersecurity" --since $(date -d '7 days ago' +%Y-%m-%d)
```
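
Note that `date -d '7 days ago'` is GNU-specific (BSD/macOS `date` uses different flags). A portable way to build `--since`/`--until` values is to compute them in Python:

```python
# Build portable --since/--until date strings
from datetime import date, timedelta

since = (date.today() - timedelta(days=7)).isoformat()
until = date.today().isoformat()
print(f"twint -s 'cybersecurity' --since {since} --until {until}")
```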

## Output Formats and Storage

### File Output Options

```bash
# Save as CSV
twint -u username -o output.csv --csv

# Save as JSON
twint -u username -o output.json --json

# Save as text file
twint -u username -o output.txt

# Custom CSV format
twint -u username --csv --output tweets.csv --custom-csv "date,time,username,tweet"

# Hide output (silent mode)
twint -u username --hide-output

# Debug mode
twint -u username --debug
```
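
A CSV export can then be loaded straight into pandas for analysis. A small sketch; the column names assume Twint's default CSV schema, so adjust them to match your export:

```python
import pandas as pd

# Load a Twint CSV export; column names assume the default schema
df = pd.read_csv("tweets.csv")
print(df[["date", "username", "tweet"]].head())
```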

### Database Storage

```bash
# Store in Elasticsearch
twint -u username --elasticsearch localhost:9200

# Store in SQLite database
twint -u username --database tweets.db

# Store with custom database table
twint -u username --database tweets.db --table-tweets custom_tweets
```
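
To work with the SQLite output, list the tables first rather than assuming Twint's schema; a short sketch using only the standard library:

```python
import sqlite3

# Inspect the database created by --database before querying it
conn = sqlite3.connect("tweets.db")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print(tables)  # confirm actual table names; the schema is Twint's own
conn.close()
```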

### Advanced Output Configuration

```python
import twint

# Configure advanced output
c = twint.Config()
c.Username = "username"
c.Store_csv = True
c.Output = "detailed_tweets.csv"
c.Custom_csv = ["date", "time", "username", "tweet", "replies_count", "retweets_count", "likes_count", "hashtags", "urls"]
c.Hide_output = True

# Run search
twint.run.Search(c)

```

## Python API Advanced Usage

### Basic Configuration

```python
import twint
import pandas as pd

def scrape_user_tweets(username, limit=100):
    """Scrape tweets from specific user"""
    c = twint.Config()
    c.Username = username
    c.Limit = limit
    c.Store_pandas = True
    c.Hide_output = True

    twint.run.Search(c)

    # Get pandas dataframe
    tweets_df = twint.storage.panda.Tweets_df
    return tweets_df

# Usage
tweets = scrape_user_tweets("elonmusk", 50)
print(f"Scraped {len(tweets)} tweets")

```

### Advanced Search Configuration

```python
import twint
import pandas as pd
from datetime import datetime, timedelta

def advanced_search(search_term, days_back=7, min_likes=5):
    """Advanced search with multiple filters"""
    c = twint.Config()

    # Search configuration
    c.Search = search_term
    c.Lang = "en"
    c.Min_likes = min_likes
    c.Popular_tweets = True

    # Date range (last N days)
    end_date = datetime.now()
    start_date = end_date - timedelta(days=days_back)
    c.Since = start_date.strftime("%Y-%m-%d")
    c.Until = end_date.strftime("%Y-%m-%d")

    # Output configuration
    c.Store_pandas = True
    c.Hide_output = True

    # Run search
    twint.run.Search(c)

    # Process results
    if twint.storage.panda.Tweets_df is not None:
        tweets_df = twint.storage.panda.Tweets_df
        return tweets_df
    else:
        return pd.DataFrame()

# Usage
cybersec_tweets = advanced_search("cybersecurity", days_back=30, min_likes=10)
print(f"Found {len(cybersec_tweets)} popular cybersecurity tweets")

```

### User Analysis Functions

```python
import twint
import pandas as pd
from collections import Counter

class TwitterOSINT:
    def __init__(self):
        self.tweets_df = None
        self.users_df = None

    def analyze_user(self, username):
        """Comprehensive user analysis"""
        # Get user tweets
        c = twint.Config()
        c.Username = username
        c.Limit = 1000
        c.Store_pandas = True
        c.Hide_output = True

        twint.run.Search(c)
        self.tweets_df = twint.storage.panda.Tweets_df

        if self.tweets_df is not None and not self.tweets_df.empty:
            analysis = {
                'username': username,
                'total_tweets': len(self.tweets_df),
                'date_range': {
                    'earliest': self.tweets_df['date'].min(),
                    'latest': self.tweets_df['date'].max()
                },
                'engagement': {
                    'avg_likes': self.tweets_df['likes_count'].mean(),
                    'avg_retweets': self.tweets_df['retweets_count'].mean(),
                    'avg_replies': self.tweets_df['replies_count'].mean()
                },
                'top_hashtags': self.get_top_hashtags(),
                'top_mentions': self.get_top_mentions(),
                'posting_patterns': self.analyze_posting_patterns()
            }
            return analysis
        else:
            return None

    def get_top_hashtags(self, top_n=10):
        """Extract top hashtags from tweets"""
        if self.tweets_df is None:
            return []

        all_hashtags = []
        for hashtags in self.tweets_df['hashtags'].dropna():
            if hashtags:
                all_hashtags.extend(hashtags)

        return Counter(all_hashtags).most_common(top_n)

    def get_top_mentions(self, top_n=10):
        """Extract top mentions from tweets"""
        if self.tweets_df is None:
            return []

        all_mentions = []
        for mentions in self.tweets_df['mentions'].dropna():
            if mentions:
                all_mentions.extend(mentions)

        return Counter(all_mentions).most_common(top_n)

    def analyze_posting_patterns(self):
        """Analyze posting time patterns"""
        if self.tweets_df is None:
            return {}

        # Convert time to hour
        self.tweets_df['hour'] = pd.to_datetime(self.tweets_df['time']).dt.hour

        patterns = {
            'hourly_distribution': self.tweets_df['hour'].value_counts().to_dict(),
            'most_active_hour': self.tweets_df['hour'].mode().iloc[0] if not self.tweets_df['hour'].empty else None,
            'daily_tweet_count': self.tweets_df.groupby('date').size().mean()
        }

        return patterns

    def search_and_analyze(self, search_term, limit=500):
        """Search for tweets and analyze patterns"""
        c = twint.Config()
        c.Search = search_term
        c.Limit = limit
        c.Store_pandas = True
        c.Hide_output = True

        twint.run.Search(c)
        self.tweets_df = twint.storage.panda.Tweets_df

        if self.tweets_df is not None and not self.tweets_df.empty:
            analysis = {
                'search_term': search_term,
                'total_tweets': len(self.tweets_df),
                'unique_users': self.tweets_df['username'].nunique(),
                'top_users': self.tweets_df['username'].value_counts().head(10).to_dict(),
                'engagement_stats': {
                    'total_likes': self.tweets_df['likes_count'].sum(),
                    'total_retweets': self.tweets_df['retweets_count'].sum(),
                    'avg_engagement': (self.tweets_df['likes_count'] + self.tweets_df['retweets_count']).mean()
                },
                'top_hashtags': self.get_top_hashtags(),
                'sentiment_indicators': self.basic_sentiment_analysis()
            }
            return analysis
        else:
            return None

    def basic_sentiment_analysis(self):
        """Basic sentiment analysis using keyword matching"""
        if self.tweets_df is None:
            return {}

        positive_words = ['good', 'great', 'excellent', 'amazing', 'love', 'best', 'awesome']
        negative_words = ['bad', 'terrible', 'awful', 'hate', 'worst', 'horrible', 'disgusting']

        positive_count = 0
        negative_count = 0

        for tweet in self.tweets_df['tweet'].str.lower():
            if any(word in tweet for word in positive_words):
                positive_count += 1
            if any(word in tweet for word in negative_words):
                negative_count += 1

        total_tweets = len(self.tweets_df)
        return {
            'positive_tweets': positive_count,
            'negative_tweets': negative_count,
            'neutral_tweets': total_tweets - positive_count - negative_count,
            'positive_ratio': positive_count / total_tweets if total_tweets > 0 else 0,
            'negative_ratio': negative_count / total_tweets if total_tweets > 0 else 0
        }

# Usage example
osint = TwitterOSINT()

# Analyze specific user
user_analysis = osint.analyze_user("elonmusk")
if user_analysis:
    print(f"User Analysis for {user_analysis['username']}:")
    print(f"Total tweets: {user_analysis['total_tweets']}")
    print(f"Average likes: {user_analysis['engagement']['avg_likes']:.2f}")
    print(f"Top hashtags: {user_analysis['top_hashtags'][:5]}")

# Search and analyze topic
topic_analysis = osint.search_and_analyze("cybersecurity", limit=200)
if topic_analysis:
    print(f"\nTopic Analysis for '{topic_analysis['search_term']}':")
    print(f"Total tweets: {topic_analysis['total_tweets']}")
    print(f"Unique users: {topic_analysis['unique_users']}")
    print(f"Average engagement: {topic_analysis['engagement_stats']['avg_engagement']:.2f}")
```
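
The returned analysis dictionaries can be persisted for later review; `default=str` handles pandas timestamps that are not JSON-serializable (the same pattern the investigation workflow below uses):

```python
import json

# Save the analysis results; default=str converts non-serializable values
with open("user_analysis.json", "w") as f:
    json.dump(user_analysis, f, indent=2, default=str)
```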

## OSINT Investigation Workflows

### Target User Investigation

```python
#!/usr/bin/env python3
# twitter-user-investigation.py

import twint
import pandas as pd
import json
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns

class TwitterUserInvestigation:
    def __init__(self, username):
        self.username = username
        self.tweets_df = None
        self.followers_df = None
        self.following_df = None
        self.results = {}

    def collect_user_data(self):
        """Collect comprehensive user data"""
        print(f"Investigating Twitter user: {self.username}")

        # Collect tweets
        self.collect_tweets()

        # Collect followers (limited)
        self.collect_followers()

        # Collect following (limited)
        self.collect_following()

        # Analyze collected data
        self.analyze_data()

    def collect_tweets(self, limit=1000):
        """Collect user tweets"""
        print("Collecting tweets...")

        c = twint.Config()
        c.Username = self.username
        c.Limit = limit
        c.Store_pandas = True
        c.Hide_output = True

        try:
            twint.run.Search(c)
            self.tweets_df = twint.storage.panda.Tweets_df
            print(f"Collected {len(self.tweets_df)} tweets")
        except Exception as e:
            print(f"Error collecting tweets: {e}")

    def collect_followers(self, limit=100):
        """Collect user followers"""
        print("Collecting followers...")

        c = twint.Config()
        c.Username = self.username
        c.Limit = limit
        c.Store_pandas = True
        c.Hide_output = True

        try:
            twint.run.Followers(c)
            self.followers_df = twint.storage.panda.Follow_df
            print(f"Collected {len(self.followers_df)} followers")
        except Exception as e:
            print(f"Error collecting followers: {e}")

    def collect_following(self, limit=100):
        """Collect users being followed"""
        print("Collecting following...")

        c = twint.Config()
        c.Username = self.username
        c.Limit = limit
        c.Store_pandas = True
        c.Hide_output = True

        try:
            twint.run.Following(c)
            self.following_df = twint.storage.panda.Follow_df
            print(f"Collected {len(self.following_df)} following")
        except Exception as e:
            print(f"Error collecting following: {e}")

    def analyze_data(self):
        """Analyze collected data"""
        if self.tweets_df is not None and not self.tweets_df.empty:
            self.results = {
                'basic_stats': self.get_basic_stats(),
                'temporal_analysis': self.analyze_temporal_patterns(),
                'content_analysis': self.analyze_content(),
                'network_analysis': self.analyze_network(),
                'behavioral_patterns': self.analyze_behavior()
            }

    def get_basic_stats(self):
        """Get basic statistics"""
        return {
            'total_tweets': len(self.tweets_df),
            'date_range': {
                'first_tweet': self.tweets_df['date'].min(),
                'last_tweet': self.tweets_df['date'].max()
            },
            'engagement': {
                'total_likes': self.tweets_df['likes_count'].sum(),
                'total_retweets': self.tweets_df['retweets_count'].sum(),
                'total_replies': self.tweets_df['replies_count'].sum(),
                'avg_likes': self.tweets_df['likes_count'].mean(),
                'avg_retweets': self.tweets_df['retweets_count'].mean()
            }
        }

    def analyze_temporal_patterns(self):
        """Analyze posting time patterns"""
        # Convert datetime
        self.tweets_df['datetime'] = pd.to_datetime(self.tweets_df['date'] + ' ' + self.tweets_df['time'])
        self.tweets_df['hour'] = self.tweets_df['datetime'].dt.hour
        self.tweets_df['day_of_week'] = self.tweets_df['datetime'].dt.day_name()

        return {
            'hourly_pattern': self.tweets_df['hour'].value_counts().to_dict(),
            'daily_pattern': self.tweets_df['day_of_week'].value_counts().to_dict(),
            'most_active_hour': self.tweets_df['hour'].mode().iloc[0],
            'most_active_day': self.tweets_df['day_of_week'].mode().iloc[0],
            'posting_frequency': len(self.tweets_df) / max(1, (self.tweets_df['datetime'].max() - self.tweets_df['datetime'].min()).days)
        }

    def analyze_content(self):
        """Analyze tweet content"""
        # Extract hashtags and mentions
        all_hashtags = []
        all_mentions = []
        all_urls = []

        for _, row in self.tweets_df.iterrows():
            if row['hashtags']:
                all_hashtags.extend(row['hashtags'])
            if row['mentions']:
                all_mentions.extend(row['mentions'])
            if row['urls']:
                all_urls.extend(row['urls'])

        return {
            'top_hashtags': pd.Series(all_hashtags).value_counts().head(10).to_dict(),
            'top_mentions': pd.Series(all_mentions).value_counts().head(10).to_dict(),
            'url_domains': self.extract_domains(all_urls),
            'tweet_length_stats': {
                'avg_length': self.tweets_df['tweet'].str.len().mean(),
                'max_length': self.tweets_df['tweet'].str.len().max(),
                'min_length': self.tweets_df['tweet'].str.len().min()
            }
        }

    def extract_domains(self, urls):
        """Extract domains from URLs"""
        from urllib.parse import urlparse

        domains = []
        for url in urls:
            try:
                domain = urlparse(url).netloc
                if domain:
                    domains.append(domain)
            except Exception:
                continue

        return pd.Series(domains).value_counts().head(10).to_dict()

    def analyze_network(self):
        """Analyze network connections"""
        network_data = {}

        if self.followers_df is not None:
            network_data['followers_count'] = len(self.followers_df)

        if self.following_df is not None:
            network_data['following_count'] = len(self.following_df)

        # Analyze interaction patterns
        if self.tweets_df is not None:
            reply_users = []
            for mentions in self.tweets_df['mentions'].dropna():
                if mentions:
                    reply_users.extend(mentions)

            network_data['frequent_interactions'] = pd.Series(reply_users).value_counts().head(10).to_dict()

        return network_data

    def analyze_behavior(self):
        """Analyze behavioral patterns"""
        if self.tweets_df is None:
            return {}

        # Retweet vs original content ratio
        retweet_count = self.tweets_df['tweet'].str.startswith('RT @').sum()
        original_count = len(self.tweets_df) - retweet_count

        # Reply patterns
        reply_count = self.tweets_df['tweet'].str.startswith('@').sum()

        return {
            'content_type_distribution': {
                'original_tweets': original_count,
                'retweets': retweet_count,
                'replies': reply_count
            },
            'retweet_ratio': retweet_count / len(self.tweets_df),
            'engagement_patterns': {
                'high_engagement_threshold': self.tweets_df['likes_count'].quantile(0.9),
                'viral_tweets': len(self.tweets_df[self.tweets_df['likes_count'] > self.tweets_df['likes_count'].quantile(0.95)])
            }
        }

    def generate_report(self):
        """Generate investigation report"""
        report = {
            'investigation_target': self.username,
            'investigation_date': datetime.now().isoformat(),
            'data_summary': {
                'tweets_collected': len(self.tweets_df) if self.tweets_df is not None else 0,
                'followers_collected': len(self.followers_df) if self.followers_df is not None else 0,
                'following_collected': len(self.following_df) if self.following_df is not None else 0
            },
            'analysis_results': self.results
        }

        # Save to JSON
        with open(f'twitter_investigation_{self.username}_{datetime.now().strftime("%Y%m%d")}.json', 'w') as f:
            json.dump(report, f, indent=2, default=str)

        # Generate HTML report
        self.generate_html_report(report)

        return report

    def generate_html_report(self, report):
        """Generate HTML investigation report"""
        html_content = f"""
<!DOCTYPE html>
<html>
<head>
    <title>Twitter Investigation Report - {self.username}</title>
    <style>
        body {{ font-family: Arial, sans-serif; margin: 20px; }}
        .section {{ margin: 20px 0; border: 1px solid #ccc; padding: 15px; }}
        .section h2 {{ color: #333; margin-top: 0; }}
        table {{ border-collapse: collapse; width: 100%; }}
        th, td {{ border: 1px solid #ddd; padding: 8px; text-align: left; }}
        th {{ background-color: #f2f2f2; }}
        .metric {{ display: inline-block; margin: 10px; padding: 10px; background: #f9f9f9; border-radius: 5px; }}
    </style>
</head>
<body>
    <h1>Twitter OSINT Investigation Report</h1>
    <div class="section">
        <h2>Investigation Summary</h2>
        <div class="metric"><strong>Target:</strong> @{self.username}</div>
        <div class="metric"><strong>Date:</strong> {report['investigation_date']}</div>
        <div class="metric"><strong>Tweets Analyzed:</strong> {report['data_summary']['tweets_collected']}</div>
    </div>
"""

        if 'basic_stats' in self.results:
            stats = self.results['basic_stats']
            html_content += f"""
    <div class="section">
        <h2>Basic Statistics</h2>
        <div class="metric"><strong>Total Tweets:</strong> {stats['total_tweets']}</div>
        <div class="metric"><strong>Total Likes:</strong> {stats['engagement']['total_likes']}</div>
        <div class="metric"><strong>Total Retweets:</strong> {stats['engagement']['total_retweets']}</div>
        <div class="metric"><strong>Average Likes:</strong> {stats['engagement']['avg_likes']:.2f}</div>
    </div>
"""

        if 'content_analysis' in self.results:
            content = self.results['content_analysis']
            html_content += """
    <div class="section">
        <h2>Content Analysis</h2>
        <h3>Top Hashtags</h3>
        <table>
            <tr><th>Hashtag</th><th>Count</th></tr>
"""
            for hashtag, count in list(content['top_hashtags'].items())[:10]:
                html_content += f"<tr><td>#{hashtag}</td><td>{count}</td></tr>"

            html_content += """
        </table>
        <h3>Top Mentions</h3>
        <table>
            <tr><th>User</th><th>Count</th></tr>
"""
            for user, count in list(content['top_mentions'].items())[:10]:
                html_content += f"<tr><td>@{user}</td><td>{count}</td></tr>"

            html_content += "</table></div>"

        html_content += """
</body>
</html>
"""

        with open(f'twitter_investigation_{self.username}_{datetime.now().strftime("%Y%m%d")}.html', 'w') as f:
            f.write(html_content)

def main():
    import sys

    if len(sys.argv) != 2:
        print("Usage: python3 twitter-user-investigation.py <username>")
        sys.exit(1)

    username = sys.argv[1].replace('@', '')  # Remove @ if present

    investigation = TwitterUserInvestigation(username)
    investigation.collect_user_data()
    report = investigation.generate_report()

    print(f"\nInvestigation completed for @{username}")
    print(f"Report saved as: twitter_investigation_{username}_{datetime.now().strftime('%Y%m%d')}.json")
    print(f"HTML report saved as: twitter_investigation_{username}_{datetime.now().strftime('%Y%m%d')}.html")

if __name__ == "__main__":
    main()

```

### Hashtag and Trend Analysis

```python
#!/usr/bin/env python3
# twitter-hashtag-analysis.py

import twint
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from collections import Counter
import networkx as nx

class HashtagAnalysis:
    def __init__(self):
        self.tweets_df = None
        self.hashtag_network = None

    def analyze_hashtag(self, hashtag, days_back=7, limit=1000):
        """Analyze specific hashtag usage"""
        print(f"Analyzing hashtag: #{hashtag}")

        # Configure search
        c = twint.Config()
        c.Search = f"#{hashtag}"
        c.Limit = limit
        c.Store_pandas = True
        c.Hide_output = True

        # Set date range
        end_date = datetime.now()
        start_date = end_date - timedelta(days=days_back)
        c.Since = start_date.strftime("%Y-%m-%d")
        c.Until = end_date.strftime("%Y-%m-%d")

        # Run search
        twint.run.Search(c)
        self.tweets_df = twint.storage.panda.Tweets_df

        if self.tweets_df is not None and not self.tweets_df.empty:
            analysis = {
                'hashtag': hashtag,
                'total_tweets': len(self.tweets_df),
                'unique_users': self.tweets_df['username'].nunique(),
                'date_range': f"{start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}",
                'engagement_stats': self.calculate_engagement_stats(),
                'top_users': self.get_top_users(),
                'related_hashtags': self.get_related_hashtags(),
                'temporal_patterns': self.analyze_temporal_patterns(),
                'influence_metrics': self.calculate_influence_metrics()
            }

            return analysis
        else:
            print(f"No tweets found for #{hashtag}")
            return None

    def calculate_engagement_stats(self):
        """Calculate engagement statistics"""
        return {
            'total_likes': self.tweets_df['likes_count'].sum(),
            'total_retweets': self.tweets_df['retweets_count'].sum(),
            'total_replies': self.tweets_df['replies_count'].sum(),
            'avg_likes': self.tweets_df['likes_count'].mean(),
            'avg_retweets': self.tweets_df['retweets_count'].mean(),
            'avg_replies': self.tweets_df['replies_count'].mean(),
            'engagement_rate': (self.tweets_df['likes_count'] + self.tweets_df['retweets_count'] + self.tweets_df['replies_count']).mean()
        }

    def get_top_users(self, top_n=10):
        """Get top users by tweet count and engagement"""
        user_stats = self.tweets_df.groupby('username').agg({
            'tweet': 'count',
            'likes_count': 'sum',
            'retweets_count': 'sum',
            'replies_count': 'sum'
        }).reset_index()

        user_stats['total_engagement'] = user_stats['likes_count'] + user_stats['retweets_count'] + user_stats['replies_count']

        return {
            'by_tweet_count': user_stats.nlargest(top_n, 'tweet')[['username', 'tweet']].to_dict('records'),
            'by_engagement': user_stats.nlargest(top_n, 'total_engagement')[['username', 'total_engagement']].to_dict('records')
        }

    def get_related_hashtags(self, top_n=20):
        """Get hashtags that appear with the target hashtag"""
        all_hashtags = []

        for hashtags in self.tweets_df['hashtags'].dropna():
            if hashtags:
                all_hashtags.extend(hashtags)

        hashtag_counts = Counter(all_hashtags)
        return hashtag_counts.most_common(top_n)

    def analyze_temporal_patterns(self):
        """Analyze temporal posting patterns"""
        self.tweets_df['datetime'] = pd.to_datetime(self.tweets_df['date'] + ' ' + self.tweets_df['time'])
        self.tweets_df['hour'] = self.tweets_df['datetime'].dt.hour
        self.tweets_df['day'] = self.tweets_df['datetime'].dt.date

        return {
            'hourly_distribution': self.tweets_df['hour'].value_counts().sort_index().to_dict(),
            'daily_volume': self.tweets_df['day'].value_counts().sort_index().to_dict(),
            'peak_hour': self.tweets_df['hour'].mode().iloc[0],
            'peak_day': self.tweets_df['day'].value_counts().index[0].strftime('%Y-%m-%d')
        }

    def calculate_influence_metrics(self):
        """Calculate influence and reach metrics"""
        # Identify influential tweets (top 10% by engagement)
        engagement_threshold = self.tweets_df['likes_count'].quantile(0.9)
        influential_tweets = self.tweets_df[self.tweets_df['likes_count'] >= engagement_threshold]

        return {
            'influential_tweets_count': len(influential_tweets),
            'influential_users': influential_tweets['username'].unique().tolist(),
            'viral_threshold': engagement_threshold,
            'reach_estimate': self.tweets_df['retweets_count'].sum() * 100  # Rough estimate
        }

    def create_hashtag_network(self, min_cooccurrence=2):
        """Create network of co-occurring hashtags"""
        hashtag_pairs = []

        for hashtags in self.tweets_df['hashtags'].dropna():
            if hashtags and len(hashtags) > 1:
                # Create pairs of hashtags that appear together
                for i in range(len(hashtags)):
                    for j in range(i + 1, len(hashtags)):
                        pair = tuple(sorted([hashtags[i], hashtags[j]]))
                        hashtag_pairs.append(pair)

        # Count co-occurrences
        pair_counts = Counter(hashtag_pairs)

        # Create network graph
        G = nx.Graph()

        for (hashtag1, hashtag2), count in pair_counts.items():
            if count >= min_cooccurrence:
                G.add_edge(hashtag1, hashtag2, weight=count)

        self.hashtag_network = G
        return G

    def visualize_hashtag_network(self, output_file="hashtag_network.png"):
        """Visualize hashtag co-occurrence network"""
        if self.hashtag_network is None:
            self.create_hashtag_network()

        plt.figure(figsize=(12, 8))

        # Calculate node sizes based on degree
        node_sizes = [self.hashtag_network.degree(node) * 100 for node in self.hashtag_network.nodes()]

        # Draw network
        pos = nx.spring_layout(self.hashtag_network, k=1, iterations=50)
        nx.draw(self.hashtag_network, pos,
                node_size=node_sizes,
                node_color='lightblue',
                font_size=8,
                font_weight='bold',
                with_labels=True,
                edge_color='gray',
                alpha=0.7)

        plt.title("Hashtag Co-occurrence Network")
        plt.axis('off')
        plt.tight_layout()
        plt.savefig(output_file, dpi=300, bbox_inches='tight')
        plt.close()

        print(f"Network visualization saved as: {output_file}")

def main():
    import sys

    if len(sys.argv) < 2:
        print("Usage: python3 twitter-hashtag-analysis.py <hashtag> [days_back] [limit]")
        sys.exit(1)

    hashtag = sys.argv[1].replace('#', '')  # Remove # if present
    days_back = int(sys.argv[2]) if len(sys.argv) > 2 else 7
    limit = int(sys.argv[3]) if len(sys.argv) > 3 else 1000

    analyzer = HashtagAnalysis()
    analysis = analyzer.analyze_hashtag(hashtag, days_back, limit)

    if analysis:
        print(f"\nHashtag Analysis Results for #{hashtag}")
        print("=" * 50)
        print(f"Total tweets: {analysis['total_tweets']}")
        print(f"Unique users: {analysis['unique_users']}")
        print(f"Average engagement: {analysis['engagement_stats']['engagement_rate']:.2f}")
        print(f"Peak hour: {analysis['temporal_patterns']['peak_hour']}:00")

        # Create network visualization
        analyzer.visualize_hashtag_network(f"hashtag_network_{hashtag}.png")

        # Save detailed results
        import json
        with open(f"hashtag_analysis_{hashtag}_{datetime.now().strftime('%Y%m%d')}.json", 'w') as f:
            json.dump(analysis, f, indent=2, default=str)

        print(f"\nDetailed analysis saved as: hashtag_analysis_{hashtag}_{datetime.now().strftime('%Y%m%d')}.json")

if __name__ == "__main__":
    main()
```
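
For interactive exploration of the co-occurrence graph (for example in Gephi), the networkx graph built by `create_hashtag_network()` can be exported; a short sketch reusing the class above:

```python
import networkx as nx

# Hypothetical follow-up: export the co-occurrence graph for Gephi
analyzer = HashtagAnalysis()
if analyzer.analyze_hashtag("infosec", days_back=7, limit=500):
    graph = analyzer.create_hashtag_network(min_cooccurrence=2)
    nx.write_gexf(graph, "hashtag_network.gexf")
```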

## Best Practices and OPSEC

### Operational Security

```bash
#!/bin/bash
# twint-opsec-setup.sh

echo "Twint OPSEC Configuration"
echo "========================"

# Use VPN or proxy
echo "1. Network Security:"
echo "   □ Configure VPN connection"
echo "   □ Use SOCKS proxy if needed"
echo "   □ Rotate IP addresses periodically"

# Rate limiting
echo -e "\n2. Rate Limiting:"
echo "   □ Add delays between requests"
echo "   □ Limit concurrent searches"
echo "   □ Monitor for rate limiting"

# Data security
echo -e "\n3. Data Security:"
echo "   □ Encrypt stored data"
echo "   □ Use secure file permissions"
echo "   □ Regular data cleanup"

# Legal compliance
echo -e "\n4. Legal Compliance:"
echo "   □ Verify investigation scope"
echo "   □ Document methodology"
echo "   □ Respect privacy laws"
```
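
For the proxy item on the checklist, Twint's Config exposes proxy settings; a minimal sketch routing traffic through a local SOCKS5 proxy (e.g., Tor on 127.0.0.1:9050). The `Proxy_*` attribute names follow common Twint usage, so verify them against your installed version:

```python
import twint

# Route Twint through a SOCKS5 proxy (assumed Proxy_* Config attributes)
c = twint.Config()
c.Username = "username"
c.Limit = 20
c.Proxy_host = "127.0.0.1"
c.Proxy_port = 9050
c.Proxy_type = "socks5"

twint.run.Search(c)
```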
### Rate Limiting and Delays

```python
import twint
import time
import random

def safe_twint_search(config, delay_range=(1, 3)):
    """Run Twint search with random delays"""
    try:
        # Add random delay
        delay = random.uniform(delay_range[0], delay_range[1])
        time.sleep(delay)

        # Run search
        twint.run.Search(config)
        return True

    except Exception as e:
        print(f"Search failed: {e}")
        # Longer delay on failure
        time.sleep(random.uniform(5, 10))
        return False

def batch_user_analysis(usernames, delay_range=(2, 5)):
    """Analyze multiple users with delays"""
    results = {}

    for username in usernames:
        print(f"Analyzing @{username}")

        c = twint.Config()
        c.Username = username
        c.Limit = 100
        c.Store_pandas = True
        c.Hide_output = True

        if safe_twint_search(c, delay_range):
            if twint.storage.panda.Tweets_df is not None:
                results[username] = len(twint.storage.panda.Tweets_df)
            else:
                results[username] = 0
        else:
            results[username] = "Failed"

        # Clear storage for next user
        twint.storage.panda.Tweets_df = None

    return results
```

## Troubleshooting

### Common Issues and Solutions
```bash
# Issue: No tweets returned
# Solution: Check if user exists and has public tweets
twint -u username --debug

# Issue: Rate limiting
# Solution: Add delays and reduce request frequency
twint -u username --limit 50

# Issue: SSL/TLS errors
# Solution: Update certificates or disable SSL verification
pip install --upgrade certifi

# Issue: Pandas storage not working
# Solution: Clear storage and reinitialize
python3 -c "import twint; twint.storage.panda.Tweets_df = None"
```

### Debugging and Logging
```python
import twint
import logging

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)

# Configure with debug mode
c = twint.Config()
c.Username = "username"
c.Debug = True
c.Verbose = True

# Run with error handling
try:
    twint.run.Search(c)
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
```

## Resources

- [Twint GitHub Repository](https://github.com/twintproject/twint)
- [Twint Documentation](https://github.com/twintproject/twint/wiki)
- [Twitter OSINT Techniques](https://osintframework.com/)
- [Social Media Intelligence Guide](https://www.bellingcat.com/resources/how-tos/2019/06/21/using-twitter-for-osint-investigations/)
- [Python Data Analysis with Pandas](https://pandas.pydata.org/docs/)

---

*This cheat sheet provides a comprehensive guide to using Twint for Twitter OSINT investigations. Always verify proper authorization and legal compliance before conducting social media intelligence gathering activities.*