
Convert HTML Entities to normal text

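import htmlentitydefs as html
import re

def _named(m):
    # Leave unknown names such as &foo; untouched instead of raising KeyError.
    name = m.group(1)
    if name in html.name2codepoint:
        return unichr(html.name2codepoint[name]).encode('utf-8')
    return m.group(0)

def unescapeHTML(text):
    # Python 2 only: expects a byte string and returns UTF-8 encoded bytes.
    text = re.sub(r"<.+?>", '', text)  # strip opening and closing tags
    text = re.sub(r'&#(\d+);', lambda m: unichr(int(m.group(1))).encode('utf-8'), text)
    text = re.sub(r'&(\w+);', _named, text)

    return text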

======================================
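# Using BeautifulSoup 3 (Python 2)
import re, copy
from BeautifulSoup import BeautifulSoup

# Keep the default massage rules and add one that rewrites hexadecimal
# references (&#x41;) as decimal references (&#65;) before parsing.
hexentityMassage = copy.copy(BeautifulSoup.MARKUP_MASSAGE)
hexentityMassage.append((re.compile('&#x([^;]+);'), lambda m: '&#%d;' % int(m.group(1), 16)))

def unescapeHTML2(text):
    text = re.sub(r"<.+?>", '', text)  # strip opening and closing tags
    try:
        return BeautifulSoup(text, convertEntities=BeautifulSoup.HTML_ENTITIES, markupMassage=hexentityMassage).contents[0].string
    except Exception:
        # Fall back to the tag-stripped input if parsing fails or yields nothing.
        return text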
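
======================================
# Python 3 sketch

Both functions above target Python 2. For readers on Python 3, here is a minimal sketch of the same idea using the standard library's html.unescape, which decodes named, decimal, and hexadecimal references in one call. The names TAG_RE and unescape_html3 are made up for this example and are not part of the original code. The tag-stripping regex is only a rough approximation and should not be relied on for malformed markup.

```python
import html
import re

# Hypothetical helper names; the regex is a simple approximation of "a tag".
TAG_RE = re.compile(r"<[^>]+>")

def unescape_html3(text):
    """Strip tags, then decode named, decimal, and hex character references."""
    without_tags = TAG_RE.sub("", text)
    return html.unescape(without_tags)

print(unescape_html3("<p>caf&eacute; &#38; th&#xE9;</p>"))
```

Unlike the Python 2 versions, this returns a str (Unicode text) rather than UTF-8 encoded bytes, so encode it yourself if you need bytes.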

